python spark kmeans demo
The official demo:
from numpy import array
from math import sqrt

from pyspark import SparkContext
from pyspark.mllib.clustering import KMeans, KMeansModel

sc = SparkContext(appName="clusteringExample")

# Load and parse the data
data = sc.textFile("/root/spark-2.1.1-bin-hadoop2.6/data/mllib/kmeans_data.txt")
parsedData = data.map(lambda line: array([float(x) for x in line.split(' ')]))

# Build the model (cluster the data)
clusters = KMeans.train(parsedData, 2, maxIterations=10, initializationMode="random")

# Evaluate clustering by computing Within Set Sum of Squared Errors
def error(point):
    center = clusters.centers[clusters.predict(point)]
    return sqrt(sum([x**2 for x in (point - center)]))

WSSSE = parsedData.map(lambda point: error(point)).reduce(lambda x, y: x + y)
print("Within Set Sum of Squared Error = " + str(WSSSE))

# Save and load model
#clusters.save(sc, "target/org/apache/spark/PythonKMeansExample/KMeansModel")
#sameModel = KMeansModel.load(sc, "target/org/apache/spark/PythonKMeansExample/KMeansModel")
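For context, the kmeans_data.txt shipped under data/mllib in the Spark distribution is a handful of space-separated 3-D points, roughly split between a group near the origin and a group near (9, 9, 9), so k=2 separates them cleanly. Below is a minimal sketch, not part of the official demo, of querying the trained model afterwards; it assumes the clusters and parsedData objects from the code above, and the sample point values are made up purely for illustration.

# Sketch: querying the trained KMeansModel (assumes `clusters` and `parsedData` from above)
from numpy import array

# Inspect the learned cluster centers (a list of numpy arrays, one per cluster)
for i, center in enumerate(clusters.centers):
    print("cluster %d center: %s" % (i, center))

# Predict the cluster index for a single new point (values chosen only for illustration)
print(clusters.predict(array([0.05, 0.05, 0.05])))
print(clusters.predict(array([9.05, 9.05, 9.05])))

# predict() also accepts an RDD of points and returns an RDD of cluster indices
assignments = clusters.predict(parsedData).collect()
print(assignments)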