python spark kmeans demo

The official demo:

from numpy import array

from pyspark import SparkContext
from pyspark.mllib.clustering import KMeans, KMeansModel

sc = SparkContext(appName="clusteringExample")

# Load and parse the data: one point per line, coordinates separated by whitespace
data = sc.textFile("/root/spark-2.1.1-bin-hadoop2.6/data/mllib/kmeans_data.txt")
parsedData = data.map(lambda line: array([float(x) for x in line.split()]))

# Build the model (cluster the data into k=2 clusters)
clusters = KMeans.train(parsedData, 2, maxIterations=10, initializationMode="random")

# Evaluate clustering by computing the Within Set Sum of Squared Errors (WSSSE).
# Each point contributes its *squared* distance to its nearest center, so there is
# no sqrt here; taking the square root would sum plain Euclidean distances instead.
def error(point):
    center = clusters.centers[clusters.predict(point)]
    return sum([x**2 for x in (point - center)])

WSSSE = parsedData.map(error).reduce(lambda x, y: x + y)
print("Within Set Sum of Squared Error = " + str(WSSSE))

# The model can also compute the same cost (sum of squared distances) directly
print("computeCost = " + str(clusters.computeCost(parsedData)))

# Save and load model
#clusters.save(sc, "target/org/apache/spark/PythonKMeansExample/KMeansModel")
#sameModel = KMeansModel.load(sc, "target/org/apache/spark/PythonKMeansExample/KMeansModel")

sc.stop()
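To see roughly what KMeans.train and the WSSSE evaluation compute without starting Spark, here is a minimal local sketch in plain NumPy. It assumes Lloyd's algorithm with random initialization, which only approximates MLlib's implementation (MLlib's default initialization mode is "k-means||", and it also stops early based on a convergence tolerance). The function names `assign`, `kmeans_lloyd` and `wssse` are hypothetical, and the six sample points are an assumption chosen to resemble whitespace-separated rows like those in kmeans_data.txt.

```python
import numpy as np

def assign(points, centers):
    """Return, for every point, the index of its nearest center."""
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)

def kmeans_lloyd(points, k, max_iterations=10, seed=0):
    """Hypothetical local sketch of Lloyd's k-means with random initialization."""
    rng = np.random.default_rng(seed)
    # Random initialization: k distinct input rows become the starting centers
    centers = points[rng.choice(len(points), size=k, replace=False)].astype(float)
    for _ in range(max_iterations):
        labels = assign(points, centers)
        # Update step: each center moves to the mean of its assigned points;
        # an empty cluster keeps its previous center (a choice made for this sketch)
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Recompute labels so they match the final centers
    return centers, assign(points, centers)

def wssse(points, centers, labels):
    """Within Set Sum of Squared Errors: squared distance to the assigned center."""
    return float(((points - centers[labels]) ** 2).sum())

if __name__ == "__main__":
    # Assumed sample data: two well-separated groups of 3-D points
    pts = np.array([[0.0] * 3, [0.1] * 3, [0.2] * 3, [9.0] * 3, [9.1] * 3, [9.2] * 3])
    centers, labels = kmeans_lloyd(pts, k=2)
    print("centers:", centers)
    print("WSSSE:", wssse(pts, centers, labels))
```

On these six points the sketch should settle at centers near (0.1, 0.1, 0.1) and (9.1, 9.1, 9.1) with a WSSSE of about 0.12. Summing the unsquared distances instead, as a `sqrt` inside `error` would do, gives about 0.69 for the same clustering, which is why the squared form is the one that matches the metric's name.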
