K-means 聚类:根据聚类中数据点的最大大小确定最佳质心数
Posted
技术标签:
【中文标题】K-means 聚类:根据聚类中数据点的最大大小确定最佳质心数【英文标题】:K-means Clustering: Determine optimal number of centroids based on maximum size of data point in a cluster 【发布时间】:2019-01-25 17:27:07 【问题描述】:我需要找到一组 96 条线的最佳位置和质心数。线条由 (x,y,z) 中两组起点和终点坐标之间的距离表示。
每个质心不得超过 10 行。质心坐标的 z 分量也需要设置为零。我的代码如下所示返回质心位置并为每个质心分配标签(即线),但有些质心需要超过 10 个,这不是我们想要的。
以下是我尝试的代码的代码和解决方案:
为了解决这个问题,我使用了 python 2.7 中 Scikit 包中提供的 k-mean 聚类。
直线在x,y,z的起点坐标如下图所示:
Start=[[ 3.08534969e+02 3.25666185e+03 -2.20581200e+00]
[ 3.09533447e+02 3.25965789e+03 -2.17015040e+00]
[ 3.08584658e+02 3.25954029e+03 -2.19941120e+00]
[ 3.07585711e+02 3.25667680e+03 -2.12869760e+00]
[ 3.11562101e+02 3.25982736e+03 -1.94063600e+00]
[ 3.11859006e+02 3.25702733e+03 -7.63193600e-01]
[ 3.08145280e+02 3.25572271e+03 -2.19514400e+00]
[ 3.08377680e+02 3.25856225e+03 -2.16009200e+00]
[ 3.10441338e+02 3.25829875e+03 -2.63984720e+00]
[ 3.10367962e+02 3.25389346e+03 -2.11223840e+00]
[ 3.10137910e+02 3.25741340e+03 -2.46794000e+00]
[ 3.09537281e+02 3.25892338e+03 -2.28932720e+00]
[ 3.09344406e+02 3.25543844e+03 -2.40515120e+00]
[ 3.10450645e+02 3.25749069e+03 -2.33870480e+00]
[ 3.10430410e+02 3.25851708e+03 -2.33870480e+00]
[ 3.07783327e+02 3.25704679e+03 -1.12438160e+00]
[ 3.12930506e+02 3.25741935e+03 -1.11980960e+00]
[ 3.12705650e+02 3.25455548e+03 -1.10548400e+00]
[ 3.11434863e+02 3.26262004e+03 -2.07596720e+00]
[ 3.09685407e+02 3.26189987e+03 -2.38137680e+00]
[ 3.08528861e+02 3.26162489e+03 -2.13967040e+00]
[ 3.11014726e+02 3.26106711e+03 -2.12564960e+00]
[ 3.09174753e+02 3.26233603e+03 -2.18020880e+00]
[ 3.12973636e+02 3.26208751e+03 -2.17898960e+00]
[ 3.13502547e+02 3.26047741e+03 -2.09882720e+00]
[ 3.13546812e+02 3.26050827e+03 -2.13052640e+00]
[ 3.09981100e+02 3.26250428e+03 -2.15552000e+00]
[ 3.10161877e+02 3.26164522e+03 -2.20215440e+00]
[ 3.13995539e+02 3.26372098e+03 -2.16801680e+00]
[ 3.11358342e+02 3.26370646e+03 -1.36791680e+00]
[ 3.10769621e+02 3.25945517e+03 -2.30974880e+00]
[ 3.09862741e+02 3.26403583e+03 -2.28201200e+00]
[ 3.08458161e+02 3.25817564e+03 -2.15552000e+00]
[ 3.09435541e+02 3.25825584e+03 -2.28719360e+00]
[ 3.08634345e+02 3.25732574e+03 -1.27922000e+00]
[ 3.09930506e+02 3.25441935e+03 -1.11980960e+00]
[ 3.09705650e+02 3.25755548e+03 -1.10548400e+00]
[ 3.15314631e+02 3.27539678e+03 -2.22013760e+00]
[ 3.14496076e+02 3.27450937e+03 -1.77543440e+00]
[ 3.19773285e+02 3.27483172e+03 -2.36888000e+00]
[ 3.20160936e+02 3.27174365e+03 -1.46758640e+00]
[ 3.17632108e+02 3.27333074e+03 -1.54531040e+00]
[ 3.20633806e+02 3.27121131e+03 -1.73763920e+00]
[ 3.15134081e+02 3.26918375e+03 -1.91716640e+00]
[ 3.14833043e+02 3.27344651e+03 -2.14759520e+00]
[ 3.15386873e+02 3.27457986e+03 -2.80382960e+00]
[ 3.15389504e+02 3.27056319e+03 -3.79961120e+00]
[ 3.09928864e+02 3.25862972e+03 -2.28262160e+00]
[ 3.12981100e+02 3.25850428e+03 -2.15552000e+00]
[ 3.13161877e+02 3.26164522e+03 -2.20215440e+00]
[ 3.09995539e+02 3.26272098e+03 -2.16801680e+00]
[ 3.08358342e+02 3.26470646e+03 -1.36791680e+00]
[ 3.10769621e+02 3.26145517e+03 -2.30974880e+00]
[ 3.09862741e+02 3.26403583e+03 -2.28201200e+00]
[ 3.09458161e+02 3.25917564e+03 -2.15552000e+00]
[ 3.09435541e+02 3.25825584e+03 -2.28719360e+00]
[ 3.06297341e+02 3.25738852e+03 -1.09207280e+00]
[ 3.08746854e+02 3.25800712e+03 -2.22013760e+00]
[ 3.07296526e+02 3.25420303e+03 -2.36095520e+00]
[ 3.09912102e+02 3.25860281e+03 -1.15181360e+00]
[ 3.07497596e+02 3.25625977e+03 -1.15181360e+00]
[ 3.07817203e+02 3.25477321e+03 -2.68404320e+00]
[ 3.05817919e+02 3.25780380e+03 -2.70659840e+00]
[ 3.06587855e+02 3.25373025e+03 -7.75385600e-01]
[ 3.05538448e+02 3.25854594e+03 -2.20611680e+00]
[ 3.06360464e+02 3.25706283e+03 -7.85748800e-01]
[ 3.10994093e+02 3.25396928e+03 -1.83639440e+00]
[ 3.06823412e+02 3.25784599e+03 -1.82328800e+00]
[ 3.04608725e+02 3.25721917e+03 -1.70502560e+00]
[ 3.09470284e+02 3.25645552e+03 -2.23994960e+00]
[ 3.19844699e+02 3.27955840e+03 -3.10040000e+00]
[ 3.19818319e+02 3.27689903e+03 -2.35577360e+00]
[ 3.19928996e+02 3.27901237e+03 -1.79250320e+00]
[ 3.16798820e+02 3.28194784e+03 -2.20855520e+00]
[ 3.15507571e+02 3.27824837e+03 -2.18600000e+00]
[ 3.19601448e+02 3.27198482e+03 -1.69771040e+00]
[ 3.05286609e+02 3.25691985e+03 -2.66697440e+00]
[ 3.01580345e+02 3.25622715e+03 -2.66209760e+00]
[ 3.02511985e+02 3.25794144e+03 -2.67855680e+00]
[ 3.01371515e+02 3.25342679e+03 -1.84249040e+00]
[ 3.00572376e+02 3.25445421e+03 -2.70446480e+00]
[ 3.01669594e+02 3.25676087e+03 -2.26220000e+00]
[ 3.02519127e+02 3.25346957e+03 -2.26220000e+00]
[ 3.05003400e+02 3.25737359e+03 -2.24726480e+00]
[ 3.04039856e+02 3.25785807e+03 -2.60967200e+00]
[ 3.05039933e+02 3.25685696e+03 -2.65173440e+00]
[ 3.04015052e+02 3.25476038e+03 -2.23476800e+00]
[ 3.05000290e+02 3.25470957e+03 -2.26037120e+00]
[ 3.01464646e+02 3.25791521e+03 -2.59290800e+00]
[ 3.04415035e+02 3.25351240e+03 -1.48800800e+00]
[ 3.05436737e+02 3.25702401e+03 -2.25915200e+00]
[ 3.02497966e+02 3.25370431e+03 -2.87149520e+00]
[ 3.05248617e+02 3.25693437e+03 -2.21648000e+00]
[ 3.07342867e+02 3.25343450e+03 -1.94216000e+00]
[ 3.06608725e+02 3.25721917e+03 -1.70502560e+00]
[ 3.06470284e+02 3.25445552e+03 -2.23994960e+00]]
x,y,z中的终点坐标如下图:
End=[[ 3.10533447e+02 3.25765789e+03 -2.27015040e+00]
[ 3.10584658e+02 3.25754029e+03 -2.29941120e+00]
[ 3.09585711e+02 3.25767680e+03 -2.22869760e+00]
[ 3.10562101e+02 3.25782736e+03 -2.04063600e+00]
[ 3.09859006e+02 3.25502733e+03 -8.63193600e-01]
[ 3.10145280e+02 3.25772271e+03 -2.29514400e+00]
[ 3.09377680e+02 3.25656225e+03 -2.26009200e+00]
[ 3.09441338e+02 3.25729875e+03 -2.73984720e+00]
[ 3.09367962e+02 3.25589346e+03 -2.21223840e+00]
[ 3.12137910e+02 3.25641340e+03 -2.56794000e+00]
[ 3.11537281e+02 3.25692338e+03 -2.38932720e+00]
[ 3.10344406e+02 3.25743844e+03 -2.50515120e+00]
[ 3.12450645e+02 3.25649069e+03 -2.43870480e+00]
[ 3.12430410e+02 3.25651708e+03 -2.43870480e+00]
[ 3.09783327e+02 3.25504679e+03 -1.22438160e+00]
[ 3.10930506e+02 3.25541935e+03 -1.21980960e+00]
[ 3.10705650e+02 3.25555548e+03 -1.20548400e+00]
[ 3.10434863e+02 3.26062004e+03 -2.17596720e+00]
[ 3.10685407e+02 3.26089987e+03 -2.48137680e+00]
[ 3.10528861e+02 3.26062489e+03 -2.23967040e+00]
[ 3.13014726e+02 3.26306711e+03 -2.22564960e+00]
[ 3.11174753e+02 3.26033603e+03 -2.28020880e+00]
[ 3.10973636e+02 3.26008751e+03 -2.27898960e+00]
[ 3.12502547e+02 3.26147741e+03 -2.19882720e+00]
[ 3.12546812e+02 3.26150827e+03 -2.23052640e+00]
[ 3.11981100e+02 3.26050428e+03 -2.25552000e+00]
[ 3.11161877e+02 3.26064522e+03 -2.30215440e+00]
[ 3.11995539e+02 3.26172098e+03 -2.26801680e+00]
[ 3.10358342e+02 3.26270646e+03 -1.46791680e+00]
[ 3.09769621e+02 3.26045517e+03 -2.40974880e+00]
[ 3.11862741e+02 3.26203583e+03 -2.38201200e+00]
[ 3.10458161e+02 3.26017564e+03 -2.25552000e+00]
[ 3.10435541e+02 3.26025584e+03 -2.38719360e+00]
[ 3.10634345e+02 3.25532574e+03 -1.37922000e+00]
[ 3.10930506e+02 3.25541935e+03 -1.21980960e+00]
[ 3.10705650e+02 3.25555548e+03 -1.20548400e+00]
[ 3.17314631e+02 3.27439678e+03 -2.32013760e+00]
[ 3.16496076e+02 3.27350937e+03 -1.87543440e+00]
[ 3.18773285e+02 3.27683172e+03 -2.46888000e+00]
[ 3.18160936e+02 3.27274365e+03 -1.56758640e+00]
[ 3.15632108e+02 3.27233074e+03 -1.64531040e+00]
[ 3.18633806e+02 3.27221131e+03 -1.83763920e+00]
[ 3.17134081e+02 3.27118375e+03 -2.01716640e+00]
[ 3.16833043e+02 3.27144651e+03 -2.24759520e+00]
[ 3.13386873e+02 3.27257986e+03 -2.90382960e+00]
[ 3.13389504e+02 3.27256319e+03 -3.89961120e+00]
[ 3.10928864e+02 3.26062972e+03 -2.38262160e+00]
[ 3.11981100e+02 3.26050428e+03 -2.25552000e+00]
[ 3.11161877e+02 3.26064522e+03 -2.30215440e+00]
[ 3.11995539e+02 3.26172098e+03 -2.26801680e+00]
[ 3.10358342e+02 3.26270646e+03 -1.46791680e+00]
[ 3.09769621e+02 3.26045517e+03 -2.40974880e+00]
[ 3.11862741e+02 3.26203583e+03 -2.38201200e+00]
[ 3.10458161e+02 3.26017564e+03 -2.25552000e+00]
[ 3.10435541e+02 3.26025584e+03 -2.38719360e+00]
[ 3.07297341e+02 3.25538852e+03 -1.19207280e+00]
[ 3.06746854e+02 3.25700712e+03 -2.32013760e+00]
[ 3.08296526e+02 3.25520303e+03 -2.46095520e+00]
[ 3.08912102e+02 3.25660281e+03 -1.25181360e+00]
[ 3.05497596e+02 3.25525977e+03 -1.25181360e+00]
[ 3.06817203e+02 3.25677321e+03 -2.78404320e+00]
[ 3.06817919e+02 3.25680380e+03 -2.80659840e+00]
[ 3.07587855e+02 3.25573025e+03 -8.75385600e-01]
[ 3.07538448e+02 3.25654594e+03 -2.30611680e+00]
[ 3.07360464e+02 3.25606283e+03 -8.85748800e-01]
[ 3.08994093e+02 3.25596928e+03 -1.93639440e+00]
[ 3.08823412e+02 3.25584599e+03 -1.92328800e+00]
[ 3.05608725e+02 3.25521917e+03 -1.80502560e+00]
[ 3.08470284e+02 3.25545552e+03 -2.33994960e+00]
[ 3.17844699e+02 3.28055840e+03 -3.20040000e+00]
[ 3.18818319e+02 3.27789903e+03 -2.45577360e+00]
[ 3.17928996e+02 3.28001237e+03 -1.89250320e+00]
[ 3.18798820e+02 3.27994784e+03 -2.30855520e+00]
[ 3.17507571e+02 3.28024837e+03 -2.28600000e+00]
[ 3.17601448e+02 3.27398482e+03 -1.79771040e+00]
[ 3.03286609e+02 3.25591985e+03 -2.76697440e+00]
[ 3.03580345e+02 3.25522715e+03 -2.76209760e+00]
[ 3.03511985e+02 3.25594144e+03 -2.77855680e+00]
[ 3.03371515e+02 3.25542679e+03 -1.94249040e+00]
[ 3.01572376e+02 3.25545421e+03 -2.80446480e+00]
[ 3.03669594e+02 3.25576087e+03 -2.36220000e+00]
[ 3.03519127e+02 3.25546957e+03 -2.36220000e+00]
[ 3.04003400e+02 3.25537359e+03 -2.34726480e+00]
[ 3.03039856e+02 3.25585807e+03 -2.70967200e+00]
[ 3.03039933e+02 3.25585696e+03 -2.75173440e+00]
[ 3.03015052e+02 3.25576038e+03 -2.33476800e+00]
[ 3.04000290e+02 3.25570957e+03 -2.36037120e+00]
[ 3.03464646e+02 3.25591521e+03 -2.69290800e+00]
[ 3.03415035e+02 3.25551240e+03 -1.58800800e+00]
[ 3.03436737e+02 3.25602401e+03 -2.35915200e+00]
[ 3.03497966e+02 3.25570431e+03 -2.97149520e+00]
[ 3.03248617e+02 3.25593437e+03 -2.31648000e+00]
[ 3.08342867e+02 3.25543450e+03 -2.04216000e+00]
[ 3.05608725e+02 3.25521917e+03 -1.80502560e+00]
[ 3.08470284e+02 3.25545552e+03 -2.33994960e+00]]
我尝试的代码如下所示:
from mpl_toolkits.mplot3d import Axes3D
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt
import numpy as np
numberOfLines = len(Start)
approximateLinesPerCentroid = (numberOfLines/10) +1
kmeans = KMeans(n_clusters=approximateLinesPerCentroid, random_state=0).fit(Start)
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
for n in range(approximateLinesPerCentroid):
kmeans.cluster_centers_[n][2]=0
for n in range(approximateLinesPerCentroid):
ax.scatter(kmeans.cluster_centers_[n][0], kmeans.cluster_centers_[n][1], kmeans.cluster_centers_[n][2],color='green', marker='^',s=60)
for line in range(numberOfLines):
ax.plot([Start[line][0],End[line][0]], [Start[line][1],End[line][1]], [Start[line][2],End[line][2]], color='grey')
ax.set_xlabel('X Label')
ax.set_ylabel('Y Label')
ax.set_zlabel('Z Label')
for line in range(numberOfLines):
centroid = kmeans.labels_[line]
CentroidToLabelMap[centroid].append(line)
print CentroidToLabelMap
plt.show()
附件是我尝试的代码解决方案的图片:Side View
Top View
代码中的CentroidToLabelMap定义为将每一行分配给列表格式的列表中的质心。每个列表表示附加到其质心的线。可以看出,一些质心超过 10 条线。
[[0, 1, 2, 3, 4, 6, 8, 10, 11, 12, 13, 14], [37, 38, 40, 41, 42, 43, 44, 75], [76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92], [21, 24, 25, 28, 29, 31, 50, 51, 53], [5, 15, 16, 17, 34, 35, 36], [39, 70, 71, 72, 73, 74], [56, 57, 60, 61, 62, 63, 64, 65, 68, 94], [18, 19, 20, 22, 23, 26, 27, 30, 32, 33, 47, 48, 49, 52, 54, 55], [45, 46], [7, 9, 58, 59, 66, 67, 69, 93, 95]]
【问题讨论】:
【参考方案1】:常规 KMeans 不能这样工作。不能对每个集群中的点数做出任何保证或限制。 See this relevant SO question
但是,有一个“变体”可以以这种方式工作。 See this。不过我之前没试过。 据我所知,目前没有 scikit-learn 算法可以做到这一点。
【讨论】:
以上是关于K-means 聚类:根据聚类中数据点的最大大小确定最佳质心数的主要内容,如果未能解决你的问题,请参考以下文章