KMeans
from sklearn.datasets import make_blobs
import matplotlib.pyplot as plt
x, y = make_blobs(n_samples=500, n_features=2, centers=4, random_state=1)
color = ["red", "pink", "orange", "gray"]
fig, ax1 = plt.subplots(1)
for i in range(4):
    ax1.scatter(x[y == i, 0], x[y == i, 1], marker="o", s=8, c=color[i])
plt.show()
from sklearn.cluster import KMeans
n_clusters = 3
cluster = KMeans(n_clusters=n_clusters, n_init="auto", random_state=1).fit(x)
# Cluster label predicted for each sample
y_predict = cluster.labels_
y_predict
array([2, 2, 0, 1, 0, 1, 0, 0, 0, 0, 2, 2, 0, 1, 0, 2, 0, 2, 1, 0, 0, 0,
0, 1, 0, 0, 1, 1, 0, 0, 2, 1, 0, 2, 0, 2, 0, 0, 2, 0, 0, 0, 1, 0,
0, 2, 0, 0, 1, 1, 1, 0, 0, 0, 2, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0,
2, 0, 0, 0, 2, 0, 0, 2, 0, 0, 2, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1,
0, 0, 1, 2, 0, 0, 1, 2, 2, 0, 2, 1, 1, 2, 1, 0, 1, 0, 0, 1, 1, 0,
0, 2, 1, 0, 1, 0, 1, 0, 1, 0, 0, 2, 2, 0, 0, 0, 1, 2, 2, 0, 1, 0,
0, 0, 0, 2, 1, 0, 1, 1, 0, 2, 0, 1, 1, 1, 0, 0, 2, 2, 0, 0, 1, 2,
1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 2, 2, 0, 1, 2, 0, 0, 2, 1, 0,
0, 0, 0, 2, 0, 0, 1, 2, 2, 0, 0, 2, 2, 0, 1, 1, 2, 2, 0, 1, 0, 2,
2, 1, 2, 0, 1, 0, 0, 2, 0, 0, 2, 0, 0, 0, 0, 2, 0, 0, 0, 1, 0, 1,
0, 2, 0, 0, 0, 0, 0, 1, 0, 1, 2, 0, 2, 0, 1, 1, 0, 2, 1, 2, 0, 0,
2, 2, 2, 2, 0, 0, 2, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0,
1, 0, 2, 2, 0, 0, 0, 0, 1, 1, 0, 1, 0, 2, 1, 2, 1, 2, 2, 1, 2, 1,
1, 0, 0, 0, 0, 0, 0, 0, 2, 1, 2, 2, 2, 0, 0, 0, 2, 0, 2, 2, 0, 2,
2, 0, 1, 2, 0, 0, 1, 1, 0, 2, 1, 1, 0, 2, 1, 1, 0, 0, 1, 0, 0, 2,
2, 1, 0, 2, 0, 1, 1, 0, 0, 0, 2, 0, 1, 1, 0, 1, 1, 1, 1, 2, 2, 0,
1, 0, 0, 2, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 2, 1, 2, 2, 2, 2, 2,
2, 0, 2, 1, 2, 1, 1, 0, 1, 0, 0, 0, 2, 1, 0, 1, 0, 2, 0, 0, 2, 0,
0, 1, 1, 2, 0, 0, 1, 0, 0, 2, 2, 0, 2, 0, 0, 2, 0, 2, 0, 1, 2, 1,
0, 0, 1, 0, 0, 1, 2, 0, 1, 1, 0, 0, 0, 0, 2, 1, 2, 0, 1, 2, 2, 2,
0, 1, 0, 2, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 2, 0, 0, 2, 0, 1, 0,
1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 2, 0, 1, 0, 2, 1, 2, 1, 2, 0, 1, 1,
2, 0, 0, 2, 0, 0, 0, 2, 0, 1, 0, 0, 2, 2, 2, 0], dtype=int32)
# Coordinates of the centroids
centroid = cluster.cluster_centers_
centroid
array([[-8.0807047 , -3.50729701],
[-1.54234022, 4.43517599],
[-7.11207261, -8.09458846]])
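The fitted estimator can also assign points to the learned centroids via predict(); on the training data this reproduces labels_. A quick check (my addition):
(cluster.predict(x) == y_predict).all()  # expected: True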
color = ["red", "pink", "orange", "gray"]
fig, ax1 = plt.subplots(1)
for i in range(n_clusters):
    ax1.scatter(x[y_predict == i, 0], x[y_predict == i, 1], marker="o", s=8, c=color[i])
ax1.scatter(centroid[:, 0], centroid[:, 1], marker="x", s=100, c="black")
plt.show()
cluster.inertia_
1903.5607664611762
n_clusters = 4
cluster = KMeans(n_clusters=n_clusters, n_init="auto", random_state=1).fit(x)
cluster.inertia_
908.3855684760615
n_clusters = 100
cluster = KMeans(n_clusters=n_clusters, n_init="auto", random_state=1).fit(x)
cluster.inertia_
34.70849858088455
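Inertia keeps shrinking as n_clusters grows (down to about 34.7 at 100 clusters), so it cannot be compared directly across different k. A common heuristic is to plot inertia against k and look for an "elbow"; a short sketch of that plot (my addition, reusing x from above):
inertias = []
ks = range(1, 11)
for k in ks:
    # Fit one model per candidate k and record its inertia
    inertias.append(KMeans(n_clusters=k, n_init="auto", random_state=1).fit(x).inertia_)
plt.plot(list(ks), inertias, marker="o")
plt.xlabel("n_clusters")
plt.ylabel("inertia")
plt.show()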
# Silhouette coefficient
from sklearn.metrics import silhouette_score
from sklearn.metrics import silhouette_samples
silhouette_score(x, y_predict)
0.5882004012129721
silhouette_score(x, cluster.labels_)
0.3626791469009942
silhouette_samples(x, y_predict)
array([ 0.62982017, 0.5034877 , 0.56148795, 0.84881844, 0.56034142,
0.78740319, 0.39254042, 0.4424015 , 0.48582704, 0.41586457,
0.62497924, 0.75540751, 0.50080674, 0.8452256 , 0.54730432,
0.60232423, 0.54574988, 0.68789747, 0.86605921, 0.25389678,
0.49316173, 0.47993065, 0.2222642 , 0.8096265 , 0.54091189,
0.30638567, 0.88557311, 0.84050532, 0.52855895, 0.49260117,
0.65291019, 0.85602282, 0.47734375, 0.60418857, 0.44210292,
0.6835351 , 0.44776257, 0.423086 , 0.6350923 , 0.4060121 ,
0.54540657, 0.5628461 , 0.78366733, 0.37063114, 0.35132112,
0.74493029, 0.53691616, 0.36724842, 0.87717083, 0.79594363,
0.84641859, 0.38341344, 0.42043012, 0.4024608 , 0.64639537,
0.46244151, 0.31853572, 0.10047008, 0.37909034, 0.56424494,
0.86153448, 0.82630007, 0.53288582, 0.35699772, 0.86994617,
0.52259763, 0.71296285, 0.5269434 , 0.42375504, 0.3173951 ,
0.67512993, 0.47574584, 0.44493897, 0.70152025, 0.37911024,
0.44338293, 0.75528756, 0.23339973, 0.48832955, 0.36920643,
0.84872127, 0.87346766, 0.53069113, 0.85553096, 0.85764386,
0.47306874, 0.02036611, 0.83126042, 0.38759022, 0.49233068,
0.74566044, 0.60466216, 0.56741342, 0.43416703, 0.83602352,
0.72477786, 0.65632253, 0.53058775, 0.60023269, 0.77641023,
0.84703763, 0.70993659, 0.7801523 , 0.46161604, 0.84373446,
0.39295281, 0.46052385, 0.88273449, 0.87440032, 0.48304623,
0.53380475, 0.75891465, 0.85876382, 0.38558097, 0.85795763,
0.39785899, 0.85219954, 0.53642823, 0.86038619, 0.43699704,
0.38829633, 0.54291415, 0.69030671, 0.43887074, 0.51384962,
0.51912781, 0.83667847, 0.76248539, 0.69612144, 0.51530997,
0.86167552, 0.55346107, 0.56205672, 0.49273512, 0.38805592,
0.57038854, 0.68677314, 0.20332654, 0.75659329, 0.82280178,
0.51078711, 0.56655943, 0.39855324, 0.87777997, 0.81846156,
0.85011915, 0.53745726, 0.48476499, 0.57083761, 0.62520973,
0.48791422, 0.57163867, 0.80710385, 0.75753237, 0.80107683,
0.50370862, 0.49411065, 0.56270422, 0.46054445, 0.46870708,
0.53443711, 0.52806612, 0.54696216, 0.38036632, 0.8439417 ,
0.43517732, 0.74914748, 0.64728736, 0.41663216, 0.8823285 ,
0.65599758, 0.56449485, 0.51988053, 0.62928512, 0.88015404,
0.56872777, 0.39189978, 0.49345531, 0.46686063, 0.59723997,
0.44721036, 0.30721342, 0.75113026, 0.50932716, 0.73578982,
-0.11420488, 0.41858652, 0.75882296, 0.7275962 , -0.04073665,
0.80153593, 0.87004395, 0.68206941, 0.43331808, 0.46482802,
0.84659276, 0.50866477, 0.68601103, 0.74449975, 0.83022338,
0.73707965, 0.27681202, 0.66098479, 0.28977719, 0.51863521,
0.63445046, 0.40559979, 0.14818081, 0.76068525, 0.23252498,
0.53021521, 0.47737535, 0.20930573, 0.73655361, 0.40050939,
0.38201296, 0.53131423, 0.8300432 , 0.57416668, 0.83002234,
0.43809863, 0.72601129, 0.30355831, 0.36933954, 0.48245049,
0.50126688, 0.50360422, 0.87011861, 0.56950365, 0.83076761,
0.71764725, 0.53645163, 0.7001754 , 0.50522187, 0.87888555,
0.77936165, 0.10535855, 0.73083257, 0.87808798, 0.66433392,
0.46478475, 0.37703473, 0.73374533, 0.74890043, 0.73918627,
0.63932594, 0.09590229, 0.56398421, 0.65471361, 0.32850826,
0.50686886, 0.82252268, 0.8784639 , 0.50307722, 0.55480534,
0.87909816, 0.47641098, 0.31311959, 0.52686075, 0.88545307,
0.20448704, 0.80778118, 0.44642434, 0.40574811, 0.88056023,
0.4973487 , 0.69311101, 0.72625355, 0.48589387, 0.4978385 ,
0.55313636, 0.50253656, 0.87260952, 0.86131163, 0.40383223,
0.86877735, 0.47545049, 0.55504965, 0.88434796, 0.70495153,
0.88081422, 0.73413228, 0.74319485, 0.86247661, 0.68152552,
0.87029291, 0.81761732, 0.55085702, 0.49102505, 0.55389601,
0.124766 , 0.4404892 , 0.53977082, 0.57674226, 0.52475521,
0.71693971, 0.59037229, 0.27134864, 0.55075649, 0.5305809 ,
0.45997724, 0.52098416, 0.69242901, 0.42370109, 0.55411474,
0.56138849, 0.53447704, 0.69329183, 0.54368936, 0.32886853,
0.86126399, 0.71469113, 0.49146367, 0.50494774, 0.82158862,
0.86861319, 0.54403438, 0.73940315, 0.81462808, 0.84352203,
0.48207009, 0.7354327 , 0.78085872, 0.87875202, 0.04033208,
0.50804578, 0.80938918, 0.51061604, 0.38053425, 0.64455589,
0.67957545, 0.87709406, 0.54770971, 0.49617626, 0.06631062,
0.82052164, 0.85247897, 0.4986702 , 0.41583248, 0.53794955,
0.73049329, 0.28601778, 0.87874615, 0.86432778, 0.53085921,
0.81504707, 0.80902757, 0.73654387, 0.79629133, 0.69825831,
0.71042076, 0.37753505, 0.87392688, 0.36052199, 0.53293388,
0.65652301, 0.8590337 , 0.37778142, 0.88171647, 0.55744616,
0.72988524, 0.47205379, 0.25321102, 0.36665898, 0.87510459,
0.54567292, 0.4377203 , 0.69836179, 0.88279947, 0.73712769,
0.7571288 , 0.64200399, 0.71414246, 0.66105524, 0.64924985,
-0.03393189, 0.67879166, 0.87717775, 0.70483203, 0.81570721,
0.88445546, 0.42536337, 0.84352976, 0.19940384, 0.33446675,
-0.05200008, 0.63729057, 0.86077417, 0.29232998, 0.85936207,
0.01230106, 0.74072871, 0.54572786, 0.4226642 , 0.75803727,
0.41490286, 0.47701084, 0.81796862, 0.80656788, 0.63246787,
0.43149716, 0.47554846, 0.67481449, 0.29491288, 0.47884262,
0.73531065, 0.74909774, 0.53905722, 0.60853703, 0.41799506,
0.26889856, 0.65941878, 0.57469934, 0.74695893, 0.53566443,
0.87031783, 0.55546256, 0.74959292, 0.52013136, 0.48602131,
0.84252024, 0.5553399 , 0.32396765, 0.83121787, 0.6507822 ,
0.40589711, 0.81861161, 0.85537229, 0.51500612, 0.46370284,
0.35233694, 0.41423309, 0.66647621, 0.87838551, 0.55564776,
0.52172866, 0.80216634, 0.74626963, 0.70305507, 0.727976 ,
0.4315848 , 0.71546113, -0.14042082, 0.70475791, 0.54510442,
0.49963818, 0.50497552, 0.5260391 , 0.7371355 , 0.39249758,
0.47181954, 0.51361169, 0.4902578 , 0.42402416, 0.54710266,
0.42517899, 0.54612333, 0.40920498, 0.73864644, 0.5056526 ,
0.87463183, 0.41531738, 0.88324604, 0.4574416 , 0.50326717,
0.56519891, 0.86397315, 0.84031419, 0.81795975, 0.55956891,
0.43032946, 0.28423933, 0.75002919, 0.53694244, 0.86418082,
0.50509088, 0.75702551, 0.85123063, 0.47073065, 0.85904201,
0.69214588, 0.32746785, 0.87507056, 0.77556871, 0.47820639,
0.37692453, 0.23345891, 0.46482472, 0.36325517, 0.17966353,
0.31925836, 0.67652463, 0.35889712, 0.87965911, 0.3907438 ,
0.5748237 , 0.74655924, 0.57403918, 0.69733646, 0.52992071])
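Averaging the per-sample values recovers the overall score reported by silhouette_score (my addition):
silhouette_samples(x, y_predict).mean()  # ≈ 0.5882, matching the value above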
from sklearn.metrics import calinski_harabasz_score
calinski_harabasz_score(x, y_predict)
1809.991966958033
from time import time
now = time()
calinski_harabasz_score(x, y_predict)
time() - now
0.0034482479095458984
now = time()
silhouette_score(x, y_predict)
time() - now
0.008353948593139648
import datetime
datetime.datetime.fromtimestamp(time()).strftime(r"%Y-%m-%d %H:%M:%S")
'2023-04-21 00:14:24'
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_samples, silhouette_score
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import numpy as np
for n_clusters in [2, 3, 4, 5, 6, 7]:
    # Set up the figure with two subplots
    fig, (ax1, ax2) = plt.subplots(1, 2)
    # Set the figure size
    fig.set_size_inches(18, 7)
    # X-axis limits of ax1 (silhouette coefficients live in [-1, 1])
    ax1.set_xlim([-0.1, 1])
    # Y-axis limits of ax1: 0 to (500 + (n_clusters + 1) * 10)
    ax1.set_ylim([0, x.shape[0] + (n_clusters + 1) * 10])
    # Fit KMeans
    clusterer = KMeans(n_clusters=n_clusters, n_init="auto", random_state=10).fit(x)
    # Label assigned to each sample
    cluster_labels = clusterer.labels_
    # Mean silhouette coefficient over all samples
    silhouette_avg = silhouette_score(x, cluster_labels)
    print(
        "For n_clusters =",
        n_clusters,
        "The average silhouette_score is :",
        silhouette_avg,
    )
    # Silhouette coefficient of every individual sample
    sample_silhouette_values = silhouette_samples(x, cluster_labels)
    # Leave a gap so the first cluster does not touch the X axis
    y_lower = 10
    for i in range(n_clusters):
        # Take the silhouette values of the samples in cluster i and sort them
        ith_cluster_silhouette_values = sample_silhouette_values[cluster_labels == i]
        ith_cluster_silhouette_values.sort()
        # Number of samples in cluster i
        size_cluster_i = ith_cluster_silhouette_values.shape[0]
        # Upper bound of this cluster's band on the Y axis
        y_upper = y_lower + size_cluster_i
        # Pick a color from the colormap for cluster i
        color = cm.nipy_spectral(float(i) / n_clusters)
        ax1.fill_betweenx(
            np.arange(y_lower, y_upper),  # Y positions
            ith_cluster_silhouette_values,  # silhouette values along X
            facecolor=color,
            alpha=0.7,  # transparency
        )
        # Label the cluster number on the Y axis
        ax1.text(-0.05, y_lower + 0.5 * size_cluster_i, str(i))
        # Starting position of the next cluster's band
        y_lower = y_upper + 10
    # Titles and axis labels
    ax1.set_title("The silhouette plot for the various clusters.")
    ax1.set_xlabel("The silhouette coefficient values")
    ax1.set_ylabel("Cluster label")
    # Vertical line at the average silhouette score
    ax1.axvline(x=silhouette_avg, color="red", linestyle="--")
    # Clear the Y ticks
    ax1.set_yticks([])
    ax1.set_xticks([-0.1, 0, 0.2, 0.4, 0.6, 0.8, 1])
    colors = cm.nipy_spectral(cluster_labels.astype(float) / n_clusters)
    # Scatter plot of the clustered data on the second subplot
    ax2.scatter(x[:, 0], x[:, 1], marker="o", s=8, c=colors)
    # Plot the centroids
    centers = clusterer.cluster_centers_
    ax2.scatter(centers[:, 0], centers[:, 1], marker="x", c="red", alpha=1, s=200)
    ax2.set_title("The visualization of the clustered data.")
    ax2.set_xlabel("Feature space for the 1st feature")
    ax2.set_ylabel("Feature space for the 2nd feature")
    plt.suptitle(
        (
            "Silhouette analysis for KMeans clustering on sample data "
            "with n_clusters = %d" % n_clusters
        ),
        fontsize=14,
        fontweight="bold",
    )
    plt.show()
For n_clusters = 2 The average silhouette_score is : 0.7049787496083262
For n_clusters = 3 The average silhouette_score is : 0.5882004012129721
For n_clusters = 4 The average silhouette_score is : 0.6505186632729437
For n_clusters = 5 The average silhouette_score is : 0.5662344175321901
For n_clusters = 6 The average silhouette_score is : 0.4358297989156284
For n_clusters = 7 The average silhouette_score is : 0.3685767770971513
Important parameters: init, random_state & n_init
x
array([[-6.92324165e+00, -1.06695320e+01],
[-8.63062033e+00, -7.13940564e+00],
[-9.63048069e+00, -2.72044935e+00],
[-2.30647659e+00, 5.30797676e+00],
[-7.57005366e+00, -3.01446491e+00],
[-1.00051011e+00, 2.77905153e+00],
[-4.81826839e+00, -2.77214822e+00],
[-5.33964799e+00, -1.27625764e+00],
[-7.94308840e+00, -3.89993901e+00],
[-5.54924525e+00, -3.41298968e+00],
[-5.14508990e+00, -9.54492198e+00],
[-7.09669936e+00, -8.04074036e+00],
[-5.82641512e+00, -1.96346196e+00],
[-1.83198811e+00, 3.52863145e+00],
[-7.34267235e+00, -3.16546482e+00],
[-7.34072825e+00, -6.92427252e+00],
[-7.94653906e+00, -3.36768655e+00],
[-8.24598536e+00, -8.61315821e+00],
[-1.98197711e+00, 4.02243551e+00],
[-4.35098035e+00, -3.69476678e+00],
[-1.04768696e+01, -3.60318139e+00],
[-1.10195984e+01, -3.15882031e+00],
[-5.17255904e+00, -4.31835971e+00],
[-2.40671820e+00, 6.09894447e+00],
[-6.72149498e+00, -2.88440806e+00],
[-6.58935963e+00, -4.43379548e+00],
[-1.46126019e+00, 4.52549851e+00],
[-9.19003455e-01, 3.45278927e+00],
[-1.04093517e+01, -2.67482046e+00],
[-6.36722809e+00, -3.32666072e+00],
[-6.72766125e+00, -7.14516267e+00],
[-2.27956075e+00, 5.10452190e+00],
[-5.84887560e+00, -3.03970506e+00],
[-6.07993051e+00, -7.08197568e+00],
[-5.26682929e+00, -2.69645055e+00],
[-6.05367512e+00, -9.62979077e+00],
[-1.00822205e+01, -4.25071043e+00],
[-1.18708735e+01, -3.03273343e+00],
[-5.37107307e+00, -7.95635833e+00],
[-9.37590900e+00, -4.55315308e+00],
[-6.63401987e+00, -2.58340356e+00],
[-9.54609655e+00, -2.84917422e+00],
[-1.69825542e+00, 2.79071751e+00],
[-5.60217602e+00, -6.59908490e-01],
[-6.03429022e+00, -4.08821196e+00],
[-6.37230784e+00, -8.63190046e+00],
[-1.02264783e+01, -2.33998717e+00],
[-5.95678148e+00, -3.97905701e+00],
[-1.42706535e+00, 5.08904128e+00],
[-6.20735304e-01, 6.59346952e+00],
[-3.28102793e-01, 4.11918201e+00],
[-1.06230545e+01, -4.54719161e+00],
[-9.12674270e+00, -4.46180568e+00],
[-5.24134497e+00, -3.23505873e+00],
[-7.19967531e+00, -7.10400981e+00],
[-1.01136977e+01, -4.12880752e+00],
[-1.03416132e+01, -4.95351774e+00],
[-1.25041532e+01, -6.06751247e+00],
[-9.32331640e+00, -4.67574045e+00],
[-7.32033002e+00, -2.73350095e+00],
[-2.20533407e+00, 4.20765201e+00],
[-5.27930518e-01, 5.92630669e+00],
[-8.87430034e+00, -3.64808151e+00],
[-6.66948545e+00, -4.26059884e+00],
[-1.37397258e+00, 5.29163103e+00],
[-6.60085708e+00, -3.11969688e+00],
[-7.99175412e+00, -8.33564851e+00],
[-6.22447869e+00, -2.43846224e+00],
[-1.11054250e+01, -3.97106687e+00],
[-8.95762335e+00, -4.87178859e+00],
[-6.65461644e+00, -7.29335713e+00],
[-1.09531378e+01, -3.36743812e+00],
[-5.11351008e+00, -2.01881992e+00],
[-7.24251438e+00, -9.66368448e+00],
[-5.34929456e+00, -3.54577332e+00],
[-6.29261332e+00, -3.68892426e+00],
[-7.35387953e+00, -8.54504434e+00],
[-5.78423473e+00, -4.48406848e+00],
[-5.66256325e+00, -2.34390092e+00],
[-8.60893311e+00, -4.61469279e+00],
[-2.52019906e-01, 4.53559145e+00],
[-1.92744799e+00, 4.93684534e+00],
[-9.41306589e+00, -3.62907430e+00],
[-8.48608233e-01, 5.45093196e+00],
[-8.66753040e-01, 3.78295914e+00],
[-1.01842915e+01, -4.01017303e+00],
[-7.93192918e+00, -5.42450547e+00],
[-2.75447175e+00, 4.57587230e+00],
[-1.17171070e+01, -3.89622755e+00],
[-8.85081213e+00, -4.00305113e+00],
[-1.34392496e+00, 2.38428865e+00],
[-8.16203654e+00, -7.31459336e+00],
[-9.18886814e+00, -2.16359386e+00],
[-7.13229260e+00, -4.02296730e+00],
[-4.26103071e-02, 4.90923075e+00],
[-7.24449448e+00, -7.65150300e+00],
[-8.13784646e+00, -7.65806949e+00],
[-6.73451345e+00, -1.38330194e+00],
[-8.96369424e+00, -9.27033880e+00],
[ 8.68765801e-01, 4.15785509e+00],
[-5.45176929e-01, 3.81996593e+00],
[-8.01694428e+00, -8.67137366e+00],
[-3.33375571e+00, 5.23151969e+00],
[-1.14385885e+01, -2.72109548e+00],
[-2.52087627e+00, 5.08120139e+00],
[-6.84394443e+00, -4.15058222e+00],
[-5.87619738e+00, -3.28078916e+00],
[-1.21819546e+00, 4.30633464e+00],
[-2.00341358e+00, 4.45008673e+00],
[-1.01077040e+01, -3.94479960e+00],
[-7.03045854e+00, -1.23734756e+00],
[-6.95685137e+00, -8.12381049e+00],
[-2.33022219e+00, 4.78405366e+00],
[-9.98435983e+00, -4.64804214e+00],
[-2.33080604e+00, 4.39382527e+00],
[-1.07796242e+01, -4.39085753e+00],
[-2.03484486e+00, 3.76775946e+00],
[-7.16744245e+00, -3.24998378e+00],
[-4.99221336e-01, 4.77598259e+00],
[-5.76681144e+00, -3.41281779e+00],
[-1.06990569e+01, -4.49057157e+00],
[-7.28729621e+00, -6.68306776e+00],
[-8.17831829e+00, -8.22063813e+00],
[-9.14443128e+00, -4.36637786e+00],
[-7.22323543e+00, -3.51226376e+00],
[-9.71296439e+00, -3.69088110e+00],
[-3.19091528e-02, 4.74450157e+00],
[-7.10406044e+00, -8.38198228e+00],
[-7.52482501e+00, -7.50887444e+00],
[-6.31161343e+00, -2.97641697e+00],
[-5.38142198e-01, 4.81539041e+00],
[-9.58041050e+00, -3.16857790e+00],
[-9.53106924e+00, -2.91966168e+00],
[-1.07650223e+01, -3.27877784e+00],
[-9.54658956e+00, -4.64826945e+00],
[-7.39393373e+00, -6.80612264e+00],
[-2.99151157e+00, 2.64580131e+00],
[-5.67558254e+00, -4.55902255e+00],
[-3.51754177e+00, 5.64265390e+00],
[-9.98539618e-01, 6.19864808e+00],
[-5.96497901e+00, -2.03746469e+00],
[-8.85279507e+00, -7.79138079e+00],
[-4.64310426e+00, -2.22789422e+00],
[-1.35938959e+00, 4.05424002e+00],
[-5.25790464e-01, 3.30659860e+00],
[-1.15637509e+00, 5.69971575e+00],
[-6.42530010e+00, -2.17328619e+00],
[-5.70183305e+00, -2.63083838e+00],
[-6.04632971e+00, -6.92266990e+00],
[-8.14559288e+00, -7.42775410e+00],
[-9.15685095e+00, -4.05623576e+00],
[-9.16170778e+00, -2.40998944e+00],
[-1.46864442e+00, 6.50674501e+00],
[-6.74672798e+00, -8.17245974e+00],
[-1.98605940e+00, 3.06381408e+00],
[-1.03289957e+01, -3.56680940e+00],
[-9.34313235e+00, -4.00453699e+00],
[-9.55954616e+00, -2.83102023e+00],
[-1.01659113e+01, -4.12752889e+00],
[-9.84144865e+00, -4.14356957e+00],
[-1.02768102e+01, -2.33049946e+00],
[-1.01030572e+01, -3.32315288e+00],
[-9.90228742e+00, -3.03189848e+00],
[-9.72121320e+00, -4.68662015e+00],
[-1.85139546e+00, 3.51886090e+00],
[-6.69321189e+00, -6.30021862e+00],
[-6.53371839e+00, -8.14922726e+00],
[-8.46369500e+00, -8.07146029e+00],
[-5.75004528e+00, -3.56590967e+00],
[-1.17104176e+00, 4.33091816e+00],
[-8.52628579e+00, -8.66957601e+00],
[-9.23890684e+00, -3.06843973e+00],
[-6.12803051e+00, -2.51698058e+00],
[-8.10406451e+00, -7.42020487e+00],
[-1.61589091e+00, 4.18017563e+00],
[-8.98758533e+00, -3.03333061e+00],
[-1.19410359e+01, -3.60085418e+00],
[-1.04399418e+01, -3.62982119e+00],
[-1.14242679e+01, -2.18538860e+00],
[-9.00992914e+00, -9.06865247e+00],
[-6.47435649e+00, -3.74338863e+00],
[-9.63138049e+00, -4.99793793e+00],
[ 5.26015501e-01, 3.00999353e+00],
[-9.76324393e+00, -9.36656623e+00],
[-6.27965526e+00, -8.81809587e+00],
[-9.46883276e+00, -6.19043506e+00],
[-5.77336618e+00, -3.56739953e+00],
[-6.69242533e+00, -8.30171791e+00],
[-7.44439970e+00, -9.16803180e+00],
[-7.11478469e+00, -5.38699134e+00],
[-3.85803976e-01, 6.37359162e+00],
[-2.00454712e+00, 4.17565013e+00],
[-5.75517628e+00, -9.30821074e+00],
[-9.14168421e+00, -7.20572694e+00],
[-5.92092535e+00, -3.27574048e+00],
[-2.35122066e+00, 4.00973634e+00],
[-5.91907851e+00, -2.23919861e+00],
[-5.62200526e+00, -8.69290967e+00],
[-7.54246304e+00, -8.12722811e+00],
[-2.41395785e+00, 5.65935802e+00],
[-6.37151596e+00, -8.91129543e+00],
[-1.21401792e+01, -4.78351741e+00],
[-4.45264491e+00, 6.34401868e+00],
[-5.59698820e+00, -4.19535853e+00],
[-6.07503622e+00, -2.15606405e+00],
[-7.24828238e+00, -7.05222790e+00],
[-4.77891101e+00, -2.41333165e+00],
[-1.24112155e+01, -5.73091492e+00],
[-6.75264349e+00, -8.34654975e+00],
[-5.05492139e+00, -4.22257749e+00],
[-1.03825448e+01, -2.49524031e+00],
[-7.22570502e+00, -3.79313579e+00],
[-1.19498178e+01, -5.35567769e+00],
[-7.62867092e+00, -8.06354170e+00],
[-4.61767113e+00, -1.67111145e+00],
[-5.12219664e+00, -3.31302123e+00],
[-6.29225072e+00, -2.35738294e+00],
[ 2.42271161e-04, 5.14853403e+00],
[-8.79988166e+00, -2.24875438e+00],
[-2.77687025e+00, 4.64090557e+00],
[-6.39694979e+00, -3.76963703e+00],
[-6.92263081e+00, -7.63972262e+00],
[-1.15768688e+01, -4.78197653e+00],
[-5.66824737e+00, -3.82607509e+00],
[-1.11578826e+01, -2.60324173e+00],
[-1.04730854e+01, -3.47573837e+00],
[-9.98118494e+00, -3.77616083e+00],
[-1.04102078e+00, 3.96331794e+00],
[-9.32856015e+00, -2.60893309e+00],
[-1.13898357e+00, 3.26214848e+00],
[-6.17905638e+00, -7.96336646e+00],
[-1.02356544e+01, -2.79806066e+00],
[-5.77133256e+00, -8.59222577e+00],
[-9.14500844e+00, -3.91798845e+00],
[-1.61734616e+00, 4.98930508e+00],
[-2.77867530e+00, 6.36256877e+00],
[-9.54642849e+00, -5.63740853e+00],
[-6.91486590e+00, -7.68969378e+00],
[-1.84612968e+00, 4.30474400e+00],
[-5.52834586e+00, -8.15360311e+00],
[-6.00915337e+00, -3.34925152e+00],
[-8.54628324e+00, -4.57138540e+00],
[-7.31655639e+00, -7.77051293e+00],
[-7.20423399e+00, -8.88176559e+00],
[-7.55600732e+00, -8.01885499e+00],
[-5.67856792e+00, -7.60509852e+00],
[-5.21446826e+00, -4.79995312e+00],
[-9.37662980e+00, -2.99722684e+00],
[-5.31844709e+00, -8.92829839e+00],
[-1.08278844e+01, -4.83392615e+00],
[-6.06569910e+00, -1.53376946e+00],
[-2.34673261e+00, 3.56128423e+00],
[-1.25606826e+00, 5.00006839e+00],
[-5.83979745e+00, -2.17836186e+00],
[-6.87088211e+00, -2.22716236e+00],
[-1.79600465e+00, 4.28743568e+00],
[-9.37972697e+00, -4.13752487e+00],
[-7.23605937e+00, -4.54710992e+00],
[-1.02794488e+01, -1.89699302e+00],
[-1.41689046e+00, 4.60832005e+00],
[-5.78045412e+00, -4.58297922e+00],
[ 8.52518583e-02, 3.64528297e+00],
[-9.20268641e+00, -4.32778687e+00],
[-9.56818636e+00, -4.56034695e+00],
[-1.16434858e+00, 4.23178671e+00],
[-6.16345851e+00, -3.10830802e+00],
[-6.32152564e+00, -9.66280079e+00],
[-7.52099974e+00, -9.13311836e+00],
[-9.22029330e+00, -4.07211972e+00],
[-1.08491682e+01, -2.95246712e+00],
[-9.86366431e+00, -2.75129369e+00],
[-6.79715224e+00, -3.45804136e+00],
[-9.79490066e-01, 4.08668827e+00],
[-2.06043810e+00, 5.23049549e+00],
[-5.66839183e+00, -7.95067847e-01],
[-7.57969185e-01, 4.90898421e+00],
[-1.04205695e+01, -3.86688414e+00],
[-7.12425009e+00, -6.70423870e+00],
[-1.37889483e+00, 4.33337717e+00],
[-6.61466444e+00, -7.52579102e+00],
[-1.34052081e+00, 4.15711949e+00],
[-6.21160000e+00, -8.29293984e+00],
[-7.56885613e+00, -8.13527221e+00],
[-1.77000693e+00, 3.78912781e+00],
[-7.36585834e+00, -7.34577219e+00],
[-1.49952284e+00, 5.28265879e+00],
[-2.85882794e+00, 5.26983519e+00],
[-7.73884935e+00, -3.24327665e+00],
[-1.08201797e+01, -3.23163726e+00],
[-8.53682012e+00, -3.36087575e+00],
[-1.20349137e+01, -5.89593773e+00],
[-5.26910909e+00, -2.73521824e+00],
[-6.71299604e+00, -2.90324984e+00],
[-8.36118634e+00, -2.72698382e+00],
[-5.48941428e+00, -6.94662021e+00],
[ 5.31139823e-01, 2.51012895e+00],
[-5.64126775e+00, -7.24922893e+00],
[-9.48263889e+00, -6.73588302e+00],
[-7.53103704e+00, -6.76823676e+00],
[-6.31078595e+00, -2.05174648e+00],
[-8.70233178e+00, -4.19462540e+00],
[-6.11013071e+00, -2.31061128e+00],
[-5.83972633e+00, -9.20677418e+00],
[-1.17536381e+01, -3.23855895e+00],
[-9.29199482e+00, -9.85256171e+00],
[-7.85568214e+00, -6.92950589e+00],
[-1.01967107e+01, -2.08687717e+00],
[-7.96356538e+00, -7.83357116e+00],
[-6.77680402e+00, -6.65511992e+00],
[-1.08749940e+01, -4.82113577e+00],
[-1.84048021e+00, 3.80256924e+00],
[-7.98067403e+00, -8.56048015e+00],
[-6.32066246e+00, -3.30751892e+00],
[-6.17979966e+00, -3.00803447e+00],
[-2.17665436e+00, 3.40946304e+00],
[-6.73224718e-01, 4.62002377e+00],
[-8.93892171e+00, -3.51521408e+00],
[-7.48937497e+00, -8.88475909e+00],
[-2.89641328e+00, 5.28232880e+00],
[-8.13399258e-01, 3.54697393e+00],
[-5.77752667e+00, -2.85145276e+00],
[-6.24883850e+00, -8.76563508e+00],
[-3.10367371e+00, 3.90202401e+00],
[-1.05724063e+00, 4.82677207e+00],
[-5.73215048e+00, -5.04695454e+00],
[-9.93696231e+00, -3.74222379e+00],
[-3.03267723e+00, 4.72164926e+00],
[-1.07035530e+01, -2.76066248e+00],
[-5.68475631e+00, -3.76816924e+00],
[-8.62182374e+00, -8.76567023e+00],
[-6.67177294e+00, -9.97714796e+00],
[-1.92577841e+00, 4.43910442e+00],
[-8.16299488e+00, -3.38896569e+00],
[-3.74380343e+00, -8.75345344e+00],
[-5.66601211e+00, -4.97019633e+00],
[-2.88961804e+00, 4.95702736e+00],
[-2.35995841e+00, 4.20309542e+00],
[-6.80491557e+00, -3.49602548e+00],
[-7.10480676e+00, -4.10830531e+00],
[-6.96685539e+00, -3.12876392e+00],
[-6.31354495e+00, -8.01283267e+00],
[-4.47120679e+00, -3.54131043e+00],
[-1.53940095e+00, 5.02369298e+00],
[-1.60875215e+00, 3.76949422e+00],
[-1.01927698e+01, -3.14795512e+00],
[-2.80207810e+00, 4.05714715e+00],
[ 2.45098802e-01, 5.51754657e+00],
[-3.31028117e+00, 3.51593428e+00],
[-2.84187803e+00, 3.74073535e+00],
[-5.75867612e+00, -8.75783107e+00],
[-5.99591056e+00, -8.11285667e+00],
[-4.98360687e+00, -3.20522961e+00],
[-1.86845414e+00, 4.99311306e+00],
[-9.71503679e+00, -4.77944598e+00],
[-6.47373322e+00, -2.78682541e+00],
[-6.99263028e+00, -7.14344077e+00],
[-1.53773863e+00, 5.53597378e+00],
[-1.04464505e+01, -4.62579659e+00],
[-1.09679881e+00, 4.64722696e+00],
[-7.25256877e+00, -2.91682833e+00],
[-1.97451969e-01, 2.34634916e+00],
[-1.00670412e+01, -4.06174061e+00],
[-6.13468589e+00, -4.50793424e+00],
[-1.03725172e+01, -4.70331816e+00],
[-1.88188805e+00, 4.20573180e+00],
[-7.15498484e+00, -3.10778598e+00],
[-6.14254799e+00, -3.65202206e+00],
[-7.42749427e+00, -9.63838456e+00],
[-1.13009458e+00, 4.54419108e+00],
[-6.28485505e+00, -8.78266971e+00],
[-7.33325349e+00, -8.28490373e+00],
[-6.40320111e+00, -7.16687592e+00],
[-7.22187586e+00, -9.48843083e+00],
[-6.09834293e+00, -7.44017905e+00],
[-7.20807793e+00, -7.12024433e+00],
[-9.68744022e+00, -6.04759636e+00],
[-7.87372938e+00, -7.59578865e+00],
[-1.14663009e+00, 4.10839703e+00],
[-5.90344220e+00, -8.18075749e+00],
[-2.76017908e+00, 5.55121358e+00],
[-1.23606555e+00, 4.48382994e+00],
[-9.97584967e+00, -4.42202236e+00],
[-2.10668847e+00, 5.63099757e+00],
[-4.73558876e+00, -4.23748969e+00],
[-1.07233096e+01, -4.82111722e+00],
[-8.26074369e+00, -5.64724782e+00],
[-6.88384344e+00, -7.04605265e+00],
[-2.15777347e+00, 4.09550489e+00],
[-7.85988444e+00, -4.73888254e+00],
[-4.60642026e-01, 4.59164629e+00],
[-5.05685487e+00, -5.02946642e+00],
[-7.66055006e+00, -8.46234942e+00],
[-8.41923982e+00, -3.45834788e+00],
[-1.09947323e+01, -4.06014253e+00],
[-6.71376529e+00, -8.22199857e+00],
[-1.07972600e+01, -4.24494314e+00],
[-8.23746328e+00, -4.01400104e+00],
[-2.93211866e+00, 4.72003759e+00],
[-1.66145139e+00, 3.00986944e+00],
[-7.65734347e+00, -1.04581360e+01],
[-9.98054778e+00, -4.38249083e+00],
[-5.51940374e+00, -2.38780334e+00],
[-1.96967668e+00, 1.97165210e+00],
[-3.88464981e+00, -2.84336261e+00],
[-5.82969906e+00, -2.99067321e+00],
[-6.66700176e+00, -9.14923899e+00],
[-6.62889599e+00, -8.84071550e+00],
[-6.48944961e+00, -2.06753733e+00],
[-7.17134231e+00, -1.09442245e+01],
[-1.13042466e+01, -3.87696807e+00],
[-9.53654840e+00, -5.12933122e+00],
[-6.09866132e+00, -7.42731125e+00],
[-8.78925618e+00, -2.83764674e+00],
[-7.32386504e+00, -7.96393491e+00],
[-1.00330804e+01, -1.84274349e+00],
[-1.03619773e+00, 3.97153319e+00],
[-6.42829877e+00, -6.74397472e+00],
[-2.87930430e+00, 6.85585852e+00],
[-1.05299465e+01, -2.83521515e+00],
[-6.11423078e+00, -3.20893543e+00],
[-1.78245013e+00, 3.47072043e+00],
[-8.95271809e+00, -3.34483385e+00],
[-5.16617901e+00, -3.79170586e+00],
[-1.64215050e+00, 3.28447114e+00],
[-8.33534296e+00, -7.87023257e+00],
[-6.31107706e+00, -3.92118081e+00],
[-1.78002448e+00, 3.17336913e+00],
[-1.68417686e+00, 3.63132825e+00],
[-1.05552072e+01, -3.01417980e+00],
[-5.34354009e+00, -2.13897664e+00],
[-1.15365057e+01, -4.40124373e+00],
[-4.89503758e+00, -2.48633456e+00],
[-5.44396990e+00, -8.95941292e+00],
[-1.58173878e+00, 5.02487013e+00],
[-7.02993859e+00, -6.69931052e+00],
[-6.17074238e+00, -2.56078204e+00],
[-2.22186534e+00, 6.36136794e+00],
[-7.57385446e+00, -8.31971406e+00],
[-7.65822594e+00, -7.64292051e+00],
[-6.89501293e+00, -9.31723608e+00],
[-1.11141825e+01, -3.87242145e+00],
[-7.94152277e-01, 2.10495117e+00],
[-6.42803193e+00, -5.52129397e+00],
[-5.89780702e+00, -8.19289680e+00],
[-6.59169697e+00, -2.44779959e+00],
[-6.45785776e+00, -3.30981436e+00],
[-1.07755713e+01, -2.83750744e+00],
[-1.02341495e+01, -3.22553505e+00],
[-6.26681839e+00, -8.25516014e+00],
[-5.20580980e+00, -3.29853839e+00],
[-5.46045264e+00, -2.30831553e+00],
[-7.04259952e+00, -3.45332351e+00],
[-6.09962804e+00, -3.14226915e+00],
[-5.66006950e+00, -3.43776965e+00],
[-7.08097398e+00, -3.03972377e+00],
[-8.41264712e+00, -6.68248825e+00],
[-7.36513410e+00, -1.38859731e+00],
[-1.04166504e+01, -4.43253346e+00],
[-6.41623854e+00, -8.04588481e+00],
[-5.88919348e+00, -2.37049472e+00],
[-1.42946517e+00, 5.16850105e+00],
[-6.56118069e+00, -3.95967311e+00],
[-1.47299851e+00, 4.81654152e+00],
[-5.88100804e+00, -3.31692615e+00],
[-1.04125594e+01, -3.50140251e+00],
[-8.55209377e+00, -3.15841000e+00],
[-7.90673749e-01, 5.15690151e+00],
[-1.00754365e-01, 4.51589257e+00],
[-1.30901393e+00, 3.09420646e+00],
[-9.54755699e+00, -2.18801345e+00],
[-5.32030011e+00, -2.99303869e+00],
[-9.48229870e+00, -5.06821960e+00],
[-6.74361627e+00, -8.87844303e+00],
[-1.02518924e+01, -2.55350460e+00],
[-1.96576392e+00, 5.23446451e+00],
[-5.88036774e+00, -2.36326290e+00],
[-7.34774574e+00, -8.41955499e+00],
[-7.58703957e-01, 3.72276201e+00],
[-8.41357863e+00, -6.85069257e+00],
[-8.20576492e-01, 5.33759195e+00],
[-7.93489041e+00, -7.78403764e+00],
[-5.69446566e+00, -4.06205304e+00],
[-8.57698874e-01, 4.45305717e+00],
[ 1.50975008e-01, 3.10076295e+00],
[-6.55394441e+00, -6.44256627e+00],
[-1.09316272e+01, -4.48636887e+00],
[-6.50155596e+00, -4.65329331e+00],
[-6.93650519e+00, -6.39281292e+00],
[-1.01336898e+01, -4.75061833e+00],
[-9.89148978e+00, -5.47902886e+00],
[-8.89871617e+00, -4.85498304e+00],
[-8.11394993e+00, -7.83656921e+00],
[-5.29078354e+00, -3.64846688e+00],
[-1.41076074e+00, 4.10984872e+00],
[-9.50537595e+00, -4.63402669e+00],
[-7.82749456e+00, -2.51032104e+00],
[-6.38088086e+00, -8.50663809e+00],
[-8.96014913e+00, -8.06349899e+00],
[-7.66603898e+00, -7.59715459e+00],
[-6.46534407e+00, -2.85544633e+00]])
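The heading above refers to init, random_state and n_init, so here is a minimal sketch (my addition, reusing the blob data x) comparing random initialization with k-means++ seeding and showing how the number of restarts n_init affects the final inertia:
for init in ("random", "k-means++"):
    for n_init in (1, 10):
        km = KMeans(
            n_clusters=4,
            init=init,          # how the initial centroids are chosen
            n_init=n_init,      # number of restarts; the run with the lowest inertia is kept
            random_state=420,   # fixes the random draws so the comparison is reproducible
        ).fit(x)
        print(f"init={init:>9}, n_init={n_init:>2}, inertia={km.inertia_:.3f}, n_iter={km.n_iter_}")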
Case study: vector quantization of an image
# Import libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_sample_image
from sklearn.cluster import KMeans
from sklearn.metrics import pairwise_distances_argmin
from sklearn.utils import shuffle
china = load_sample_image("china.jpg")
plt.axis(False)
plt.imshow(china)
<matplotlib.image.AxesImage at 0x7fec1c379e10>
china.shape
(427, 640, 3)
china.dtype
dtype('uint8')
china
array([[[174, 201, 231],
[174, 201, 231],
[174, 201, 231],
...,
[250, 251, 255],
[250, 251, 255],
[250, 251, 255]],
[[172, 199, 229],
[173, 200, 230],
[173, 200, 230],
...,
[251, 252, 255],
[251, 252, 255],
[251, 252, 255]],
[[174, 201, 231],
[174, 201, 231],
[174, 201, 231],
...,
[252, 253, 255],
[252, 253, 255],
[252, 253, 255]],
...,
[[ 88, 80, 7],
[147, 138, 69],
[122, 116, 38],
...,
[ 39, 42, 33],
[ 8, 14, 2],
[ 6, 12, 0]],
[[122, 112, 41],
[129, 120, 53],
[118, 112, 36],
...,
[ 9, 12, 3],
[ 9, 15, 3],
[ 16, 24, 9]],
[[116, 103, 35],
[104, 93, 31],
[108, 102, 28],
...,
[ 43, 49, 39],
[ 13, 21, 6],
[ 15, 24, 7]]], dtype=uint8)
china[0][0]
array([174, 201, 231], dtype=uint8)
import pandas as pd
pd.DataFrame(china.reshape(427 * 640, 3)).drop_duplicates().shape
(96615, 3)
n_clusters = 64
china = np.array(china, dtype="float64") / china.max()
w, h, d = original_shape = tuple(china.shape)
w
427
h
640
d
3
assert d == 3, "d must be 3"
image_array = np.reshape(china, (427 * 640, 3))
image_array.shape
(273280, 3)
image_array_sample = shuffle(image_array, random_state=0)[:1000]
kmeans = KMeans(n_clusters=n_clusters, n_init="auto", random_state=0).fit(
image_array_sample
)
# Coordinates of the 64 centroids
kmeans.cluster_centers_
array([[0.97323103, 0.97706735, 0.99369139],
[0.32053664, 0.29638803, 0.25180599],
[0.70375817, 0.7504902 , 0.74052288],
[0.06169935, 0.06196078, 0.04235294],
[0.50718954, 0.53594771, 0.40043573],
[0.83529412, 0.86349206, 0.89505135],
[0.40612745, 0.40612745, 0.22377451],
[0.81568627, 0.53803922, 0.35529412],
[0.22527233, 0.16034858, 0.13420479],
[0.50028011, 0.54789916, 0.57478992],
[0.73524384, 0.82021116, 0.91925591],
[0.90313725, 0.90333333, 0.90607843],
[0.26381462, 0.26773619, 0.1144385 ],
[0.72268908, 0.36022409, 0.25210084],
[0.38867102, 0.46230937, 0.42788671],
[0.88687783, 0.91463047, 0.94932127],
[0.97777778, 0.77254902, 0.60261438],
[0.80999367, 0.82530044, 0.84845035],
[0.61497326, 0.67593583, 0.71265597],
[0.1120915 , 0.13888889, 0.13398693],
[0.48714597, 0.49215686, 0.26143791],
[0.33832442, 0.36684492, 0.31764706],
[0.51372549, 0.33333333, 0.19529412],
[0.8127451 , 0.89264706, 0.98071895],
[0.14323063, 0.10718954, 0.07656396],
[0.76068627, 0.85617647, 0.9604902 ],
[0.45065359, 0.32581699, 0.28562092],
[0.16127451, 0.24068627, 0.24215686],
[0.33986928, 0.26339869, 0.09477124],
[0.61699346, 0.59836601, 0.54052288],
[0.20555556, 0.22287582, 0.08137255],
[0.93776091, 0.9368754 , 0.9485136 ],
[0.40392157, 0.16627451, 0.10156863],
[0.89411765, 0.63764706, 0.43529412],
[0.40606061, 0.44278075, 0.12121212],
[0.225 , 0.07034314, 0.06446078],
[0.28683473, 0.44593838, 0.43305322],
[0.59176471, 0.55215686, 0.43137255],
[0.5827451 , 0.55098039, 0.32078431],
[0.20588235, 0.3379085 , 0.33202614],
[0.83071895, 0.79150327, 0.7254902 ],
[0.72679739, 0.56339869, 0.44575163],
[0.03006536, 0.02538126, 0.01372549],
[0.9 , 0.94498911, 0.99368192],
[0.54980392, 0.44627451, 0.43294118],
[0.74871795, 0.79140271, 0.79803922],
[0.3025641 , 0.33182504, 0.18793363],
[0.54836601, 0.63137255, 0.63529412],
[0.69346405, 0.70653595, 0.64901961],
[0.56339869, 0.40130719, 0.30718954],
[0.93368192, 0.96104575, 0.99616558],
[0.05784314, 0.17156863, 0.2127451 ],
[0.11960784, 0.04191176, 0.0370098 ],
[0.26039216, 0.23581699, 0.20156863],
[0.52679739, 0.53431373, 0.49477124],
[0.0799253 , 0.10644258, 0.054155 ],
[0.71540616, 0.43473389, 0.32268908],
[0.40627451, 0.40235294, 0.33960784],
[0.33604827, 0.34690799, 0.12217195],
[0.84684685, 0.91944886, 0.99194489],
[0.46784314, 0.4372549 , 0.37607843],
[0.16265173, 0.16190476, 0.12380952],
[0.43071895, 0.24183007, 0.18627451],
[0.31176471, 0.15392157, 0.13578431]])
# Index of the nearest centroid for each pixel
label = kmeans.predict(image_array)
label
array([10, 10, 10, ..., 61, 3, 3], dtype=int32)
kmeans.cluster_centers_[1]
array([0.32053664, 0.29638803, 0.25180599])
image_kmeans = image_array.copy()
for i in range(w * h):
    image_kmeans[i] = kmeans.cluster_centers_[label[i]]
image_kmeans
array([[0.73524384, 0.82021116, 0.91925591],
[0.73524384, 0.82021116, 0.91925591],
[0.73524384, 0.82021116, 0.91925591],
...,
[0.16265173, 0.16190476, 0.12380952],
[0.06169935, 0.06196078, 0.04235294],
[0.06169935, 0.06196078, 0.04235294]])
image_kmeans = image_kmeans.reshape(w, h, d)
image_kmeans.shape
(427, 640, 3)
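The per-pixel Python loop above is easy to read but slow; the same replacement can be done in one step with NumPy fancy indexing (my addition, using the same kmeans, label, w, h and d as above):
# Index the centroid table with the label array, then restore the image shape.
image_kmeans_fast = kmeans.cluster_centers_[label].reshape(w, h, d)
np.allclose(image_kmeans_fast, image_kmeans)  # expected: True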
# Randomly pick 64 pixels to serve as centroids
centroid_random = shuffle(image_array)[:n_clusters]
# pairwise_distances_argmin(x1, x2, axis): x1 and x2 are arrays of samples.
# With axis=0 it computes, for every sample in x2, its distance to every sample in x1
# and returns one index per sample of x2: the index of the closest sample in x1.
labels_random = pairwise_distances_argmin(centroid_random, image_array, axis=0)
image_random = image_array.copy()
for i in range(w * h):
    image_random[i] = centroid_random[labels_random[i]]
image_random = image_random.reshape(w, h, d)
image_random.shape
(427, 640, 3)
labels_random
array([55, 55, 55, ..., 52, 60, 60])
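A tiny toy check of the axis=0 behaviour described above (my addition): two reference points and three query points, each query point getting the index of its nearest reference point.
toy_centroids = np.array([[0.0, 0.0], [1.0, 1.0]])
toy_points = np.array([[0.1, 0.0], [0.9, 1.0], [0.6, 0.6]])
pairwise_distances_argmin(toy_centroids, toy_points, axis=0)  # expected: array([0, 1, 1])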
plt.figure(figsize=(10, 10))
plt.axis("off")
plt.title("Original image (96,615 colors)")
plt.imshow(china)
plt.figure(figsize=(10, 10))
plt.axis("off")
plt.title("Quantized image (64 colors, K-Means)")
plt.imshow(image_kmeans)
plt.figure(figsize=(10, 10))
plt.axis("off")
plt.title("Quantized image (64 colors, Random)")
plt.imshow(image_random)
plt.show()
Algorithm notes: a brief introduction to the KMeans clustering algorithm
1. Overview of the KMeans algorithm
KMeans is a classic clustering algorithm that has long since made its way into the textbooks. I ran into it again recently at work and realized I had forgotten most of the details, so this note digs it back up as a refresher.
As stated above, KMeans is a clustering algorithm: given a set of N points, the goal is to partition them into K clusters so that the sum of the distances from each point to the center of its own cluster is as small as possible.
Written as a formula:

$$s = \sum_{i=1}^{N} \min_{j \in \{1, \dots, K\}} d(x_i, u_j)$$

and the goal is to find a set of centers $u_j$ that minimizes $s$. Here $d(x, y)$ denotes the distance between the points $x$ and $y$; the Euclidean distance is the usual choice.
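To make the objective concrete, here is a small sketch (my addition, reusing the blob data x from the first part) that evaluates s by hand for a fitted 4-cluster model and compares it with scikit-learn's inertia_, which is the same quantity computed with squared Euclidean distances:
import numpy as np
from sklearn.cluster import KMeans

km = KMeans(n_clusters=4, n_init="auto", random_state=1).fit(x)
centers = km.cluster_centers_

# Distance from every point to every center, shape (N, K)
dists = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)

s_euclidean = dists.min(axis=1).sum()          # s with plain Euclidean distance
s_squared = (dists.min(axis=1) ** 2).sum()     # s with squared distances
print(s_euclidean, s_squared, km.inertia_)     # s_squared should match km.inertia_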
2. Details of the KMeans algorithm
The core idea of KMeans is iteration.
First, K of the N points are chosen at random as the initial cluster centers.
Then every one of the N points is assigned to one of the K clusters according to its distance to each of the K centers.
Next, the K cluster centers are updated: each cluster's new center is the mean of the points currently assigned to it.
These two steps are repeated until the iteration limit is reached or the cluster centers stop changing.
More concretely, the algorithm can be summarized as follows (a small NumPy sketch follows the list):
- Step 1: randomly pick K of the N given points as the K cluster centers;
- Step 2: compute the Euclidean distance from every point to each of the K centers and assign it to the nearest cluster, which partitions all the points;
- Step 3: for each cluster obtained in step 2, update its center to the mean of its points, i.e. $u = \frac{\sum_i x_i}{n}$;
- Step 4: repeat steps 2-3 until the iteration limit is reached or the cluster centers no longer change.
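A minimal NumPy sketch of these four steps (my addition; a plain Lloyd iteration, not the optimized scikit-learn implementation):
import numpy as np

def kmeans_lloyd(points, k, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: pick K of the N points as the initial centers
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(max_iter):
        # Step 2: assign every point to its nearest center (Euclidean distance)
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: move each center to the mean of its assigned points
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        # Step 4: stop once the centers no longer change
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels

# Usage on the blob data x from the first part, e.g.:
# centers, labels = kmeans_lloyd(x, 4)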
The strengths and weaknesses of KMeans follow directly from this procedure:
- Strengths
  - easy to implement and easy to debug
- Weaknesses
  - the iterations are expensive, which is especially noticeable on large datasets;
  - the result depends heavily on how the initial centers are chosen: different initializations can produce noticeably different clusterings (see the sketch after this list).
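In practice both weaknesses are usually mitigated with smarter seeding, multiple restarts, and mini-batch updates; a short sketch (my addition) using the options scikit-learn exposes for this, again on the blob data x:
from sklearn.cluster import KMeans, MiniBatchKMeans

# k-means++ spreads the initial centers out and n_init keeps the best of several restarts;
# MiniBatchKMeans trades a little accuracy for much faster iterations on large data.
km = KMeans(n_clusters=4, init="k-means++", n_init=10, random_state=0).fit(x)
mbk = MiniBatchKMeans(n_clusters=4, n_init=10, random_state=0).fit(x)
print(km.inertia_, mbk.inertia_)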
3. A convergence argument for KMeans
Having stated the KMeans algorithm, let us examine why its iterations converge, i.e. why the procedure is effective.
We argue for the original KMeans, measuring the distance between two points by the squared Euclidean distance (the standard KMeans objective, and the choice that makes the mean-update step optimal). The loss introduced earlier then becomes:

$$s = \sum_{i=1}^{N} \min_{j \in \{1, \dots, K\}} \lVert x_i - u_j \rVert^2$$

At the k-th iteration this reads:

$$s^{(k)} = \sum_{i=1}^{N} \min_j \lVert x_i - u_j^{(k)} \rVert^2$$

The sequence $s^{(k)}$ is bounded below by 0, so it suffices to show that it is non-increasing; it must then converge.
In other words, we only need to show $s^{(k+1)} \le s^{(k)}$.
Consider the k-th iteration. It consists of two steps:

- Keeping the previous assignment of points to clusters fixed, update each cluster center from $u_j^{(k)}$ to $u_j^{(k+1)}$, giving
  $$s^{(k+1)'} = \sum_{i=1}^{N} \lVert x_i - u_{j(i)}^{(k+1)} \rVert^2,$$
  where $j(i)$ is the cluster that $x_i$ belonged to at step $k$.
- Re-assign every point using the new centers $u^{(k+1)}$:
  $$s^{(k+1)} = \sum_{i=1}^{N} \min_j \lVert x_i - u_j^{(k+1)} \rVert^2.$$

For the second step we clearly have $s^{(k+1)} \le s^{(k+1)'}$, since every point is moved to a center that is at least as close as its current one. So it only remains to show that moving the centers in the first step gives $s^{(k+1)'} \le s^{(k)}$.

In that first step the cluster memberships do not change, so the claim reduces to the following fact: for a set of points $x_1, \dots, x_{n_i}$, the sum $s = \sum_{j=1}^{n_i} \lVert x_j - \mu \rVert^2$ attains its minimum at $\mu = \frac{1}{n_i} \sum_{j=1}^{n_i} x_j$, i.e. at the mean of the points. Since the centers are updated to exactly these means, $s^{(k+1)'} \le s^{(k)}$, the sequence $s^{(k)}$ is non-increasing and bounded below, and the iteration converges (to a local optimum of the loss, not necessarily the global one).
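As a quick empirical check of this monotonicity (my addition), the loss can be recorded after each assignment step of a plain Lloyd iteration on the blob data x and verified to be non-increasing:
import numpy as np

rng = np.random.default_rng(0)
centers = x[rng.choice(len(x), size=4, replace=False)]
prev_loss = np.inf
for k in range(20):
    dists = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    loss = (dists.min(axis=1) ** 2).sum()   # s^(k) with the current centers
    assert loss <= prev_loss + 1e-9, "the loss should never increase"
    print(f"iteration {k}: s = {loss:.3f}")
    prev_loss = loss
    centers = np.array([x[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(4)])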