KMeans

Posted Thank CAT


from sklearn.datasets import make_blobs
import matplotlib.pyplot as plt
x, y = make_blobs(n_samples=500, n_features=2, centers=4, random_state=1)
color = ["red", "pink", "orange", "gray"]
fig, ax1 = plt.subplots(1)
for i in range(4):
    ax1.scatter(x[y == i, 0], x[y == i, 1], marker="o", s=8, c=color[i])
plt.show()


from sklearn.cluster import KMeans

n_clusters = 3

cluster = KMeans(n_clusters=n_clusters, n_init="auto", random_state=1).fit(x)
# Predicted cluster label of each sample
y_predict = cluster.labels_
y_predict
array([2, 2, 0, 1, 0, 1, 0, 0, 0, 0, 2, 2, 0, 1, 0, 2, 0, 2, 1, 0, 0, 0,
       0, 1, 0, 0, 1, 1, 0, 0, 2, 1, 0, 2, 0, 2, 0, 0, 2, 0, 0, 0, 1, 0,
       0, 2, 0, 0, 1, 1, 1, 0, 0, 0, 2, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0,
       2, 0, 0, 0, 2, 0, 0, 2, 0, 0, 2, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1,
       0, 0, 1, 2, 0, 0, 1, 2, 2, 0, 2, 1, 1, 2, 1, 0, 1, 0, 0, 1, 1, 0,
       0, 2, 1, 0, 1, 0, 1, 0, 1, 0, 0, 2, 2, 0, 0, 0, 1, 2, 2, 0, 1, 0,
       0, 0, 0, 2, 1, 0, 1, 1, 0, 2, 0, 1, 1, 1, 0, 0, 2, 2, 0, 0, 1, 2,
       1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 2, 2, 0, 1, 2, 0, 0, 2, 1, 0,
       0, 0, 0, 2, 0, 0, 1, 2, 2, 0, 0, 2, 2, 0, 1, 1, 2, 2, 0, 1, 0, 2,
       2, 1, 2, 0, 1, 0, 0, 2, 0, 0, 2, 0, 0, 0, 0, 2, 0, 0, 0, 1, 0, 1,
       0, 2, 0, 0, 0, 0, 0, 1, 0, 1, 2, 0, 2, 0, 1, 1, 0, 2, 1, 2, 0, 0,
       2, 2, 2, 2, 0, 0, 2, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0,
       1, 0, 2, 2, 0, 0, 0, 0, 1, 1, 0, 1, 0, 2, 1, 2, 1, 2, 2, 1, 2, 1,
       1, 0, 0, 0, 0, 0, 0, 0, 2, 1, 2, 2, 2, 0, 0, 0, 2, 0, 2, 2, 0, 2,
       2, 0, 1, 2, 0, 0, 1, 1, 0, 2, 1, 1, 0, 2, 1, 1, 0, 0, 1, 0, 0, 2,
       2, 1, 0, 2, 0, 1, 1, 0, 0, 0, 2, 0, 1, 1, 0, 1, 1, 1, 1, 2, 2, 0,
       1, 0, 0, 2, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 2, 1, 2, 2, 2, 2, 2,
       2, 0, 2, 1, 2, 1, 1, 0, 1, 0, 0, 0, 2, 1, 0, 1, 0, 2, 0, 0, 2, 0,
       0, 1, 1, 2, 0, 0, 1, 0, 0, 2, 2, 0, 2, 0, 0, 2, 0, 2, 0, 1, 2, 1,
       0, 0, 1, 0, 0, 1, 2, 0, 1, 1, 0, 0, 0, 0, 2, 1, 2, 0, 1, 2, 2, 2,
       0, 1, 0, 2, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 2, 0, 0, 2, 0, 1, 0,
       1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 2, 0, 1, 0, 2, 1, 2, 1, 2, 0, 1, 1,
       2, 0, 0, 2, 0, 0, 0, 2, 0, 1, 0, 0, 2, 2, 2, 0], dtype=int32)
# Centroid coordinates
centroid = cluster.cluster_centers_
centroid
array([[-8.0807047 , -3.50729701],
       [-1.54234022,  4.43517599],
       [-7.11207261, -8.09458846]])
color = ["red", "pink", "orange", "gray"]
fig, ax1 = plt.subplots(1)
for i in range(n_clusters):
    ax1.scatter(x[y_predict == i, 0], x[y_predict == i, 1], marker="o", s=8, c=color[i])
ax1.scatter(centroid[:, 0], centroid[:, 1], marker="x", s=100, c="black")
plt.show()


cluster.inertia_
1903.5607664611762
n_clusters = 4
cluster = KMeans(n_clusters=n_clusters, n_init="auto", random_state=1).fit(x)
cluster.inertia_
908.3855684760615
n_clusters = 100
cluster = KMeans(n_clusters=n_clusters, n_init="auto", random_state=1).fit(x)
cluster.inertia_
34.70849858088455
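The three inertia_ values above keep shrinking as n_clusters grows, which is why inertia alone cannot pick the number of clusters. As a minimal sketch (assuming the same make_blobs data, and using n_init=10 instead of "auto" purely for compatibility with older scikit-learn versions), inertia can be recomputed by hand as the sum of squared distances from each sample to the centroid of its assigned cluster:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Same synthetic data as at the top of the post
x, y = make_blobs(n_samples=500, n_features=2, centers=4, random_state=1)

inertias = {}
for k in [2, 3, 4, 5, 6]:
    # n_init=10 is an assumption made here for portability; the post uses "auto"
    km = KMeans(n_clusters=k, n_init=10, random_state=1).fit(x)
    # Offset of every sample from the centroid of the cluster it was assigned to
    diffs = x - km.cluster_centers_[km.labels_]
    manual_inertia = float(np.sum(diffs ** 2))
    # The hand-computed value should agree with the fitted attribute
    assert np.isclose(manual_inertia, km.inertia_, rtol=1e-6)
    inertias[k] = km.inertia_
    print(k, round(km.inertia_, 2))
```

Because adding clusters can always reduce these distances, a lower inertia does not by itself mean a better clustering; the scores that follow are one way around that.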
# Silhouette coefficient
from sklearn.metrics import silhouette_score
from sklearn.metrics import silhouette_samples
silhouette_score(x, y_predict)
0.5882004012129721
silhouette_score(x, cluster.labels_)
0.3626791469009942
silhouette_samples(x, y_predict)
array([ 0.62982017,  0.5034877 ,  0.56148795,  0.84881844,  0.56034142,
        0.78740319,  0.39254042,  0.4424015 ,  0.48582704,  0.41586457,
        0.62497924,  0.75540751,  0.50080674,  0.8452256 ,  0.54730432,
        0.60232423,  0.54574988,  0.68789747,  0.86605921,  0.25389678,
        0.49316173,  0.47993065,  0.2222642 ,  0.8096265 ,  0.54091189,
        0.30638567,  0.88557311,  0.84050532,  0.52855895,  0.49260117,
        0.65291019,  0.85602282,  0.47734375,  0.60418857,  0.44210292,
        0.6835351 ,  0.44776257,  0.423086  ,  0.6350923 ,  0.4060121 ,
        0.54540657,  0.5628461 ,  0.78366733,  0.37063114,  0.35132112,
        0.74493029,  0.53691616,  0.36724842,  0.87717083,  0.79594363,
        0.84641859,  0.38341344,  0.42043012,  0.4024608 ,  0.64639537,
        0.46244151,  0.31853572,  0.10047008,  0.37909034,  0.56424494,
        0.86153448,  0.82630007,  0.53288582,  0.35699772,  0.86994617,
        0.52259763,  0.71296285,  0.5269434 ,  0.42375504,  0.3173951 ,
        0.67512993,  0.47574584,  0.44493897,  0.70152025,  0.37911024,
        0.44338293,  0.75528756,  0.23339973,  0.48832955,  0.36920643,
        0.84872127,  0.87346766,  0.53069113,  0.85553096,  0.85764386,
        0.47306874,  0.02036611,  0.83126042,  0.38759022,  0.49233068,
        0.74566044,  0.60466216,  0.56741342,  0.43416703,  0.83602352,
        0.72477786,  0.65632253,  0.53058775,  0.60023269,  0.77641023,
        0.84703763,  0.70993659,  0.7801523 ,  0.46161604,  0.84373446,
        0.39295281,  0.46052385,  0.88273449,  0.87440032,  0.48304623,
        0.53380475,  0.75891465,  0.85876382,  0.38558097,  0.85795763,
        0.39785899,  0.85219954,  0.53642823,  0.86038619,  0.43699704,
        0.38829633,  0.54291415,  0.69030671,  0.43887074,  0.51384962,
        0.51912781,  0.83667847,  0.76248539,  0.69612144,  0.51530997,
        0.86167552,  0.55346107,  0.56205672,  0.49273512,  0.38805592,
        0.57038854,  0.68677314,  0.20332654,  0.75659329,  0.82280178,
        0.51078711,  0.56655943,  0.39855324,  0.87777997,  0.81846156,
        0.85011915,  0.53745726,  0.48476499,  0.57083761,  0.62520973,
        0.48791422,  0.57163867,  0.80710385,  0.75753237,  0.80107683,
        0.50370862,  0.49411065,  0.56270422,  0.46054445,  0.46870708,
        0.53443711,  0.52806612,  0.54696216,  0.38036632,  0.8439417 ,
        0.43517732,  0.74914748,  0.64728736,  0.41663216,  0.8823285 ,
        0.65599758,  0.56449485,  0.51988053,  0.62928512,  0.88015404,
        0.56872777,  0.39189978,  0.49345531,  0.46686063,  0.59723997,
        0.44721036,  0.30721342,  0.75113026,  0.50932716,  0.73578982,
       -0.11420488,  0.41858652,  0.75882296,  0.7275962 , -0.04073665,
        0.80153593,  0.87004395,  0.68206941,  0.43331808,  0.46482802,
        0.84659276,  0.50866477,  0.68601103,  0.74449975,  0.83022338,
        0.73707965,  0.27681202,  0.66098479,  0.28977719,  0.51863521,
        0.63445046,  0.40559979,  0.14818081,  0.76068525,  0.23252498,
        0.53021521,  0.47737535,  0.20930573,  0.73655361,  0.40050939,
        0.38201296,  0.53131423,  0.8300432 ,  0.57416668,  0.83002234,
        0.43809863,  0.72601129,  0.30355831,  0.36933954,  0.48245049,
        0.50126688,  0.50360422,  0.87011861,  0.56950365,  0.83076761,
        0.71764725,  0.53645163,  0.7001754 ,  0.50522187,  0.87888555,
        0.77936165,  0.10535855,  0.73083257,  0.87808798,  0.66433392,
        0.46478475,  0.37703473,  0.73374533,  0.74890043,  0.73918627,
        0.63932594,  0.09590229,  0.56398421,  0.65471361,  0.32850826,
        0.50686886,  0.82252268,  0.8784639 ,  0.50307722,  0.55480534,
        0.87909816,  0.47641098,  0.31311959,  0.52686075,  0.88545307,
        0.20448704,  0.80778118,  0.44642434,  0.40574811,  0.88056023,
        0.4973487 ,  0.69311101,  0.72625355,  0.48589387,  0.4978385 ,
        0.55313636,  0.50253656,  0.87260952,  0.86131163,  0.40383223,
        0.86877735,  0.47545049,  0.55504965,  0.88434796,  0.70495153,
        0.88081422,  0.73413228,  0.74319485,  0.86247661,  0.68152552,
        0.87029291,  0.81761732,  0.55085702,  0.49102505,  0.55389601,
        0.124766  ,  0.4404892 ,  0.53977082,  0.57674226,  0.52475521,
        0.71693971,  0.59037229,  0.27134864,  0.55075649,  0.5305809 ,
        0.45997724,  0.52098416,  0.69242901,  0.42370109,  0.55411474,
        0.56138849,  0.53447704,  0.69329183,  0.54368936,  0.32886853,
        0.86126399,  0.71469113,  0.49146367,  0.50494774,  0.82158862,
        0.86861319,  0.54403438,  0.73940315,  0.81462808,  0.84352203,
        0.48207009,  0.7354327 ,  0.78085872,  0.87875202,  0.04033208,
        0.50804578,  0.80938918,  0.51061604,  0.38053425,  0.64455589,
        0.67957545,  0.87709406,  0.54770971,  0.49617626,  0.06631062,
        0.82052164,  0.85247897,  0.4986702 ,  0.41583248,  0.53794955,
        0.73049329,  0.28601778,  0.87874615,  0.86432778,  0.53085921,
        0.81504707,  0.80902757,  0.73654387,  0.79629133,  0.69825831,
        0.71042076,  0.37753505,  0.87392688,  0.36052199,  0.53293388,
        0.65652301,  0.8590337 ,  0.37778142,  0.88171647,  0.55744616,
        0.72988524,  0.47205379,  0.25321102,  0.36665898,  0.87510459,
        0.54567292,  0.4377203 ,  0.69836179,  0.88279947,  0.73712769,
        0.7571288 ,  0.64200399,  0.71414246,  0.66105524,  0.64924985,
       -0.03393189,  0.67879166,  0.87717775,  0.70483203,  0.81570721,
        0.88445546,  0.42536337,  0.84352976,  0.19940384,  0.33446675,
       -0.05200008,  0.63729057,  0.86077417,  0.29232998,  0.85936207,
        0.01230106,  0.74072871,  0.54572786,  0.4226642 ,  0.75803727,
        0.41490286,  0.47701084,  0.81796862,  0.80656788,  0.63246787,
        0.43149716,  0.47554846,  0.67481449,  0.29491288,  0.47884262,
        0.73531065,  0.74909774,  0.53905722,  0.60853703,  0.41799506,
        0.26889856,  0.65941878,  0.57469934,  0.74695893,  0.53566443,
        0.87031783,  0.55546256,  0.74959292,  0.52013136,  0.48602131,
        0.84252024,  0.5553399 ,  0.32396765,  0.83121787,  0.6507822 ,
        0.40589711,  0.81861161,  0.85537229,  0.51500612,  0.46370284,
        0.35233694,  0.41423309,  0.66647621,  0.87838551,  0.55564776,
        0.52172866,  0.80216634,  0.74626963,  0.70305507,  0.727976  ,
        0.4315848 ,  0.71546113, -0.14042082,  0.70475791,  0.54510442,
        0.49963818,  0.50497552,  0.5260391 ,  0.7371355 ,  0.39249758,
        0.47181954,  0.51361169,  0.4902578 ,  0.42402416,  0.54710266,
        0.42517899,  0.54612333,  0.40920498,  0.73864644,  0.5056526 ,
        0.87463183,  0.41531738,  0.88324604,  0.4574416 ,  0.50326717,
        0.56519891,  0.86397315,  0.84031419,  0.81795975,  0.55956891,
        0.43032946,  0.28423933,  0.75002919,  0.53694244,  0.86418082,
        0.50509088,  0.75702551,  0.85123063,  0.47073065,  0.85904201,
        0.69214588,  0.32746785,  0.87507056,  0.77556871,  0.47820639,
        0.37692453,  0.23345891,  0.46482472,  0.36325517,  0.17966353,
        0.31925836,  0.67652463,  0.35889712,  0.87965911,  0.3907438 ,
        0.5748237 ,  0.74655924,  0.57403918,  0.69733646,  0.52992071])
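The per-sample values returned by silhouette_samples follow the usual definition s = (b - a) / max(a, b), where a is the mean distance from a sample to the other members of its own cluster and b is the smallest mean distance to the members of any other cluster. The sketch below recomputes them by hand under that definition, on the same make_blobs data (n_init=10 is an assumption for portability), and compares the result with scikit-learn:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import pairwise_distances, silhouette_samples

x, _ = make_blobs(n_samples=500, n_features=2, centers=4, random_state=1)
labels = KMeans(n_clusters=3, n_init=10, random_state=1).fit(x).labels_

dist = pairwise_distances(x)  # Euclidean distance between every pair of samples
cluster_ids = np.unique(labels)

manual = np.zeros(len(x))
for i in range(len(x)):
    own = labels == labels[i]
    n_own = own.sum()
    if n_own == 1:
        # By convention a singleton cluster gets a silhouette of 0
        continue
    # a: mean distance to the other members of the same cluster
    # (the sample's zero distance to itself is in the sum but not in the count)
    a = dist[i, own].sum() / (n_own - 1)
    # b: smallest mean distance to the members of any other cluster
    b = min(dist[i, labels == c].mean() for c in cluster_ids if c != labels[i])
    manual[i] = (b - a) / max(a, b)

reference = silhouette_samples(x, labels)
print(np.allclose(manual, reference))
```

Values near 1 mean a sample sits well inside its cluster, values near 0 mean it lies between two clusters, and negative values (a few appear in the output above) suggest the sample may be closer to a neighbouring cluster.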
from sklearn.metrics import calinski_harabasz_score

calinski_harabasz_score(x, y_predict)
1809.991966958033
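The Calinski-Harabasz score is the ratio of between-cluster dispersion to within-cluster dispersion, each divided by its degrees of freedom (k - 1 and n - k). A minimal sketch of that formula, assuming the same data and a 3-cluster fit (n_init=10 is an assumption for portability), checked against calinski_harabasz_score:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import calinski_harabasz_score

x, _ = make_blobs(n_samples=500, n_features=2, centers=4, random_state=1)
labels = KMeans(n_clusters=3, n_init=10, random_state=1).fit(x).labels_

n = len(x)
cluster_ids = np.unique(labels)
k = len(cluster_ids)
overall_mean = x.mean(axis=0)

between = 0.0  # dispersion of cluster means around the overall mean
within = 0.0   # dispersion of samples around their own cluster mean
for c in cluster_ids:
    members = x[labels == c]
    center = members.mean(axis=0)
    between += len(members) * np.sum((center - overall_mean) ** 2)
    within += np.sum((members - center) ** 2)

manual_ch = (between / (k - 1)) / (within / (n - k))
reference_ch = calinski_harabasz_score(x, labels)
print(manual_ch, reference_ch)
```

Higher is better, and the score has no upper bound, so it is only meaningful when comparing clusterings of the same data.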
from time import time

now = time()
calinski_harabasz_score(x, y_predict)
time() - now
0.0034482479095458984
now = time()
silhouette_score(x, y_predict)
time() - now
0.008353948593139648
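A single time() difference like the two above is noisy, since each is one short measurement. As a hedged alternative, the standard-library timeit module can repeat each call and keep the best run. The data setup and the repeat counts below are assumptions chosen for illustration:

```python
import timeit

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import calinski_harabasz_score, silhouette_score

x, _ = make_blobs(n_samples=500, n_features=2, centers=4, random_state=1)
labels = KMeans(n_clusters=3, n_init=10, random_state=1).fit(x).labels_

runs = 10  # calls per timing batch (arbitrary choice)

# Best of 3 batches, divided by the batch size, gives seconds per call
ch_time = min(
    timeit.repeat(lambda: calinski_harabasz_score(x, labels), number=runs, repeat=3)
) / runs
sil_time = min(
    timeit.repeat(lambda: silhouette_score(x, labels), number=runs, repeat=3)
) / runs

print(f"calinski_harabasz_score: {ch_time:.6f} s per call")
print(f"silhouette_score:        {sil_time:.6f} s per call")
```

The silhouette score needs all pairwise distances while Calinski-Harabasz only needs cluster means, so the gap between the two would be expected to widen as the number of samples grows.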
import datetime

datetime.datetime.fromtimestamp(time()).strftime(r"%Y-%m-%d %H:%M:%S")
'2023-04-21 00:14:24'
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_samples, silhouette_score

import matplotlib.pyplot as plt
import matplotlib.cm as cm
import numpy as np

for n_clusters in [2, 3, 4, 5, 6, 7]:
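    # Create the figure with two subplots side by side
    fig, (ax1, ax2) = plt.subplots(1, 2)
    # Set the figure size
    fig.set_size_inches(18, 7)
    # x-axis range of ax1: silhouette coefficients lie in [-1, 1]; only [-0.1, 1] is shown
    ax1.set_xlim([-0.1, 1])
    # y-axis range of ax1: 0 to n_samples + (n_clusters + 1) * 10,
    # which leaves a gap of 10 between the silhouette plots of adjacent clusters
    ax1.set_ylim([0, x.shape[0] + (n_clusters + 1) * 10])
    # Instantiate and fit KMeans
    clusterer = KMeans(n_clusters=n_clusters, n_init="auto", random_state=10).fit(x)
    # Cluster label of each sample
    cluster_labels = clusterer.labels_
    # Mean silhouette coefficient over all samples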
    silhouette_avg = silhouette_score(x, cluster_labels)
    print(
        "For n_clusters =",
        n_clusters,
        "The average silhouette_score is :",
        silhouette_avg,
    )
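    # Silhouette coefficient of each individual sample
    sample_silhouette_values = silhouette_samples(x, cluster_labels)

    # Start above 0 so the first cluster does not touch the x-axis
    y_lower = 10

    for i in range(n_clusters):
        # Silhouette values of the samples in cluster i, sorted in ascending order
        ith_cluster_silhouette_values = sample_silhouette_values[cluster_labels == i]
        ith_cluster_silhouette_values.sort()

        # Number of samples in cluster i
        size_cluster_i = ith_cluster_silhouette_values.shape[0]

        # Upper y position of this cluster's band: y_lower + cluster size
        y_upper = y_lower + size_cluster_i
        # Pick a distinct colormap color for cluster i
        color = cm.nipy_spectral(float(i) / n_clusters)

        ax1.fill_betweenx(
            np.arange(y_lower, y_upper),  # y positions, one per sample
            ith_cluster_silhouette_values,  # x extent: filled from 0 to each silhouette value
            facecolor=color,
            alpha=0.7,  # transparency
        )

        # Cluster number written next to the y-axis, at the middle of the band
        ax1.text(-0.05, y_lower + 0.5 * size_cluster_i, str(i))
        # Lower y position for the next cluster, leaving a gap of 10
        y_lower = y_upper + 10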
    # Title and axis labels of the silhouette plot (set once, after the per-cluster loop)
    ax1.set_title("The silhouette plot for the various clusters.")
    ax1.set_xlabel("The silhouette coefficient values")
    ax1.set_ylabel("Cluster label")

    # Vertical line marking the average silhouette score
    ax1.axvline(x=silhouette_avg, color="red", linestyle="--")
    # Hide the y-axis ticks and set the x-axis ticks
    ax1.set_yticks([])
    ax1.set_xticks([-0.1, 0, 0.2, 0.4, 0.6, 0.8, 1])

    # Second subplot: scatter plot of the samples, colored by cluster
    colors = cm.nipy_spectral(cluster_labels.astype(float) / n_clusters)
    ax2.scatter(x[:, 0], x[:, 1], marker="o", s=8, c=colors)

    # Mark the centroids
    centers = clusterer.cluster_centers_
    ax2.scatter(centers[:, 0], centers[:, 1], marker="x", c="red", alpha=1, s=200)

    ax2.set_title("The visualization of the clustered data.")
    ax2.set_xlabel("Feature space for the 1st feature")
    ax2.set_ylabel("Feature space for the 2nd feature")

    plt.suptitle(
        (
            "Silhouette analysis for KMeans clustering on sample data "
            "with n_clusters = %d" % n_clusters
        ),
        fontsize=14,
        fontweight="bold",
    )
plt.show()
For n_clusters = 2 The average silhouette_score is : 0.7049787496083262
For n_clusters = 3 The average silhouette_score is : 0.5882004012129721
For n_clusters = 4 The average silhouette_score is : 0.6505186632729437
For n_clusters = 5 The average silhouette_score is : 0.5662344175321901
For n_clusters = 6 The average silhouette_score is : 0.4358297989156284
For n_clusters = 7 The average silhouette_score is : 0.3685767770971513

Important parameters: init, random_state, and n_init

x
array([[-6.92324165e+00, -1.06695320e+01],
       [-8.63062033e+00, -7.13940564e+00],
       [-9.63048069e+00, -2.72044935e+00],
       [-2.30647659e+00,  5.30797676e+00],
       [-7.57005366e+00, -3.01446491e+00],
       [-1.00051011e+00,  2.77905153e+00],
       [-4.81826839e+00, -2.77214822e+00],
       [-5.33964799e+00, -1.27625764e+00],
       [-7.94308840e+00, -3.89993901e+00],
       [-5.54924525e+00, -3.41298968e+00],
       [-5.14508990e+00, -9.54492198e+00],
       [-7.09669936e+00, -8.04074036e+00],
       [-5.82641512e+00, -1.96346196e+00],
       [-1.83198811e+00,  3.52863145e+00],
       [-7.34267235e+00, -3.16546482e+00],
       [-7.34072825e+00, -6.92427252e+00],
       [-7.94653906e+00, -3.36768655e+00],
       [-8.24598536e+00, -8.61315821e+00],
       [-1.98197711e+00,  4.02243551e+00],
       [-4.35098035e+00, -3.69476678e+00],
       [-1.04768696e+01, -3.60318139e+00],
       [-1.10195984e+01, -3.15882031e+00],
       [-5.17255904e+00, -4.31835971e+00],
       [-2.40671820e+00,  6.09894447e+00],
       [-6.72149498e+00, -2.88440806e+00],
       [-6.58935963e+00, -4.43379548e+00],
       [-1.46126019e+00,  4.52549851e+00],
       [-9.19003455e-01,  3.45278927e+00],
       [-1.04093517e+01, -2.67482046e+00],
       [-6.36722809e+00, -3.32666072e+00],
       [-6.72766125e+00, -7.14516267e+00],
       [-2.27956075e+00,  5.10452190e+00],
       [-5.84887560e+00, -3.03970506e+00],
       [-6.07993051e+00, -7.08197568e+00],
       [-5.26682929e+00, -2.69645055e+00],
       [-6.05367512e+00, -9.62979077e+00],
       [-1.00822205e+01, -4.25071043e+00],
       [-1.18708735e+01, -3.03273343e+00],
       [-5.37107307e+00, -7.95635833e+00],
       [-9.37590900e+00, -4.55315308e+00],
       [-6.63401987e+00, -2.58340356e+00],
       [-9.54609655e+00, -2.84917422e+00],
       [-1.69825542e+00,  2.79071751e+00],
       [-5.60217602e+00, -6.59908490e-01],
       [-6.03429022e+00, -4.08821196e+00],
       [-6.37230784e+00, -8.63190046e+00],
       [-1.02264783e+01, -2.33998717e+00],
       [-5.95678148e+00, -3.97905701e+00],
       [-1.42706535e+00,  5.08904128e+00],
       [-6.20735304e-01,  6.59346952e+00],
       [-3.28102793e-01,  4.11918201e+00],
       [-1.06230545e+01, -4.54719161e+00],
       [-9.12674270e+00, -4.46180568e+00],
       [-5.24134497e+00, -3.23505873e+00],
       [-7.19967531e+00, -7.10400981e+00],
       [-1.01136977e+01, -4.12880752e+00],
       [-1.03416132e+01, -4.95351774e+00],
       [-1.25041532e+01, -6.06751247e+00],
       [-9.32331640e+00, -4.67574045e+00],
       [-7.32033002e+00, -2.73350095e+00],
       [-2.20533407e+00,  4.20765201e+00],
       [-5.27930518e-01,  5.92630669e+00],
       [-8.87430034e+00, -3.64808151e+00],
       [-6.66948545e+00, -4.26059884e+00],
       [-1.37397258e+00,  5.29163103e+00],
       [-6.60085708e+00, -3.11969688e+00],
       [-7.99175412e+00, -8.33564851e+00],
       [-6.22447869e+00, -2.43846224e+00],
       [-1.11054250e+01, -3.97106687e+00],
       [-8.95762335e+00, -4.87178859e+00],
       [-6.65461644e+00, -7.29335713e+00],
       [-1.09531378e+01, -3.36743812e+00],
       [-5.11351008e+00, -2.01881992e+00],
       [-7.24251438e+00, -9.66368448e+00],
       [-5.34929456e+00, -3.54577332e+00],
       [-6.29261332e+00, -3.68892426e+00],
       [-7.35387953e+00, -8.54504434e+00],
       [-5.78423473e+00, -4.48406848e+00],
       [-5.66256325e+00, -2.34390092e+00],
       [-8.60893311e+00, -4.61469279e+00],
       [-2.52019906e-01,  4.53559145e+00],
       [-1.92744799e+00,  4.93684534e+00],
       [-9.41306589e+00, -3.62907430e+00],
       [-8.48608233e-01,  5.45093196e+00],
       [-8.66753040e-01,  3.78295914e+00],
       [-1.01842915e+01, -4.01017303e+00],
       [-7.93192918e+00, -5.42450547e+00],
       [-2.75447175e+00,  4.57587230e+00],
       [-1.17171070e+01, -3.89622755e+00],
       [-8.85081213e+00, -4.00305113e+00],
       [-1.34392496e+00,  2.38428865e+00],
       [-8.16203654e+00, -7.31459336e+00],
       [-9.18886814e+00, -2.16359386e+00],
       [-7.13229260e+00, -4.02296730e+00],
       [-4.26103071e-02,  4.90923075e+00],
       [-7.24449448e+00, -7.65150300e+00],
       [-8.13784646e+00, -7.65806949e+00],
       [-6.73451345e+00, -1.38330194e+00],
       [-8.96369424e+00, -9.27033880e+00],
       [ 8.68765801e-01,  4.15785509e+00],
       [-5.45176929e-01,  3.81996593e+00],
       [-8.01694428e+00, -8.67137366e+00],
       [-3.33375571e+00,  5.23151969e+00],
       [-1.14385885e+01, -2.72109548e+00],
       [-2.52087627e+00,  5.08120139e+00],
       [-6.84394443e+00, -4.15058222e+00],
       [-5.87619738e+00, -3.28078916e+00],
       [-1.21819546e+00,  4.30633464e+00],
       [-2.00341358e+00,  4.45008673e+00],
       [-1.01077040e+01, -3.94479960e+00],
       [-7.03045854e+00, -1.23734756e+00],
       [-6.95685137e+00, -8.12381049e+00],
       [-2.33022219e+00,  4.78405366e+00],
       [-9.98435983e+00, -4.64804214e+00],
       [-2.33080604e+00,  4.39382527e+00],
       [-1.07796242e+01, -4.39085753e+00],
       [-2.03484486e+00,  3.76775946e+00],
       [-7.16744245e+00, -3.24998378e+00],
       [-4.99221336e-01,  4.77598259e+00],
       [-5.76681144e+00, -3.41281779e+00],
       [-1.06990569e+01, -4.49057157e+00],
       [-7.28729621e+00, -6.68306776e+00],
       [-8.17831829e+00, -8.22063813e+00],
       [-9.14443128e+00, -4.36637786e+00],
       [-7.22323543e+00, -3.51226376e+00],
       [-9.71296439e+00, -3.69088110e+00],
       [-3.19091528e-02,  4.74450157e+00],
       [-7.10406044e+00, -8.38198228e+00],
       [-7.52482501e+00, -7.50887444e+00],
       [-6.31161343e+00, -2.97641697e+00],
       [-5.38142198e-01,  4.81539041e+00],
       [-9.58041050e+00, -3.16857790e+00],
       [-9.53106924e+00, -2.91966168e+00],
       [-1.07650223e+01, -3.27877784e+00],
       [-9.54658956e+00, -4.64826945e+00],
       [-7.39393373e+00, -6.80612264e+00],
       [-2.99151157e+00,  2.64580131e+00],
       [-5.67558254e+00, -4.55902255e+00],
       [-3.51754177e+00,  5.64265390e+00],
       [-9.98539618e-01,  6.19864808e+00],
       [-5.96497901e+00, -2.03746469e+00],
       [-8.85279507e+00, -7.79138079e+00],
       [-4.64310426e+00, -2.22789422e+00],
       [-1.35938959e+00,  4.05424002e+00],
       [-5.25790464e-01,  3.30659860e+00],
       [-1.15637509e+00,  5.69971575e+00],
       [-6.42530010e+00, -2.17328619e+00],
       [-5.70183305e+00, -2.63083838e+00],
       [-6.04632971e+00, -6.92266990e+00],
       [-8.14559288e+00, -7.42775410e+00],
       [-9.15685095e+00, -4.05623576e+00],
       [-9.16170778e+00, -2.40998944e+00],
       [-1.46864442e+00,  6.50674501e+00],
       [-6.74672798e+00, -8.17245974e+00],
       [-1.98605940e+00,  3.06381408e+00],
       [-1.03289957e+01, -3.56680940e+00],
       [-9.34313235e+00, -4.00453699e+00],
       [-9.55954616e+00, -2.83102023e+00],
       [-1.01659113e+01, -4.12752889e+00],
       [-9.84144865e+00, -4.14356957e+00],
       [-1.02768102e+01, -2.33049946e+00],
       [-1.01030572e+01, -3.32315288e+00],
       [-9.90228742e+00, -3.03189848e+00],
       [-9.72121320e+00, -4.68662015e+00],
       [-1.85139546e+00,  3.51886090e+00],
       [-6.69321189e+00, -6.30021862e+00],
       [-6.53371839e+00, -8.14922726e+00],
       [-8.46369500e+00, -8.07146029e+00],
       [-5.75004528e+00, -3.56590967e+00],
       [-1.17104176e+00,  4.33091816e+00],
       [-8.52628579e+00, -8.66957601e+00],
       [-9.23890684e+00, -3.06843973e+00],
       [-6.12803051e+00, -2.51698058e+00],
       [-8.10406451e+00, -7.42020487e+00],
       [-1.61589091e+00,  4.18017563e+00],
       [-8.98758533e+00, -3.03333061e+00],
       [-1.19410359e+01, -3.60085418e+00],
       [-1.04399418e+01, -3.62982119e+00],
       [-1.14242679e+01, -2.18538860e+00],
       [-9.00992914e+00, -9.06865247e+00],
       [-6.47435649e+00, -3.74338863e+00],
       [-9.63138049e+00, -4.99793793e+00],
       [ 5.26015501e-01,  3.00999353e+00],
       [-9.76324393e+00, -9.36656623e+00],
       [-6.27965526e+00, -8.81809587e+00],
       [-9.46883276e+00, -6.19043506e+00],
       [-5.77336618e+00, -3.56739953e+00],
       [-6.69242533e+00, -8.30171791e+00],
       [-7.44439970e+00, -9.16803180e+00],
       [-7.11478469e+00, -5.38699134e+00],
       [-3.85803976e-01,  6.37359162e+00],
       [-2.00454712e+00,  4.17565013e+00],
       [-5.75517628e+00, -9.30821074e+00],
       [-9.14168421e+00, -7.20572694e+00],
       [-5.92092535e+00, -3.27574048e+00],
       [-2.35122066e+00,  4.00973634e+00],
       [-5.91907851e+00, -2.23919861e+00],
       [-5.62200526e+00, -8.69290967e+00],
       [-7.54246304e+00, -8.12722811e+00],
       [-2.41395785e+00,  5.65935802e+00],
       [-6.37151596e+00, -8.91129543e+00],
       [-1.21401792e+01, -4.78351741e+00],
       [-4.45264491e+00,  6.34401868e+00],
       [-5.59698820e+00, -4.19535853e+00],
       [-6.07503622e+00, -2.15606405e+00],
       [-7.24828238e+00, -7.05222790e+00],
       [-4.77891101e+00, -2.41333165e+00],
       [-1.24112155e+01, -5.73091492e+00],
       [-6.75264349e+00, -8.34654975e+00],
       [-5.05492139e+00, -4.22257749e+00],
       [-1.03825448e+01, -2.49524031e+00],
       [-7.22570502e+00, -3.79313579e+00],
       [-1.19498178e+01, -5.35567769e+00],
       [-7.62867092e+00, -8.06354170e+00],
       [-4.61767113e+00, -1.67111145e+00],
       [-5.12219664e+00, -3.31302123e+00],
       [-6.29225072e+00, -2.35738294e+00],
       [ 2.42271161e-04,  5.14853403e+00],
       [-8.79988166e+00, -2.24875438e+00],
       [-2.77687025e+00,  4.64090557e+00],
       [-6.39694979e+00, -3.76963703e+00],
       [-6.92263081e+00, -7.63972262e+00],
       [-1.15768688e+01, -4.78197653e+00],
       [-5.66824737e+00, -3.82607509e+00],
       [-1.11578826e+01, -2.60324173e+00],
       [-1.04730854e+01, -3.47573837e+00],
       [-9.98118494e+00, -3.77616083e+00],
       [-1.04102078e+00,  3.96331794e+00],
       [-9.32856015e+00, -2.60893309e+00],
       [-1.13898357e+00,  3.26214848e+00],
       [-6.17905638e+00, -7.96336646e+00],
       [-1.02356544e+01, -2.79806066e+00],
       [-5.77133256e+00, -8.59222577e+00],
       [-9.14500844e+00, -3.91798845e+00],
       [-1.61734616e+00,  4.98930508e+00],
       [-2.77867530e+00,  6.36256877e+00],
       [-9.54642849e+00, -5.63740853e+00],
       [-6.91486590e+00, -7.68969378e+00],
       [-1.84612968e+00,  4.30474400e+00],
       [-5.52834586e+00, -8.15360311e+00],
       [-6.00915337e+00, -3.34925152e+00],
       [-8.54628324e+00, -4.57138540e+00],
       [-7.31655639e+00, -7.77051293e+00],
       [-7.20423399e+00, -8.88176559e+00],
       [-7.55600732e+00, -8.01885499e+00],
       [-5.67856792e+00, -7.60509852e+00],
       [-5.21446826e+00, -4.79995312e+00],
       [-9.37662980e+00, -2.99722684e+00],
       [-5.31844709e+00, -8.92829839e+00],
       [-1.08278844e+01, -4.83392615e+00],
       [-6.06569910e+00, -1.53376946e+00],
       [-2.34673261e+00,  3.56128423e+00],
       [-1.25606826e+00,  5.00006839e+00],
       [-5.83979745e+00, -2.17836186e+00],
       [-6.87088211e+00, -2.22716236e+00],
       [-1.79600465e+00,  4.28743568e+00],
       [-9.37972697e+00, -4.13752487e+00],
       [-7.23605937e+00, -4.54710992e+00],
       [-1.02794488e+01, -1.89699302e+00],
       [-1.41689046e+00,  4.60832005e+00],
       [-5.78045412e+00, -4.58297922e+00],
       [ 8.52518583e-02,  3.64528297e+00],
       [-9.20268641e+00, -4.32778687e+00],
       [-9.56818636e+00, -4.56034695e+00],
       [-1.16434858e+00,  4.23178671e+00],
       [-6.16345851e+00, -3.10830802e+00],
       [-6.32152564e+00, -9.66280079e+00],
       [-7.52099974e+00, -9.13311836e+00],
       [-9.22029330e+00, -4.07211972e+00],
       [-1.08491682e+01, -2.95246712e+00],
       [-9.86366431e+00, -2.75129369e+00],
       [-6.79715224e+00, -3.45804136e+00],
       [-9.79490066e-01,  4.08668827e+00],
       [-2.06043810e+00,  5.23049549e+00],
       [-5.66839183e+00, -7.95067847e-01],
       [-7.57969185e-01,  4.90898421e+00],
       [-1.04205695e+01, -3.86688414e+00],
       [-7.12425009e+00, -6.70423870e+00],
       [-1.37889483e+00,  4.33337717e+00],
       [-6.61466444e+00, -7.52579102e+00],
       [-1.34052081e+00,  4.15711949e+00],
       [-6.21160000e+00, -8.29293984e+00],
       [-7.56885613e+00, -8.13527221e+00],
       [-1.77000693e+00,  3.78912781e+00],
       [-7.36585834e+00, -7.34577219e+00],
       [-1.49952284e+00,  5.28265879e+00],
       [-2.85882794e+00,  5.26983519e+00],
       [-7.73884935e+00, -3.24327665e+00],
       [-1.08201797e+01, -3.23163726e+00],
       [-8.53682012e+00, -3.36087575e+00],
       [-1.20349137e+01, -5.89593773e+00],
       [-5.26910909e+00, -2.73521824e+00],
       [-6.71299604e+00, -2.90324984e+00],
       [-8.36118634e+00, -2.72698382e+00],
       [-5.48941428e+00, -6.94662021e+00],
       [ 5.31139823e-01,  2.51012895e+00],
       [-5.64126775e+00, -7.24922893e+00],
       [-9.48263889e+00, -6.73588302e+00],
       [-7.53103704e+00, -6.76823676e+00],
       [-6.31078595e+00, -2.05174648e+00],
       [-8.70233178e+00, -4.19462540e+00],
       [-6.11013071e+00, -2.31061128e+00],
       [-5.83972633e+00, -9.20677418e+00],
       [-1.17536381e+01, -3.23855895e+00],
       [-9.29199482e+00, -9.85256171e+00],
       [-7.85568214e+00, -6.92950589e+00],
       [-1.01967107e+01, -2.08687717e+00],
       [-7.96356538e+00, -7.83357116e+00],
       [-6.77680402e+00, -6.65511992e+00],
       [-1.08749940e+01, -4.82113577e+00],
       [-1.84048021e+00,  3.80256924e+00],
       [-7.98067403e+00, -8.56048015e+00],
       [-6.32066246e+00, -3.30751892e+00],
       [-6.17979966e+00, -3.00803447e+00],
       [-2.17665436e+00,  3.40946304e+00],
       [-6.73224718e-01,  4.62002377e+00],
       [-8.93892171e+00, -3.51521408e+00],
       [-7.48937497e+00, -8.88475909e+00],
       [-2.89641328e+00,  5.28232880e+00],
       [-8.13399258e-01,  3.54697393e+00],
       [-5.77752667e+00, -2.85145276e+00],
       [-6.24883850e+00, -8.76563508e+00],
       [-3.10367371e+00,  3.90202401e+00],
       [-1.05724063e+00,  4.82677207e+00],
       [-5.73215048e+00, -5.04695454e+00],
       [-9.93696231e+00, -3.74222379e+00],
       [-3.03267723e+00,  4.72164926e+00],
       [-1.07035530e+01, -2.76066248e+00],
       [-5.68475631e+00, -3.76816924e+00],
       [-8.62182374e+00, -8.76567023e+00],
       [-6.67177294e+00, -9.97714796e+00],
       [-1.92577841e+00,  4.43910442e+00],
       [-8.16299488e+00, -3.38896569e+00],
       [-3.74380343e+00, -8.75345344e+00],
       [-5.66601211e+00, -4.97019633e+00],
       [-2.88961804e+00,  4.95702736e+00],
       [-2.35995841e+00,  4.20309542e+00],
       [-6.80491557e+00, -3.49602548e+00],
       [-7.10480676e+00, -4.10830531e+00],
       [-6.96685539e+00, -3.12876392e+00],
       [-6.31354495e+00, -8.01283267e+00],
       [-4.47120679e+00, -3.54131043e+00],
       [-1.53940095e+00,  5.02369298e+00],
       [-1.60875215e+00,  3.76949422e+00],
       [-1.01927698e+01, -3.14795512e+00],
       [-2.80207810e+00,  4.05714715e+00],
       [ 2.45098802e-01,  5.51754657e+00],
       [-3.31028117e+00,  3.51593428e+00],
       [-2.84187803e+00,  3.74073535e+00],
       [-5.75867612e+00, -8.75783107e+00],
       [-5.99591056e+00, -8.11285667e+00],
       [-4.98360687e+00, -3.20522961e+00],
       [-1.86845414e+00,  4.99311306e+00],
       [-9.71503679e+00, -4.77944598e+00],
       [-6.47373322e+00, -2.78682541e+00],
       [-6.99263028e+00, -7.14344077e+00],
       [-1.53773863e+00,  5.53597378e+00],
       [-1.04464505e+01, -4.62579659e+00],
       [-1.09679881e+00,  4.64722696e+00],
       [-7.25256877e+00, -2.91682833e+00],
       [-1.97451969e-01,  2.34634916e+00],
       [-1.00670412e+01, -4.06174061e+00],
       [-6.13468589e+00, -4.50793424e+00],
       [-1.03725172e+01, -4.70331816e+00],
       [-1.88188805e+00,  4.20573180e+00],
       [-7.15498484e+00, -3.10778598e+00],
       [-6.14254799e+00, -3.65202206e+00],
       [-7.42749427e+00, -9.63838456e+00],
       [-1.13009458e+00,  4.54419108e+00],
       [-6.28485505e+00, -8.78266971e+00],
       [-7.33325349e+00, -8.28490373e+00],
       [-6.40320111e+00, -7.16687592e+00],
       [-7.22187586e+00, -9.48843083e+00],
       [-6.09834293e+00, -7.44017905e+00],
       [-7.20807793e+00, -7.12024433e+00],
       [-9.68744022e+00, -6.04759636e+00],
       [-7.87372938e+00, -7.59578865e+00],
       [-1.14663009e+00,  4.10839703e+00],
       [-5.90344220e+00, -8.18075749e+00],
       [-2.76017908e+00,  5.55121358e+00],
       [-1.23606555e+00,  4.48382994e+00],
       [-9.97584967e+00, -4.42202236e+00],
       [-2.10668847e+00,  5.63099757e+00],
       [-4.73558876e+00, -4.23748969e+00],
       [-1.07233096e+01, -4.82111722e+00],
       [-8.26074369e+00, -5.64724782e+00],
       [-6.88384344e+00, -7.04605265e+00],
       [-2.15777347e+00,  4.09550489e+00],
       [-7.85988444e+00, -4.73888254e+00],
       [-4.60642026e-01,  4.59164629e+00],
       [-5.05685487e+00, -5.02946642e+00],
       [-7.66055006e+00, -8.46234942e+00],
       [-8.41923982e+00, -3.45834788e+00],
       [-1.09947323e+01, -4.06014253e+00],
       [-6.71376529e+00, -8.22199857e+00],
       [-1.07972600e+01, -4.24494314e+00],
       [-8.23746328e+00, -4.01400104e+00],
       [-2.93211866e+00,  4.72003759e+00],
       [-1.66145139e+00,  3.00986944e+00],
       [-7.65734347e+00, -1.04581360e+01],
       [-9.98054778e+00, -4.38249083e+00],
       [-5.51940374e+00, -2.38780334e+00],
       [-1.96967668e+00,  1.97165210e+00],
       [-3.88464981e+00, -2.84336261e+00],
       [-5.82969906e+00, -2.99067321e+00],
       [-6.66700176e+00, -9.14923899e+00],
       [-6.62889599e+00, -8.84071550e+00],
       [-6.48944961e+00, -2.06753733e+00],
       [-7.17134231e+00, -1.09442245e+01],
       [-1.13042466e+01, -3.87696807e+00],
       [-9.53654840e+00, -5.12933122e+00],
       [-6.09866132e+00, -7.42731125e+00],
       [-8.78925618e+00, -2.83764674e+00],
       [-7.32386504e+00, -7.96393491e+00],
       [-1.00330804e+01, -1.84274349e+00],
       [-1.03619773e+00,  3.97153319e+00],
       [-6.42829877e+00, -6.74397472e+00],
       [-2.87930430e+00,  6.85585852e+00],
       [-1.05299465e+01, -2.83521515e+00],
       [-6.11423078e+00, -3.20893543e+00],
       [-1.78245013e+00,  3.47072043e+00],
       [-8.95271809e+00, -3.34483385e+00],
       [-5.16617901e+00, -3.79170586e+00],
       [-1.64215050e+00,  3.28447114e+00],
       [-8.33534296e+00, -7.87023257e+00],
       [-6.31107706e+00, -3.92118081e+00],
       [-1.78002448e+00,  3.17336913e+00],
       [-1.68417686e+00,  3.63132825e+00],
       [-1.05552072e+01, -3.01417980e+00],
       [-5.34354009e+00, -2.13897664e+00],
       [-1.15365057e+01, -4.40124373e+00],
       [-4.89503758e+00, -2.48633456e+00],
       [-5.44396990e+00, -8.95941292e+00],
       [-1.58173878e+00,  5.02487013e+00],
       [-7.02993859e+00, -6.69931052e+00],
       [-6.17074238e+00, -2.56078204e+00],
       [-2.22186534e+00,  6.36136794e+00],
       [-7.57385446e+00, -8.31971406e+00],
       [-7.65822594e+00, -7.64292051e+00],
       [-6.89501293e+00, -9.31723608e+00],
       [-1.11141825e+01, -3.87242145e+00],
       [-7.94152277e-01,  2.10495117e+00],
       [-6.42803193e+00, -5.52129397e+00],
       [-5.89780702e+00, -8.19289680e+00],
       [-6.59169697e+00, -2.44779959e+00],
       [-6.45785776e+00, -3.30981436e+00],
       [-1.07755713e+01, -2.83750744e+00],
       [-1.02341495e+01, -3.22553505e+00],
       [-6.26681839e+00, -8.25516014e+00],
       [-5.20580980e+00, -3.29853839e+00],
       [-5.46045264e+00, -2.30831553e+00],
       [-7.04259952e+00, -3.45332351e+00],
       [-6.09962804e+00, -3.14226915e+00],
       [-5.66006950e+00, -3.43776965e+00],
       [-7.08097398e+00, -3.03972377e+00],
       [-8.41264712e+00, -6.68248825e+00],
       [-7.36513410e+00, -1.38859731e+00],
       [-1.04166504e+01, -4.43253346e+00],
       [-6.41623854e+00, -8.04588481e+00],
       [-5.88919348e+00, -2.37049472e+00],
       [-1.42946517e+00,  5.16850105e+00],
       [-6.56118069e+00, -3.95967311e+00],
       [-1.47299851e+00,  4.81654152e+00],
       [-5.88100804e+00, -3.31692615e+00],
       [-1.04125594e+01, -3.50140251e+00],
       [-8.55209377e+00, -3.15841000e+00],
       [-7.90673749e-01,  5.15690151e+00],
       [-1.00754365e-01,  4.51589257e+00],
       [-1.30901393e+00,  3.09420646e+00],
       [-9.54755699e+00, -2.18801345e+00],
       [-5.32030011e+00, -2.99303869e+00],
       [-9.48229870e+00, -5.06821960e+00],
       [-6.74361627e+00, -8.87844303e+00],
       [-1.02518924e+01, -2.55350460e+00],
       [-1.96576392e+00,  5.23446451e+00],
       [-5.88036774e+00, -2.36326290e+00],
       [-7.34774574e+00, -8.41955499e+00],
       [-7.58703957e-01,  3.72276201e+00],
       [-8.41357863e+00, -6.85069257e+00],
       [-8.20576492e-01,  5.33759195e+00],
       [-7.93489041e+00, -7.78403764e+00],
       [-5.69446566e+00, -4.06205304e+00],
       [-8.57698874e-01,  4.45305717e+00],
       [ 1.50975008e-01,  3.10076295e+00],
       [-6.55394441e+00, -6.44256627e+00],
       [-1.09316272e+01, -4.48636887e+00],
       [-6.50155596e+00, -4.65329331e+00],
       [-6.93650519e+00, -6.39281292e+00],
       [-1.01336898e+01, -4.75061833e+00],
       [-9.89148978e+00, -5.47902886e+00],
       [-8.89871617e+00, -4.85498304e+00],
       [-8.11394993e+00, -7.83656921e+00],
       [-5.29078354e+00, -3.64846688e+00],
       [-1.41076074e+00,  4.10984872e+00],
       [-9.50537595e+00, -4.63402669e+00],
       [-7.82749456e+00, -2.51032104e+00],
       [-6.38088086e+00, -8.50663809e+00],
       [-8.96014913e+00, -8.06349899e+00],
       [-7.66603898e+00, -7.59715459e+00],
       [-6.46534407e+00, -2.85544633e+00]])

Case study: image vector quantization

# Import libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_sample_image
from sklearn.cluster import KMeans
from sklearn.metrics import pairwise_distances_argmin
from sklearn.utils import shuffle
china = load_sample_image("china.jpg")
plt.axis(False)
plt.imshow(china)
<matplotlib.image.AxesImage at 0x7fec1c379e10>


china.shape
(427, 640, 3)
china.dtype
dtype('uint8')
china
array([[[174, 201, 231],
        [174, 201, 231],
        [174, 201, 231],
        ...,
        [250, 251, 255],
        [250, 251, 255],
        [250, 251, 255]],

       [[172, 199, 229],
        [173, 200, 230],
        [173, 200, 230],
        ...,
        [251, 252, 255],
        [251, 252, 255],
        [251, 252, 255]],

       [[174, 201, 231],
        [174, 201, 231],
        [174, 201, 231],
        ...,
        [252, 253, 255],
        [252, 253, 255],
        [252, 253, 255]],

       ...,

       [[ 88,  80,   7],
        [147, 138,  69],
        [122, 116,  38],
        ...,
        [ 39,  42,  33],
        [  8,  14,   2],
        [  6,  12,   0]],

       [[122, 112,  41],
        [129, 120,  53],
        [118, 112,  36],
        ...,
        [  9,  12,   3],
        [  9,  15,   3],
        [ 16,  24,   9]],

       [[116, 103,  35],
        [104,  93,  31],
        [108, 102,  28],
        ...,
        [ 43,  49,  39],
        [ 13,  21,   6],
        [ 15,  24,   7]]], dtype=uint8)
china[0][0]
array([174, 201, 231], dtype=uint8)
import pandas as pd

pd.DataFrame(china.reshape(427 * 640, 3)).drop_duplicates().shape
(96615, 3)
n_clusters = 64
china = np.array(china, dtype="float64") / china.max()
w, h, d = original_shape = tuple(china.shape)
w
427
h
640
d
3
assert d == 3, "d must be 3"
image_array = np.reshape(china, (427 * 640, 3))
image_array.shape
(273280, 3)
image_array_sample = shuffle(image_array, random_state=0)[:1000]
kmeans = KMeans(n_clusters=n_clusters, n_init="auto", random_state=0).fit(
    image_array_sample
)
# Coordinates of the centroids
kmeans.cluster_centers_
array([[0.97323103, 0.97706735, 0.99369139],
       [0.32053664, 0.29638803, 0.25180599],
       [0.70375817, 0.7504902 , 0.74052288],
       [0.06169935, 0.06196078, 0.04235294],
       [0.50718954, 0.53594771, 0.40043573],
       [0.83529412, 0.86349206, 0.89505135],
       [0.40612745, 0.40612745, 0.22377451],
       [0.81568627, 0.53803922, 0.35529412],
       [0.22527233, 0.16034858, 0.13420479],
       [0.50028011, 0.54789916, 0.57478992],
       [0.73524384, 0.82021116, 0.91925591],
       [0.90313725, 0.90333333, 0.90607843],
       [0.26381462, 0.26773619, 0.1144385 ],
       [0.72268908, 0.36022409, 0.25210084],
       [0.38867102, 0.46230937, 0.42788671],
       [0.88687783, 0.91463047, 0.94932127],
       [0.97777778, 0.77254902, 0.60261438],
       [0.80999367, 0.82530044, 0.84845035],
       [0.61497326, 0.67593583, 0.71265597],
       [0.1120915 , 0.13888889, 0.13398693],
       [0.48714597, 0.49215686, 0.26143791],
       [0.33832442, 0.36684492, 0.31764706],
       [0.51372549, 0.33333333, 0.19529412],
       [0.8127451 , 0.89264706, 0.98071895],
       [0.14323063, 0.10718954, 0.07656396],
       [0.76068627, 0.85617647, 0.9604902 ],
       [0.45065359, 0.32581699, 0.28562092],
       [0.16127451, 0.24068627, 0.24215686],
       [0.33986928, 0.26339869, 0.09477124],
       [0.61699346, 0.59836601, 0.54052288],
       [0.20555556, 0.22287582, 0.08137255],
       [0.93776091, 0.9368754 , 0.9485136 ],
       [0.40392157, 0.16627451, 0.10156863],
       [0.89411765, 0.63764706, 0.43529412],
       [0.40606061, 0.44278075, 0.12121212],
       [0.225     , 0.07034314, 0.06446078],
       [0.28683473, 0.44593838, 0.43305322],
       [0.59176471, 0.55215686, 0.43137255],
       [0.5827451 , 0.55098039, 0.32078431],
       [0.20588235, 0.3379085 , 0.33202614],
       [0.83071895, 0.79150327, 0.7254902 ],
       [0.72679739, 0.56339869, 0.44575163],
       [0.03006536, 0.02538126, 0.01372549],
       [0.9       , 0.94498911, 0.99368192],
       [0.54980392, 0.44627451, 0.43294118],
       [0.74871795, 0.79140271, 0.79803922],
       [0.3025641 , 0.33182504, 0.18793363],
       [0.54836601, 0.63137255, 0.63529412],
       [0.69346405, 0.70653595, 0.64901961],
       [0.56339869, 0.40130719, 0.30718954],
       [0.93368192, 0.96104575, 0.99616558],
       [0.05784314, 0.17156863, 0.2127451 ],
       [0.11960784, 0.04191176, 0.0370098 ],
       [0.26039216, 0.23581699, 0.20156863],
       [0.52679739, 0.53431373, 0.49477124],
       [0.0799253 , 0.10644258, 0.054155  ],
       [0.71540616, 0.43473389, 0.32268908],
       [0.40627451, 0.40235294, 0.33960784],
       [0.33604827, 0.34690799, 0.12217195],
       [0.84684685, 0.91944886, 0.99194489],
       [0.46784314, 0.4372549 , 0.37607843],
       [0.16265173, 0.16190476, 0.12380952],
       [0.43071895, 0.24183007, 0.18627451],
       [0.31176471, 0.15392157, 0.13578431]])
# Index of the nearest centroid for every pixel
label = kmeans.predict(image_array)
label
array([10, 10, 10, ..., 61,  3,  3], dtype=int32)
kmeans.cluster_centers_[1]
array([0.32053664, 0.29638803, 0.25180599])
image_kmeans = image_array.copy()
for i in range(w * h):
    image_kmeans[i] = kmeans.cluster_centers_[label[i]]
image_kmeans
array([[0.73524384, 0.82021116, 0.91925591],
       [0.73524384, 0.82021116, 0.91925591],
       [0.73524384, 0.82021116, 0.91925591],
       ...,
       [0.16265173, 0.16190476, 0.12380952],
       [0.06169935, 0.06196078, 0.04235294],
       [0.06169935, 0.06196078, 0.04235294]])
image_kmeans = image_kmeans.reshape(w, h, d)
image_kmeans.shape
(427, 640, 3)
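# Randomly pick 64 pixels to act as centroids
centroid_random = shuffle(image_array)[:n_clusters]
# pairwise_distances_argmin(x1, x2, axis): x1 and x2 are arrays of samples.
# With axis=0, it finds the nearest sample in x1 for each sample in x2 and returns one index into x1 per row of x2.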
labels_random = pairwise_distances_argmin(centroid_random, image_array, axis=0)
image_random = image_array.copy()
for i in range(w * h):
    image_random[i] = centroid_random[labels_random[i]]
image_random = image_random.reshape(w, h, d)
image_random.shape
(427, 640, 3)
labels_random
array([55, 55, 55, ..., 52, 60, 60])
plt.figure(figsize=(10, 10))
plt.axis("off")
plt.title("Original image (96,615 colors)")
plt.imshow(china)
plt.figure(figsize=(10, 10))
plt.axis("off")
plt.title("Quantized image (64 colors, K-Means)")
plt.imshow(image_kmeans)
plt.figure(figsize=(10, 10))
plt.axis("off")
plt.title("Quantized image (64 colors, Random)")
plt.imshow(image_random)
plt.show()
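The per-pixel loops above can also be written with NumPy indexing. The following is only a minimal sketch on made-up toy data (the `quantize` helper, the 2x3 image and the two-color palette are assumptions for illustration, not part of the notebook): it assigns each pixel to its nearest palette color and rebuilds the image without a Python loop.

```python
import numpy as np

def quantize(pixels, palette):
    """Map every pixel to its nearest palette color (Euclidean distance)."""
    # Squared distances with shape (n_pixels, n_colors), built by broadcasting.
    diff = pixels[:, None, :] - palette[None, :, :]
    dist2 = (diff ** 2).sum(axis=2)
    labels = dist2.argmin(axis=1)
    # Indexing the palette with the label array replaces the per-pixel loop.
    return labels, palette[labels]

# Hypothetical toy data: a 2x3 RGB image scaled to [0, 1] and a 2-color palette.
image = np.array([[[0.9, 0.9, 1.0], [0.8, 0.9, 1.0], [0.1, 0.1, 0.0]],
                  [[0.2, 0.1, 0.1], [1.0, 1.0, 1.0], [0.0, 0.0, 0.1]]])
palette = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]])

rows, cols, depth = image.shape
labels, flat = quantize(image.reshape(rows * cols, depth), palette)
quantized = flat.reshape(rows, cols, depth)
print(labels)
```

In the notebook the same idea would read `kmeans.cluster_centers_[label]`. For the full 273,280-pixel image with 64 colors, the broadcast intermediate in `quantize` holds roughly 52 million floats, so `kmeans.predict` or `pairwise_distances_argmin` is probably the more memory-friendly way to get the labels.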


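Algorithm notes: an introduction to K-means clustering

1. Introduction to K-means

K-means is a classic clustering algorithm that made it into the textbooks long ago. Unfortunately, I ran into it at work recently and found that I had forgotten almost all of it...

So here I am digging it back up for a review.

As the name of these notes suggests, K-means is a clustering algorithm. Concretely, given a set of $N$ points, the goal is to partition them into $K$ clusters so that the sum of the distances from each point to the center of its own cluster is as small as possible.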

As a formula:

$$s = \sum_{i=1}^{N} \min_{j \in \{1, ..., K\}} d(x_i, u_j)$$

We want to find a set of centers $u_j$ that makes $s$ as small as possible.

Here $d(x, y)$ denotes the distance between the points $x$ and $y$; Euclidean distance is the usual choice.
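As a small illustration of this objective, the sketch below evaluates $s$ for a fixed set of centers using Euclidean distance. The function name `kmeans_objective` and the four sample points are made up for the example.

```python
import numpy as np

def kmeans_objective(points, centers):
    """Sum over all points of the distance to the nearest center."""
    # distances[i, j] is the Euclidean distance from points[i] to centers[j].
    distances = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return distances.min(axis=1).sum()

# Hypothetical data: two pairs of points, 10 units apart.
points = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
good = np.array([[0.0, 1.0], [10.0, 1.0]])  # one center per pair
bad = np.array([[5.0, 0.0], [5.0, 2.0]])    # centers halfway between the pairs

s_good = kmeans_objective(points, good)
s_bad = kmeans_objective(points, bad)
print(s_good, s_bad)
```

With the centers placed inside each pair, every point is 1 unit from its nearest center; with the centers placed between the pairs, every point is 5 units away, so the second placement gives a much larger $s$.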

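2. Details of the K-means algorithm

The core idea of K-means is iteration.

First, we randomly pick $K$ of the $N$ points as the initial cluster centers.

Then, using the distances from each of the $N$ points to these $K$ centers, we assign every point to the cluster whose center is nearest.

Next, we update the $K$ cluster centers: for each cluster, the mean of the points currently assigned to it becomes its new center.

We repeat these two steps until the iteration limit is reached or the cluster centers stop changing.

The algorithm can be summarized as follows:

  1. step 1: randomly choose $K$ of the $N$ given points as the centers of the $K$ clusters;
  2. step 2: compute the Euclidean distance from every point to each of the $K$ cluster centers and assign the point to the cluster with the smallest distance, which clusters all the points;
  3. step 3: for each cluster obtained in step 2, update its center to the mean of its points, i.e. $\mathbf{u} = \frac{\sum_i \mathbf{x}_i}{n}$, where $n$ is the number of points in the cluster;
  4. step 4: repeat steps 2 and 3 until the iteration limit is reached or the cluster centers no longer change.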
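The four steps above can be sketched in plain NumPy. This is only an illustrative version, not a reference implementation: the names `kmeans`, `max_iter` and `seed` are invented here, and leaving an empty cluster's center where it was is a choice made for this sketch.

```python
import numpy as np

def nearest(points, centers):
    """Index of the nearest center for every point (Euclidean distance)."""
    dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return dist.argmin(axis=1)

def kmeans(points, k, max_iter=100, seed=0):
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    # Step 1: pick k distinct input points as the initial centers.
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(max_iter):
        # Step 2: assign each point to its nearest center.
        labels = nearest(points, centers)
        # Step 3: move each center to the mean of its members.
        # An empty cluster keeps its old center (assumption of this sketch).
        new_centers = centers.copy()
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:
                new_centers[j] = members.mean(axis=0)
        # Step 4: stop once the centers no longer move.
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Recompute the labels so they match the returned centers.
    return centers, nearest(points, centers)

# Usage on two well-separated synthetic blobs (made-up data).
rng = np.random.default_rng(1)
blob_a = rng.normal(loc=(0.0, 0.0), scale=0.5, size=(50, 2))
blob_b = rng.normal(loc=(10.0, 10.0), scale=0.5, size=(50, 2))
data = np.vstack([blob_a, blob_b])
centers, labels = kmeans(data, k=2)
print(centers)
```

On data this cleanly separated, the two returned centers should land near (0, 0) and (10, 10), in either order.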

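The strengths and weaknesses of K-means follow fairly directly:

  1. Strengths
    • Easy to implement and easy to debug
  2. Weaknesses
    • The iteration is time-consuming, especially on large datasets;
    • The result depends heavily on the initial centers: different initial centers can lead to quite different clusterings;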
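The dependence on initialization can be seen on a tiny, hand-picked example. The sketch below (the helper `lloyd` and the four rectangle corners are assumptions made for illustration) runs the same iteration from two different pairs of starting centers and compares the final sum of distances.

```python
import numpy as np

def lloyd(points, init_centers, max_iter=100):
    """Run the assign/update iteration from the given starting centers."""
    centers = np.array(init_centers, dtype=float)
    for _ in range(max_iter):
        dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        new_centers = centers.copy()
        for j in range(len(centers)):
            members = points[labels == j]
            if len(members) > 0:
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Final objective: sum of distances to the nearest center.
    dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return centers, dist.min(axis=1).sum()

# Corners of a 4 x 1 rectangle.
points = np.array([[0.0, 0.0], [0.0, 1.0], [4.0, 0.0], [4.0, 1.0]])

# Start A: one center on each short side. Start B: both centers on the left side.
centers_a, cost_a = lloyd(points, points[[0, 2]])
centers_b, cost_b = lloyd(points, points[[0, 1]])
print(cost_a, cost_b)
```

Start A groups the left and right sides, while start B settles on a top/bottom split and stops there, with a total distance four times larger. Running the algorithm from several initializations and keeping the best result is the usual mitigation.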

3. Proof that K-means converges

Now that the algorithm has been stated, let us look at its convergence, that is, why the K-means iteration actually makes progress.

We work with the original K-means algorithm, which measures the distance between two points with the Euclidean distance, written $\|x, u\|$ below. The loss function introduced earlier then becomes:

$$s = \sum_{i=1}^{N} \min_{j \in \{1, ..., K\}} \|x_i, u_j\|$$

At the $k$-th iteration this reads:

$$s^k = \sum_{i=1}^{N} \min_{j} \|x_i, u_j^k\|$$

Clearly, $s^k$ is a sequence bounded below by 0, so it is enough to show that $s^k$ is non-increasing: a non-increasing sequence that is bounded below must converge.

So all we need to prove is $s^{k+1} \leq s^k$.

Consider the $k$-th iteration, which consists of two steps:

  1. For the clusters produced by the previous assignment, update the cluster centers from $u^k$ to $u^{k+1}$ (here $j$ is the cluster that $x_i$ currently belongs to):
    $s^{k+1'} = \sum_{i=1}^{N} \|x_i, u_j^{k+1}\|$
  2. Reassign all points using the new cluster centers $u^{k+1}$:
    $s^{k+1} = \sum_{i=1}^{N} \min_{j} \|x_i, u_j^{k+1}\|$

For step 2, we clearly have $s^{k+1} \leq s^{k+1'}$, because each point either keeps its cluster or moves to a closer center. So it remains to show that the value $s^{k+1'}$ obtained after moving the centers in step 1 is less than or equal to $s^k$.
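The two inequalities can be checked numerically. The sketch below is an illustration under one stated assumption: it measures the loss with squared Euclidean distance (the quantity scikit-learn reports as `inertia_`), because that is the loss for which the mean update is guaranteed not to increase the value. The random data and all variable names are made up.

```python
import numpy as np

def sq_dist(points, centers):
    """Squared Euclidean distance from every point to every center."""
    return ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)

rng = np.random.default_rng(0)
points = rng.normal(size=(200, 2))
k = 4
centers = points[rng.choice(len(points), size=k, replace=False)]
rows = np.arange(len(points))

labels = sq_dist(points, centers).argmin(axis=1)
history = []  # one (s_k, s_after_update, s_after_reassign) tuple per iteration
for _ in range(20):
    s_k = sq_dist(points, centers)[rows, labels].sum()
    # Step 1: move the centers while the memberships stay fixed.
    for j in range(k):
        members = points[labels == j]
        if len(members) > 0:
            centers[j] = members.mean(axis=0)
    d = sq_dist(points, centers)
    s_prime = d[rows, labels].sum()
    # Step 2: reassign every point to its nearest new center.
    labels = d.argmin(axis=1)
    s_next = d[rows, labels].sum()
    history.append((s_k, s_prime, s_next))

for s_k, s_prime, s_next in history[:5]:
    print(f"{s_k:.4f} -> {s_prime:.4f} -> {s_next:.4f}")
```

Each printed row should be non-increasing from left to right, and the last value of one row is the first value of the next, which is the chain $s^{k+1} \leq s^{k+1'} \leq s^k$ in numbers.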

In step 1 the membership of each cluster does not change, so what we need to prove is: