如何使用 Matplotlib 从多特征 kmeans 模型中绘制集群和中心?

Posted

技术标签:

【中文标题】如何使用 Matplotlib 从多特征 kmeans 模型中绘制集群和中心?【英文标题】:How to plot clusters and centers from a multi-feature kmeans model, with Matplotlib? 【发布时间】:2020-12-07 20:06:43 【问题描述】:

我使用kmeans 算法来确定我的数据集中的集群数量。在下面的代码中,您可以看到我有多个特征,有些是分类的,有些不是。我对它们进行了编码和缩放,得到了我的最佳集群数量。

您可以从这里下载数据: https://www.sendspace.com/file/1cnbji

import sklearn.metrics as sm

from sklearn.preprocessing import scale

from sklearn.preprocessing import Normalizer
from sklearn.preprocessing import StandardScaler, MinMaxScaler

from sklearn.cluster import KMeans, SpectralClustering, MiniBatchKMeans
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

import matplotlib.pyplot as plt

import pandas as pd



df = pd.read_csv('dataset.csv')
print(df.columns)

features = df[['parcela', 'bruto', 'neto',
               'osnova', 'sipovi', 'nadzemno',
               'podzemno', 'tavanica', 'fasada']]

trans = ColumnTransformer(transformers=[('onehot', OneHotEncoder(), ['tavanica', 'fasada']),
                                        ('StandardScaler', Normalizer(), ['parcela', 'bruto', 'neto', 'osnova', 'nadzemno', 'podzemno', 'sipovi'])],
                          remainder='passthrough') # Default is to drop untransformed columns

features = trans.fit_transform(features)

Sum_of_squared_distances = []
for i in range(1,19):

     kmeans = KMeans(n_clusters = i, init = 'k-means++', random_state = 0)
     kmeans.fit(features)
     Sum_of_squared_distances.append(kmeans.inertia_)


plt.plot(range(1,19), Sum_of_squared_distances, 'bx-')
plt.xlabel('k')
plt.ylabel('Sum_of_squared_distances')
plt.title('Elbow Method For Optimal k')
plt.show()

在图表上,肘部方法显示我的最佳聚类数为 7。 如何绘制这 7 个集群? 我想在图表上查看质心,并使用 7 种不同颜色的聚类散点图。

【问题讨论】:

【参考方案1】: 给定Plot: kmeans clustering centroid,其中centers 是一维。 centers 数组具有 (3, 2) 形状,x(3, 1)y(3, 1)。 针对这一一维中心演示的方法已适用于为该问题的模型生成的七维中心生成解决方案。 本题模型返回的centers有7个维度,形状为(7, 14),其中14是7组xy值。 此解决方案回答了问题,如何绘制聚类和中心? 它不提供对模型结果的 cmets 或解释,这需要在 SE: Cross Validated 或 SE: Data Science 中提出不同的问题。
# uses the imports as shown in the question
from matplotlib.patches import Rectangle, Patch  # for creating a legend
from matplotlib.lines import Line2D

# beginning with 
features = trans.fit_transform(features)

# create the model and fit it to features
kmeans_model2 = KMeans(n_clusters=7, init='k-means++', random_state=0).fit(features)

# find the centers; there are 7
centers = np.array(kmeans_model2.cluster_centers_)

# unique markers for the labels
markers = ['o', 'v', 's', '*', 'p', 'd', 'h']

# get the model labels
labels = kmeans_model2.labels_
labels_unique = set(labels)

# unique colors for each label
colors = sns.color_palette('husl', n_colors=len(labels_unique))

# color map with labels and colors
cmap = dict(zip(labels_unique, colors))

# plot
# iterate through each group of 2 centers
for j in range(0, len(centers)*2, 2):
    plt.figure(figsize=(6, 6))
    
    x_features = features[:, j]
    y_features = features[:, j+1]
    x_centers = centers[:, j]
    y_centers = centers[:, j+1]
    
    # add the data for each label to the plot
    for i, l in enumerate(labels):
#         print(f'Label: l')  # uncomment as needed
#         print(f'feature x coordinates for label:\nx_features[i]')  # uncomment as needed
#         print(f'feature y coordinates for label:\ny_features[i]')  # uncomment as needed
        plt.plot(x_features[i], y_features[i], color=colors[l], marker=markers[l], alpha=0.5)

    # print values for given plot, rounded for easier interpretation; all 4 can be commented out
    print(f'feature labels:\nlist(labels)')
    print(f'x_features:\nlist(map(lambda x: round(x, 3), x_features))')
    print(f'y_features:\nlist(map(lambda x: round(x, 3), y_features))')
    print(f'x_centers:\nlist(map(lambda x: round(x, 3), x_centers))')
    print(f'y_centers:\nlist(map(lambda x: round(x, 3), y_centers))')
    
    # add the centers
    # this loop is to color the center marker to correspond to the color of the corresponding label.
    for k in range(len(centers)):  
        plt.scatter(x_centers[k], y_centers[k], marker="X", color=colors[k])
    
    # title
    plt.title(f'Features: Dimension int(j/2)')
    
    # create the rectangles for the legend
    patches = [Patch(color=v, label=k) for k, v in cmap.items()]
    # create centers marker for the legend
    black_x = Line2D([], [], color='k', marker='X', linestyle='None', label='centers', markersize=10)
    # add the legend
    plt.legend(title='Labels', handles=patches + [black_x], bbox_to_anchor=(1.04, 0.5), loc='center left', borderaxespad=0, fontsize=15)
    
    plt.show()

绘图输出

许多绘制的特征具有重叠的值和中心。 featurescentersxy 值已打印出来,以便更轻松地查看重叠并确认绘制的值。 当不再需要时,可以将负责的 print 行注释掉或删除。

功能 0

feature labels:
[6, 1, 1, 1, 5, 5, 3, 4, 1, 0, 1, 5, 5, 1, 1, 1, 1, 1, 4, 1, 2, 0, 1, 3, 3, 4, 2, 2, 4, 3, 3, 2, 6, 3, 1, 2, 4, 6, 1, 4, 4, 1, 4, 5, 3, 1, 1, 1, 1, 1, 0, 1, 5, 5, 1, 1, 3, 3, 3, 1, 3, 1, 3, 3, 0, 1, 2, 2, 2, 6]
x_features:
[0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 1.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0]
y_features:
[1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]
x_centers:
[1.0, 1.0, 0.0, 1.0, 1.0, 1.0, 0.0]
y_centers:
[0.0, 0.0, 1.0, 0.0, -0.0, -0.0, 1.0]

功能 1

feature labels:
[6, 1, 1, 1, 5, 5, 3, 4, 1, 0, 1, 5, 5, 1, 1, 1, 1, 1, 4, 1, 2, 0, 1, 3, 3, 4, 2, 2, 4, 3, 3, 2, 6, 3, 1, 2, 4, 6, 1, 4, 4, 1, 4, 5, 3, 1, 1, 1, 1, 1, 0, 1, 5, 5, 1, 1, 3, 3, 3, 1, 3, 1, 3, 3, 0, 1, 2, 2, 2, 6]
x_features:
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
y_features:
[1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0]
x_centers:
[1.0, -0.0, -0.0, -0.0, -0.0, 0.0, 0.0]
y_centers:
[0.0, 1.0, 0.0, -0.0, 0.0, 0.0, 1.0]

功能 2

feature labels:
[6, 1, 1, 1, 5, 5, 3, 4, 1, 0, 1, 5, 5, 1, 1, 1, 1, 1, 4, 1, 2, 0, 1, 3, 3, 4, 2, 2, 4, 3, 3, 2, 6, 3, 1, 2, 4, 6, 1, 4, 4, 1, 4, 5, 3, 1, 1, 1, 1, 1, 0, 1, 5, 5, 1, 1, 3, 3, 3, 1, 3, 1, 3, 3, 0, 1, 2, 2, 2, 6]
x_features:
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0]
y_features:
[0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
x_centers:
[0.0, -0.0, 0.125, 1.0, 0.0, 0.0, 0.0]
y_centers:
[0.0, -0.0, 0.0, 0.0, 0.0, 1.0, 0.0]

功能 3

feature labels:
[6, 1, 1, 1, 5, 5, 3, 4, 1, 0, 1, 5, 5, 1, 1, 1, 1, 1, 4, 1, 2, 0, 1, 3, 3, 4, 2, 2, 4, 3, 3, 2, 6, 3, 1, 2, 4, 6, 1, 4, 4, 1, 4, 5, 3, 1, 1, 1, 1, 1, 0, 1, 5, 5, 1, 1, 3, 3, 3, 1, 3, 1, 3, 3, 0, 1, 2, 2, 2, 6]
x_features:
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0]
y_features:
[0.298, 0.193, 0.18, 0.336, 0.181, 0.174, 0.197, 0.23, 0.175, 0.212, 0.196, 0.186, 0.2, 0.15, 0.141, 0.304, 0.108, 0.101, 0.304, 0.105, 0.459, 0.18, 0.16, 0.224, 0.216, 0.246, 0.139, 0.111, 0.227, 0.177, 0.159, 0.25, 0.298, 0.223, 0.335, 0.431, 0.17, 0.381, 0.255, 0.222, 0.296, 0.156, 0.202, 0.145, 0.195, 0.15, 0.141, 0.18, 0.336, 0.175, 0.212, 0.196, 0.186, 0.2, 0.15, 0.141, 0.177, 0.177, 0.177, 0.177, 0.177, 0.177, 0.224, 0.224, 0.18, 0.16, 0.222, 0.202, 0.18, 0.336]
x_centers:
[0.0, -0.0, 0.875, -0.0, 1.0, 0.0, 0.0]
y_centers:
[0.196, 0.188, 0.249, 0.196, 0.237, 0.182, 0.328]

功能 4

feature labels:
[6, 1, 1, 1, 5, 5, 3, 4, 1, 0, 1, 5, 5, 1, 1, 1, 1, 1, 4, 1, 2, 0, 1, 3, 3, 4, 2, 2, 4, 3, 3, 2, 6, 3, 1, 2, 4, 6, 1, 4, 4, 1, 4, 5, 3, 1, 1, 1, 1, 1, 0, 1, 5, 5, 1, 1, 3, 3, 3, 1, 3, 1, 3, 3, 0, 1, 2, 2, 2, 6]
x_features:
[0.712, 0.741, 0.763, 0.704, 0.749, 0.741, 0.754, 0.735, 0.744, 0.738, 0.743, 0.747, 0.758, 0.759, 0.749, 0.714, 0.766, 0.748, 0.728, 0.755, 0.681, 0.752, 0.762, 0.734, 0.721, 0.747, 0.749, 0.756, 0.737, 0.748, 0.742, 0.724, 0.712, 0.733, 0.73, 0.688, 0.722, 0.705, 0.777, 0.749, 0.733, 0.744, 0.733, 0.764, 0.739, 0.76, 0.749, 0.763, 0.704, 0.744, 0.738, 0.743, 0.747, 0.758, 0.759, 0.749, 0.748, 0.748, 0.748, 0.748, 0.748, 0.748, 0.734, 0.734, 0.752, 0.762, 0.749, 0.733, 0.763, 0.704]
y_features:
[0.614, 0.636, 0.612, 0.601, 0.631, 0.64, 0.62, 0.624, 0.636, 0.633, 0.632, 0.63, 0.61, 0.629, 0.641, 0.616, 0.629, 0.65, 0.601, 0.644, 0.539, 0.628, 0.623, 0.627, 0.65, 0.603, 0.641, 0.641, 0.616, 0.632, 0.648, 0.631, 0.614, 0.624, 0.58, 0.562, 0.666, 0.587, 0.565, 0.616, 0.591, 0.646, 0.642, 0.625, 0.631, 0.629, 0.641, 0.612, 0.601, 0.636, 0.633, 0.632, 0.63, 0.61, 0.629, 0.641, 0.632, 0.632, 0.632, 0.632, 0.632, 0.632, 0.627, 0.627, 0.628, 0.623, 0.616, 0.642, 0.612, 0.601]
x_centers:
[0.745, 0.747, 0.73, 0.741, 0.735, 0.752, 0.708]
y_centers:
[0.63, 0.625, 0.611, 0.632, 0.62, 0.625, 0.604]

功能 5

feature labels:
[6, 1, 1, 1, 5, 5, 3, 4, 1, 0, 1, 5, 5, 1, 1, 1, 1, 1, 4, 1, 2, 0, 1, 3, 3, 4, 2, 2, 4, 3, 3, 2, 6, 3, 1, 2, 4, 6, 1, 4, 4, 1, 4, 5, 3, 1, 1, 1, 1, 1, 0, 1, 5, 5, 1, 1, 3, 3, 3, 1, 3, 1, 3, 3, 0, 1, 2, 2, 2, 6]
x_features:
[0.164, 0.096, 0.103, 0.171, 0.091, 0.106, 0.094, 0.132, 0.105, 0.098, 0.102, 0.101, 0.115, 0.079, 0.095, 0.135, 0.075, 0.088, 0.126, 0.063, 0.186, 0.088, 0.075, 0.134, 0.107, 0.134, 0.09, 0.072, 0.16, 0.097, 0.073, 0.123, 0.165, 0.154, 0.133, 0.158, 0.084, 0.11, 0.105, 0.1, 0.164, 0.075, 0.1, 0.075, 0.135, 0.069, 0.095, 0.103, 0.171, 0.105, 0.098, 0.102, 0.101, 0.115, 0.079, 0.095, 0.097, 0.097, 0.097, 0.097, 0.097, 0.097, 0.134, 0.134, 0.088, 0.075, 0.1, 0.1, 0.103, 0.171]
y_features:
[0.001, 0.002, 0.001, 0.001, 0.001, 0.002, 0.002, 0.001, 0.001, 0.001, 0.001, 0.005, 0.002, 0.001, 0.002, 0.001, 0.002, 0.001, 0.001, 0.002, 0.0, 0.001, 0.001, 0.002, 0.0, 0.001, 0.001, 0.002, 0.002, 0.002, 0.0, 0.001, 0.001, 0.001, 0.004, 0.004, 0.001, 0.002, 0.001, 0.001, 0.002, 0.0, 0.001, 0.001, 0.001, 0.001, 0.0, 0.001, 0.001, 0.001, 0.0, 0.0, 0.003, 0.001, 0.001, 0.001, 0.001, 0.001, 0.001, 0.0, 0.002, 0.001, 0.001, 0.0, 0.001, 0.001, 0.002, 0.002, 0.002, 0.001]
x_centers:
[0.093, 0.1, 0.116, 0.112, 0.125, 0.101, 0.152]
y_centers:
[0.001, 0.001, 0.002, 0.001, 0.001, 0.002, 0.001]

功能 6

feature labels:
[6, 1, 1, 1, 5, 5, 3, 4, 1, 0, 1, 5, 5, 1, 1, 1, 1, 1, 4, 1, 2, 0, 1, 3, 3, 4, 2, 2, 4, 3, 3, 2, 6, 3, 1, 2, 4, 6, 1, 4, 4, 1, 4, 5, 3, 1, 1, 1, 1, 1, 0, 1, 5, 5, 1, 1, 3, 3, 3, 1, 3, 1, 3, 3, 0, 1, 2, 2, 2, 6]
x_features:
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.002, 0.0, 0.0, 0.001, 0.0, 0.001, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.001, 0.001, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.001, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.001, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
y_features:
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.001, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
x_centers:
[0.0, 0.0, 0.0, 0.0, 0.0, 0.001, 0.0]
y_centers:
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]

在一个绘图上更新所有维度

根据 OP 的要求
# plot
plt.figure(figsize=(16, 8))
for j in range(0, len(centers)*2, 2):
    
    x_features = features[:, j]
    y_features = features[:, j+1]
    x_centers = centers[:, j]
    y_centers = centers[:, j+1]
    
    # add the data for each label to the plot
    for i, l in enumerate(labels):
        plt.plot(x_features[i], y_features[i], marker=markers[int(j/2)], color=colors[int(j/2)], alpha=0.5)

    # add the centers
    for k in range(len(centers)):  
        plt.scatter(x_centers[k], y_centers[k], marker="X", color=colors[int(j/2)])

# create the rectangles for the legend
patches = [Patch(color=v, label=k) for k, v in cmap.items()]
# create centers marker for the legend
black_x = Line2D([], [], color='k', marker='X', linestyle='None', label='centers', markersize=10)
# add the legend
plt.legend(title='Labels', handles=patches + [black_x], bbox_to_anchor=(1.04, 0.5), loc='center left', borderaxespad=0, fontsize=15)
    
plt.show()
如个别地块所述,有很多重叠。

【讨论】:

以上是关于如何使用 Matplotlib 从多特征 kmeans 模型中绘制集群和中心?的主要内容,如果未能解决你的问题,请参考以下文章

LSTM 多特征、多类、多输出

如何为 RNN 或 LSTM 使用多输入或多特征

基于pytorch搭建多特征LSTM时间序列预测代码详细解读(附完整代码)

多特征数据预处理的一种尝试

最小二乘法--多特征(矩阵形式)

吴恩达-机器学习+多特征正规方程梯度下降