How to get the plot of clusters in this python code? ValueError: x and y must have same first dimension

Posted 2023-03-12


Original question posted: 2016-11-08 14:52:23

Question:

Error:
Traceback (most recent call last):
  File "/Users/ankitchaudhari/PycharmProjects/Learn/datascience/gg.py", line 33, in <module>
    plt.plot(a, k)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/matplotlib/pyplot.py", line 3154, in plot
    ret = ax.plot(*args, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/matplotlib/__init__.py", line 1812, in inner
    return func(ax, *args, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/matplotlib/axes/_axes.py", line 1424, in plot
    for line in self._get_lines(*args, **kwargs):
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/matplotlib/axes/_base.py", line 386, in _grab_next_args
    for seg in self._plot_args(remaining, kwargs):
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/matplotlib/axes/_base.py", line 364, in _plot_args
    x, y = self._xy_from_xy(x, y)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/matplotlib/axes/_base.py", line 223, in _xy_from_xy
    raise ValueError("x and y must have same first dimension")
ValueError: x and y must have same first dimension

How can I get the plot of the clusters in this Python code?

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
from sklearn.cluster import KMeans

data = np.array([[1, 2],
                 [5, 8],
             [1.5, 1.8],
                 [8, 8],
                [9, 11],
               [1, 0.6],
                [2, 2]])

k = np.array([2,3,4,5,6,7])
df = pd.DataFrame(data)
df


def kmeans(data, k):
    labels = KMeans(n_clusters=k).fit_predict(data)
    return labels

sse = 0
for i in k:
    label = kmeans(data, i)
    cluster_mean = df.mean()
    d = np.zeros([], dtype=float)

    for j in range(len(label)):
        sse += sum(pow((data[j]) - cluster_mean, 2))
        a = np.append(d, sse)

plt.scatter(a, k)
plt.show()

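The resulting plot does not show all the points of the clusters. a and k do not have the same number of values, which becomes a problem when plotting them as a curve. Can someone help me?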

Thanks.

Comments:

Which points are shown? Which are omitted?

Answer 1:

Your indentation is broken:

sse = 0
for i in k:
  label = kmeans(data, i)
  cluster_mean = df.mean()
  d = np.zeros([], dtype=float)
# for i in k has finished here
# label, cluster_mean and d frozen in their last state
for j in range(len(label)):
  sse += sum(pow((data[j]) - cluster_mean, 2))
  a = np.append(d, sse)

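Basically, sse and a are computed only for the last i in k, because the j loop runs after the i loop has already finished. You have to start the j loop inside the i loop: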

sse = 0
for i in k:
  label = kmeans(data, i)
  cluster_mean = df.mean()
  d = np.zeros([], dtype=float)
  # same indentation as loop body!
  for j in range(len(label)):
    sse += sum(pow((data[j]) - cluster_mean, 2))
    a = np.append(d, sse)
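
Even with the indentation fixed, a and k can still end up with different lengths, because d is a 0-d array and np.append(d, sse) always returns exactly two values. The following is a minimal sketch of one way to get one SSE value per entry of k, not the original author's method. It makes two assumptions that are not in the question: it uses the inertia_ attribute of scikit-learn's KMeans (the sum of squared distances of samples to their closest cluster center) in place of the hand-written sum, and it introduces the name sse_per_k. The n_init and random_state values are chosen here only to make the run repeatable.

```python
import numpy as np
from sklearn.cluster import KMeans

data = np.array([[1, 2], [5, 8], [1.5, 1.8], [8, 8], [9, 11], [1, 0.6], [2, 2]])
k = np.array([2, 3, 4, 5, 6, 7])

# Why the lengths differ: d is a 0-d array, so np.append(d, value)
# flattens both inputs and returns exactly two values, [0.0, value].
d = np.zeros([], dtype=float)
a = np.append(d, 1.0)  # 1.0 is a placeholder, not a real SSE
print(a.shape, k.shape)  # (2,) (6,)

# Collect one SSE per candidate cluster count.
# Assumption: inertia_ stands in for the manual sum in the question.
sse_per_k = []
for n_clusters in k:
    model = KMeans(n_clusters=int(n_clusters), n_init=10, random_state=0)
    model.fit(data)
    sse_per_k.append(model.inertia_)

sse_per_k = np.array(sse_per_k)
print(sse_per_k.shape == k.shape)  # True
```

Because sse_per_k and k now have the same first dimension, a call such as plt.plot(k, sse_per_k) should no longer raise the ValueError shown in the question.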

