How to get the plot of clusters in this Python code? ValueError: x and y must have same first dimension

Posted: 2016-11-08 14:52:23

Question:

Error:
Traceback (most recent call last):
  File "/Users/ankitchaudhari/PycharmProjects/Learn/datascience/gg.py", line 33, in <module>
    plt.plot(a, k)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/matplotlib/pyplot.py", line 3154, in plot
    ret = ax.plot(*args, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/matplotlib/__init__.py", line 1812, in inner
    return func(ax, *args, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/matplotlib/axes/_axes.py", line 1424, in plot
    for line in self._get_lines(*args, **kwargs):
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/matplotlib/axes/_base.py", line 386, in _grab_next_args
    for seg in self._plot_args(remaining, kwargs):
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/matplotlib/axes/_base.py", line 364, in _plot_args
    x, y = self._xy_from_xy(x, y)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/matplotlib/axes/_base.py", line 223, in _xy_from_xy
    raise ValueError("x and y must have same first dimension")
ValueError: x and y must have same first dimension

How can I get the plot of clusters in this Python code?

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
from sklearn.cluster import KMeans

data = np.array([[1, 2],
                 [5, 8],
             [1.5, 1.8],
                 [8, 8],
                [9, 11],
               [1, 0.6],
                [2, 2]])

k = np.array([2,3,4,5,6,7])
df = pd.DataFrame(data)
df


def kmeans(data, k):
    labels = KMeans(n_clusters=k).fit_predict(data)
    return labels

sse = 0
for i in k:
    label = kmeans(data, i)
    cluster_mean = df.mean()
    d = np.zeros([], dtype=float)

    for j in range(len(label)):
        sse += sum(pow((data[j]) - cluster_mean, 2))
        a = np.append(d, sse)

plt.scatter(a, k)
plt.show()

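The resulting plot does not show all the points of the clusters. a and k do not have the same number of values, so plotting them as a curve fails. Can someone help me?

Thanks.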

Comments:

Which points are shown? Which are omitted?

Answer 1:

Your indentation is broken:

sse = 0
for i in k:
  label = kmeans(data, i)
  cluster_mean = df.mean()
  d = np.zeros([], dtype=float)
# for i in k has finished here
# label, cluster_mean and d frozen in their last state
for j in range(len(label)):
  sse += sum(pow((data[j]) - cluster_mean, 2))
  a = np.append(d, sse)

Basically, sse and a are computed only for the last i in k. Start the j loop inside the i loop instead:

sse = 0
for i in k:
  label = kmeans(data, i)
  cluster_mean = df.mean()
  d = np.zeros([], dtype=float)
  # same indentation as loop body!
  for j in range(len(label)):
    sse += sum(pow((data[j]) - cluster_mean, 2))
    a = np.append(d, sse)
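
Note that nesting the loops is not enough on its own to make plt.plot(a, k) work: d is reset on every pass and np.append(d, sse) returns a new two-element array, so a never reaches the six entries that k has. Below is a minimal sketch of one way to collect exactly one SSE value per k. It assumes the goal is an elbow-style plot of within-cluster SSE against k, so each point is measured against the mean of its own cluster rather than df.mean(). The names cluster_sse, sse_values, k_values and plot_sse, and the n_init/random_state settings, are illustrative choices and not part of the original post.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.cluster import KMeans

data = np.array([[1, 2], [5, 8], [1.5, 1.8], [8, 8],
                 [9, 11], [1, 0.6], [2, 2]])
k_values = np.array([2, 3, 4, 5, 6, 7])


def cluster_sse(points, n_clusters):
    """Sum of squared distances from each point to the mean of its cluster."""
    # n_init and random_state are assumptions added for repeatable results.
    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=0).fit_predict(points)
    sse = 0.0
    for label in np.unique(labels):
        members = points[labels == label]
        sse += ((members - members.mean(axis=0)) ** 2).sum()
    return sse


# One SSE value per k, so sse_values and k_values have the same length.
sse_values = np.array([cluster_sse(data, k) for k in k_values])


def plot_sse(ks, sses):
    """Plot SSE against the number of clusters."""
    plt.plot(ks, sses, marker="o")
    plt.xlabel("k")
    plt.ylabel("SSE")
    plt.show()
```

Calling plot_sse(k_values, sse_values) draws the curve; because both arrays have the same first dimension, the ValueError does not occur. A fitted KMeans object also reports the sum of squared distances to the nearest centre as inertia_, which could replace the manual loop.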
