ValueError: x and y must be the same size in Python while creating a KMeans model
【Posted】: 2021-06-19 10:51:08
【Question】: I am building a KMeans clustering model on a churn dataset and get the error "ValueError: x and y must be the same size" when I try to draw the cluster plot.
I will post both my function and the plotting code below, but while trying to narrow it down I think it may be related to these lines in the function:
x=kmeans.cluster_centers_[:,0]
, y=kmeans.cluster_centers_[:,1]
Here is the full code:
def Create_kmeans_cluster_graph(df_final, data, n_clusters, x_title, y_title, chart_title):
    """ Display K-means cluster based on data """
    kmeans = KMeans(n_clusters=n_clusters  # No of cluster in data
                    , random_state = random_state  # Selecting same training data
                    )
    kmeans.fit(data)
    kmean_colors = [plotColor[c] for c in kmeans.labels_]
    fig = plt.figure(figsize=(12,8))
    plt.scatter(x= x_title + '_norm'
                , y= y_title + '_norm'
                , data=data
                , color=kmean_colors  # color of data points
                , alpha=0.25  # transparency of data points
                )
    plt.xlabel(x_title)
    plt.ylabel(y_title)
    plt.scatter(x=kmeans.cluster_centers_[:,0]
                , y=kmeans.cluster_centers_[:,1]
                , color='black'
                , marker='X'  # Marker sign for data points
                , s=100  # marker size
                )
    plt.title(chart_title,fontsize=15)
    plt.show()
    return kmeans.fit_predict(df_final[df_final.Churn==1][[x_title+'_norm', y_title +'_norm']])
# Graph
df_final['Cluster'] = -1 # by default set Cluster to -1
df_final.iloc[(df_final.Churn==1),'Cluster'] = Create_kmeans_cluster_graph(df_final
        ,df_final[df_final.Churn==1][['Tenure_norm','MonthlyCharge_norm']]
        ,3
        ,'Tenure'
        ,'MonthlyCharges'
        ,"Tenure vs Monthlycharges : Churn customer cluster")
df_final['Cluster'].unique()
【Answer 1】: You get that error because of this call:
plt.scatter(x= x_title + '_norm'
            , y= y_title + '_norm'
            , data=data
            , color=kmean_colors  # color of data points
            , alpha=0.25  # transparency of data points
            )
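plt.scatter does take a data= argument, but it only swaps a string for a column when that exact name exists in data (see the help page); otherwise the string itself is used as the value. Your call passes 'MonthlyCharges', so y becomes 'MonthlyCharges_norm', while the column is named 'MonthlyCharge_norm'. x then has one value per row and y has one value in total, which is the size mismatch. Indexing the columns yourself makes a wrong name fail with a KeyError instead: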
plt.scatter(data[x_title + '_norm'],data[y_title + '_norm'],...)
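The size mismatch can be reproduced on a tiny frame. The sketch below is only an illustration: the demo frame and the helper try_scatter are hypothetical names, and it assumes a matplotlib version whose scatter falls back to the literal string when a name is not found in data.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for the normalized churn columns.
rng = np.random.default_rng(42)
demo = pd.DataFrame({"Tenure_norm": rng.uniform(0, 1, 20),
                     "MonthlyCharge_norm": rng.uniform(0, 1, 20)})

def try_scatter(x_name, y_name, data):
    """Return None if the plot succeeds, else the ValueError message."""
    fig, ax = plt.subplots()
    try:
        ax.scatter(x=x_name, y=y_name, data=data)
        return None
    except ValueError as err:
        return str(err)
    finally:
        plt.close(fig)

# Both names exist in the frame, so each string is replaced by its column.
ok = try_scatter("Tenure_norm", "MonthlyCharge_norm", demo)
# Extra "s": no such column, so y stays a single string while x has 20 values.
bad = try_scatter("Tenure_norm", "MonthlyCharges_norm", demo)
print(ok)
print(bad)
```

Writing demo["MonthlyCharges_norm"] directly would raise a KeyError that names the missing column, which is usually easier to diagnose than the size error.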
Alternatively, use the plot.scatter method on the pandas DataFrame, which is what I did in this edited version of your function:
def Create_kmeans_cluster_graph(df_final, data, n_clusters, x_title, y_title, chart_title):
    plotColor = ['k','g','b']  # one color per cluster label
    kmeans = KMeans(n_clusters=n_clusters, random_state=random_state)
    kmeans.fit(data)
    kmean_colors = [plotColor[c] for c in kmeans.labels_]
    # DataFrame.plot.scatter looks up x and y as column names of data
    data.plot.scatter(x=x_title + '_norm', y=y_title + '_norm',
                      color=kmean_colors, alpha=0.25)
    plt.xlabel(x_title)
    plt.ylabel(y_title)
    # the cluster centers are drawn on the axes created by the call above
    plt.scatter(x=kmeans.cluster_centers_[:,0], y=kmeans.cluster_centers_[:,1],
                color='black', marker='X', s=100)
    return kmeans.labels_
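A variant of the same idea keeps both scatter calls on one explicit Axes instead of relying on the current pyplot figure. This is a minimal sketch, not the function from the question: plot_kmeans_clusters, its parameters, the viridis colormap, and the demo frame are all assumptions made for illustration, and n_init=10 is passed explicitly only to keep behavior the same across scikit-learn versions.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

def plot_kmeans_clusters(data, n_clusters, x_col, y_col, random_state=42):
    """Fit KMeans on data[[x_col, y_col]] and draw points and centers on one Axes."""
    features = data[[x_col, y_col]]  # raises KeyError if a column name is wrong
    kmeans = KMeans(n_clusters=n_clusters, random_state=random_state, n_init=10)
    labels = kmeans.fit_predict(features)

    fig, ax = plt.subplots(figsize=(12, 8))
    # Color each point by its cluster label.
    ax.scatter(features[x_col], features[y_col], c=labels, cmap="viridis", alpha=0.25)
    centers = kmeans.cluster_centers_  # columns follow the order [x_col, y_col]
    ax.scatter(centers[:, 0], centers[:, 1], color="black", marker="X", s=100)
    ax.set_xlabel(x_col)
    ax.set_ylabel(y_col)
    return labels, ax

# Hypothetical demo data standing in for the normalized churn columns.
rng = np.random.default_rng(0)
demo = pd.DataFrame({"Tenure_norm": rng.uniform(0, 1, 30),
                     "MonthlyCharge_norm": rng.uniform(0, 1, 30)})
labels, ax = plot_kmeans_clusters(demo, 3, "Tenure_norm", "MonthlyCharge_norm")
```

Passing the full column names rather than building them from a title plus '_norm' removes the step where the misspelling crept in.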
It works on an example dataset:
import pandas as pd
import numpy as np
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt
random_state = 42
np.random.seed(42)
df_final = pd.DataFrame({'Tenure_norm':np.random.uniform(0,1,50),
                         'MonthlyCharge_norm':np.random.uniform(0,1,50),
                         'Churn':np.random.randint(0,3,50)})
Create_kmeans_cluster_graph(df_final
        ,df_final[df_final.Churn==1][['Tenure_norm','MonthlyCharge_norm']]
        ,3
        ,'Tenure'
        ,'MonthlyCharge'
        ,"Tenure vs Monthlycharges : Churn customer cluster")
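The question then writes the returned labels back into df_final using iloc with a boolean mask and a column name. iloc is position-based, so the label-based loc is the usual tool for that step. The sketch below uses a hypothetical df_demo frame and calls KMeans directly rather than the plotting function, purely to show the assignment.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical demo frame; Churn alternates 0/1 so exactly 25 rows are churners.
rng = np.random.default_rng(42)
df_demo = pd.DataFrame({"Tenure_norm": rng.uniform(0, 1, 50),
                        "MonthlyCharge_norm": rng.uniform(0, 1, 50),
                        "Churn": np.tile([0, 1], 25)})

churn_mask = df_demo["Churn"] == 1
features = df_demo.loc[churn_mask, ["Tenure_norm", "MonthlyCharge_norm"]]
labels = KMeans(n_clusters=3, random_state=42, n_init=10).fit_predict(features)

df_demo["Cluster"] = -1                      # default for non-churn rows
df_demo.loc[churn_mask, "Cluster"] = labels  # boolean row mask plus column label
print(df_demo["Cluster"].unique())
```

The labels come back in the same row order as features, so assigning them through the same mask lines each label up with its customer.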