ValueError: Unknown label type: 'continuous' when using clustering + classification models together

Posted 2023-03-12


Question posted: 2020-07-17 19:35:07

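I created a clustering model that uses Scikit-Learn's KMeans algorithm to find distinct customer groups based on annual income and spending score. Using the cluster value it returns for each customer, I tried to build a classification model with Support Vector Classification from sklearn.svm. However, when I try to fit the new model to the dataset, I get this error: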

```
  File "/Users/user/Documents/Machine Learning A-Z Template Folder/Part 4 - Clustering/Section 24 - K-Means Clustering/cluster_and_prediction.py", line 28, in <module>
    classifier.fit(x_train, y_train)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/sklearn/svm/_base.py", line 149, in fit
    y = self._validate_targets(y)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/sklearn/svm/_base.py", line 525, in _validate_targets
    check_classification_targets(y)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/sklearn/utils/multiclass.py", line 169, in check_classification_targets
    raise ValueError("Unknown label type: %r" % y_type)
ValueError: Unknown label type: 'continuous'
```

My code is as follows:

```python
import pandas as pd
import numpy as np
from sklearn.cluster import KMeans  # import missing from the original snippet

# Using relevant columns from dataset
dataset = pd.read_csv('Mall_Customers.csv')
x = dataset.iloc[:, 3:5].values

# Creating model with ideal amount of clusters
kmeans = KMeans(n_clusters=5, init='k-means++', max_iter=300, n_init=10, random_state=0)
kmeans.fit(x)

predictions = kmeans.predict(x)

# Creating numpy array for feature scaling
predictions = np.array(predictions, dtype=int)
predictions = predictions[:, None]

from sklearn.preprocessing import StandardScaler
sc_x = StandardScaler()
sc_y = StandardScaler()
x = sc_x.fit_transform(x)
predictions = sc_y.fit_transform(predictions)

# Splitting dataset into training and test sets
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, predictions, test_size=.25)

# Creating Support Vector Classification model
from sklearn.svm import SVC
classifier = SVC(kernel='rbf')
classifier.fit(x_train, y_train)
```

[Figure: Elbow Model Used for Clustering]

[Figure: Clustering Visualization]

(A .zip file with the dataset was attached; the dataset is called 'Mall_Customers.csv'.)

How can I fix this?

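Comments:

Where exactly does the error pop up? Please update your post to include the full error trace (as text, not an image). That said, it looks like you are trying to predict continuous data with an SVC model, which accepts only categorical data (class labels in classification, real numbers in regression).

The clustering model I created divides the customers into 5 different groups, and the group values become the y variable of the classification model (as shown in the "Clustering Visualization" figure). Wouldn't that count as categorical data? (Sorry, I'm a beginner machine learning programmer.) Also, is SVM a binary classifier?

Code after the error should not be included here (it is never executed, so it is irrelevant); the same goes for plots etc. unrelated to the problem. All of these only create unnecessary clutter (edited). Now, please post a sample of your y_train and of kmeans.predict(x).

...and cleaning the clutter out of the code helped me solve the problem (see the answer below).

Answer 1: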

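Since you want to treat this as a classification problem with 5 classes, you should not apply a scaler to your labels: scaling turns them into continuous values, and a classification model fed continuous targets raises exactly this error.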
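To see why the scaled labels trigger the error, a small sketch can ask scikit-learn's `type_of_target` helper (the function that `check_classification_targets` in the traceback relies on) how it classifies the labels before and after `StandardScaler`. The label array below is made up to stand in for `kmeans.predict(x)`:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.utils.multiclass import type_of_target

# Hypothetical cluster IDs standing in for kmeans.predict(x): integers 0..4
labels = np.array([0, 1, 2, 3, 4, 0, 1, 2, 3, 4])

# Integer cluster IDs are recognised as discrete class labels
print(type_of_target(labels))  # multiclass

# Reshaping to a column and standardising, as the question's code does,
# produces non-integer floats such as -1.414..., 0.707...
scaled = StandardScaler().fit_transform(labels[:, None])
print(type_of_target(scaled))  # continuous
```

Any target reported as `continuous` is rejected by classifiers such as `SVC`, which is where `Unknown label type: 'continuous'` comes from.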

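Also, although unrelated to the error, the correct approach is to fit the scaler on the training data only, and then use that fitted scaler to transform your test data.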

So these are the necessary changes (after you have finished setting up your predictions variable):

```python
# initial (unscaled) x used here:
x_train, x_test, y_train, y_test = train_test_split(x, predictions, test_size=.25)
sc = StandardScaler()
x_train_scaled = sc.fit_transform(x_train)  # fit on training data only
x_test_scaled = sc.transform(x_test)        # reuse the training statistics

classifier = SVC(kernel='rbf')
classifier.fit(x_train_scaled, y_train) # no scaling for predictions or y_train
```
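A minimal end-to-end sketch of the fix, assuming synthetic `make_blobs` data in place of `Mall_Customers.csv` (which is not available here). It also keeps the cluster IDs as a 1-D integer array rather than the `predictions[:, None]` column, which avoids scikit-learn's `DataConversionWarning` about a column-vector `y`:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the two CSV columns (assumption: real data unavailable)
x, _ = make_blobs(n_samples=200, centers=5, n_features=2, random_state=0)

# Cluster IDs become the class labels; keep them as 1-D integers, unscaled
kmeans = KMeans(n_clusters=5, init='k-means++', max_iter=300, n_init=10, random_state=0)
predictions = kmeans.fit_predict(x)

# Split first, then fit the scaler on the training features only
x_train, x_test, y_train, y_test = train_test_split(
    x, predictions, test_size=.25, random_state=0)
sc = StandardScaler()
x_train_scaled = sc.fit_transform(x_train)
x_test_scaled = sc.transform(x_test)  # uses the training mean and std

# Labels go in as-is, so SVC sees a 'multiclass' target
classifier = SVC(kernel='rbf')
classifier.fit(x_train_scaled, y_train)
accuracy = classifier.score(x_test_scaled, y_test)
print(classifier.classes_)
print(accuracy)
```

On the question raised in the comments: `SVC` is not limited to two classes. scikit-learn handles multiclass targets internally with a one-vs-one scheme, so `classifier.classes_` lists every cluster ID present in `y_train`.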

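Also unrelated to the error, but you should run k-means after scaling your x data; that is, you should actually scale x first and then perform the clustering (I leave this as an exercise, since it has nothing to do with the error).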
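k-means assigns points by Euclidean distance, so a feature with a larger numeric range can dominate the clustering; scaling first puts both columns on a comparable footing. A minimal sketch of that ordering, again assuming synthetic `make_blobs` data in place of the two CSV columns:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the income / spending-score columns (assumption)
x, _ = make_blobs(n_samples=200, centers=5, n_features=2, random_state=0)

# Scale first so both features contribute comparably to the distances
x_scaled = StandardScaler().fit_transform(x)

# Then cluster the scaled data; these integer IDs are the labels for the SVC
kmeans = KMeans(n_clusters=5, init='k-means++', max_iter=300, n_init=10, random_state=0)
predictions = kmeans.fit_predict(x_scaled)
print(np.bincount(predictions))  # number of points per cluster
```

In this sketch the scaler is fit on all rows because the clustering step is unsupervised and runs on the whole dataset; the scaler used for the classifier can still be fit on the training split only, as in the answer's code.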

