Found input variables with inconsistent numbers of samples: [4, 1] [closed]


【Title】Found input variables with inconsistent numbers of samples: [4, 1] [closed] 【Posted】2021-09-14 17:54:45 【Question】:

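Here is what I did; the code is below. I have a music.csv dataset. The error is: Found input variables with inconsistent numbers of samples: [4, 1]. The full traceback follows the code.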

# importing Data 
import pandas as pd

music_data = pd.read_csv('music.csv')
music_data
# split into training and testing- nothing to clean
# genre = predictions
# Inputs are age and gender and output is genre
# method=drop
X = music_data.drop(columns=['genre'])  # has everything but genre
# X= INPUT
Y = music_data['genre']  # only genre
# Y=OUTPUT
# now select algorithm
from sklearn.tree import DecisionTreeClassifier

model = DecisionTreeClassifier()  # model
model.fit(X, Y)
prediction = model.predict([[21, 1]])
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.2)  # 20% of the data is held out for testing
# the first two return values are inputs, the other two are outputs
model.fit(X_train, y_train)
from sklearn.metrics import accuracy_score

score = accuracy_score(y_test, predictions)

Then this error appeared. It is a ValueError:

ValueError                                Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_28312/3992581865.py in <module>
      5 model.fit(X_train, y_train)
      6 from sklearn.metrics import accuracy_score
----> 7 score = accuracy_score(y_test, predictions)

c:\users\shrey\appdata\local\programs\python\python39\lib\site-packages\sklearn\utils\validation.py in inner_f(*args, **kwargs)
 61             extra_args = len(args) - len(all_args)
 62             if extra_args <= 0:
---> 63                 return f(*args, **kwargs)
 64 
 65             # extra_args > 0

c:\users\shrey\appdata\local\programs\python\python39\lib\site-packages\sklearn\metrics\_classification.py in accuracy_score(y_true, y_pred, normalize, sample_weight)
200 
201     # Compute accuracy for each possible representation
--> 202     y_type, y_true, y_pred = _check_targets(y_true, y_pred)
203     check_consistent_length(y_true, y_pred, sample_weight)
204     if y_type.startswith('multilabel'):

c:\users\shrey\appdata\local\programs\python\python39\lib\site-packages\sklearn\metrics\_classification.py in _check_targets(y_true, y_pred)
 81     y_pred : array or indicator matrix
 82     """
 ---> 83     check_consistent_length(y_true, y_pred)
 84     type_true = type_of_target(y_true)
 85     type_pred = type_of_target(y_pred)

c:\users\shrey\appdata\local\programs\python\python39\lib\site-packages\sklearn\utils\validation.py in check_consistent_length(*arrays)
317     uniques = np.unique(lengths)
318     if len(uniques) > 1:
--> 319         raise ValueError("Found input variables with inconsistent numbers of"
320                          " samples: %r" % [int(l) for l in lengths])
321 

 ValueError: Found input variables with inconsistent numbers of samples: [4, 1]

Please help me. I don't know what is going on, but I think it has to do with this line: score = accuracy_score(y_test, predictions).

【Comments】:

【Answer 1】:

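You need to recompute your predictions after the train/test split, so that they are made on X_test, and pass that same variable to accuracy_score:

    predictions = model.predict(X_test)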
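For reference, the whole corrected workflow might look like the sketch below. The question does not show music.csv, so the DataFrame here is a hypothetical stand-in with assumed `age`, `gender`, and `genre` columns and made-up values; `random_state=0` is also an addition, used only to make the run repeatable.

```python
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Hypothetical stand-in for music.csv (the real file is not shown in the question).
music_data = pd.DataFrame({
    "age": [20, 23, 25, 26, 29, 30, 31, 33, 37,
            20, 21, 25, 26, 27, 30, 31, 34, 35],
    "gender": [1] * 9 + [0] * 9,
    "genre": (["HipHop"] * 3 + ["Jazz"] * 3 + ["Classical"] * 3
              + ["Dance"] * 3 + ["Acoustic"] * 3 + ["Classical"] * 3),
})

X = music_data.drop(columns=["genre"])  # inputs: age and gender
y = music_data["genre"]                 # output: genre

# Split first, then train only on the training part.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = DecisionTreeClassifier(random_state=0)
model.fit(X_train, y_train)

# Predict on the held-out inputs so there is one prediction per test row.
predictions = model.predict(X_test)
assert len(predictions) == len(y_test)

score = accuracy_score(y_test, predictions)
print(len(y_test), len(predictions), score)
```

The key point is that `accuracy_score` compares its two arguments element by element, so both must come from the same set of rows.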

【Comments】:

【Answer 2】:

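After the split, your test data has four entries (rows), which means y_test has length 4.

When you predict on [21, 1], you are predicting for a single row only, so the prediction has length 1.

That is why you get the inconsistent numbers of samples error.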
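The mismatch can be reproduced without any model at all. The snippet below is a minimal sketch with hypothetical labels: it passes four true labels and one predicted label to `accuracy_score` and captures the resulting `ValueError`.

```python
from sklearn.metrics import accuracy_score

# Hypothetical labels: four test targets, but only one prediction.
y_test = ["HipHop", "Jazz", "Dance", "Classical"]
single_prediction = ["HipHop"]

try:
    accuracy_score(y_test, single_prediction)
    message = None
except ValueError as err:
    # scikit-learn checks that both inputs have the same length before scoring.
    message = str(err)

print(len(y_test), len(single_prediction))
print(message)
```

The two numbers in the error message are simply the lengths of the two arguments, in the order they were passed.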

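You can fix this by predicting on X_test:

    prediction = model.predict(X_test)

If you want to predict on new data instead, you have to supply the target for that data separately from its input features, and then score the prediction against that target. For example, if the target for [21, 1] is [2]:

    prediction = model.predict([[21, 1]])
    y_test = [2]  # note: this depends on what the corresponding target label is
    score = accuracy_score(y_test, prediction)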
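The same idea extends to several new rows: whatever inputs are predicted, the list of targets must have one entry per input row. The sketch below assumes hypothetical training data and hypothetical new samples (`new_X`, `new_y`). It also uses the classifier's own `score` method, which is not mentioned in the answer above; for scikit-learn classifiers it predicts on the given inputs and returns the accuracy, so the lengths cannot drift apart.

```python
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier

# Hypothetical training data with assumed column names.
train = pd.DataFrame({
    "age": [20, 23, 25, 26, 29, 30, 20, 21, 25, 26, 27, 30],
    "gender": [1] * 6 + [0] * 6,
    "genre": ["HipHop"] * 3 + ["Jazz"] * 3 + ["Dance"] * 3 + ["Acoustic"] * 3,
})

model = DecisionTreeClassifier(random_state=0)
model.fit(train[["age", "gender"]], train["genre"])

# Hypothetical new samples: one known target per input row.
new_X = pd.DataFrame({"age": [21, 28], "gender": [1, 0]})
new_y = ["HipHop", "Acoustic"]

new_predictions = model.predict(new_X)
assert len(new_predictions) == len(new_y)

score = accuracy_score(new_y, new_predictions)

# Alternative: let the model predict and score in one call.
same_score = model.score(new_X, new_y)
print(score, same_score)
```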

【Comments】:

Thank you for your help.
