Trying to implement logistic regression, but GridSearchCV reports input variables with inconsistent numbers of samples: [60000, 60001]
【Title】: Trying to implement logistic regression but gridsearchCV shows input variables with inconsistent numbers of samples: [60000, 60001] 【Posted】: 2020-11-14 02:40:12 【Question】: Here is my code, running in a Python 3 environment:
import joblib
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
import warnings
warnings.filterwarnings('ignore', category=FutureWarning)
warnings.filterwarnings('ignore', category=DeprecationWarning)
tr_features = pd.read_csv('/home/pranjal/PycharmProjects/train_features.csv')
tr_labels = pd.read_csv('/home/pranjal/PycharmProjects/train_labels.csv', header=None)
lr = LogisticRegression()
parameters = {
    'C': [0.001, 0.01, 0.1, 1, 10, 100, 1000]
}
cv = GridSearchCV(lr, parameters, cv=5)
cv.fit(tr_features, tr_labels.values.ravel())
print_results(cv)
Running it raises the following error:
ValueError Traceback (most recent call last)
<ipython-input-20-c836a092d0ab> in <module>
5
6 cv = GridSearchCV(lr, parameters, cv=5)
----> 7 cv.fit(tr_features, tr_labels.values.ravel())
8
9 print_results(cv)
/home/pranjal/snap/jupyter/common/lib/python3.7/site-packages/sklearn/utils/validation.py in inner_f(*args, **kwargs)
71 FutureWarning)
72 kwargs.update({k: arg for k, arg in zip(sig.parameters, args)})
---> 73 return f(**kwargs)
74 return inner_f
75
/home/pranjal/snap/jupyter/common/lib/python3.7/site-packages/sklearn/model_selection/_search.py in fit(self, X, y, groups, **fit_params)
674 refit_metric = 'score'
675
--> 676 X, y, groups = indexable(X, y, groups)
677 fit_params = _check_fit_params(X, fit_params)
678
/home/pranjal/snap/jupyter/common/lib/python3.7/site-packages/sklearn/utils/validation.py in indexable(*iterables)
291 """
292 result = [_make_indexable(X) for X in iterables]
--> 293 check_consistent_length(*result)
294 return result
295
/home/pranjal/snap/jupyter/common/lib/python3.7/site-packages/sklearn/utils/validation.py in check_consistent_length(*arrays)
255 if len(uniques) > 1:
256 raise ValueError("Found input variables with inconsistent numbers of"
--> 257 " samples: %r" % [int(l) for l in lengths])
258
259
ValueError: Found input variables with inconsistent numbers of samples: [60000, 60001]
Please help me debug this code.
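【Answer 1】: Sklearn expects data of shape (n_samples, n_columns). When you call ravel on a numpy array, the resulting shape is (n_samples,). Reshape it to (n_samples, n_columns). If that does not work, try passing both inputs to cv.fit() as the same data type, i.e.
cv.fit(tr_features, tr_labels)
so that both the features and the labels are DataFrames.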
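One more thing may be worth checking alongside the shape advice. In the question, the features file is read with pandas' default header handling, while the labels file is read with header=None, and the two reported lengths differ by exactly one. The sketch below uses small in-memory CSV strings as hypothetical stand-ins for the real files (their contents are not shown in the question) to illustrate how that difference alone can produce an off-by-one row count. Whether the real train_labels.csv actually contains a header row is an assumption that would need to be verified.

```python
import io

import pandas as pd

# Hypothetical stand-ins for train_features.csv and train_labels.csv.
# Both are assumed to start with a header row; this is NOT confirmed by the question.
features_csv = "f1,f2\n1,2\n3,4\n5,6\n"
labels_csv = "label\n0\n1\n0\n"

# Default header handling: the first row becomes the column names.
tr_features = pd.read_csv(io.StringIO(features_csv))

# header=None: the first row is kept as an ordinary data row.
labels_no_header = pd.read_csv(io.StringIO(labels_csv), header=None)

# Reading the labels the same way as the features keeps the row counts aligned.
labels_with_header = pd.read_csv(io.StringIO(labels_csv))

n_features = len(tr_features)
n_labels_mismatched = len(labels_no_header)
n_labels_aligned = len(labels_with_header)

print(n_features, n_labels_mismatched, n_labels_aligned)  # 3 4 3

# A flattened 1-D label array of matching length.
y = labels_with_header.values.ravel()
print(y.shape)  # (3,)
```

If the real files behave the same way, comparing len(tr_features) with len(tr_labels) before calling cv.fit(), and reading both files with consistent header settings, would be a quick way to confirm or rule out this cause.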