Getting "Found input variables with inconsistent numbers of samples: [1, 4]" error for RandomForestRegressor

Posted 2023-03-12

Asked: 2021-01-20 05:13:13

Question:

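I am following this Random Forest algorithm example to predict rejects at different stages.

I fetch the values of stages and reject_count from a database, and use the stages values as x and the reject_count values as y. My code is: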

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import StandardScaler

    stages = [102, 103, 104, 106]
    reject_count = [1, 3, 1, 2]
    li = []
    li.append(stages)
    l2 = []
    l2.append(reject_count)
    x = np.array(li)
    y = np.array(reject_count)
    x.shape
    y.shape

    X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)
    print("===============")

    sc = StandardScaler()
    X_train = sc.fit_transform(X_train)
    X_test = sc.transform(X_test)

    regressor = RandomForestRegressor(n_estimators=100, random_state=0)
    print("x train", X_train)
    print("y train", y_train)
    regressor.fit(X_train, y_train)
    y_pred = regressor.predict(X_test)
    print(y_pred)

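Please guide me on where I am going wrong.

Comments:

Please post the full error trace.

@desertnaut There is no full error trace. I only get: Found input variables with inconsistent numbers of samples: [1, 4]

Where exactly? There is always a trace...

Answer 1: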

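Two things are going on here.

First, your x and y have different shapes: x is built from a list of lists, while y is built from a flat list. As a result x has shape (1, 4) and y has shape (4,), so scikit-learn sees 1 sample in x and 4 samples in y, which is the [1, 4] in the error message. Second, assuming you want your data as an array with one observation per sample, you should reshape your x values into a single column. More about this here.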
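To make the shape difference concrete, here is a minimal sketch that uses only NumPy and the values from the question. The names `x_wrapped` and `x_column` are illustrative and do not appear in the original code.

```python
import numpy as np

stages = [102, 103, 104, 106]
reject_count = [1, 3, 1, 2]

# Wrapping the list in another list (what li.append(stages) does in the
# question) gives one row with four columns: 1 sample, 4 features.
x_wrapped = np.array([stages])

# reshape(-1, 1) gives four rows with one column: 4 samples, 1 feature.
x_column = np.array(stages).reshape(-1, 1)

y = np.array(reject_count)

print(x_wrapped.shape)  # (1, 4)
print(x_column.shape)   # (4, 1)
print(y.shape)          # (4,)
```

The first dimension is what scikit-learn treats as the number of samples, so only `x_column` lines up with `y`.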

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import StandardScaler

    stages = [102, 103, 104, 106]
    reject_count = [1, 3, 1, 2]

    # Build x directly from the list and reshape it to (n_samples, 1),
    # instead of wrapping the list in another list as in the question.
    x = np.array(stages).reshape(-1, 1)
    y = np.array(reject_count)

    print(x, y)
    print(x.shape, y.shape)  # (4, 1) (4,)

    X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)
    print("===============")

    # Fit the scaler on the training data only, then apply it to the test data.
    sc = StandardScaler()
    X_train = sc.fit_transform(X_train)
    X_test = sc.transform(X_test)

    regressor = RandomForestRegressor(n_estimators=100, random_state=0)
    print("x train", X_train)
    print("y train", y_train)
    regressor.fit(X_train, y_train)
    y_pred = regressor.predict(X_test)
    print(y_pred)
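On the question raised in the comments about where exactly the error comes from: the sketch below, assuming scikit-learn is installed, suggests that the question's code fails inside `train_test_split`, before any scaling or fitting happens. The names `x_bad` and `error_message` are illustrative only.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Same shapes as in the question: x is (1, 4), y is (4,).
x_bad = np.array([[102, 103, 104, 106]])
y = np.array([1, 3, 1, 2])

try:
    train_test_split(x_bad, y, test_size=0.2, random_state=0)
    error_message = None
except ValueError as exc:
    # train_test_split checks that all inputs have the same number of samples.
    error_message = str(exc)

print(error_message)
```

Printing `x.shape` and `y.shape` just before the split is a quick way to catch this kind of mismatch.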
