Input Variables With Inconsistent Numbers of Samples for Polynomial Regression

Posted: 2021-06-14 04:19:01

【Question description】:

I'm trying to do polynomial regression and running into problems when fitting the model. I'm getting:

ValueError: Found input variables with inconsistent numbers of samples: [1040, 260]

import numpy as np
import pandas as pd 
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures


x = BTCdata.iloc[:, [1, 2, 4, 5]]
y = BTCdata.iloc[:,3]

x, y = np.array(x).reshape((-1, 1)), np.array(y).reshape((-1, 1))

poly_features = PolynomialFeatures(degree=4, include_bias=False)
x_ = poly_features.fit_transform(x)
model = LinearRegression()
model.fit(x_, y)

【Question discussion】:

Could you please post BTCdata, or a link or something similar, so the error can be reproduced?

Yes, of course, it's my fault for not doing that: drive.google.com/file/d/13VnQZbKB9UTOeNplT6GjzTZvH8CqxQcr/… sheet is 'FinalBTC'. Just did a plain pd.read_excel(path).

【Answer 1】:

The problem is in this line:

x = np.array(x).reshape((-1, 1))

By doing this, you turn a dataframe with n rows and m columns into an array with n × m rows and 1 column. In your example, x ends up with 260 × 4 = 1040 rows while y has 260, hence the error.
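To see the mismatch concretely, here is a minimal sketch with random stand-in data in place of BTCdata:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(260, 4))  # stand-in for the 4 feature columns
x = np.array(df).reshape((-1, 1))

print(df.shape)  # (260, 4)
print(x.shape)   # (1040, 1): 260 * 4 rows collapsed into one column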

If your goal is to convert the data to a numpy array before feeding it to the model, all you need to do is:

x = x.to_numpy()
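Applied to the question's code, the fix looks like this (a sketch assuming BTCdata has already been loaded with pd.read_excel as described in the comments):

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

x = BTCdata.iloc[:, [1, 2, 4, 5]].to_numpy()        # (260, 4)
y = BTCdata.iloc[:, 3].to_numpy().reshape((-1, 1))  # (260, 1)

poly_features = PolynomialFeatures(degree=4, include_bias=False)
x_ = poly_features.fit_transform(x)  # (260, 69) polynomial features
model = LinearRegression()
model.fit(x_, y)  # both x_ and y now have 260 samples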

【Discussion】:

【Answer 2】:

Alternatively, keep x as a DataFrame and skip the reshape entirely. The script below does that and then fits with statsmodels OLS to get a full regression summary:
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
import statsmodels.api as sm

BTCdata = pd.read_excel('BitcoinRegression.xlsx', sheet_name='FinalBTC')

x = BTCdata.iloc[:, [1, 2, 4, 5]]
print(x.shape)  # (260, 4)
y = BTCdata.iloc[:, 3]
print(y.shape)  # (260,)

# The problematic reshape is dropped:
# x, y = np.array(x).reshape((-1, 1)), np.array(y).reshape((-1, 1))

poly_features = PolynomialFeatures(degree=4, include_bias=False)
x_ = poly_features.fit_transform(x)  # (260, 69) polynomial features
# model = LinearRegression()
# model.fit(x_, y)

# Fit with statsmodels instead, to inspect the regression summary
mod = sm.OLS(y, x_).fit()
mod.summary()
<class 'statsmodels.iolib.summary.Summary'>
"""
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                    BTC   R-squared:                       0.886
Model:                            OLS   Adj. R-squared:                  0.868
Method:                 Least Squares   F-statistic:                     46.86
Date:                Wed, 17 Mar 2021   Prob (F-statistic):           2.63e-85
Time:                        20:49:58   Log-Likelihood:                -2299.3
No. Observations:                 260   AIC:                             4675.
Df Residuals:                     222   BIC:                             4810.
Df Model:                          37                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
x1            -0.0089      0.019     -0.468      0.640      -0.046       0.028
x2             0.0033      0.004      0.797      0.426      -0.005       0.012
x3          2.621e-05   3.55e-05      0.737      0.462   -4.38e-05    9.62e-05
x4             0.0005      0.001      0.789      0.431      -0.001       0.002
x5            -0.0238      0.067     -0.355      0.723      -0.156       0.108
x6             0.0790      0.688      0.115      0.909      -1.277       1.435
x7             0.0942      0.131      0.722      0.471      -0.163       0.352
x8             0.9679      1.276      0.759      0.449      -1.546       3.482
x9             0.0184      0.133      0.139      0.890      -0.243       0.280
x10            0.0093      0.013      0.726      0.469      -0.016       0.035
x11            0.0957      0.125      0.766      0.444      -0.150       0.342
x12            0.0001      0.000      0.864      0.389      -0.000       0.000
x13            0.0008      0.001      0.599      0.550      -0.002       0.003
x14            0.0207      0.026      0.783      0.435      -0.031       0.073
x15         3.594e-05   2.89e-05      1.245      0.214   -2.09e-05    9.28e-05
x16           -0.0004      0.001     -0.496      0.621      -0.002       0.001
x17            0.0158      0.010      1.621      0.106      -0.003       0.035
x18           -0.0068      0.002     -2.945      0.004      -0.011      -0.002
x19           -0.0014      0.007     -0.202      0.840      -0.015       0.012
x20           -0.0389      0.086     -0.454      0.650      -0.208       0.130
x21            0.1104      0.043      2.558      0.011       0.025       0.195
x22            0.7337      0.819      0.896      0.371      -0.881       2.348
x23           -1.4583      0.432     -3.378      0.001      -2.309      -0.607
x24            0.0601      0.031      1.913      0.057      -0.002       0.122
x25            0.0192      0.021      0.893      0.373      -0.023       0.061
x26            0.0403      0.091      0.445      0.657      -0.138       0.219
x27           -0.5110      0.224     -2.284      0.023      -0.952      -0.070
x28            0.0697      0.078      0.892      0.374      -0.084       0.224
x29           -0.1316      0.039     -3.397      0.001      -0.208      -0.055
x30            0.0054      0.103      0.052      0.958      -0.198       0.209
x31            0.0003      0.000      0.951      0.343      -0.000       0.001
x32            0.0060      0.007      0.856      0.393      -0.008       0.020
x33           -0.0124      0.012     -1.078      0.282      -0.035       0.010
x34            0.3317      0.394      0.842      0.400      -0.444       1.108
x35        -4.886e-09    1.1e-09     -4.439      0.000   -7.05e-09   -2.72e-09
x36         1.387e-07   3.68e-08      3.767      0.000    6.62e-08    2.11e-07
x37         5.106e-07   3.44e-06      0.148      0.882   -6.28e-06     7.3e-06
x38         4.652e-07   2.91e-07      1.601      0.111   -1.07e-07    1.04e-06
x39        -1.623e-06   5.17e-07     -3.138      0.002   -2.64e-06   -6.04e-07
x40        -8.446e-05   9.05e-05     -0.933      0.352      -0.000    9.39e-05
x41        -8.729e-06   7.38e-06     -1.182      0.238   -2.33e-05    5.82e-06
x42           -0.0017      0.002     -0.804      0.422      -0.006       0.002
x43            0.0007      0.000      1.705      0.090      -0.000       0.001
x44        -1.815e-05   2.11e-05     -0.862      0.390   -5.96e-05    2.33e-05
x45         9.562e-06   3.43e-06      2.788      0.006     2.8e-06    1.63e-05
x46            0.0012      0.001      1.413      0.159      -0.000       0.003
x47         5.405e-05    6.5e-05      0.831      0.407   -7.41e-05       0.000
x48            0.0069      0.044      0.156      0.876      -0.080       0.093
x49           -0.0078      0.006     -1.414      0.159      -0.019       0.003
x50            0.0001      0.000      0.307      0.759      -0.001       0.001
x51            0.1505      0.090      1.669      0.096      -0.027       0.328
x52            0.1555      0.046      3.410      0.001       0.066       0.245
x53           -0.0296      0.024     -1.210      0.227      -0.078       0.019
x54            0.0016      0.001      2.182      0.030       0.000       0.003
x55         -2.28e-05   8.77e-06     -2.600      0.010   -4.01e-05   -5.52e-06
x56           -0.0045      0.003     -1.594      0.112      -0.010       0.001
x57           -0.0002      0.000     -0.947      0.344      -0.001       0.000
x58           -0.0067      0.237     -0.028      0.977      -0.474       0.461
x59            0.0134      0.021      0.629      0.530      -0.029       0.055
x60            0.0020      0.002      1.123      0.262      -0.002       0.006
x61            0.0277      0.016      1.689      0.093      -0.005       0.060
x62           -0.3824      0.413     -0.926      0.355      -1.196       0.431
x63            0.3528      0.179      1.970      0.050      -0.000       0.706
x64           -0.0282      0.005     -5.708      0.000      -0.038      -0.018
x65           -0.0002      0.000     -0.695      0.488      -0.001       0.000
x66            0.0098      0.009      1.142      0.255      -0.007       0.027
x67            0.0901      0.103      0.873      0.384      -0.113       0.293
x68           -0.1941      0.648     -0.300      0.765      -1.471       1.083
x69            0.0237      0.021      1.128      0.261      -0.018       0.065
==============================================================================
Omnibus:                      127.728   Durbin-Watson:                   0.552
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              851.418
Skew:                           1.861   Prob(JB):                    1.31e-185
Kurtosis:                      11.046   Cond. No.                     4.00e+16
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large,  4e+16. This might indicate that there are
strong multicollinearity or other numerical problems.
"""

【Discussion】:
