Apply coefficients from n degree polynomial to formula

Posted 2023-03-12


【Title】: Apply coefficients from n degree polynomial to formula 【Posted】: 2021-12-02 08:16:50 【Question】:

I am using the sklearn LinearRegression() estimator with 5 variables

['feat1', 'feat2', 'feat3', 'feat4', 'feat5']

to predict a continuous value.

The estimator returns the list of coefficients and the intercept (bias):

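from sklearn.linear_model import LinearRegression
linear = LinearRegression()
linear.fit(X_train, y_train)  # coef_ and intercept_ only exist after fitting
print(linear.coef_)
print(linear.intercept_)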

[ 0.18799409 -0.05406106 -0.01327966 -0.13348129 -0.00614054]
-0.011064865422734674

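Then, given that I have each feature available as a variable, I can hard-code the coefficients into a linear formula and estimate my value, like this: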

val = ((0.18799409*feat1) - (0.05406106*feat2) - (0.01327966*feat3) - (0.13348129*feat4) - (0.00614054*feat5)) -0.011064865422734674
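The hard-coded formula is a dot product plus the intercept, so it can be written with array indexing instead of literals. A minimal sketch, assuming the coefficients printed above; the feature values in `feats` are made up for illustration:

```python
import numpy as np

# Coefficients and intercept copied from the printed output above.
coef = np.array([0.18799409, -0.05406106, -0.01327966, -0.13348129, -0.00614054])
intercept = -0.011064865422734674

# Hypothetical feature values (feat1..feat5), for illustration only.
feats = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# Indexed form of the formula: sum_i coef[i] * feats[i] + intercept.
val = feats @ coef + intercept

# Cross-check against the hard-coded version.
feat1, feat2, feat3, feat4, feat5 = feats
hard_coded = ((0.18799409*feat1) - (0.05406106*feat2) - (0.01327966*feat3)
              - (0.13348129*feat4) - (0.00614054*feat5)) - 0.011064865422734674
assert np.isclose(val, hard_coded)
```

With a fitted estimator, `coef` and `intercept` would be `linear.coef_` and `linear.intercept_` rather than copied values.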

Now suppose I use polynomial regression of degree 2, built with a Pipeline, and print the results:

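from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

degree = 2

model = Pipeline(steps=[
    ('scaler', StandardScaler()),
    ('polynomial_features', PolynomialFeatures(degree=degree, include_bias=False)),
    ('linear_regression', LinearRegression())])

# fit model
model.fit(X_train, y_train)

print(model['linear_regression'].coef_)
print(model['linear_regression'].intercept_)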

I get:

[ 7.06524186e-01 -2.98605001e-02 -4.67175212e-02 -4.86890790e-01
 -1.06320101e-02 -2.77958604e-03 -3.38253025e-04 -7.80563090e-03
  4.51356888e-03  8.32036733e-03  3.57638244e-02 -2.16446849e-02
 -7.92169287e-02  3.36809467e-02 -6.60531497e-03  2.16613331e-02
  2.10097993e-02  3.49970303e-02 -3.02970698e-02 -7.81462599e-03]
0.011042927069084668

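How can I transform the formula above so that val is computed from the regression using the values in .coef_ and .intercept_, with array indexing instead of hard-coded values, for any degree?

Is there a suitable scipy or numpy method for this?

【Comments on the question】:

【Answer 1】: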

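It is important to note that polynomial regression is just an extended case of linear regression, so all we need to do is transform the input data consistently. For any N, we can use PolynomialFeatures from sklearn.preprocessing. Using dummy data, we can see how this works: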

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
#set parameters
X = np.stack([np.arange(i,i+10) for i in range(5)]).T
Y = np.random.randn(10)*10+3
N = 2

poly_reg = PolynomialFeatures(degree=N, include_bias=False)
X_poly = poly_reg.fit_transform(X)
#print(X[0],X_poly[0]) #to check the generated terms; with include_bias=False there is no constant-1 column, the intercept comes from LinearRegression

poly = LinearRegression().fit(X_poly, Y)

So we can get coef_ the same way you did before, and a simple matrix multiplication gives the regression value.

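new_dat = poly_reg.transform(np.arange(2,2+10,2)[None]) # one new datapoint with 5 feature values, shape (1, 5)
# compare with a floating-point tolerance rather than exact equality
np.testing.assert_allclose(poly.predict(new_dat), new_dat @ poly.coef_ + poly.intercept_)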
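To see which coefficient belongs to which term, the fitted PolynomialFeatures object exposes a `powers_` array, where `powers_[i, j]` is the exponent of input feature j in output term i. The sketch below uses it to evaluate the polynomial with plain NumPy for a single sample. The random data and the names `reg` and `x_new` are assumptions for illustration, not part of the original answer:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))   # illustrative data: 50 rows, 5 features
y = rng.normal(size=50)
N = 2

poly_reg = PolynomialFeatures(degree=N, include_bias=False)
reg = LinearRegression().fit(poly_reg.fit_transform(X), y)

# One row of exponents per output term: 5 linear + 15 quadratic = 20 terms.
powers = poly_reg.powers_

x_new = rng.normal(size=5)                  # one new sample
terms = np.prod(x_new ** powers, axis=1)    # value of each polynomial term
val = terms @ reg.coef_ + reg.intercept_

# The manual evaluation should agree with the fitted model.
assert np.allclose(val, reg.predict(poly_reg.transform(x_new[None]))[0])
```

For 5 features and degree 2 this gives 20 terms, which matches the 20 coefficients printed in the question.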

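----Edit----

If you cannot use the PolynomialFeatures transform, it is just an iterated combinations loop that generates the data from your list of features.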

new_feats = np.array([feat1,feat2,feat3,feat4,feat5])

from itertools import combinations_with_replacement
def gen_poly_feats(x,N):
    #returns the products of all unique groupings (w/ replacement) of the entries of x, for degrees 1..N, as a single row for use in polynomial regression.
    #np.prod is used because np.product was deprecated and later removed from NumPy
    return np.concatenate([[np.prod(x[np.array(i)]) for i in combinations_with_replacement(range(len(x)), n)] for n in range(1,N+1)])[None]

new_feats_poly = gen_poly_feats(new_feats,N)
# just to be sure that this matches (within floating-point tolerance)...
np.testing.assert_allclose(new_feats_poly,poly_reg.transform(new_feats[None]))
#then we can use the above linear regression model to predict the new data
val = new_feats_poly @ poly.coef_ + poly.intercept_
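The pipeline in the question also contains a StandardScaler, so a manual formula has to standardize the features before expanding them. A rough sketch of the three steps, assuming randomly generated training data; the variable names `scaler`, `expander`, `lin` and `x_new` are illustrative:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

rng = np.random.default_rng(0)
X_train = rng.normal(loc=3.0, scale=2.0, size=(60, 5))  # illustrative data
y_train = rng.normal(size=60)

model = Pipeline(steps=[
    ('scaler', StandardScaler()),
    ('polynomial_features', PolynomialFeatures(degree=2, include_bias=False)),
    ('linear_regression', LinearRegression())])
model.fit(X_train, y_train)

scaler = model['scaler']
expander = model['polynomial_features']
lin = model['linear_regression']

x_new = rng.normal(loc=3.0, scale=2.0, size=5)  # one new sample

# Step 1: standardize with the statistics learned during fit.
z = (x_new - scaler.mean_) / scaler.scale_
# Step 2: expand into polynomial terms using the fitted exponents.
terms = np.prod(z ** expander.powers_, axis=1)
# Step 3: apply the linear formula with array indexing.
val = terms @ lin.coef_ + lin.intercept_

# Shortcut: let every step except the final estimator do the transform.
val_short = (model[:-1].transform(x_new[None]) @ lin.coef_ + lin.intercept_)[0]

assert np.allclose(val, model.predict(x_new[None])[0])
assert np.allclose(val, val_short)
```

The coefficients apply to the scaled and expanded features, not to the raw inputs, so skipping step 1 would give a different value from `model.predict`.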

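【Comments】:

val corresponds to a future value, which is not available when the regression is fitted. Is that what new_dat means?

new_dat is just some new data, with the same size/type as the training data in X.

I'm a bit confused, because I actually use Pipeline(), which makes it a bit awkward to extract values from the preprocessor or the estimator. Maybe if you adapt your answer I will award you the bounty. I have edited the question to make it more complete.

Edited my answer, although I'm not 100% sure why you can't use the sklearn PolynomialFeatures function. Either way, hopefully it works for you now.

Added another edit, because my previous edit did not work for N other than 2.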
