Creating a sklearn.linear_model.LogisticRegression instance from existing coefficients

Posted 2023-03-12


[Title]: Creating a sklearn.linear_model.LogisticRegression instance from existing coefficients [Posted]: 2014-08-17 19:03:01 [Question]:

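Is it possible to create such an instance from existing coefficients that were computed in a different implementation (for example, in Java)?

I tried creating an instance and then setting coef_ and intercept_ directly. It seems to work, but I am not sure whether this has drawbacks or whether I might be breaking something.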

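[Comments]:

As long as the prediction function of your regression only uses the attributes you set, you should be able to skip fitting. To test this, fit a small logistic regression in sklearn, then create a new LogisticRegression object, set coef_ and intercept_ on it as you did, and compare the predictions of the two models. If it runs (which is not a given; for an SVM, for example, this is much harder), I don't see why it shouldn't work.

[Answer 1]: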

Yes, it works fine:

import json
import numpy as np
from sklearn.linear_model import LogisticRegression

x = np.arange(10)[:, np.newaxis]
y = np.array([0, 0, 0, 1, 0, 0, 1, 1, 1, 1])
# train one logistic regression
# (the L1 penalty needs a solver that supports it, such as liblinear)
model1 = LogisticRegression(C=10, penalty='l1', solver='liblinear').fit(x, y)
# serialize coefficients (imitate loading from storage)
encoded = json.dumps((model1.coef_.tolist(), model1.intercept_.tolist(), model1.penalty, model1.C))
print(encoded)
decoded = json.loads(encoded)
# use the coefficients in another, unfitted regression
model2 = LogisticRegression()
model2.coef_ = np.array(decoded[0])
model2.intercept_ = np.array(decoded[1])
model2.penalty = decoded[2]
model2.C = decoded[3]
# classes_ is required by newer scikit-learn versions (e.g. 0.24.2)
model2.classes_ = np.array([0, 1])
# resulting predictions are identical
print(model1.predict_proba(x) == model2.predict_proba(x))

Output:

[[[0.7558780101653273]], [-3.322083150375962], "l1", 10]
[[ True  True]
 [ True  True]
 [ True  True]
 [ True  True]
 [ True  True]
 [ True  True]
 [ True  True]
 [ True  True]
 [ True  True]
 [ True  True]]

So the predictions of the original model and the recreated model are indeed identical.
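When the coefficients come from another implementation, it may also help to check them against the logistic formula directly, without a second fitted model. The following is only a sketch: it assumes a binary model whose positive-class probability is sigmoid(x · coef + intercept), it copies the numeric values from the output above, and the names `coef`, `intercept` and `manual_proba` are illustrative rather than part of the original answer.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Values copied from the serialized output shown above.
coef = np.array([[0.7558780101653273]])
intercept = np.array([-3.322083150375962])

x = np.arange(10)[:, np.newaxis]

# Manual computation: P(y=1 | x) = 1 / (1 + exp(-(x @ coef.T + intercept)))
z = x @ coef.T + intercept                 # shape (10, 1)
p1 = 1.0 / (1.0 + np.exp(-z))
manual_proba = np.hstack([1.0 - p1, p1])   # columns: P(class 0), P(class 1)

# The same coefficients injected into an unfitted estimator.
model = LogisticRegression()
model.coef_ = coef
model.intercept_ = intercept
model.classes_ = np.array([0, 1])          # assumed class labels

# Compare with a tolerance rather than exact equality.
print(np.allclose(manual_proba, model.predict_proba(x)))
```

If the two disagree, the other implementation may use a different convention, for example a different sign or an intercept folded into the feature vector.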

[Comments]:

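If I may add, this solution may not work with some versions of sklearn. I just tried it on scikit-learn 0.24.2, and it raised an AttributeError saying that the LogisticRegression object has no attribute 'classes_'. The fix is to set it to the classes you need, for example: model2.classes_ = np.array([0, 1]).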
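The attribute assignments discussed in this thread can be collected in one place. The helper below is a hypothetical sketch, not a scikit-learn API: the name `logistic_from_coefficients` and its parameters are made up for illustration, and it assumes a binary model with class labels 0 and 1 unless told otherwise.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def logistic_from_coefficients(coef, intercept, classes=(0, 1)):
    """Return an unfitted LogisticRegression that predicts with the given values.

    Hypothetical helper for illustration; it only sets the attributes
    mentioned in this thread (coef_, intercept_, classes_).
    """
    model = LogisticRegression()
    # Binary models store coef_ with shape (1, n_features).
    model.coef_ = np.atleast_2d(np.asarray(coef, dtype=float))
    model.intercept_ = np.atleast_1d(np.asarray(intercept, dtype=float))
    model.classes_ = np.asarray(classes)
    return model


# Coefficients taken from the answer's output above.
model = logistic_from_coefficients([0.7558780101653273], -3.322083150375962)
x = np.arange(10)[:, np.newaxis]
print(model.predict(x))
```

Other scikit-learn versions may expect further fitted attributes, so it is worth running a quick prediction like the one above after upgrading.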
