Logistic regression model coefficient

Posted: 2020-12-22 03:50:57

Question:

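I ran a logistic regression on diabetes data and looked at the model's results. I assumed each variable would get one coefficient, but the result gave me 3 different lists of coefficients and 3 different intercepts. When I tried linear regression, it gave one coefficient per variable.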

import numpy as np
import pandas as pd
import sklearn.model_selection  # makes sklearn.model_selection.train_test_split available
from sklearn import linear_model, preprocessing
from sklearn.linear_model import LogisticRegression

data = pd.read_csv('diabetestype.csv', sep=',')

le = preprocessing.LabelEncoder()
Age = list(data['Age'])  # convert each feature column to a plain list
BSf = list(data['BS Fast'])
BSp = list(data['BS pp'])
PR = list(data['Plasma R'])
PF = list(data['Plasma F'])
Hb = list(data['HbA1c'])
Type = le.fit_transform(list(data['Type']))

X = list(zip(Age, BSf,BSp,PR,PF,Hb))
y = list(Type)


x_train,x_test, y_train,y_test = sklearn.model_selection.train_test_split(X, y, test_size = 0.1)
# model = linear_model.LinearRegression()
model = LogisticRegression()
model.fit(x_train, y_train)
acc = model.score(x_test, y_test)
coef = model.coef_
inter = model.intercept_
prediction = model.predict(x_test)
for i in range(5):
    print('predicted ', prediction[i], 'variables  ', x_test[i], 'actual', y_test[i])
print(acc)
print(coef, inter)

The result is:

predicted  1 variables (2, 9, 14, 6, 6, 10) actual 1
predicted  2 variables (33, 7, 0, 9, 8, 8) actual 2
predicted  0 variables (19, 4, 4, 3, 2, 0) actual 0
predicted  0 variables (7, 15, 9, 5, 5, 3) actual 0
predicted  0 variables (16, 4, 4, 3, 2, 0) actual 0
1.0
[[-0.02543341  0.3763792  -0.2116062  -1.36365511 -0.87416662 -1.8448327 ]
 [ 0.00940748 -1.12894486  1.50994009  1.1101098   1.23563738 -0.2574385 ]
 [ 0.01602593  0.75256566 -1.29833389  0.25354531 -0.36147076  2.1022712 ]] [ 28.79209663 -19.24933782  -9.54275881]
C:\Users\nk\anaconda3\lib\site-packages\sklearn\linear_model\_logistic.py:764: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  extra_warning_msg=_LOGISTIC_SOLVER_CONVERGENCE_MSG)
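The ConvergenceWarning at the end of that output names its own usual fixes: scale the features or raise max_iter. A minimal sketch of both, assuming synthetic data from make_classification as a hypothetical stand-in for diabetestype.csv (6 features, 3 classes), with StandardScaler placed in front of the classifier:

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the diabetes data: 6 features, 3 classes.
X, y = make_classification(n_samples=300, n_features=6, n_informative=4,
                           n_classes=3, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, random_state=0)

# Scaling inside a pipeline means the test set is transformed with the
# mean/std learned from the training set only.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

with warnings.catch_warnings():
    # Turn convergence warnings into errors so a non-converged fit is loud.
    warnings.simplefilter("error", ConvergenceWarning)
    model.fit(x_train, y_train)

clf = model.named_steps["logisticregression"]
print(model.score(x_test, y_test))
print(clf.coef_.shape, clf.intercept_.shape)  # one row / one intercept per class
```

Note that after scaling, each coefficient describes the effect of a one-standard-deviation change in its feature, so the numbers are not directly comparable with coefficients fitted on the raw values.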


Answer 1:

From the documentation:

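coef_: ndarray of shape (1, n_features) or (n_classes, n_features). coef_ is of shape (1, n_features) when the given problem is binary. (The same applies to intercept_, which has shape (1,) or (n_classes,).)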

You have 3 classes.
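To see where the extra rows come from, a small sketch, assuming hypothetical synthetic data from make_classification rather than the question's dataset, that compares the shapes of coef_ and intercept_ for a binary target, a 3-class target, and LinearRegression on the same 6 features:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LinearRegression, LogisticRegression

# Hypothetical data: 6 features (playing the role of Age, BS Fast, ..., HbA1c) and 3 classes.
X, y3 = make_classification(n_samples=200, n_features=6, n_informative=4,
                            n_classes=3, random_state=0)
y2 = (y3 == 0).astype(int)  # collapse to a binary target: class 0 vs the rest

binary = LogisticRegression(max_iter=1000).fit(X, y2)
multi = LogisticRegression(max_iter=1000).fit(X, y3)
linear = LinearRegression().fit(X, y3)  # treats the labels 0/1/2 as one number

print(binary.coef_.shape, binary.intercept_.shape)       # (1, 6) (1,)
print(multi.coef_.shape, multi.intercept_.shape)         # (3, 6) (3,)
print(linear.coef_.shape, np.shape(linear.intercept_))   # (6,) ()
```

LinearRegression fits a single numeric target, so it returns one coefficient per feature; it is not modelling three separate classes, which is why it appeared to give "one for each".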

In this minimal example there are also 3 classes:

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
X, y = load_iris(return_X_y=True)
clf = LogisticRegression(random_state=0).fit(X, y)
clf.predict(X[:2, :])

clf.predict_proba(X[:2, :])


clf.score(X, y)

set(y)  # >>>0, 1, 2 --> there are 3 classes



clf.coef_ # >>> array([[-0.41874027,  0.96699274, -2.52102832, -1.08416599],
          #            [ 0.53123044, -0.31473365, -0.20002395, -0.94866082],
          #            [-0.11249017, -0.65225909,  2.72105226,  2.03282681]])

clf.coef_.shape # >>> (3, 4)

clf.intercept_ # >>> array([  9.84028024,   2.21683511, -12.05711535])
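Row i of coef_ (and entry i of intercept_) belongs to clf.classes_[i], in that order. A short sketch that labels the rows and columns with the iris class and feature names; max_iter=1000 is an addition not in the answer, used only to avoid a convergence warning:

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

iris = load_iris()
clf = LogisticRegression(random_state=0, max_iter=1000).fit(iris.data, iris.target)

# Row i of coef_ and entry i of intercept_ belong to clf.classes_[i].
coef_table = pd.DataFrame(
    clf.coef_,
    index=iris.target_names[clf.classes_],
    columns=iris.feature_names,
)
coef_table["intercept"] = clf.intercept_
print(coef_table)
```

For the model in the question, index=le.classes_ and columns=['Age', 'BS Fast', 'BS pp', 'Plasma R', 'Plasma F', 'HbA1c'] would play the same role, because LabelEncoder encodes the labels as 0, 1, 2 in the order of le.classes_.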

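You need to be able to tell which class a sample belongs to, so the model keeps one set of coefficients per class. Each row of coef_, together with the matching entry of intercept_, scores one class: the first row scores the first class in clf.classes_ (class 0 here), the second row the next class, and so on. For a sample x the model computes one score per class, x·coef_[k] + intercept_[k], and converts the scores into probabilities between 0 and 1. With the default lbfgs solver and more than two classes, scikit-learn fits a multinomial model, so the probabilities come from a softmax over all three scores and sum to 1; the predicted class is the one with the highest score.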
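A sketch of how the three rows are used, assuming the default lbfgs solver on the 3-class iris data (a multinomial model in scikit-learn 0.22 and later): the per-class linear scores reproduce decision_function, and a softmax over them reproduces predict_proba.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000).fit(X, y)

# One linear score per class: shape (n_samples, n_classes).
scores = X @ clf.coef_.T + clf.intercept_
assert np.allclose(scores, clf.decision_function(X))

# Softmax across the classes (subtract the row max for numerical stability).
exp_scores = np.exp(scores - scores.max(axis=1, keepdims=True))
proba = exp_scores / exp_scores.sum(axis=1, keepdims=True)

print(np.allclose(proba, clf.predict_proba(X)))
print(clf.classes_[proba.argmax(axis=1)][:5])  # same labels as clf.predict(X[:5])
```

If the model were fitted one-vs-rest instead (for example with the liblinear solver), each row would be passed through its own sigmoid and the results normalized, so this softmax check would not apply.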

Comments:

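When I try to predict a sample, how would I model it as ax + b = c? Should I try every class with the variables? And how would I evaluate the result afterwards? Thanks

@N.K I'm not sure I understand the question. HERE you will find the definition for the multiclass case. Otherwise (two classes), the probability that sample x belongs to the positive class is e^(b0 + b1*X) / (1 + e^(b0 + b1*X)); in scikit-learn the positive class is clf.classes_[1], the second column of predict_proba.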
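A sketch of the binary formula from the last comment, assuming a two-class subset of iris (classes 0 and 1, chosen only for illustration). With several features, b1*X becomes the dot product of coef_[0] with the sample:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
mask = y < 2                      # keep two classes to get a binary problem
X_bin, y_bin = X[mask], y[mask]

clf = LogisticRegression(max_iter=1000).fit(X_bin, y_bin)
b0 = clf.intercept_[0]            # the single intercept
b = clf.coef_[0]                  # the single row of 4 coefficients

z = b0 + X_bin @ b                         # b0 + b1*x1 + ... + b4*x4
p_positive = np.exp(z) / (1 + np.exp(z))   # e^z / (1 + e^z)

# The formula gives the probability of clf.classes_[1], not classes_[0].
print(np.allclose(p_positive, clf.predict_proba(X_bin)[:, 1]))
```

The probability of the other class is simply 1 - p_positive, which is the first column of predict_proba.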
