How to interpret the results of logistic regression in Weka

Posted 2023-03-13

Asked: 2019-01-24 07:44:37

Question:

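Hello everyone, I'm new to this field and was wondering whether someone could help me understand the results of a logistic regression. I need to find out whether the independent variables can be used to produce a good classification.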

=== Run information ===

Scheme:       weka.classifiers.functions.Logistic -R 1.0E-8 -M -1 -num-decimal-places 4
Relation:     Train
Instances:    14185
Attributes:   5
              ATTR_1
              ATTR_2
              ATTR_3
              ATTR_4
              DEPENDENT_VAR
Test mode:    evaluate on training data

=== Classifier model (full training set) ===

Logistic Regression with ridge parameter of 1.0E-8
Coefficients...
               Class
Variable           0
====================
ATTR_1        0.0022
ATTR_2        0.0022
ATTR_3        0.0034
ATTR_4       -0.0021
Intercept     0.9156


Odds Ratios...
               Class
Variable           0
====================
ATTR_1        1.0022
ATTR_2        1.0022
ATTR_3        1.0034
ATTR_4        0.9979


Time taken to build model: 0.13 seconds

=== Evaluation on training set ===

Time taken to test model on training data: 0.07 seconds

=== Summary ===

Correctly Classified Instances       51240               72.2453 %
Incorrectly Classified Instances     19685               27.7547 %
Kappa statistic                         -0.0001
Mean absolute error                      0.3992
Root mean squared error                  0.4467
Relative absolute error                 99.5581 %
Root relative squared error             99.7727 %
Total Number of Instances            70925     

=== Detailed Accuracy By Class ===

                 TP Rate  FP Rate  Precision  Recall   F-Measure  MCC      ROC Area  PRC Area  Class
                 1,000    1,000    0,723      1,000    0,839      -0,005   0,545     0,759     0
                 0,000    0,000    0,000      0,000    0,000      -0,005   0,545     0,305     1
Weighted Avg.    0,722    0,723    0,522      0,722    0,606      -0,005   0,545     0,633     

=== Confusion Matrix ===

     a     b   <-- classified as
 51240     5 |     a = 0
 19680     0 |     b = 1

In particular, I'm interested in understanding the values of the coefficients and the odds ratios. Thank you.

Comments:

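This isn't really a programming question. To learn more about logistic regression in Weka, you could try watching this, and if you're new to data mining I'd suggest taking the full course. The confusion matrix in your output shows that the classifier does poorly on your data: it predicts almost every instance as class a, while 19680 of them should be b.

Answer 1: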

Off the top of my head:

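The odds ratios and the coefficient values are directly related, and each can be computed from the other: the odds ratio is exp(coefficient), and the coefficient is the natural log of the odds ratio.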

For ATTR_1, exp(0.0022) ≈ 1.0022
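As a quick check of that relationship, here is a minimal Python sketch that recomputes the odds ratios from the coefficients printed in the Weka output and converts them back again. The variable names are my own; only the numbers come from the question.

```python
import math

# Coefficients for class 0, copied from the Weka output in the question.
coefficients = {
    "ATTR_1": 0.0022,
    "ATTR_2": 0.0022,
    "ATTR_3": 0.0034,
    "ATTR_4": -0.0021,
}

# Odds ratio = exp(coefficient).
odds_ratios = {name: math.exp(beta) for name, beta in coefficients.items()}

# Going the other way: coefficient = ln(odds ratio).
recovered = {name: math.log(ratio) for name, ratio in odds_ratios.items()}

for name in coefficients:
    print(f"{name}: coefficient={coefficients[name]:+.4f}  "
          f"odds ratio={odds_ratios[name]:.4f}")
```

Rounded to four decimal places, these should agree with the "Odds Ratios..." table that Weka printed.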

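For further calculations and for fitting/prediction, the coefficients are "better". However, they are values that have to be plugged into the exp(x) function, and they are somewhat hard to picture in your head.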
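To illustrate what "plugging into exp(x)" looks like, here is a rough sketch that turns the printed coefficients into a predicted probability. It assumes the usual binary logistic form, that the "Class 0" column models the log-odds of class 0 against class 1, and that the coefficients apply to the raw attribute values. The function name and the example instance are hypothetical.

```python
import math

# Values copied from the "Class 0" column of the Weka output.
INTERCEPT = 0.9156
COEFFICIENTS = {
    "ATTR_1": 0.0022,
    "ATTR_2": 0.0022,
    "ATTR_3": 0.0034,
    "ATTR_4": -0.0021,
}


def probability_class_0(instance):
    """Return P(class = 0) under the assumed binary logistic form."""
    # Linear part: intercept plus the weighted sum of attribute values.
    log_odds = INTERCEPT + sum(
        COEFFICIENTS[name] * instance[name] for name in COEFFICIENTS
    )
    # The logistic function maps log-odds to a probability in (0, 1).
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical instance; these attribute values are made up for illustration.
example = {"ATTR_1": 10.0, "ATTR_2": 20.0, "ATTR_3": 5.0, "ATTR_4": 30.0}
print(round(probability_class_0(example), 4))
```

Under these assumptions, an instance with every attribute at zero gets a probability of roughly 0.71 for class 0, which is close to the 72% share of class 0 in the data. With coefficients this small, the attributes barely move the prediction away from that baseline.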

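For human interpretation, odds ratios are sometimes more convenient: they are easier to explain and visualize, but you cannot use them directly in some calculations.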

Weka doesn't know which of the two you care about more, so for convenience it gives you both.

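By the way, Weka performs regularized logistic regression (Logistic Regression with ridge parameter of 1.0E-8), so the coefficients may differ slightly from what other software packages would give you.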
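For reference, a ridge-penalized logistic regression in the binary case is commonly written as below. This is the textbook form rather than something taken from the Weka output, and the symbols (β for the coefficients, λ for the ridge parameter, p_i for the predicted probability of instance i) are my own notation. The output does not say whether the intercept is included in the penalty.

```latex
L(\beta) = -\sum_{i=1}^{n} \Bigl[ y_i \ln p_i + (1 - y_i) \ln (1 - p_i) \Bigr]
           + \lambda \sum_{j} \beta_j^2,
\qquad
p_i = \frac{1}{1 + e^{-(\beta_0 + \beta^\top x_i)}}
```

With λ = 1.0E-8 the penalty term is tiny compared with the log-likelihood term, which is why any difference from an unpenalized fit should be small.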
