How to set a threshold for a scikit-learn random forest model

Posted 2023-03-12

【Original title】: how to set threshold to scikit learn random forest model 【Asked】: 2018-09-21 23:28:38 【Question】:

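After looking at precision_recall_curve, I want to use threshold = 0.4. How do I apply 0.4 in my random forest model (binary classification), so that any sample with a probability >= 0.4 is labeled 1?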

from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

random_forest = RandomForestClassifier(n_estimators=100, oob_score=True, random_state=12)
random_forest.fit(X_train, y_train)

predicted = random_forest.predict(X_test)
accuracy = accuracy_score(y_test, predicted)

Documentation: Precision-Recall

【Question comments】:

【Answer 1】:

sklearn.metrics.accuracy_score takes 1-D arrays, but your predicted array is 2-D. That is what raises the error. https://scikit-learn.org/stable/modules/generated/sklearn.metrics.accuracy_score.html
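To make the shape mismatch concrete, here is a minimal sketch. The synthetic data from make_classification and the name clf are assumptions standing in for the asker's X_train, y_train and random_forest. It only shows that predict_proba returns one column per class while predict returns one label per sample.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic binary data stands in for the asker's training set (assumption).
X, y = make_classification(n_samples=200, n_features=8, random_state=12)
clf = RandomForestClassifier(n_estimators=20, random_state=12).fit(X, y)

proba = clf.predict_proba(X)  # 2-D: one column per class, rows sum to 1
labels = clf.predict(X)       # 1-D: one label per sample

print(proba.shape)   # (200, 2)
print(labels.shape)  # (200,)
print(clf.classes_)  # column order used by predict_proba
```

accuracy_score expects something shaped like labels, so a thresholded result has to be reduced to a single 1-D array before it is scored.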

【Comments】:

【Answer 2】:

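from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

random_forest = RandomForestClassifier(n_estimators=100)
random_forest.fit(X_train, y_train)

threshold = 0.4

# predict_proba returns a 2-D array of shape (n_samples, 2)
predicted = random_forest.predict_proba(X_test)
predicted[:,0] = (predicted[:,0] < threshold).astype('int')
predicted[:,1] = (predicted[:,1] >= threshold).astype('int')

# predicted is still 2-D here, while y_test is 1-D
accuracy = accuracy_score(y_test, predicted)
print(round(accuracy,4,)*100, "%")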

This raises an error at the final accuracy step: "ValueError: Can't handle mix of binary and multilabel-indicator"

【Comments】:

【Answer 3】:

Assuming you are doing binary classification, this is quite easy:

threshold = 0.4

predicted_proba = random_forest.predict_proba(X_test)
predicted = (predicted_proba[:, 1] >= threshold).astype('int')

accuracy = accuracy_score(y_test, predicted)
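For anyone who wants to run the snippet above end to end, here is a self-contained sketch. The synthetic dataset, the train/test split and the helper name pos_col are assumptions added for illustration; the thresholding step itself is the one from the answer. Looking up the column through classes_ is a small safeguard in case the positive class is not stored in column 1.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic binary data stands in for the asker's dataset (assumption).
X, y = make_classification(n_samples=500, n_features=10, random_state=12)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=12
)

random_forest = RandomForestClassifier(n_estimators=100, random_state=12)
random_forest.fit(X_train, y_train)

threshold = 0.4

# Find the predict_proba column that belongs to class 1.
pos_col = list(random_forest.classes_).index(1)
proba_pos = random_forest.predict_proba(X_test)[:, pos_col]

# 1-D array of 0/1 labels: 1 wherever P(class 1) >= threshold.
predicted = (proba_pos >= threshold).astype(int)

accuracy = accuracy_score(y_test, predicted)
print(f"{accuracy * 100:.2f} %")
```

Because predicted is 1-D and has the same length as y_test, accuracy_score accepts it without the "mix of binary and multilabel-indicator" error.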

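【Comments】:

Hi Stev, the last part gives me an error: "ValueError: Can't handle mix of binary and multilabel-indicator." accuracy = accuracy_score(y_test, predicted) print(round(accuracy,4,)*100, "%") Do you know how to fix it?

@BigData, oops, I didn't run the rest of your code. In this case you only need to take the second column. It is 1 if the sample is class 1 and 0 if it is not, which I take to mean class 0. Check my edit.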
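The question starts from precision_recall_curve, so as a follow-up here is a rough sketch of how a threshold could be read off that curve rather than picked by eye. The synthetic data, the name clf, and the selection rule (the highest threshold that still reaches a hypothetical target_recall of 0.9) are all assumptions for illustration, not something proposed in the thread.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_recall_curve, recall_score
from sklearn.model_selection import train_test_split

# Synthetic binary data stands in for the asker's dataset (assumption).
X, y = make_classification(n_samples=500, n_features=10, random_state=12)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=12
)
clf = RandomForestClassifier(n_estimators=100, random_state=12)
clf.fit(X_train, y_train)

proba_pos = clf.predict_proba(X_test)[:, 1]
precision, recall, thresholds = precision_recall_curve(y_test, proba_pos)

# precision and recall have one more entry than thresholds: the final
# point (precision 1, recall 0) has no threshold attached, so drop it.
target_recall = 0.9  # hypothetical requirement
meets_target = recall[:-1] >= target_recall

# Highest threshold whose recall still meets the target.
chosen = thresholds[meets_target].max()

predicted = (proba_pos >= chosen).astype(int)
print("chosen threshold:", chosen)
print("recall at that threshold:", recall_score(y_test, predicted))
```

Ideally the threshold is chosen on a validation split rather than on the test set used for the final score; the test set is reused here only to keep the sketch short.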
