Handling Imbalanced Data in sklearn Random Forests
Posted by allen-rg
Handle Imbalanced Classes In Random Forest
Preliminaries
# Load libraries
from sklearn.ensemble import RandomForestClassifier
import numpy as np
from sklearn import datasets
Load Iris Flower Dataset
# Load data
iris = datasets.load_iris()
X = iris.data
y = iris.target
Adjust Iris Dataset To Make Classes Imbalanced
# Make class highly imbalanced by removing first 40 observations
X = X[40:,:]
y = y[40:]
# Create target vector indicating if class 0, otherwise 1
y = np.where((y == 0), 0, 1)
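As a quick sanity check on the step above, the class counts can be inspected directly. This is a minimal sketch that repeats the same slicing and relabeling; the variable name `counts` is introduced here for illustration and is not part of the original tutorial.

```python
import numpy as np
from sklearn import datasets

# Rebuild the imbalanced target exactly as in the steps above
iris = datasets.load_iris()
X = iris.data[40:, :]
y = iris.target[40:]
y = np.where(y == 0, 0, 1)

# Count observations per class: index 0 is class 0, index 1 is class 1
counts = np.bincount(y)
print(counts)  # 10 observations of class 0, 100 of class 1
```

Iris has 50 observations per species, so dropping the first 40 leaves 10 of the original class 0, and merging the other two species gives 100 observations of class 1.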
Train Random Forest While Balancing Classes
When using RandomForestClassifier, a useful setting is class_weight="balanced", wherein classes are automatically weighted inversely proportional to how frequently they appear in the data. Specifically:
$$w_j = \frac{n}{k \, n_j}$$
where $w_j$ is the weight of class $j$, $n$ is the number of observations, $n_j$ is the number of observations in class $j$, and $k$ is the total number of classes.
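To make the formula concrete, here is a small sketch that computes the weights by hand for the imbalanced target used in this post (10 observations of class 0, 100 of class 1) and compares them with scikit-learn's `compute_class_weight` utility. The names `manual_weights` and `sklearn_weights` are illustrative, and the target is rebuilt inline so that the snippet runs on its own.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Stand-in for the imbalanced target: 10 observations of class 0, 100 of class 1
y = np.array([0] * 10 + [1] * 100)

n = len(y)                                          # total number of observations
classes, counts = np.unique(y, return_counts=True)  # n_j for each class j
k = len(classes)                                    # total number of classes

# w_j = n / (k * n_j)
manual_weights = n / (k * counts)

# scikit-learn's helper for the "balanced" heuristic
sklearn_weights = compute_class_weight(class_weight="balanced", classes=classes, y=y)

print(dict(zip(classes.tolist(), manual_weights.tolist())))
print(np.allclose(manual_weights, sklearn_weights))
```

With n = 110 and k = 2, the rare class 0 gets weight 110 / (2 × 10) = 5.5 and the common class 1 gets 110 / (2 × 100) = 0.55, so each class contributes the same total weight.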
# Create random forest classifier object
clf = RandomForestClassifier(random_state=0, n_jobs=-1, class_weight="balanced")
# Train model
model = clf.fit(X, y)
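The original post stops at fitting the model. One possible way to check how the weighted forest treats the minority class is stratified cross-validation scored with balanced accuracy, sketched below. The choice of 5 folds, the `balanced_accuracy` metric, and the names `cv` and `scores` are assumptions made for this illustration, not part of the source tutorial.

```python
import numpy as np
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Rebuild the imbalanced data set used above
iris = datasets.load_iris()
X = iris.data[40:, :]
y = np.where(iris.target[40:] == 0, 0, 1)

clf = RandomForestClassifier(random_state=0, n_jobs=-1, class_weight="balanced")

# Stratified folds keep the 10 minority observations spread across all folds
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Balanced accuracy averages per-class recall, so the rare class counts equally
scores = cross_val_score(clf, X, y, cv=cv, scoring="balanced_accuracy")
print(scores.mean())
```

Plain accuracy can look high on imbalanced data even when the minority class is ignored, which is why a per-class metric is used in this sketch.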
https://chrisalbon.com/machine_learning/trees_and_forests/handle_imbalanced_classes_in_random_forests/
Methods for handling class imbalance:
https://segmentfault.com/a/1190000015248984