Visualizing a Decision Tree in Jupyter Notebook
【Posted】:2020-12-12 09:37:13
【Question】:Is there a way to "un-collapse" the following tree in Jupyter Notebook? It is a simple decision tree, but I don't know what is making it look collapsed. Below are the relevant code snippets and the tree itself.
%matplotlib inline
%config InlineBackend.figure_format = 'retina'
from matplotlib import pyplot as plt
plt.rcParams['figure.figsize'] = (10, 8)
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder
import collections
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
#some more code
# Some feature values are present in train and absent in test and vice-versa.
y = df_train['Should_Vacation_there']
df_train, df_test = intersect_features(train=df_train, test=df_test)
df_train
#training a decision tree
dt = DecisionTreeClassifier(criterion='entropy', random_state=17)
dt.fit(df_train, y);
#displaying the tree
plot_tree(dt, feature_names=df_train.columns, filled=True,
          class_names=["Should go there", "Shouldn't go there"]);
【Comments on the question】:
Check the size of the axes you are plotting on. You can increase the figsize parameter of matplotlib.pyplot.figure to enlarge the axes, for example plt.figure(figsize=(10, 5)).
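As a rough sketch of the figsize suggestion above: the snippet below creates an explicitly sized figure and hands its axes to plot_tree through the ax parameter. The iris dataset stands in for the asker's data, which is not shown, and the sizes (16 x 10 inches, fontsize 8) are arbitrary values chosen for illustration.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, plot_tree

# Assumption: iris replaces the asker's df_train, which is not available here.
iris = load_iris()
dt = DecisionTreeClassifier(criterion="entropy", random_state=17)
dt.fit(iris.data, iris.target)

# Size the figure explicitly and draw the tree on its axes,
# rather than relying on the global rcParams figure size.
fig, ax = plt.subplots(figsize=(16, 10))
annotations = plot_tree(
    dt,
    feature_names=iris.feature_names,
    class_names=list(iris.target_names),
    filled=True,
    fontsize=8,  # fixed font size; omit to let plot_tree choose one
    ax=ax,
)
```

If the nodes still overlap, increasing figsize or lowering fontsize are the two knobs to try first.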
One option is to use Graphviz. See here.
@NikhilKumar This may not fully solve the problem. Scikit-learn's visualization of decision trees is rather poor.
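For the Graphviz route mentioned in the comments, a minimal sketch could look like the following. It only uses scikit-learn's export_graphviz to produce DOT source; rendering that source is an assumption on my part and needs the separate graphviz Python package plus the Graphviz binaries. Iris again stands in for the asker's data.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_graphviz

# Assumption: iris replaces the asker's data, which is not shown.
iris = load_iris()
dt = DecisionTreeClassifier(criterion="entropy", random_state=17)
dt.fit(iris.data, iris.target)

# With out_file=None, export_graphviz returns the DOT source as a string.
dot_source = export_graphviz(
    dt,
    out_file=None,
    feature_names=iris.feature_names,
    class_names=list(iris.target_names),
    filled=True,
    rounded=True,
)

# In a notebook, if the optional `graphviz` package is installed,
# the tree could then be rendered with:
#   import graphviz
#   graphviz.Source(dot_source)
```

Graphviz computes its own node layout, so the result does not depend on the matplotlib figure size or inline backend settings.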
【Answer 1】:
%config InlineBackend.figure_format = 'retina'
is the culprit here. Commenting it out produces a well-formatted tree.
from matplotlib import pyplot as plt
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder
import collections
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris
%matplotlib inline
#%config InlineBackend.figure_format = 'retina'
iris = load_iris()
plt.rcParams['figure.figsize'] = (10, 8)
#some more code
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.2)
# Some feature values are present in train and absent in test and vice-versa.
#y = df_train['Should_Vacation_there']
#df_train, df_test = intersect_features(train=df_train, test=df_test)
#df_train
#training a decision tree
dt = DecisionTreeClassifier(criterion='entropy', random_state=17)
dt.fit(X_train, y_train)
#displaying the tree
plot_tree(dt, feature_names=iris.feature_names, filled=True,
          class_names=iris.target_names);
【Comments】:
【Answer 2】:Remove %config InlineBackend.figure_format = 'retina' and use 'svg' instead; you will get excellent resolution.
from matplotlib import pyplot as plt
from sklearn import datasets
from sklearn.tree import DecisionTreeClassifier
from sklearn import tree
# Prepare the data
iris = datasets.load_iris()
X = iris.data
y = iris.target
# Fit the classifier with default hyper-parameters
clf = DecisionTreeClassifier(random_state=1234)
model = clf.fit(X, y)
#1
fig = plt.figure(figsize=(25,20))
_ = tree.plot_tree(clf,
                   feature_names=iris.feature_names,
                   class_names=iris.target_names,
                   filled=True)
#2
fig = plt.figure(figsize=(25,20))
_ = tree.plot_tree(clf,
                   feature_names=iris.feature_names,
                   class_names=iris.target_names,
                   filled=True)
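The answer relies on the notebook setting %config InlineBackend.figure_format = 'svg', which is an IPython magic and only works inside a notebook. As a related but different technique, the sketch below writes the same tree to SVG explicitly with savefig, so the vector output can be checked or saved outside the inline backend too. The variable names svg_buffer and svg_text are mine, not from the answer.

```python
import io

import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, plot_tree

iris = load_iris()
clf = DecisionTreeClassifier(random_state=1234).fit(iris.data, iris.target)

fig, ax = plt.subplots(figsize=(25, 20))
plot_tree(
    clf,
    feature_names=iris.feature_names,
    class_names=list(iris.target_names),
    filled=True,
    ax=ax,
)

# Render the figure as SVG into an in-memory buffer instead of a file.
svg_buffer = io.BytesIO()
fig.savefig(svg_buffer, format="svg")
svg_text = svg_buffer.getvalue().decode("utf-8")
plt.close(fig)  # release the figure once it has been exported

# In a notebook, the SVG could then be shown with:
#   from IPython.display import SVG
#   SVG(svg_text)
```

Because SVG is a vector format, the tree stays sharp at any zoom level, which is the same benefit the answer gets from the 'svg' inline format.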
【Comments】: