ValueError with Training Data during DTreeViz Command
【Title】: ValueError with Training Data during DTreeViz Command 【Posted】: 2021-08-27 01:20:41 【Question】: I created a DecisionTreeClassifier, clf, to model my data, and I am trying to visualize the tree with the dtreeviz package.
from sklearn.tree import DecisionTreeClassifier
clf = DecisionTreeClassifier(max_depth=3)
clf.fit(X_train, y_train)
To get the data into a form the dtreeviz function can digest, I transformed X_train and y_train.
import numpy as np
from dtreeviz.trees import dtreeviz
from sklearn import preprocessing
import graphviz
# Create integer representation of target column
label_encoder = preprocessing.LabelEncoder()
label_encoder.fit(y_train)
print("Categorical classes:", label_encoder.classes_)
y_train_encoded = label_encoder.transform(y_train)
# Convert the feature DataFrame to a plain NumPy array
X_train_mod = X_train.to_numpy()
print("X_train type: ", type(X_train_mod))
print("Dimensions: ", np.ndim(X_train_mod))
print(X_train_mod.shape)
Categorical classes: ['Bad' 'Good']
X_train type: <class 'numpy.ndarray'>
Dimensions: 2
(700, 61)
However, the dtreeviz command fails with the following error:
dtreeviz(clf, x_data=X_train_mod, y_data=y_train_encoded, target_name='Good/Bad',
feature_names=X_train.columns.to_list(), class_names=list(label_encoder.classes_))
ValueError Traceback (most recent call last)
<ipython-input-130-eec9abdfba5d> in <module>
14 print(X_train_mod.shape)
15
---> 16 dtreeviz(clf, x_data=X_train_mod, y_data=y_train_encoded, target_name='Good/Bad',
17 feature_names=X_train.columns.to_list(), class_names=list(label_encoder.classes_))
~/opt/anaconda3/lib/python3.8/site-packages/dtreeviz/trees.py in dtreeviz(tree_model, x_data, y_data, feature_names, target_name, class_names, tree_index, precision, orientation, instance_orientation, show_root_edge_labels, show_node_labels, show_just_path, fancy, histtype, highlight_path, X, max_X_features_LR, max_X_features_TD, label_fontsize, ticks_fontsize, fontname, title, title_fontsize, colors, scale)
795 if fancy:
796 if shadow_tree.is_classifier():
--> 797 class_split_viz(node, X_data, y_data,
798 filename=f"{tmp}/node{node.id}_{os.getpid()}.svg",
799 precision=precision,
~/opt/anaconda3/lib/python3.8/site-packages/dtreeviz/trees.py in class_split_viz(node, X_train, y_train, colors, node_heights, filename, ticks_fontsize, label_fontsize, fontname, precision, histtype, X, highlight_node)
1002
1003 bins = _get_bins(overall_feature_range, nbins)
-> 1004 hist, bins, barcontainers = ax.hist(X_hist,
1005 color=X_colors,
1006 align='mid',
~/opt/anaconda3/lib/python3.8/site-packages/matplotlib/__init__.py in inner(ax, data, *args, **kwargs)
1599 def inner(ax, *args, data=None, **kwargs):
1600 if data is None:
-> 1601 return func(ax, *map(sanitize_sequence, args), **kwargs)
1602
1603 bound = new_sig.bind(ax, *args, **kwargs)
~/opt/anaconda3/lib/python3.8/site-packages/matplotlib/axes/_axes.py in hist(self, x, bins, range, density, weights, cumulative, bottom, histtype, align, orientation, rwidth, log, color, label, stacked, normed, **kwargs)
6686 input_empty = np.size(x) == 0
6687 # Massage 'x' for processing.
-> 6688 x = cbook._reshape_2D(x, 'x')
6689 nx = len(x) # number of datasets
6690
~/opt/anaconda3/lib/python3.8/site-packages/matplotlib/cbook/__init__.py in _reshape_2D(X, name)
1428 return [np.reshape(x, -1) for x in X]
1429 else:
-> 1430 raise ValueError("{} must have 2 or fewer dimensions".format(name))
1431
1432
ValueError: x must have 2 or fewer dimensions
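However, judging by the output of np.ndim(), X_train has the correct dimensions. I also compared my call against the iris example here to verify that the types of all the arguments match. I don't know what to try next.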
【Answer 1】: I ran into the same problem. Make sure your classifier is also trained on the encoded labels, i.e. use
clf.fit(X_train, y_train_encoded)
instead of
clf.fit(X_train, y_train)
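For reference, the fix above can be sketched end to end as follows. This is a minimal sketch and not the asker's actual code: the synthetic arrays stand in for the real X_train and y_train (assumptions made for illustration), and the dtreeviz call itself is left out so the snippet needs only NumPy and scikit-learn. One plausible reading of the error, not confirmed in the thread, is that dtreeviz matches the values in y_data against the classes stored in the fitted tree, so a tree trained on 'Bad'/'Good' strings does not line up with integer-encoded y_data. The final check confirms that both sides use the same integer labels.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier

# Hypothetical stand-in data; the question's real X_train has shape (700, 61).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 4))
y_train = np.where(X_train[:, 0] + X_train[:, 1] > 0, "Good", "Bad")

# Encode the string labels first ...
label_encoder = LabelEncoder()
y_train_encoded = label_encoder.fit_transform(y_train)

# ... then train the classifier on the encoded labels, as the answer suggests.
clf = DecisionTreeClassifier(max_depth=3)
clf.fit(X_train, y_train_encoded)

# The classes stored in the fitted tree should now be the same integers
# that appear in the array later passed to dtreeviz as y_data.
classes_match = set(np.unique(y_train_encoded)) <= set(clf.classes_)
print("Encoder classes:", label_encoder.classes_)
print("Tree classes:", clf.classes_)
print("Labels match tree classes:", classes_match)
```

If the check prints True, the dtreeviz call from the question can be retried unchanged, with class_names=list(label_encoder.classes_) still supplying the readable 'Bad'/'Good' names for the plot.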