Mapping - Feature Importance vs Label classification
【Posted】: 2021-07-18 20:34:29
【Question】: I have a dataset (200 rows) on baking vanilla pound cake, with 27 features as listed below. The label caketaste measures how good the baked cake is and takes the values bad (0), neutral (1), and good (2).
Features = cake_id, flour_g, butter_g, sugar_g, salt_g, eggs_count, bakingpowder_g, milk_ml, water_ml, vanillaextract_ml, lemonzest_g, mixingtime_min, bakingtime_min, preheattime_min, coolingtime_min, bakingtemp_c, preheattemp_c, color_red, color_green, color_blue, traysize_small, traysize_medium, traysize_large, milktype_lowfat, milktype_skim, milktype_whole, trayshape.
Label = caketaste ["bad", "neutral", "good"]
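My task is to find: a) the 5 most important features that affect the label outcome; b) the values of those 5 most important features that contribute to the "good" classification of the label.
I can solve part a) with sklearn (Python): I fit the data with RandomForestClassifier() and then use its feature_importances_ attribute to identify the 5 most important features, namely mixingtime_min, bakingtime_min, sugar_g, flour_g, and preheattemp_c.
Minimal, complete, and verifiable example: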
#################################################################
# a) Libraries
#################################################################
import pandas as pd
pd.plotting.register_matplotlib_converters()
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.inspection import permutation_importance
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.preprocessing import MaxAbsScaler
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import accuracy_score
import time
#################################################################
# b) Data Loading Symlinks
#################################################################
df = pd.read_excel("poundcake.xlsx", sheet_name="Sheet0", engine='openpyxl')
#################################################################
# c) Analyzing Dataframe
#################################################################
#Getting dataframe details e.g columns, total entries, data types etc
print("\n<syntax>: df.info()")
df.info()
#Getting the 1st 5 lines in the dataframe
print("\n<syntax>: df.head()")
df.head()
#################################################################
# d) Data Visualization
#################################################################
#Scatter plot: cake_id vs caketaste
fig=plt.figure()
ax=fig.add_axes([0,0,1,1])
ax.scatter(df["cake_id"], df["caketaste"], color='r')
ax.set_xlabel('cake_id')
ax.set_ylabel('caketaste')
ax.set_title('scatter plot')
plt.show()
#################################################################
# e) Feature selection
#################################################################
#Note:
#Machine learning models cannot work well with categorical (string) data, specifically scikit-learn.
#Need to convert the categorical variables into numeric types before building a machine learning model.
categorical_columns = ["trayshape"]
numerical_columns = ["flour_g","butter_g","sugar_g","salt_g","eggs_count","bakingpowder_g","milk_ml","water_ml","vanillaextract_ml","lemonzest_g","mixingtime_min","bakingtime_min","preheattime_min","coolingtime_min","bakingtemp_c","preheattemp_c","color_red","color_green","color_blue","traysize_small","traysize_medium","traysize_large","milktype_lowfat","milktype_skim","milktype_whole"]
#################################################################
# f) Dataset (Train Test Split)
#
# (Dataset)
# ┌──────────────────────────────────────────┐
# ┌──────────────────────────┬────────────┐
# | Training │ Test │
# └──────────────────────────┴────────────┘
#################################################################
# Features
X = df[categorical_columns + numerical_columns]
# Prediction target
y = df["caketaste"]
# Break off a test set from the data. scikit-learn's default is train_size=0.75, test_size=0.25; an 80/20 split is used here
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.8, test_size=0.2, random_state=42)
#################################################################
# Pipeline
#################################################################
#######################
# g) Column Transformer
#######################
categorical_encoder = OneHotEncoder(handle_unknown='ignore')
#Mean might not be suitable, Remove rows?
numerical_pipe = Pipeline([
    ('imp', SimpleImputer(strategy='mean'))
])
preprocessing = ColumnTransformer(
    [('cat', categorical_encoder, categorical_columns),
     ('num', numerical_pipe, numerical_columns)])
#####################
# b) Pipeline Printer
#####################
#RF: builds multiple decision trees and merges (bagging) them together
#to get a more accurate and stable prediction (averaging).
pipe_xxx_xxx_rfo = Pipeline([
    ('pre', preprocessing),
    ('scl', None),
    ('pca', None),
    ('clf', RandomForestClassifier(random_state=42))
])
pipe_abs_xxx_rfo = Pipeline([
    ('pre', preprocessing),
    ('scl', MaxAbsScaler()),
    ('pca', None),
    ('clf', RandomForestClassifier(random_state=42))
])
#################################################################
# h) Hyper-Parameter Tuning
#################################################################
parameters_rfo = {
    'clf__n_estimators': [100],
    'clf__criterion': ['gini'],
    'clf__min_samples_split': [2, 5],
    'clf__min_samples_leaf': [1, 2]
}
# Larger backup grid (not used below)
parameters_rfo_bk = {
    'clf__n_estimators': [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 1000],
    'clf__criterion': ['gini', 'entropy'],
    'clf__min_samples_split': [5, 10, 15, 20, 25, 30],
    'clf__min_samples_leaf': [1, 2, 3, 4, 5]
}
#########################
# i) GridSearch Printer
#########################
# scoring can be used as 'accuracy' or for MAE use 'neg_mean_absolute_error'
scr='accuracy'
grid_xxx_xxx_rfo = GridSearchCV(pipe_xxx_xxx_rfo,
                                param_grid=parameters_rfo,
                                scoring=scr,
                                cv=5,
                                refit=True)
grid_abs_xxx_rfo = GridSearchCV(pipe_abs_xxx_rfo,
                                param_grid=parameters_rfo,
                                scoring=scr,
                                cv=5,
                                refit=True)
print("Pipeline setup.... Complete")
###################################################
# Machine Learning Models Evaluation Algorithm
###################################################
grids = [grid_xxx_xxx_rfo, grid_abs_xxx_rfo]
grid_dict = {0: 'RandomForestClassifier',
             1: 'RandomForestClassifier with MaxAbsScaler'}
# Fit the grid search objects
print('Performing model optimizations...\n')
best_test_scr = float('-inf')  # start below any achievable score
best_clf = 0
best_gs = ''
for idx, grid in enumerate(grids):
    start_time = time.time()
    print('*' * 100)
    print('\nEstimator: %s' % grid_dict[idx])
    # Fit grid search
    grid.fit(X_train, y_train)
    # Calculate the scores once and reuse them when needed
    test_scr = grid.score(X_test, y_test)
    train_scr = grid.score(X_train, y_train)
    # Track the best (highest test accuracy) model
    if test_scr > best_test_scr:
        best_test_scr = test_scr
        best_train_scr = train_scr
        best_rf = grid
        best_clf = idx
        print("..........................this model is better. SELECTED")
    # Report the scores of the estimator fitted in this iteration
    print("Best params : %s" % grid.best_params_)
    print("Training accuracy : %s" % train_scr)
    print("Test accuracy : %s" % test_scr)
    print("Modeling time : %s" % time.strftime("%H:%M:%S", time.gmtime(time.time() - start_time)))
print('\nClassifier with best test set score: %s' % grid_dict[best_clf])
#########################################################################################
# j) Feature Importance using Gini Importance or Mean Decrease in Impurity (MDI)
# Note:
# 1. Calculates each feature's importance as the sum over the splits (across
# all trees) that use the feature, proportionally to the number of samples each split handles.
# 2. Biased towards high-cardinality features, e.g. numerical variables
########################################################################################
ohe = best_rf.best_estimator_.named_steps['pre'].named_transformers_['cat']
# get_feature_names() was removed in scikit-learn 1.2; get_feature_names_out() replaces it
feature_names = ohe.get_feature_names_out(categorical_columns)
feature_names = np.r_[feature_names, numerical_columns]
tree_feature_importances = best_rf.best_estimator_.named_steps['clf'].feature_importances_
sorted_idx = tree_feature_importances.argsort()
# Figure: Top Features
count = -len(feature_names)  # show every feature; use e.g. -5 to plot only the top 5
y_ticks = np.arange(0, abs(count))
fig, ax = plt.subplots()
ax.barh(y_ticks[count:], tree_feature_importances[sorted_idx][count:])
# Set the tick positions before the tick labels so that the labels line up with the bars
ax.set_yticks(y_ticks[count:])
ax.set_yticklabels(feature_names[sorted_idx][count:], fontsize=7)
ax.set_title("Random Forest Tree's Feature Importance from Mean Decrease in Impurity (MDI)")
fig.tight_layout()
plt.show()
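As the note in section j) says, MDI importances are computed on the training data and tend to favor high-cardinality features. One possible cross-check, not part of the original post, is permutation importance on held-out data, followed by picking the top 5 names from each ranking. The sketch below is self-contained and runs on a small synthetic stand-in for poundcake.xlsx; the `demo` frame and the rule that generates its label are assumptions made purely for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic stand-in for poundcake.xlsx (assumption: NOT the real data).
rng = np.random.default_rng(42)
n = 200
demo = pd.DataFrame({
    "mixingtime_min": rng.integers(5, 13, n),
    "bakingtime_min": rng.integers(15, 71, n),
    "flour_g": rng.choice([160, 165, 170, 175, 180, 195, 200], n),
    "sugar_g": rng.choice([200, 250, 300, 350, 400, 500], n),
    "preheattemp_c": rng.choice([150, 155, 160, 165, 170, 175], n),
    "salt_g": rng.choice([2, 4, 6, 8], n),
})
# Hypothetical labelling rule: only two features drive the label.
demo["caketaste"] = ((demo["bakingtime_min"] >= 45).astype(int)
                     + (demo["mixingtime_min"] <= 8).astype(int))  # 0=bad, 1=neutral, 2=good

X_demo = demo.drop(columns="caketaste")
y_demo = demo["caketaste"]
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=42, stratify=y_demo)

rf = RandomForestClassifier(n_estimators=200, random_state=42).fit(X_tr, y_tr)

# MDI (Gini) importances, computed from the training data.
mdi = pd.Series(rf.feature_importances_, index=X_demo.columns)
# Permutation importances: mean accuracy drop on the held-out set when a column is shuffled.
perm = permutation_importance(rf, X_te, y_te, n_repeats=20, random_state=42)
perm_mean = pd.Series(perm.importances_mean, index=X_demo.columns)

top5_mdi = mdi.sort_values(ascending=False).head(5)
top5_perm = perm_mean.sort_values(ascending=False).head(5)
print("Top 5 by MDI:\n", top5_mdi)
print("Top 5 by permutation importance:\n", top5_perm)
```

If the two rankings disagree strongly on the real data, the permutation ranking is usually the more trustworthy one for features with very different cardinalities.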
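What method can be used to solve task b)? I am trying to answer the following research question: how do mixingtime_min, bakingtime_min, flour_g, sugar_g, and preheattemp_c statistically contribute to a good caketaste (good: 2)?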
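One way to look at how a single feature relates to the "good" class is a partial dependence curve: the model's average predicted probability of class 2 as that feature is varied while the others keep their observed values. The sketch below is only an illustration under stated assumptions. It uses a synthetic `demo` frame with a hypothetical labelling rule rather than the real data, casts the features to float, and handles the result key that was renamed between scikit-learn versions (`values` in older releases, `grid_values` in newer ones).

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import partial_dependence

# Synthetic stand-in for the cake data (assumption: NOT the real data).
rng = np.random.default_rng(1)
n = 200
demo = pd.DataFrame({
    "mixingtime_min": rng.integers(5, 13, n).astype(float),
    "bakingtime_min": rng.integers(15, 71, n).astype(float),
    "sugar_g": rng.choice([200.0, 250.0, 300.0, 350.0, 400.0], n),
})
# Hypothetical labelling rule: 0=bad, 1=neutral, 2=good.
taste = ((demo["bakingtime_min"] >= 45).astype(int)
         + (demo["mixingtime_min"] <= 8).astype(int))

rf = RandomForestClassifier(n_estimators=200, random_state=42).fit(demo, taste)
good_idx = list(rf.classes_).index(2)  # row of the "good" class in the output

res = partial_dependence(rf, demo, features=["bakingtime_min"], kind="average")
# The grid key is "grid_values" in newer scikit-learn releases and "values" in older ones.
grid = res["grid_values"][0] if "grid_values" in res else res["values"][0]
# For a multiclass classifier, "average" has one row per class.
p_good = res["average"][good_idx]

for g, p in zip(grid[::10], p_good[::10]):
    print(f"bakingtime_min={g:5.1f}  mean P(good)={p:.2f}")
```

Ranges of the feature where the curve is clearly elevated are candidate "good" ranges, although the curve averages over the other features and therefore hides interactions between them.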
Possible expected result:
mixingtime_min = [5,10,15] AND
bakingtime_min = [50,51,52,53,54,55] AND
flour_g = [150,160,170,180] AND
sugar_g = [200, 250] AND
preheattemp_c = [150,160,170]
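The result above would essentially lead to the conclusion that, if someone wants a good-tasting cake, they need to bake it with 150-180 g of flour and 200-250 g of sugar, mix the batter for 5-15 minutes, and then bake it for 50-55 minutes in an oven preheated to 150-170 ºC.
Any pointers would be appreciated.
Question
Could you guide me on how to approach this research question? Is there any library in sklearn, or any other library, that can obtain this information? Any extra information such as confidence intervals, outliers, etc. would be a bonus.
Data (poundcake.xlsx):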
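A simple, model-free starting point for the value ranges and confidence intervals asked about here is to describe the rows labelled good (2) directly: percentiles give a typical range per feature, and a percentile bootstrap gives a rough confidence interval for the mean. The sketch below assumes a synthetic `demo` frame with random labels in place of the real data, and `bootstrap_ci` is a hypothetical helper written for this example, not a library function.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the cake data (assumption: NOT the real data).
rng = np.random.default_rng(0)
n = 200
demo = pd.DataFrame({
    "mixingtime_min": rng.integers(5, 13, n),
    "bakingtime_min": rng.integers(15, 71, n),
    "caketaste": rng.integers(0, 3, n),  # 0=bad, 1=neutral, 2=good
})
top_features = ["mixingtime_min", "bakingtime_min"]

# Keep only the cakes labelled "good".
good = demo.loc[demo["caketaste"] == 2, top_features]

# Central 80% range of each feature among good cakes.
ranges = good.quantile([0.10, 0.50, 0.90]).T
ranges.columns = ["p10", "median", "p90"]
print(ranges)


def bootstrap_ci(values, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean (hypothetical helper)."""
    boot_rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    # Resample with replacement: one row of indices per bootstrap replicate.
    idx = boot_rng.integers(0, len(values), size=(n_boot, len(values)))
    means = values[idx].mean(axis=1)
    return np.quantile(means, [alpha / 2, 1 - alpha / 2])


ci = {col: bootstrap_ci(good[col]) for col in top_features}
for col, (lo, hi) in ci.items():
    print(f"{col}: 95% bootstrap CI for the mean among good cakes = [{lo:.1f}, {hi:.1f}]")
```

These are marginal summaries, one feature at a time. They do not show which combinations of values go together, which is what an AND-style rule like the expected result above would require.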
cake_id flour_g butter_g sugar_g salt_g eggs_count bakingpowder_g milk_ml water_ml vanillaextract_ml lemonzest_g mixingtime_min bakingtime_min preheattime_min coolingtime_min bakingtemp_c preheattemp_c color_red color_green color_blue traysize_small traysize_medium traysize_large milktype_lowfat milktype_skim milktype_whole trayshape caketaste
0 180 50 250 2 3 3 15 80 1 2 10 30 25 15 170 175 1 0 0 1 0 0 1 0 0 square 1
1 195 50 500 6 6 1 30 60 1 2 10 40 30 10 170 170 0 1 0 1 0 0 0 1 0 rectangle 1
2 160 40 600 6 5 1 15 90 3 3 5 30 30 10 155 160 1 0 0 1 0 0 0 0 1 square 2
3 200 80 350 8 4 2 15 50 1 1 7 40 20 10 175 165 0 1 0 1 0 0 0 0 1 rectangle 0
4 175 90 400 6 6 4 25 90 1 1 9 60 25 15 160 155 1 0 0 0 0 1 0 1 0 rectangle 0
5 180 60 650 6 3 4 20 80 2 3 7 15 20 20 155 160 0 0 1 0 0 1 0 1 0 rectangle 2
6 165 50 200 6 4 2 20 80 1 2 7 60 30 20 150 170 0 1 0 1 0 0 1 0 0 rectangle 0
7 170 70 200 6 2 3 25 50 2 3 8 70 20 10 170 150 0 1 0 1 0 0 0 1 0 rectangle 1
8 160 90 300 8 4 4 25 60 3 2 9 35 30 15 175 170 0 1 0 1 0 0 1 0 0 square 1
9 165 50 350 6 4 1 25 80 1 2 11 30 10 10 170 170 1 0 0 0 1 0 1 0 0 square 1
10 180 90 650 4 3 4 20 50 2 3 8 30 30 15 165 170 1 0 0 1 0 0 0 1 0 square 1
11 165 40 350 6 2 2 30 60 3 3 5 50 25 15 175 170 0 0 1 1 0 0 0 0 1 rectangle 1
12 175 70 500 6 2 1 25 80 1 1 7 60 20 15 170 170 0 1 0 1 0 0 1 0 0 square 2
13 175 70 350 6 2 1 15 60 2 3 9 45 30 15 175 170 0 0 0 1 0 0 0 1 0 rectangle 1
14 160 70 600 4 6 4 30 60 2 3 5 60 25 10 150 155 0 1 0 1 0 0 0 1 0 rectangle 0
15 165 50 500 2 3 4 20 60 1 3 10 30 15 20 175 175 0 1 0 1 0 0 1 0 0 rectangle 0
16 195 50 600 6 5 2 25 60 1 1 5 30 10 20 170 150 0 0 0 1 0 0 0 0 1 square 2
17 160 60 600 8 5 4 25 70 3 3 9 30 30 10 175 150 0 0 0 1 0 0 1 0 0 rectangle 0
18 160 80 550 6 3 3 23 80 1 1 9 25 30 15 155 170 0 0 1 1 0 0 0 0 1 rectangle 1
19 170 60 600 4 5 1 20 90 3 3 10 55 20 15 165 155 0 0 1 1 0 0 0 0 1 square 0
20 175 70 300 6 5 4 25 70 1 1 11 65 15 20 170 155 0 0 1 1 0 0 0 1 0 round 0
21 195 80 250 6 6 2 23 70 2 3 11 20 30 15 170 155 0 0 1 1 0 0 1 0 0 rectangle 0
22 170 90 650 6 3 4 20 70 1 2 10 60 25 15 170 155 0 0 1 0 0 1 0 1 0 rectangle 1
23 180 40 200 6 3 1 15 60 3 1 5 35 15 15 170 170 0 1 0 1 0 0 0 1 0 rectangle 2
24 165 50 550 8 4 2 23 80 1 2 5 65 30 15 155 175 0 0 0 1 0 0 1 0 0 rectangle 1
25 170 50 250 6 2 3 25 70 2 2 6 30 20 15 165 175 0 0 0 0 0 1 0 1 0 rectangle 2
26 180 50 200 6 4 2 30 80 1 3 10 30 20 15 165 165 0 0 0 1 0 0 0 1 0 rectangle 2
27 200 90 500 6 3 4 25 70 2 1 9 60 30 15 170 160 0 1 0 1 0 0 0 1 0 rectangle 2
28 170 60 300 6 2 3 25 80 1 1 9 15 15 15 160 150 1 0 0 0 0 1 0 0 1 round 1
29 170 60 400 2 3 2 25 60 1 3 9 25 15 15 160 175 0 0 0 1 0 0 1 0 0 square 0
30 195 50 650 4 5 2 25 60 1 3 7 40 15 15 165 170 0 1 0 1 0 0 1 0 0 rectangle 1
31 170 50 350 6 6 1 25 80 2 2 8 50 25 15 150 170 0 1 0 1 0 0 1 0 0 rectangle 2
32 160 80 550 4 4 4 20 70 1 3 7 25 25 15 170 165 1 0 0 0 0 1 0 0 1 rectangle 2
33 170 50 300 4 4 2 23 50 2 2 10 30 20 15 150 170 0 0 0 1 0 0 1 0 0 rectangle 0
34 175 70 650 4 4 1 23 70 3 3 10 55 10 15 150 170 0 0 1 1 0 0 0 0 1 rectangle 0
35 180 70 400 6 2 2 20 60 1 1 6 55 30 15 170 150 0 0 0 1 0 0 1 0 0 square 2
36 195 60 300 6 6 4 23 70 2 2 10 30 30 15 170 175 1 0 0 1 0 0 1 0 0 rectangle 0
37 180 70 400 6 4 1 20 70 3 2 9 30 30 20 160 170 1 0 0 1 0 0 0 1 0 rectangle 2
38 170 90 600 8 3 1 20 50 1 2 9 30 30 15 155 170 1 0 0 1 0 0 0 1 0 rectangle 2
39 180 60 200 2 3 2 20 70 1 2 10 55 30 20 165 155 0 1 0 1 0 0 0 1 0 round 2
40 180 70 400 6 4 2 15 60 1 3 7 45 30 10 170 175 0 0 0 1 0 0 0 1 0 rectangle 2
41 170 70 200 6 3 1 30 60 3 2 6 40 15 15 170 175 0 0 1 1 0 0 0 0 1 rectangle 2
42 170 60 550 6 3 4 20 80 1 2 9 60 20 15 150 165 1 0 0 1 0 0 1 0 0 round 2
43 170 50 600 6 4 3 30 60 1 2 11 15 30 15 155 150 1 0 0 0 1 0 1 0 0 rectangle 0
44 175 70 200 4 4 3 30 70 3 2 6 20 10 20 170 170 0 0 0 1 0 0 1 0 0 rectangle 1
45 195 70 500 8 4 2 25 60 2 3 6 15 30 15 165 170 1 0 0 0 0 1 0 1 0 rectangle 2
46 180 80 200 4 4 4 15 80 1 3 6 50 30 15 155 150 0 0 0 1 0 0 0 1 0 rectangle 2
47 165 50 350 6 4 2 20 60 1 1 9 40 20 15 150 155 0 0 0 1 0 0 1 0 0 rectangle 0
48 170 70 550 2 2 4 20 60 3 2 9 55 30 15 165 165 0 1 0 1 0 0 0 0 1 round 0
49 175 70 350 6 5 4 30 80 1 2 9 55 30 10 155 170 0 0 0 0 0 1 1 0 0 rectangle 1
50 180 50 400 6 4 3 25 50 2 2 9 20 20 20 160 160 0 0 0 1 0 0 0 1 0 rectangle 2
51 165 50 650 6 5 4 20 60 1 2 5 60 30 15 175 170 0 0 1 1 0 0 0 0 1 square 0
52 170 70 200 2 6 3 25 60 1 3 8 35 25 15 170 155 1 0 0 1 0 0 0 0 1 rectangle 1
53 180 40 350 4 4 3 30 60 3 2 12 45 30 15 150 175 0 0 0 1 0 0 0 1 0 rectangle 1
54 175 50 600 8 3 1 20 80 2 1 7 30 15 15 150 160 0 0 0 1 0 0 0 0 1 square 0
55 175 70 400 4 3 1 25 90 1 2 5 50 30 10 170 170 1 0 0 0 0 1 1 0 0 rectangle 1
56 170 50 650 6 6 3 20 70 1 1 6 25 30 15 170 160 1 0 0 1 0 0 0 1 0 rectangle 2
57 200 70 650 6 3 1 15 60 2 1 10 25 10 15 170 150 0 1 0 1 0 0 0 0 1 rectangle 2
58 175 80 650 6 5 2 23 70 1 1 5 45 15 15 160 170 0 1 0 1 0 0 0 0 1 rectangle 1
59 170 50 200 8 3 4 30 70 1 3 11 35 25 15 170 170 0 0 0 1 0 0 0 1 0 rectangle 1
60 170 60 300 6 3 1 20 60 3 3 11 20 30 15 170 170 1 0 0 1 0 0 0 0 1 rectangle 0
61 180 40 350 2 4 3 20 70 3 2 12 20 10 15 150 160 0 0 0 1 0 0 1 0 0 square 2
62 175 60 200 6 6 1 15 80 2 2 12 25 20 15 155 160 1 0 0 1 0 0 0 0 1 rectangle 2
63 170 70 650 6 2 3 23 90 3 3 10 25 30 20 170 155 1 0 0 1 0 0 0 1 0 rectangle 2
64 170 70 600 6 4 2 25 80 2 2 6 50 15 15 170 155 0 0 0 1 0 0 0 1 0 rectangle 0
65 170 60 250 6 2 2 30 60 1 2 9 20 15 10 165 165 0 0 0 1 0 0 0 1 0 rectangle 2
66 175 50 650 4 2 1 23 60 2 2 11 20 30 20 170 175 1 0 0 1 0 0 0 1 0 rectangle 1
67 175 70 350 4 3 3 30 50 1 2 10 35 25 15 175 170 0 0 0 1 0 0 1 0 0 rectangle 0
68 165 90 600 6 5 2 23 60 1 3 9 55 10 15 160 165 0 1 0 1 0 0 1 0 0 square 0
69 200 80 600 6 3 1 30 60 2 1 8 30 30 15 175 165 1 0 0 0 1 0 0 0 1 rectangle 1
70 165 50 200 6 5 2 23 60 2 1 12 55 30 15 170 170 0 0 0 0 0 1 0 0 1 round 0
71 175 60 300 4 6 1 15 60 3 2 12 55 20 15 175 165 0 0 0 1 0 0 0 0 1 square 0
72 175 70 200 8 5 4 20 60 1 3 12 60 25 15 175 170 0 1 0 1 0 0 0 1 0 rectangle 2
73 180 60 200 4 4 4 30 70 1 3 8 35 30 10 175 170 0 0 0 1 0 0 1 0 0 rectangle 2
74 170 80 650 6 3 1 30 60 1 2 5 55 30 20 155 175 1 0 0 1 0 0 0 0 1 rectangle 2
75 170 60 500 8 4 1 23 60 3 2 7 60 30 15 165 170 0 0 0 1 0 0 0 1 0 square 2
76 175 70 250 6 4 2 30 60 1 2 12 65 20 15 170 160 1 0 0 0 0 1 0 0 1 square 2
77 180 50 500 8 5 1 15 70 3 3 8 40 10 15 165 155 0 0 1 0 1 0 0 0 1 rectangle 1
78 175 60 550 6 4 2 20 90 1 2 7 25 30 15 175 165 0 1 0 1 0 0 0 0 1 rectangle 0
79 170 70 600 8 4 4 15 80 3 3 11 60 30 15 175 150 1 0 0 1 0 0 0 0 1 rectangle 1
80 195 60 200 4 5 3 30 60 1 2 8 30 20 15 170 170 0 1 0 1 0 0 0 1 0 square 0
81 180 70 300 6 3 3 20 90 1 3 11 25 20 10 170 150 0 0 0 1 0 0 0 1 0 rectangle 0
82 170 40 550 2 4 3 30 60 1 2 9 35 30 10 170 170 0 0 0 0 0 1 0 1 0 square 1
83 175 60 550 6 5 2 15 90 1 1 11 30 10 15 170 175 1 0 0 1 0 0 0 0 1 rectangle 0
84 180 50 350 4 4 3 23 50 2 2 7 20 30 10 170 175 0 0 0 1 0 0 0 0 1 rectangle 2
85 180 80 600 4 4 1 25 60 1 1 5 55 30 10 170 165 0 0 1 1 0 0 0 0 1 rectangle 1
86 175 50 650 8 2 3 15 50 1 2 10 50 25 15 160 160 0 0 0 1 0 0 0 0 1 square 0
87 175 50 350 2 6 3 23 80 2 2 10 20 25 15 170 155 1 0 0 1 0 0 0 0 1 rectangle 1
88 170 50 350 4 2 4 25 60 2 1 10 20 15 15 150 155 0 1 0 1 0 0 1 0 0 rectangle 0
89 180 50 550 6 5 4 30 90 2 3 7 60 30 15 155 175 0 0 0 1 0 0 0 1 0 rectangle 2
90 170 70 600 6 5 3 15 90 1 2 6 45 10 15 170 170 0 1 0 1 0 0 1 0 0 round 1
91 170 70 300 4 4 2 20 60 1 1 10 15 30 10 165 155 0 0 0 1 0 0 1 0 0 rectangle 1
92 180 50 650 4 2 4 20 80 1 2 8 65 30 15 150 160 0 1 0 1 0 0 0 0 1 rectangle 2
93 170 50 350 6 3 3 30 60 1 3 7 55 30 20 155 170 1 0 0 1 0 0 1 0 0 rectangle 0
94 170 90 400 6 4 1 30 60 3 2 12 70 30 15 170 160 0 0 1 1 0 0 0 1 0 rectangle 1
95 160 70 400 2 6 4 23 70 2 1 9 20 30 10 150 175 0 0 0 1 0 0 0 0 1 square 1
96 170 80 250 4 2 3 30 60 3 1 10 30 30 15 155 165 0 0 0 0 0 1 0 0 1 rectangle 1
97 195 70 250 6 6 4 30 80 3 1 11 20 15 15 170 170 1 0 0 1 0 0 0 0 1 rectangle 2
98 180 50 650 6 6 1 30 90 3 1 7 25 15 15 170 170 1 0 0 1 0 0 0 0 1 rectangle 2
99 195 50 200 6 3 1 23 90 1 1 9 55 25 15 160 170 0 0 0 1 0 0 0 0 1 rectangle 0
100 175 50 200 4 3 3 20 50 2 2 12 15 30 10 170 170 0 0 1 1 0 0 0 1 0 square 1
101 165 70 350 4 4 4 15 90 1 2 12 40 15 15 155 155 0 1 0 1 0 0 0 0 1 rectangle 1
102 180 80 600 4 4 3 25 50 1 2 11 30 10 15 155 170 0 0 1 1 0 0 0 0 1 rectangle 1
103 165 50 300 6 3 1 30 60 1 1 9 40 25 15 160 170 0 0 0 1 0 0 0 1 0 rectangle 1
104 160 50 600 8 2 4 20 60 1 2 12 60 30 15 170 170 0 0 0 1 0 0 1 0 0 square 2
105 170 90 200 2 2 2 15 60 3 2 5 40 20 15 170 160 0 0 0 1 0 0 0 1 0 rectangle 2
106 175 90 600 6 4 2 15 60 1 1 7 20 30 15 175 170 1 0 0 0 0 1 0 1 0 rectangle 2
107 180 70 550 6 3 1 15 90 1 1 9 25 30 15 150 160 1 0 0 1 0 0 0 1 0 rectangle 2
108 170 90 250 8 4 4 30 60 2 3 6 60 25 15 155 155 0 0 0 1 0 0 0 0 1 rectangle 0
109 200 40 500 6 6 2 20 60 3 2 10 50 30 15 170 155 0 0 0 1 0 0 1 0 0 rectangle 0
110 175 70 500 2 3 4 30 60 3 2 5 65 20 15 170 155 1 0 0 1 0 0 0 0 1 rectangle 2
111 165 60 550 6 3 2 30 80 2 1 9 20 25 20 170 175 0 0 0 1 0 0 0 0 1 rectangle 2
112 195 70 350 6 6 2 25 90 2 2 12 50 30 15 150 165 0 0 1 1 0 0 0 1 0 square 2
113 165 90 300 4 3 4 30 60 1 2 9 30 25 15 165 170 0 1 0 0 0 1 0 1 0 rectangle 0
114 195 40 650 6 2 1 23 80 1 2 5 25 25 15 170 165 0 1 0 1 0 0 0 1 0 rectangle 1
115 175 60 200 2 4 3 15 50 3 3 6 25 30 15 155 170 1 0 0 1 0 0 1 0 0 square 0
116 175 70 400 6 4 3 15 60 2 3 11 20 20 15 150 170 1 0 0 0 1 0 0 1 0 rectangle 2
117 195 70 350 6 3 2 30 60 3 2 12 25 25 20 175 175 0 0 0 1 0 0 0 0 1 rectangle 2
118 170 50 500 6 4 3 30 80 2 3 10 60 30 15 170 160 0 1 0 1 0 0 0 0 1 rectangle 0
119 195 60 650 6 4 1 20 70 3 2 5 65 20 20 170 150 0 0 1 0 0 1 0 0 1 rectangle 2
120 170 70 650 8 4 4 25 80 1 2 9 45 30 15 170 170 0 0 1 1 0 0 0 1 0 round 1
121 170 70 650 8 4 2 30 90 1 2 12 30 15 15 170 170 0 0 1 1 0 0 1 0 0 square 0
122 170 60 400 4 6 4 15 60 2 2 11 60 30 15 170 150 0 0 1 1 0 0 1 0 0 square 0
123 175 60 300 8 6 3 20 60 2 2 12 50 25 15 150 175 0 0 1 0 1 0 0 1 0 round 2
124 175 50 400 4 3 1 23 50 3 2 9 50 30 15 150 150 0 0 1 1 0 0 0 1 0 square 0
125 180 40 300 6 4 1 15 50 3 2 10 60 30 15 170 175 0 0 1 0 1 0 0 1 0 rectangle 2
126 195 60 250 6 4 3 25 90 2 2 6 60 30 10 170 175 1 0 0 0 0 1 0 0 1 rectangle 2
127 160 70 300 4 2 1 20 60 2 2 5 40 20 15 160 170 0 0 0 1 0 0 0 1 0 square 2
128 170 60 300 8 6 2 30 80 1 1 10 65 30 15 155 155 0 1 0 1 0 0 0 0 1 square 2
129 160 40 350 6 6 2 15 60 1 1 5 25 30 15 155 170 0 0 1 0 0 1 0 1 0 rectangle 2
130 170 60 500 2 5 3 30 50 3 2 10 60 10 15 165 160 0 0 0 1 0 0 1 0 0 rectangle 1
131 170 60 650 8 3 3 23 90 1 1 10 70 15 15 170 175 1 0 0 1 0 0 1 0 0 rectangle 2
132 170 50 600 4 4 1 20 50 2 2 5 60 25 15 170 160 1 0 0 1 0 0 0 0 1 square 2
133 180 50 350 6 5 2 25 90 3 2 5 20 30 15 175 160 0 0 0 1 0 0 1 0 0 rectangle 0
134 170 90 200 4 2 4 20 90 3 2 10 20 25 15 170 175 0 0 0 1 0 0 0 0 1 rectangle 1
135 200 40 350 6 6 1 30 80 1 1 5 60 25 20 170 175 0 0 1 1 0 0 0 1 0 rectangle 2
136 165 60 250 2 3 2 25 60 1 1 8 20 15 15 170 170 0 1 0 1 0 0 0 0 1 rectangle 0
137 175 70 250 6 6 4 15 60 2 2 11 50 30 15 175 175 0 1 0 0 0 1 0 1 0 rectangle 2
138 180 50 350 6 4 2 25 70 3 2 5 45 25 15 170 170 0 0 0 0 0 1 1 0 0 rectangle 0
139 195 60 600 6 4 2 20 50 1 1 10 35 15 15 165 175 1 0 0 1 0 0 0 1 0 round 2
140 180 60 300 8 4 4 25 80 1 1 5 60 30 15 165 170 0 0 0 1 0 0 0 0 1 rectangle 1
141 200 60 500 8 4 1 23 70 2 2 8 15 30 15 160 170 0 0 0 1 0 0 1 0 0 rectangle 0
142 170 60 550 6 4 4 30 60 2 2 6 65 20 15 175 165 0 1 0 1 0 0 0 0 1 rectangle 1
143 170 40 600 2 2 1 15 70 1 2 11 30 25 20 175 165 0 0 0 1 0 0 0 0 1 rectangle 0
144 175 70 250 6 4 3 30 60 1 2 10 60 30 20 155 175 0 1 0 1 0 0 1 0 0 rectangle 2
145 180 50 250 4 5 3 15 80 1 2 6 60 30 15 170 170 0 0 0 1 0 0 0 0 1 rectangle 2
146 165 50 350 6 4 4 25 80 1 2 12 25 15 15 155 165 1 0 0 1 0 0 0 0 1 rectangle 0
147 170 60 500 6 5 4 23 60 1 2 10 15 30 20 160 170 1 0 0 1 0 0 1 0 0 rectangle 1
148 170 50 400 6 4 3 20 60 2 3 6 35 10 15 170 175 0 0 1 1 0 0 0 0 1 rectangle 1
149 195 80 650 8 4 3 30 90 1 1 6 15 20 10 165 160 1 0 0 0 1 0 1 0 0 rectangle 2
150 165 90 500 8 3 4 20 60 2 2 5 25 30 15 165 170 0 1 0 0 0 1 0 0 1 rectangle 1
151 160 80 200 2 4 4 30 80 3 1 5 50 25 15 170 160 0 1 0 1 0 0 0 1 0 rectangle 0
152 180 50 500 2 6 1 15 60 1 1 8 65 20 15 170 170 1 0 0 0 0 1 1 0 0 rectangle 2
153 165 60 600 6 4 1 30 70 3 3 11 15 30 10 170 170 0 0 0 1 0 0 1 0 0 rectangle 0
154 180 60 600 2 3 2 30 70 1 2 6 55 15 15 150 165 1 0 0 1 0 0 0 0 1 rectangle 2
155 160 60 400 2 6 4 15 60 1 1 9 55 30 10 170 160 1 0 0 1 0 0 1 0 0 rectangle 0
156 180 60 250 4 3 2 25 80 3 1 6 25 25 20 170 160 0 1 0 0 1 0 0 1 0 square 2
157 195 50 200 6 4 3 30 70 3 2 6 35 30 15 165 170 1 0 0 0 0 1 1 0 0 rectangle 2
158 170 50 650 6 5 2 15 60 3 2 12 35 30 10 170 175 1 0 0 0 1 0 0 1 0 rectangle 0
159 160 70 400 6 3 2 20 50 1 2 9 20 30 15 155 155 0 0 1 0 0 1 1 0 0 rectangle 0
160 175 90 600 6 4 4 23 80 3 3 7 20 20 15 155 160 1 0 0 1 0 0 0 1 0 rectangle 0
161 180 50 400 4 4 1 23 70 1 2 12 20 30 20 165 170 0 1 0 1 0 0 0 0 1 rectangle 1
162 170 90 250 6 3 3 20 80 2 2 12 25 15 15 170 155 0 0 0 1 0 0 0 1 0 round 2
163 170 60 200 2 6 1 23 80 3 1 10 30 30 15 170 175 0 1 0 0 0 1 0 1 0 rectangle 2
164 175 50 650 2 5 3 25 70 3 2 11 60 25 15 175 160 0 1 0 1 0 0 0 0 1 rectangle 2
165 195 90 400 6 3 3 23 60 1 2 7 35 25 20 170 155 0 0 0 1 0 0 1 0 0 round 1
166 180 50 600 6 3 4 25 60 2 2 10 20 10 15 155 175 0 1 0 1 0 0 0 1 0 square 0
167 200 50 500 6 3 3 15 90 2 1 6 20 25 10 170 155 0 1 0 1 0 0 0 0 1 rectangle 1
168 200 60 200 6 2 3 20 60 3 3 5 20 10 15 170 170 1 0 0 1 0 0 0 1 0 rectangle 1
169 200 60 300 4 5 3 20 90 3 2 12 30 25 15 155 160 0 0 1 1 0 0 0 0 1 rectangle 0
170 180 70 250 6 4 3 30 50 1 2 12 35 25 10 155 150 0 0 0 1 0 0 1 0 0 rectangle 1
171 175 70 200 4 6 4 30 60 2 2 5 25 30 15 150 160 0 0 1 1 0 0 0 1 0 square 0
172 165 90 400 2 5 1 30 90 3 2 6 70 30 15 170 170 0 1 0 1 0 0 0 0 1 rectangle 2
173 165 70 200 6 6 4 20 70 1 1 5 65 20 20 175 155 0 0 0 1 0 0 0 1 0 round 0
174 180 50 650 2 3 3 20 70 3 2 12 40 30 15 155 170 0 0 0 1 0 0 0 0 1 rectangle 1
175 180 40 200 6 3 2 30 80 3 3 7 60 30 10 175 150 0 1 0 1 0 0 1 0 0 rectangle 2
176 180 60 400 2 5 3 20 50 1 3 5 20 30 15 175 150 0 1 0 1 0 0 0 1 0 rectangle 1
177 200 50 400 4 6 4 23 60 2 2 7 55 20 15 160 170 0 1 0 1 0 0 0 0 1 round 0
178 180 50 550 6 4 3 20 50 2 2 8 20 25 20 170 170 1 0 0 0 0 1 1 0 0 rectangle 0
179 175 70 250 8 4 1 20 50 2 3 6 60 30 15 170 170 0 0 0 0 1 0 0 0 1 square 0
180 195 70 400 6 4 4 23 60 3 1 7 65 25 15 170 150 1 0 0 1 0 0 0 1 0 rectangle 1
181 160 50 500 6 4 3 25 50 1 1 11 55 10 15 170 170 0 0 0 0 0 1 0 0 1 rectangle 1
182 180 90 500 6 3 3 23 60 2 1 8 20 30 15 170 170 0 0 0 0 0 1 0 1 0 rectangle 1
183 170 70 650 2 3 3 25 80 1 3 8 45 20 10 170 170 0 1 0 1 0 0 0 1 0 round 2
184 195 70 600 6 4 2 25 60 1 2 6 40 30 15 155 170 1 0 0 1 0 0 0 0 1 rectangle 1
185 165 70 200 6 4 1 20 60 1 2 8 45 15 15 170 150 0 1 0 1 0 0 0 0 1 round 1
186 165 80 200 4 4 3 30 60 1 1 8 25 30 10 160 170 0 1 0 1 0 0 1 0 0 round 0
187 175 60 600 4 2 3 20 60 1 2 6 25 20 15 170 155 0 0 0 1 0 0 1 0 0 rectangle 2
188 180 70 500 6 4 3 30 70 2 2 7 55 30 15 170 150 1 0 0 1 0 0 1 0 0 square 1
189 180 50 600 2 4 4 30 60 3 1 9 40 25 15 170 170 1 0 0 0 0 1 0 0 1 rectangle 0
190 160 50 600 8 3 2 20 60 3 2 12 30 30 15 165 150 0 0 0 0 1 0 1 0 0 rectangle 2
191 180 60 200 6 2 1 30 60 3 2 7 20 30 15 175 160 1 0 0 1 0 0 1 0 0 rectangle 2
192 195 70 600 6 4 3 23 80 2 2 12 50 25 10 170 170 0 0 0 0 0 1 1 0 0 rectangle 1
193 180 60 250 6 3 1 15 60 2 3 5 60 30 20 175 165 1 0 0 0 0 1 0 1 0 rectangle 1
194 170 70 250 6 4 1 20 90 2 2 10 25 20 20 175 170 0 0 0 1 0 0 0 1 0 round 1
195 180 90 250 6 3 1 25 50 1 2 9 55 30 15 170 175 1 0 0 0 0 1 1 0 0 rectangle 1
196 160 70 550 6 3 4 30 90 3 2 10 60 20 15 165 165 0 1 0 1 0 0 0 1 0 round 0
197 175 60 200 8 2 3 15 60 1 2 11 50 30 15 165 175 0 1 0 1 0 0 0 0 1 rectangle 1
198 170 80 500 6 3 2 25 50 1 1 5 60 20 15 175 150 1 0 0 1 0 0 0 1 0 square 2
199 180 50 600 4 4 4 15 80 1 1 5 50 20 15 170 170 1 0 0 1 0 0 0 1 0 rectangle 2
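【Comments】:
This is certainly interesting, because you need to obtain ranges of values rather than specific values. You could use lime for sensitivity analysis. But if you want to do sensitivity analysis, it would be better to choose LinearRegression as the classifier, especially since most of your features are numeric, except Meat_Freshness and Food_Taste, which appear to be categorical (so for LR you would need to one-hot encode them rather than treat them as numeric).
Related: Lime vs TreeInterpreter for interpreting decision tree, Random Forests interpretability
This question currently contains no code, which violates SO guidelines (and appears to be coursework), so to make it on-topic you need to try it with lime (or another package), then edit the question to show us your code and output and where you got stuck.
@DavidLee You are right. It has to be specific values. Edited above. I am new to ML. Do you have any suggestions on how to find, from the feature importances, the combination of values of the 3 most important features that contribute to the Good:2 classification of the label?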
【Answer 1】:
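A very simple solution is to run a decision tree classifier on your data and visualize the tree with the graphviz library; the documentation is at https://scikit-learn.org/stable/modules/generated/sklearn.tree.export_graphviz.html . Once the code has produced the dot file, you can also visualize it in WebGraphviz. The result of this exercise may be the value ranges you are expecting.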
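The suggestion above might look roughly like the sketch below. It fits a shallow DecisionTreeClassifier, prints the rules as text with export_text, produces the dot source with export_graphviz, and then walks the fitted tree to list only the paths that end in a leaf predicting good (2). The `demo` data, its labelling rule, and the `rules_for_class` helper are assumptions introduced for illustration; on the real data the tree would be fitted on the selected top features instead.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_graphviz, export_text

# Synthetic stand-in for the cake data (assumption: NOT the real data).
rng = np.random.default_rng(7)
n = 200
demo = pd.DataFrame({
    "mixingtime_min": rng.integers(5, 13, n),
    "bakingtime_min": rng.integers(15, 71, n),
    "sugar_g": rng.choice([200, 250, 300, 350, 400], n),
})
# Hypothetical labelling rule: 0=bad, 1=neutral, 2=good.
taste = ((demo["bakingtime_min"] >= 45).astype(int)
         + (demo["mixingtime_min"] <= 8).astype(int))
feature_names = list(demo.columns)

# A shallow tree keeps the rules short enough to read.
tree = DecisionTreeClassifier(max_depth=3, random_state=42).fit(demo, taste)
print(export_text(tree, feature_names=feature_names))

# Dot source for Graphviz; out_file=None returns it as a string.
dot = export_graphviz(tree, out_file=None, feature_names=feature_names,
                      class_names=["bad", "neutral", "good"], filled=True)


def rules_for_class(clf, names, target_class):
    """Collect root-to-leaf conditions for leaves predicting target_class (hypothetical helper)."""
    t = clf.tree_
    found = []

    def walk(node, conds):
        if t.children_left[node] == -1:  # -1 marks a leaf
            predicted = clf.classes_[np.argmax(t.value[node][0])]
            if predicted == target_class:
                found.append((" AND ".join(conds), int(t.n_node_samples[node])))
            return
        name, thr = names[t.feature[node]], t.threshold[node]
        walk(t.children_left[node], conds + [f"{name} <= {thr:.1f}"])
        walk(t.children_right[node], conds + [f"{name} > {thr:.1f}"])

    walk(0, [])
    return found


good_rules = rules_for_class(tree, feature_names, target_class=2)
for rule, n_samples in good_rules:
    print(f"good (2) if {rule}   [{n_samples} training rows]")
```

Each printed rule is a conjunction of thresholds, which is close in shape to the AND-style ranges given in the question. A single tree is unstable on 200 rows, so the thresholds are best treated as approximate.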