Mapping - Feature Importance vs Label Classification

Posted: 2021-07-18 20:34:29

Question:

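I have a dataset (200 rows) about baking vanilla pound cake, with 27 features as listed below. The label caketaste measures how good the baked cake is and takes the values bad (0), neutral (1), good (2).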

Features = cake_id, flour_g, butter_g, sugar_g, salt_g, eggs_count, bakingpowder_g, milk_ml, water_ml, vanillaextract_ml, lemonzest_g, mixingtime_min, bakingtime_min, preheattime_min, coolingtime_min, bakingtemp_c, preheattemp_c, color_red, color_green, color_blue, traysize_small, traysize_medium, traysize_large, milktype_lowfat, milktype_skim, milktype_whole, trayshape.

Label = caketaste ["bad", "neutral", "good"]

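My task is to find:
a) the 5 most important features that influence the label outcome;
b) the values of those 5 most important features that contribute to the "good" classification of the label.

I can handle a) with sklearn (Python): fit the data with RandomForestClassifier(), then use feature_importances_ to identify the 5 most important features, which are mixingtime_min, bakingtime_min, sugar_g, flour_g and preheattemp_c.

Minimal, complete, and verifiable example: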

#################################################################
# a) Libraries
#################################################################

import pandas as pd 
pd.plotting.register_matplotlib_converters()
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.inspection import permutation_importance
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.preprocessing import MaxAbsScaler
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import accuracy_score
import time

#################################################################
# b) Data Loading Symlinks
#################################################################

df = pd.read_excel("poundcake.xlsx", sheet_name="Sheet0", engine='openpyxl')

#################################################################
# c) Analyzing Dataframe
#################################################################

#Getting dataframe details e.g columns, total entries, data types etc
print("\n<syntax>: df.info()")
df.info()

#Getting the 1st 5 lines in the dataframe
print("\n<syntax>: df.head()")
df.head()

#################################################################
# d) Data Visualization
#################################################################

#Scatter plot of cake_id vs caketaste
fig=plt.figure()
ax=fig.add_axes([0,0,1,1])
ax.scatter(df["cake_id"], df["caketaste"], color='r')
ax.set_xlabel('cake_id')
ax.set_ylabel('caketaste')
ax.set_title('scatter plot')
plt.show()

#################################################################
# e) Feature selection
#################################################################

#Note: 
#Machine learning models cannot work well with categorical (string) data, specifically scikit-learn. 
#Need to convert the categorical variables into numeric types before building a machine learning model. 

categorical_columns = ["trayshape"]
numerical_columns = ["flour_g","butter_g","sugar_g","salt_g","eggs_count","bakingpowder_g","milk_ml","water_ml","vanillaextract_ml","lemonzest_g","mixingtime_min","bakingtime_min","preheattime_min","coolingtime_min","bakingtemp_c","preheattemp_c","color_red","color_green","color_blue","traysize_small","traysize_medium","traysize_large","milktype_lowfat","milktype_skim","milktype_whole"]

#################################################################
# f) Dataset (Train Test Split)
#
#                         (Dataset)
# ┌──────────────────────────────────────────┐  
#  ┌──────────────────────────┬────────────┐ 
#  |          Training        │ Test       │ 
#  └──────────────────────────┴────────────┘ 
#################################################################

# Features (model inputs)
X = df[categorical_columns + numerical_columns]

# Prediction target (label)
y = df["caketaste"] 

# Break off validation set from training data. Default: train_size=0.75, test_size=0.25
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.8, test_size=0.2, random_state=42)

#################################################################
# Pipeline
#################################################################

#######################
# g) Column Transformer
#######################
categorical_encoder = OneHotEncoder(handle_unknown='ignore')

#Mean might not be suitable, Remove rows?
numerical_pipe = Pipeline([
    ('imp', SimpleImputer(strategy='mean'))
])

preprocessing = ColumnTransformer(
    [('cat', categorical_encoder, categorical_columns),
     ('num', numerical_pipe, numerical_columns)])

#####################
# b) Pipeline Printer
#####################
#RF: builds multiple decision trees and merges (bagging) them together 
#to get a more accurate and stable prediction (averaging).

pipe_xxx_xxx_rfo = Pipeline([
    ('pre', preprocessing),
    ('scl', None),
    ('pca', None),
    ('clf', RandomForestClassifier(random_state=42))
    ])

pipe_abs_xxx_rfo = Pipeline([
    ('pre', preprocessing),    
    ('scl', MaxAbsScaler()),
    ('pca', None),
    ('clf', RandomForestClassifier(random_state=42))
    ])

#################################################################
# h) Hyper-Parameter Tuning
#################################################################
parameters_rfo = {
        'clf__n_estimators':[100], 
        'clf__criterion':['gini'], 
        'clf__min_samples_split':[2,5], 
        'clf__min_samples_leaf':[1,2]
    }

parameters_rfo_bk = {
        'clf__n_estimators':[10,20,30,40,50,60,70,80,90,100,1000], 
        'clf__criterion':['gini','entropy'], 
        'clf__min_samples_split':[5,10,15,20,25,30], 
        'clf__min_samples_leaf':[1,2,3,4,5]
    }

#########################
# i) GridSearch Printer
#########################    

# scoring can be used as 'accuracy' or for MAE use 'neg_mean_absolute_error'  
scr='accuracy'

grid_xxx_xxx_rfo = GridSearchCV(pipe_xxx_xxx_rfo,
    param_grid=parameters_rfo,
    scoring=scr,
    cv=5,
    refit=True) 

grid_abs_xxx_rfo = GridSearchCV(pipe_abs_xxx_rfo,
    param_grid=parameters_rfo,
    scoring=scr,
    cv=5,
    refit=True)

print("Pipeline setup.... Complete")

###################################################
# Machine Learning Models Evaluation Algorithm
################################################### 
grids = [grid_xxx_xxx_rfo, grid_abs_xxx_rfo]   

grid_dict = {0: 'RandomForestClassifier', 
             1: 'RandomForestClassifier with MaxAbsScaler'}

# Fit the grid search objects
print('Performing model optimizations...\n')
best_test_scr = -999999999999999 #Python3 does not allow to use None anymore
best_clf = 0
best_gs = ''

for idx, grid in enumerate(grids):
    start_time = time.time()

    print('*' * 100)
    print('\nEstimator: %s' % grid_dict[idx])   
    # Fit grid search   
    grid.fit(X_train, y_train)
    
    #Calculate the score once and use when needed
    test_scr = grid.score(X_test,y_test)
    train_scr = grid.score(X_train,y_train)
    
    # Track the best model (highest test score)
    if test_scr > best_test_scr:
        best_test_scr = test_scr
        best_train_scr = train_scr
        best_rf = grid
        best_clf = idx
        print("..........................this model is better. SELECTED")
    
    print("Best params                          : %s" % grid.best_params_)
    # Report the scores of the estimator fitted in this iteration
    print("Training accuracy                    : %s" % train_scr)
    print("Test accuracy                        : %s" % test_scr)
    print("Modeling time                        : %s" % time.strftime("%H:%M:%S", time.gmtime(time.time() - start_time)))

print('\nClassifier with best test set score: %s' % grid_dict[best_clf])  

#########################################################################################
# j) Feature Importance using Gini Importance or Mean Decrease in Impurity (MDI)
# Note:
# 1. Calculates each feature importance as the sum over the number of splits (across 
# all trees) that include the feature, proportionally to the number of samples it splits.
# 2. Biased towards high-cardinality features, i.e. numerical variables
########################################################################################

ohe = (best_rf.best_estimator_.named_steps['pre'].named_transformers_['cat'])
# scikit-learn >= 1.0: get_feature_names_out() replaces the removed get_feature_names()
feature_names = ohe.get_feature_names_out(categorical_columns)
feature_names = np.r_[feature_names, numerical_columns]

tree_feature_importances = (best_rf.best_estimator_.named_steps['clf'].feature_importances_)
sorted_idx = tree_feature_importances.argsort()

# Figure: Top Features
count=-28
y_ticks = np.arange(0, abs(count))         
fig, ax = plt.subplots()
ax.barh(y_ticks[count:], tree_feature_importances[sorted_idx][count:])
# Fix the tick positions first, then attach the labels to them
ax.set_yticks(y_ticks[count:])
ax.set_yticklabels(feature_names[sorted_idx][count:], fontsize=7)
ax.set_title("Random Forest Tree's Feature Importance from Mean Decrease in Impurity (MDI)")
fig.tight_layout()
plt.show()
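
The note in section j) says MDI is biased towards high-cardinality features, and permutation_importance is imported above but never called. As a hedged illustration of that alternative, here is a minimal sketch on synthetic data (the arrays and the feature names "signal", "noise_1", ... are made up for the example and are not the cake data). With the pipeline above, the same call should in principle accept best_rf.best_estimator_ together with X_test and y_test, but that is an assumption I have not run.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic data (assumption, not the cake dataset): only column 0 drives the label.
rng = np.random.default_rng(42)
X = rng.normal(size=(400, 4))
y = (X[:, 0] > 0).astype(int)
names = ["signal", "noise_1", "noise_2", "noise_3"]  # hypothetical feature names

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Shuffle one column at a time on held-out data and measure the drop in accuracy.
result = permutation_importance(
    model, X_test, y_test, n_repeats=10, random_state=42)

# Indices of the features, most important first.
ranking = result.importances_mean.argsort()[::-1]
for i in ranking:
    print(f"{names[i]:<10} {result.importances_mean[i]:.3f} "
          f"+/- {result.importances_std[i]:.3f}")
```

Because the score is computed on the test split, a feature only ranks high if the model actually relies on it for held-out predictions, which is a different question from how often the trees split on it.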

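What method can be used to solve task b)? I am trying to answer the following research question:

How do mixingtime_min, bakingtime_min, flour_g, sugar_g and preheattemp_c statistically contribute to a good caketaste (good: 2)?

Possible expected result: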

mixingtime_min = [5,10,15] AND
bakingtime_min = [50,51,52,53,54,55] AND
flour_g = [150,160,170,180] AND
sugar_g = [200, 250] AND
preheattemp_c = [150,160,170]

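The result above would basically lead to the conclusion that anyone who wants a good-tasting cake needs to use 150-180 g of flour and 200-250 g of sugar, mix the batter for 5-15 minutes, and then bake it for 50-55 minutes in an oven preheated to 150-170 °C.

I hope someone can give me some pointers.

Questions

Can you guide me on how to approach this research question? Is there any library in sklearn, or any other library, that can obtain this information? Any additional information such as confidence intervals, outliers, etc. would be a bonus.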
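
One simple starting point for task b), offered as a sketch and not as an established method: for a single feature, tabulate the share of rows labeled good (2) at each distinct value. The helper good_share is a hypothetical name introduced here. The ten rows are the mixingtime_min and caketaste columns of cake_id 0 to 9 from the posted data, so the shares are far too noisy to read anything into; the real analysis would use all 200 rows.

```python
import pandas as pd

# mixingtime_min and caketaste for cake_id 0..9, copied from the posted data.
df = pd.DataFrame({
    "mixingtime_min": [10, 10, 5, 7, 9, 7, 7, 8, 9, 11],
    "caketaste":      [1, 1, 2, 0, 0, 2, 0, 1, 1, 1],
})

def good_share(frame, feature, label="caketaste", good=2):
    """Per distinct feature value: row count and share of rows labeled good."""
    is_good = frame[label].eq(good)
    out = is_good.groupby(frame[feature]).agg(n="count", share_good="mean")
    return out.sort_values("share_good", ascending=False)

summary = good_share(df, "mixingtime_min")
print(summary)
```

Looking at one feature at a time ignores interactions between features (the expected result above is an AND over five features), so this is only a first look. Values backed by very few rows should be treated with caution.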

Data (poundcake.xlsx):

cake_id flour_g butter_g    sugar_g salt_g  eggs_count  bakingpowder_g  milk_ml water_ml    vanillaextract_ml   lemonzest_g mixingtime_min  bakingtime_min  preheattime_min coolingtime_min bakingtemp_c    preheattemp_c   color_red   color_green color_blue  traysize_small  traysize_medium traysize_large  milktype_lowfat milktype_skim   milktype_whole  trayshape   caketaste
0   180 50  250 2   3   3   15  80  1   2   10  30  25  15  170 175 1   0   0   1   0   0   1   0   0   square  1
1   195 50  500 6   6   1   30  60  1   2   10  40  30  10  170 170 0   1   0   1   0   0   0   1   0   rectangle   1
2   160 40  600 6   5   1   15  90  3   3   5   30  30  10  155 160 1   0   0   1   0   0   0   0   1   square  2
3   200 80  350 8   4   2   15  50  1   1   7   40  20  10  175 165 0   1   0   1   0   0   0   0   1   rectangle   0
4   175 90  400 6   6   4   25  90  1   1   9   60  25  15  160 155 1   0   0   0   0   1   0   1   0   rectangle   0
5   180 60  650 6   3   4   20  80  2   3   7   15  20  20  155 160 0   0   1   0   0   1   0   1   0   rectangle   2
6   165 50  200 6   4   2   20  80  1   2   7   60  30  20  150 170 0   1   0   1   0   0   1   0   0   rectangle   0
7   170 70  200 6   2   3   25  50  2   3   8   70  20  10  170 150 0   1   0   1   0   0   0   1   0   rectangle   1
8   160 90  300 8   4   4   25  60  3   2   9   35  30  15  175 170 0   1   0   1   0   0   1   0   0   square  1
9   165 50  350 6   4   1   25  80  1   2   11  30  10  10  170 170 1   0   0   0   1   0   1   0   0   square  1
10  180 90  650 4   3   4   20  50  2   3   8   30  30  15  165 170 1   0   0   1   0   0   0   1   0   square  1
11  165 40  350 6   2   2   30  60  3   3   5   50  25  15  175 170 0   0   1   1   0   0   0   0   1   rectangle   1
12  175 70  500 6   2   1   25  80  1   1   7   60  20  15  170 170 0   1   0   1   0   0   1   0   0   square  2
13  175 70  350 6   2   1   15  60  2   3   9   45  30  15  175 170 0   0   0   1   0   0   0   1   0   rectangle   1
14  160 70  600 4   6   4   30  60  2   3   5   60  25  10  150 155 0   1   0   1   0   0   0   1   0   rectangle   0
15  165 50  500 2   3   4   20  60  1   3   10  30  15  20  175 175 0   1   0   1   0   0   1   0   0   rectangle   0
16  195 50  600 6   5   2   25  60  1   1   5   30  10  20  170 150 0   0   0   1   0   0   0   0   1   square  2
17  160 60  600 8   5   4   25  70  3   3   9   30  30  10  175 150 0   0   0   1   0   0   1   0   0   rectangle   0
18  160 80  550 6   3   3   23  80  1   1   9   25  30  15  155 170 0   0   1   1   0   0   0   0   1   rectangle   1
19  170 60  600 4   5   1   20  90  3   3   10  55  20  15  165 155 0   0   1   1   0   0   0   0   1   square  0
20  175 70  300 6   5   4   25  70  1   1   11  65  15  20  170 155 0   0   1   1   0   0   0   1   0   round   0
21  195 80  250 6   6   2   23  70  2   3   11  20  30  15  170 155 0   0   1   1   0   0   1   0   0   rectangle   0
22  170 90  650 6   3   4   20  70  1   2   10  60  25  15  170 155 0   0   1   0   0   1   0   1   0   rectangle   1
23  180 40  200 6   3   1   15  60  3   1   5   35  15  15  170 170 0   1   0   1   0   0   0   1   0   rectangle   2
24  165 50  550 8   4   2   23  80  1   2   5   65  30  15  155 175 0   0   0   1   0   0   1   0   0   rectangle   1
25  170 50  250 6   2   3   25  70  2   2   6   30  20  15  165 175 0   0   0   0   0   1   0   1   0   rectangle   2
26  180 50  200 6   4   2   30  80  1   3   10  30  20  15  165 165 0   0   0   1   0   0   0   1   0   rectangle   2
27  200 90  500 6   3   4   25  70  2   1   9   60  30  15  170 160 0   1   0   1   0   0   0   1   0   rectangle   2
28  170 60  300 6   2   3   25  80  1   1   9   15  15  15  160 150 1   0   0   0   0   1   0   0   1   round   1
29  170 60  400 2   3   2   25  60  1   3   9   25  15  15  160 175 0   0   0   1   0   0   1   0   0   square  0
30  195 50  650 4   5   2   25  60  1   3   7   40  15  15  165 170 0   1   0   1   0   0   1   0   0   rectangle   1
31  170 50  350 6   6   1   25  80  2   2   8   50  25  15  150 170 0   1   0   1   0   0   1   0   0   rectangle   2
32  160 80  550 4   4   4   20  70  1   3   7   25  25  15  170 165 1   0   0   0   0   1   0   0   1   rectangle   2
33  170 50  300 4   4   2   23  50  2   2   10  30  20  15  150 170 0   0   0   1   0   0   1   0   0   rectangle   0
34  175 70  650 4   4   1   23  70  3   3   10  55  10  15  150 170 0   0   1   1   0   0   0   0   1   rectangle   0
35  180 70  400 6   2   2   20  60  1   1   6   55  30  15  170 150 0   0   0   1   0   0   1   0   0   square  2
36  195 60  300 6   6   4   23  70  2   2   10  30  30  15  170 175 1   0   0   1   0   0   1   0   0   rectangle   0
37  180 70  400 6   4   1   20  70  3   2   9   30  30  20  160 170 1   0   0   1   0   0   0   1   0   rectangle   2
38  170 90  600 8   3   1   20  50  1   2   9   30  30  15  155 170 1   0   0   1   0   0   0   1   0   rectangle   2
39  180 60  200 2   3   2   20  70  1   2   10  55  30  20  165 155 0   1   0   1   0   0   0   1   0   round   2
40  180 70  400 6   4   2   15  60  1   3   7   45  30  10  170 175 0   0   0   1   0   0   0   1   0   rectangle   2
41  170 70  200 6   3   1   30  60  3   2   6   40  15  15  170 175 0   0   1   1   0   0   0   0   1   rectangle   2
42  170 60  550 6   3   4   20  80  1   2   9   60  20  15  150 165 1   0   0   1   0   0   1   0   0   round   2
43  170 50  600 6   4   3   30  60  1   2   11  15  30  15  155 150 1   0   0   0   1   0   1   0   0   rectangle   0
44  175 70  200 4   4   3   30  70  3   2   6   20  10  20  170 170 0   0   0   1   0   0   1   0   0   rectangle   1
45  195 70  500 8   4   2   25  60  2   3   6   15  30  15  165 170 1   0   0   0   0   1   0   1   0   rectangle   2
46  180 80  200 4   4   4   15  80  1   3   6   50  30  15  155 150 0   0   0   1   0   0   0   1   0   rectangle   2
47  165 50  350 6   4   2   20  60  1   1   9   40  20  15  150 155 0   0   0   1   0   0   1   0   0   rectangle   0
48  170 70  550 2   2   4   20  60  3   2   9   55  30  15  165 165 0   1   0   1   0   0   0   0   1   round   0
49  175 70  350 6   5   4   30  80  1   2   9   55  30  10  155 170 0   0   0   0   0   1   1   0   0   rectangle   1
50  180 50  400 6   4   3   25  50  2   2   9   20  20  20  160 160 0   0   0   1   0   0   0   1   0   rectangle   2
51  165 50  650 6   5   4   20  60  1   2   5   60  30  15  175 170 0   0   1   1   0   0   0   0   1   square  0
52  170 70  200 2   6   3   25  60  1   3   8   35  25  15  170 155 1   0   0   1   0   0   0   0   1   rectangle   1
53  180 40  350 4   4   3   30  60  3   2   12  45  30  15  150 175 0   0   0   1   0   0   0   1   0   rectangle   1
54  175 50  600 8   3   1   20  80  2   1   7   30  15  15  150 160 0   0   0   1   0   0   0   0   1   square  0
55  175 70  400 4   3   1   25  90  1   2   5   50  30  10  170 170 1   0   0   0   0   1   1   0   0   rectangle   1
56  170 50  650 6   6   3   20  70  1   1   6   25  30  15  170 160 1   0   0   1   0   0   0   1   0   rectangle   2
57  200 70  650 6   3   1   15  60  2   1   10  25  10  15  170 150 0   1   0   1   0   0   0   0   1   rectangle   2
58  175 80  650 6   5   2   23  70  1   1   5   45  15  15  160 170 0   1   0   1   0   0   0   0   1   rectangle   1
59  170 50  200 8   3   4   30  70  1   3   11  35  25  15  170 170 0   0   0   1   0   0   0   1   0   rectangle   1
60  170 60  300 6   3   1   20  60  3   3   11  20  30  15  170 170 1   0   0   1   0   0   0   0   1   rectangle   0
61  180 40  350 2   4   3   20  70  3   2   12  20  10  15  150 160 0   0   0   1   0   0   1   0   0   square  2
62  175 60  200 6   6   1   15  80  2   2   12  25  20  15  155 160 1   0   0   1   0   0   0   0   1   rectangle   2
63  170 70  650 6   2   3   23  90  3   3   10  25  30  20  170 155 1   0   0   1   0   0   0   1   0   rectangle   2
64  170 70  600 6   4   2   25  80  2   2   6   50  15  15  170 155 0   0   0   1   0   0   0   1   0   rectangle   0
65  170 60  250 6   2   2   30  60  1   2   9   20  15  10  165 165 0   0   0   1   0   0   0   1   0   rectangle   2
66  175 50  650 4   2   1   23  60  2   2   11  20  30  20  170 175 1   0   0   1   0   0   0   1   0   rectangle   1
67  175 70  350 4   3   3   30  50  1   2   10  35  25  15  175 170 0   0   0   1   0   0   1   0   0   rectangle   0
68  165 90  600 6   5   2   23  60  1   3   9   55  10  15  160 165 0   1   0   1   0   0   1   0   0   square  0
69  200 80  600 6   3   1   30  60  2   1   8   30  30  15  175 165 1   0   0   0   1   0   0   0   1   rectangle   1
70  165 50  200 6   5   2   23  60  2   1   12  55  30  15  170 170 0   0   0   0   0   1   0   0   1   round   0
71  175 60  300 4   6   1   15  60  3   2   12  55  20  15  175 165 0   0   0   1   0   0   0   0   1   square  0
72  175 70  200 8   5   4   20  60  1   3   12  60  25  15  175 170 0   1   0   1   0   0   0   1   0   rectangle   2
73  180 60  200 4   4   4   30  70  1   3   8   35  30  10  175 170 0   0   0   1   0   0   1   0   0   rectangle   2
74  170 80  650 6   3   1   30  60  1   2   5   55  30  20  155 175 1   0   0   1   0   0   0   0   1   rectangle   2
75  170 60  500 8   4   1   23  60  3   2   7   60  30  15  165 170 0   0   0   1   0   0   0   1   0   square  2
76  175 70  250 6   4   2   30  60  1   2   12  65  20  15  170 160 1   0   0   0   0   1   0   0   1   square  2
77  180 50  500 8   5   1   15  70  3   3   8   40  10  15  165 155 0   0   1   0   1   0   0   0   1   rectangle   1
78  175 60  550 6   4   2   20  90  1   2   7   25  30  15  175 165 0   1   0   1   0   0   0   0   1   rectangle   0
79  170 70  600 8   4   4   15  80  3   3   11  60  30  15  175 150 1   0   0   1   0   0   0   0   1   rectangle   1
80  195 60  200 4   5   3   30  60  1   2   8   30  20  15  170 170 0   1   0   1   0   0   0   1   0   square  0
81  180 70  300 6   3   3   20  90  1   3   11  25  20  10  170 150 0   0   0   1   0   0   0   1   0   rectangle   0
82  170 40  550 2   4   3   30  60  1   2   9   35  30  10  170 170 0   0   0   0   0   1   0   1   0   square  1
83  175 60  550 6   5   2   15  90  1   1   11  30  10  15  170 175 1   0   0   1   0   0   0   0   1   rectangle   0
84  180 50  350 4   4   3   23  50  2   2   7   20  30  10  170 175 0   0   0   1   0   0   0   0   1   rectangle   2
85  180 80  600 4   4   1   25  60  1   1   5   55  30  10  170 165 0   0   1   1   0   0   0   0   1   rectangle   1
86  175 50  650 8   2   3   15  50  1   2   10  50  25  15  160 160 0   0   0   1   0   0   0   0   1   square  0
87  175 50  350 2   6   3   23  80  2   2   10  20  25  15  170 155 1   0   0   1   0   0   0   0   1   rectangle   1
88  170 50  350 4   2   4   25  60  2   1   10  20  15  15  150 155 0   1   0   1   0   0   1   0   0   rectangle   0
89  180 50  550 6   5   4   30  90  2   3   7   60  30  15  155 175 0   0   0   1   0   0   0   1   0   rectangle   2
90  170 70  600 6   5   3   15  90  1   2   6   45  10  15  170 170 0   1   0   1   0   0   1   0   0   round   1
91  170 70  300 4   4   2   20  60  1   1   10  15  30  10  165 155 0   0   0   1   0   0   1   0   0   rectangle   1
92  180 50  650 4   2   4   20  80  1   2   8   65  30  15  150 160 0   1   0   1   0   0   0   0   1   rectangle   2
93  170 50  350 6   3   3   30  60  1   3   7   55  30  20  155 170 1   0   0   1   0   0   1   0   0   rectangle   0
94  170 90  400 6   4   1   30  60  3   2   12  70  30  15  170 160 0   0   1   1   0   0   0   1   0   rectangle   1
95  160 70  400 2   6   4   23  70  2   1   9   20  30  10  150 175 0   0   0   1   0   0   0   0   1   square  1
96  170 80  250 4   2   3   30  60  3   1   10  30  30  15  155 165 0   0   0   0   0   1   0   0   1   rectangle   1
97  195 70  250 6   6   4   30  80  3   1   11  20  15  15  170 170 1   0   0   1   0   0   0   0   1   rectangle   2
98  180 50  650 6   6   1   30  90  3   1   7   25  15  15  170 170 1   0   0   1   0   0   0   0   1   rectangle   2
99  195 50  200 6   3   1   23  90  1   1   9   55  25  15  160 170 0   0   0   1   0   0   0   0   1   rectangle   0
100 175 50  200 4   3   3   20  50  2   2   12  15  30  10  170 170 0   0   1   1   0   0   0   1   0   square  1
101 165 70  350 4   4   4   15  90  1   2   12  40  15  15  155 155 0   1   0   1   0   0   0   0   1   rectangle   1
102 180 80  600 4   4   3   25  50  1   2   11  30  10  15  155 170 0   0   1   1   0   0   0   0   1   rectangle   1
103 165 50  300 6   3   1   30  60  1   1   9   40  25  15  160 170 0   0   0   1   0   0   0   1   0   rectangle   1
104 160 50  600 8   2   4   20  60  1   2   12  60  30  15  170 170 0   0   0   1   0   0   1   0   0   square  2
105 170 90  200 2   2   2   15  60  3   2   5   40  20  15  170 160 0   0   0   1   0   0   0   1   0   rectangle   2
106 175 90  600 6   4   2   15  60  1   1   7   20  30  15  175 170 1   0   0   0   0   1   0   1   0   rectangle   2
107 180 70  550 6   3   1   15  90  1   1   9   25  30  15  150 160 1   0   0   1   0   0   0   1   0   rectangle   2
108 170 90  250 8   4   4   30  60  2   3   6   60  25  15  155 155 0   0   0   1   0   0   0   0   1   rectangle   0
109 200 40  500 6   6   2   20  60  3   2   10  50  30  15  170 155 0   0   0   1   0   0   1   0   0   rectangle   0
110 175 70  500 2   3   4   30  60  3   2   5   65  20  15  170 155 1   0   0   1   0   0   0   0   1   rectangle   2
111 165 60  550 6   3   2   30  80  2   1   9   20  25  20  170 175 0   0   0   1   0   0   0   0   1   rectangle   2
112 195 70  350 6   6   2   25  90  2   2   12  50  30  15  150 165 0   0   1   1   0   0   0   1   0   square  2
113 165 90  300 4   3   4   30  60  1   2   9   30  25  15  165 170 0   1   0   0   0   1   0   1   0   rectangle   0
114 195 40  650 6   2   1   23  80  1   2   5   25  25  15  170 165 0   1   0   1   0   0   0   1   0   rectangle   1
115 175 60  200 2   4   3   15  50  3   3   6   25  30  15  155 170 1   0   0   1   0   0   1   0   0   square  0
116 175 70  400 6   4   3   15  60  2   3   11  20  20  15  150 170 1   0   0   0   1   0   0   1   0   rectangle   2
117 195 70  350 6   3   2   30  60  3   2   12  25  25  20  175 175 0   0   0   1   0   0   0   0   1   rectangle   2
118 170 50  500 6   4   3   30  80  2   3   10  60  30  15  170 160 0   1   0   1   0   0   0   0   1   rectangle   0
119 195 60  650 6   4   1   20  70  3   2   5   65  20  20  170 150 0   0   1   0   0   1   0   0   1   rectangle   2
120 170 70  650 8   4   4   25  80  1   2   9   45  30  15  170 170 0   0   1   1   0   0   0   1   0   round   1
121 170 70  650 8   4   2   30  90  1   2   12  30  15  15  170 170 0   0   1   1   0   0   1   0   0   square  0
122 170 60  400 4   6   4   15  60  2   2   11  60  30  15  170 150 0   0   1   1   0   0   1   0   0   square  0
123 175 60  300 8   6   3   20  60  2   2   12  50  25  15  150 175 0   0   1   0   1   0   0   1   0   round   2
124 175 50  400 4   3   1   23  50  3   2   9   50  30  15  150 150 0   0   1   1   0   0   0   1   0   square  0
125 180 40  300 6   4   1   15  50  3   2   10  60  30  15  170 175 0   0   1   0   1   0   0   1   0   rectangle   2
126 195 60  250 6   4   3   25  90  2   2   6   60  30  10  170 175 1   0   0   0   0   1   0   0   1   rectangle   2
127 160 70  300 4   2   1   20  60  2   2   5   40  20  15  160 170 0   0   0   1   0   0   0   1   0   square  2
128 170 60  300 8   6   2   30  80  1   1   10  65  30  15  155 155 0   1   0   1   0   0   0   0   1   square  2
129 160 40  350 6   6   2   15  60  1   1   5   25  30  15  155 170 0   0   1   0   0   1   0   1   0   rectangle   2
130 170 60  500 2   5   3   30  50  3   2   10  60  10  15  165 160 0   0   0   1   0   0   1   0   0   rectangle   1
131 170 60  650 8   3   3   23  90  1   1   10  70  15  15  170 175 1   0   0   1   0   0   1   0   0   rectangle   2
132 170 50  600 4   4   1   20  50  2   2   5   60  25  15  170 160 1   0   0   1   0   0   0   0   1   square  2
133 180 50  350 6   5   2   25  90  3   2   5   20  30  15  175 160 0   0   0   1   0   0   1   0   0   rectangle   0
134 170 90  200 4   2   4   20  90  3   2   10  20  25  15  170 175 0   0   0   1   0   0   0   0   1   rectangle   1
135 200 40  350 6   6   1   30  80  1   1   5   60  25  20  170 175 0   0   1   1   0   0   0   1   0   rectangle   2
136 165 60  250 2   3   2   25  60  1   1   8   20  15  15  170 170 0   1   0   1   0   0   0   0   1   rectangle   0
137 175 70  250 6   6   4   15  60  2   2   11  50  30  15  175 175 0   1   0   0   0   1   0   1   0   rectangle   2
138 180 50  350 6   4   2   25  70  3   2   5   45  25  15  170 170 0   0   0   0   0   1   1   0   0   rectangle   0
139 195 60  600 6   4   2   20  50  1   1   10  35  15  15  165 175 1   0   0   1   0   0   0   1   0   round   2
140 180 60  300 8   4   4   25  80  1   1   5   60  30  15  165 170 0   0   0   1   0   0   0   0   1   rectangle   1
141 200 60  500 8   4   1   23  70  2   2   8   15  30  15  160 170 0   0   0   1   0   0   1   0   0   rectangle   0
142 170 60  550 6   4   4   30  60  2   2   6   65  20  15  175 165 0   1   0   1   0   0   0   0   1   rectangle   1
143 170 40  600 2   2   1   15  70  1   2   11  30  25  20  175 165 0   0   0   1   0   0   0   0   1   rectangle   0
144 175 70  250 6   4   3   30  60  1   2   10  60  30  20  155 175 0   1   0   1   0   0   1   0   0   rectangle   2
145 180 50  250 4   5   3   15  80  1   2   6   60  30  15  170 170 0   0   0   1   0   0   0   0   1   rectangle   2
146 165 50  350 6   4   4   25  80  1   2   12  25  15  15  155 165 1   0   0   1   0   0   0   0   1   rectangle   0
147 170 60  500 6   5   4   23  60  1   2   10  15  30  20  160 170 1   0   0   1   0   0   1   0   0   rectangle   1
148 170 50  400 6   4   3   20  60  2   3   6   35  10  15  170 175 0   0   1   1   0   0   0   0   1   rectangle   1
149 195 80  650 8   4   3   30  90  1   1   6   15  20  10  165 160 1   0   0   0   1   0   1   0   0   rectangle   2
150 165 90  500 8   3   4   20  60  2   2   5   25  30  15  165 170 0   1   0   0   0   1   0   0   1   rectangle   1
151 160 80  200 2   4   4   30  80  3   1   5   50  25  15  170 160 0   1   0   1   0   0   0   1   0   rectangle   0
152 180 50  500 2   6   1   15  60  1   1   8   65  20  15  170 170 1   0   0   0   0   1   1   0   0   rectangle   2
153 165 60  600 6   4   1   30  70  3   3   11  15  30  10  170 170 0   0   0   1   0   0   1   0   0   rectangle   0
154 180 60  600 2   3   2   30  70  1   2   6   55  15  15  150 165 1   0   0   1   0   0   0   0   1   rectangle   2
155 160 60  400 2   6   4   15  60  1   1   9   55  30  10  170 160 1   0   0   1   0   0   1   0   0   rectangle   0
156 180 60  250 4   3   2   25  80  3   1   6   25  25  20  170 160 0   1   0   0   1   0   0   1   0   square  2
157 195 50  200 6   4   3   30  70  3   2   6   35  30  15  165 170 1   0   0   0   0   1   1   0   0   rectangle   2
158 170 50  650 6   5   2   15  60  3   2   12  35  30  10  170 175 1   0   0   0   1   0   0   1   0   rectangle   0
159 160 70  400 6   3   2   20  50  1   2   9   20  30  15  155 155 0   0   1   0   0   1   1   0   0   rectangle   0
160 175 90  600 6   4   4   23  80  3   3   7   20  20  15  155 160 1   0   0   1   0   0   0   1   0   rectangle   0
161 180 50  400 4   4   1   23  70  1   2   12  20  30  20  165 170 0   1   0   1   0   0   0   0   1   rectangle   1
162 170 90  250 6   3   3   20  80  2   2   12  25  15  15  170 155 0   0   0   1   0   0   0   1   0   round   2
163 170 60  200 2   6   1   23  80  3   1   10  30  30  15  170 175 0   1   0   0   0   1   0   1   0   rectangle   2
164 175 50  650 2   5   3   25  70  3   2   11  60  25  15  175 160 0   1   0   1   0   0   0   0   1   rectangle   2
165 195 90  400 6   3   3   23  60  1   2   7   35  25  20  170 155 0   0   0   1   0   0   1   0   0   round   1
166 180 50  600 6   3   4   25  60  2   2   10  20  10  15  155 175 0   1   0   1   0   0   0   1   0   square  0
167 200 50  500 6   3   3   15  90  2   1   6   20  25  10  170 155 0   1   0   1   0   0   0   0   1   rectangle   1
168 200 60  200 6   2   3   20  60  3   3   5   20  10  15  170 170 1   0   0   1   0   0   0   1   0   rectangle   1
169 200 60  300 4   5   3   20  90  3   2   12  30  25  15  155 160 0   0   1   1   0   0   0   0   1   rectangle   0
170 180 70  250 6   4   3   30  50  1   2   12  35  25  10  155 150 0   0   0   1   0   0   1   0   0   rectangle   1
171 175 70  200 4   6   4   30  60  2   2   5   25  30  15  150 160 0   0   1   1   0   0   0   1   0   square  0
172 165 90  400 2   5   1   30  90  3   2   6   70  30  15  170 170 0   1   0   1   0   0   0   0   1   rectangle   2
173 165 70  200 6   6   4   20  70  1   1   5   65  20  20  175 155 0   0   0   1   0   0   0   1   0   round   0
174 180 50  650 2   3   3   20  70  3   2   12  40  30  15  155 170 0   0   0   1   0   0   0   0   1   rectangle   1
175 180 40  200 6   3   2   30  80  3   3   7   60  30  10  175 150 0   1   0   1   0   0   1   0   0   rectangle   2
176 180 60  400 2   5   3   20  50  1   3   5   20  30  15  175 150 0   1   0   1   0   0   0   1   0   rectangle   1
177 200 50  400 4   6   4   23  60  2   2   7   55  20  15  160 170 0   1   0   1   0   0   0   0   1   round   0
178 180 50  550 6   4   3   20  50  2   2   8   20  25  20  170 170 1   0   0   0   0   1   1   0   0   rectangle   0
179 175 70  250 8   4   1   20  50  2   3   6   60  30  15  170 170 0   0   0   0   1   0   0   0   1   square  0
180 195 70  400 6   4   4   23  60  3   1   7   65  25  15  170 150 1   0   0   1   0   0   0   1   0   rectangle   1
181 160 50  500 6   4   3   25  50  1   1   11  55  10  15  170 170 0   0   0   0   0   1   0   0   1   rectangle   1
182 180 90  500 6   3   3   23  60  2   1   8   20  30  15  170 170 0   0   0   0   0   1   0   1   0   rectangle   1
183 170 70  650 2   3   3   25  80  1   3   8   45  20  10  170 170 0   1   0   1   0   0   0   1   0   round   2
184 195 70  600 6   4   2   25  60  1   2   6   40  30  15  155 170 1   0   0   1   0   0   0   0   1   rectangle   1
185 165 70  200 6   4   1   20  60  1   2   8   45  15  15  170 150 0   1   0   1   0   0   0   0   1   round   1
186 165 80  200 4   4   3   30  60  1   1   8   25  30  10  160 170 0   1   0   1   0   0   1   0   0   round   0
187 175 60  600 4   2   3   20  60  1   2   6   25  20  15  170 155 0   0   0   1   0   0   1   0   0   rectangle   2
188 180 70  500 6   4   3   30  70  2   2   7   55  30  15  170 150 1   0   0   1   0   0   1   0   0   square  1
189 180 50  600 2   4   4   30  60  3   1   9   40  25  15  170 170 1   0   0   0   0   1   0   0   1   rectangle   0
190 160 50  600 8   3   2   20  60  3   2   12  30  30  15  165 150 0   0   0   0   1   0   1   0   0   rectangle   2
191 180 60  200 6   2   1   30  60  3   2   7   20  30  15  175 160 1   0   0   1   0   0   1   0   0   rectangle   2
192 195 70  600 6   4   3   23  80  2   2   12  50  25  10  170 170 0   0   0   0   0   1   1   0   0   rectangle   1
193 180 60  250 6   3   1   15  60  2   3   5   60  30  20  175 165 1   0   0   0   0   1   0   1   0   rectangle   1
194 170 70  250 6   4   1   20  90  2   2   10  25  20  20  175 170 0   0   0   1   0   0   0   1   0   round   1
195 180 90  250 6   3   1   25  50  1   2   9   55  30  15  170 175 1   0   0   0   0   1   1   0   0   rectangle   1
196 160 70  550 6   3   4   30  90  3   2   10  60  20  15  165 165 0   1   0   1   0   0   0   1   0   round   0
197 175 60  200 8   2   3   15  60  1   2   11  50  30  15  165 175 0   1   0   1   0   0   0   0   1   rectangle   1
198 170 80  500 6   3   2   25  50  1   1   5   60  20  15  175 150 1   0   0   1   0   0   0   1   0   square  2
199 180 50  600 4   4   4   15  80  1   1   5   50  20  15  170 170 1   0   0   1   0   0   0   1   0   rectangle   2

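Comments:

This is certainly interesting, since you need to get ranges of values rather than specific values.

You can use lime for sensitivity analysis. But if you want to do sensitivity analysis, it would be better to choose LinearRegression as the classifier, especially since most of your features are numerical, except Meat_Freshness and Food_Taste, which seem to be categorical (so for LR you need to one-hot encode them, not treat them as numbers).

Related: Lime vs TreeInterpreter for interpreting decision tree, Random Forests interpretability

This question currently contains no code which violates SO guidelines (and appears to be coursework), so to make it legitimate you need to attempt it with lime (or another package), then edit it to show us your code and output, and where you got stuck.

@DavidLee You are right. It has to be specific values. Edited above. I am new to ML. Do you have any suggestion on how to find, from the feature importances, the combination of values of the 3 most important features that contribute to the Good:2 classification of the label?

Answer 1: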

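A very simple solution is to run a decision tree classifier on your data and visualize the tree with the graphviz library. Here is the documentation: https://scikit-learn.org/stable/modules/generated/sklearn.tree.export_graphviz.html. Once you have the dot file generated by the code, you can also visualize it in webgraphviz. The result of this exercise may be the value ranges you are expecting.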
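
A minimal sketch of this idea, using sklearn.tree.export_text in place of export_graphviz so that it runs without Graphviz installed. The data is synthetic: the rule that marks a cake as good (mixing time at most 8 minutes and baking time at least 50 minutes) is invented for the example and is not derived from the posted dataset.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
n = 300
X = np.column_stack([
    rng.integers(5, 13, n),    # mixingtime_min, values 5..12
    rng.integers(15, 71, n),   # bakingtime_min, values 15..70
])
# Invented rule (assumption): good (2) if mixed briefly and baked long, else bad (0).
y = np.where((X[:, 0] <= 8) & (X[:, 1] >= 50), 2, 0)

# A shallow tree keeps the printed rules short enough to read.
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X, y)

# Text rendering of the fitted tree: one line per split or leaf.
rules = export_text(tree, feature_names=["mixingtime_min", "bakingtime_min"])
print(rules)
```

Each root-to-leaf path that ends in "class: 2" reads as a conjunction of thresholds, which is close to the "feature in range AND feature in range" shape the question asks for. On real data the leaves are rarely pure, so it is worth checking how many samples each such leaf contains.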
