DeepExplainer with SHAP ValueError: Layer sequential_1 was called with an input that isn't a symbolic tensor


Posted: 2020-09-21 21:23:31

Question:

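I am trying to get the feature importances of a classic (fully connected) neural network built with Keras, using the SHAP library, but I get the following error: ValueError: Layer sequential_1 was called with an input that isn't a symbolic tensor. I searched the forums, but the answers I found only apply to convolutional networks. My code is below.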

import pandas as pd
import pickle
import numpy as np
import shap

from sklearn.utils import shuffle

# list_dataset_train, list_dataset_validation and list_dataset_test are
# lists of DataFrames loaded earlier (not shown here)

# Train

dataset_train_shuffle = shuffle(list_dataset_train[0], random_state = 24)
dataset_train_shuffle = dataset_train_shuffle.reset_index(drop=True)

X_train = dataset_train_shuffle.iloc[:,1:8]
label_train = dataset_train_shuffle.iloc[:,[-1]]

# Validation

X_validation = list_dataset_validation[0]
X_validation = X_validation.iloc[:,1:8]

label_validation = list_dataset_validation[0]
label_validation = label_validation.iloc[:,[-1]]

# Test

X_test = list_dataset_test[0]
X_test = X_test.iloc[:,1:8]

label_test = list_dataset_test[0]
label_test = label_test.iloc[:,[-1]]
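The splits above stay pandas DataFrames all the way into Keras and SHAP, and the traceback further down reports `Received type: <class 'pandas.core.frame.DataFrame'>`. SHAP's DeepExplainer examples typically use NumPy arrays, so converting the frames is a cheap way to rule out the DataFrame type as a factor; on its own it is not the fix described in the answer. This is a minimal sketch, assuming every feature and label column is numeric; `to_float32_array` is a hypothetical helper name:

```python
import numpy as np
import pandas as pd


def to_float32_array(frame: pd.DataFrame) -> np.ndarray:
    """Return the frame's values as a C-contiguous float32 array (hypothetical helper)."""
    # to_numpy drops the index and column labels; only the numeric values remain
    return np.ascontiguousarray(frame.to_numpy(dtype=np.float32))


# Usage with the splits above, e.g.:
# X_train_arr = to_float32_array(X_train)
# label_train_arr = to_float32_array(label_train)

if __name__ == "__main__":
    demo = pd.DataFrame({"BookEquityToMarketEquity": [-0.725018, 0.622943],
                         "Market": [-0.531440, -0.372537]})
    arr = to_float32_array(demo)
    print(arr.shape, arr.dtype)  # (2, 2) float32
```

float32 matches the default dtype of Keras weights, so no implicit casting happens when the arrays reach the model.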

My X is a DataFrame that looks like this:

      BookEquityToMarketEquity    Market  ...  EPSGrowth1yrFwd  LowVolatility
0                    -0.725018 -0.531440  ...         0.551760      -1.111092
1                     0.622943 -0.372537  ...        -0.036427      -0.391065
2                    -1.123209  2.099897  ...         1.885993      -1.762509
3                    -3.047993  2.582608  ...         2.272227      -2.906862
4                     0.461661  0.562763  ...        -0.524000      -0.155260
                       ...       ...  ...              ...            ...
3007                 -1.466322 -2.234277  ...        -0.493226       1.712511
3008                  0.061376  0.294030  ...         0.411817      -0.057478
3009                  0.807521  0.357246  ...        -0.169811      -0.713736
3010                 -0.396623  0.320133  ...        -0.096492      -0.287331
3011                 -1.308371  1.074483  ...         1.447048      -1.062359

My labels are a DataFrame that looks like this:

      NYSE:AEE
0            0
1            0
2            0
3            0
4            1
       ...
3007         0
3008         0
3009         0
3010         0
3011         1

My model is as follows:

from keras.models import Sequential
from keras.layers.core import Dense, Dropout
from keras import optimizers
import tensorflow as tf

model = Sequential()
model.add(Dense(32, input_dim=len(X_train.columns), activation = 'relu'))
model.add(Dropout(0.25))

model.add(Dense(16, activation = 'relu'))
model.add(Dropout(0.25))

model.add(Dense(8, activation ='relu')) 
model.add(Dropout(0.25))

model.add(Dense(1,activation ='sigmoid'))

model.compile(loss = 'binary_crossentropy',
              optimizer = 'adam',
              metrics = [tf.keras.metrics.AUC()],
              )

model.fit(X_train,
          label_train,
          validation_data = (X_validation, label_validation),
          epochs = 100, 
          batch_size = 50,
          verbose = 1,
          )

I run into the following problem with DeepExplainer when I try to get the feature importances:

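background = X_train[:1000]
explainer = shap.DeepExplainer(model, background)
shap_values = explainer.shap_values(X_test)

# For a single-output Keras model, (older) SHAP versions return a list with one
# array per output, and expected_value has one entry per output; the plotted
# row should come from X_test, the data that was explained
shap.force_plot(explainer.expected_value[0], shap_values[0][0,:], X_test.iloc[0,:])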

ValueError: Layer sequential_1 was called with an input that isn't a symbolic tensor. Received type: <class 'pandas.core.frame.DataFrame'>. Full input: [     BookEquityToMarketEquity    Market  ...  EPSGrowth1yrFwd  LowVolatility
0                   -0.725018 -0.531440  ...         0.551760      -1.111092
1                    0.622943 -0.372537  ...        -0.036427      -0.391065
2                   -1.123209  2.099897  ...         1.885993      -1.762509
3                   -3.047993  2.582608  ...         2.272227      -2.906862
4                    0.461661  0.562763  ...        -0.524000      -0.155260
..                        ...       ...  ...              ...            ...
995                 -1.552939 -0.102533  ...         0.852491      -0.383818
996                  1.311711  1.659371  ...         1.028700      -0.967370
997                  1.013556 -1.029374  ...        -1.386222       0.319806
998                  0.374137 -1.736694  ...        -0.433354      -0.220381
999                  0.353116 -0.631120  ...        -0.227051       0.475108

[1000 rows x 7 columns]]. All inputs to the layer should be tensors.

Does anyone have an idea? Thanks in advance for your help.


Answer 1:

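I had the same error. I found that using tensorflow.keras instead of standalone Keras solves the problem. See also this link, since SHAP does not support standalone Keras well. So all you need to do is change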

from keras.models import Sequential
from keras.layers.core import Dense, Dropout
from keras import optimizers

to the corresponding tensorflow.keras modules:

from tensorflow.keras.models import Sequential
# Dense and Dropout are exported directly from tensorflow.keras.layers
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras import optimizers
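For reference, here is the question's network rebuilt entirely on tf.keras, as this answer suggests. It is a sketch assuming TensorFlow 2.x; `build_model` is a hypothetical helper name, and the explicit `tf.keras.Input` layer stands in for `input_dim`. Every import comes from `tensorflow`, so standalone Keras is never involved:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout


def build_model(n_features: int) -> Sequential:
    """Same layers as in the question, built only with tf.keras (hypothetical helper)."""
    model = Sequential([
        tf.keras.Input(shape=(n_features,)),
        Dense(32, activation="relu"),
        Dropout(0.25),
        Dense(16, activation="relu"),
        Dropout(0.25),
        Dense(8, activation="relu"),
        Dropout(0.25),
        Dense(1, activation="sigmoid"),  # single probability for the binary label
    ])
    model.compile(loss="binary_crossentropy",
                  optimizer="adam",
                  metrics=[tf.keras.metrics.AUC()])
    return model


if __name__ == "__main__":
    model = build_model(7)  # 7 feature columns, as selected by iloc[:, 1:8]
    probs = model.predict(np.zeros((4, 7), dtype=np.float32), verbose=0)
    print(probs.shape)  # (4, 1)
```

The `fit` call and the `shap.DeepExplainer(model, background)` call from the question can then be reused with this model. Whether DeepExplainer runs cleanly also depends on the installed SHAP and TensorFlow versions, which the thread does not state.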
