SHAP DeepExplainer with TensorFlow 2.4+ error

Posted: 2021-06-23 03:21:15

Question:

I am trying to compute SHAP values with DeepExplainer, but I get the following error:

keras is no longer supported, please use tf.keras instead

even though I am using tf.keras?

KeyError                                  Traceback (most recent call last)
      6 # ...or pass tensors directly
      7 explainer = shap.DeepExplainer((model.layers[0].input, model.layers[-1].output), background)
      8 shap_values = explainer.shap_values(X_test[1:5])

C:\ProgramData\Anaconda3\lib\site-packages\shap\explainers\_deep\__init__.py in shap_values(self, X, ranked_outputs, output_rank_order, check_additivity)
    122             were chosen as "***".
    124         return self.explainer.shap_values(X, ranked_outputs, output_rank_order, check_additivity=check_additivity)

C:\ProgramData\Anaconda3\lib\site-packages\shap\explainers\_deep\deep_tf.py in shap_values(self, X, ranked_outputs, output_rank_order, check_additivity)
    310                 # assign the attributions to the right part of the output arrays
    311                 for l in range(len(X)):
    312                     phis[l][j] = (sample_phis[l][bg_data[l].shape[0]:] * (X[l][j] - bg_data[l])).mean(0)
    313
    314             output_phis.append(phis[0] if not self.multi_input else phis)

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
   2798             if self.columns.nlevels > 1:
   2799                 ...
   2800             indexer = self.columns.get_loc(key)
   2801             if is_integer(indexer):
   2802                 indexer = [indexer]

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   2646                 ...
   2647             except KeyError:
   2648                 ...
   2649         ...
   2650         if indexer.ndim > 1 or indexer.size > 1:

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 0

import shap
import numpy as np
import pandas as pd
import tensorflow as tf
import tensorflow.keras.backend as K

from keras.utils import to_categorical 
from sklearn.model_selection import train_test_split
from tensorflow.python.keras.layers import Dense
from tensorflow.python.keras import Sequential
from tensorflow.keras import optimizers

# print the JS visualization code to the notebook
shap.initjs()

X_train,X_test,Y_train,Y_test = train_test_split(*shap.datasets.iris(), test_size=0.2, random_state=0)

Y_train = to_categorical(Y_train, num_classes=3) 
Y_test = to_categorical(Y_test, num_classes=3) 

# Define baseline model
model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Dense(8, input_dim=len(X_train.columns), activation="relu"))
model.add(tf.keras.layers.Dense(3, activation="softmax"))
model.summary()


# compile the model
model.compile(optimizer='adam', loss="categorical_crossentropy", metrics=['accuracy'])

hist = model.fit(X_train, Y_train, batch_size=5,epochs=200, verbose=0)

# select a set of background examples to take an expectation over
background = X_train.iloc[np.random.choice(X_train.shape[0], 100, replace=False)]

# Explain predictions of the model
#explainer = shap.DeepExplainer(model, background)
# ...or pass tensors directly
explainer = shap.DeepExplainer((model.layers[0].input, model.layers[-1].output), background)
shap_values = explainer.shap_values(X_test[1:5])


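Comments:

Can you add the full error message (the full traceback)? Also, your imports include from keras.utils import to_categorical, so you are using keras.

@Lescurel to_categorical is just a utility function for converting the labels, and it is very unlikely to play any role here; the model is clearly built with tf.keras. The full error trace is indeed needed.

@Lescurel I have added the full error trace.

The full error trace shows a completely different error (KeyError).

Answer 1: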

TL;DR

For TF 2.4+, add tf.compat.v1.disable_v2_behavior() at the top of the script.
Compute the SHAP values on a NumPy array, not on a DataFrame.
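
The second point is what the KeyError: 0 in the traceback is about: deep_tf.py indexes its input positionally (X[l][j]), and a DataFrame treats an integer in square brackets as a column label rather than a row position. A minimal sketch of that behavior, using a toy frame whose column names are made up for illustration:

```python
import numpy as np
import pandas as pd

# Toy stand-in for X_test; the column names are illustrative only.
df = pd.DataFrame(
    {"sepal_len": [5.1, 4.9, 6.2], "sepal_wid": [3.5, 3.0, 2.9]}
)

# Integer indexing on a DataFrame is a column-label lookup, so df[0]
# raises KeyError because no column is named 0.
try:
    df[0]
    dataframe_indexing_failed = False
except KeyError:
    dataframe_indexing_failed = True

# The same data as a NumPy array supports positional row indexing.
arr = df.values
first_row = arr[0]

print(dataframe_indexing_failed)
print(first_row)
```

This is why passing X_test[:3].values instead of X_test[:3] avoids the error.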

Full reproducible example:

import shap
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

import tensorflow as tf
tf.compat.v1.disable_v2_behavior()  # <-- fix 1: call this before building the model

from tensorflow.keras.utils import to_categorical

print("SHAP version is:", shap.__version__)
print("Tensorflow version is:", tf.__version__)

X_train, X_test, Y_train, Y_test = train_test_split(
    *shap.datasets.iris(), test_size=0.2, random_state=0
)

Y_train = to_categorical(Y_train, num_classes=3)
Y_test = to_categorical(Y_test, num_classes=3)

# Define baseline model
model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Dense(8, input_dim=len(X_train.columns), activation="relu"))
model.add(tf.keras.layers.Dense(3, activation="softmax"))
# model.summary()

# compile the model
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])

hist = model.fit(X_train, Y_train, batch_size=5, epochs=200, verbose=0)

# select a set of background examples to take an expectation over
background = X_train.iloc[np.random.choice(X_train.shape[0], 100, replace=False)]

explainer = shap.DeepExplainer(
    (model.layers[0].input, model.layers[-1].output), background
)
shap_values = explainer.shap_values(X_test[:3].values)  # <-- fix 2: NumPy array, not a DataFrame

# print the JS visualization code to the notebook
shap.initjs()
shap.force_plot(
    explainer.expected_value[0], shap_values[0][0], feature_names=X_train.columns
)

SHAP version is: 0.39.0
Tensorflow version is: 2.5.0
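
If the explainer is called from several places, one way to avoid repeating .values is a small conversion helper. The sketch below is an illustration, not part of SHAP: to_explainer_input is a hypothetical name, and casting to float32 is only an assumption chosen to match the default Keras dtype.

```python
import numpy as np
import pandas as pd


def to_explainer_input(X):
    """Return X as a 2-D float32 NumPy array (hypothetical helper, not a SHAP API)."""
    # DataFrames and Series are converted so that integer indexing is positional.
    if isinstance(X, (pd.DataFrame, pd.Series)):
        X = X.to_numpy()
    arr = np.asarray(X, dtype=np.float32)
    # A single sample (1-D) is promoted to a batch containing one row.
    if arr.ndim == 1:
        arr = arr.reshape(1, -1)
    return arr


frame = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0]})
converted = to_explainer_input(frame)        # whole frame -> shape (2, 2)
single = to_explainer_input(frame.iloc[0])   # one row -> shape (1, 2)
```

With such a helper, the call in the example would read explainer.shap_values(to_explainer_input(X_test[:3])).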
