Tensorflow: Custom Loss Function Not Providing Gradients

Posted: 2021-10-26 21:59:20

I'm trying to define a custom loss function in TensorFlow that penalizes false positives and false negatives, based on the answer to this post. I had to modify the code that computes specificity and recall because mine is a multiclass classification problem, whereas the question in that post involves only binary classification. In case it matters, I'm training with images stored in ImageDataGenerator objects.

The loss function does the following.

1. Convert the logits in y_pred and the one-hot encoded classes in y_true into sparse numeric vectors (e.g. [0, 2, 1, 1]) for each batch.
2. Instantiate counters for the true positives, true negatives, false positives, and false negatives (TPx, TNx, FPx, FNx, with x being 0, 1, or 2 depending on the class). The gigantic if/elif ladder essentially counts every cell of the confusion matrix, since a 3x3 confusion matrix is far more complicated than a 2x2 one. It then simply sums the per-class counts (TP_g, TN_g, FP_g, FN_g) to get the totals.
3. Convert the summed counts into TensorFlow tensors (I stole that part from the aforementioned post).
4. Compute specificity and recall, then subtract their weighted sum from 1.0 to return the total loss for the batch.
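Step 1 above amounts to an argmax along the class axis. A minimal illustration with NumPy, matching the np.argmax calls in the code below (the batch values here are made up for the example):

```python
import numpy as np

# Made-up batch of 4 samples over 3 classes, just to illustrate step 1.
y_true_onehot = np.array([[1, 0, 0],
                          [0, 0, 1],
                          [0, 1, 0],
                          [0, 1, 0]])
y_pred_logits = np.array([[2.0, 0.1, 0.3],
                          [0.2, 0.1, 3.0],
                          [0.1, 1.5, 0.2],
                          [0.3, 2.0, 0.1]])

# Collapse one-hot labels and logits to sparse class ids for the batch.
y_true = np.argmax(y_true_onehot, axis=1)  # [0, 2, 1, 1]
y_pred = np.argmax(y_pred_logits, axis=1)  # [0, 2, 1, 1]
```

Note that argmax is piecewise constant, so no gradient can flow through this step; that detail turns out to matter for the error discussed here.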

Here is the loss function I defined:

def myLossFcn(y_true, y_pred, recall_weight, spec_weight):
    #benign ==0
    #hyperplastic ==1
    #neoplastic ==2
    
    y_true = np.argmax(y_true, axis=1)
    y_pred = np.argmax(y_pred, axis=1)
    
    y_true = tensorflow.cast(y_true, tensorflow.float32)
    y_pred = tensorflow.cast(y_pred, tensorflow.float32)
    
    print('y_true:', y_true)
    print('y_pred:', y_pred)
    
    #true positives for all classes
    TP0 = 0
    TP1 = 0
    TP2 = 0
    for i in range(len(y_true)):
        if y_true[i] == 0 and y_pred[i] == 0:
            TP0 += 1 #benign true positive
        elif y_true[i] == 1 and y_pred[i] == 1:
            TP1 += 1 #hyperplastic true positive
        elif y_true[i] == 2 and y_pred[i] ==2: #neoplastic true positive
            TP2 += 1
    TP_g = TP0 + TP1 + TP2 #num true positives total (per batch) 
        
    #true negatives for all classes
    TN0 = 0
    TN1 = 0
    TN2 = 0
    for i in range(len(y_true)):
        if (y_true[i] == 1 and y_pred[i] == 1)  or (y_true[i] == 1 and y_pred[i] == 2) or (y_true[i] == 2 and y_pred[i] == 1) or (y_true[i] == 2 and y_pred[i] == 2):
            TN0 +=1
        elif (y_true[i] == 0 and y_pred[i] == 0) or (y_true[i] == 0 and y_pred[i] == 2) or (y_true[i] == 2 and y_pred[i] == 0) or (y_true[i] == 2 and y_pred[i] == 2):
            TN1 +=1
        elif (y_true[i] == 0 and y_pred[i] == 0) or (y_true[i] == 0 and y_pred[i] == 1) or (y_true[i] == 1 and y_pred[i] == 0) or (y_true[i] == 1 and y_pred[i] == 1):
            TN2 +=1
    TN_g = TN0 + TN1 + TN2
    
    #false positives for all classes
    FP0 = 0
    FP1 = 0
    FP2 = 0
    
    for i in range(len(y_true)):
        if (y_true[i] == 0 and y_pred[i] == 1) or (y_true[i] == 0 and y_pred[i] == 2):
            FP0 +=1
        elif (y_true[i] == 1 and y_pred[i] == 0) or (y_true[i] == 1 and y_pred[i] == 2):
            FP1 +=1
        elif (y_true[i] == 0 and y_pred[i] == 2) or (y_true[i] == 1 and y_pred[i] == 2):
            FP2 +=1
    FP_g = FP0 + FP1 + FP2
    
    #false negatives for all classes
    FN0 = 0
    FN1 = 0
    FN2 = 0
    
    for i in range(len(y_true)):
        if (y_true[i] == 0 and y_pred[i] == 1) or (y_true[i] == 0 and y_pred[i] == 2):
            FN0 +=1
        elif (y_true[i] == 1 and y_pred[i] == 0) or (y_true[i] == 1 and y_pred[i] == 2):
            FN1 += 1
        elif (y_true[i] == 0 and y_pred[i] == 1) or (y_true[i] == 1 and y_pred[i] == 2):
            FN2 +=1
    FN_g = FN0 + FN1 + FN2
    
    #Converted as Keras Tensors    
    TP_g = K.sum(K.variable(TP_g))
    TN_g = K.sum(K.variable(TN_g))
    FP_g = K.sum(K.variable(FP_g))
    FN_g = K.sum(K.variable(FN_g))
    
    print(TP_g)
    print(TN_g)
    print(FP_g)
    print(FN_g)
    
    specificity = TN_g / (TN_g + FP_g + K.epsilon())
    recall = TP_g / (TP_g + FN_g + K.epsilon())
    print('spec:', specificity)
    print('recall:', recall)
    
    loss = 1.0 - (recall_weight*recall + spec_weight*specificity)
    print('loss:', loss)
    
    return tensorflow.constant(loss)
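As an aside (a sketch, not part of the original post): the four if/elif ladders can be collapsed by building a confusion matrix once and reading every count off it. The helper below is hypothetical; in TensorFlow, tf.math.confusion_matrix produces the same matrix.

```python
import numpy as np

def confusion_counts(y_true, y_pred, num_classes=3):
    """Build a confusion matrix, then derive per-class TP/FP/FN/TN from it."""
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        cm[int(t), int(p)] += 1           # rows = true class, cols = predicted
    tp = np.diag(cm)                      # predicted c and truly c
    fp = cm.sum(axis=0) - tp              # predicted c but truly something else
    fn = cm.sum(axis=1) - tp              # truly c but predicted something else
    tn = cm.sum() - tp - fp - fn          # everything else
    return tp, fp, fn, tn

# Same batch as in the printed output further down: y_true [0, 2, 1, 0], y_pred all 0.
tp, fp, fn, tn = confusion_counts([0, 2, 1, 0], [0, 0, 0, 0])
```

This standard decomposition will not necessarily reproduce the totals printed by the hand-written ladders above, whose conditions are worth double-checking against it.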

Following the earlier post, I instantiated a function wrapper to pass in the weights for specificity and recall, and then started training:

def custom_loss(recall_weight, spec_weight):
    def recall_spec_loss(y_true, y_pred):
        return myLossFcn(y_true, y_pred, recall_weight, spec_weight)
    
    return recall_spec_loss

model = tensorflow.keras.applications.resnet50.ResNet50(weights=None,
                                                    input_shape=(100,100,1),
                                                    pooling='max',
                                                    classes=3)
loss = custom_loss(recall_weight=0.9, spec_weight=0.1)
model.compile(optimizer=hyperparameters['optimizer'],
          loss=loss,
          metrics=['accuracy', tensorflow.keras.metrics.FalseNegatives()],
          run_eagerly=True)

history = model.fit(train_set,
                epochs=50,
                callbacks=[model_checkpoint],
                validation_data=val_set,
                verbose=2)

When I run my code, I get an error that says

ValueError: No gradients provided for any variable: [for brevity, I won't copy and paste all of the gradient names it lists]

I'll also post the output I get along with the traceback for that error message:

Found 625 images belonging to 3 classes.
Found 376 images belonging to 3 classes.
Found 252 images belonging to 3 classes.
Epoch 1/50
y_true: tf.Tensor([0. 2. 1. 0.], shape=(4,), dtype=float32)
y_pred: tf.Tensor([0. 0. 0. 0.], shape=(4,), dtype=float32)
tf.Tensor(2.0, shape=(), dtype=float32)
tf.Tensor(4.0, shape=(), dtype=float32)
tf.Tensor(1.0, shape=(), dtype=float32)
tf.Tensor(1.0, shape=(), dtype=float32)
spec: tf.Tensor(0.8, shape=(), dtype=float32)
recall: tf.Tensor(0.6666667, shape=(), dtype=float32)
loss: tf.Tensor(0.32, shape=(), dtype=float32)
Traceback (most recent call last):
  File "/home/d/dsussman/dsherman/endo_git_v2/justin_method.py", line 253, in <module>
    verbose=2)
  File "/home/d/dsussman/dsherman/.conda/envs/myNewEnv/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py", line 1178, in fit
    tmp_logs = self.train_function(iterator)
  File "/home/d/dsussman/dsherman/.conda/envs/myNewEnv/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py", line 850, in train_function
    return step_function(self, iterator)
  File "/home/d/dsussman/dsherman/.conda/envs/myNewEnv/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py", line 840, in step_function
    outputs = model.distribute_strategy.run(run_step, args=(data,))
  File "/home/d/dsussman/dsherman/.conda/envs/myNewEnv/lib/python3.7/site-packages/tensorflow/python/distribute/distribute_lib.py", line 1285, in run
    return self._extended.call_for_each_replica(fn, args=args, kwargs=kwargs)
  File "/home/d/dsussman/dsherman/.conda/envs/myNewEnv/lib/python3.7/site-packages/tensorflow/python/distribute/distribute_lib.py", line 2833, in call_for_each_replica
    return self._call_for_each_replica(fn, args, kwargs)
  File "/home/d/dsussman/dsherman/.conda/envs/myNewEnv/lib/python3.7/site-packages/tensorflow/python/distribute/distribute_lib.py", line 3608, in _call_for_each_replica
    return fn(*args, **kwargs)
  File "/home/d/dsussman/dsherman/.conda/envs/myNewEnv/lib/python3.7/site-packages/tensorflow/python/autograph/impl/api.py", line 597, in wrapper
    return func(*args, **kwargs)
  File "/home/d/dsussman/dsherman/.conda/envs/myNewEnv/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py", line 833, in run_step
    outputs = model.train_step(data)
  File "/home/d/dsussman/dsherman/.conda/envs/myNewEnv/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py", line 794, in train_step
    self.optimizer.minimize(loss, self.trainable_variables, tape=tape)
  File "/home/d/dsussman/dsherman/.conda/envs/myNewEnv/lib/python3.7/site-packages/tensorflow/python/keras/optimizer_v2/optimizer_v2.py", line 530, in minimize
    return self.apply_gradients(grads_and_vars, name=name)
  File "/home/d/dsussman/dsherman/.conda/envs/myNewEnv/lib/python3.7/site-packages/tensorflow/python/keras/optimizer_v2/optimizer_v2.py", line 630, in apply_gradients
    grads_and_vars = optimizer_utils.filter_empty_gradients(grads_and_vars)
  File "/home/d/dsussman/dsherman/.conda/envs/myNewEnv/lib/python3.7/site-packages/tensorflow/python/keras/optimizer_v2/utils.py", line 76, in filter_empty_gradients
    ([v.name for _, v in grads_and_vars],))
ValueError: No gradients provided for any variable:

I've searched online for a long time to no avail. As mentioned in this post, I've made sure all my variables are tensors, and I've looked at this post, but I don't really understand what its solution means:

Keep in mind that the python function you write (custom_loss) is called to generate and compile a C function. The compiled function is what is called during training. When your python custom_loss function is called, the arguments are tensor objects that don't have data attached to them. The K.eval call will fail, as will the K.shape call.

I'm not even sure that second post is relevant, but it's all I could find on the internet. I'm hoping the solution is as simple as having forgotten something really obvious, or some easy change, but for the life of me I can't figure out what's going wrong.

Any help is greatly appreciated.

EDIT

I've updated my loss function so that all intermediate values are TensorFlow tensors of dtype float32, and I receive the same error:

def myLossFcn(y_true, y_pred, recall_weight, spec_weight):
    #benign ==0
    #hyperplastic ==1
    #neoplastic ==2

    print('y_true:', y_true)
    print('y_pred:', y_pred)

    tp = tensorflow.keras.metrics.TruePositives()
    tp.update_state(y_pred, y_true)
    
    TP_g = tp.result()

    tn = tensorflow.metrics.TrueNegatives()
    tn.update_state(y_pred, y_true)
    
    TN_g = tn.result()

    fp = tensorflow.keras.metrics.FalsePositives()
    fp.update_state(y_pred, y_true)
    
    FP_g = fp.result()

    fn = tensorflow.keras.metrics.FalseNegatives()
    fn.update_state(y_pred, y_true)
    
    FN_g= fn.result()
    
    print(TP_g)
    print(TN_g)
    print(FP_g)
    print(FN_g)    
    
    #Converted as Keras Tensors
    TP_g = K.sum(K.variable(TP_g))
    TN_g = K.sum(K.variable(TN_g))
    FP_g = K.sum(K.variable(FP_g))
    FN_g = K.sum(K.variable(FN_g))
    
    print(TP_g)
    print(TN_g)
    print(FP_g)
    print(FN_g)
    
    specificity = TN_g / (TN_g + FP_g + K.epsilon())
    recall = TP_g / (TP_g + FN_g + K.epsilon())
    print('spec:', specificity)
    print('recall:', recall)
    
    loss = 1.0 - (recall_weight*recall + spec_weight*specificity)
    print('loss:', loss)
    
    return tensorflow.constant(loss) #probably not a tensorflow scalar atm

I printed the metrics twice to see whether the K.sum(K.variable(**METRIC**)) wrapping had any effect.

Here is the output:

tf.Tensor(8.0, shape=(), dtype=float32)
tf.Tensor(4.0, shape=(), dtype=float32)
tf.Tensor(0.0, shape=(), dtype=float32)
tf.Tensor(0.0, shape=(), dtype=float32)
tf.Tensor(8.0, shape=(), dtype=float32)
spec: tf.Tensor(0.0, shape=(), dtype=float32)
recall: tf.Tensor(0.33333334, shape=(), dtype=float32)
loss: tf.Tensor(0.7, shape=(), dtype=float32)
Traceback (most recent call last):
  File "/home/d/dsussman/dsherman/endo_git_v2/justin_method.py", line 282, in <module>
    verbose=2)
  File "/home/d/dsussman/dsherman/.conda/envs/myNewEnv/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py", line 1178, in fit
    tmp_logs = self.train_function(iterator)
  File "/home/d/dsussman/dsherman/.conda/envs/myNewEnv/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py", line 850, in train_function
    return step_function(self, iterator)
  File "/home/d/dsussman/dsherman/.conda/envs/myNewEnv/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py", line 840, in step_function
    outputs = model.distribute_strategy.run(run_step, args=(data,))
  File "/home/d/dsussman/dsherman/.conda/envs/myNewEnv/lib/python3.7/site-packages/tensorflow/python/distribute/distribute_lib.py", line 1285, in run
    return self._extended.call_for_each_replica(fn, args=args, kwargs=kwargs)
  File "/home/d/dsussman/dsherman/.conda/envs/myNewEnv/lib/python3.7/site-packages/tensorflow/python/distribute/distribute_lib.py", line 2833, in call_for_each_replica
    return self._call_for_each_replica(fn, args, kwargs)
  File "/home/d/dsussman/dsherman/.conda/envs/myNewEnv/lib/python3.7/site-packages/tensorflow/python/distribute/distribute_lib.py", line 3608, in _call_for_each_replica
    return fn(*args, **kwargs)
  File "/home/d/dsussman/dsherman/.conda/envs/myNewEnv/lib/python3.7/site-packages/tensorflow/python/autograph/impl/api.py", line 597, in wrapper
    return func(*args, **kwargs)
  File "/home/d/dsussman/dsherman/.conda/envs/myNewEnv/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py", line 833, in run_step
    outputs = model.train_step(data)
  File "/home/d/dsussman/dsherman/.conda/envs/myNewEnv/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py", line 794, in train_step
    self.optimizer.minimize(loss, self.trainable_variables, tape=tape)
  File "/home/d/dsussman/dsherman/.conda/envs/myNewEnv/lib/python3.7/site-packages/tensorflow/python/keras/optimizer_v2/optimizer_v2.py", line 530, in minimize
    return self.apply_gradients(grads_and_vars, name=name)
  File "/home/d/dsussman/dsherman/.conda/envs/myNewEnv/lib/python3.7/site-packages/tensorflow/python/keras/optimizer_v2/optimizer_v2.py", line 630, in apply_gradients
    grads_and_vars = optimizer_utils.filter_empty_gradients(grads_and_vars)
  File "/home/d/dsussman/dsherman/.conda/envs/myNewEnv/lib/python3.7/site-packages/tensorflow/python/keras/optimizer_v2/utils.py", line 76, in filter_empty_gradients
    ([v.name for _, v in grads_and_vars],))
ValueError: No gradients provided for any variable:

Comments:

***.com/questions/61894755/… The problem is the if and for statements.

There are multiple problems here. First, the loss must be implemented with tensorflow, not numpy. Moreover, computing the TPs, FPs, TNs, etc. is not differentiable; that is a mathematical issue.

Thanks for the explanation, I'll try to update the post.

Answer 1:

This is because the loss function you defined cannot converge toward an answer: its derivative has discontinuities, namely where you convert the errors into zeros and ones. Gradient descent needs to converge gradually toward an answer, and a discrete loss function gives it nothing to work with.

Comments:

How do I fix this? Just use keras.backend functions, or something else?
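One common way out (a sketch only, not the answerer's code): keep everything in probability space so the loss stays differentiable. Assuming y_pred holds softmax probabilities and y_true is one-hot, the hard counts can be replaced with "soft" counts. The function below uses NumPy for clarity; the identical expressions written with tf.reduce_sum and friends remain differentiable end to end.

```python
import numpy as np

def soft_recall_spec_loss(y_true_onehot, y_pred_probs,
                          recall_weight=0.9, spec_weight=0.1, eps=1e-7):
    """Differentiable surrogate: soft TP/FP/FN/TN built from predicted
    probabilities instead of argmax-ed class ids, so every op is smooth."""
    tp = np.sum(y_true_onehot * y_pred_probs)          # prob mass on the true class
    fn = np.sum(y_true_onehot * (1.0 - y_pred_probs))  # true-class mass missed
    fp = np.sum((1.0 - y_true_onehot) * y_pred_probs)  # mass on wrong classes
    tn = np.sum((1.0 - y_true_onehot) * (1.0 - y_pred_probs))
    recall = tp / (tp + fn + eps)
    specificity = tn / (tn + fp + eps)
    return 1.0 - (recall_weight * recall + spec_weight * specificity)

y = np.array([[1., 0., 0.],
              [0., 0., 1.],
              [0., 1., 0.]])
perfect = soft_recall_spec_loss(y, y)                    # near 0
uniform = soft_recall_spec_loss(y, np.full((3, 3), 1/3.))
```

With perfect predictions the loss approaches 0, and uncertain (uniform) predictions are penalized more heavily, so the optimizer has a smooth slope to descend.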
