About tf.nn.softmax_cross_entropy_with_logits_v2

Posted 2023-02-16


Question (posted 2018-08-28 20:52:51):

I noticed that tf.nn.softmax_cross_entropy_with_logits_v2(labels, logits) mainly performs 3 operations:

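1. Apply softmax to the logits (y_hat) to normalize them: y_hat_softmax = softmax(y_hat).

2. Compute the element-wise cross-entropy terms: y_cross = y_true * tf.log(y_hat_softmax)

3. Sum over the classes of each instance and negate: -tf.reduce_sum(y_cross, reduction_indices=[1])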
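Written as a formula, with $y_{ij}$ the one-hot label and $\hat{y}_{ij}$ the logit for instance $i$ and class $j$, the three steps compute a per-instance loss $\ell_i$. For a one-hot row the sum keeps only the true class $c$, so $\ell_i = -\log \hat{p}_{ic}$; for the first instance in the question's example that is $-\log 0.61939586 \approx 0.4790$.

```latex
\hat{p}_{ij} = \frac{\exp(\hat{y}_{ij})}{\sum_{k} \exp(\hat{y}_{ik})},
\qquad
\ell_i = -\sum_{j} y_{ij} \log \hat{p}_{ij}
```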

The code below, borrowed from here, demonstrates this nicely.

import numpy as np
import tensorflow as tf  # TensorFlow 1.x API (tf.Session, tf.log)

y_true = tf.convert_to_tensor(np.array([[0.0, 1.0, 0.0],[0.0, 0.0, 1.0]]))
y_hat = tf.convert_to_tensor(np.array([[0.5, 1.5, 0.1],[2.2, 1.3, 1.7]]))

# first step
y_hat_softmax = tf.nn.softmax(y_hat)

# second step
y_cross = y_true * tf.log(y_hat_softmax)

# third step
result = - tf.reduce_sum(y_cross, 1)

# use tf.nn.softmax_cross_entropy_with_logits_v2
result_tf = tf.nn.softmax_cross_entropy_with_logits_v2(labels = y_true, logits = y_hat)

with tf.Session() as sess:
    # Inside this block sess is the default session, so .eval() runs each tensor.
    print('y_hat_softmax:\n{0}\n'.format(y_hat_softmax.eval()))
    print('y_true:\n{0}\n'.format(y_true.eval()))
    print('y_cross:\n{0}\n'.format(y_cross.eval()))
    print('result:\n{0}\n'.format(result.eval()))
    print('result_tf:\n{0}'.format(result_tf.eval()))

Output:

y_hat_softmax:
[[0.227863   0.61939586 0.15274114]
[0.49674623 0.20196195 0.30129182]]

y_true: 
[[0. 1. 0.]
[0. 0. 1.]]

y_cross: 
[[-0.         -0.4790107  -0.        ]
[-0.         -0.         -1.19967598]]

result: 
[0.4790107  1.19967598]

result_tf: 
[0.4790107  1.19967598]
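To check the same three steps without a TensorFlow 1.x session, here is a minimal NumPy sketch. It mirrors the steps listed above; it is not TensorFlow's implementation, and the helper names softmax and softmax_cross_entropy are hypothetical. The last line also shows what the documentation warning quoted in the comments below is about: passing already-softmaxed values as logits applies softmax twice and changes the loss.

```python
import numpy as np

def softmax(z):
    # Subtract the row max before exponentiating for numerical stability;
    # this does not change the softmax result.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def softmax_cross_entropy(labels, logits):
    # Step 1: normalize the logits with softmax.
    p = softmax(logits)
    # Steps 2 and 3: multiply by the labels element-wise, sum over classes, negate.
    return -np.sum(labels * np.log(p), axis=1)

y_true = np.array([[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
y_hat = np.array([[0.5, 1.5, 0.1], [2.2, 1.3, 1.7]])

result = softmax_cross_entropy(y_true, y_hat)
print(result)  # approximately [0.4790107  1.19967598]

# Feeding softmax output in as "logits" applies softmax a second time.
double = softmax_cross_entropy(y_true, softmax(y_hat))
print(double)
```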

However, one-hot labels contain only 0s and 1s, so the cross-entropy formula for this binary case is the one shown here and here:
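The linked formula is not reproduced in this copy. Judging from cross_entropy_fn in the code that follows and from the answer's remark about the sum over $i$, it is presumably the binary cross-entropy applied to every class and summed, with $y_{ij}$ the label and $\hat{p}_{ij}$ the softmax probability for instance $i$ and class $j$:

```latex
\ell_i = -\sum_{j} \left[ y_{ij} \log \hat{p}_{ij} + (1 - y_{ij}) \log\left(1 - \hat{p}_{ij}\right) \right]
```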

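I implemented this formula in the next cell, and its result differs from the one above. My question is: which one is better, or correct? Does TensorFlow also have a function that computes cross-entropy according to this formula?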

import numpy as np

y_true = np.array([[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
y_hat_softmax_from_tf = np.array([[0.227863, 0.61939586, 0.15274114], 
                              [0.49674623, 0.20196195, 0.30129182]])
comb = np.dstack((y_true, y_hat_softmax_from_tf))
#print(comb)

print('y_hat_softmax_from_tf:\n{0}\n'.format(y_hat_softmax_from_tf))
print('y_true:\n{0}\n'.format(y_true))

def cross_entropy_fn(sample):
    output = []
    for label in sample:
        if label[0]:
            y_cross_1 = label[0] * np.log(label[1])
        else:
            y_cross_1 = (1 - label[0]) * np.log(1 - label[1])
        output.append(y_cross_1)
    return output

y_cross_1 = np.array([cross_entropy_fn(sample) for sample in comb])
print('y_cross_1:\n{0}\n'.format(y_cross_1))

result_1 = - np.sum(y_cross_1, 1)
print('result_1:\n{0}'.format(result_1))

Output:

y_hat_softmax_from_tf: 
[[0.227863   0.61939586 0.15274114]
[0.49674623 0.20196195 0.30129182]]

y_true: 
[[0. 1. 0.]
[0. 0. 1.]]

y_cross_1: 
[[-0.25859328 -0.4790107  -0.16574901]
[-0.68666072 -0.225599   -1.19967598]]

result_1: 
[0.90335299 2.11193571]
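A vectorized NumPy sketch of cross_entropy_fn makes the gap between the two results explicit: result_1 equals the softmax cross-entropy plus the extra (1 - y) * log(1 - p) terms from the classes whose label is 0. Unlike the original loop, this form evaluates both logarithms for every entry, so it would produce nan if a probability were exactly 0 or 1; that does not happen with these values. The probabilities are copied from the outputs above.

```python
import numpy as np

# One-hot labels and softmax probabilities copied from the outputs above.
y_true = np.array([[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
p = np.array([[0.227863, 0.61939586, 0.15274114],
              [0.49674623, 0.20196195, 0.30129182]])

# Vectorized cross_entropy_fn: one binary cross-entropy term per class, summed.
bce_terms = y_true * np.log(p) + (1 - y_true) * np.log(1 - p)
result_1 = -bce_terms.sum(axis=1)

# The softmax (multinomial) cross-entropy keeps only the y * log(p) part.
result = -(y_true * np.log(p)).sum(axis=1)

# The difference is the (1 - y) * log(1 - p) part from the zero-label classes.
extra = -((1 - y_true) * np.log(1 - p)).sum(axis=1)

print(result_1)        # approximately [0.90335299 2.11193571]
print(result + extra)  # same values as result_1
```

For the first instance, for example, 0.90335299 - 0.4790107 = 0.25859328 + 0.16574901, the two entries that the softmax loss multiplies by a zero label.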

Comments on the question:

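Be careful with the official documentation: "WARNING: This op expects unscaled logits, since it performs a softmax on logits internally for efficiency. Do not call this op with the output of softmax, as it will produce incorrect results." It seems y should not be passed through the softmax function.

What is the difference between this V2 and the previous version? Can I replace my code with the new V2? I get a deprecation message when running tf 1.9 code that calls tf.nn.softmax_cross_entropy_with_logits(...).

Answer 1: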

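Your formula is correct, but it applies only to binary classification. The TensorFlow demo code has 3 classes, so comparing the two results is like comparing apples and oranges. One of the answers you referred to also mentions this: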

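"This formula is often used for a network with one output predicting two classes (usually positive class membership for 1 and negative for 0 output). In that case i may only have one value; you can lose the sum over i."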
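A small NumPy sketch of why the two formulas coincide only in the two-class case: with a single logit x, sigmoid(x) equals the second entry of softmax([0, x]), so single-output binary cross-entropy equals two-class softmax cross-entropy. The helper names binary_ce and two_class_softmax_ce are hypothetical and chosen for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def binary_ce(y, x):
    # Single-output binary cross-entropy: label y in {0, 1}, one logit x.
    p = sigmoid(x)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def two_class_softmax_ce(y, x):
    # The same problem as two classes with logits [0, x]:
    # index 0 is the negative class, index 1 the positive class.
    logits = np.array([0.0, x])
    p = np.exp(logits) / np.exp(logits).sum()
    one_hot = np.array([1 - y, y])
    return -np.sum(one_hot * np.log(p))

for y in (0.0, 1.0):
    for x in (-2.0, 0.3, 1.5):
        print(y, x, binary_ce(y, x), two_class_softmax_ce(y, x))
```

With three or more classes there is no such single-logit rewrite, which is why the question's two results differ.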

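This question describes in detail the difference between the two formulas (binary cross-entropy vs. multinomial cross-entropy) and when each one applies.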

The answer to your second question is yes: there is such a function, tf.nn.sigmoid_cross_entropy_with_logits. See the question linked above.
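As a sketch, the numerically stable expression that the TensorFlow documentation gives for tf.nn.sigmoid_cross_entropy_with_logits (with x = logits, z = labels) is max(x, 0) - x * z + log(1 + exp(-|x|)). The NumPy function below mirrors that expression; its name is hypothetical and it is not TensorFlow's source. Like the TF op, it returns one loss per element, so the per-class terms are summed afterwards. Because the op applies a sigmoid rather than a softmax to its logits, the sketch passes log(p / (1 - p)) as logits purely to reproduce the question's result_1 from the softmax probabilities p.

```python
import numpy as np

def sigmoid_ce_with_logits(labels, logits):
    # Stable logistic loss per element, as documented for the TF op:
    #   max(x, 0) - x * z + log(1 + exp(-|x|)),  x = logits, z = labels
    x, z = logits, labels
    return np.maximum(x, 0) - x * z + np.log1p(np.exp(-np.abs(x)))

y_true = np.array([[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
p = np.array([[0.227863, 0.61939586, 0.15274114],
              [0.49674623, 0.20196195, 0.30129182]])

# Inverse sigmoid, so that sigmoid(logits) == p; only needed to match result_1.
logits = np.log(p / (1 - p))
result_1 = sigmoid_ce_with_logits(y_true, logits).sum(axis=1)
print(result_1)  # approximately [0.90335299 2.11193571]
```

In TensorFlow the corresponding call would be tf.reduce_sum(tf.nn.sigmoid_cross_entropy_with_logits(labels=y_true, logits=logits), axis=1). In a real multi-label model the logits would come straight from the network rather than from inverting softmax outputs.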

