How can I use TensorFlow's sampled softmax loss function in a Keras model?

Posted 2023-03-28

Asked: 2018-06-02 04:59:06

Question:

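I am training a language model in Keras and would like to speed up training by using sampled softmax as the final activation function in my network. From the TF docs, it looks like I need to supply arguments for weights and biases, but I'm not sure what is expected for these inputs. It seems I could write a custom function in Keras like this: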

import keras.backend as K

def sampled_softmax(weights, biases, y_true, y_pred, num_sampled, num_classes):
    return K.sampled_softmax(weights, biases, y_true, y_pred, num_sampled, num_classes)

However, I'm not sure how to "plug" this into my existing network. The LM architecture is very simple:

from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense

model = Sequential()
model.add(Embedding(input_dim=len(vocab), output_dim=256))
model.add(LSTM(1024, return_sequences=True))
model.add(Dense(len(vocab), activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')

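Given this architecture, can I pass the sampled_softmax function as the loss argument when calling the model's compile method? Or does it need to be written as a layer that comes after the final fully connected layer? Any guidance here would be greatly appreciated. Thanks.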

Comments:

This may help. *** question

Answer 1:

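The key observation here is that TensorFlow's sampled softmax function returns the actual loss, rather than a set of predictions over the possible labels that would then be compared with the ground-truth data to compute the loss as a separate step. This makes the model setup a little unusual.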
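To make that observation concrete, here is a minimal sketch that calls tf.nn.sampled_softmax_loss directly in TensorFlow 2 eager mode. The sizes, label values, and random tensors are illustrative assumptions, not values from the answer. The call hands back one loss value per example, not a matrix of class probabilities.

```python
import tensorflow as tf

# Illustrative sizes only (assumptions, not taken from the answer).
num_classes, dim, batch = 50, 8, 4

# sampled_softmax_loss documents weights as [num_classes, dim].
weights = tf.random.normal([num_classes, dim])
biases = tf.zeros([num_classes])

# Activations that would feed the output layer: [batch, dim].
hidden = tf.random.normal([batch, dim])

# Integer class indices with shape [batch, 1] (num_true defaults to 1).
labels = tf.constant([[3], [7], [0], [42]], dtype=tf.int64)

losses = tf.nn.sampled_softmax_loss(
    weights=weights,
    biases=biases,
    labels=labels,
    inputs=hidden,
    num_sampled=10,
    num_classes=num_classes,
)

# One loss per example; there are no class predictions in the result.
print(losses.shape)
```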

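First, we add a second input layer to the model that takes the target (training) data a second time, as an input, in addition to its role as the target output. This is used for the labels argument of the sampled_softmax_loss function. It needs to be a Keras Input, because it is treated as an input when we instantiate and set up the model.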

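Second, we build a new custom Keras layer that calls the sampled_softmax_loss function with two Keras layers as its inputs: the output of the dense layer that predicts our classes, and the second input that contains a copy of the training data. Note that we are doing some serious hackery in accessing the _keras_history instance variable to get the weight and bias tensors from the output tensor of the original fully connected layer.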

Finally, we have to construct a new "dumb" loss function that ignores the training data and just uses the loss reported by the sampled_softmax_loss function.
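Two of the pieces described above are small enough to check in isolation: turning one-hot targets into the [batch, 1] index labels that sampled_softmax_loss expects, and the "dumb" loss that only averages the values it is given. The sketch below uses TensorFlow 2 eager mode; the helper names one_hot_to_index_labels and passthrough_loss are hypothetical and do not appear in the answer.

```python
import tensorflow as tf

def one_hot_to_index_labels(one_hot):
    """Convert [batch, num_classes] one-hot rows to [batch, 1] class indices."""
    return tf.reshape(tf.argmax(one_hot, axis=1), [-1, 1])

def passthrough_loss(y_true, y_pred):
    """Ignore y_true; y_pred already holds the per-example losses."""
    return tf.reduce_mean(y_pred)

one_hot = tf.constant([[0.0, 0.0, 1.0],
                       [0.0, 1.0, 0.0]])
labels = one_hot_to_index_labels(one_hot)
print(labels.numpy().tolist())  # [[2], [1]]

mean_loss = passthrough_loss(None, tf.constant([1.0, 3.0]))
print(float(mean_loss))  # 2.0
```

If the targets are already available as integer indices, feeding them directly (an input of shape (1,)) would avoid building a num_classes-wide one-hot input at all.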

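Note that because the sampled softmax function returns losses, not class predictions, you cannot use this model specification for validation or inference. You will need to reuse the trained layers from this "training version" in a new specification that applies a standard softmax function to the original dense layer, which has the default activation function applied.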
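For the inference side, a minimal sketch of the full softmax computed from the same parameters might look like the following. It assumes the output weights are stored as [num_classes, dim], the layout sampled_softmax_loss documents; the function name full_softmax_predictions and the random tensors are illustrative assumptions.

```python
import tensorflow as tf

def full_softmax_predictions(hidden, weights, biases):
    """Full softmax over all classes.

    hidden:  [batch, dim] activations feeding the output layer
    weights: [num_classes, dim]
    biases:  [num_classes]
    """
    logits = tf.matmul(hidden, weights, transpose_b=True) + biases
    return tf.nn.softmax(logits, axis=-1)

# Stand-ins for trained parameters (random here, for illustration only).
weights = tf.random.normal([50, 8])
biases = tf.zeros([50])

probs = full_softmax_predictions(tf.random.normal([4, 8]), weights, biases)
print(probs.shape)  # (4, 50)
```

Each row of probs is a distribution over all classes, so it can be used for validation metrics or decoding, which the sampled loss output cannot.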

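There is certainly a more elegant way to do this, but I believe this works, so I thought I'd post it as is now rather than wait until I have something neater. For example, you would probably want to make the number of classes an argument of the SampledSoftmax layer, or better yet, condense all of this into the loss function as in the original question and avoid passing in the training data twice.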

from keras.models import Model
from keras.layers import Input, Dense, Layer
from keras import backend as K

class SampledSoftmax(Layer):
    def __init__(self, **kwargs):
        super(SampledSoftmax, self).__init__(**kwargs)


    def call(self, inputs):
        """
        The first input should be the model as it were, and the second the
        target (i.e., a repeat of the training data) to compute the labels
        argument

        """
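        # the labels input to this function is batch size by 1, where the
        # value at position (i, 0) is the index that is true (not zero)
        # e.g., (0, 0, 1) => (2) or (0, 1, 0, 0) => (1)
        # NOTE: tf.nn.sampled_softmax_loss documents weights as
        # [num_classes, dim] and inputs as [batch, dim], whereas a Dense
        # kernel is [dim, num_classes]; check the shapes before relying on this.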
        return K.tf.nn.sampled_softmax_loss(weights=inputs[0]._keras_history[0].weights[0],
                                            biases=inputs[0]._keras_history[0].bias,
                                            inputs=inputs[0],
                                            labels=K.tf.reshape(K.tf.argmax(inputs[1], 1), [-1, 1]),
                                            num_sampled=1000,
                                            num_classes=200000)

def custom_loss(y_true, y_pred):
    return K.tf.reduce_mean(y_pred)


num_classes = 200000
input = Input(shape=(300,))
target_input = Input(shape=(num_classes,))

dense = Dense(num_classes)

outputs = dense(input)
outputs = SampledSoftmax()([outputs, target_input])

model = Model([input, target_input], outputs)
model.compile(optimizer=u'adam', loss=custom_loss)
# train as desired
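As an alternative to reaching into _keras_history, the output layer can own its weights and expose both the sampled loss and the full softmax. The following is a sketch under stated assumptions, not the answer's method: it uses tf.keras in TensorFlow 2 eager mode, the class name SampledSoftmaxOutput and its predict_probs method are hypothetical, and the sizes are illustrative. The weights are stored as [num_classes, dim] so they can be passed to tf.nn.sampled_softmax_loss without a transpose.

```python
import tensorflow as tf

class SampledSoftmaxOutput(tf.keras.layers.Layer):
    """Hypothetical output layer that owns its weights.

    call() returns per-example sampled softmax losses for training;
    predict_probs() applies the full softmax for validation or inference.
    """

    def __init__(self, num_classes, dim, num_sampled, **kwargs):
        super().__init__(**kwargs)
        self.num_classes = num_classes
        self.num_sampled = num_sampled
        # Stored as [num_classes, dim], the layout sampled_softmax_loss expects.
        self.kernel = self.add_weight(
            name="kernel", shape=(num_classes, dim),
            initializer="glorot_uniform")
        self.bias = self.add_weight(
            name="bias", shape=(num_classes,), initializer="zeros")

    def call(self, inputs):
        # hidden: [batch, dim] floats; labels: [batch, 1] int64 class indices
        hidden, labels = inputs
        return tf.nn.sampled_softmax_loss(
            weights=self.kernel, biases=self.bias,
            labels=labels, inputs=hidden,
            num_sampled=self.num_sampled, num_classes=self.num_classes)

    def predict_probs(self, hidden):
        logits = tf.matmul(hidden, self.kernel, transpose_b=True) + self.bias
        return tf.nn.softmax(logits, axis=-1)

# Illustrative sizes and data (assumptions, not from the answer).
layer = SampledSoftmaxOutput(num_classes=50, dim=8, num_sampled=10)
hidden = tf.random.normal([4, 8])
labels = tf.constant([[3], [7], [0], [42]], dtype=tf.int64)

with tf.GradientTape() as tape:
    loss = tf.reduce_mean(layer([hidden, labels]))  # scalar training objective
grads = tape.gradient(loss, layer.trainable_variables)

probs = layer.predict_probs(hidden)  # full softmax with the same weights
```

Keeping both paths on one layer means the trained weights are shared automatically, so no second model specification has to copy layers across for inference.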

Comments:

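Apparently, not being able to reach into the model is a long-standing problem that, as far as I know, has not been solved. For example, github.com/keras-team/keras/issues/7395

Thanks for hacking on this. I recently became interested in this problem again and tried to run it. The proposed code does not seem to run. It looks like there is a dimension mismatch: tf.nn.sampled_softmax_loss expects 2D tensors, while this code ends up supplying 3D tensors. I'm looking into it further, but if you (or anyone) have any suggestions, I'd be happy to update!

Ah, I was using this sampled softmax layer on a more complex model, so I tried to strip it down to something bare-bones to post here. I'll update my original answer, but this was just me being silly and the model not matching. If you fix the input to align with shape (300,) rather than what I had, it should work. In other words, if you had used my dummy model with a full softmax and skipped this custom layer, it still would not have worked, because it was simply poorly specified.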
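On the 2D-versus-3D mismatch raised in the comments: with a sequence model such as an LSTM using return_sequences=True, the hidden states have shape [batch, time, dim]. One way to handle this, offered here as an assumption and not something the commenters confirmed, is to flatten the batch and time axes before calling tf.nn.sampled_softmax_loss and reshape the losses afterwards. The function name and sizes below are hypothetical.

```python
import tensorflow as tf

def sequence_sampled_softmax_loss(hidden_seq, label_seq, weights, biases,
                                  num_sampled, num_classes):
    """Sampled softmax loss for every time step of a sequence.

    hidden_seq: [batch, time, dim] floats
    label_seq:  [batch, time] integer class ids
    weights:    [num_classes, dim]
    biases:     [num_classes]
    Returns per-step losses with shape [batch, time].
    """
    dim = hidden_seq.shape[-1]
    flat_hidden = tf.reshape(hidden_seq, [-1, dim])                   # [batch*time, dim]
    flat_labels = tf.reshape(tf.cast(label_seq, tf.int64), [-1, 1])   # [batch*time, 1]
    flat_losses = tf.nn.sampled_softmax_loss(
        weights=weights, biases=biases,
        labels=flat_labels, inputs=flat_hidden,
        num_sampled=num_sampled, num_classes=num_classes)
    return tf.reshape(flat_losses, tf.shape(label_seq))

# Illustrative sizes only.
batch, time, dim, num_classes = 2, 5, 8, 50
losses = sequence_sampled_softmax_loss(
    tf.random.normal([batch, time, dim]),
    tf.random.uniform([batch, time], maxval=num_classes, dtype=tf.int64),
    tf.random.normal([num_classes, dim]),
    tf.zeros([num_classes]),
    num_sampled=10,
    num_classes=num_classes)
print(losses.shape)  # (2, 5)
```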
