Why are new dimensions added when coding the "padding mask"?

Posted 2023-03-04

【Asked】: 2022-01-14 19:20:18 【Question】:

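In the Mask section of the official TensorFlow tutorial "Transformer model for language understanding" (https://www.tensorflow.org/text/tutorials/transformer?hl=en), why is tf.newaxis added, and why does it have to be added at exactly these positions?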

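```python
import tensorflow as tf

def create_padding_mask(seq):
  # 1.0 where the token id is 0 (padding), else 0.0; shape (batch_size, seq_len)
  seq = tf.cast(tf.math.equal(seq, 0), tf.float32)
  # Add axes 1 and 2: (batch_size, seq_len) -> (batch_size, 1, 1, seq_len)
  return seq[:, tf.newaxis, tf.newaxis, :]
```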

【Answer 1】:

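This is because the mask is later used in the scaled_dot_product_attention function to map the logits at padded positions to $-\infty$ after the dot product, and that dot product has shape (batch_size, num_heads, seq_len_q, seq_len_k). It does this by adding $-\infty \cdot \text{mask}$ to the dot product (the tutorial actually adds mask * -1e9, a large finite negative number, since $-\infty \cdot 0$ would give NaN at the unpadded positions), so the mask must be broadcastable to the shape of the dot product. TensorFlow's broadcasting rules are very similar to NumPy's: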

> It starts with the trailing (i.e. rightmost) dimensions and works its way left. Two dimensions are compatible when
>
> 1. they are equal, or
> 2. one of them is 1
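The quoted rule is easy to check by hand. Below is a small pure-Python sketch of it, applied to the shapes from this question with made-up sizes (batch_size=2, num_heads=8, seq_len=5). The function name broadcast_shape is hypothetical, not a TensorFlow or NumPy API.

```python
from itertools import zip_longest


def broadcast_shape(a, b):
    """Hypothetical helper: return the broadcast shape of a and b, or raise ValueError.

    Compares dimensions from the trailing (rightmost) end leftward; a missing
    leading dimension is treated as 1, as in NumPy/TensorFlow broadcasting.
    """
    result = []
    for x, y in zip_longest(reversed(a), reversed(b), fillvalue=1):
        if x == y or y == 1:
            result.append(x)
        elif x == 1:
            result.append(y)
        else:
            raise ValueError(f"dimensions {x} and {y} are incompatible ({a} vs {b})")
    return tuple(reversed(result))


# Attention logits: (batch_size=2, num_heads=8, seq_len_q=5, seq_len_k=5)
logits_shape = (2, 8, 5, 5)

# Mask after seq[:, tf.newaxis, tf.newaxis, :]: broadcasts cleanly.
print(broadcast_shape((2, 1, 1, 5), logits_shape))  # (2, 8, 5, 5)

# Mask without the new axes: batch_size (2) is compared with seq_len_q (5) -> error.
try:
    broadcast_shape((2, 5), logits_shape)
except ValueError as err:
    print("error:", err)

# 2-D mask when batch_size happens to equal seq_len_q: no error, but the batch
# axis is silently aligned with the query axis, which is not what we want.
print(broadcast_shape((5, 5), (5, 8, 5, 5)))  # (5, 8, 5, 5)
```

The last case is why the axes must go exactly where they do: with a 2-D mask, batch_size lines up against seq_len_q, so if the two happened to be equal the addition would run without complaint but apply sequence b's padding mask to query position b of every sequence.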

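So axes 1 and 2 are added to the mask here to make it broadcastable: seq[:, tf.newaxis, tf.newaxis, :] turns (batch_size, seq_len) into (batch_size, 1, 1, seq_len). Compared from the right against (batch_size, num_heads, seq_len_q, seq_len_k), seq_len equals seq_len_k, the two size-1 axes stretch over num_heads and seq_len_q, and batch_size matches batch_size, so every head and every query position uses the same key mask for its sequence.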
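A minimal end-to-end sketch, assuming NumPy instead of TensorFlow so it runs without TF installed (np.newaxis plays the role of tf.newaxis, and the broadcasting behaves the same way). It ports create_padding_mask, then adds mask * -1e9 to random attention logits before a softmax over the key axis, roughly as the tutorial's scaled_dot_product_attention does with $\operatorname{softmax}(QK^\top/\sqrt{d_k} + (-10^9)\cdot\text{mask})$. The helper masked_softmax, the token ids, and the sizes are made up for illustration.

```python
import numpy as np


def create_padding_mask(seq):
    # NumPy port of the tutorial's TF function; token id 0 is assumed to mean padding.
    mask = (seq == 0).astype(np.float32)          # (batch_size, seq_len)
    return mask[:, np.newaxis, np.newaxis, :]     # (batch_size, 1, 1, seq_len)


def masked_softmax(logits, mask):
    # Hypothetical helper (not a tutorial/TF API).
    # logits: (batch_size, num_heads, seq_len_q, seq_len_k); mask broadcasts over axes 1 and 2.
    logits = logits + mask * -1e9
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    return weights / weights.sum(axis=-1, keepdims=True)


# Hypothetical batch: 2 sequences of length 5; the second ends in two padding tokens.
seq = np.array([[7, 3, 9, 4, 2],
                [5, 8, 1, 0, 0]])
mask = create_padding_mask(seq)
print(mask.shape)  # (2, 1, 1, 5)

rng = np.random.default_rng(0)
batch_size, num_heads, seq_len = 2, 8, 5
logits = rng.standard_normal((batch_size, num_heads, seq_len, seq_len))
weights = masked_softmax(logits, mask)
print(weights.shape)                # (2, 8, 5, 5)
print(weights[1, :, :, 3:].max())   # 0.0: padded keys get no attention
```

The padded keys of the second sequence receive zero weight in every head and every query row, because exp of a value near -1e9 underflows to 0.0, while each row of weights still sums to 1.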

【Comments】:

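Thank you very much. Afterwards I read through the code that follows and worked through it, and my understanding matches your answer. Thanks for the feedback.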
