The Keras MultiHeadAttention() class does not return the expected values
Posted: 2021-12-30 13:18:20

I want to match the output of the self_attention() function on page 339 of Chollet's book Deep Learning with Python with the output of the MultiHeadAttention() example on the same page. I wrote an example with the same inputs and got different results. Can someone explain why? For clarity, the self_attention() function is included below.
import numpy as np
from scipy.special import softmax
from tensorflow.keras.layers import MultiHeadAttention

def self_attention(input_sequence):
    output = np.zeros(shape=input_sequence.shape)
    # The output will consist of contextual embeddings of the same shape
    for i, pivot_vector in enumerate(input_sequence):
        scores = np.zeros(shape=(len(input_sequence),))
        for j, vector in enumerate(input_sequence):
            scores[j] = np.dot(pivot_vector, vector.T)  # Q K^T
        scores /= np.sqrt(input_sequence.shape[1])      # / sqrt(d_k)
        scores = softmax(scores)                        # softmax(Q K^T / sqrt(d_k))
        print(i, scores)
        new_pivot_representation = np.zeros(shape=pivot_vector.shape)
        for j, vector in enumerate(input_sequence):
            new_pivot_representation += vector * scores[j]
        output[i] = new_pivot_representation
    return output
test_input_sequence = np.array([[[1.0, 0.0, 0.0, 1.0],
                                 [0.0, 1.0, 0.0, 0.0],
                                 [0.0, 1.0, 1.0, 1.0]]])
test_input_sequence.shape
# (1, 3, 4)
self_attention(test_input_sequence[0])
"""
returns
[[0.50648039 0.49351961 0.30719589 0.81367628]
[0.23269654 0.76730346 0.38365173 0.61634827]
[0.21194156 0.78805844 0.57611688 0.78805844]]
the attention scores being:
[0.50648039 0.18632372 0.30719589]
[0.23269654 0.38365173 0.38365173]
[0.21194156 0.21194156 0.57611688]
"""
att_layer = MultiHeadAttention(num_heads=1,
                               key_dim=4,
                               use_bias=False,
                               attention_axes=(1,))
att_layer(test_input_sequence,
          test_input_sequence,
          test_input_sequence,
          return_attention_scores=True)
"""
returns
array([[[-0.46123487, 0.36683324, -0.47130704, -0.00722525],
[-0.49571565, 0.37488416, -0.52883905, -0.02713571],
[-0.4566634 , 0.38055322, -0.45884743, -0.00156384]]],
dtype=float32)
and the attention scores
array([[[[0.31446996, 0.36904442, 0.3164856 ],
[0.34567958, 0.2852166 , 0.36910382],
[0.2934979 , 0.3996053 , 0.30689687]]]], dtype=float32)>)
"""
Answer 1: I found the answer. The difference comes from the three dense (projection) layers applied to the query, key, and value, plus one dense layer after the attention block (that last dense layer is missing from Figure 11.8 in the book).
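These projections are easy to see on the layer itself once it has been built by the call above; a quick check of its trainable weights (the shapes below assume num_heads=1, key_dim=4, use_bias=False, and the 4-dimensional inputs used here):

# Inspect the dense kernels of the built MultiHeadAttention layer:
# three projection kernels (query, key, value) plus one output kernel.
for w in att_layer.weights:
    print(w.name, w.shape)
# Expected: (4, 1, 4) for each of query/key/value and (1, 4, 4) for the output.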
To reproduce the results of self_attention(), we just need to make these dense layers pass their inputs through unchanged, i.e. set them to identity weights:
i_4 = np.identity(4)
w_pt_4 = [i_4.reshape(4, 1, 4) for _ in range(3)] + [i_4.reshape(1, 4, 4)]
att_layer.set_weights(w_pt_4)
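With those identity weights in place, a quick sanity check (my own addition, not part of the original answer) should now reproduce the self_attention() output and scores up to float32 precision:

# With identity projections, the layer should match self_attention().
out, scores = att_layer(test_input_sequence,
                        test_input_sequence,
                        test_input_sequence,
                        return_attention_scores=True)
np.testing.assert_allclose(out[0].numpy(),
                           self_attention(test_input_sequence[0]),
                           rtol=1e-5, atol=1e-6)
print(scores[0, 0].numpy())   # matches the attention scores printed earlier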