LayerNormalization (layer normalization) TensorFlow code
Posted by 炫云云
Principle
Layer normalization normalizes the activations of the previous layer for each training example independently, rather than across the batch as batch normalization does. That is, it applies a transformation that keeps the mean activation within each example close to 0 and the activation standard deviation close to 1.
With scaling and centering, the normalization is computed as follows.
Let the intermediate activations of the mini-batch be the "inputs". For each input sample $x_i$ with $k$ features, we compute the mean and the sample variance:
$$\mu_{i} = \frac{\sum_{j}^{k} x_i[j]}{k} \qquad \sigma_i^2 = \frac{\sum_{j}^{k}\left(x_i[j]-\mu_{i}\right)^2}{k}$$
Then a normalized value $x_i^{\text{normalized}}$ is computed, using a small factor $\epsilon$ for numerical stability:
$$x_i^{\text{normalized}} = \frac{x_{i}-\mu_{i}}{\sqrt{\sigma_i^2+\epsilon}}$$
Finally, the result is scaled and shifted; the LN formula is:
$$LN\left(x_{i}\right)=\gamma \times \frac{x_{i}-\mu_{i}}{\sqrt{\sigma_i^2+\epsilon}}+\beta$$
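As a quick sanity check of the formulas above, here is a minimal NumPy sketch (the feature values are made up purely for illustration) that normalizes a single example with k = 4 features by hand:
import numpy as np

eps = 1e-3                    # small numerical-stability factor (epsilon above)
gamma, beta = 1.0, 0.0        # default scale and offset before any training

x_i = np.array([0.0, 10.0, 20.0, 30.0])     # one example with k = 4 features
mu_i = x_i.mean()                            # mean over the k features
var_i = x_i.var()                            # sample variance (divides by k)
x_norm = (x_i - mu_i) / np.sqrt(var_i + eps)
ln_out = gamma * x_norm + beta
print(ln_out)
"""
approximately [-1.342 -0.447  0.447  1.342]
"""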
TensorFlow code
tf.keras.layers.LayerNormalization(
    axis=-1,
    epsilon=1e-3,
    center=True,  # If True, add offset of `beta` to normalized tensor. If False, `beta` is ignored. Defaults to True.
    scale=True,   # If True, multiply by `gamma`. If False, `gamma` is not used. Defaults to True. When the next layer is linear (also e.g. `nn.relu`), this can be disabled since the scaling will be done by the next layer.
    beta_initializer='zeros',
    gamma_initializer='ones',
    beta_regularizer=None,
    gamma_regularizer=None,
    beta_constraint=None,
    gamma_constraint=None,
    trainable=True,
    name=None,
    **kwargs,
)
axis: Integer or List/Tuple. The axis or axes to normalize across. Typically this is the features axis/axes. The left-out axes are typically the batch axis/axes. This argument defaults to -1, the last dimension in the input.
epsilon: Small float added to variance to avoid dividing by zero. Defaults to 1e-3.
center: If True, add offset of beta to normalized tensor. If False, beta is ignored. Defaults to True.
scale: If True, multiply by gamma. If False, gamma is not used. Defaults to True. When the next layer is linear (also e.g. nn.relu), this can be disabled since the scaling will be done by the next layer.
beta_initializer: Initializer for the beta weight. Defaults to zeros.
gamma_initializer: Initializer for the gamma weight. Defaults to ones.
beta_regularizer: Optional regularizer for the beta weight. None by default.
gamma_regularizer: Optional regularizer for the gamma weight. None by default.
beta_constraint: Optional constraint for the beta weight. None by default.
gamma_constraint: Optional constraint for the gamma weight. None by default.
trainable: Boolean, if True the variables will be marked as trainable. Defaults to True.
Input shape: Arbitrary. Use the keyword argument input_shape (tuple of integers, does not include the samples axis) when using this layer as the first layer in a model.
Output shape: Same shape as input.
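As a hypothetical usage sketch (the layer sizes and argument values here are arbitrary, not from the original post), these arguments can be combined in a small Keras model; for example, scale can be turned off when the layer is followed by a linear Dense layer, as the scale description above suggests:
import tensorflow as tf

# Hypothetical model: normalize over the last (feature) axis with a custom epsilon,
# and disable `scale` because the following Dense layer can absorb the scaling.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(16,)),
    tf.keras.layers.LayerNormalization(axis=-1, epsilon=1e-6, center=True, scale=False),
    tf.keras.layers.Dense(4),
])
model.summary()
"""
With scale=False, the LayerNormalization layer only creates a `beta` weight,
so it contributes 16 trainable parameters for a 16-feature input.
"""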
Given a tensor inputs, the layer computes the moments and performs normalization over the axes specified in axis.
import numpy as np
import tensorflow as tf

data = tf.constant(np.arange(10).reshape(5, 2) * 10, dtype=tf.float32)
print(data)
"""
tf.Tensor(
[[ 0. 10.]
[20. 30.]
[40. 50.]
[60. 70.]
[80. 90.]], shape=(5, 2), dtype=float32)
"""
layer = tf.keras.layers.LayerNormalization(axis=1)
output = layer(data)
print(output)
"""
tf.Tensor(
[[-1. 1.]
[-1. 1.]
[-1. 1.]
[-1. 1.]
[-1. 1.]], shape=(5, 2), dtype=float32)
"""
Note that with layer normalization, the normalization happens within each example along the specified axes, rather than across different examples in the batch.
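One quick way to see this on the result computed above (a small sketch reusing the output tensor from the previous snippet) is to check that the mean and variance along axis 1 are roughly 0 and 1 for every example:
# Per-example statistics along the normalized axis: means ~0 and variances ~1
# (slightly below 1 because of the epsilon term); nothing is shared across the batch.
print(tf.reduce_mean(output, axis=1))
"""
roughly [0. 0. 0. 0. 0.]
"""
print(tf.math.reduce_variance(output, axis=1))
"""
roughly [1. 1. 1. 1. 1.]
"""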
If scale or center are enabled, the layer will scale the normalized outputs by broadcasting them with the trainable variable gamma, and center the outputs by broadcasting with the trainable variable beta. gamma defaults to a ones tensor and beta defaults to a zeros tensor, so centering and scaling are no-ops before training begins.
gamma and beta span the axes of the inputs specified in axis, and this part of the input's shape must be fully defined:
layer = tf.keras.layers.LayerNormalization(axis=[1, 2, 3])
layer.build([5, 20, 30, 40])
print(layer.beta.shape)
"""
(20, 30, 40)
"""
print(layer.gamma.shape)
"""
(20, 30, 40)
"""
Note that other implementations of layer normalization may choose to define gamma and beta over a different set of axes from the axes being normalized. For example, group normalization (Wu et al., 2018) with group size 1 corresponds to a layer normalization that normalizes across height, width, and channels, with gamma and beta spanning only the channel dimension. So this layer normalization implementation will not match a group normalization layer with group size set to 1.
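For comparison, a sketch of that mismatch (assuming a tf.keras.layers.GroupNormalization layer is available, which is only the case in newer TensorFlow/Keras releases) might look like this:
# LayerNormalization over H, W, C: gamma and beta span all three normalized axes.
ln = tf.keras.layers.LayerNormalization(axis=[1, 2, 3])
ln.build([5, 20, 30, 40])
print(ln.gamma.shape)
"""
(20, 30, 40)
"""
# GroupNormalization with a single group also normalizes over H, W and C,
# but its gamma and beta span only the channel axis, so the two layers differ.
gn = tf.keras.layers.GroupNormalization(groups=1)
gn.build([5, 20, 30, 40])
print(gn.gamma.shape)
"""
(40,)
"""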