Batch Normalization at inference time in TensorFlow
Posted: 2020-04-03 19:22:09
Question:
I have loaded a trained checkpoint file for inference. I have extracted beta, the moving mean, the moving variance and all the weights from the model. But when I compute the output of batch_normalization by hand, I get the wrong result.
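[Update]
I am sharing my code here. It loads the checkpoint, prints the input to batch normalization, prints beta, the moving mean and the moving variance, and prints the output of batch normalization on the console.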
import tensorflow as tf
import cv2
import numpy as np
import time
import os

def main():
    with tf.Session() as sess:
        # [INFO] code for loading the checkpoint
        #---------------------------------------------------------------------
        saver = tf.train.import_meta_graph("./bag-model-34000.meta")
        saver.restore(sess, tf.train.latest_checkpoint("./"))
        graph = tf.get_default_graph()
        input_place = graph.get_tensor_by_name('input/image_input:0')
        op = graph.get_tensor_by_name('output/image_output:0')
        #----------------------------------------------------------------------
        # [INFO] generating input data matching the input tensor shape
        #----------------------------------------------------------------------
        input_data = np.random.randint(255, size=(1, 320, 240, 3)).astype(float)
        #----------------------------------------------------------------------
        # [INFO] code to get all tensor names
        #----------------------------------------------------------------------
        operations = sess.graph.get_operations()
        ind = 0
        tens_name = []  # store all tensor names in a list
        for operation in operations:
            #print(ind,"> ", operation.name, "=> \n", operation.values())
            if (operation.values()):
                # extract the name from the printed form "(<tf.Tensor 'name:0' ...>,)"
                name_of_tensor = str(operation.values()).split()[1][1:-1]
                tens_name.append(name_of_tensor)
            ind = ind + 1
        #------------------------------------------------------------------------
        # [INFO] printing the input to batch normalization, beta, moving mean and
        # moving variance so I can calculate the batch normalization output manually
        #------------------------------------------------------------------------
        tensor_number = 0
        for tname in tens_name:  # looping through each tensor name
            if tensor_number <= 812:  # I am interested in the first 812 tensors
                tensor = graph.get_tensor_by_name(tname)
                # feed_dict must be a dict mapping placeholders to values
                tensor_values = sess.run(tensor, feed_dict={input_place: input_data})
                print("tensor: ", tensor_number, ": ", tname, ": \n\t\t", tensor_values.shape)
                # [INFO] tensor 28 is named "input/conv1/conv1_1/separable_conv2d:0";
                # its output is the input to the batch normalization
                if tensor_number == 28:
                    print(tensor_values)        # [[[[-0.03182551 0.00226904 0.00440771 ...
                    print(tensor_values.shape)  # (1, 320, 240, 32)
                # [INFO] tensor 31 is named "conv1/conv1_1/BatchNorm/beta:0";
                # its output is all the betas
                if tensor_number == 31:
                    print(tensor_values)        # [ 0.04061257 -0.16322449 -0.10942575 ...
                    print(tensor_values.shape)  # (32,)
                # [INFO] tensor 35 is named "conv1/conv1_1/BatchNorm/moving_mean:0";
                # its output is all the moving means
                if tensor_number == 35:
                    print(tensor_values)        # [-0.0013569 0.00618145 0.00248459 ...
                    print(tensor_values.shape)  # (32,)
                # [INFO] tensor 39 is named "conv1/conv1_1/BatchNorm/moving_variance:0";
                # its output is all the moving variances
                if tensor_number == 39:
                    print(tensor_values)        # [4.48082483e-06 1.21615967e-05 5.37582537e-06 ...
                    print(tensor_values.shape)  # (32,)
                # [INFO] tensor 44 is named "input/conv1/conv1_1/BatchNorm/FusedBatchNorm:0";
                # it performs the batch normalization, so this prints its output
                if tensor_number == 44:
                    print(tensor_values)        # [[[[-8.45019519e-02 1.23237416e-01 -4.60943699e-01 ...
                    print(tensor_values.shape)  # (1, 320, 240, 32)
            tensor_number = tensor_number + 1
        #---------------------------------------------------------------------------------------------

if __name__ == "__main__":
    main()
So, after running the above code from the console, I get the input to batch normalization, which is the output of the tensor "input/conv1/conv1_1/separable_conv2d:0".
I take the first value of that output as x:
input x = -0.03182551
Beta, the moving mean and the moving variance are also printed on the console, and I take the first value from each array:
beta = 0.04061257
moving mean = -0.0013569
moving variance = 4.48082483e-06
epsilon = 0.001 (the default value)
Gamma is ignored, because I set scale = False at training time.
Computing the output of batch normalization at inference time for the given input x:
x_hat = (x - moving_mean) / sqrt(moving_variance + epsilon)
      = (-0.03182551 - (-0.0013569)) / sqrt(0.00000448082483 + 0.001)
      = -0.961350647
y = gamma * x_hat + beta
Gamma is ignored, so the equation becomes
y = x_hat + beta
  = -0.961350647 + 0.04061257
  = -0.920738077
So the manual calculation at inference time gives y = -0.920738077,
but the program shows y = -8.45019519e-02,
which is the output of the "input/conv1/conv1_1/BatchNorm/FusedBatchNorm:0" tensor.
That is very different from what I calculated. Is my equation wrong? What changes
do I have to make to the x_hat and y equations above to get this value?
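The hand calculation above can be reproduced in a few lines of Python. This is only a minimal sketch using the channel-0 values printed in the question; epsilon = 0.001 is the default assumed by the asker, not a value read from the checkpoint.

```python
import math

# Values printed for channel 0 in the question
x = -0.03182551
beta = 0.04061257
moving_mean = -0.0013569
moving_variance = 4.48082483e-06
epsilon = 0.001  # assumption: the default value stated in the question

# scale=False at training time, so gamma is omitted
x_hat = (x - moving_mean) / math.sqrt(moving_variance + epsilon)
y = x_hat + beta

# First value of the FusedBatchNorm output printed on the console
observed = -8.45019519e-02

print(f"x_hat         = {x_hat:.6f}")
print(f"y (manual)    = {y:.6f}")
print(f"y (observed)  = {observed:.6f}")
print(f"difference    = {observed - y:.6f}")
```

The arithmetic itself checks out, so the mismatch has to come from the statistics the op actually uses rather than from the formula.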
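So I am confused about why my calculated result differs so much from the actual output.
I also checked beta, the moving mean and the moving variance using tf.compat.v1.global_variables(). All the values match the beta, moving mean and moving variance printed on the console.
So why do I get the wrong result after manually substituting these values into the x_hat and y formulas?
I am also providing my console output here, from tensor_number 28 to 44: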
tensor: 28 : input/conv1/conv1_1/separable_conv2d:0 :
(1, 320, 240, 32)
[[[[-0.03182551 0.00226904 0.00440771 ... -0.01204819 0.02620635
tensor: 29 : input/conv1/conv1_1/BatchNorm/Const:0 :
(32,)
tensor: 30 : conv1/conv1_1/BatchNorm/beta/Initializer/zeros:0 :
(32,)
tensor: 31 : conv1/conv1_1/BatchNorm/beta:0 :
(32,)
[ 0.04061257 -0.16322449 -0.10942575 0.05056419 -0.13785222 0.4060304
tensor: 32 : conv1/conv1_1/BatchNorm/beta/Assign:0 :
(32,)
tensor: 33 : conv1/conv1_1/BatchNorm/beta/read:0 :
(32,)
tensor: 34 : conv1/conv1_1/BatchNorm/moving_mean/Initializer/zeros:0 :
(32,)
tensor: 35 : conv1/conv1_1/BatchNorm/moving_mean:0 :
(32,)
[-0.0013569 0.00618145 0.00248459 0.00340403 0.00600711 0.00291052
tensor: 36 : conv1/conv1_1/BatchNorm/moving_mean/Assign:0 :
(32,)
tensor: 37 : conv1/conv1_1/BatchNorm/moving_mean/read:0 :
(32,)
tensor: 38 : conv1/conv1_1/BatchNorm/moving_variance/Initializer/ones:0 :
(32,)
tensor: 39 : conv1/conv1_1/BatchNorm/moving_variance:0 :
(32,)
[4.48082483e-06 1.21615967e-05 5.37582537e-06 1.40261754e-05
tensor: 40 : conv1/conv1_1/BatchNorm/moving_variance/Assign:0 :
(32,)
tensor: 41 : conv1/conv1_1/BatchNorm/moving_variance/read:0 :
(32,)
tensor: 42 : input/conv1/conv1_1/BatchNorm/Const_1:0 :
(0,)
tensor: 43 : input/conv1/conv1_1/BatchNorm/Const_2:0 :
(0,)
tensor: 44 : input/conv1/conv1_1/BatchNorm/FusedBatchNorm:0 :
(1, 320, 240, 32)
[[[[-8.45019519e-02 1.23237416e-01 -4.60943699e-01 ... 3.77691090e-01
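Discussion:
How do you extract your values? You have a mean of 1e-3 and a variance of 4.5e-6, which means a value of 0.02 is many positive standard deviations away, so a normalized value of 10 seems entirely plausible to me for these values. I therefore suspect that these are not the correct values for the batch norm layer, or that your input value is not correct. Please update your question with how you obtain these values and the input (for example, is the input normalized before it is fed to the model?).
Thanks for your comment. I have now shared my code, which shows how I obtain the input to batch normalization, beta, the moving mean and the moving variance.
I arrive at the same values as you. Can you print the value of tensor 29? I think it might be a value that affects your x tensor, but I am not sure, since it has both the 'input' scope and the batch norm scope. Can you explain that?
I have solved the problem: the batch normalization op uses the batch mean and batch variance, with beta as 0, instead of the provided moving mean, moving variance and beta. So I calculated the batch mean and batch variance and substituted those values into the equation, and now it gives the correct output. Thanks for your help.
Answer 1:
I have solved the problem: the batch normalization op behaves as if it were in training mode.
It therefore uses the batch mean and batch variance, with beta as 0, instead of the provided moving mean, moving variance and beta.
So I calculated the batch mean and batch variance and substituted those values into the equation, and now it gives the correct output.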
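The difference between the two modes can be sketched in NumPy. This is an illustration under stated assumptions, not the FusedBatchNorm implementation: the input is random stand-in data in NHWC layout, the stored parameters are made-up values of the same magnitude as those printed in the question, and the training-mode branch uses beta = 0 only because that is what this answer reports.

```python
import numpy as np

def batch_norm(x, mean, var, beta, eps=0.001):
    """Normalize x per channel (last axis); gamma is omitted (scale=False)."""
    return (x - mean) / np.sqrt(var + eps) + beta

rng = np.random.default_rng(0)
# Hypothetical stand-in for the separable_conv2d output, NHWC layout
x = rng.normal(loc=0.0, scale=0.02, size=(1, 8, 6, 4))

# Hypothetical stored parameters standing in for the checkpoint values
beta = np.array([0.04, -0.16, -0.11, 0.05])
moving_mean = np.array([-0.001, 0.006, 0.002, 0.003])
moving_var = np.array([4.5e-6, 1.2e-5, 5.4e-6, 1.4e-5])

# Inference mode: the stored moving statistics and beta
y_infer = batch_norm(x, moving_mean, moving_var, beta)

# Training mode: per-channel statistics of the current batch over N, H, W,
# with beta = 0 as reported in the answer
batch_mean = x.mean(axis=(0, 1, 2))
batch_var = x.var(axis=(0, 1, 2))
y_train = batch_norm(x, batch_mean, batch_var, beta=0.0)

print("max abs difference:", np.abs(y_infer - y_train).max())
```

Substituting the real tensor 28 output for x and the real checkpoint values for the stored parameters would show which of the two branches matches the FusedBatchNorm output.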
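So how can I force it to use the moving mean, the moving variance and the provided beta? I tried to make this change by setting training to False, but it does not work.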
for tname in tens_name:  # looping through each tensor name
    if tensor_number <= 812:  # I am interested in the first 812 tensors
        training = tf.placeholder(tf.bool, name='training')
        is_training = tf.placeholder(tf.bool, name='is_training')
        tensor = graph.get_tensor_by_name(tname)
        # feed_dict must be a dict mapping placeholders to values
        tensor_values = sess.run(tensor, feed_dict={is_training: False, training: False, input_place: input_data})
In the actual code, is_training is True:
def load_cnn(self, keep_prob=0.5, num_filt=32, num_layers=2, is_training=True):
    self.reuse = False
    with tf.name_scope('input'):
        self.image_input = tf.placeholder(tf.float32, shape=[None, None, None, 3], name='image_input')
        net = self.image_input
        with slim.arg_scope([slim.separable_conv2d],
                            depth_multiplier=1,
                            normalizer_fn=slim.batch_norm,
                            normalizer_params={'is_training': is_training},
                            activation_fn=tf.nn.relu,
                            weights_initializer=tf.truncated_normal_initializer(0.0, 0.01),
                            weights_regularizer=slim.l2_regularizer(0.0005)):
            # Down Scaling
            # Block 1
            net = slim.repeat(net, 2, slim.separable_conv2d, num_filt, [3, 3], scope='conv1')
            print('en_conv1', net.shape, net.name)  # 320x240x3 -> 316x236x32
            self.cnn_layer1 = net
            # Down Sampling
            net = slim.max_pool2d(net, [2, 2], scope='pool1')
            print('en_maxpool1', net.shape, net.name)  # 316x236x32 -> 158x118x32
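Discussion:
Are your 'training' and 'is_training' actually used in the graph, or do you only define them and feed them to the model without putting them in the graph?
is_training is used in the graph.
I ask because in the snippet you provided, you define them in your code but do not put them in the graph. Can you show where you use them in your actual code?
Instead of passing True to your function, pass a placeholder to your load_cnn function. In your first code, you are creating a new 'is_training' placeholder that has nothing to do with the one used in the graph.
After loading the checkpoint I only have a checkpoint file. How can I set it to False and use it for inference?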