Unknown variables in tensorflow state causes error in training operation
Posted: 2018-06-24 10:17:00
Question: I created two tensors (one depends on the other) as follows:
weights = tf.random_normal(shape=(3, 3, 1, 64))
filters = get_filters(weights) # get_filters does some operation on weights
After the operations above, weights and filters look like this:
<tf.Tensor 'random_normal_1:0' shape=(3, 3, 1, 64) dtype=float32>
<tf.Tensor 'filters_1/weights:0' shape=(5, 3, 3, 1, 64) dtype=float32>
Now I pass these tensors to the following function:
def get_alphas(weights, filters, no_filters=5,
               epochs=500, name=None):
    with tf.name_scope(name, default_name="alpha_scope"):
        weights = tf.reshape(weights, [-1], name="reshaped_weights")
        filters = tf.reshape(filters, [no_filters, -1], name="reshaped_binary_filters")
        alphas = tf.Variable(tf.zeros(shape=(no_filters, 1)), name="alphas")
        weighted_sum = tf.reduce_sum(tf.multiply(alphas, filters), axis=0, name="weighted_sum")
        error = tf.square(weights - weighted_sum, name="error")
        loss = tf.reduce_mean(tf.reshape(error, [-1]), name="loss")
        # Optimizer
        optimizer = tf.train.AdamOptimizer()
        training_op = optimizer.minimize(loss, name="training_op")
        print(tf.global_variables())
        init = tf.variables_initializer([alphas])
        with tf.Session() as sess:
            init.run()
            epoch = 0
            while epoch < epochs:
                _, loss_train = sess.run([training_op, loss])  # <-- this is where the error is generated
                print("\rIteration: {}/{} ({:.1f}%) Loss: {:.5f}".format(
                    epoch+1, epochs,
                    epoch * 100 / epochs,
                    loss_train),
                    end="")
                epoch += 1
            return tf.convert_to_tensor(sess.run(alphas))
When I call get_alphas(weights, filters), I get the following error:
FailedPreconditionError (see above for traceback): Attempting to use uninitialized value alpha_scope/beta1_power
[[Node: alpha_scope/beta1_power/read = Identity[T=DT_FLOAT, _class=["loc:@alpha_scope/alphas"], _device="/job:localhost/replica:0/task:0/device:GPU:0"](alpha_scope/beta1_power)]]
[[Node: alpha_scope/loss/_1 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_115_alpha_scope/loss", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
So I printed all the variables in TensorFlow with tf.global_variables(), and there are some unknown variables that I did not define (beta1_power, beta2_power). These are what cause the error:
[<tf.Variable 'alpha_scope/alphas:0' shape=(5, 1) dtype=float32_ref>,
<tf.Variable 'alpha_scope/beta1_power:0' shape=() dtype=float32_ref>,
<tf.Variable 'alpha_scope/beta2_power:0' shape=() dtype=float32_ref>,
<tf.Variable 'alpha_scope/alphas/Adam:0' shape=(5, 1) dtype=float32_ref>,
<tf.Variable 'alpha_scope/alphas/Adam_1:0' shape=(5, 1) dtype=float32_ref>]
Any idea how these variables are created, or how to initialize them?
I cannot use tf.global_variables_initializer(), because it might reset some variables that already hold state.
Answer 1: These variables come from tf.train.AdamOptimizer (see this question). Since you do
init = tf.variables_initializer([alphas])
...
init.run()
...you only initialize alphas, not the slots in AdamOptimizer. If you cannot use tf.global_variables_initializer(), you have to get all of those variables manually by name and initialize them yourself.
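One possible way to do that manual initialization is sketched below. It assumes a TensorFlow build that ships the tf.compat.v1 module and relies on Optimizer.variables() to list the state the optimizer created; the variable named existing and the simplified loss are hypothetical stand-ins, not part of the original code.

```python
import tensorflow.compat.v1 as tf  # assumption: TF 1.x-style graph mode via compat.v1

tf.disable_v2_behavior()

# Hypothetical stand-in for state elsewhere in the graph that must not be reset.
existing = tf.Variable(1.0, name="existing")

with tf.name_scope("alpha_scope"):
    alphas = tf.Variable(tf.zeros(shape=(5, 1)), name="alphas")
    # Simplified loss, only here so that minimize() has something to work on.
    loss = tf.reduce_mean(tf.square(alphas - 1.0), name="loss")
    optimizer = tf.train.AdamOptimizer()
    training_op = optimizer.minimize(loss, name="training_op")

# State created by the optimizer: beta1_power, beta2_power and the two Adam slots.
opt_vars = optimizer.variables()

# Initialize only alphas and the optimizer state, nothing else.
init = tf.variables_initializer([alphas] + opt_vars)

with tf.Session() as sess:
    sess.run(existing.initializer)
    sess.run(existing.assign(5.0))  # pretend this value was learned earlier
    sess.run(init)                  # leaves `existing` untouched
    _, loss_after = sess.run([training_op, loss])
    existing_after = sess.run(existing)
```

If collecting variables by scope is preferred, tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope="alpha_scope") should return the same set here, since both alphas and the optimizer state were created under that name scope.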
Comments:
I changed the optimizer to GradientDescentOptimizer and now it works; I will try initializing the variables of AdamOptimizer. As a side question, why doesn't tf.local_variables_initializer help?
Manual variable initialization for AdamOptimizer worked. Thanks!
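As background on why switching to GradientDescentOptimizer avoided the error: plain gradient descent keeps no state of its own, whereas Adam tracks two moment estimates per variable plus running powers of its two decay rates. The pure-Python sketch below is an illustration only, not TensorFlow's implementation; the class and function names are made up, and the comments map each field to the variable names printed by tf.global_variables() in the question.

```python
import math


def sgd_step(x, grad, lr=0.1):
    # Plain gradient descent: nothing to store between steps.
    return x - lr * grad


class AdamState:
    """State Adam carries between steps (illustrative names)."""

    def __init__(self, beta1=0.9, beta2=0.999):
        self.m = 0.0              # first-moment estimate, cf. 'alphas/Adam'
        self.v = 0.0              # second-moment estimate, cf. 'alphas/Adam_1'
        self.beta1_power = beta1  # cf. 'beta1_power'
        self.beta2_power = beta2  # cf. 'beta2_power'


def adam_step(x, grad, state, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    # Update the moving averages of the gradient and its square.
    state.m = beta1 * state.m + (1 - beta1) * grad
    state.v = beta2 * state.v + (1 - beta2) * grad * grad
    # Bias-corrected step size, computed from the running beta powers.
    lr_t = lr * math.sqrt(1 - state.beta2_power) / (1 - state.beta1_power)
    x = x - lr_t * state.m / (math.sqrt(state.v) + eps)
    # Advance the powers for the next step.
    state.beta1_power *= beta1
    state.beta2_power *= beta2
    return x


# One step on f(x) = (x - 3)^2 starting from x = 0, where the gradient is -6.
state = AdamState()
x_adam = adam_step(0.0, -6.0, state, lr=0.1)
x_sgd = sgd_step(0.0, -6.0, lr=0.1)
```

Every field of AdamState has to hold a value before the first step runs, which is the same requirement that surfaces in TensorFlow as the uninitialized beta1_power error.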