Compute gradients for each time step of tf.while_loop

Posted 2023-02-16


Question (posted 2018-09-08 08:23:09):

Given a TensorFlow tf.while_loop, how can I compute the gradients of x_out with respect to all weights of the network, for each time step?

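import tensorflow as tf  # TensorFlow 1.x graph-mode API (tf.placeholder, tf.while_loop)

network_input = tf.placeholder(tf.float32, [None])
steps = tf.constant(0.0)

weight_0 = tf.Variable(1.0)
layer_1 = network_input * weight_0

# Keep looping while steps <= 5, so the body runs six times (steps = 0 to 5).
def condition(steps, x):
    return steps <= 5

# One time step: scale the carried value by weight_1 and count the step.
def loop(steps, x_in):
    weight_1 = tf.Variable(1.0)  # declared inside the loop body (see the comments below)
    x_out = x_in * weight_1
    steps += 1
    return [steps, x_out]

# x_final is only the last step's output; the intermediate x_out values are not kept.
_, x_final = tf.while_loop(
    condition,
    loop,
    [steps, layer_1]
)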
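For this toy loop, the per-step gradients the question asks for can be written out by hand. Write u for one element of network_input, w_0 for weight_0 and w_1 for weight_1, and assume (as the comments below establish) that a single weight_1 is shared by all iterations. The body runs six times, so x_final is x_6. Because w_1 is reused, the derivative of x_t with respect to it sums one contribution per iteration, each propagated forward by the remaining factors of w_1:

```latex
x_0 = u\,w_0, \qquad x_t = x_{t-1}\,w_1 = u\,w_0\,w_1^{t}, \qquad t = 1,\dots,6

\frac{\partial x_t}{\partial w_0} = u\,w_1^{t}

\frac{\partial x_t}{\partial w_1}
  = \sum_{k=1}^{t} x_{k-1}\,w_1^{\,t-k}
  = \sum_{k=1}^{t} u\,w_0\,w_1^{\,t-1}
  = t\,u\,w_0\,w_1^{\,t-1}
```

With both weights at their initial value 1.0 this gives u and t·u. Note that tf.gradients returns the derivative of the sum of its ys, so over a batch these per-element values would be summed.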

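Some notes:

tf.gradients(x, tf.trainable_variables()) fails with AttributeError: 'WhileContext' object has no attribute 'pred'.

It seems the only way to use tf.gradients inside the loop would be to compute the gradient with respect to weight_1 and the current value of x_in, i.e. for the current time step only, without backpropagating through time.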

Comments on the question:

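Are you sure you are interested in x_out and not x_final?

Yes. The network is an autoregressive model, as in image captioning. At each time step it outputs a probability distribution over actions, until it decides it is "done". I need the gradients for every output (action), not just the last one.

Are you trying to create a new variable in every tf.while_loop iteration? TensorFlow cannot do that. With your current code you create only two variables: one for layer_1 and one that is shared by all loop iterations.

No, I don't want to create new variables in each iteration. I just want to backpropagate through time: compute the gradient of x_out at every time step with respect to weight_0 and weight_1.

Then why do you declare weight_1 = tf.Variable(1.0) inside the loop? Did you actually mean tf.get_variable?

Answer 1: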

Based on this and this, you can never call tf.gradients inside a tf.while_loop in TensorFlow. I found this out when I tried to implement conjugate gradient descent entirely inside the TensorFlow graph.
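One way around calling tf.gradients inside the loop is to keep every step's output while the loop runs and differentiate each one only after the loop has finished (in graph mode, a tf.TensorArray written inside the loop is the usual container for per-iteration values). The sketch below shows the idea without TensorFlow so it runs anywhere: it assumes scalar weights, and forward and grads_per_step are hypothetical helpers that do backpropagation through time by hand for the question's x_t = x_{t-1} * w1 recurrence.

```python
# Framework-free sketch of backpropagation through time for the question's loop:
#   x_0 = u * w0,  x_t = x_{t-1} * w1  while steps <= max_step.
# Every x_t is kept during the forward pass; gradients are computed afterwards,
# outside the loop. forward/grads_per_step are hypothetical helper names.


def forward(u, w0, w1, max_step=5):
    """Run the loop and return [x_0, x_1, ..., x_T]."""
    xs = [u * w0]                # layer_1
    steps = 0
    while steps <= max_step:     # same condition as the question: steps <= 5
        xs.append(xs[-1] * w1)
        steps += 1
    return xs


def grads_per_step(u, w1, xs):
    """Return [(dx_t/dw0, dx_t/dw1) for t = 1..T] using reverse-mode accumulation."""
    result = []
    for t in range(1, len(xs)):
        g_x = 1.0    # d x_t / d x_k, starting at k = t
        g_w1 = 0.0   # weight_1 is shared, so its contributions are summed
        for k in range(t, 0, -1):
            g_w1 += g_x * xs[k - 1]   # x_k = x_{k-1} * w1  =>  dx_k/dw1 = x_{k-1}
            g_x *= w1                 # dx_k/dx_{k-1} = w1
        g_w0 = g_x * u                # x_0 = u * w0  =>  dx_0/dw0 = u
        result.append((g_w0, g_w1))
    return result


u, w0, w1 = 2.0, 1.0, 1.0            # both weights at their initial value 1.0
xs = forward(u, w0, w1)
per_step = grads_per_step(u, w1, xs)
# per_step → [(2.0, 2.0), (2.0, 4.0), (2.0, 6.0), (2.0, 8.0), (2.0, 10.0), (2.0, 12.0)]
```

Because weight_1 is shared across iterations, its gradient picks up one term for every step walked back through, which is why the result grows as t·u·w0·w1^(t-1).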

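But if I understand your model correctly, you could write your own RNNCell and wrap it in tf.nn.dynamic_rnn. The actual cell implementation would be somewhat complicated, though, because the stopping condition has to be evaluated dynamically at run time.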
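A different technique, forward-mode differentiation, fits the idea of a cell whose stopping condition is evaluated at run time: every carried value also carries its derivatives, so each step's gradient exists as soon as that step is computed, and no gradient call is needed inside the loop. This is a framework-free sketch, not TensorFlow's RNNCell API; the Dual class and the x > threshold rule are hypothetical stand-ins for the model deciding it is "done".

```python
from dataclasses import dataclass


@dataclass
class Dual:
    """A value plus its partial derivatives with respect to (w0, w1)."""
    val: float
    d_w0: float
    d_w1: float

    def __mul__(self, other: "Dual") -> "Dual":
        # Product rule, applied to each partial derivative separately.
        return Dual(
            self.val * other.val,
            self.d_w0 * other.val + self.val * other.d_w0,
            self.d_w1 * other.val + self.val * other.d_w1,
        )


def run(u: float, w0: float, w1: float, threshold: float, max_steps: int = 100) -> list:
    """Unroll x_t = x_{t-1} * w1 until x_t > threshold (hypothetical "done" rule)."""
    weight_0 = Dual(w0, 1.0, 0.0)      # seed: dw0/dw0 = 1
    weight_1 = Dual(w1, 0.0, 1.0)      # seed: dw1/dw1 = 1
    x = Dual(u, 0.0, 0.0) * weight_0   # the input is a constant; x_0 = u * w0
    outputs = []
    for _ in range(max_steps):         # hard cap so the loop always terminates
        x = x * weight_1
        outputs.append(x)              # this step's gradients are already in x
        if x.val > threshold:          # data-dependent stopping condition
            break
    return outputs


outs = run(u=1.0, w0=1.0, w1=2.0, threshold=10.0)
# [o.val for o in outs]  → [2.0, 4.0, 8.0, 16.0]
# [o.d_w1 for o in outs] → [1.0, 4.0, 12.0, 32.0]
```

Forward mode tracks one extra derivative per weight at every step, so it suits a handful of weights; with many weights, recording the steps and backpropagating afterwards is usually cheaper.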

To get started, you can look at TensorFlow's dynamic_rnn code here.

Also, dynamic graphs have never been TensorFlow's strong point, so consider another framework such as PyTorch, or try eager execution to see whether it helps.
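The point about dynamic graphs can be seen in a toy define-by-run autodiff: operations are recorded as they execute, so the loop can run for any data-dependent number of steps, and each step's output is an ordinary object that can be differentiated on its own afterwards. This is roughly how eager execution and PyTorch behave, but Node and grad below are illustrative names, not either library's API.

```python
class Node:
    """A scalar recorded as it is computed, with links to the values it came from."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents   # tuple of (parent_node, local_derivative)

    def __mul__(self, other):
        # d(a*b)/da = b, d(a*b)/db = a
        return Node(self.value * other.value,
                    ((self, other.value), (other, self.value)))


def grad(output, wrt):
    """Reverse-mode gradients of `output` with respect to each node in `wrt`."""
    order, seen = [], set()

    def visit(node):                  # depth-first post-order: parents first
        if id(node) in seen:
            return
        seen.add(id(node))
        for parent, _ in node.parents:
            visit(parent)
        order.append(node)

    visit(output)
    adjoint = {id(output): 1.0}
    for node in reversed(order):      # every consumer is handled before its inputs
        g = adjoint.get(id(node), 0.0)
        for parent, local in node.parents:
            adjoint[id(parent)] = adjoint.get(id(parent), 0.0) + g * local
    return [adjoint.get(id(w), 0.0) for w in wrt]


# Same recurrence as the question, recorded step by step as it runs.
u, w0, w1 = Node(3.0), Node(1.0), Node(1.0)
x = u * w0
outputs = []
for _ in range(6):
    x = x * w1
    outputs.append(x)

per_step = [grad(x_t, [w0, w1]) for x_t in outputs]
# per_step → [[3.0, 3.0], [3.0, 6.0], [3.0, 9.0], [3.0, 12.0], [3.0, 15.0], [3.0, 18.0]]
```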

Comments:

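Did you find a workaround for conjugate gradient entirely in TF (avoiding eager execution)?

I did, yes. It is not trivial. Have a look at this repository; they have a very robust implementation.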
