How to compute the gradient of the bias with TensorFlow's GradientTape()

Question

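I'm looking to use GradientTape() in a custom NN architecture, but I can't find any explanation of how to use it to compute the gradient with respect to the bias. A similar question was answered here, but the answer didn't fully cover this.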

As a simple example, my NN has a training step like this:

self.W = ...  ## Initialized earlier on
self.b = ...  ## Initialized earlier on

@tf.function
def train(self):
    with tf.GradientTape() as tape:
        pred = self.feedforward()
        loss = self.loss_evaluation()
    grad = tape.gradient(loss, self.W)
    grad = tape.gradient(loss, self.b)  ## How do I do this?

    optimizer.apply_gradients(zip(grad, self.W))
    optimizer.apply_gradients(zip(grad, self.b))  ## How do I do this?
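For reference, a minimal sketch of what the two marked lines could look like, assuming a toy linear model pred = x @ W + b (the tensors below are hypothetical, not the asker's). The point it illustrates is that tape.gradient accepts a list of sources and returns one gradient per variable in the same order, so the bias is handled exactly like the weights. A non-persistent tape can also only compute gradients once, which is another reason to request both in a single call. The update is done by hand with assign_sub rather than a Keras optimizer.

```python
import tensorflow as tf

# Hypothetical toy model: pred = x @ W + b, with W and b as plain tf.Variables.
W = tf.Variable([[1.0], [2.0]])  # weights, shape (2, 1)
b = tf.Variable([0.5])           # bias, shape (1,), broadcast over the batch
x = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])  # batch of 3 samples

with tf.GradientTape() as tape:
    pred = tf.matmul(x, W) + b   # forward pass, recorded by the tape
    loss = tf.reduce_sum(pred)   # toy scalar loss

# One call, a list of sources: one gradient per variable, same order.
grad_W, grad_b = tape.gradient(loss, [W, b])

# Plain gradient-descent update applied to both variables, bias included.
learning_rate = 0.1
for var, grad in zip([W, b], [grad_W, grad_b]):
    var.assign_sub(learning_rate * grad)
```

Here grad_W is the column sums of x, [[9.0], [12.0]], and grad_b is [3.0] because the bias contributes once per sample in the batch.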

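Simply put, I can't compute the gradient with respect to the bias, because I haven't found the bias term covered in any documentation or tutorial. So how do I implement the bias term as a trainable variable in my code? I don't want to use Keras for this, so suggestions to use trainable_variables don't help; I want to build it from scratch.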
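A fuller from-scratch sketch, assuming a single linear layer trained with mean squared error and plain gradient descent (the class name LinearModel, the learning rate, and the synthetic data are illustrative assumptions, not part of the question). The bias is just another tf.Variable created next to W; nothing about computing or applying its gradient is special, and no Keras API is involved.

```python
import tensorflow as tf


class LinearModel:
    """Hypothetical from-scratch model: pred = x @ W + b, no Keras layers."""

    def __init__(self, n_in, n_out, learning_rate=0.1):
        # Weights and bias are ordinary tf.Variables (trainable=True by default).
        self.W = tf.Variable(tf.random.normal([n_in, n_out], stddev=0.1, seed=0))
        self.b = tf.Variable(tf.zeros([n_out]))
        self.learning_rate = learning_rate

    def feedforward(self, x):
        return tf.matmul(x, self.W) + self.b

    def loss_evaluation(self, pred, y):
        # Mean squared error (an assumption; substitute your own loss).
        return tf.reduce_mean(tf.square(pred - y))

    @tf.function
    def train(self, x, y):
        params = [self.W, self.b]
        with tf.GradientTape() as tape:
            pred = self.feedforward(x)
            loss = self.loss_evaluation(pred, y)
        # One gradient per entry of params, in the same order.
        grads = tape.gradient(loss, params)
        # Gradient-descent step on every variable, bias included.
        for var, grad in zip(params, grads):
            var.assign_sub(self.learning_rate * grad)
        return loss


# Usage: fit synthetic data generated with a known bias of 3.0.
x = tf.random.normal([64, 3], seed=1)
true_W = tf.constant([[2.0], [-1.0], [0.5]])
y = tf.matmul(x, true_W) + 3.0

model = LinearModel(3, 1)
for step in range(200):
    loss = model.train(x, y)
print(float(loss), model.b.numpy())
```

Because params is a plain Python list, adding more layers only means appending their weight and bias variables to it.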