Weights were not updated using Gradient Tape and apply_gradients()

Posted 2023-03-28


Posted: 2021-08-20 12:30:35

Question:

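I am building a DNN with a custom loss function, and I am training it with tf.GradientTape in TensorFlow/Keras. The code runs without any errors. However, as far as I can tell from inspecting the DNN's weights, they are not updated at all. I searched for answers and followed the approach recommended on the TensorFlow website exactly, but I still don't understand the cause. Here is my code: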

import numpy as np

import tensorflow as tf
from tensorflow.keras.layers import Input, Dense, LeakyReLU, Concatenate
from tensorflow.keras.models import Model
from tensorflow.keras import backend as K
from tensorflow.keras import optimizers

# Generate random training data
c0_train = np.array([30 * np.random.uniform() for i in range(10000)])

# Build a simple DNN
c0_input = Input(shape=(1,), name='c0')
hidden_1 = Dense(100)(c0_input)
activation_1 = LeakyReLU(alpha=0.1)(hidden_1)
hidden_2 = Dense(100)(activation_1)
activation_2 = LeakyReLU(alpha=0.1)(hidden_2)
hidden_3 = Dense(100)(activation_2)
activation_3 = LeakyReLU(alpha=0.1)(hidden_3)
x0_output = Dense(1, name='x0')(activation_3)

model = Model(inputs=c0_input, outputs=x0_output)

# Calculate the loss function
def cal_loss(c0_input):
  x0_output = model(c0_input)
  loss = tf.reduce_mean(
      tf.multiply(c0_input, tf.square(tf.subtract(x0_output, c0_input))))
  return loss

# Compute the loss and its gradients
@tf.function
def compute_loss_grads(c0_input):
  with tf.GradientTape() as tape:
    loss = cal_loss(c0_input)
  grads = tape.gradient(loss, model.trainable_variables)
  return loss, grads

# Optimizer
opt = optimizers.Adam(learning_rate=0.01)

# Start looping
for epoch in range(50):
  print('Epoch = ', epoch)
  # Compute the loss and gradients
  [loss, grads] = compute_loss_grads(tf.cast(c0_train, tf.float32))
  # Adjust the weights of the model
  opt.apply_gradients(zip(grads, model.trainable_variables))

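I have checked the model's weights with model.get_weights(), and they look exactly the same before and after running the loop. So what is the problem here? One more question: how can I print the loss for each epoch?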


Answer 1:

The weights do change. You can check this as follows: save the weights right after building the model (these are the initial weights).

model = Model(inputs=c0_input, outputs=x0_output)
a_weg = model.get_weights()

Now run your training loop. After training finishes, get the new weights as shown below and compare them with the initial ones.

b_weg = model.get_weights()

a_weg[:1]
[array([[ 0.03541631, -0.02134866,  0.17080751,  0.10538128,  0.1361396 ,
          0.08645812,  0.114059  ,  0.216836  , -0.22464292, -0.21979895,
         -0.23927784, -0.00685263,  0.2167016 ,  0.09989142, -0.17772573,
          0.16095945, -0.10120587, -0.22456157, -0.22947621,  0.04009536,
          0.01029667, -0.18134505, -0.11318983,  0.10220072,  0.10100928,

b_weg[:1]
[array([[ 0.05140253,  0.00969543,  0.15155758,  0.07171137,  0.15917814,
          0.10883425,  0.11428417,  0.17012525, -0.25049415, -0.20693016,
         -0.20231842,  0.005939  ,  0.19197173,  0.07405043, -0.14260964,
          0.12490476, -0.11532102, -0.24605738, -0.25135723,  0.01863468,
          0.0311144 , -0.20050383, -0.11864465,  0.07961675,  0.11557189,
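Comparing only the first array by eye is easy to get wrong, so a small helper that reports the largest absolute change in every array returned by model.get_weights() makes the check explicit. This is a sketch: the name weight_changes is hypothetical, and the stand-in arrays below only mimic the kernel/bias structure of a single Dense layer.

```python
import numpy as np

def weight_changes(before, after):
    """Return the largest absolute change for each array in two get_weights() lists."""
    # get_weights() returns a list of NumPy arrays (kernels and biases, layer by layer),
    # so the two lists can be compared pairwise.
    return [float(np.max(np.abs(b - a))) for a, b in zip(before, after)]

# Stand-in data shaped like one Dense layer's weights: a (1, 3) kernel and a (3,) bias.
before = [np.zeros((1, 3)), np.zeros(3)]
after = [np.array([[0.1, -0.2, 0.0]]), np.zeros(3)]
print(weight_changes(before, after))  # [0.2, 0.0]
```

With the answer's variables you would call weight_changes(a_weg, b_weg); a nonzero entry means the corresponding array was updated by the optimizer.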

This is how you can print the loss at each epoch:

# Start looping
for epoch in range(5):
  # Compute the loss and gradients
  [loss, grads] = compute_loss_grads(tf.cast(c0_train, tf.float32))
  # Adjust the weights of the model
  opt.apply_gradients(zip(grads, model.trainable_variables))
  print('Epoch = ', epoch, ' - loss = ', loss.numpy())

Epoch =  0  - loss =  5962.977
Epoch =  1  - loss =  3042.2874
Epoch =  2  - loss =  2877.9978
Epoch =  3  - loss =  2607.5347
Epoch =  4  - loss =  2173.3213

Comments:

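Thanks for your help. I've checked, and I agree that the weights do change as you said. Another question: as you can see, the loss is basically Minimize (x0_output - c0_input)**2, which means this DNN should produce an output x0_output that is very close to c0_input. However, when I used this model with the test input np.array([2., 3., 6.]), the results were not close. Am I doing something wrong here?

Not sure about that, but you know the loss function used in the code better than I do. Feel free to ask a new question with the appropriate details, though.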
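One thing worth checking for this follow-up (my own analysis, not confirmed in the thread): in cal_loss, x0_output has shape (10000, 1), because the model's last layer is Dense(1) and the code runs without error, while c0_input is the rank-1 tensor of shape (10000,) built from c0_train. Subtracting them broadcasts (10000, 1) - (10000,) to (10000, 10000), so every prediction is compared with every input instead of with its own. Each prediction x then minimizes sum_j c_j * (x - c_j)**2, whose minimizer is the same constant sum(c**2) / sum(c) for every sample (about 20 for inputs uniform on [0, 30]), so the network has no reason to learn the identity. The sketch below reproduces the shapes with NumPy, whose broadcasting rules match TensorFlow's; broadcast_loss and aligned_loss are illustrative names, not part of the original code.

```python
import numpy as np

def broadcast_loss(x0, c0):
    """Mirrors cal_loss from the question: x0 has shape (n, 1), c0 has shape (n,)."""
    # (n, 1) - (n,) broadcasts to (n, n); element [i, j] is c_j * (x_i - c_j)**2.
    return np.mean(c0 * np.square(x0 - c0))

def aligned_loss(x0, c0):
    """Per-sample version: reshape c0 to (n, 1) so it lines up with x0."""
    c0_col = c0.reshape(-1, 1)
    return np.mean(c0_col * np.square(x0 - c0_col))

rng = np.random.default_rng(0)
c0 = 30 * rng.uniform(size=1000)          # same distribution as c0_train, shape (n,)

identity = c0.reshape(-1, 1)              # "perfect" predictions x0 == c0, shape (n, 1)
best_const = np.sum(c0**2) / np.sum(c0)   # minimizer of sum_j c_j * (x - c_j)**2
constant = np.full((c0.size, 1), best_const)

print((identity - c0).shape)              # (1000, 1000): the broadcast inside cal_loss
print(aligned_loss(identity, c0))         # zero: identity is optimal for the aligned loss
# Under the broadcast loss, a constant output scores better than the identity:
print(broadcast_loss(identity, c0) > broadcast_loss(constant, c0))
```

If this diagnosis holds, a minimal change would be to pass c0_train.reshape(-1, 1) (shape (10000, 1)) to compute_loss_grads, so that x0_output and c0_input are compared element by element.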
