Problems understanding linear regression model tuning in tf.keras

Posted 2023-03-12

[Title] Problems understanding linear regression model tuning in tf.keras
[Posted] 2020-10-10 14:18:50
[Question]:

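I am working through the Linear Regression with Synthetic Data Colab exercise, which explores linear regression with a toy dataset. A linear regression model is built and trained, and you can play with the learning rate, the number of epochs and the batch size. I have trouble understanding how exactly the iterations are done and how this connects to "epoch" and "batch size". I basically don't get how the actual model is trained, how the data is processed and how the iterations are done.

To understand this, I wanted to follow along by calculating each step manually. Therefore I wanted the slope and intercept coefficients for each step, so that I can see what kind of data the "computer" uses and puts into the model, what model results at each specific iteration, and how the iterations are done. I first tried to get the slope and intercept for every single step, but failed, because the slope and intercept are only output at the end. My modified code (the original, with only this added):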

  print("Slope")
  print(trained_weight)
  print("Intercept")
  print(trained_bias)

Code:

import pandas as pd
import tensorflow as tf
from matplotlib import pyplot as plt

#@title Define the functions that build and train a model
def build_model(my_learning_rate):
  """Create and compile a simple linear regression model."""
  # Most simple tf.keras models are sequential. 
  # A sequential model contains one or more layers.
  model = tf.keras.models.Sequential()

  # Describe the topography of the model.
  # The topography of a simple linear regression model
  # is a single node in a single layer. 
  model.add(tf.keras.layers.Dense(units=1, 
                                  input_shape=(1,)))

  # Compile the model topography into code that 
  # TensorFlow can efficiently execute. Configure 
  # training to minimize the model's mean squared error. 
  model.compile(optimizer=tf.keras.optimizers.RMSprop(learning_rate=my_learning_rate),
                loss="mean_squared_error",
                metrics=[tf.keras.metrics.RootMeanSquaredError()])
 
  return model           


def train_model(model, feature, label, epochs, batch_size):
  """Train the model by feeding it data."""

  # Feed the feature values and the label values to the 
  # model. The model will train for the specified number 
  # of epochs, gradually learning how the feature values
  # relate to the label values. 
  history = model.fit(x=feature,
                      y=label,
                      batch_size=batch_size,
                      epochs=epochs)

  # Gather the trained model's weight and bias.
  trained_weight = model.get_weights()[0]
  trained_bias = model.get_weights()[1]
  print("Slope")
  print(trained_weight)
  print("Intercept")
  print(trained_bias)
  # The list of epochs is stored separately from the 
  # rest of history.
  epochs = history.epoch

  # Gather the history (a snapshot) of each epoch.
  hist = pd.DataFrame(history.history)

  # print(hist)
  # Specifically gather the model's root mean
  # squared error at each epoch.
  rmse = hist["root_mean_squared_error"]

  return trained_weight, trained_bias, epochs, rmse

print("Defined build_model and train_model")

#@title Define the plotting functions
def plot_the_model(trained_weight, trained_bias, feature, label):
  """Plot the trained model against the training feature and label."""

  # Label the axes.
  plt.xlabel("feature")
  plt.ylabel("label")

  # Plot the feature values vs. label values.
  plt.scatter(feature, label)

  # Create a red line representing the model. The red line starts
  # at coordinates (x0, y0) and ends at coordinates (x1, y1).
  x0 = 0
  y0 = trained_bias
  x1 = feature[-1]
  y1 = trained_bias + (trained_weight * x1)
  plt.plot([x0, x1], [y0, y1], c='r')

  # Render the scatter plot and the red line.
  plt.show()

def plot_the_loss_curve(epochs, rmse):
  """Plot the loss curve, which shows loss vs. epoch."""

  plt.figure()
  plt.xlabel("Epoch")
  plt.ylabel("Root Mean Squared Error")

  plt.plot(epochs, rmse, label="Loss")
  plt.legend()
  plt.ylim([rmse.min()*0.97, rmse.max()])
  plt.show()

print("Defined the plot_the_model and plot_the_loss_curve functions.")

my_feature = ([1.0, 2.0,  3.0,  4.0,  5.0,  6.0,  7.0,  8.0,  9.0, 10.0, 11.0, 12.0])
my_label   = ([5.0, 8.8,  9.6, 14.2, 18.8, 19.5, 21.4, 26.8, 28.9, 32.0, 33.8, 38.2])

learning_rate=0.05
epochs=1
my_batch_size=12

my_model = build_model(learning_rate)
trained_weight, trained_bias, epochs, rmse = train_model(my_model, my_feature, 
                                                         my_label, epochs,
                                                         my_batch_size)
plot_the_model(trained_weight, trained_bias, my_feature, my_label)
plot_the_loss_curve(epochs, rmse)

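In my specific case, my output was:

Now I tried to replicate this in a simple Excel sheet and calculated the rmse manually:

However, I get 21.8 instead of 23.1? Also, my loss is not 535.48 but 476.82.

My first question is therefore: where is my mistake, and how is the rmse calculated?

Second question: how can I get the rmse for each specific iteration? Let's say the number of epochs is 4 and the batch size is 4.

This gives 4 epochs and 3 batches of 4 examples (observations) each. I don't understand how the model is trained through these iterations. So how can I get the coefficients and the rmse of each regression model? Not just for each epoch (so 4), but for each iteration. I think each epoch has 3 iterations, so I think there are 12 linear regression models in total? I would like to see these 12 models. What initial values are used at the starting point when no information is given, and what slope and intercept are used, starting from the very first point? I did not specify this. Then I would like to be able to follow how the slope and intercept are adjusted at each step. I think this comes from the gradient descent algorithm, but that would be the super plus. More important for me is to first understand how these iterations are done and how they connect to epoch and batch.

Update: I know that the initial values (slope and intercept) are chosen randomly.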

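[Comments]:

[Answer 1]:

I played around with it a bit, and I think it works like this: for each batch, the loss and metrics are computed and printed, and then the weights and bias are updated. After the last batch of an epoch the loss and metrics are not printed again, so what you see on screen are the loss and metrics from before the last update in the epoch.

So intuitively, the loss is computed first and the weights are updated afterwards, which means that the weight update is the last operation in the epoch.

If your model is trained with one epoch and one batch, then what you see on screen is the loss computed from the initial weights and bias. If you want to see the loss and metrics after the end of each epoch (with the most "actual" weights), you can pass the parameter validation_data=(X,y) to the fit method. This tells the algorithm to compute the loss and metrics again on the given validation data when the epoch ends.

Regarding the initial weights of the model, you can try this out by manually setting some initial weights for the layer (using the kernel_initializer parameter):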

  model.add(tf.keras.layers.Dense(units=1,
                                  input_shape=(1,),
                                  kernel_initializer=tf.constant_initializer(.5)))

Here is the updated part of the train_model function, which illustrates what I mean:

  import numpy as np

  def train_model(model, feature, label, epochs, batch_size):
      """Train the model by feeding it data."""

      # Record the initial slope and intercept before any training step.
      init_slope = model.get_weights()[0][0][0]
      init_bias = model.get_weights()[1][0]
      print('init slope is {}'.format(init_slope))
      print('init bias is {}'.format(init_bias))

      # Feed the feature values and the label values to the
      # model. Passing validation_data makes Keras recompute the
      # loss and metrics at the end of each epoch, i.e. after the
      # last weight update of that epoch.
      history = model.fit(x=feature,
                          y=label,
                          batch_size=batch_size,
                          epochs=epochs,
                          validation_data=(feature, label))

      # Gather the trained model's weight and bias.
      trained_weight = model.get_weights()[0]
      trained_bias = model.get_weights()[1]
      print("Slope")
      print(trained_weight)
      print("Intercept")
      print(trained_bias)

      # Loss and rmse computed by hand from the updated slope and bias.
      prediction_manual = [trained_weight[0][0]*i + trained_bias[0] for i in feature]
      manual_loss = np.mean((np.array(label) - np.array(prediction_manual))**2)
      print('manually computed loss after slope and bias update is {}'.format(manual_loss))
      print('manually computed rmse after slope and bias update is {}'.format(manual_loss**(1/2)))

      # Loss and rmse computed by hand from the initial slope and bias.
      prediction_manual_init = [init_slope*i + init_bias for i in feature]
      manual_loss_init = np.mean((np.array(label) - np.array(prediction_manual_init))**2)
      print('manually computed loss with init slope and bias is {}'.format(manual_loss_init))
      print('manually computed rmse with init slope and bias is {}'.format(manual_loss_init**(1/2)))

Output:

"""
init slope is 0.5
init bias is 0.0
1/1 [==============================] - 0s 117ms/step - loss: 402.9850 - root_mean_squared_error: 20.0745 - val_loss: 352.3351 - val_root_mean_squared_error: 18.7706
Slope
[[0.65811384]]
Intercept
[0.15811387]
manually computed loss after slope and bias update is 352.3350379264957
manually computed rmse after slope and bias update is 18.77058970641295
manually computed loss with init slope and bias is 402.98499999999996
manually computed rmse with init slope and bias is 20.074486294797182
"""

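Note that the loss and metrics computed manually after the slope and bias update match the validation loss and metrics, while the loss and metrics computed manually before the update match the training loss and metrics, which were obtained with the initial slope and bias.

Regarding the second question, I think you can split the data into batches manually, then iterate over the batches and fit on each one. Then, in each iteration, the model prints the loss and metrics on the validation data. Something like this: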

  init_slope = model.get_weights()[0][0][0]
  init_bias = model.get_weights()[1][0]
  print('init slope is {}'.format(init_slope))
  print('init bias is {}'.format(init_bias))
  batch_size = 3

  # Fit on one manually sliced batch at a time. The batch_size passed to
  # fit is larger than the slice, so each call performs exactly one update.
  for idx in range(0, len(feature), batch_size):
      model.fit(x=feature[idx:idx+batch_size],
                y=label[idx:idx+batch_size],
                batch_size=1000,
                epochs=epochs,
                validation_data=(feature, label))
      print('slope: {}'.format(model.get_weights()[0][0][0]))
      print('intercept: {}'.format(model.get_weights()[1][0]))
      print('x data used: {}'.format(feature[idx:idx+batch_size]))
      print('y data used: {}'.format(label[idx:idx+batch_size]))

Output:

init slope is 0.5
init bias is 0.0
1/1 [==============================] - 0s 117ms/step - loss: 48.9000 - root_mean_squared_error: 6.9929 - val_loss: 352.3351 - val_root_mean_squared_error: 18.7706
slope: 0.6581138372421265
intercept: 0.15811386704444885
x data used: [1.0, 2.0, 3.0]
y data used: [5.0, 8.8, 9.6]
1/1 [==============================] - 0s 21ms/step - loss: 200.9296 - root_mean_squared_error: 14.1750 - val_loss: 306.3082 - val_root_mean_squared_error: 17.5017
slope: 0.8132714033126831
intercept: 0.3018075227737427
x data used: [4.0, 5.0, 6.0]
y data used: [14.2, 18.8, 19.5]
1/1 [==============================] - 0s 22ms/step - loss: 363.2630 - root_mean_squared_error: 19.0595 - val_loss: 266.7119 - val_root_mean_squared_error: 16.3313
slope: 0.9573485255241394
intercept: 0.42669767141342163
x data used: [7.0, 8.0, 9.0]
y data used: [21.4, 26.8, 28.9]
1/1 [==============================] - 0s 22ms/step - loss: 565.5593 - root_mean_squared_error: 23.7815 - val_loss: 232.1553 - val_root_mean_squared_error: 15.2366
slope: 1.0924618244171143
intercept: 0.5409283638000488
x data used: [10.0, 11.0, 12.0]
y data used: [32.0, 33.8, 38.2]
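
The slopes and intercepts in the output above can also be followed by hand. Below is a minimal NumPy sketch of one pass over the four batches, assuming the RMSprop update is a running average of squared gradients with decay rho = 0.9, epsilon = 1e-7, no momentum, and an accumulator that starts at zero and is carried over between the fit calls. These values are assumptions about the optimizer defaults rather than something stated in this thread, and the exact placement of epsilon can differ between versions.

```python
import numpy as np

# Data, learning rate and initial weights taken from the question and the answer above.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0])
y = np.array([5.0, 8.8, 9.6, 14.2, 18.8, 19.5, 21.4, 26.8, 28.9, 32.0, 33.8, 38.2])
learning_rate, batch_size = 0.05, 3
rho, eps = 0.9, 1e-7   # assumed RMSprop defaults
w, b = 0.5, 0.0        # initial slope and intercept
v_w, v_b = 0.0, 0.0    # running averages of the squared gradients

steps = []
for start in range(0, len(x), batch_size):
    xb, yb = x[start:start + batch_size], y[start:start + batch_size]
    residual = (w * xb + b) - yb
    loss = np.mean(residual ** 2)            # MSE on this batch, before the update
    grad_w = 2.0 * np.mean(residual * xb)    # d(MSE)/dw
    grad_b = 2.0 * np.mean(residual)         # d(MSE)/db
    v_w = rho * v_w + (1.0 - rho) * grad_w ** 2
    v_b = rho * v_b + (1.0 - rho) * grad_b ** 2
    w -= learning_rate * grad_w / (np.sqrt(v_w) + eps)
    b -= learning_rate * grad_b / (np.sqrt(v_b) + eps)
    steps.append((loss, w, b))
    print(f"loss before update: {loss:.4f}  slope: {w:.6f}  intercept: {b:.6f}")
```

If these assumptions hold, the printed values should agree with the loss, slope and intercept lines in the output above up to rounding. On the very first step the accumulator only contains the current gradient, so the step size is roughly learning_rate / sqrt(1 - rho) regardless of how large the gradient is.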

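[Comments]:

Thanks for your answer, but I need the coefficients for every step of every epoch and every batch. As I said in my example, as far as I understand it there are 12 resulting models, so I need the slope and intercept coefficients for each of the 12 models. So considering your last figure, the output: when I do this with 4 epochs as shown here, I need the coefficients for those 4 lines. But actually not only that, also for every batch, together with the data used in each batch, so that I can really follow every detail of how the weights are adjusted.

OK, so I updated my answer. Basically, you can see that I set batch_size to 1000 in the .fit method. Since the data in each iteration has only 3 items, all of it is processed at once. In other words, the coefficients are now computed only once per epoch. I changed it so that an epoch actually contains just one batch of 3 items, so you fit 4 times on data of size 3, which is like fitting 12 items with a batch size of 3. But here you can see all the metrics and coefficients after every "batch".

+1 Thanks for the update. I see the idea of the workaround of creating the batches manually. However, I would like, and think there must be, a proper solution to this: really seeing every detail (batch, slope and coefficient) for every step and every epoch.

[Answer 2]: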

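Linear regression model

A linear regression model has just one neuron with a linear activation function. The basis of training the model is gradient descent. Each time the whole dataset passes through the model and the weights are updated is called 1 epoch. Here, however, iteration and epoch mean the same thing.

Basic training steps: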

Prepare data
Initialize the model and its parameters (weights and biases)
for each epoch:  #(both iteration and epoch same here)
    Forward Propagation
    Compute Cost
    Back Propagation
    Update Parameters

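There are three variants of gradient descent:

Batch Gradient Descent (BGD)
Stochastic Gradient Descent (SGD)
Mini-Batch Gradient Descent (MGD)

Batch Gradient Descent is what we discussed above (passing the whole dataset). It is usually just called gradient descent.

In Stochastic Gradient Descent we pass 1 random example at a time, and the weights are updated after each example. This is where iteration comes into play. Once the model has been trained on 1 example, 1 iteration is complete. However, there are still more examples in the dataset that the model has not seen. Training on all of these examples once is called 1 epoch. Because it passes 1 example at a time, SGD is very slow on larger datasets, as it loses the benefit of vectorization.

So we generally use Mini-Batch Gradient Descent. Here the dataset is split into a number of fixed-size chunks. The size of each chunk is called the batch size, and it can be anywhere between 1 and the data size. In each epoch, these batches of data are used to train the model.

1 iteration processes 1 batch of data. 1 epoch processes all of the batches. 1 epoch contains 1 or more iterations.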

So, if the size of the data is m, the amount of data fed in during each iteration is:

BGD = m
SGD = 1
MGD = batch size (between 1 and m)
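
To make the counting concrete for the numbers in the question (12 examples, batch size 4, 4 epochs), here is a small sketch. It assumes the examples are taken in order without shuffling; Keras shuffles the training data by default unless shuffle=False is passed to fit.

```python
import math

num_examples, batch_size, num_epochs = 12, 4, 4   # values from the question

iterations_per_epoch = math.ceil(num_examples / batch_size)
total_updates = iterations_per_epoch * num_epochs

# Row indices seen in each iteration of one epoch (no shuffling assumed).
batches = [list(range(start, min(start + batch_size, num_examples)))
           for start in range(0, num_examples, batch_size)]

print(iterations_per_epoch, total_updates)
print(batches)
```

With these values there are 3 iterations per epoch and 12 weight updates in total, which matches the "12 models" the question asks about.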

Basic training steps for MGD:

Prepare data
Initialize the model and its parameters (weights and biases)
for each epoch:  #(epoch)
    for each mini_batch: #(iteration)
        Forward Propagation
        Compute Cost
        Back Propagation
        Update Parameters

These are the theoretical concepts behind gradient descent, batch, epoch and iteration.
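
The mini-batch training steps above can be written out as a short NumPy loop that records the slope and intercept after every iteration. This is only a sketch under stated assumptions: it uses plain gradient descent on the mean squared error (not the RMSprop optimizer from the question), a small learning rate chosen here so the updates stay stable, zero initial weights, and no shuffling. The function name train_minibatch is hypothetical.

```python
import numpy as np

# Data from the question.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0])
y = np.array([5.0, 8.8, 9.6, 14.2, 18.8, 19.5, 21.4, 26.8, 28.9, 32.0, 33.8, 38.2])

def train_minibatch(x, y, learning_rate=0.001, epochs=4, batch_size=4, w=0.0, b=0.0):
    """Plain mini-batch gradient descent on MSE; one record per iteration."""
    history = []
    for epoch in range(epochs):                                 # epoch
        for start in range(0, len(x), batch_size):              # iteration
            xb, yb = x[start:start + batch_size], y[start:start + batch_size]
            y_hat = w * xb + b                                   # forward propagation
            error = y_hat - yb
            cost = np.mean(error ** 2)                           # compute cost
            grad_w = 2.0 * np.mean(error * xb)                   # back propagation
            grad_b = 2.0 * np.mean(error)
            w -= learning_rate * grad_w                          # update parameters
            b -= learning_rate * grad_b
            history.append((epoch, start // batch_size, cost, w, b))
    return history

history = train_minibatch(x, y)
for epoch, iteration, cost, w, b in history:
    print(f"epoch {epoch} iteration {iteration}: cost {cost:.3f}  slope {w:.4f}  intercept {b:.4f}")
```

Each entry in history is one of the intermediate linear models: 4 epochs times 3 iterations gives 12 of them.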

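Now on to Keras and your code:

I ran the Colab code and it works fine. In the code you posted, the number of epochs is 1, which is far too small for the model to learn, because there is very little data and the model itself is very simple. So you need to increase the amount of data, create a more complex model, or train for more epochs; from the notebook I found that around 400-500 works. With a properly tuned learning rate, the number of epochs can be reduced: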

learning_rate=0.14
epochs=70
my_batch_size= 32 

my_model = build_model(learning_rate)
trained_weight, trained_bias, epochs, rmse = train_model(my_model, my_feature, 
                                                        my_label, epochs,
                                                        my_batch_size)
plot_the_model(trained_weight, trained_bias, my_feature, my_label)
plot_the_loss_curve(epochs, rmse)

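If the learning rate is very small, the model learns slowly, so more training epochs are needed to make accurate predictions. Increasing the learning rate speeds up learning, so the number of epochs can be reduced. Please compare the different sections of the code in the Colab for suitable examples.

About getting the metrics for each iteration:

Keras is a high-level API for TensorFlow. As far as I know (leaving aside customization of the API), during training Keras computes the loss, error and accuracy on the training set at the end of every iteration, and at the end of every epoch it returns their respective averages. So if there are n epochs, there will be n values for each metric, no matter how many iterations happen in between.

About the slope and intercept:

A linear regression model uses a linear activation function in the output layer, i.e. y = mx + c. For the values we have:

y - the output
x - the input
m - the slope (has to be adjusted)
c - the intercept (can also be adjusted)

In our model, m and c are what we tune. They are the weight and bias of our model. So our function looks like y = Wx + b, where b gives the intercept and W gives the slope. The weights and biases are initialized at the start, typically randomly.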
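
On the question of which starting values are used when nothing is specified: with the Keras defaults, a Dense layer draws its kernel from the Glorot uniform initializer and sets its bias to zero. Below is a minimal NumPy sketch of that draw for a layer with one input and one unit. The helper name glorot_uniform_limit and the seed are only for illustration, and the sampled number will not match what Keras produces for any particular run.

```python
import numpy as np

def glorot_uniform_limit(fan_in, fan_out):
    """Bound of the Glorot uniform distribution: sqrt(6 / (fan_in + fan_out))."""
    return np.sqrt(6.0 / (fan_in + fan_out))

rng = np.random.default_rng(seed=0)   # seed chosen only for repeatability
limit = glorot_uniform_limit(fan_in=1, fan_out=1)

initial_w = rng.uniform(-limit, limit)   # random initial slope
initial_b = 0.0                          # bias starts at zero by default

print(f"limit: {limit:.4f}  initial slope: {initial_w:.4f}  initial intercept: {initial_b}")
```

For a single input and a single unit the limit is sqrt(3), so the initial slope lies somewhere in roughly [-1.73, 1.73].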

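Colab link for a linear regression model from scratch

Please adjust the values as needed. Since the model is implemented from scratch, collect or print whatever values you want to track during training. You can also use your own dataset, but make sure it is valid or generated by some library (sklearn) for model validation.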

https://colab.research.google.com/drive/1RfuRNMoVv-l6KyM_SegdJOHiXD_0xBHq?usp=sharing

P.S. If you find anything confusing, please comment. I'd be happy to reply.

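[Comments]:

Thanks for your extensive answer, but my question is different. I want to track every single step, get its output, and be able to calculate it manually in order to follow it. I understand very well what a linear model is and how the values are calculated.

Please check the Colab link I added at the bottom of the main post. I tried to create a single-neuron model like the one you built with keras, but from scratch, so you can use any value anywhere in the model. I created the training and test datasets by hand and trained the model to predict the temperature in Fahrenheit when it is given in Celsius. Since the dataset is small, I did not create any mini-batches (the concept is similar to what I described earlier). So here epoch and iteration both mean the same thing. Hope this helps. I also thought you were asking about the relationship between epoch, iteration and batch, so I had to elaborate on it.

[Answer 3]: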

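Basics

Problem statement

Let us consider a linear regression model for a set of samples X, where each sample is represented by a single feature x. As part of model training we search for a line w.x + b such that ((w.x + b) - y)^2 (the squared loss) is minimal. For a set of data points we take the mean of the squared loss over the samples, the so-called mean squared error (MSE). w and b, which stand for the weight and the bias, are together referred to as the weights.

Fitting the line / training the model

The closed-form (normal equation) solution, with X extended by a column of ones so that b is learned as part of w, is:

w = (X^T.X)^-1.X^T.y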
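
As a quick illustration of the closed-form solution, here is a minimal NumPy sketch applied to the data from the question. It solves the linear system rather than forming the inverse explicitly, which is a choice made here for numerical reasons and not something this answer prescribes.

```python
import numpy as np

# Data from the question.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0])
y = np.array([5.0, 8.8, 9.6, 14.2, 18.8, 19.5, 21.4, 26.8, 28.9, 32.0, 33.8, 38.2])

# Design matrix: one column for the feature, one column of ones for the intercept.
A = np.column_stack([x, np.ones_like(x)])

# Solve (A^T A) [w, b]^T = A^T y, which is equivalent to (A^T A)^-1 A^T y.
w, b = np.linalg.solve(A.T @ A, A.T @ y)
print(f"slope: {w:.4f}  intercept: {b:.4f}")
```

This gives the least-squares line in one step, which is the target that the iterative training is slowly moving towards.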

Gradient descent

The gradient descent algorithm for learning the regression looks like this:

w, b = some initial value
While model has not converged:
    y_hat = w.X + b
    error = MSE(y, y_hat) 
    back propagate (BPP) error and adjust weights

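Each run of the loop above is called an epoch. However, due to resource constraints, the computation of y_hat, the error and the BPP is not done on the full dataset; instead the data is split into smaller batches and the operations above are performed on one batch at a time. Also, we usually fix the number of epochs and monitor whether the model has converged.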

w, b = some initial value
for i in range(number_of_epochs):
    for X_batch, y_batch in get_next_batch(X, y):
        y_hat = w.X_batch + b
        error = MSE(y_batch, y_hat)
        back propagate (BPP) error and adjust weights

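Batch implementation in Keras

Suppose we want to add the root mean squared error in order to track model performance during training. The way Keras implements this is as follows: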

w, b = some initial value
for i in range(number_of_epochs):
    all_y_hats = []
    all_ys = []
    for X_batch, y_batch in get_next_batch(X, y):
        y_hat = w.X_batch + b
        error = MSE(y_batch, y_hat)

        all_y_hats.extend(y_hat)
        all_ys.extend(y_batch)

        batch_rms_error = RMSE(all_ys, all_y_hats)

        back propagate (BPP) error and adjust weights

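As you can see above, the predictions are accumulated and the RMSE is computed on the accumulated predictions, rather than by averaging the RMSEs of all previous batches.

Implementation in keras

Now that the basics are clear, let's see how we can implement the tracking in keras. keras has callbacks, so we can hook into the on_batch_begin callback and accumulate all_y_hats and all_ys. In the on_batch_end callback, keras gives us the RMSE it calculated. We will calculate the RMSE manually from our accumulated all_y_hats and all_ys and verify that it is the same as the one keras calculated. We will also save the weights so that we can later plot the line that is being learned.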

import numpy as np
from sklearn.metrics import mean_squared_error
import keras
import matplotlib.pyplot as plt

# Some training data
X = np.arange(16)
y = 0.5*X + 0.2

batch_size = 8
all_y_hats = []
learned_weights = []

class CustomCallback(keras.callbacks.Callback):
  def on_batch_begin(self, batch, logs=None):
    # Weights used for this batch's predictions (before the update).
    w = self.model.layers[0].weights[0].numpy()[0][0]
    b = self.model.layers[0].weights[1].numpy()[0]
    s = batch*batch_size
    all_y_hats.extend(b + w*X[s:s+batch_size])
    learned_weights.append([w, b])

  def on_batch_end(self, batch, logs=None):
    # RMSE over all predictions accumulated so far in this epoch.
    calculated_error = np.sqrt(mean_squared_error(all_y_hats, y[:len(all_y_hats)]))
    print(f"\n Calculated: {calculated_error},  Actual: {logs['root_mean_squared_error']}")
    assert np.isclose(calculated_error, logs['root_mean_squared_error'])

  def on_epoch_end(self, epoch, logs=None):
    # Start accumulating from scratch in the next epoch.
    del all_y_hats[:]


model = keras.models.Sequential()
model.add(keras.layers.Dense(1, input_shape=(1,)))
model.compile(optimizer=keras.optimizers.RMSprop(learning_rate=0.01), loss="mean_squared_error",  metrics=[keras.metrics.RootMeanSquaredError()])
# We should set shuffle=False so that we know how batches are divided
history = model.fit(X, y, epochs=100, callbacks=[CustomCallback()], batch_size=batch_size, shuffle=False)

Output:

Epoch 1/100
 8/16 [==============>...............] - ETA: 0s - loss: 16.5132 - root_mean_squared_error: 4.0636
 Calculated: 4.063645694548688,  Actual: 4.063645839691162

 Calculated: 8.10112834945773,  Actual: 8.101128578186035
16/16 [==============================] - 0s 3ms/step - loss: 65.6283 - root_mean_squared_error: 8.1011
Epoch 2/100
 8/16 [==============>...............] - ETA: 0s - loss: 14.0454 - root_mean_squared_error: 3.7477
 Calculated: 3.7477213352845675,  Actual: 3.7477214336395264
-------------- truncated -----------------------

Ta-da! The assertion assert np.isclose(calculated_error, logs['root_mean_squared_error']) never failed, so our calculation/understanding is correct.
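
The difference between the accumulated RMSE and an average of per-batch RMSEs can be checked with the numbers from the first epoch of the log above. This sketch assumes that the loss shown in the progress bar is the running mean of the per-batch MSE, which is consistent with the log (the reported RMSE squared equals the reported loss) but is not stated explicitly in this answer.

```python
import math

# Values read from epoch 1 of the log above (two batches of 8 examples each).
first_batch_mse = 16.5132     # loss shown after batch 1
running_mean_mse = 65.6283    # loss shown after batch 2 (mean over both batches)

# Recover the MSE of the second batch from the running mean.
second_batch_mse = 2 * running_mean_mse - first_batch_mse

# RMSE over all accumulated predictions (what the log reports).
cumulative_rmse = math.sqrt((first_batch_mse + second_batch_mse) / 2)

# Average of the two per-batch RMSEs (not what the log reports).
mean_of_batch_rmses = (math.sqrt(first_batch_mse) + math.sqrt(second_batch_mse)) / 2

print(f"cumulative RMSE: {cumulative_rmse:.4f}")
print(f"mean of batch RMSEs: {mean_of_batch_rmses:.4f}")
```

The cumulative value reproduces the 8.1011 from the log, while the average of the per-batch RMSEs comes out lower, so the two quantities are not interchangeable.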

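Lines

Finally, let's plot the line as it is adjusted by the BPP algorithm based on the mean squared error loss. We can use the code below to create a png image of the line learned at each batch, together with the training data.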

# One image per recorded (w, b) pair, numbered in the order they were learned.
for i, (w, b) in enumerate(learned_weights):
  plt.close()
  plt.axis([-1, 18, -1, 10])
  plt.scatter(X, y)
  plt.plot([-1, 17], [-1*w+b, 17*w+b], color='green')
  plt.savefig(f'img{i+1}.png')

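Below is a gif animation of the above images in the order they were learned.

Hyperplane (a line in this case) being learned when y = 0.5*X + 5.2

[Comments]:

+1 Thanks for your answer, but why is the slope in the green line never updated? I don't understand. As I understand it, the coefficients are updated for every batch of every epoch. So that gives me 12 different linear models, as far as I understand. So what I want are the coefficients of these 12 models. I know you have more linear models because you used more epochs, but I still don't understand why the intercept (bias) is fixed in your example. The green line gets adjusted, so the slope does, but not the intercept?

No, the intercept is not fixed; it changes very slowly at first, and you can see how it changes towards the end. This is mainly because our actual intercept is 0.2, and since it is initialized to 0 it changes slowly. Try a different intercept, e.g. y = 0.5*X + 3.5, and you should see a more noticeable change. I have added a gif for this to the answer. Try printing the intercept b in on_batch_begin and you will see that it is changing, but very slowly.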
