Keras Autoencoder：将权重从编码器绑定到解码器不起作用

Posted 2023-02-16

技术标签:

【中文标题】Keras Autoencoder：将权重从编码器绑定到解码器不起作用【英文标题】：Keras Autoencoder: Tying Weights from Encoder To Decoder not working 【发布时间】：2020-01-09 15:25:05 【问题描述】：

我正在创建一个自动编码器，作为我参加 Kaggle 比赛的完整模型的一部分。我试图绑定编码器的权重，转置到解码器。在第一个 Epoch 之前，权重正确同步，之后，Decoder 权重只是冻结，并且跟不上由 Gradient Descent 更新的 Encoder 权重。

我在谷歌上找到的几乎每一篇关于这个问题的帖子都找了 12 个小时，似乎没有人知道我的案例的答案。最接近的是Tying Autoencoder Weights in a Dense Keras Layer，但问题是通过不使用变量张量作为内核来解决的，但我已经没有使用那种类型的张量作为我的解码器内核，所以没有用。

我使用本文https://towardsdatascience.com/build-the-right-autoencoder-tune-and-optimize-using-pca-principles-part-ii-24b9cca69bd6 中定义的DenseTied Keras 自定义Layer 类，完全一样，只是改变我引用支持的Keras 的方式以适应我的导入风格。

import tensorflow as tf
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import os

这是自定义层定义

class DenseTied(tf.keras.layers.Layer):

    def __init__(self, units,
                 activation=None,
                 use_bias=True,
                 kernel_initializer='glorot_uniform',
                 bias_initializer='zeros',
                 kernel_regularizer=None,
                 bias_regularizer=None,
                 activity_regularizer=None,
                 kernel_constraint=None,
                 bias_constraint=None,
                 tied_to=None,
                 **kwargs):
        self.tied_to = tied_to
        if 'input_shape' not in kwargs and 'input_dim' in kwargs:
            kwargs['input_shape'] = (kwargs.pop('input_dim'),)
        super().__init__(**kwargs)
        self.units = units
        self.activation = tf.keras.activations.get(activation)
        self.use_bias = use_bias
        self.kernel_initializer = tf.keras.initializers.get(kernel_initializer)
        self.bias_initializer = tf.keras.initializers.get(bias_initializer)
        self.kernel_regularizer = tf.keras.regularizers.get(kernel_regularizer)
        self.bias_regularizer = tf.keras.regularizers.get(bias_regularizer)
        self.activity_regularizer = tf.keras.regularizers.get(activity_regularizer)
        self.kernel_constraint = tf.keras.constraints.get(kernel_constraint)
        self.bias_constraint = tf.keras.constraints.get(bias_constraint)
        self.input_spec = tf.keras.layers.InputSpec(min_ndim=2)
        self.supports_masking = True

    def build(self, input_shape):
        assert len(input_shape) >= 2
        input_dim = input_shape[-1]

        if self.tied_to is not None:
            self.kernel = tf.keras.backend.transpose(self.tied_to.kernel)
            self.non_trainable_weights.append(self.kernel)
        else:
            self.kernel = self.add_weight(shape=(input_dim, self.units),
                                          initializer=self.kernel_initializer,
                                          name='kernel',
                                          regularizer=self.kernel_regularizer,
                                          constraint=self.kernel_constraint)
        if self.use_bias:
            self.bias = self.add_weight(shape=(self.units,),
                                        initializer=self.bias_initializer,
                                        name='bias',
                                        regularizer=self.bias_regularizer,
                                        constraint=self.bias_constraint)
        else:
            self.bias = None
        self.input_spec = tf.keras.layers.InputSpec(min_ndim=2, axes=-1: input_dim)
        self.built = True

    def compute_output_shape(self, input_shape):
        assert input_shape and len(input_shape) >= 2
        output_shape = list(input_shape)
        output_shape[-1] = self.units
        return tuple(output_shape)

    def call(self, inputs):
        output = tf.keras.backend.dot(inputs, self.kernel)
        if self.use_bias:
            output = tf.keras.backend.bias_add(output, self.bias, data_format='channels_last')
        if self.activation is not None:
            output = self.activation(output)
        return output

这是使用虚拟数据集进行模型训练和测试

rand_samples = np.random.rand(16, 51)
dummy_ds = tf.data.Dataset.from_tensor_slices((rand_samples, rand_samples)).shuffle(16).batch(16)

encoder = tf.keras.layers.Dense(1, activation="linear", input_shape=(51,), use_bias=True)
decoder = DenseTied(51, activation="linear", tied_to=encoder, use_bias=True)

autoencoder = tf.keras.Sequential()
autoencoder.add(encoder)
autoencoder.add(decoder)

autoencoder.compile(metrics=['accuracy'],
                    loss='mean_squared_error',
                    optimizer='sgd')

autoencoder.summary()

print("Encoder Kernel Before 1 Epoch", encoder.kernel[0])
print("Decoder Kernel Before 1 Epoch", decoder.kernel[0][0])

autoencoder.fit(dummy_ds, epochs=1)

print("Encoder Kernel After 1 Epoch", encoder.kernel[0])
print("Decoder Kernel After 1 Epoch", decoder.kernel[0][0])

预期的输出是在第一个元素中具有完全相同的两个内核（为简单起见，仅打印一个权重）

当前输出显示Decoder Kernel没有像Transposed Encoder Kernel一样更新

2019-09-06 14:55:42.070003: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library nvcuda.dll
2019-09-06 14:55:42.984580: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties:
name: GeForce GTX 1060 major: 6 minor: 1 memoryClockRate(GHz): 1.733
pciBusID: 0000:01:00.0
2019-09-06 14:55:43.088109: I tensorflow/stream_executor/platform/default/dlopen_checker_stub.cc:25] GPU libraries are statically linked, skip dlopen check.        
2019-09-06 14:55:43.166145: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2019-09-06 14:55:43.203865: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2019-09-06 14:55:43.277988: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties:
name: GeForce GTX 1060 major: 6 minor: 1 memoryClockRate(GHz): 1.733
pciBusID: 0000:01:00.0
2019-09-06 14:55:43.300888: I tensorflow/stream_executor/platform/default/dlopen_checker_stub.cc:25] GPU libraries are statically linked, skip dlopen check.        
2019-09-06 14:55:43.309040: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2019-09-06 14:55:44.077814: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-09-06 14:55:44.094542: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187]      0
2019-09-06 14:55:44.099411: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0:   N
2019-09-06 14:55:44.103424: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4712 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1060, pci bus id: 0000:01:00.0, compute capability: 6.1)
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
dense (Dense)                (None, 1)                 52
_________________________________________________________________
dense_tied (DenseTied)       (None, 51)                103
=================================================================
Total params: 103
Trainable params: 103
Non-trainable params: 0
_________________________________________________________________
Encoder Kernel Before 1 Epoch tf.Tensor([0.20486075], shape=(1,), dtype=float32)
Decoder Kernel Before 1 Epoch tf.Tensor(0.20486075, shape=(), dtype=float32)
1/1 [==============================] - 1s 657ms/step - loss: 0.3396 - accuracy: 0.0000e+00
Encoder Kernel After 1 Epoch tf.Tensor([0.20530733], shape=(1,), dtype=float32)
Decoder Kernel After 1 Epoch tf.Tensor(0.20486075, shape=(), dtype=float32)
PS C:\Users\whitm\Desktop\CodeProjects\ForestClassifier-DEC>

我不明白我做错了什么。

【问题讨论】：

您是否尝试逐行运行 TDS 文章中的代码？我尝试运行文章中的代码，一次训练一个 epoch，并检查编码器和解码器的权重是否相等。他们是匹配的。我建议尝试使用大小大于 1 的编码器进行健全性测试。我已经测试了不同尺寸的编码器，为了简单起见，我放了 1 同样在复制样本中，我将模型仅训练一个 epoch 这是一个最小的复制示例，我的完整自动编码器有点复杂 【参考方案1】：

要绑定权重，我建议使用Keras functional API，它可以共享层。也就是说，这里有一个替代实现，它将编码器和解码器之间的权重联系起来：

class TransposableDense(tf.keras.layers.Dense):

    def __init__(self, units, **kwargs):
        super().__init__(units, **kwargs)

    def build(self, input_shape):
        assert len(input_shape) >= 2
        input_dim = input_shape[-1]
        self.t_output_dim = input_dim

        self.kernel = self.add_weight(shape=(int(input_dim), self.units),
                                      initializer=self.kernel_initializer,
                                      name='kernel',
                                      regularizer=self.kernel_regularizer,
                                      constraint=self.kernel_constraint)
        if self.use_bias:
            self.bias = self.add_weight(shape=(self.units,),
                                        initializer=self.bias_initializer,
                                        name='bias',
                                        regularizer=self.bias_regularizer,
                                        constraint=self.bias_constraint)
            self.bias_t = self.add_weight(shape=(input_dim,),
                                          initializer=self.bias_initializer,
                                          name='bias_t',
                                          regularizer=self.bias_regularizer,
                                          constraint=self.bias_constraint)
        else:
            self.bias = None
            self.bias_t = None
        # self.input_spec = tf.keras.layers.InputSpec(min_ndim=2, axes=-1: input_dim)
        self.built = True

    def call(self, inputs, transpose=False):
        bs, input_dim = inputs.get_shape()

        kernel = self.kernel
        bias = self.bias
        if transpose:
            assert input_dim == self.units
            kernel = tf.keras.backend.transpose(kernel)
            bias = self.bias_t

        output = tf.keras.backend.dot(inputs, kernel)
        if self.use_bias:
            output = tf.keras.backend.bias_add(output, bias, data_format='channels_last')
        if self.activation is not None:
            output = self.activation(output)
        return output

    def compute_output_shape(self, input_shape):
        bs, input_dim = input_shape
        output_dim = self.units
        if input_dim == self.units:
            output_dim = self.t_output_dim
        return bs, output_dim

这个密集层的内核可以通过transpose=True调用层来进行转置。请注意，这可能破坏一些基本的 Keras 原则（例如，图层具有多个输出形状），但它应该适用于您的情况。

这是一个示例，展示了如何使用它来定义模型：

a = tf.keras.layers.Input((51,))
dense = TransposableDense(1, activation='linear', use_bias=True)
encoder_out = dense(a)
decoder_out = dense(encoder_out, transpose=True)
encoder = tf.keras.Model(a, encoder_out)
autoencoder = tf.keras.Model(a, decoder_out)

【讨论】：

我会测试这个解决方案并使其适应我的完整模型，我会告诉你什么时候可以工作这不是我阅读的文章所采用的原始方法，但它是一种非常聪明的方法，自动编码器正在工作，并且权重具有允许保存和加载的结构自动编码器训练完成后的 Keras Dense 层（这最后还有待确认，但我的直觉告诉我这是可能的）。这有利于在下一步开发完整模型时摆脱这个自定义类【参考方案2】：

权重没有捆绑。您只是用第一层的转置权重初始化绑定层的权重，然后从不训练它们。 transpose 返回一个新的张量/不同对象，add_weight 创建一个新变量，因此在build 之后两层之间的任何关系都丢失了。我认为这样做会更好：

def call(self, inputs):
    output = tf.keras.backend.dot(inputs, tf.keras.backend.transpose(self.tied_to.kernel))
    if self.use_bias:
        output = tf.keras.backend.bias_add(output, self.tied_to.bias, data_format='channels_last')
    if self.activation is not None:
        output = self.activation(output)
    return output

在这里，绑定层始终明确使用第一层的权重，并且本身没有任何权重（即从build 中删除add_weight 部分）。

【讨论】：

我已经尝试过了，并且我知道这个解决方案，但是，如果是这样，为什么会有大量的帖子和文章提出与我展示给你的完全相同的自定义层？他们都错了吗？你说得对，我有点误读了你的代码；您实际上并没有在并列的情况下创建新的权重。恐怕我现在没有时间对此进行更多研究，但我希望稍后会更新我的答案。

以上是关于Keras Autoencoder：将权重从编码器绑定到解码器不起作用的主要内容，如果未能解决你的问题，请参考以下文章