Adding data to the decoder in an autoencoder during learning

Posted 2023-02-23

【Original title】: adding data to decoder in autoencoder during learning
【Asked】: 2019-02-19 14:25:51
【Question】:

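I want to implement an autoencoder in Keras. The structure is a large network that performs some operations on the autoencoder's output, and two losses then have to be taken into account. I have attached an image showing the structure I propose; the link is below.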

autoencoder structure

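w has the same size as the input image, and I do not use max pooling in this autoencoder, so the output of every stage has the same size as the input image. I want to send w together with the latent space representation to the decoder part, and then, after adding noise to the decoder output, try to extract w using the third part of the network. So I need my loss function to consider the difference between the input image and the latent space representation, and also the difference between w and w'.

I have several problems with the implementation. I do not know how to add w to the decoder input, because the line "merge_encoded_w=cv2.merge(encoded,w)" produces an error and does not work. I am also not sure whether my loss function is correct for what I need. Please help me with this code. I am a beginner and finding a solution is difficult for me. I asked this question before and nobody helped me. My code is below: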

from keras.models import Sequential
from keras.layers import Input, Dense, Dropout, Activation,UpSampling2D,Conv2D, MaxPooling2D, GaussianNoise
from keras.models import Model
from keras.optimizers import SGD
from keras.datasets import mnist
from keras import regularizers
from keras import backend as K
import keras as k
import numpy as np
import matplotlib.pyplot as plt
import cv2
from time import time
from keras.callbacks import TensorBoard
# Embedding phase
##encoder

w=np.random.random((1, 28,28))
input_img = Input(shape=(28, 28, 1))  # adapt this if using `channels_first` image data format

x = Conv2D(8, (5, 5), activation='relu', padding='same')(input_img)
#x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(4, (3, 3), activation='relu', padding='same')(x)
#x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(2, (3, 3), activation='relu', padding='same')(x)
encoded = Conv2D(1, (3, 3), activation='relu', padding='same')(x)
merge_encoded_w=cv2.merge(encoded,w)
#
#decoder

x = Conv2D(2, (5, 5), activation='relu', padding='same')(merge_encoded_w)
#x = UpSampling2D((2, 2))(x)
x = Conv2D(4, (3, 3), activation='relu', padding='same')(x)
#x = UpSampling2D((2, 2))(x)
x = Conv2D(8, (3, 3), activation='relu',padding='same')(x)
#x = UpSampling2D((2, 2))(x)
decoded = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)

#Extraction phase
decodedWithNois=k.layers.GaussianNoise(0.5)(decoded)
x = Conv2D(8, (5, 5), activation='relu', padding='same')(decodedWithNois)
#x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(4, (3, 3), activation='relu', padding='same')(x)
#x = MaxPooling2D((2, 2), padding='same')(x)
final_image_watermark = Conv2D(2, (3, 3), activation='relu', padding='same')(x)


autoencoder = Model([input_img,w], [decoded,final_image_watermark(2)])
encoder=Model(input_img,encoded)
autoencoder.compile(optimizer='adadelta', loss=['mean_squared_error','mean_squared_error'],metrics=['accuracy'])
(x_train, _), (x_test, _) = mnist.load_data()
x_validation=x_train[1:10000,:,:]
x_train=x_train[10001:60000,:,:]
#
x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.
x_validation = x_validation.astype('float32') / 255.
x_train = np.reshape(x_train, (len(x_train), 28, 28, 1))  # adapt this if using `channels_first` image data format
x_test = np.reshape(x_test, (len(x_test), 28, 28, 1))  # adapt this if using `channels_first` image data format
x_validation = np.reshape(x_validation, (len(x_validation), 28, 28, 1))  # adapt this if using `channels_first` image data format
autoencoder.fit(x_train, x_train,
                epochs=5,
                batch_size=128,
                shuffle=True,
                validation_data=(x_validation, x_validation),
                callbacks=[TensorBoard(log_dir='/tmp/autoencoder')])

decoded_imgs = autoencoder.predict(x_test)
encoded_imgs=encoder.predict(x_test)

【Comments】:

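Is w data that you provide, or a random array?

It can be a random array that is added during learning. What is the difference between the two things you mentioned?

I was just wondering why you would add a random array and then recover it. Programmatically, the difference is that Keras requires all data to have the same size in the first dimension. In any case, I don't think you can use cv2.merge to merge tensors. Use Concatenate instead.

If w is data fed to the network, what should I do? I want to send w and the encoded output to the decoder part as a 28X28X2 filter. Is that possible? Will Concatenate do this for me?

When I use Concatenate it produces this error: "all inputs to layer should be tensors"?

【Answer 1】: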

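For a large architecture like this, I suggest building it from small blocks and then putting the blocks together. First, the encoder part. It takes an image of shape (28, 28, 1) and returns an encoded image of shape (28, 28, 1).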

from keras.layers import Input, Concatenate, GaussianNoise
from keras.layers import Conv2D
from keras.models import Model

def make_encoder():
    image = Input((28, 28, 1))
    x = Conv2D(8, (5, 5), activation='relu', padding='same')(image)
    x = Conv2D(4, (3, 3), activation='relu', padding='same')(x)
    x = Conv2D(2, (3, 3), activation='relu', padding='same')(x)
    encoded =  Conv2D(1, (3, 3), activation='relu', padding='same')(x)

    return Model(inputs=image, outputs=encoded)
encoder = make_encoder()
encoder.summary()

#_________________________________________________________________
#Layer (type)                 Output Shape              Param #   
#=================================================================
#input_1 (InputLayer)         (None, 28, 28, 1)         0         
#_________________________________________________________________
#conv2d_1 (Conv2D)            (None, 28, 28, 8)         208       
#_________________________________________________________________
#conv2d_2 (Conv2D)            (None, 28, 28, 4)         292       
#_________________________________________________________________
#conv2d_3 (Conv2D)            (None, 28, 28, 2)         74        
#_________________________________________________________________
#conv2d_4 (Conv2D)            (None, 28, 28, 1)         19        
#=================================================================
#Total params: 593
#Trainable params: 593
#Non-trainable params: 0
#_________________________________________________________________
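The parameter counts in the summary above can be checked by hand. The following is a minimal sketch, assuming the standard Conv2D count of one weight per kernel position and input channel plus one bias per filter; the helper name `conv2d_params` is hypothetical and not part of Keras.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Weights plus one bias per filter for a standard 2D convolution."""
    return (kernel_h * kernel_w * in_channels + 1) * filters

# (kernel_h, kernel_w, in_channels, filters) for the four encoder layers
encoder_layers = [
    (5, 5, 1, 8),  # conv2d_1
    (3, 3, 8, 4),  # conv2d_2
    (3, 3, 4, 2),  # conv2d_3
    (3, 3, 2, 1),  # conv2d_4
]

encoder_counts = [conv2d_params(*layer) for layer in encoder_layers]
encoder_total = sum(encoder_counts)
print(encoder_counts, encoder_total)  # [208, 292, 74, 19] 593
```

Because padding='same' is used and there is no pooling, only the channel count changes from layer to layer, which is why every output shape keeps the 28 x 28 spatial size.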

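The shape transitions match the theory. Next, the decoder part takes the encoding merged with another array, of shape (28, 28, 2), and finally recovers the original image, of shape (28, 28, 1).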

def make_decoder():
    encoded_merged = Input((28, 28, 2))
    x = Conv2D(2, (5, 5), activation='relu', padding='same')(encoded_merged)
    x = Conv2D(4, (3, 3), activation='relu', padding='same')(x)
    x = Conv2D(8, (3, 3), activation='relu',padding='same')(x)
    decoded = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x) 

    return Model(inputs=encoded_merged, outputs=decoded)
decoder = make_decoder()
decoder.summary()

#_________________________________________________________________
#Layer (type)                 Output Shape              Param #   
#=================================================================
#input_2 (InputLayer)         (None, 28, 28, 2)         0         
#_________________________________________________________________
#conv2d_5 (Conv2D)            (None, 28, 28, 2)         102       
#_________________________________________________________________
#conv2d_6 (Conv2D)            (None, 28, 28, 4)         76        
#_________________________________________________________________
#conv2d_7 (Conv2D)            (None, 28, 28, 8)         296       
#_________________________________________________________________
#conv2d_8 (Conv2D)            (None, 28, 28, 1)         73        
#=================================================================
#Total params: 547
#Trainable params: 547
#Non-trainable params: 0
#_________________________________________________________________
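The (28, 28, 2) decoder input is simply the encoding and W stacked along the channel axis. Below is a small NumPy sketch of that merge, as an analogue of what a Keras Concatenate layer does on tensors; the batch size and the constant arrays are placeholders chosen for illustration only.

```python
import numpy as np

batch = 4  # hypothetical batch size, for illustration only

# Stand-ins for the encoder output and the W array (channels_last layout)
encoded = np.zeros((batch, 28, 28, 1), dtype="float32")
w = np.ones((batch, 28, 28, 1), dtype="float32")

# Stack along the channel axis (axis=3 for channels_last data)
encoded_merged = np.concatenate([encoded, w], axis=3)
print(encoded_merged.shape)  # (4, 28, 28, 2)
```

Both arrays need the same batch size and spatial size; only the channel dimension grows.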

Then the model also tries to recover the W array. The input is the reconstructed image plus noise (shape (28, 28, 1)).

def make_w_predictor():
    decoded_noise = Input((28, 28, 1))
    x = Conv2D(8, (5, 5), activation='relu', padding='same')(decoded_noise)
    x = Conv2D(4, (3, 3), activation='relu', padding='same')(x)
    pred_w = Conv2D(1, (3, 3), activation='relu', padding='same')(x)  
    # reconsider activation (is W positive?)
    # should be filter=1 to match W
    return Model(inputs=decoded_noise, outputs=pred_w)

w_predictor = make_w_predictor()
w_predictor.summary()

#_________________________________________________________________
#Layer (type)                 Output Shape              Param #   
#=================================================================
#input_3 (InputLayer)         (None, 28, 28, 1)         0         
#_________________________________________________________________
#conv2d_9 (Conv2D)            (None, 28, 28, 8)         208       
#_________________________________________________________________
#conv2d_10 (Conv2D)           (None, 28, 28, 4)         292       
#_________________________________________________________________
#conv2d_11 (Conv2D)           (None, 28, 28, 1)         37        
#=================================================================
#Total params: 537
#Trainable params: 537
#Non-trainable params: 0
#_________________________________________________________________
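To get a feel for what the W predictor receives, the sketch below adds zero-mean Gaussian noise with standard deviation 0.5 to a stand-in for the decoder output, using NumPy only. This mimics what a GaussianNoise(0.5) layer does while training; note that in Keras this layer is only active during training and passes its input through unchanged at inference time. The random stand-in data and the seed are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

# Stand-in for the decoder's sigmoid outputs, which lie in [0, 1)
decoded = rng.random((1000, 28, 28, 1))

stddev = 0.5  # same value passed to GaussianNoise in the question
decoded_noise = decoded + rng.normal(0.0, stddev, size=decoded.shape)

print(decoded_noise.shape)
print(float((decoded_noise - decoded).std()))
print(float(decoded_noise.min()), float(decoded_noise.max()))
```

With a standard deviation of 0.5 on values in [0, 1], the noisy input regularly leaves the [0, 1] range, so this is a fairly strong perturbation for the W predictor to work against.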

With all the parts in hand, putting them together into the whole model is not hard. Note that the models you built above can be used like layers.

def put_together(encoder, decoder, w_predictor):
    image = Input((28, 28, 1))
    w = Input((28, 28, 1))
    encoded = encoder(image)

    encoded_merged = Concatenate(axis=3)([encoded, w])
    decoded = decoder(encoded_merged)

    decoded_noise = GaussianNoise(0.5)(decoded)
    pred_w = w_predictor(decoded_noise)

    return Model(inputs=[image, w], outputs=[decoded, pred_w])

model = put_together(encoder, decoder, w_predictor)
model.summary()

#__________________________________________________________________________________________________
#Layer (type)                    Output Shape         Param #     Connected to                     
#==================================================================================================
#input_4 (InputLayer)            (None, 28, 28, 1)    0                                            
#__________________________________________________________________________________________________
#model_1 (Model)                 (None, 28, 28, 1)    593         input_4[0][0]                    
#__________________________________________________________________________________________________
#input_5 (InputLayer)            (None, 28, 28, 1)    0                                            
#__________________________________________________________________________________________________
#concatenate_1 (Concatenate)     (None, 28, 28, 2)    0           model_1[1][0]                    
#                                                                 input_5[0][0]                    
#__________________________________________________________________________________________________
#model_2 (Model)                 (None, 28, 28, 1)    547         concatenate_1[0][0]              
#__________________________________________________________________________________________________
#gaussian_noise_1 (GaussianNoise (None, 28, 28, 1)    0           model_2[1][0]                    
#__________________________________________________________________________________________________
#model_3 (Model)                 (None, 28, 28, 1)    537         gaussian_noise_1[0][0]           
#==================================================================================================
#Total params: 1,677
#Trainable params: 1,677
#Non-trainable params: 0
#__________________________________________________________________________________________________

The code below trains the model with dummy data. Of course, you can use your own data as long as the shapes match.

import numpy as np

# dummy data
images = np.random.random((1000, 28, 28, 1))
w = np.random.lognormal(size=(1000, 28, 28, 1))

# is accuracy sensible metric for this model?
model.compile(optimizer='adadelta', loss='mse', metrics=['accuracy'])
model.fit([images, w], [images, w], batch_size=64, epochs=5)
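The question defines w with shape (1, 28, 28), which does not match the (N, 28, 28, 1) input the model expects. If the intent is to embed one and the same W into every image, one possible way to prepare it is sketched below. This assumes W is a single 28 x 28 array of 0s and 1s; the names `single_w` and `w_batch` are hypothetical.

```python
import numpy as np

n_images = 1000  # hypothetical dataset size, matching the dummy data above
rng = np.random.default_rng(0)

# One binary 28x28 array standing in for W
single_w = rng.integers(0, 2, size=(28, 28)).astype("float32")

# Add batch and channel axes, then repeat along the batch axis
w_batch = np.tile(single_w[np.newaxis, :, :, np.newaxis], (n_images, 1, 1, 1))
print(w_batch.shape)  # (1000, 28, 28, 1)
```

The resulting `w_batch` has the same first dimension as the image array, so it could be passed both as the second input and as the second target in model.fit.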

Edit below

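I have some questions about the code you posted here. In make_w_predictor, you wrote: "# reconsider activation (is W positive?) # should be filter=1 to match W". What does that mean? W is an array containing 0s and 1s. What does "reconsider activation" mean? Should I change this part of the code?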

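The relu activation returns non-negative values in [0, +inf), so it may not be a good choice if W takes a different set of values. Typical choices are as follows.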

W can be positive or negative: "linear" activation.
W in [0, 1]: "sigmoid" activation.
W in [-1, 1]: "tanh" activation.
W is positive: "relu" activation.
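The output ranges behind these choices can be checked numerically. The sketch below evaluates NumPy versions of the four activations over a sample of pre-activation values; the `relu` and `sigmoid` helpers are written out by hand here and are not Keras functions.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-10.0, 10.0, 2001)  # sample pre-activation values

activations = {
    "linear": lambda v: v,
    "sigmoid": sigmoid,
    "tanh": np.tanh,
    "relu": relu,
}

# Smallest and largest output of each activation over the sample
output_ranges = {
    name: (float(fn(x).min()), float(fn(x).max()))
    for name, fn in activations.items()
}
for name, (low, high) in output_ranges.items():
    print(f"{name:8s} min={low:.4f} max={high:.4f}")
```

For a W made of 0s and 1s, sigmoid keeps the prediction inside the same interval as the target, whereas relu is unbounded above.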

In the original code, you have:

w=np.random.random((1, 28, 28))

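which takes values between 0 and 1. So I would suggest switching from "relu" to "sigmoid". But I did not change my code example, because I was not sure whether this was intentional.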

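You said the filter should be 1. Does that mean changing (3, 3) to (1, 1)? I am very sorry for these questions, but I am a beginner and could not find some of the things you mentioned. Could you explain?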

I was referring to this line in the original question:

final_image_watermark = Conv2D(2, (3, 3), activation='relu', padding='same')(x)

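If I understand correctly, this defines W' in the attached image, which should predict W, whose size is (28, 28, 1). So the first argument of Conv2D should be 1. Otherwise the output shape becomes (28, 28, 2). I made this change in my code example, because otherwise it raises a shape mismatch error: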

pred_w = Conv2D(1, (3, 3), activation='relu', padding='same')(x)

I think the (3, 3) part, which is the kernel size in Keras, is fine as it is.

【Comments】:

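Thank you very much for your reply. It helped me, and I learned how the code should be written. But I have some other questions. I think this loss function only uses w to compute the accuracy. Is that true? I want two loss functions: one between the input image and the decoder output, and one between w and w'. How can I write this combined loss function?

Both are used. For multi-output models, Keras computes the loss for each output and takes the average as the total loss. keras.io/getting-started/functional-api-guide/…

I have some questions about the code you posted here. In make_w_predictor, you wrote: "# reconsider activation (is W positive?) # should be filter=1 to match W". What does that mean? W is an array containing 0s and 1s. What does "reconsider activation" mean? Should I change this part of the code? You said the filter should be 1. Does that mean changing (3, 3) to (1, 1)? I am very sorry for these questions, but I am a beginner and could not find some of the things you mentioned. Could you explain?

@maede I added an answer to this comment.

You can do that, but if you use the same loss function for both outputs, you can simply pass loss='mse'. Keras then computes the loss for each output with the same loss function. keras.io/getting-started/functional-api-guide/…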
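To make the two-loss setup concrete, here is a NumPy sketch of how a combined loss over the two outputs could be computed. As far as I know, Keras combines the per-output losses as a weighted sum, with the weights taken from the loss_weights argument of compile (each defaulting to 1), rather than as a plain average. The prediction arrays below are random stand-ins, and the weight values are hypothetical.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error over all elements."""
    return float(np.mean((y_true - y_pred) ** 2))

rng = np.random.default_rng(0)

# Targets: input images and a binary W
images = rng.random((8, 28, 28, 1))
w = rng.integers(0, 2, size=(8, 28, 28, 1)).astype("float64")

# Stand-ins for the two model outputs (decoded image and predicted W)
decoded = rng.random(images.shape)
pred_w = rng.random(w.shape)

# Hypothetical weights; in Keras these would be passed as loss_weights
image_weight, w_weight = 1.0, 1.0

image_loss = mse(images, decoded)  # image vs. decoder output
w_loss = mse(w, pred_w)            # W vs. W'
total_loss = image_weight * image_loss + w_weight * w_loss
print(image_loss, w_loss, total_loss)

# Keras counterpart (not executed here):
# model.compile(optimizer='adadelta', loss=['mse', 'mse'],
#               loss_weights=[image_weight, w_weight])
```

Raising `w_weight` relative to `image_weight` would push training towards recovering W more accurately at the expense of image reconstruction, and vice versa.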
