Variational AutoEncoder - Keras - logits and labels must have the same shape?

Posted: 2021-10-22 01:18:34

Question:

I am working through this example on Keras: https://keras.io/examples/generative/vae/

However, I am trying to reproduce it with 200 x 200 images. The specific error I get is:

ValueError: logits and labels must have the same shape ((None, 8, 8, 1) vs (None, 200, 200, 3))

Here is my encoder, modified with keras.Input(shape=(200, 200, 3)):

latent_dim = 2

encoder_inputs = keras.Input(shape=(200, 200, 3))
x = layers.Conv2D(32, 3, activation="relu", strides=2, padding="same")(encoder_inputs)
x = layers.Conv2D(64, 3, activation="relu", strides=2, padding="same")(x)
x = layers.Flatten()(x)
x = layers.Dense(16, activation="relu")(x)
z_mean = layers.Dense(latent_dim, name="z_mean")(x)
z_log_var = layers.Dense(latent_dim, name="z_log_var")(x)
z = Sampling()([z_mean, z_log_var])
encoder = keras.Model(encoder_inputs, [z_mean, z_log_var, z], name="encoder")
encoder.summary()
Model: "encoder"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_17 (InputLayer)           [(None, 200, 200, 3) 0                                            
__________________________________________________________________________________________________
conv2d_8 (Conv2D)               (None, 100, 100, 32) 896         input_17[0][0]                   
__________________________________________________________________________________________________
conv2d_9 (Conv2D)               (None, 50, 50, 64)   18496       conv2d_8[0][0]                   
__________________________________________________________________________________________________
flatten_4 (Flatten)             (None, 160000)       0           conv2d_9[0][0]                   
__________________________________________________________________________________________________
dense_16 (Dense)                (None, 16)           2560016     flatten_4[0][0]                  
__________________________________________________________________________________________________
z_mean (Dense)                  (None, 2)            34          dense_16[0][0]                   
__________________________________________________________________________________________________
z_log_var (Dense)               (None, 2)            34          dense_16[0][0]                   
__________________________________________________________________________________________________
sampling_3 (Sampling)           (None, 2)            0           z_mean[0][0]                     
                                                                 z_log_var[0][0]                  
==================================================================================================
Total params: 2,579,476
Trainable params: 2,579,476
Non-trainable params: 0
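As a quick sanity check (a sketch independent of Keras, not part of the original post), the 160000 in the Flatten row follows from each stride-2, padding="same" convolution halving the spatial size, 200 -> 100 -> 50, times the 64 filters of the last Conv2D:

```python
import math

size = 200
for _ in range(2):              # two stride-2 Conv2D layers in the encoder
    size = math.ceil(size / 2)  # padding="same" gives ceil(size / stride)

flat = size * size * 64         # 64 filters in the last Conv2D
print(size, flat)               # 50 160000
```

The same arithmetic run in reverse is what the decoder has to undo.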

I believe the error is in my decoder, where I tried to modify the layers to scale back up to 200.

latent_inputs = keras.Input(shape=(latent_dim,))
x = layers.Dense(2 * 2 * 50, activation="relu")(latent_inputs)
x = layers.Reshape((2, 2, 50))(x) ##changed this
x = layers.Conv2DTranspose(64, 3, activation="relu", strides=2, padding="same")(x)
x = layers.Conv2DTranspose(32, 3, activation="relu", strides=2, padding="same")(x)
decoder_outputs = layers.Conv2DTranspose(1, 3, activation="sigmoid", padding="same")(x)
decoder = keras.Model(latent_inputs, decoder_outputs, name="decoder")
decoder.summary()
Model: "decoder"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_18 (InputLayer)        [(None, 2)]               0         
_________________________________________________________________
dense_17 (Dense)             (None, 200)               600       
_________________________________________________________________
reshape_12 (Reshape)         (None, 2, 2, 50)          0         
_________________________________________________________________
conv2d_transpose_13 (Conv2DT (None, 4, 4, 64)          28864     
_________________________________________________________________
conv2d_transpose_14 (Conv2DT (None, 8, 8, 32)          18464     
_________________________________________________________________
conv2d_transpose_15 (Conv2DT (None, 8, 8, 1)           289       
=================================================================
Total params: 48,217
Trainable params: 48,217
Non-trainable params: 0

The dimensions of my image pic1 are:

(312, 465)

I then run this and hit the error:

pic_1 = np.expand_dims(pic1, 0).astype("float32") / 255 

pic_1 = pic_1[:,-201:-1, 0:200] #Trim the picture to fit the input 200-by-200 dimensions

vae = VAE(encoder, decoder)
vae.compile(optimizer=keras.optimizers.Adam())
vae.fit(pic_1, epochs=30, batch_size=128)
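Note that the reported shape (312, 465) has no channel axis, so after the crop pic_1 is (1, 200, 200), not (1, 200, 200, 3). A hedged preprocessing sketch (assuming pic1 is a grayscale NumPy array; the random array below is only a stand-in for the real image):

```python
import numpy as np

pic1 = np.random.rand(312, 465)               # stand-in for the real image
pic = pic1.astype("float32") / 255.0

pic = pic[-200:, :200]                        # crop to 200 x 200
pic = np.repeat(pic[..., np.newaxis], 3, -1)  # repeat gray channel to get 3
pic = np.expand_dims(pic, 0)                  # add batch axis

print(pic.shape)                              # (1, 200, 200, 3)
```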

What is the reason I get this error:

        raise ValueError("logits and labels must have the same shape (%s vs %s)" %

    ValueError: logits and labels must have the same shape ((None, 8, 8, 1) vs (None, 200, 200, 3))

Like I said, I think something is wrong with my decoder's output shape, since it does not match the encoder's input shape? Any help would be appreciated.
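For reference, the mismatch can be reproduced outside Keras (a minimal NumPy sketch using the two shapes from the summaries): element-wise losses such as binary cross-entropy need identical shapes, and (8, 8, 1) cannot even broadcast against (200, 200, 3):

```python
import numpy as np

logits = np.zeros((1, 8, 8, 1))      # decoder output shape from the summary
labels = np.zeros((1, 200, 200, 3))  # shape of the 200 x 200 RGB input

try:
    logits - labels                  # element-wise, like the VAE's loss
    compatible = True
except ValueError as err:
    compatible = False               # 8 vs 200 cannot broadcast
    print("shape mismatch:", err)
```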

Comments:

You need to make sure the latent representation is upsampled to (200, 200, 3), not (8, 8, 3). You can do that by adding extra Conv2DTranspose layers.

How do I add the right number of layers to end up at (200, 200, 3)? For example, I added:

x = layers.Conv2DTranspose(64, 3, activation="relu", strides=2, padding="same")(x)
x = layers.Conv2DTranspose(64, 3, activation="relu", strides=2, padding="same")(x)
x = layers.Conv2DTranspose(64, 3, activation="relu", strides=2, padding="same")(x)
x = layers.Conv2DTranspose(32, 3, activation="relu", strides=2, padding="same")(x)
x = layers.Conv2DTranspose(32, 3, activation="relu", strides=2, padding="same")(x)
x = layers.Conv2DTranspose(16, 3, activation="relu", strides=2, padding="same")(x)

which produces:

conv2d_transpose_269 (Conv2DTranspose)  (None, 10, 10, 64)    4672
conv2d_transpose_270 (Conv2DTranspose)  (None, 20, 20, 64)    36928
conv2d_transpose_271 (Conv2DTranspose)  (None, 40, 40, 64)    36928
conv2d_transpose_272 (Conv2DTranspose)  (None, 80, 80, 32)    18464
conv2d_transpose_273 (Conv2DTranspose)  (None, 160, 160, 32)  9248
conv2d_transpose_274 (Conv2DTranspose)  (None, 320, 320, 16)  4624
conv2d_transpose_275 (Conv2DTranspose)  (None, 320, 320, 3)   435

Answer 1:
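A quick arithmetic check (a sketch, not part of the original thread) shows why that stack ends at 320 x 320: each stride-2, padding="same" Conv2DTranspose doubles the spatial size, and doubling from the 5 x 5 base implied by the comment's summary can never land on 200:

```python
# Each stride-2 Conv2DTranspose (padding="same") doubles the spatial size.
sizes = [5]                      # base implied by the first (None, 10, 10, 64) row
for _ in range(6):               # six stride-2 layers
    sizes.append(sizes[-1] * 2)

print(sizes)                     # [5, 10, 20, 40, 80, 160, 320]
print(200 in sizes)              # False: 200 / 5 = 40 is not a power of two
```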

The decoder does not need to mirror the encoder exactly (same layers and depth), but its output must have the same shape as the encoder's input. Basically, you can change the decoder's layers, but the final output shape has to be (200, 200, 3). The simple approach is to mirror the decoder from the encoder. As an example:

latent_inputs = keras.Input(shape=(latent_dim,))
x = layers.Dense(16, activation="relu")(latent_inputs)
x = layers.Dense(50 * 50 * 64, activation="relu")(x)
x = layers.Reshape((50, 50, 64))(x)
x = layers.Conv2DTranspose(32, 3, activation="relu", strides=2, padding="same")(x)
decoder_outputs = layers.Conv2DTranspose(3, 3, activation="sigmoid", strides=2, padding="same")(x)
decoder = keras.Model(latent_inputs, decoder_outputs, name="decoder")
decoder.summary()

Another example where the decoder differs from the encoder but the output shape is still (200, 200, 3):

latent_inputs = keras.Input(shape=(latent_dim,))
x = layers.Dense(8, activation="relu")(latent_inputs)
x = layers.Dense(16, activation="relu")(x)
x = layers.Dense(50 * 50 * 64, activation="relu")(x)
x = layers.Reshape((50, 50, 64))(x)
x = layers.Conv2DTranspose(64, 3, activation="relu", strides=1, padding="same")(x)
x = layers.Conv2DTranspose(32, 3, activation="relu", strides=1, padding="same")(x)
x = layers.Conv2DTranspose(16, 3, activation="relu", strides=2, padding="same")(x)
decoder_outputs = layers.Conv2DTranspose(3, 3, activation="sigmoid", strides=2, padding="same")(x)
decoder = keras.Model(latent_inputs, decoder_outputs, name="decoder")
decoder.summary()
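More generally (a sketch, not from the answer itself), the Reshape base size for k stride-2 upsampling layers has to be 200 / 2**k, and that division must be exact. That is why both examples above reshape to 50 x 50 before their two stride-2 layers:

```python
# For k stride-2 Conv2DTranspose layers (padding="same") ending at 200 x 200,
# the Reshape base must be 200 / 2**k, and it must divide evenly.
target = 200
for k in (1, 2, 3):
    base, rem = divmod(target, 2 ** k)
    print(k, base, "ok" if rem == 0 else "does not divide")
```

For instance k=2 gives a 50 x 50 base, matching Reshape((50, 50, 64)) in the answer.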

