What is the purpose of the add_loss function in Keras?


I recently came across variational autoencoders and tried to make them work on MNIST using Keras. I found a tutorial on GitHub.

My question concerns the following lines of code:

# Build model
vae = Model(x, x_decoded_mean)

# Calculate custom loss
xent_loss = original_dim * metrics.binary_crossentropy(x, x_decoded_mean)
kl_loss = - 0.5 * K.sum(1 + z_log_var - K.square(z_mean) - K.exp(z_log_var), axis=-1)
vae_loss = K.mean(xent_loss + kl_loss)

# Compile
vae.add_loss(vae_loss)
vae.compile(optimizer='rmsprop')

Why is add_loss used instead of specifying it as a compile option? Something like vae.compile(optimizer='rmsprop', loss=vae_loss) does not seem to work and throws the following error:

ValueError: The model cannot be compiled because it has no loss to optimize.

What is the difference between this function and a custom loss function that I can add as an argument for Model.fit()?

Thanks!

P.S.: I know there are several issues concerning this on GitHub, but most of them were open and uncommented. If this has been resolved already, please share the link!


Edit 1

I removed the line which adds the loss to the model and used the loss parameter of the compile function instead. It now looks like this:

# Build model
vae = Model(x, x_decoded_mean)

# Calculate custom loss
xent_loss = original_dim * metrics.binary_crossentropy(x, x_decoded_mean)
kl_loss = - 0.5 * K.sum(1 + z_log_var - K.square(z_mean) - K.exp(z_log_var), axis=-1)
vae_loss = K.mean(xent_loss + kl_loss)

# Compile
vae.compile(optimizer='rmsprop', loss=vae_loss)

This throws a TypeError:

TypeError: Using a 'tf.Tensor' as a Python 'bool' is not allowed. Use 'if t is not None:' instead of 'if t:' to test if a tensor is defined, and use TensorFlow ops such as tf.cond to execute subgraphs conditioned on the value of a tensor.

Edit 2

Thanks to @MarioZ's efforts, I was able to figure out a workaround.

# Build model
vae = Model(x, x_decoded_mean)

# Calculate custom loss in separate function
def vae_loss(x, x_decoded_mean):
    xent_loss = original_dim * metrics.binary_crossentropy(x, x_decoded_mean)
    kl_loss = - 0.5 * K.sum(1 + z_log_var - K.square(z_mean) - K.exp(z_log_var), axis=-1)
    vae_loss = K.mean(xent_loss + kl_loss)
    return vae_loss

# Compile
vae.compile(optimizer='rmsprop', loss=vae_loss)

...

vae.fit(x_train, 
    x_train,        # <-- did not need this previously
    shuffle=True,
    epochs=epochs,
    batch_size=batch_size,
    validation_data=(x_test, x_test))     # <-- worked with (x_test, None) before

For some strange reason, I had to explicitly specify y and y_test while fitting the model. Originally, I didn't need to do this. The produced samples seem reasonable to me.

Although I was able to solve this, I still don't know what the differences and disadvantages of these two methods are (other than needing a different syntax). Can someone give me more insight?

Answer

I'll try to answer the original question of why model.add_loss() is being used instead of specifying a custom loss function for model.compile(loss=...).

All loss functions in Keras always take two parameters, y_true and y_pred. Have a look at the definitions of the various standard loss functions available in Keras: they all have these two parameters. They are the "targets" (the Y variable in many textbooks) and the actual output of the model. Most standard loss functions can be written as an expression of these two tensors. But some more complex losses cannot be written that way. For your VAE example, this is the case, because the loss function also depends on additional tensors, namely z_log_var and z_mean, which are not available as arguments to the loss function. Using model.add_loss() has no such restriction and allows you to write much more complex losses that depend on many other tensors, but it has the inconvenience of being more dependent on the model, whereas the standard loss functions work with just any model.
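
For completeness, here is a minimal sketch of the full add_loss() workflow, reusing the tensors and the x_train/x_test arrays from the tutorial above: since the loss tensor already references both the input x and the reconstruction x_decoded_mean, neither compile() nor fit() needs a target.

vae.add_loss(vae_loss)            # vae_loss is a tensor, not a function
vae.compile(optimizer='rmsprop')  # note: no loss= argument

# No y is passed -- the "target" x is already wired into vae_loss:
vae.fit(x_train,
        shuffle=True,
        epochs=epochs,
        batch_size=batch_size,
        validation_data=(x_test, None))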

(A note: the code proposed in other answers here is somewhat cheating, in that it just uses global variables to sneak in the additional required dependencies. This makes the loss function not a true function in the mathematical sense. I consider that to be much less clean code, and I expect it to be more error-prone.)

Another answer

JIH's answer is right of course, but maybe it is useful to add:

model.add_loss() has no restrictions, but it also removes the comfort of using targets in model.fit().

If you have a loss that depends on additional parameters of the model, of other models, or on external variables, you can still use a Keras-type encapsulated loss function by having a wrapping function where you pass all the additional parameters:

def loss_carrier(extra_param1, extra_param2):
    def loss(y_true, y_pred):
        # x = complicated math involving extra_param1, extra_param2, y_true, y_pred.
        # Remember to use tensor ops, e.g. K.sum, K.square, K.mean from keras.backend.
        # Also remember: if extra_param1 and extra_param2 are variable tensors instead
        # of simple floats, they need to be defined as keras.Input (or tf.placeholder)
        # with the right shape and passed to your keras.Model instantiation, e.g.
        # inputs=(main, extra_param1, extra_param2).
        # As a placeholder illustration:
        x = K.mean(K.square(y_pred - y_true)) + extra_param1 * extra_param2
        return x
    return loss

model.compile(optimizer='adam', loss=loss_carrier(extra_param1, extra_param2))

The trick is the last line, where you return a function: you call loss_carrier once with the extra parameters, and the closure it returns takes only the two arguments y_true and y_pred, which is what Keras expects.

It may look more complex than the model.add_loss version, but the loss stays modular.
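
For instance, with hypothetical float values for the two extra parameters, the outer function is called once and the closure it returns is what Keras invokes on every batch:

# Hypothetical hyperparameter values; loss_carrier(0.5, 1.0) returns the
# two-argument closure that Keras will call as loss(y_true, y_pred).
model.compile(optimizer='adam', loss=loss_carrier(0.5, 1.0))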

Another answer

Try this:

import numpy as np
from keras.models import Model
from keras.layers import Input, Lambda, Dense, Dropout
from keras import backend as K
from keras import objectives

np.random.seed(22)



data1 = np.array([0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0,
       1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0], dtype='int32')

data2 = np.array([1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0,
       1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0], dtype='int32')


data3 = np.array([0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0,
       1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0], dtype='int32')

#train = np.zeros(shape=(992,54))
#test = np.zeros(shape=(921,54))

train = np.zeros(shape=(300,54))
test = np.zeros(shape=(300,54))

for n, i in enumerate(train):
    if (n<=100):
        train[n] = data1
    elif (n>100 and n<=200):
        train[n] = data2
    elif(n>200):
        train[n] = data3


for n, i in enumerate(test):
    if (n<=100):
        test[n] = data1
    elif(n>100 and n<=200):
        test[n] = data2
    elif(n>200):
        test[n] = data3


batch_size = 5
original_dim = train.shape[1]

intermediate_dim45 = 45
intermediate_dim35 = 35
intermediate_dim25 = 25
intermediate_dim15 = 15
intermediate_dim10 = 10
intermediate_dim5 = 5
latent_dim = 3
epochs = 50
epsilon_std = 1.0

def sampling(args):
    z_mean, z_log_var = args
    epsilon = K.random_normal(shape=(K.shape(z_mean)[0], latent_dim), mean=0.,
                              stddev=epsilon_std)
    return z_mean + K.exp(z_log_var / 2) * epsilon

x = Input(shape=(original_dim,), name = 'first_input_mario')

h1 = Dense(intermediate_dim45, activation='relu', name='h1')(x)
hD = Dropout(0.5)(h1)
h2 = Dense(intermediate_dim25, activation='relu', name='h2')(hD)
h3 = Dense(intermediate_dim10, activation='relu', name='h3')(h2)
h = Dense(intermediate_dim5, activation='relu', name='h')(h3)  # was relu
h = Dropout(0.1)(h)

z_mean = Dense(latent_dim, activation='relu')(h)
z_log_var = Dense(latent_dim, activation='relu')(h)

z = Lambda(sampling, output_shape=(latent_dim,))([z_mean, z_log_var])

decoder_h = Dense(latent_dim, activation='relu')
decoder_h1 = Dense(intermediate_dim5, activation='relu')
decoder_h2 = Dense(intermediate_dim10, activation='relu')
decoder_h3 = Dense(intermediate_dim25, activation='relu')
decoder_h4 = Dense(intermediate_dim45, activation='relu')

decoder_mean = Dense(original_dim, activation='sigmoid')


h_decoded = decoder_h(z)
h_decoded1 = decoder_h1(h_decoded)
h_decoded2 = decoder_h2(h_decoded1)
h_decoded3 = decoder_h3(h_decoded2)
h_decoded4 = decoder_h4(h_decoded3)

x_decoded_mean = decoder_mean(h_decoded4)

vae = Model(x, x_decoded_mean)


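# Compile-style loss: Keras passes only (x, x_decoded_mean); z_mean and
# z_log_var are reached through the enclosing scope (the "global variable
# cheat" mentioned in the first answer).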
def vae_loss(x, x_decoded_mean):
    xent_loss = objectives.binary_crossentropy(x, x_decoded_mean)
    kl_loss = -0.5 * K.mean(1 + z_log_var - K.square(z_mean) - K.exp(z_log_var))
    loss = xent_loss + kl_loss
    return loss

vae.compile(optimizer='rmsprop', loss=vae_loss)

vae.fit(train, train, batch_size = batch_size, epochs=epochs, shuffle=True,
        validation_data=(test, test))



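# Build standalone encoder and generator models that reuse the trained layers.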
encoder = Model(x, z_mean)

decoder_input = Input(shape=(latent_dim,))

_h_decoded = decoder_h(decoder_input)
_h_decoded1 = decoder_h1(_h_decoded)
_h_decoded2 = decoder_h2(_h_decoded1)
_h_decoded3 = decoder_h3(_h_decoded2)
_h_decoded4 = decoder_h4(_h_decoded3)

_x_decoded_mean = decoder_mean(_h_decoded4)
generator = Model(decoder_input, _x_decoded_mean)
generator.summary()
Another answer

You need to change the compile line to vae.compile(optimizer='rmsprop', loss=vae_loss).
