Custom Loss Function in Keras with Sample Weights
Posted: 2021-12-15 15:07:57

Question:
I am new to TensorFlow and Keras. I would like to use sample weights in a custom loss function.
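If I understand correctly, this post (Custom loss function with weights in Keras) suggests passing the weights as an input to the network. The same approach appears in this one: Custom weighted loss function in Keras for weighing each element.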
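I'm wondering whether I'm missing something (I also don't want to define the weights as a global variable). I'm also a bit surprised that there is no direct way to do this. The `__call__` method of the `Loss` class accepts `sample_weight` as an argument, yet, if I understand correctly, the loss function itself may only take the arguments `y_true` and `y_pred`.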
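The following minimal sketch shows where that split happens. It assumes TensorFlow 2.x `tf.keras`, and the class `PerSampleSquaredError` and the numbers are hypothetical, chosen only for illustration. The subclass's `call()` only ever sees `y_true` and `y_pred`. `Loss.__call__` is the method that accepts `sample_weight` and applies it to the per-sample values returned by `call()`.

```python
import numpy as np
import tensorflow as tf


class PerSampleSquaredError(tf.keras.losses.Loss):
    """Hypothetical example loss: call() only receives y_true and y_pred."""

    def call(self, y_true, y_pred):
        # One loss value per sample: shape (batch_size,)
        return tf.reduce_mean(tf.square(y_true - y_pred), axis=-1)


loss_obj = PerSampleSquaredError()
y_true = tf.constant([[0.0], [0.0], [0.0]])
y_pred = tf.constant([[1.0], [2.0], [3.0]])

# call() returns the unreduced per-sample losses
per_sample = loss_obj.call(y_true, y_pred)
# __call__ reduces them to a scalar (default: sum / batch size)
unweighted = loss_obj(y_true, y_pred)
# __call__ also multiplies each per-sample loss by its weight before reducing
weighted = loss_obj(y_true, y_pred,
                    sample_weight=np.array([1.0, 0.0, 0.0], dtype="float32"))
print(per_sample.numpy(), float(unweighted), float(weighted))
```

With the default reduction, the weighted sum is divided by the batch size (3), not by the sum of the weights. That is why the weighted value comes out as 1/3 rather than 1.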
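However, the documentation (https://keras.io/api/losses/#creating-custom-losses) says:

> Creating custom losses
> Any callable with the signature loss_fn(y_true, y_pred) that returns an array of losses (one per sample in the input batch) can be passed to compile() as a loss. Note that sample weighting is automatically supported for any such loss.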
That sounds as if it should be possible to use sample weighting through model.fit(..., sample_weight=sample_weight).
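Here is a sketch of that route. It assumes TensorFlow 2.x `tf.keras`, and `per_sample_mse` and the random data are hypothetical. A plain function with the documented `loss_fn(y_true, y_pred)` signature returns one value per sample, and `model.fit(..., sample_weight=...)` supplies the weights. Setting every weight to zero is a quick way to check that the weights really reach the loss. Every per-sample loss gets multiplied by 0, the gradients vanish, and plain SGD leaves the kernel unchanged.

```python
import numpy as np
import tensorflow as tf


def per_sample_mse(y_true, y_pred):
    # Documented signature; returns one loss per sample, shape (batch_size,)
    return tf.reduce_mean(tf.square(y_true - y_pred), axis=-1)


rng = np.random.default_rng(0)
x = rng.normal(size=(64, 3)).astype("float32")
y = rng.normal(size=(64, 1)).astype("float32")

dense = tf.keras.layers.Dense(1)
model = tf.keras.Sequential([tf.keras.Input(shape=(3,)), dense])
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.1),
              loss=per_sample_mse)

kernel_before = dense.get_weights()[0].copy()

# All-zero weights: each per-sample loss is multiplied by 0, so the
# gradients are zero and plain SGD does not move the parameters.
model.fit(x, y, sample_weight=np.zeros(len(x), dtype="float32"),
          batch_size=16, epochs=2, verbose=0)
kernel_after = dense.get_weights()[0]
```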
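In this post (Should the custom loss function in Keras return a single loss value for the batch or an array of losses for every sample in the training batch?) there is a lengthy discussion about the output size of the loss function.

At the end it is also mentioned that a custom loss function should return an array of losses (one loss per sample). Reducing them is handled by the framework.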
It seems to me that if custom_loss(y_true, y_pred) returns a tensor of shape (batch_size,), then it should be possible to use sample_weight in the fit method. What am I missing?
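For reference, here is how the per-sample losses and the weights combine, assuming Keras's default reduction (`SUM_OVER_BATCH_SIZE`). Take a batch of $N$ samples with per-sample losses $\ell_i$ returned by the custom loss and weights $w_i$ taken from `sample_weight`:

$$
L_{\text{batch}} = \frac{1}{N} \sum_{i=1}^{N} w_i \, \ell_i, \qquad \ell_i = \mathrm{custom\_loss}(y_i, \hat{y}_i)
$$

The divisor is the batch size $N$, not $\sum_i w_i$. Constant weights $w_i = c$ therefore just scale the loss, and its gradient, by $c$.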
Thanks a lot for your help!
Code snippets:
```python
import tensorflow as tf
import tensorflow_probability as tfp
from tensorflow.keras.losses import Loss

tfd = tfp.distributions
NUM_PARAMS_MG = 3  # alpha, mu, sg for each component


class NegLogLikMixedGaussian(Loss):
    """
    Negative Log-Likelihood of Mixed Gaussian with:
    num_components: number of components
    mu: means of the Gaussian components
    sg: standard deviations of the Gaussian components
    """
    def __init__(self, num_params=NUM_PARAMS_MG,
                 num_components=2, name='neg_log_lik_mixed_gaussian'):
        super(NegLogLikMixedGaussian, self).__init__(name=name)
        self.num_params = num_params
        self.num_components = num_components

    def call(self, y_true, p_predict):
        """
        Rem: for MDN the output of the networks are _parameters_ of the
        predicted distribution, _not_ point-estimates

        Parameters
        ----------
        y_true: (batch_size, 1)
            Observed value of the random variable
        p_predict: (batch_size, num_params * num_components)
            Output parameters of the network given some input

        Returns
        -------
        Negative log likelihood of each sample in the batch (batch_size,)
        """
        # Split along the feature axis into alpha, mu, sg,
        # each of shape (batch_size, num_components)
        alpha, mu, sg = tf.split(p_predict,
                                 num_or_size_splits=self.num_params, axis=1)
        gm = tfd.MixtureSameFamily(
            mixture_distribution=tfd.Categorical(probs=alpha),
            components_distribution=tfd.Normal(loc=mu, scale=sg))
        # Transpose so y_true lines up with the mixture's batch shape (batch_size,)
        log_likelihood = tf.transpose(gm.log_prob(tf.transpose(y_true)))
        return -tf.reduce_mean(log_likelihood, axis=-1)
```
I would then like to be able to use:
```python
from tensorflow.keras.optimizers import Adam

model.compile(optimizer=Adam(learning_rate=0.005),
              loss=NegLogLikMixedGaussian(
                  num_components=2, num_params=3))
```
as well as:
```python
import numpy as np

# For testing purposes
sample_weight = np.ones(len(y_train)) / len(dh.y_train_scaled)  # this should give same results as un-weighted
# Some non-trivial weights
sample_weights = np.zeros(len(y_train))
sample_weights[:5] = 1
# This will give me same results as above
model.fit(x_train, y_train, sample_weight=sample_weight,
          batch_size=128, epochs=10)
```
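Answer 1:
If I understand what you are trying to do, your code is correct apart from a few details. The sample weights should have shape (num_samples,), whereas the loss should have shape (batch_size,). The sample weights can be passed to the fit method, and that seems to work. In your custom loss class, both num_components and num_params are initialized, but only one of them is used in the call method. I'm not sure I understand the dimensions of the tensors (alpha, mu, sg). Is the model prediction of shape (batch_size, 3, num_components)? Based on my understanding of your question, here is code adapted from yours.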
```python
import numpy as np
import tensorflow as tf
import tensorflow_probability as tfp
from tensorflow.keras import Input, Model
from tensorflow.keras.layers import Concatenate, Dense
from tensorflow.keras.losses import Loss

tfd = tfp.distributions

# parameters
num_components = 2
num_samples = 1001
num_features = 10

# synthetic data: one scalar target per sample
x_train = np.random.normal(size=(num_samples, num_features))
y_train = np.random.normal(size=(num_samples, 1))
print(x_train.shape)
print(y_train.shape)


class NegLogLikMixedGaussian(Loss):
    """
    Negative Log-Likelihood of Mixed Gaussian with:
    num_components: number of components
    mu: means of the Gaussian components
    sg: standard deviations of the Gaussian components
    """
    def __init__(self, num_components=2, name='neg_log_lik_mixed_gaussian'):
        super(NegLogLikMixedGaussian, self).__init__(name=name)
        self.num_components = num_components

    def call(self, y_true, p_predict):
        """
        Rem: for MDN the output of the networks are _parameters_ of the
        predicted distribution, _not_ point-estimates

        Parameters
        ----------
        y_true: (batch_size, 1)
            Observed value of the random variable
        p_predict: (batch_size, 3, num_components)
            Output parameters of the network given some input

        Returns
        -------
        Negative log likelihood of each sample in the batch (batch_size,)
        """
        # Each split has shape (batch_size, 1, num_components); drop axis 1
        alpha, mu, sg = [tf.squeeze(t, axis=1)
                         for t in tf.split(p_predict, num_or_size_splits=3, axis=1)]
        gm = tfd.MixtureSameFamily(
            mixture_distribution=tfd.Categorical(probs=alpha),
            components_distribution=tfd.Normal(loc=mu, scale=sg))
        # The mixture's batch shape is (batch_size,), so pass y_true as (batch_size,)
        log_likelihood = gm.log_prob(tf.squeeze(y_true, axis=-1))
        return -log_likelihood


# the model (simply predicts (alpha, mu, sigma) for each component)
inputs = Input((num_features,))
alpha = tf.expand_dims(Dense(num_components, 'relu')(inputs), axis=1) + 0.0001
# normalization: mixture weights sum to 1
alpha = alpha / tf.reduce_sum(alpha, axis=2, keepdims=True)
mu = tf.expand_dims(Dense(num_components)(inputs), axis=1)
# sg > 0
sg = tf.expand_dims(Dense(num_components, 'relu')(inputs), axis=1) + 0.0001
outputs = Concatenate(axis=1)([alpha, mu, sg])  # (batch_size, 3, num_components)
model = Model(inputs=inputs, outputs=outputs, name='gmm_params')
model.compile(optimizer='adam',
              loss=NegLogLikMixedGaussian(num_components=num_components),
              run_eagerly=False)

# one weight per sample, shape (num_samples,): ignore the second half of the data
sample_weight = np.ones((num_samples,))
sample_weight[500:] = 0.
model.fit(x_train, y_train, batch_size=16, epochs=20, sample_weight=sample_weight)
```
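Besides passing `sample_weight` to `fit`, Keras also accepts a `tf.data` dataset that yields `(x, y, sample_weight)` tuples, in which case the weights are batched together with the data. The sketch below assumes TensorFlow 2.x `tf.keras`. The fixed linear model, `per_sample_mse` and the random data are hypothetical. As in the answer, it zeroes the weights of the second half of the data. It then checks the reported loss against sum(w_i * l_i) / N computed by hand.

```python
import numpy as np
import tensorflow as tf


def per_sample_mse(y_true, y_pred):
    # One loss value per sample: shape (batch_size,)
    return tf.reduce_mean(tf.square(y_true - y_pred), axis=-1)


rng = np.random.default_rng(0)
num_samples = 64
x = rng.normal(size=(num_samples, 3)).astype("float32")
y = rng.normal(size=(num_samples, 1)).astype("float32")
w = np.ones(num_samples, dtype="float32")
w[num_samples // 2:] = 0.0  # ignore the second half of the data

# Fixed weights (kernel of ones, zero bias) so predictions are simply x.sum(axis=1)
model = tf.keras.Sequential([
    tf.keras.Input(shape=(3,)),
    tf.keras.layers.Dense(1, kernel_initializer="ones", bias_initializer="zeros"),
])
model.compile(optimizer="sgd", loss=per_sample_mse)

# Dataset of (x, y, sample_weight) triples; the weights are batched with the data
ds = tf.data.Dataset.from_tensor_slices((x, y, w)).batch(16)
weighted_loss = model.evaluate(ds, verbose=0, return_dict=True)["loss"]

# Same quantity by hand: sum(w_i * l_i) / N
per_sample = ((y - x.sum(axis=1, keepdims=True)) ** 2).mean(axis=1)
expected = float((w * per_sample).sum() / num_samples)
```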
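Comments:

Hi Elbe, thanks a lot, this is great! I'm looking into it. I can see that changing the weights changes the predictions (I fixed the seed when running the comparison). A follow-up question is what the output size of the loss function is. It seems to be a scalar. I checked this by computing the loss separately for the first two data points, which gives the same loss as a batch made of those two data points. How does the fit function know how to weight individual samples? Thanks a lot!

The output of the loss function has size (batch_size,). By the way, what is the target application of your code?

A toy example where the input is a scalar (Input(shape=(1, ))) and the output is a bimodal distribution. Thanks again!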
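A scalar appears when the loss object is called directly because of the reduction step, not because of the loss function itself. Here is a sketch with the built-in `tf.keras.losses.MeanSquaredError`, assuming TensorFlow 2.x `tf.keras`. The values are hypothetical. With `reduction="none"`, the loss object returns one value per sample, and passing `sample_weight` multiplies those values element-wise. With the default reduction, the weighted values are summed and divided by the batch size, which gives a scalar.

```python
import numpy as np
import tensorflow as tf

y_true = tf.constant([[0.0], [0.0], [0.0]])
y_pred = tf.constant([[1.0], [2.0], [3.0]])
weights = np.array([1.0, 0.0, 2.0], dtype="float32")

# reduction="none" keeps one loss value per sample
mse_none = tf.keras.losses.MeanSquaredError(reduction="none")
per_sample = mse_none(y_true, y_pred)
per_sample_weighted = mse_none(y_true, y_pred, sample_weight=weights)

# Default reduction: sum of weighted per-sample losses / batch size -> scalar
mse_default = tf.keras.losses.MeanSquaredError()
scalar_weighted = mse_default(y_true, y_pred, sample_weight=weights)
print(per_sample.numpy(), per_sample_weighted.numpy(), float(scalar_weighted))
```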