How to write a custom loss function in LGBM?

Posted 2023-03-28


Question (asked 2021-03-22 22:02:16):

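I have a binary cross-entropy implementation in Keras, and I want to implement the same custom loss in LGBM. I know LGBM already has a built-in "binary" objective, but I want to write this one myself as a starting point for future enhancements.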

Here is the code:

import numpy as np
from tensorflow.keras import backend as K

def custom_binary_loss(y_true, y_pred):
    """
    Keras version of binary cross-entropy (works like a charm!)
    """
    # https://github.com/tensorflow/tensorflow/blob/v2.3.1/tensorflow/python/keras/backend.py#L4826
    y_pred = K.clip(y_pred, K.epsilon(), 1 - K.epsilon())
    
    term_0 = (1 - y_true) * K.log(1 - y_pred + K.epsilon())  # Cancels out when target is 1 
    term_1 = y_true * K.log(y_pred + K.epsilon()) # Cancels out when target is 0

    return -K.mean(term_0 + term_1, axis=1)

# --------------------
def custom_binary_loss_lgbm(y_pred, train_data):
    """
    LGBM version of binary cross-entropy
    """
    y_pred = 1.0 / (1.0 + np.exp(-y_pred))

    y_true = train_data.get_label()
    y_true = np.expand_dims(y_true, axis=1)
    y_pred = np.expand_dims(y_pred, axis=1)
    
    epsilon_ = 1e-7
    y_pred = np.clip(y_pred, epsilon_, 1 - epsilon_)

    term_0 = (1 - y_true) * np.log(1 - y_pred + epsilon_)   # Cancels out when target is 1 
    term_1 = y_true * np.log(y_pred + epsilon_)  # Cancels out when target is 0

    grad = -np.mean(term_0 + term_1, axis=1)
    hess = np.ones(grad.shape)
    return grad, hess

But with the code above, my LGBM model only predicts zeros. My dataset is balanced and everything else looks fine, so what is the error here?

import lightgbm as lgb

params = {
    'objective': 'binary',
    'num_iterations': 100,
    'seed': 21
}

ds_train = lgb.Dataset(df_train[predictors], y, free_raw_data=False)
reg_lgbm = lgb.train(params=params, train_set=ds_train, fobj=custom_binary_loss_lgbm)

I also tried a different Hessian, hess = (y_pred * (1. - y_pred)).flatten(). I don't really know what the Hessian means, but that didn't work either!
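For context on the Hessian: a LightGBM custom objective is expected to return, for every row, the first derivative (gradient g) and second derivative (Hessian h) of the loss with respect to the raw score z, not the loss value itself. Assuming z is the raw score that the code above passes through a sigmoid, binary cross-entropy gives:

```latex
p = \sigma(z) = \frac{1}{1 + e^{-z}}, \qquad \frac{dp}{dz} = p(1 - p)

\ell(z) = -\left[\, y \log p + (1 - y) \log(1 - p) \,\right]

g = \frac{\partial \ell}{\partial z} = -\left[\, y(1 - p) - (1 - y)\,p \,\right] = p - y

h = \frac{\partial^2 \ell}{\partial z^2} = \frac{dp}{dz} = p(1 - p)
```

By this derivation, the Hessian tried above, p(1 - p), is already the right one; what differs is the gradient, because the code returns the loss value (which is never negative) in place of p - y. Since boosted leaf values are roughly -(sum of g) / (sum of h), a gradient that is always non-negative would push every raw score down on each round, which plausibly explains the all-zero predictions.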

list(map(lambda x: 1.0 / (1.0 + np.exp(-x)), reg_lgbm.predict(df_train[predictors])))

[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,  .............]
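A minimal corrected sketch of the objective, assuming the same (preds, train_data) signature the question uses with fobj and that preds are raw (pre-sigmoid) scores. It returns the gradient p - y and Hessian p(1 - p) of binary cross-entropy instead of the loss value. The sigmoid helper is introduced here for illustration only.

```python
import numpy as np


def sigmoid(z: np.ndarray) -> np.ndarray:
    """Logistic function: maps raw scores to probabilities."""
    return 1.0 / (1.0 + np.exp(-z))


def custom_binary_loss_lgbm(y_pred, train_data):
    """Binary cross-entropy as a LightGBM custom objective (sketch).

    y_pred: raw scores for every training row (1-D array).
    train_data: any object with get_label(), e.g. an lgb.Dataset.
    Returns the per-row gradient and Hessian of the loss w.r.t. y_pred.
    """
    y_true = train_data.get_label()
    p = sigmoid(y_pred)
    grad = p - y_true        # first derivative of the loss w.r.t. the raw score
    hess = p * (1.0 - p)     # second derivative, always in [0, 0.25]
    return grad, hess
```

On LightGBM 3.x (current when the question was asked) this would be passed as fobj=custom_binary_loss_lgbm, exactly as in the question; LightGBM 4.x removed the fobj argument from lgb.train() and takes the callable as params['objective'] instead. With a custom objective, reg_lgbm.predict(...) returns raw scores, so the sigmoid applied in the prediction step above is still needed.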


Answer 1:

Try setting the metric parameter in params to the string "None", like this:

params = {
    'objective': 'binary',
    'metric': 'None',
    'num_iterations': 100,
    'seed': 21
}

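Otherwise, according to the documentation, LightGBM falls back to the default evaluation metric for the configured objective, which for 'binary' is binary_logloss.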
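If you still want to track the loss after disabling the built-in metric, one option (an addition, not part of the original answer) is a custom evaluation function passed through the feval argument of lgb.train. LightGBM's documentation notes that with a custom objective the predictions handed to feval are raw scores, so this sketch applies the sigmoid itself. The function name binary_logloss_eval and the label "custom_logloss" are hypothetical.

```python
import numpy as np


def binary_logloss_eval(y_pred, eval_data):
    """Custom log-loss metric for use with a custom objective (sketch).

    y_pred: raw scores (LightGBM does not transform them when the
            objective is custom).
    eval_data: any object with get_label(), e.g. an lgb.Dataset.
    Returns (name, value, is_higher_better), the tuple feval expects.
    """
    y_true = eval_data.get_label()
    p = 1.0 / (1.0 + np.exp(-y_pred))   # raw score -> probability
    eps = 1e-15
    p = np.clip(p, eps, 1.0 - eps)      # avoid log(0)
    loss = -np.mean(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))
    return "custom_logloss", float(loss), False  # lower is better
```

On LightGBM 3.x it could be used as lgb.train(params, ds_train, valid_sets=[ds_train], fobj=custom_binary_loss_lgbm, feval=binary_logloss_eval), keeping 'metric': 'None' from the answer.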

