Custom objective function for XGBoost including an external data column


Title: Custom objective function for XGBoost including an external data column
Posted: 2020-10-09 12:47:38
Question:

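I am using XGBoost for sales forecasting. I need a custom objective function because the value of a prediction depends on the item's sales price. I am struggling to pass the sales price into the loss function alongside the labels and predictions. Here is my approach: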

from typing import Tuple, Union

import numpy as np
import xgboost as xgb

def monetary_value_objective(predt: np.ndarray, dtrain: Union[xgb.DMatrix, np.ndarray]) -> Tuple[np.ndarray, np.ndarray]:
  """
  predt = model prediction
  dtrain = labels 
  Currently, dtrain is a numpy array.
  """

  y = dtrain

  mask1 = predt <= y  # Predict too few
  mask2 = predt > y  # Predict too much

  price = train[0]["salesPrice"]

  grad = price **2 * (predt - y)  
  # Gradient is negative if prediction is too low, and positive if it is too high
  # Here scale it (0.72 = 0.6**2 * 2)
  grad[mask1] = 2 * grad[mask1]
  grad[mask2] = 0.72 * grad[mask2]

  hess = np.empty_like(grad)
  hess[mask1] = 2 * price[mask1]**2
  hess[mask2] = 0.72 * price[mask2]**2

  grad = -grad

  return grad, hess

The following error occurs during hyperparameter tuning:

[09:11:35] WARNING: /workspace/src/objective/regression_obj.cu:152: reg:linear is now deprecated in favor of reg:squarederror.
  0%|          | 0/1 [00:00<?, ?it/s, best loss: ?]
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-34-2c64dc1b5a76> in <module>()
      1 # set runtime environment to GPU at: Runtime -> Change runtime type
----> 2 trials, best_hyperparams = hyperpara_tuning(para_space)
      3 final_xgb_model = trials.best_trial['result']['model']
      4 assert final_xgb_model is not None, "Oooops there is no model created :O "
      5 

17 frames
/usr/local/lib/python3.6/dist-packages/pandas/core/indexers.py in check_array_indexer(array, indexer)
    399         if len(indexer) != len(array):
    400             raise IndexError(
--> 401                 f"Boolean index has wrong length: "
    402                 f"{len(indexer)} instead of {len(array)}"
    403             )

IndexError: Boolean index has wrong length: 1 instead of 136019

Does anyone know how to use the sales price inside the objective function? Is this even possible?

Thanks!

Answer 1:

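You can use the weights vector in a custom objective function. It can work if you encode the external variable as the weights. However, I don't know whether the weights are used only by the objective function itself or also elsewhere, for example at the data-sampling level. If they are, the situation becomes more complicated...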
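A minimal sketch of the weights idea, under some assumptions: the sales price is stored as the per-row weight of the `DMatrix`, the objective reads it back with `get_weight()`, and the scaling factors 2 and 0.72 are taken from the question. The final sign flip in the question's code is left out here, on the assumption that XGBoost expects the gradient of the loss being minimized. This is an illustration, not a tested solution for the asker's data.

```python
import numpy as np

UNDER_SCALE = 2.0   # factor from the question for predictions that are too low
OVER_SCALE = 0.72   # factor from the question for predictions that are too high


def monetary_value_objective(predt, dtrain):
    """Asymmetric, price-weighted squared-error objective (sketch).

    `dtrain` is assumed to behave like an xgb.DMatrix whose weights hold
    the sales price of each row (an assumption of this example).
    """
    y = np.asarray(dtrain.get_label(), dtype=float)
    price = np.asarray(dtrain.get_weight(), dtype=float)
    predt = np.asarray(predt, dtype=float).reshape(-1)

    # Fail early with a clear message if the three vectors are not aligned.
    if not (len(predt) == len(y) == len(price)):
        raise ValueError(
            f"length mismatch: predt={len(predt)}, y={len(y)}, price={len(price)}"
        )

    # Per-row scale: under-prediction is penalized more than over-prediction.
    scale = np.where(predt <= y, UNDER_SCALE, OVER_SCALE)

    grad = scale * price**2 * (predt - y)  # first derivative w.r.t. predt
    hess = scale * price**2                # second derivative w.r.t. predt
    return grad, hess
```

With the native API this could be wired up as `dtrain = xgb.DMatrix(X, label=y, weight=price)` followed by `xgb.train(params, dtrain, obj=monetary_value_objective)`, where `X`, `y`, `price`, and `params` are placeholders. Because the price array lives in the same `DMatrix` as the labels, it always has the same length as the predictions, unlike a global DataFrame column. The caveat from the answer still applies: XGBoost may also use the weights outside the objective (for example in built-in evaluation metrics), so results should be checked against that.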
