Custom scoring on GridSearchCV with a fold-dependent parameter

Question

The problem

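I am working on a learning-to-rank problem where the norm is to make point-wise predictions but to evaluate model performance group-wise.

More specifically, the estimator outputs a continuous variable (much like a regressor)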

> y = est.predict(X); y
array([71.42857143,  0.        , 71.42857143, ...,  0.        ,
       28.57142857,  0.        ])

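but the scoring function requires aggregation by query, i.e. over grouped predictions, much like the groups parameter that GridSearchCV receives in order to respect fold partitions.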

> ltr_score(y_true, y_pred, groups=g)
0.023
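
To make the shape of such a metric concrete, here is a minimal sketch of a group-aggregated score. It is not my actual ltr_score: the name grouped_top1_score and the per-query metric (whether the top-scored item in each query is one of the most relevant ones) are assumptions chosen only to show why groups has to line up element by element with y_true and y_pred.

```python
from collections import defaultdict


def grouped_top1_score(y_true, y_pred, groups):
    """Hypothetical group-wise metric: fraction of queries whose
    top-scored item has the highest true relevance in that query."""
    if not (len(y_true) == len(y_pred) == len(groups)):
        raise ValueError("y_true, y_pred and groups must have the same length")

    # Collect (true, predicted) pairs per query id.
    by_query = defaultdict(list)
    for t, p, g in zip(y_true, y_pred, groups):
        by_query[g].append((t, p))

    hits = 0
    for pairs in by_query.values():
        best_true = max(t for t, _ in pairs)
        # True relevance of the item the model ranks first in this query.
        top_true, _ = max(pairs, key=lambda tp: tp[1])
        hits += top_true == best_true
    return hits / len(by_query)


y_true = [1, 0, 0, 0, 1, 0]
y_pred = [0.9, 0.1, 0.3, 0.8, 0.2, 0.1]
groups = ["q1", "q1", "q1", "q2", "q2", "q2"]
print(grouped_top1_score(y_true, y_pred, groups))
```

In this sketch a groups array that covers the full dataset cannot be paired with the y_true and y_pred of a single fold, because the lengths no longer match.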

The roadblock

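So far so good. Things go south when I supply a custom scoring function to GridSearchCV: I cannot dynamically change the groups parameter of the scoring function according to the CV fold: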

from sklearn.model_selection import GridSearchCV
from sklearn.metrics import make_scorer

ltr_scorer = make_scorer(ltr_score, groups=g)  # Here's the problem, g is fixed
param_grid = {...}

gcv = GridSearchCV(estimator=est, param_grid=param_grid, scoring=ltr_scorer)
gcv.fit(X, y_true, groups=g)  # groups is passed to fit(), not to the constructor

What is the least hacky way to get around this problem?

One (failed) approach

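In a similar question, one comment asked/suggested:

"Why can't you just store {the grouping column} locally and utilize it if necessary by indexing with the train/test indices provided by the splitter?"

The OP replied that it "seems feasible". I thought so too, but I could not make it work. Apparently, GridSearchCV first consumes all the cross-validation split indices and only then performs the splits, fits, predictions and scorings. This means I (seemingly) cannot work out, at scoring time, the original indices that produced the current split's sub-selection.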

For the sake of completeness, my code:

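import numpy as np
from sklearn.metrics import make_scorer
from sklearn.model_selection import GroupShuffleSplit


class QuerySplitScorer:
    """CV iterable that tries to remember which split is currently in use."""

    def __init__(self, X, y, groups):
        self._X = np.array(X)
        self._y = np.array(y)
        self._groups = np.array(groups)
        self._splits = None
        self._current_split = None

    def __iter__(self):
        self._splits = iter(GroupShuffleSplit().split(self._X, self._y, self._groups))
        return self

    def __next__(self):
        # Remember the split that was handed out most recently.
        self._current_split = next(self._splits)
        return self._current_split

    def get_scorer(self):
        def scorer(y_true, y_pred):
            # Relies on _current_split being the fold that is being scored.
            _, test_idx = self._current_split
            return _score(
                y_true=y_true,
                y_pred=y_pred,
                groups=self._groups[test_idx]
            )

        # Wrap so the result has the (estimator, X, y) signature GridSearchCV expects.
        return make_scorer(scorer)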

Usage:

qss = QuerySplitScorer(X, y_true, g)
gcv = GridSearchCV(estimator=est, cv=qss, scoring=qss.get_scorer(), param_grid=param_grid, verbose=1)
gcv.fit(X, y_true)

It doesn't work: self._current_split is stuck at the last generated split.
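
The symptom can be reproduced without scikit-learn. The sketch below assumes, as described above, that the search materializes every split before it starts fitting and scoring. The class StatefulSplits and the loop standing in for the search are hypothetical stand-ins written for this illustration, not scikit-learn code.

```python
class StatefulSplits:
    """Iterator that remembers the last split it handed out,
    mimicking the _current_split bookkeeping in QuerySplitScorer."""

    def __init__(self, splits):
        self._source = list(splits)
        self._it = None
        self.current = None

    def __iter__(self):
        self._it = iter(self._source)
        return self

    def __next__(self):
        self.current = next(self._it)
        return self.current


splits = StatefulSplits(
    [("train0", "test0"), ("train1", "test1"), ("train2", "test2")]
)

# Assumption: the consumer drains the iterator up front, e.g. via list(cv).
materialized = list(splits)

seen_at_scoring_time = []
for train, test in materialized:
    # "Scoring" happens here, after the iterator was already exhausted,
    # so the remembered split is the last one on every iteration.
    seen_at_scoring_time.append(splits.current)

print(seen_at_scoring_time)
```

Every entry of seen_at_scoring_time is the last split, which matches what I see with self._current_split.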