LightGBM error: ValueError: For early stopping, at least one dataset and eval metric is required for evaluation
【Posted】: 2020-08-24 21:23:41
【Question】: I am trying to train a LightGBM model with grid search, and I get the following error when I try to fit the model.
ValueError: For early stopping, at least one dataset and eval metric is required for evaluation
I have provided a validation dataset and an evaluation metric, so I am not sure why I still get this error. Here is my code.
train_data = rtotal[rtotal['train_Y'] == 1]
test_data = rtotal[rtotal['train_Y'] == 0]
trainData, validData = train_test_split(train_data, test_size=0.007, random_state = 123)
#train data prep
X_train = trainData.iloc[:,2:71]
y_train = trainData.loc[:,['a_class']]
#validation data prep
X_valid = validData.iloc[:,2:71]
y_valid = validData.loc[:,['a_class']]
#X_test
X_test = test_data.iloc[:,2:71]
import lightgbm as lgb
from sklearn.model_selection import GridSearchCV
gridParams = {
    'learning_rate': [0.005],
    'n_estimators': [40],
    'num_leaves': [16, 32, 64],
    'objective': ['multiclass'],
    'random_state': [501],
    'num_boost_round': [3000],
    'colsample_bytree': [0.65, 0.66],
    'subsample': [0.7, 0.75],
    'reg_alpha': [1, 1.2],
    'reg_lambda': [1, 1.2, 1.4],
}
lgb_estimator = lgb.LGBMClassifier(boosting_type='gbdt',
                                   n_estimators=500,
                                   objective='multiclass',
                                   learning_rate=0.05, num_leaves=64,
                                   eval_metric='multi_logloss',
                                   verbose_eval=20,
                                   eval_set=[X_valid, y_valid],
                                   early_stopping_rounds=100)
g_lgbm = GridSearchCV(estimator=lgb_estimator, param_grid=gridParams, n_jobs = 3, cv= 3)
lgb_model = g_lgbm.fit(X=X_train, y=y_train)
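【Answer 1】: From what I can see in the code you provided, there are a couple of problems:
You define the classification as multiclass, but that is not quite the case: you define the output as a single column, which I believe may contain several labels.
If you want early stopping, you need to provide a validation set, as the error message states clearly, and you need to pass it in the proper place: to fit(), not to the constructor.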
With these mistakes corrected, your code runs happily:
gridParams = {
    'learning_rate': [0.005],
    'n_estimators': [40],
    'num_leaves': [16, 32, 64],
    'random_state': [501],
    'num_boost_round': [3000],
    'colsample_bytree': [0.65, 0.66],
    'subsample': [0.7, 0.75],
    'reg_alpha': [1, 1.2],
    'reg_lambda': [1, 1.2, 1.4],
}
# 'objective' is no longer forced to 'multiclass'; LightGBM infers it from the labels
lgb_estimator = lgb.LGBMClassifier(boosting_type='gbdt',
                                   n_estimators=500,
                                   learning_rate=0.05, num_leaves=64,
                                   eval_metric='logloss',
                                   verbose_eval=20,
                                   early_stopping_rounds=10)
g_lgbm = GridSearchCV(estimator=lgb_estimator, param_grid=gridParams, n_jobs=3, cv=3)
# the validation set is passed to fit(), not to the constructor
lgb_model = g_lgbm.fit(X=X_train, y=y_train, eval_set=(X_valid, y_valid))
...
[370] valid_0's binary_logloss: 0.422895
[371] valid_0's binary_logloss: 0.423064
[372] valid_0's binary_logloss: 0.422681
[373] valid_0's binary_logloss: 0.423206
[374] valid_0's binary_logloss: 0.423142
[375] valid_0's binary_logloss: 0.423414
[376] valid_0's binary_logloss: 0.423338
[377] valid_0's binary_logloss: 0.423864
[378] valid_0's binary_logloss: 0.42381
[379] valid_0's binary_logloss: 0.42409
[380] valid_0's binary_logloss: 0.423476
[381] valid_0's binary_logloss: 0.423759
[382] valid_0's binary_logloss: 0.423804
Early stopping, best iteration is:
[372] valid_0's binary_logloss: 0.422681
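To see why training stopped exactly where it did, the patience rule can be replayed on the log above. This is a plain-Python sketch of the rule, not LightGBM's own code; `early_stopping_point` is a hypothetical helper name. It only sees the rounds printed in the log (370 to 382), since the earlier rounds are elided, and uses a patience of 10 to match `early_stopping_rounds=10`.

```python
def early_stopping_point(losses, patience, first_round=1):
    """Return (best_round, stop_round) for a sequence of validation losses.

    stop_round is None if the patience budget is never exhausted.
    """
    best_round, best_loss = None, float("inf")
    for round_no, loss in enumerate(losses, start=first_round):
        if loss < best_loss:
            # New best validation loss: remember where it happened.
            best_round, best_loss = round_no, loss
        elif round_no - best_round >= patience:
            # No improvement for `patience` consecutive rounds: stop here.
            return best_round, round_no
    return best_round, None


# Validation losses for rounds 370..382, copied from the log above.
log_losses = [0.422895, 0.423064, 0.422681, 0.423206, 0.423142, 0.423414,
              0.423338, 0.423864, 0.42381, 0.42409, 0.423476, 0.423759,
              0.423804]

best, stop = early_stopping_point(log_losses, patience=10, first_round=370)
print(best, stop)  # 372 382
```

The rule needs a sequence of validation losses to compare, which is why the error in the question appears when no evaluation set reaches the training call.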