xgboost: AttributeError: 'DMatrix' object has no attribute 'handle'
【Posted】: 2016-07-04 02:26:42
【Question】: This problem is really strange, because the same piece of code works fine with other datasets.
Full code:
import numpy as np
import pandas as pd
import xgboost as xgb
# scikit-learn >= 0.18 provides train_test_split in sklearn.model_selection
from sklearn.cross_validation import train_test_split

# Split the learning set into a fitting part and an evaluation part
X_fit, X_eval, y_fit, y_eval = train_test_split(
    train, target, test_size=0.2, random_state=1
)
clf = xgb.XGBClassifier(missing=np.nan, max_depth=6,
                        n_estimators=5, learning_rate=0.15,
                        subsample=1, colsample_bytree=0.9, seed=1400)
# Fit with early stopping on the evaluation set
clf.fit(X_fit, y_fit, early_stopping_rounds=50, eval_metric="logloss",
        eval_set=[(X_eval, y_eval)])
# Predicted probability of the positive class
y_pred = clf.predict_proba(test)[:, 1]
The last line raises the following error (full output provided):
Will train until validation_0 error hasn't decreased in 50 rounds.
[0] validation_0-logloss:0.554366
[1] validation_0-logloss:0.451454
[2] validation_0-logloss:0.372142
[3] validation_0-logloss:0.309450
[4] validation_0-logloss:0.259002
Traceback (most recent call last):
  File "../src/script.py", line 57, in <module>
    y_pred= clf.predict_proba(test)[:,1]
  File "/opt/conda/lib/python3.4/site-packages/xgboost-0.4-py3.4.egg/xgboost/sklearn.py", line 435, in predict_proba
    test_dmatrix = DMatrix(data, missing=self.missing)
  File "/opt/conda/lib/python3.4/site-packages/xgboost-0.4-py3.4.egg/xgboost/core.py", line 220, in __init__
    feature_types)
  File "/opt/conda/lib/python3.4/site-packages/xgboost-0.4-py3.4.egg/xgboost/core.py", line 147, in _maybe_pandas_data
    raise ValueError('DataFrame.dtypes for data must be int, float or bool')
ValueError: DataFrame.dtypes for data must be int, float or bool
Exception ignored in: >
Traceback (most recent call last):
  File "/opt/conda/lib/python3.4/site-packages/xgboost-0.4-py3.4.egg/xgboost/core.py", line 289, in __del__
    _check_call(_LIB.XGDMatrixFree(self.handle))
AttributeError: 'DMatrix' object has no attribute 'handle'
What is wrong here? I don't know how to fix this.
UPD1: This is actually a Kaggle problem: https://www.kaggle.com/insaff/bnp-paribas-cardif-claims-management/xgboost
【Comments】:
What is the output of X_fit.dtypes and X_eval.dtypes?
This is for X_fit.dtypes: target int64, v1 float64, v2 float64, v3 int64, v4 float64; test even has columns of object type.
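A quick way to answer the dtype question for any frame is to list the columns that fall outside what the error message allows. This is only a sketch, assuming pandas DataFrames; the toy `test` frame and the helper name `non_numeric_columns` are made up for illustration and do not come from the original post.

```python
import pandas as pd


def non_numeric_columns(df):
    """Return the names of columns that are neither numeric nor bool."""
    allowed = df.select_dtypes(include=["number", "bool"]).columns
    return [c for c in df.columns if c not in allowed]


# Hypothetical stand-in for the Kaggle test set
test = pd.DataFrame({
    "v1": [0.5, 1.5],
    "v3": ["A", "C"],  # string column: the kind that triggers the ValueError
    "v4": [1, 2],
})

bad_columns = non_numeric_columns(test)
print(bad_columns)
```

Any column reported here has to be encoded or dropped before the frame is handed to `predict_proba`.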
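【Answer 1】:
The problem here is related to the initial data: some of the values are floats or integers, and some are objects. That is why we need to cast them: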
from sklearn import preprocessing

# Label-encode every object column of train
for f in train.columns:
    if train[f].dtype == 'object':
        lbl = preprocessing.LabelEncoder()
        lbl.fit(list(train[f].values))
        train[f] = lbl.transform(list(train[f].values))
# Same for test (fit separately, so its codes need not match those of train)
for f in test.columns:
    if test[f].dtype == 'object':
        lbl = preprocessing.LabelEncoder()
        lbl.fit(list(test[f].values))
        test[f] = lbl.transform(list(test[f].values))
# Replace missing values with a sentinel, then convert to float arrays
train.fillna((-999), inplace=True)
test.fillna((-999), inplace=True)
train = np.array(train)
test = np.array(test)
train = train.astype(float)
test = test.astype(float)
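As for the AttributeError in the title, it looks like a side effect rather than the root cause. Judging from the traceback, `DMatrix.__init__` raised the ValueError before `self.handle` was assigned, and Python then called `__del__` on the half-built object. A minimal sketch of that pattern, using a made-up `Resource` class instead of xgboost itself:

```python
class Resource:
    """Hypothetical stand-in for a wrapper that owns a native handle."""

    def __init__(self, data):
        if not isinstance(data, (int, float, bool)):
            # Raised before self.handle exists, like the dtype check in the traceback
            raise ValueError("data must be int, float or bool")
        self.handle = object()

    def __del__(self):
        # Without this guard, a failed __init__ would make Python report
        # "Exception ignored in ... AttributeError" on top of the ValueError.
        if getattr(self, "handle", None) is not None:
            self.handle = None  # placeholder for releasing the native resource


try:
    Resource("not a number")
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

So only the ValueError needs fixing; once the dtypes are acceptable, the constructor completes and the second message should go away with it.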
【Comments】:
【Answer 2】: You may also want to look at a categorical variable solution, like this:
# Convert object columns to the pandas category dtype
for col in train.select_dtypes(include=['object']).columns:
    train[col] = train[col].astype('category')
    test[col] = test[col].astype('category')
# Encoding categorical features
for col in train.select_dtypes(include=['category']).columns:
    train[col] = train[col].cat.codes
    test[col] = test[col].cat.codes
train.fillna((-999), inplace=True)
test.fillna((-999), inplace=True)
train = np.array(train)
test = np.array(test)
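One caveat when `train` and `test` are encoded separately: the integer codes depend on the values seen in each frame, so the same string can end up with different codes. A hedged sketch of one way around this, assuming both frames share the same column names, is to build the category list from both frames. The toy data and the helper name `encode_shared` are invented for illustration.

```python
import pandas as pd


def encode_shared(train, test):
    """Encode non-numeric columns with one category list shared by both frames."""
    train, test = train.copy(), test.copy()
    text_cols = train.select_dtypes(exclude=["number", "bool"]).columns
    for col in text_cols:
        # Union of the values seen in either frame, missing values excluded
        categories = sorted(set(train[col].dropna()) | set(test[col].dropna()))
        train[col] = pd.Categorical(train[col], categories=categories).codes
        test[col] = pd.Categorical(test[col], categories=categories).codes
    return train, test


# Hypothetical toy frames
train = pd.DataFrame({"v3": ["A", "B", None], "v4": [1.0, 2.0, 3.0]})
test = pd.DataFrame({"v3": ["B", "C", "A"], "v4": [4.0, 5.0, 6.0]})

train_enc, test_enc = encode_shared(train, test)
print(train_enc["v3"].tolist())
print(test_enc["v3"].tolist())
```

Note that category codes use -1 for missing values, so a later `fillna(-999)` only affects the columns that were numeric to begin with.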
【Comments】:
Wow, thanks, I didn't know pandas had this data type.