Python: float() argument must be a string or a number, not 'pandas._libs.interval.Interval'

Asked: 2019-09-24 20:37:02

Question:

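I am trying to solve the Loan Prediction machine learning practice problem from Analytics Vidhya. When I use the random forest classifier, it raises:

TypeError: float() argument must be a string or a number, not 'pandas._libs.interval.Interval'

Code: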

train['Loan_Status']=np.where(train['Loan_Status']=='Y', 1,0)

train_test_data=[train,test]

#Gender Feature
for dataset in train_test_data:
  dataset["Gender"]=dataset["Gender"].fillna('Male')
for dataset in train_test_data:
  dataset["Gender"]=dataset["Gender"].map({"Female" : 1 , "Male" : 0}).astype(int)

#Married Feature 
for dataset in train_test_data:
  dataset['Married']=dataset['Married'].fillna('Yes')
for dataset in train_test_data:
  dataset['Married']=dataset['Married'].map({"Yes" : 1 , "No" : 0}).astype(int)

#Education Feature
for dataset in train_test_data:
  dataset['Education']=dataset['Education'].map({'Graduate' : 1 , 'Not Graduate' : 0}).astype(int)

#Combine Applicant income and coapplicant income
for dataset in train_test_data:
  dataset['Income']=dataset['ApplicantIncome']+dataset['CoapplicantIncome']
train['IncomeBand']= pd.cut(train['Income'] , 4)
print(train[['IncomeBand' , 'Loan_Status']].groupby(['IncomeBand'] , as_index=False).mean())

for dataset in train_test_data:
  dataset.loc[dataset['Income'] <= 21331.5, 'Income'] =0
  dataset.loc[(dataset['Income'] > 21331.5) & (dataset['Income'] <= 41221.0), 'Income'] =1
  dataset.loc[(dataset['Income'] > 41221.0) & (dataset['Income'] <= 61110.5), 'Income'] =2
  dataset.loc[dataset['Income'] > 61110.5, 'Income'] =3
  dataset['Income']=dataset['Income'].astype(int)

# Loan Amount Feature
fillin=train.LoanAmount.median()
for dataset in train_test_data:
  dataset['LoanAmount']=dataset['LoanAmount'].fillna(fillin)
train['LoanAmountBand']=pd.cut(train['LoanAmount'] , 4)
print(train[['LoanAmountBand' , 'Loan_Status']].groupby(['LoanAmountBand'] , as_index=False).mean())

for dataset in train_test_data:
  dataset.loc[dataset['LoanAmount'] <= 181.75, 'LoanAmount'] =0
  dataset.loc[(dataset['LoanAmount'] >181.75) & (dataset['LoanAmount'] <= 354.5), 'LoanAmount'] =1
  dataset.loc[(dataset['LoanAmount'] > 354.5) & (dataset['LoanAmount'] <= 527.25), 'LoanAmount'] =2
  dataset.loc[dataset['LoanAmount'] > 527.25, 'LoanAmount'] =3
  dataset['LoanAmount']=dataset['LoanAmount'].astype(int)

#Loan Amount Term Feature
for dataset in train_test_data:
  dataset['Loan_Amount_Term']=dataset['Loan_Amount_Term'].fillna(360.0)

Loan_Amount_Term_mapping={360.0 : 1 , 180.0 : 2 , 480.0 : 3 , 300.0 : 4 , 84.0 : 5 , 240.0 : 6, 120.0 : 7 , 36.0 : 8 , 60.0 : 9, 12.0 : 10}

for dataset in train_test_data:
  dataset['Loan_Amount_Term']=dataset['Loan_Amount_Term'].map(Loan_Amount_Term_mapping)

# Credit History Feature
for dataset in train_test_data:
  dataset['Credit_History']=dataset['Credit_History'].fillna(2)

# Property Area Feature
for dataset in train_test_data:
  dataset['Property_Area']=dataset['Property_Area'].map({'Semiurban' : 0 , 'Urban' : 1 , 'Rural' : 2}).astype(int)

# Feature Selection
features_drop=['Self_Employed' , 'ApplicantIncome' , 'CoapplicantIncome', 'Dependents']
train=train.drop(features_drop, axis=1)
test=test.drop(features_drop, axis=1)
train.drop(['Loan_ID' , 'IncomeBand' , 'LoanAmountBand'] , axis=1)

X_train=train.drop('Loan_Status' , axis=1)
y_train=train['Loan_Status']
X_test=test.drop('Loan_ID' , axis=1).copy()

X_train.shape , y_train.shape , X_test.shape

clf = RandomForestClassifier(n_estimators=100)
clf.fit(X_train, y_train)
y_pred_random_forest = clf.predict(X_test)
acc_random_forest = round(clf.score(X_train, y_train) * 100, 2)
print (acc_random_forest)

I don't understand where the float error is coming from. Any suggestions are much appreciated.

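Comments:

Try running X_train.dtypes and add the result here. It would also help to reduce the code to a minimal example that illustrates the problem; see minimal reproducible example.

On which line does the float error occur?

@doctorlove The float error occurs at clf.fit(X_train, y_train).

@Shaido Thanks for the suggestion. I have added the output of X_train.dtypes.

Answer 1: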

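The problem is the columns with the category dtype. Such columns are created, for example, by the pd.cut function, which stores each value as an Interval object (hence the type named in the error message). The random forest classifier cannot take these as input, so you need to convert them to numbers.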
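A quick way to spot such columns before fitting is to inspect the dtypes, as suggested in the comments. The sketch below uses a made-up frame (the `Loan_ID` values and incomes are hypothetical, not taken from the dataset) and lists every column that is not numeric:

```python
import pandas as pd

# Hypothetical frame that mimics the situation in the question
X = pd.DataFrame({
    "Loan_ID": ["LP001", "LP002", "LP003", "LP004"],
    "Income": [1000, 25000, 45000, 80000],
})
X["IncomeBand"] = pd.cut(X["Income"], 4)  # categorical column of Interval values

# Show the dtype of every column
print(X.dtypes)

# Columns that a scikit-learn estimator cannot convert to float
non_numeric = X.select_dtypes(exclude="number").columns.tolist()
print(non_numeric)
```

Any column that shows up in `non_numeric` has to be encoded as numbers or dropped before calling `fit`.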

The conversion is easily done with cat.codes.

In the code above, the two columns IncomeBand and LoanAmountBand need to be changed from category to numeric:

train['IncomeBand']= pd.cut(train['Income'] , 4).cat.codes
train['LoanAmountBand']=pd.cut(train['LoanAmount'] , 4).cat.codes
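As a minimal sketch of what this conversion does (the `income` values are invented for illustration and are not from the loan dataset): `pd.cut` returns a categorical Series whose values are `Interval` objects, and `.cat.codes` replaces each value with the integer index of its bin.

```python
import pandas as pd

# Hypothetical values, for illustration only
income = pd.Series([1000, 25000, 45000, 80000])

# Four equal-width bins; the result has dtype "category"
bands = pd.cut(income, 4)
print(bands.dtype)
print(type(bands.iloc[0]))  # a pandas Interval, which float() cannot handle

# Integer bin index for each row
codes = bands.cat.codes
print(codes.tolist())
```

If the interval labels are never needed, `pd.cut(income, 4, labels=False)` should give the same integer bin indices directly.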

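Comments:

Thanks, that error is fixed, but now I get this one: could not convert string to float: 'LP002990'. LP002990 is a Loan_ID in the training dataset. I already dropped Loan_ID in the code above. Do you have any suggestions?

@ALT: From your X_train.dtypes you can see that the column is still there. drop returns a new DataFrame, so either pass inplace=True when calling it or assign the result to a variable.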
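The difference can be seen in a small sketch, assuming a made-up two-row frame (the `Loan_ID` values here are hypothetical):

```python
import pandas as pd

# Hypothetical frame, for illustration only
df = pd.DataFrame({"Loan_ID": ["LP001", "LP002"], "Income": [0, 1]})

# drop returns a new frame; without assignment the original is unchanged
df.drop(["Loan_ID"], axis=1)
still_there = "Loan_ID" in df.columns
print(still_there)

# Assigning the result back actually removes the column
df = df.drop(["Loan_ID"], axis=1)
removed = "Loan_ID" not in df.columns
print(removed)
```

Applied to the question's code, this would mean writing `train = train.drop(['Loan_ID', 'IncomeBand', 'LoanAmountBand'], axis=1)` before building `X_train`.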
