ValueError: Input contains NaN, infinity or a value too large for dtype
Posted: 2019-10-12 01:39:27
Question:
Can anyone help me with my model?
I have done EDA and data cleaning, but when I run the model's prediction I get the error below. It happens with both Lasso and linear regression.
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
Can anyone help me understand what is happening and how to fix it?
I tried removing the NaN values, but that did not solve the problem.
#!/usr/bin/env python
# coding: utf-8
# In[1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# In[2]:
df = pd.read_csv('train.csv')
# In[3]:
display(df)
# In[4]:
df.describe()
# In[5]:
df.count().plot(kind='barh',figsize=(10,50))
# In[6]:
df.shape
# In[7]:
#Completing the variables with NaN values
df.columns
df.describe()
# In[8]:
df['MiscFeature']= df['MiscFeature'].fillna('None')
df['Fence'] = df['Fence'].fillna('None')
df['PoolQC'] = df['PoolQC'].fillna('None')
df['GarageCond'] = df['GarageCond'].fillna('None')
df['GarageQual'] = df['GarageQual'].fillna('None')
df['GarageFinish'] = df['GarageFinish'].fillna('None')
df['GarageYrBlt'] = df['GarageYrBlt'].fillna('None')
df['GarageQual'] = df['GarageQual'].fillna('None')
df['GarageType'] = df['GarageType'].fillna('None')
df['FireplaceQu'] = df['FireplaceQu'].fillna('None')
df['YearBuilt'] = df['YearBuilt'].fillna('None')
df['YearRemodAdd'] = df['YearRemodAdd'].fillna('None')
df['BsmtFinType2'] = df['BsmtFinType2'].fillna('None')
df['BsmtFinType1'] = df['BsmtFinType1'].fillna('None')
df['BsmtFinType2'] = df['BsmtFinType2'].fillna('None')
df['BsmtExposure'] = df['BsmtExposure'].fillna('None')
df['BsmtQual'] = df['BsmtQual'].fillna('None')
df['BsmtCond'] = df['BsmtCond'].fillna('None')
df['MasVnrType'] = df['MasVnrType'].fillna('None')
df['Alley'] = df['Alley'].fillna('None')
# In[9]:
df['MSSubClass'] = df['MSSubClass'].astype(str)
df['MSZoning'] = df['MSZoning'].astype(str)
# In[10]:
df['LotFrontage'] = df['LotFrontage'].fillna(df['LotFrontage'].mode()[0])
# In[11]:
df['BsmtFinSF1'] = df['BsmtFinSF1'].fillna(0)
df['BsmtFinSF2'] = df['BsmtFinSF2'].fillna(0)
df['BsmtUnfSF'] = df['BsmtUnfSF'].fillna(0)
df['MasVnrArea'] = df['MasVnrArea'].fillna(0)
# In[12]:
df.count().plot(kind='barh',figsize=(15,15))
# In[13]:
plt.figure(figsize=(30,15))
sns.heatmap(df.corr(), annot = True)
# In[14]:
df.corr()
# In[15]:
x = pd.get_dummies(df.drop("SalePrice", axis=1))
y = df['SalePrice']
# In[16]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.33)
# In[17]:
from sklearn.linear_model import LinearRegression
model = LinearRegression()
# In[18]:
model.fit(X_train,y_train)
# In[19]:
model.score(X_train,y_train)
# In[20]:
result = model.predict(X_test)
# In[21]:
from sklearn import metrics
# In[22]:
print(np.sqrt(metrics.mean_squared_error(y_test,result)))
print(np.mean(y_test))
print(metrics.mean_absolute_error(y_test,result))
# In[23]:
compare = pd.DataFrame(result,y_test)
# In[24]:
error = np.mean(y_test-result)
print(error)
# In[25]:
#Now applying same model to test set
# In[26]:
dftest = pd.read_csv('test.csv')
# In[27]:
dftest.count().plot(kind='barh',figsize=(10,50))
# In[28]:
#Completing the variables with NaN values
dftest.describe()
# In[29]:
dftest['MiscFeature']= dftest['MiscFeature'].fillna('None')
dftest['Fence'] = dftest['Fence'].fillna('None')
dftest['PoolQC'] = dftest['PoolQC'].fillna('None')
dftest['GarageCond'] = dftest['GarageCond'].fillna('None')
dftest['GarageQual'] = dftest['GarageQual'].fillna('None')
dftest['GarageFinish'] = dftest['GarageFinish'].fillna('None')
dftest['GarageYrBlt'] = dftest['GarageYrBlt'].fillna('None')
dftest['GarageQual'] = dftest['GarageQual'].fillna('None')
dftest['GarageType'] = dftest['GarageType'].fillna('None')
dftest['FireplaceQu'] = dftest['FireplaceQu'].fillna('None')
dftest['YearBuilt'] = dftest['YearBuilt'].fillna('None')
dftest['YearRemodAdd'] = dftest['YearRemodAdd'].fillna('None')
dftest['BsmtFinType2'] = dftest['BsmtFinType2'].fillna('None')
dftest['BsmtFinType1'] = dftest['BsmtFinType1'].fillna('None')
dftest['BsmtFinType2'] = dftest['BsmtFinType2'].fillna('None')
dftest['BsmtExposure'] = dftest['BsmtExposure'].fillna('None')
dftest['BsmtQual'] = dftest['BsmtQual'].fillna('None')
dftest['BsmtCond'] = dftest['BsmtCond'].fillna('None')
dftest['MasVnrType'] = dftest['MasVnrType'].fillna('None')
dftest['Alley'] = dftest['Alley'].fillna('None')
# In[30]:
dftest['MSSubClass'] = dftest['MSSubClass'].astype(str)
dftest['MSZoning'] = dftest['MSZoning'].astype(str)
# In[31]:
dftest['LotFrontage'] = dftest['LotFrontage'].fillna(dftest['LotFrontage'].mode()[0])
# In[32]:
dftest['BsmtFinSF1'] = dftest['BsmtFinSF1'].fillna(0)
dftest['BsmtFinSF2'] = dftest['BsmtFinSF2'].fillna(0)
dftest['BsmtUnfSF'] = dftest['BsmtUnfSF'].fillna(0)
dftest['MasVnrArea'] = dftest['MasVnrArea'].fillna(0)
# In[33]:
#Confirming that all values are filled, no NaN values
dftest.count().plot(kind='barh',figsize=(15,15))
# In[34]:
xtest = pd.get_dummies(dftest)
# In[35]:
testresult = model.predict(xtest)
# In[ ]:
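Comments on the question:
The code is too long to post, but it is in the notebook file I uploaded.
Answer 1:
You need to remove the None/NaN values from your DataFrame; otherwise you cannot train or test your model.
Use the pandas dropna method. It returns a new DataFrame and leaves the original unchanged.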
df = YOUR_DATA_FRAME.dropna()
For more information, see the pandas documentation for DataFrame.dropna.
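Before dropping anything, it can help to see which columns actually contain the offending values. The following is a minimal sketch, not part of the original answer: `report_bad_values` is a hypothetical helper, and `demo` is a toy stand-in for a feature matrix such as `xtest` in the question. It counts NaN and infinite entries per numeric column and keeps only the columns that have any.

```python
import numpy as np
import pandas as pd


def report_bad_values(frame: pd.DataFrame) -> pd.DataFrame:
    """Count NaN and infinite entries per numeric column of `frame`.

    Hypothetical helper for illustration only.
    """
    numeric = frame.select_dtypes(include=[np.number])
    report = pd.DataFrame({
        "n_nan": numeric.isna().sum(),
        "n_inf": np.isinf(numeric).sum(),
    })
    # Keep only the columns that contain at least one NaN or infinity.
    return report[(report["n_nan"] > 0) | (report["n_inf"] > 0)]


# Toy data; the column names are borrowed from the housing data set
# purely as an example.
demo = pd.DataFrame({
    "GarageCars": [2.0, np.nan, 1.0],
    "TotalBsmtSF": [850.0, np.inf, 0.0],
    "LotArea": [8450.0, 9600.0, 11250.0],
})
bad = report_bad_values(demo)
print(bad)
```

Running the same helper on the real feature matrix right before `model.predict` would show whether any column was missed by the column-by-column `fillna` calls.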
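Comments:
Thank you very much! Do you know whether anything other than NaN values can cause this error?
There are three cases: NaN, infinity, and values too large for the dtype (here a 64-bit float). If you want to replace NaN values with a specific value, you can use pd.DataFrame(X).fillna(0), which fills them with zero.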
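The cases listed in the comment can be handled in one pass. The sketch below is an illustration under stated assumptions, not the answerer's code: `clean_and_align` is a hypothetical helper, and filling with 0 is just one possible choice (a median or a fitted imputer may suit the data better). It also reindexes the test features to the training columns, because calling `pd.get_dummies` separately on the training and test frames, as the question does, can produce different column sets.

```python
import numpy as np
import pandas as pd


def clean_and_align(features: pd.DataFrame, train_columns) -> pd.DataFrame:
    """Make `features` safe to pass to a model fitted on `train_columns`.

    Hypothetical helper for illustration only.
    """
    # Match the training layout: add missing dummy columns as 0, drop extras.
    aligned = features.reindex(columns=train_columns, fill_value=0)
    # Turn +/-inf into NaN so that a single fill handles both cases.
    aligned = aligned.replace([np.inf, -np.inf], np.nan)
    # Fill the remaining gaps with 0 (an assumption, see the note above).
    return aligned.fillna(0)


# Toy stand-ins for the columns of `x` and for `xtest` in the question.
train_columns = ["LotArea", "GarageCars", "MSZoning_RL", "MSZoning_RM"]
xtest_demo = pd.DataFrame({
    "LotArea": [8450.0, np.inf],
    "GarageCars": [np.nan, 2.0],
    "MSZoning_RL": [1, 0],
    "MSZoning_FV": [0, 1],  # category that never appeared in training
})
safe = clean_and_align(xtest_demo, train_columns)
print(safe)
print(np.isfinite(safe.to_numpy()).all())
```

In the question's notebook the equivalent call would presumably be `clean_and_align(xtest, x.columns)` before `model.predict`, assuming `x` is the dummy-encoded training frame.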