ValueError: Input contains NaN, infinity or a value too large for dtype

Posted: 2019-10-12 01:39:27

Question:

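Can anyone help me with my model?

I have done EDA and data cleaning, but when I run the model predictions I get the following error. It happens with both Lasso and linear regression:

ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

Can anyone help me understand what is happening and how to fix it?

I tried dropping the NaN values, but that still did not solve the problem.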
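Since dropping NaN values did not make the error go away, a useful first step is to list which numeric columns of the matrix passed to fit() or predict() still contain NaN or +/-inf. Below is a minimal sketch using pandas and NumPy; report_non_finite is a hypothetical helper name, not a library function. Only numeric columns are checked, because pd.get_dummies turns text columns into 0/1 columns (a missing category simply becomes a row of zeros).

```python
import numpy as np
import pandas as pd


def report_non_finite(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: count NaN and +/-inf values per numeric column."""
    numeric = frame.select_dtypes(include=[np.number])
    counts = pd.DataFrame({
        "nan": numeric.isna().sum(),
        "inf": np.isinf(numeric).sum(),
    })
    # Keep only the columns that would fail scikit-learn's finiteness check.
    return counts[(counts["nan"] > 0) | (counts["inf"] > 0)]


# Small made-up frame to show the shape of the report.
demo = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [1.0, np.inf, 2.0],
    "c": [1, 2, 3],
    "d": ["x", None, "z"],
})
print(report_non_finite(demo))
```

Running it on both the training features and the test features (for example report_non_finite(xtest)) shows which columns still need cleaning.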

#!/usr/bin/env python
# coding: utf-8

# In[1]:


import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns


# In[2]:


df = pd.read_csv('train.csv')


# In[3]:


display(df)


# In[4]:


df.describe()


# In[5]:


df.count().plot(kind='barh',figsize=(10,50))


# In[6]:


df.shape


# In[7]:


#Completing the variables with NaN values
df.columns
df.describe()


# In[8]:


df['MiscFeature']= df['MiscFeature'].fillna('None')
df['Fence'] = df['Fence'].fillna('None')
df['PoolQC'] = df['PoolQC'].fillna('None')
df['GarageCond'] = df['GarageCond'].fillna('None')
df['GarageQual'] = df['GarageQual'].fillna('None')
df['GarageFinish'] = df['GarageFinish'].fillna('None')
df['GarageYrBlt'] = df['GarageYrBlt'].fillna('None')
df['GarageType'] = df['GarageType'].fillna('None')
df['FireplaceQu'] = df['FireplaceQu'].fillna('None')
df['YearBuilt'] = df['YearBuilt'].fillna('None')
df['YearRemodAdd'] = df['YearRemodAdd'].fillna('None')
df['BsmtFinType2'] = df['BsmtFinType2'].fillna('None')
df['BsmtFinType1'] = df['BsmtFinType1'].fillna('None')
df['BsmtExposure'] = df['BsmtExposure'].fillna('None')
df['BsmtQual'] = df['BsmtQual'].fillna('None')
df['BsmtCond'] = df['BsmtCond'].fillna('None')
df['MasVnrType'] = df['MasVnrType'].fillna('None')
df['Alley'] = df['Alley'].fillna('None')


# In[9]:


df['MSSubClass'] = df['MSSubClass'].astype(str)
df['MSZoning'] = df['MSZoning'].astype(str)


# In[10]:


df['LotFrontage'] = df['LotFrontage'].fillna(df['LotFrontage'].mode()[0])


# In[11]:


df['BsmtFinSF1'] = df['BsmtFinSF1'].fillna(0)
df['BsmtFinSF2'] = df['BsmtFinSF2'].fillna(0)
df['BsmtUnfSF'] = df['BsmtUnfSF'].fillna(0)
df['MasVnrArea'] = df['MasVnrArea'].fillna(0)


# In[12]:


df.count().plot(kind='barh',figsize=(15,15))


# In[13]:


plt.figure(figsize=(30,15))
sns.heatmap(df.corr(), annot = True)


# In[14]:


df.corr()


# In[15]:


x = pd.get_dummies(df.drop("SalePrice", axis=1))
y = df['SalePrice']


# In[16]:


from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.33)


# In[17]:


from sklearn.linear_model import LinearRegression
model = LinearRegression()


# In[18]:


model.fit(X_train,y_train)


# In[19]:


model.score(X_train,y_train)


# In[20]:


result = model.predict(X_test)


# In[21]:


from sklearn import metrics


# In[22]:


print(np.sqrt(metrics.mean_squared_error(y_test,result)))
print(np.mean(y_test))
print(metrics.mean_absolute_error(y_test,result))


# In[23]:


compare = pd.DataFrame(result,y_test)


# In[24]:


error = np.mean(y_test-result)
print(error)


# In[25]:


#Now applying same model to test set


# In[26]:


dftest = pd.read_csv('test.csv')


# In[27]:


dftest.count().plot(kind='barh',figsize=(10,50))


# In[28]:


#Completing the variables with NaN values
dftest.describe()


# In[29]:


dftest['MiscFeature']= dftest['MiscFeature'].fillna('None')
dftest['Fence'] = dftest['Fence'].fillna('None')
dftest['PoolQC'] = dftest['PoolQC'].fillna('None')
dftest['GarageCond'] = dftest['GarageCond'].fillna('None')
dftest['GarageQual'] = dftest['GarageQual'].fillna('None')
dftest['GarageFinish'] = dftest['GarageFinish'].fillna('None')
dftest['GarageYrBlt'] = dftest['GarageYrBlt'].fillna('None')
dftest['GarageType'] = dftest['GarageType'].fillna('None')
dftest['FireplaceQu'] = dftest['FireplaceQu'].fillna('None')
dftest['YearBuilt'] = dftest['YearBuilt'].fillna('None')
dftest['YearRemodAdd'] = dftest['YearRemodAdd'].fillna('None')
dftest['BsmtFinType2'] = dftest['BsmtFinType2'].fillna('None')
dftest['BsmtFinType1'] = dftest['BsmtFinType1'].fillna('None')
dftest['BsmtExposure'] = dftest['BsmtExposure'].fillna('None')
dftest['BsmtQual'] = dftest['BsmtQual'].fillna('None')
dftest['BsmtCond'] = dftest['BsmtCond'].fillna('None')
dftest['MasVnrType'] = dftest['MasVnrType'].fillna('None')
dftest['Alley'] = dftest['Alley'].fillna('None')


# In[30]:


dftest['MSSubClass'] = dftest['MSSubClass'].astype(str)
dftest['MSZoning'] = dftest['MSZoning'].astype(str)


# In[31]:


dftest['LotFrontage'] = dftest['LotFrontage'].fillna(dftest['LotFrontage'].mode()[0])


# In[32]:


dftest['BsmtFinSF1'] = dftest['BsmtFinSF1'].fillna(0)
dftest['BsmtFinSF2'] = dftest['BsmtFinSF2'].fillna(0)
dftest['BsmtUnfSF'] = dftest['BsmtUnfSF'].fillna(0)
dftest['MasVnrArea'] = dftest['MasVnrArea'].fillna(0)


# In[33]:


#Confirming that all values are filled, no NaN values
dftest.count().plot(kind='barh',figsize=(15,15))


# In[34]:


xtest = pd.get_dummies(dftest)


# In[35]:


testresult = model.predict(xtest)


# In[ ]:
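
In the notebook above, the test file receives the same manual fillna calls as the training file, so any numeric column that has missing values only in test.csv still reaches model.predict() as NaN. In addition, pd.get_dummies(dftest) is computed on its own, so its columns need not match the training columns. A minimal sketch of one way to handle both, assuming the training features are the one-hot frame x and the test frame has had the same manual cleaning: reuse the training column list, then fill the remaining gaps with training medians. The function name prepare_test_features and the choice of the median are assumptions for illustration, not part of the original notebook.

```python
import numpy as np
import pandas as pd


def prepare_test_features(train_X: pd.DataFrame, test_raw: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: build test features that match the training matrix.

    train_X: one-hot encoded training features (like x in the notebook)
    test_raw: test frame after the same manual cleaning as the training frame
    """
    test_X = pd.get_dummies(test_raw)
    # Encoding the test file on its own can yield a different set of dummy
    # columns, so reuse the training column list: dummies missing from the
    # test file become 0 and columns never seen in training are dropped.
    test_X = test_X.reindex(columns=train_X.columns, fill_value=0)
    # reindex leaves NaN already present in kept columns untouched, so fill
    # the remaining gaps with medians computed on the *training* data
    # (an assumed choice; the mean or 0 would also remove the NaN).
    train_medians = train_X.select_dtypes(include=[np.number]).median()
    return test_X.fillna(train_medians)
```

In the notebook this would take the place of the xtest = pd.get_dummies(dftest) cell, for example xtest = prepare_test_features(x, dftest) followed by model.predict(xtest). Using training statistics keeps the test set from influencing the preprocessing.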

Comments on the question:

The code is too long to post, but it is in the notebook file I uploaded.

Answer 1:

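You need to remove the None/NaN values from your data frame; otherwise you cannot train or test your model.

Use the dropna method in pandas (it returns a new data frame and leaves the original unchanged):

df = df.dropna()

For more information, see the pandas dropna documentation.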
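A small illustration of the dropna behaviour described above, on a toy frame with made-up values: dropna() is called on the frame itself (its arguments are options such as axis, how or subset, not a data frame), and it returns a new frame.

```python
import numpy as np
import pandas as pd

# Toy frame with made-up values; one row is missing LotFrontage.
frame = pd.DataFrame({"LotFrontage": [60.0, np.nan, 80.0], "LotArea": [1.0, 2.0, 3.0]})

# dropna() is a method of the frame; drop every row containing a NaN.
cleaned = frame.dropna()

# Restrict the check to particular columns with subset=.
cleaned_subset = frame.dropna(subset=["LotFrontage"])

# Both calls return new frames; `frame` still has all three rows.
print(len(frame), len(cleaned), len(cleaned_subset))  # prints: 3 2 2
```

Dropping rows only works where losing them is acceptable; for a test set where every row needs a prediction, filling the gaps (for example with fillna) is the usual alternative.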

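Comments:

Thank you very much! Do you know if anything other than NaN values can cause this error?

There are 3 cases: NaN, infinity, and values too large for the dtype (here float64). If you want to replace NaN values with a specific value, you can use pd.DataFrame(X).fillna(0), which fills them with zero.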
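As the comment notes, fillna only handles NaN; infinities pass through it unchanged (and float64 arithmetic that exceeds about 1.8e308 produces inf). A minimal sketch on a made-up frame that first turns +/-inf into NaN and then fills everything with an explicit value; 0 is just the value used in the comment, and a column median is a common alternative.

```python
import numpy as np
import pandas as pd

# Made-up features containing NaN as well as positive and negative infinity.
X = pd.DataFrame({
    "ratio": [0.5, np.inf, -np.inf, np.nan],
    "area": [10.0, 20.0, np.nan, 40.0],
})

# fillna() does not touch +/-inf, so convert infinities to NaN first ...
X_clean = X.replace([np.inf, -np.inf], np.nan)
# ... then fill every remaining gap with an explicit value.
X_clean = X_clean.fillna(0)

# Every value is now finite, which is what scikit-learn checks before fitting.
print(np.isfinite(X_clean.to_numpy()).all())
```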
