Pandas dataframe not shaped correctly for Keras ANN
Posted: 2019-12-21 03:17:57
Question: I am working on a regression neural network based on this question.
My code looks like this:
import numpy as np
from keras.layers import Dense, Activation
from keras.models import Sequential
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
import pandas as pd
# Importing the dataset
data = pd.read_csv('binned.csv')
# create the labels, or field we are trying to estimate
label = data['TOTAL_DAYS_TO_COMPLETE']
# create the data, or the data that is to be estimated
data = data.drop('TOTAL_DAYS_TO_COMPLETE', axis=1)
data = data.drop('SERIALNUM', axis=1)
print(data)
# # split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(data, label, test_size = 0.2)
# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
# Initialising the ANN
model = Sequential()
# Adding the input layer and the first hidden layer
model.add(Dense(32, activation = 'relu', input_dim = 6))
# Adding the second hidden layer
model.add(Dense(units = 32, activation = 'relu'))
# Adding the third hidden layer
model.add(Dense(units = 32, activation = 'relu'))
# Adding the output layer
model.add(Dense(units = 1))
#model.add(Dense(1))
# Compiling the ANN
model.compile(optimizer = 'adam', loss = 'mean_squared_error')
# Fitting the ANN to the Training set
model.fit(X_train, y_train, batch_size = 10, epochs = 100)
y_pred = model.predict(X_test)
plt.plot(y_test, color = 'red', label = 'Real data')
plt.plot(y_pred, color = 'blue', label = 'Predicted data')
plt.title('Prediction')
plt.legend()
plt.show()
Running it produces this error:
Traceback (most recent call last):
  File "ann.py", line 50, in <module>
    model.fit(X_train[0:1], y_train, batch_size = 10, epochs = 100)
  File "C:\Python367-64\lib\site-packages\keras\engine\training.py", line 952, in fit
    batch_size=batch_size)
  File "C:\Python367-64\lib\site-packages\keras\engine\training.py", line 751, in _standardize_user_data
    exception_prefix='input')
  File "C:\Python367-64\lib\site-packages\keras\engine\training_utils.py", line 138, in standardize_input_data
    str(data_shape))
ValueError: Error when checking input: expected dense_1_input to have shape (6,) but got array with shape (24,)
I have also used
# Importing the dataset
data = pd.read_csv('binned.csv')
# create the labels, or field we are trying to estimate
label = data['TOTAL_DAYS_TO_COMPLETE']
# create the data, or the data that is to be estimated
data = data.drop('TOTAL_DAYS_TO_COMPLETE', axis=1)
data = data.drop('SERIALNUM', axis=1)
print(data)
# # split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(data, label, test_size = 0.2)
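with various sklearn libraries, where it works fine.
What am I doing wrong?
My print(data) output looks like this:
(I manually removed the column headers because of intellectual-property concerns related to my construction company)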
0 7 2 3 2 2 1 ... 8 2 2 2 5 1
1 1 3 1 1 1 1 ... 2 1 1 1 1 1
2 2 2 3 1 1 1 ... 6 1 1 5 1 2
3 7 5 1 1 1 1 ... 1 1 1 1 1 1
4 5 6 1 1 1 1 ... 2 1 1 1 2 1
5 5 4 1 3 1 1 ... 8 4 3 2 7 3
6 4 6 3 7 5 1 ... 7 2 2 6 7 2
7 9 10 4 1 3 1 ... 4 1 1 1 8 2
8 4 2 2 1 1 1 ... 2 1 1 1 1 1
9 1 2 1 5 2 3 ... 2 4 3 6 3 3
10 7 9 1 3 2 1 ... 7 1 1 5 8 1
11 8 6 1 1 1 1 ... 1 1 1 1 1 1
12 8 8 2 1 1 2 ... 9 3 5 2 3 1
13 2 3 1 1 1 1 ... 2 2 2 2 1 1
14 2 2 2 1 1 2 ... 2 2 2 1 3 1
15 5 1 2 1 1 2 ... 2 1 1 1 1 2
16 1 2 5 8 7 3 ... 2 4 3 7 7 5
17 6 4 1 3 1 3 ... 9 3 3 1 5 5
18 10 1 1 2 1 2 ... 1 1 1 5 1 3
19 3 3 2 3 2 1 ... 2 1 1 1 1 1
20 6 2 2 7 3 4 ... 7 5 4 3 5 5
21 1 2 1 3 1 2 ... 2 1 1 5 1 2
22 10 4 2 3 2 1 ... 1 2 2 6 3 2
23 3 4 1 1 1 1 ... 1 2 2 1 5 1
24 4 4 4 2 2 1 ... 1 1 1 1 5 1
25 9 8 2 2 2 1 ... 2 1 1 1 7 1
26 1 1 3 3 2 1 ... 2 1 1 5 1 1
27 6 4 3 3 2 3 ... 5 2 2 1 3 2
28 4 7 3 7 5 1 ... 5 3 6 2 5 5
29 5 1 1 2 1 1 ... 1 2 1 2 3 3
.. ... ... ... ... ... ... ... ... ... ... ... ... ...
285 3 3 9 8 9 10 ... 10 10 6 8 3 5
286 4 6 4 7 5 7 ... 7 7 8 3 3 5
287 5 6 5 9 8 9 ... 4 9 9 5 5 5
288 5 5 9 7 9 9 ... 4 8 8 7 5 5
289 4 6 9 9 10 10 ... 10 10 9 8 5 5
290 10 9 6 5 7 8 ... 2 7 6 7 3 5
291 4 9 9 2 7 5 ... 7 3 8 9 8 5
292 7 9 8 8 9 8 ... 10 9 10 10 8 5
293 9 10 6 9 9 10 ... 8 10 10 10 8 5
294 5 9 8 9 10 9 ... 6 10 10 10 8 5
295 5 10 8 8 9 9 ... 10 5 9 9 8 5
296 6 9 8 8 9 9 ... 6 8 10 9 8 5
297 1 10 8 9 10 9 ... 4 10 10 9 8 2
298 2 10 8 7 9 9 ... 4 8 10 9 8 1
299 8 9 9 9 10 10 ... 10 10 10 10 8 3
300 9 10 9 9 10 9 ... 8 10 10 10 8 3
301 7 10 8 7 9 8 ... 8 8 9 9 8 3
302 10 10 8 10 10 10 ... 4 9 10 9 8 3
303 6 9 10 9 10 10 ... 10 10 10 10 5 10
304 9 10 9 10 10 10 ... 6 10 9 10 8 5
305 7 9 10 9 10 10 ... 9 10 10 10 8 5
306 9 9 9 8 10 10 ... 10 10 10 9 8 5
307 7 9 10 8 10 10 ... 10 8 10 10 8 5
308 9 8 6 9 9 10 ... 7 9 9 4 8 5
309 9 9 9 10 10 10 ... 10 10 10 10 8 5
310 1 9 8 10 10 10 ... 8 10 10 9 8 5
311 9 10 10 10 10 10 ... 10 10 10 10 8 10
312 7 10 9 9 10 10 ... 7 10 10 9 8 5
313 2 5 8 10 10 10 ... 4 10 10 9 8 5
314 7 9 9 10 10 9 ... 7 10 10 9 8 5
I don't understand what that error is saying, or how to fix it.
Answer 1: This error:
ValueError: Error when checking input: expected dense_1_input to have shape (6,) but got array with shape (24,)
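translates to plain English as: you told Keras that each input sample would have 6 features, but the actual input has 24. One possible fix is to change the first line of the model definition to: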
model.add(Dense(32, activation = 'relu', input_dim = 24))
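Rather than hard-coding 24, the feature count can be read from the training array, so the first layer stays correct if columns are added or dropped later. The following is a minimal sketch of that idea, not part of the original answer: the helper name `infer_input_dim` and the stand-in array `X_train_demo` are hypothetical, and the row count of the stand-in is only illustrative.

```python
import numpy as np


def infer_input_dim(features):
    """Return the number of feature columns in a 2-D (samples, features) matrix.

    Hypothetical helper for illustration; works on NumPy arrays and on
    anything np.asarray accepts, such as a pandas DataFrame.
    """
    arr = np.asarray(features)
    if arr.ndim != 2:
        raise ValueError(
            "expected a 2-D (samples, features) array, got shape %s" % (arr.shape,)
        )
    return arr.shape[1]


# Stand-in for the scaled X_train from the question. Only the column count
# (24) comes from the error message; the row count is an assumption.
X_train_demo = np.zeros((252, 24))
n_features = infer_input_dim(X_train_demo)
print(n_features)  # prints 24

# With Keras installed, the first layer could then be written as:
# model.add(Dense(32, activation='relu', input_dim=n_features))
```

In the question's script the same value is available directly as X_train.shape[1] once train_test_split and the scaler have run.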
Comments:
I did not understand input_dim until now, and I got lost reading the documentation. Thanks for the clarification. This solved the problem.