python SVM: execution error
Posted: 2018-08-22 18:52:04
Question: I am using Python 3.6 on Windows and am learning SVM prediction in Python. I have the code below, but even after running and checking it thoroughly I still get the following error:
File "C:\Users\Lawrence\Anaconda3\lib\site-packages\sklearn\utils\validation.py", line 614, in column_or_1d
raise ValueError("bad input shape {0}".format(shape))
ValueError: bad input shape ()
The original Python code is as follows:
import numpy as np
from sklearn import preprocessing
from sklearn.svm import SVR
input_file = r"C:\Users\Lawrence\Desktop\traffic_data.txt"
# Reading the data
X = []
count = 0
with open(input_file, 'r') as f:
    for line in f.readlines():
        data = line[:-1].split(',')
        X.append(data)
X = np.array(X)
# Convert string data to numerical data
label_encoder = []
X_encoded = np.empty(X.shape)
for i, item in enumerate(X[0]):
    if item.isdigit():
        X_encoded[:, i] = X[:, i]
    else:
        label_encoder.append(preprocessing.LabelEncoder())
        X_encoded[:, i] = label_encoder[-1].fit_transform(X[:, i])
X = X_encoded[:, :-1].astype(int)
y = X_encoded[:, -1].astype(int)
# Build SVR
params = {'kernel': 'rbf', 'C': 10.0, 'epsilon': 0.2}
regressor = SVR(**params)
regressor.fit(X, y)
# Cross validation
import sklearn.metrics as sm
y_pred = regressor.predict(X)
print("Mean absolute error =", round(sm.mean_absolute_error(y, y_pred), 2))
# Testing encoding on single data instance
input_data = ['Tuesday', '13:35', 'San Francisco', 'yes']
input_data_encoded = [-1] * len(input_data)
count = 0
for i, item in enumerate(input_data):
    if item.isdigit():
        input_data_encoded[i] = int(input_data[i])
    else:
        input_data_encoded[i] = int(label_encoder[count].transform(input_data[i]))
        count = count + 1
input_data_encoded = np.array(input_data_encoded)
# Predict and print output for a particular datapoint
print("Predicted traffic:", int(regressor.predict(input_data_encoded)[0]))
The input data file (traffic_data.txt) is as follows:
Tuesday,00:00,San Francisco,no,3
Tuesday,00:05,San Francisco,no,8
Tuesday,00:10,San Francisco,no,10
Tuesday,00:15,San Francisco,no,6
Tuesday,00:20,San Francisco,no,1
Tuesday,00:25,San Francisco,no,4
Tuesday,00:30,San Francisco,no,9
Tuesday,00:35,San Francisco,no,4
Tuesday,00:40,San Francisco,no,6
Tuesday,00:45,San Francisco,no,13
Tuesday,00:50,San Francisco,no,5
Tuesday,00:55,San Francisco,no,5
Tuesday,01:00,San Francisco,no,4
Tuesday,01:05,San Francisco,no,7
Tuesday,01:10,San Francisco,no,5
Tuesday,01:15,San Francisco,no,4
Tuesday,01:20,San Francisco,no,5
Tuesday,01:25,San Francisco,no,1
Tuesday,01:30,San Francisco,no,8
Tuesday,01:35,San Francisco,no,2
Tuesday,01:40,San Francisco,no,3
Tuesday,01:45,San Francisco,no,0
Tuesday,01:50,San Francisco,no,2
Tuesday,01:55,San Francisco,no,1
Tuesday,02:00,San Francisco,no,1
Tuesday,02:05,San Francisco,no,0
Tuesday,02:10,San Francisco,no,2
Tuesday,02:15,San Francisco,no,1
Tuesday,02:20,San Francisco,no,2
Tuesday,02:25,San Francisco,no,4
Tuesday,02:30,San Francisco,no,0
Tuesday,02:35,San Francisco,no,0
Tuesday,02:40,San Francisco,no,0
Tuesday,02:45,San Francisco,no,3
Tuesday,02:50,San Francisco,no,1
Tuesday,02:55,San Francisco,no,0
Tuesday,03:00,San Francisco,no,3
Tuesday,03:05,San Francisco,no,0
Tuesday,03:10,San Francisco,no,3
Tuesday,03:15,San Francisco,no,0
Tuesday,03:20,San Francisco,no,0
Tuesday,03:25,San Francisco,no,2
Tuesday,03:30,San Francisco,no,1
Tuesday,03:35,San Francisco,no,1
Tuesday,03:40,San Francisco,no,1
Tuesday,03:45,San Francisco,no,1
Tuesday,03:50,San Francisco,no,0
Tuesday,03:55,San Francisco,no,3
Tuesday,04:00,San Francisco,no,1
Tuesday,04:05,San Francisco,no,2
Tuesday,04:10,San Francisco,no,1
Tuesday,04:15,San Francisco,no,1
Tuesday,04:20,San Francisco,no,2
Tuesday,04:25,San Francisco,no,1
Tuesday,04:30,San Francisco,no,2
Tuesday,04:35,San Francisco,no,2
Tuesday,04:40,San Francisco,no,5
Tuesday,04:45,San Francisco,no,2
Tuesday,04:50,San Francisco,no,5
Tuesday,04:55,San Francisco,no,4
Tuesday,05:00,San Francisco,no,6
Tuesday,05:05,San Francisco,no,5
Tuesday,05:10,San Francisco,no,5
Tuesday,05:15,San Francisco,no,7
Tuesday,05:20,San Francisco,no,4
Tuesday,05:25,San Francisco,no,5
Tuesday,05:30,San Francisco,no,12
Tuesday,05:35,San Francisco,no,12
Tuesday,05:40,San Francisco,no,11
Tuesday,05:45,San Francisco,no,12
Tuesday,05:50,San Francisco,no,11
Tuesday,05:55,San Francisco,no,13
Tuesday,06:00,San Francisco,no,19
Tuesday,06:05,San Francisco,no,16
Tuesday,06:10,San Francisco,no,19
Tuesday,06:15,San Francisco,no,15
Tuesday,06:20,San Francisco,no,8
Tuesday,06:25,San Francisco,no,14
Tuesday,06:30,San Francisco,no,30
Tuesday,06:35,San Francisco,no,35
Tuesday,06:40,San Francisco,no,20
Tuesday,06:45,San Francisco,no,27
Tuesday,06:50,San Francisco,no,33
Tuesday,06:55,San Francisco,no,24
Tuesday,07:00,San Francisco,no,39
Tuesday,07:05,San Francisco,no,42
Tuesday,07:10,San Francisco,no,36
Tuesday,07:15,San Francisco,no,50
Tuesday,07:20,San Francisco,no,42
Tuesday,07:25,San Francisco,no,38
Tuesday,07:30,San Francisco,no,38
Tuesday,07:35,San Francisco,no,40
Tuesday,07:40,San Francisco,no,49
Tuesday,07:45,San Francisco,no,39
Tuesday,07:50,San Francisco,no,43
Tuesday,07:55,San Francisco,no,44
Tuesday,08:00,San Francisco,no,40
Tuesday,08:05,San Francisco,no,22
Tuesday,08:10,San Francisco,no,25
Tuesday,08:15,San Francisco,no,42
Tuesday,08:20,San Francisco,no,37
Tuesday,08:25,San Francisco,no,36
Tuesday,08:30,San Francisco,no,34
Tuesday,08:35,San Francisco,no,41
Tuesday,08:40,San Francisco,no,37
Tuesday,08:45,San Francisco,no,36
Tuesday,08:50,San Francisco,no,40
Tuesday,08:55,San Francisco,no,37
Tuesday,09:00,San Francisco,no,41
Tuesday,09:05,San Francisco,no,38
Tuesday,09:10,San Francisco,no,36
Tuesday,09:15,San Francisco,no,44
Tuesday,09:20,San Francisco,no,33
Tuesday,09:25,San Francisco,no,30
Tuesday,09:30,San Francisco,no,41
Tuesday,09:35,San Francisco,no,36
Tuesday,09:40,San Francisco,no,35
Tuesday,09:45,San Francisco,no,36
Tuesday,09:50,San Francisco,no,35
Tuesday,09:55,San Francisco,no,42
Tuesday,10:00,San Francisco,no,31
Tuesday,10:05,San Francisco,no,25
Tuesday,10:10,San Francisco,no,28
Tuesday,10:15,San Francisco,no,27
Tuesday,10:20,San Francisco,no,23
Tuesday,10:25,San Francisco,no,25
I hope someone can solve this problem.
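One fragile detail in the reading loop: line[:-1] assumes every line ends with a newline. If the file's last line has none, that slice cuts off its final character, which here is the last digit of the traffic count. Below is a minimal sketch of a more robust reader that uses the standard-library csv module. The helper name read_traffic_rows is hypothetical, and io.StringIO stands in for the real file.

```python
import csv
import io

import numpy as np


def read_traffic_rows(handle):
    """Return a 2-D string array with one row per non-empty CSV line."""
    rows = [row for row in csv.reader(handle) if row]
    return np.array(rows)


# io.StringIO stands in for open(input_file, newline=""); note there is no final newline
sample = io.StringIO(
    "Tuesday,00:00,San Francisco,no,3\n"
    "Tuesday,00:05,San Francisco,no,8"
)
X = read_traffic_rows(sample)
print(X.shape)    # (2, 5)
print(X[-1, -1])  # 8 (slicing with line[:-1] would have left an empty string)
```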
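Answer 1: The problem is caused by the following:
When you call fit_transform on a label_encoder, you pass X[:, i] as input, which is a 1-D array of shape (126,).
On the other hand, you later call:
label_encoder[count].transform(input_data[i])
Here input_data[i] is a single string such as 'Tuesday'. That is a scalar with shape (), which is exactly what the error message "bad input shape ()" reports. LabelEncoder.transform expects a 1-D sequence, so wrap the value in a list and take the first element of the result: label_encoder[count].transform([input_data[i]])[0].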
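Below is a minimal sketch that checks the shape difference and the list-wrapping fix, assuming a recent scikit-learn. The encoder name day_encoder and the sample labels are illustrative only. With only the 126 rows listed in the question, the labels '13:35' and 'yes' never occur in the training data. A fitted LabelEncoder rejects such unseen labels with a ValueError, so the question's single-instance example would still fail after the shape fix unless it uses values present in the data.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Fit on a 1-D column, just as the question's code does with X[:, i]
day_encoder = LabelEncoder()
day_encoder.fit(np.array(["Tuesday", "Tuesday", "Wednesday"]))

value = "Tuesday"
print(np.shape(value))    # () -> a bare string is a scalar, the shape in the error
print(np.shape([value]))  # (1,) -> a one-element 1-D sequence

# Wrap the single value in a list, then take the first element of the result
encoded = int(day_encoder.transform([value])[0])
print(encoded)  # 0, because classes_ is sorted: ['Tuesday', 'Wednesday']

# A fitted encoder also rejects labels it never saw during fit()
unseen_error = None
try:
    day_encoder.transform(["Monday"])
except ValueError as err:
    unseen_error = err
print("unseen label rejected:", unseen_error is not None)
```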
EDIT 1:
import numpy as np
from sklearn import preprocessing
from sklearn.svm import SVR
from sklearn.model_selection import KFold
input_file = r"traffic_data.txt"
# Reading the data
X = []
count = 0
with open(input_file, 'r') as f:
    for line in f.readlines():
        data = line[:-1].split(',')
        X.append(data)
X = np.array(X)
# Convert string data to numerical data
label_encoder = []
X_encoded = np.empty(X.shape)
for i, item in enumerate(X[0]):
    if item.isdigit():
        X_encoded[:, i] = X[:, i]
    else:
        label_encoder.append(preprocessing.LabelEncoder())
        X_encoded[:, i] = label_encoder[-1].fit_transform(X[:, i])
X = X_encoded[:, :-1].astype(int)
y = X_encoded[:, -1].astype(int)
# Build SVR
params = {'kernel': 'rbf', 'C': 10.0, 'epsilon': 0.2}
regressor = SVR(**params)
# OPTION 1: Cross-validation with 2 folds and a loop
kf = KFold(n_splits=2)
# In this loop, the model is fitted using ONLY the training samples and then predicts using ONLY the test samples.
# The predicted values are stored in predicted_values
# The actual (true) values are stored in true_values
predicted_values = []
true_values = []
for train_index, test_index in kf.split(X):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
    regressor.fit(X_train, y_train)
    y_pred = regressor.predict(X_test)
    predicted_values.append(y_pred)
    true_values.append(y_test)
# Now you can use predicted_values and true_values to calculate metrics such as MSE, MAE, etc.
# OPTION 2: use the cross_val_predict function directly
from sklearn.model_selection import cross_val_predict
# The cross-validated predicted values are stored in y_pred
y_pred = cross_val_predict(regressor, X, y, cv=kf)
# OPTION 3: use the train_test_split function
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
regressor.fit(X_train, y_train)
y_pred = regressor.predict(X_test)
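The loop in OPTION 1 leaves the metric calculation open. One option is to concatenate the per-fold arrays and compute the mean absolute error with NumPy. For 1-D inputs, this is the quantity sklearn.metrics.mean_absolute_error returns. Below is a minimal sketch of that approach. The helper name mean_absolute_error_over_folds is hypothetical, and the fold values are made up for illustration.

```python
import numpy as np


def mean_absolute_error_over_folds(true_values, predicted_values):
    """Concatenate per-fold arrays and return the mean absolute error."""
    y_true = np.concatenate(true_values)
    y_pred = np.concatenate(predicted_values)
    return float(np.mean(np.abs(y_true - y_pred)))


# Made-up fold results standing in for the lists filled by the KFold loop
true_values = [np.array([3, 8, 10]), np.array([6, 1, 4])]
predicted_values = [np.array([4.0, 7.0, 10.0]), np.array([6.0, 3.0, 4.0])]

mae = mean_absolute_error_over_folds(true_values, predicted_values)
print(round(mae, 2))  # 0.67
```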
Comments:
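Hi Sera, I'm new to Python. Could you show the code? I'm not sure what I should change given the 126 data samples. Should I change "X_encoded = np.empty(X.shape)" to "X_encoded = np.empty(1,shape)"? That doesn't work.
Why not transform all the data at the start and then use K-fold cross-validation? That way you transform all the data once and then use the folds to split it into training and test data.
Good advice for advanced Python users! But I'm very new, and I don't know how to start or even how to change the code you provided. As the quote goes, "Talk is cheap, show me the code."
Hi Sera, my original code tries to predict the result, but your code shows the test size and training size??
No. y_pred contains the cross-validated predictions. Try reading the code line by line. The test and training set sizes are only there to show you that I split the data into training and test sets.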
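Below is a minimal end-to-end sketch for the asker's original goal of predicting a single instance, assuming a recent scikit-learn. It differs from the question's code in three ways:
- The encoders are stored in a dict keyed by column index. The name encoders and this design are hypothetical; they replace the manual count.
- Each single value is wrapped in a list before transform().
- The encoded row is reshaped to 2-D, because recent scikit-learn versions expect predict() input of shape (n_samples, n_features).

Only a few rows from traffic_data.txt are embedded so the block runs on its own, and the query uses labels that appear in those rows.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder
from sklearn.svm import SVR

# A few rows copied from traffic_data.txt; a real script would read the whole file
rows = [
    "Tuesday,00:00,San Francisco,no,3",
    "Tuesday,00:05,San Francisco,no,8",
    "Tuesday,00:10,San Francisco,no,10",
    "Tuesday,00:15,San Francisco,no,6",
    "Tuesday,00:20,San Francisco,no,1",
    "Tuesday,00:25,San Francisco,no,4",
    "Tuesday,00:30,San Francisco,no,9",
]
data = np.array([r.split(",") for r in rows])

# Encode every non-numeric column and remember its encoder by column index
encoders = {}
X_encoded = np.empty(data.shape, dtype=float)
for col in range(data.shape[1]):
    if data[0, col].isdigit():
        X_encoded[:, col] = data[:, col].astype(float)
    else:
        encoders[col] = LabelEncoder()
        X_encoded[:, col] = encoders[col].fit_transform(data[:, col])

X = X_encoded[:, :-1].astype(int)
y = X_encoded[:, -1].astype(int)

regressor = SVR(kernel="rbf", C=10.0, epsilon=0.2)
regressor.fit(X, y)


def encode_instance(values):
    """Encode one feature row with the encoders fitted above."""
    encoded = []
    for col, item in enumerate(values):
        if col in encoders:
            # transform() expects a 1-D sequence, so wrap the single value
            encoded.append(int(encoders[col].transform([item])[0]))
        else:
            encoded.append(int(item))
    return np.array(encoded)


query = ["Tuesday", "00:15", "San Francisco", "no"]
features = encode_instance(query).reshape(1, -1)  # predict() expects a 2-D array
prediction = regressor.predict(features)
print("Predicted traffic:", int(prediction[0]))
```

Keying the encoders by column index means the query's columns line up with the training encoders, and no separate counter needs to stay in sync with the loop.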