python SVM: execution error

[Title]: python SVM: execution error [Posted]: 2018-08-22 18:52:04 [Question]:

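I am using Python 3.6 on Windows and am learning to make predictions with an SVM in Python. I have the code below, but after running and checking it carefully I still get the following error: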

  File "C:\Users\Lawrence\Anaconda3\lib\site-packages\sklearn\utils\validation.py", line 614, in column_or_1d
    raise ValueError("bad input shape {0}".format(shape))

ValueError: bad input shape ()

The original Python code is as follows:

import numpy as np
from sklearn import preprocessing
from sklearn.svm import SVR

input_file = r"C:\Users\Lawrence\Desktop\traffic_data.txt"

# Reading the data
X = []
count = 0
with open(input_file, 'r') as f:
    for line in f.readlines():
        data = line[:-1].split(',')
        X.append(data)

X = np.array(X)

# Convert string data to numerical data
label_encoder = [] 
X_encoded = np.empty(X.shape)
for i,item in enumerate(X[0]):
    if item.isdigit():
        X_encoded[:, i] = X[:, i]
    else:
        label_encoder.append(preprocessing.LabelEncoder())
        X_encoded[:, i] = label_encoder[-1].fit_transform(X[:, i])

X = X_encoded[:, :-1].astype(int)
y = X_encoded[:, -1].astype(int)

# Build SVR
params = {'kernel': 'rbf', 'C': 10.0, 'epsilon': 0.2}
regressor = SVR(**params)
regressor.fit(X, y)

# Cross validation
import sklearn.metrics as sm

y_pred = regressor.predict(X)
print ("Mean absolute error =", round(sm.mean_absolute_error(y, y_pred), 2))

# Testing encoding on single data instance
input_data = ['Tuesday', '13:35', 'San Francisco', 'yes']
input_data_encoded = [-1] * len(input_data)
count = 0
for i,item in enumerate(input_data):
    if item.isdigit():
        input_data_encoded[i] = int(input_data[i])
    else:
        input_data_encoded[i] = int(label_encoder[count].transform(input_data[i]))
        count = count + 1 

input_data_encoded = np.array(input_data_encoded)

# Predict and print output for a particular datapoint
print ("Predicted traffic:", int(regressor.predict(input_data_encoded)[0]))

The input data file (traffic_data.txt) is as follows:

Tuesday,00:00,San Francisco,no,3
Tuesday,00:05,San Francisco,no,8
Tuesday,00:10,San Francisco,no,10
Tuesday,00:15,San Francisco,no,6
Tuesday,00:20,San Francisco,no,1
Tuesday,00:25,San Francisco,no,4
Tuesday,00:30,San Francisco,no,9
Tuesday,00:35,San Francisco,no,4
Tuesday,00:40,San Francisco,no,6
Tuesday,00:45,San Francisco,no,13
Tuesday,00:50,San Francisco,no,5
Tuesday,00:55,San Francisco,no,5
Tuesday,01:00,San Francisco,no,4
Tuesday,01:05,San Francisco,no,7
Tuesday,01:10,San Francisco,no,5
Tuesday,01:15,San Francisco,no,4
Tuesday,01:20,San Francisco,no,5
Tuesday,01:25,San Francisco,no,1
Tuesday,01:30,San Francisco,no,8
Tuesday,01:35,San Francisco,no,2
Tuesday,01:40,San Francisco,no,3
Tuesday,01:45,San Francisco,no,0
Tuesday,01:50,San Francisco,no,2
Tuesday,01:55,San Francisco,no,1
Tuesday,02:00,San Francisco,no,1
Tuesday,02:05,San Francisco,no,0
Tuesday,02:10,San Francisco,no,2
Tuesday,02:15,San Francisco,no,1
Tuesday,02:20,San Francisco,no,2
Tuesday,02:25,San Francisco,no,4
Tuesday,02:30,San Francisco,no,0
Tuesday,02:35,San Francisco,no,0
Tuesday,02:40,San Francisco,no,0
Tuesday,02:45,San Francisco,no,3
Tuesday,02:50,San Francisco,no,1
Tuesday,02:55,San Francisco,no,0
Tuesday,03:00,San Francisco,no,3
Tuesday,03:05,San Francisco,no,0
Tuesday,03:10,San Francisco,no,3
Tuesday,03:15,San Francisco,no,0
Tuesday,03:20,San Francisco,no,0
Tuesday,03:25,San Francisco,no,2
Tuesday,03:30,San Francisco,no,1
Tuesday,03:35,San Francisco,no,1
Tuesday,03:40,San Francisco,no,1
Tuesday,03:45,San Francisco,no,1
Tuesday,03:50,San Francisco,no,0
Tuesday,03:55,San Francisco,no,3
Tuesday,04:00,San Francisco,no,1
Tuesday,04:05,San Francisco,no,2
Tuesday,04:10,San Francisco,no,1
Tuesday,04:15,San Francisco,no,1
Tuesday,04:20,San Francisco,no,2
Tuesday,04:25,San Francisco,no,1
Tuesday,04:30,San Francisco,no,2
Tuesday,04:35,San Francisco,no,2
Tuesday,04:40,San Francisco,no,5
Tuesday,04:45,San Francisco,no,2
Tuesday,04:50,San Francisco,no,5
Tuesday,04:55,San Francisco,no,4
Tuesday,05:00,San Francisco,no,6
Tuesday,05:05,San Francisco,no,5
Tuesday,05:10,San Francisco,no,5
Tuesday,05:15,San Francisco,no,7
Tuesday,05:20,San Francisco,no,4
Tuesday,05:25,San Francisco,no,5
Tuesday,05:30,San Francisco,no,12
Tuesday,05:35,San Francisco,no,12
Tuesday,05:40,San Francisco,no,11
Tuesday,05:45,San Francisco,no,12
Tuesday,05:50,San Francisco,no,11
Tuesday,05:55,San Francisco,no,13
Tuesday,06:00,San Francisco,no,19
Tuesday,06:05,San Francisco,no,16
Tuesday,06:10,San Francisco,no,19
Tuesday,06:15,San Francisco,no,15
Tuesday,06:20,San Francisco,no,8
Tuesday,06:25,San Francisco,no,14
Tuesday,06:30,San Francisco,no,30
Tuesday,06:35,San Francisco,no,35
Tuesday,06:40,San Francisco,no,20
Tuesday,06:45,San Francisco,no,27
Tuesday,06:50,San Francisco,no,33
Tuesday,06:55,San Francisco,no,24
Tuesday,07:00,San Francisco,no,39
Tuesday,07:05,San Francisco,no,42
Tuesday,07:10,San Francisco,no,36
Tuesday,07:15,San Francisco,no,50
Tuesday,07:20,San Francisco,no,42
Tuesday,07:25,San Francisco,no,38
Tuesday,07:30,San Francisco,no,38
Tuesday,07:35,San Francisco,no,40
Tuesday,07:40,San Francisco,no,49
Tuesday,07:45,San Francisco,no,39
Tuesday,07:50,San Francisco,no,43
Tuesday,07:55,San Francisco,no,44
Tuesday,08:00,San Francisco,no,40
Tuesday,08:05,San Francisco,no,22
Tuesday,08:10,San Francisco,no,25
Tuesday,08:15,San Francisco,no,42
Tuesday,08:20,San Francisco,no,37
Tuesday,08:25,San Francisco,no,36
Tuesday,08:30,San Francisco,no,34
Tuesday,08:35,San Francisco,no,41
Tuesday,08:40,San Francisco,no,37
Tuesday,08:45,San Francisco,no,36
Tuesday,08:50,San Francisco,no,40
Tuesday,08:55,San Francisco,no,37
Tuesday,09:00,San Francisco,no,41
Tuesday,09:05,San Francisco,no,38
Tuesday,09:10,San Francisco,no,36
Tuesday,09:15,San Francisco,no,44
Tuesday,09:20,San Francisco,no,33
Tuesday,09:25,San Francisco,no,30
Tuesday,09:30,San Francisco,no,41
Tuesday,09:35,San Francisco,no,36
Tuesday,09:40,San Francisco,no,35
Tuesday,09:45,San Francisco,no,36
Tuesday,09:50,San Francisco,no,35
Tuesday,09:55,San Francisco,no,42
Tuesday,10:00,San Francisco,no,31
Tuesday,10:05,San Francisco,no,25
Tuesday,10:10,San Francisco,no,28
Tuesday,10:15,San Francisco,no,27
Tuesday,10:20,San Francisco,no,23
Tuesday,10:25,San Francisco,no,25

I hope someone can help solve this problem.

[Answer 1]:

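The problem is caused by the following:

When you call fit_transform on the label_encoder, you pass X[:, i] as input, which is a 1-D array of shape (126,).

On the other hand, you then call:

label_encoder[count].transform(input_data[i])

Here input_data[i] is a single string, i.e. a scalar of shape (), which is why the error reports "bad input shape ()". transform expects a 1-D array-like, just like the input given to fit_transform.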
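As a minimal sketch of one way to resolve the shape mismatch described above: wrap the single string in a list before calling transform, and reshape the encoded instance to 2-D before calling predict. The four rows and the query instance below are hypothetical stand-ins with the same column layout as traffic_data.txt; they are not the asker's actual run.

```python
import numpy as np
from sklearn import preprocessing
from sklearn.svm import SVR

# Tiny stand-in for traffic_data.txt (hypothetical rows, same column layout).
rows = [
    ["Tuesday", "00:00", "San Francisco", "no", "3"],
    ["Tuesday", "00:05", "San Francisco", "no", "8"],
    ["Tuesday", "00:10", "San Francisco", "no", "10"],
    ["Tuesday", "00:15", "San Francisco", "no", "6"],
]
data = np.array(rows)

# Fit one LabelEncoder per non-numeric column, as in the question.
label_encoder = []
encoded = np.empty(data.shape)
for i, item in enumerate(data[0]):
    if item.isdigit():
        encoded[:, i] = data[:, i].astype(float)
    else:
        label_encoder.append(preprocessing.LabelEncoder())
        encoded[:, i] = label_encoder[-1].fit_transform(data[:, i])

X = encoded[:, :-1].astype(int)
y = encoded[:, -1].astype(int)

regressor = SVR(kernel="rbf", C=10.0, epsilon=0.2)
regressor.fit(X, y)

# Encode a single instance whose values all occur in the training data.
input_data = ["Tuesday", "00:10", "San Francisco", "no"]
input_data_encoded = []
count = 0
for item in input_data:
    if item.isdigit():
        input_data_encoded.append(int(item))
    else:
        # transform expects a 1-D array-like, so wrap the string in a list
        # and take the single encoded value back out.
        code = label_encoder[count].transform([item])[0]
        input_data_encoded.append(int(code))
        count += 1

# predict expects a 2-D array of shape (n_samples, n_features).
input_data_encoded = np.array(input_data_encoded).reshape(1, -1)
predicted = regressor.predict(input_data_encoded)[0]
print("Predicted traffic:", int(predicted))
```

Note that LabelEncoder.transform raises a ValueError for labels it did not see during fitting. In the data excerpt shown in the question, the time '13:35' and the value 'yes' do not appear, so the question's test instance would likely still fail at that step even after the shape fix.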


Edit 1:

import numpy as np
from sklearn import preprocessing
from sklearn.svm import SVR
from sklearn.model_selection import KFold

input_file = r"traffic_data.txt"

# Reading the data
X = []
count = 0
with open(input_file, 'r') as f:
    for line in f.readlines():
        data = line[:-1].split(',')
        X.append(data)
X = np.array(X)

# Convert string data to numerical data
label_encoder = [] 
X_encoded = np.empty(X.shape)
for i,item in enumerate(X[0]):
    if item.isdigit():
        X_encoded[:, i] = X[:, i]
    else:
        label_encoder.append(preprocessing.LabelEncoder())
        X_encoded[:, i] = label_encoder[-1].fit_transform(X[:, i])

X = X_encoded[:, :-1].astype(int)
y = X_encoded[:, -1].astype(int)

# Build SVR
params = {'kernel': 'rbf', 'C': 10.0, 'epsilon': 0.2}
regressor = SVR(**params)

# OPTION 1: Cross validation with 2 folds AND LOOP
kf = KFold(n_splits = 2)

# In this loop, the model is fitted using ONLY the training samples and then
# predicts using ONLY the test samples.
#
# The predicted values are stored in predicted_values.
# The actual (true) values are stored in true_values.
predicted_values = []
true_values=[]
for train_index, test_index in kf.split(X):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
    regressor.fit(X_train, y_train)
    y_pred = regressor.predict(X_test)
    predicted_values.append(y_pred)
    true_values.append(y_test)

# Now you can use predicted_values and true_values to calculate regression metrics such as MSE or MAE.


# OPTION 2: use cross_val_predict function directly
from sklearn.model_selection import cross_val_predict

# The cross validated predicted values are stored in the y_pred 
y_pred = cross_val_predict(regressor, X, y, cv = kf)


# OPTION 3: use train_test_split function
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

regressor.fit(X_train, y_train)
y_pred = regressor.predict(X_test)
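The per-fold lists collected in Option 1 can be turned into a single score. The sketch below is one possible way to do that, not the answerer's own code: it concatenates the fold arrays and computes the mean absolute error. The X and y here are synthetic placeholders (an assumption for illustration); in practice they would be the encoded arrays from the code above.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import KFold
from sklearn.svm import SVR

# Synthetic stand-in data (hypothetical); replace with the encoded X and y.
rng = np.random.RandomState(0)
X = rng.randint(0, 10, size=(40, 4))
y = rng.randint(0, 50, size=40)

regressor = SVR(kernel="rbf", C=10.0, epsilon=0.2)
kf = KFold(n_splits=2)

predicted_values = []
true_values = []
for train_index, test_index in kf.split(X):
    # Fit on the training fold only, predict on the held-out fold only.
    regressor.fit(X[train_index], y[train_index])
    predicted_values.append(regressor.predict(X[test_index]))
    true_values.append(y[test_index])

# Each list holds one array per fold; join them before scoring.
y_pred_all = np.concatenate(predicted_values)
y_true_all = np.concatenate(true_values)

mae = mean_absolute_error(y_true_all, y_pred_all)
print("Cross-validated mean absolute error =", round(mae, 2))
```

Unlike the question's original check, which predicts on the same X the model was fitted on, every prediction scored here comes from a model that did not see that sample during fitting.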

[Discussion]:

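Hi Sera, I am new to Python. Could you show the code? I am not sure where I should make the change given the 126 data samples. Should I change "X_encoded = np.empty(X.shape)" to "X_encoded = np.empty(1,shape)"? That does not work.

Why not transform all the data at the start and then use K-fold cross-validation? That way, you transform all of the data first and then use the folds to split it into training and test sets.

Good advice for an advanced Python user! However, I am very new; I don't know how to start, or even how to change the code you provided. As the saying goes, "Talk is cheap. Show me the code."

Hi Sera, my original code tries to predict a result, but your code shows test and training sizes?

No. y_pred contains the cross-validated predictions. Try reading the code line by line. The test and training set sizes are only there to show you that I split the data into training and test sets.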
