How to split Train and Test data based on Class labels manually
Posted: 2018-05-04 14:54:17
Question: I have a train data file and a test data file, as follows:
TrainData:
1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,4,4,4,4,4,4,4,4,4,5,5,5,5,5,5,5,5,5
47,64,50,39,66,51,46,37,43,37,37,35,36,34,37,38,37,39,104,102,103,103,102,108,109,107,106,115,116,116,120,122,121,121,116,116,131,131,130,132,126,127,131,128,127
47,65,58,30,39,48,47,35,42,37,38,37,37,36,38,38,38,40,104,103,103,103,101,108,110,108,106,116,115,116,121,121,119,121,116,116,133,131,129,132,127,128,132,126,127
49,69,55,28,56,64,50,30,41,37,39,37,38,36,39,39,39,40,105,103,104,104,103,110,110,108,107,116,115,117,120,120,117,121,115,116,134,131,129,134,128,125,134,126,127
51,78,52,46,56,74,50,28,38,38,39,38,38,37,40,39,39,41,96,101,99,104,97,101,111,101,104,115,116,116,119,110,112,119,116,116,135,130,129,135,120,108,133,120,125
55,79,53,65,52,102,55,28,36,39,40,38,39,37,40,39,40,42,79,86,84,105,84,57,110,85,76,117,118,115,110,66,86,117,117,118,123,130,130,129,106,93,130,113,114
48,80,59,81,50,120,63,26,31,39,40,39,40,38,42,37,41,42,53,73,77,90,47,34,76,52,63,106,102,97,80,33,68,105,105,113,115,130,124,111,83,91,128,105,110
TestData:
1,4,5,2,3
33,121,125,36,106
34,122,126,38,107
28,95,126,39,109
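Both listings share one layout: row 0 holds the class label of each column, and every other row holds one feature, so each column is one sample. A minimal sketch, using only NumPy, that turns this layout into the (samples × features) matrix and 1-D label vector that scikit-learn expects. The helper name load_column_samples is hypothetical.

```python
import io

import numpy as np


def load_column_samples(source):
    """Parse a CSV whose row 0 holds class labels and whose other rows hold features.

    Each column is one sample. Returns (X, y) where X has shape
    (n_samples, n_features) and y is a 1-D integer label vector.
    """
    table = np.loadtxt(source, delimiter=",", ndmin=2)
    y = table[0].astype(int)  # row 0: one label per column (sample)
    X = table[1:].T           # feature rows, transposed to samples x features
    return X, y


# Usage with the TestData excerpt shown above.
test_text = """1,4,5,2,3
33,121,125,36,106
34,122,126,38,107
28,95,126,39,109"""
X_test, y_test = load_column_samples(io.StringIO(test_text))
print(X_test.shape)  # (5, 3)
print(y_test)        # [1 4 5 2 3]
```

np.loadtxt also accepts a file path, so the same helper could read the training file directly.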
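I am fitting a classifier on the training data so that it can classify the test data, but the train/test data is not being split correctly. Please suggest a method or function I can use to split the train/test data.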
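import numpy as np
import pandas as pd
from sklearn import svm

# Each file: row 0 holds the class labels, every other row holds one feature; one sample per column.
# header=None (not -1) means "no header row"; .to_numpy() replaces .as_matrix(), removed in pandas 1.0.
TrainFile = pd.read_csv('trainDataXY.txt', header=None).to_numpy()
# The test matrix must come from the test file, not the training file; 'testDataXY.txt' is a hypothetical name.
TestFile = pd.read_csv('testDataXY.txt', header=None).to_numpy()
# NumPy slices are 0-based and end-exclusive: class 1 is columns 0..8, i.e. 0:9 (1:9 would drop column 0).
Class1DataTrain = TrainFile[1:, 0:9]
Class2DataTrain = TrainFile[1:, 9:18]
Class3DataTrain = TrainFile[1:, 18:27]
Class4DataTrain = TrainFile[1:, 27:36]
Class5DataTrain = TrainFile[1:, 36:45]
# hstack gives (n_features, 45); transpose to (n_samples, n_features) as scikit-learn expects.
TrainData = np.transpose(np.hstack((Class1DataTrain, Class2DataTrain, Class3DataTrain, Class4DataTrain, Class5DataTrain)))
# Indexing row 0 as a scalar row gives 1-D slices, so HeadTrain is a flat label vector of length 45.
Class1HeadTrain = TrainFile[0, 0:9]
Class2HeadTrain = TrainFile[0, 9:18]
Class3HeadTrain = TrainFile[0, 18:27]
Class4HeadTrain = TrainFile[0, 27:36]
Class5HeadTrain = TrainFile[0, 36:45]
HeadTrain = np.hstack((Class1HeadTrain, Class2HeadTrain, Class3HeadTrain, Class4HeadTrain, Class5HeadTrain))
# Test labels in row 0 are 1,4,5,2,3; the slice a:a+1 selects one column (a:a is always empty).
Class1HeadTest = TestFile[0, 0:1]
Class2HeadTest = TestFile[0, 3:4]
Class3HeadTest = TestFile[0, 4:5]
Class4HeadTest = TestFile[0, 1:2]
Class5HeadTest = TestFile[0, 2:3]
HeadTest = np.hstack((Class1HeadTest, Class2HeadTest, Class3HeadTest, Class4HeadTest, Class5HeadTest))
# Feature rows (1:) of the same columns, in the same class order.
Class1DataTest = TestFile[1:, 0:1]
Class2DataTest = TestFile[1:, 3:4]
Class3DataTest = TestFile[1:, 4:5]
Class4DataTest = TestFile[1:, 1:2]
Class5DataTest = TestFile[1:, 2:3]
# The classifier must see the test features, not the label row.
Combine = np.transpose(np.hstack((Class1DataTest, Class2DataTest, Class3DataTest, Class4DataTest, Class5DataTest)))
# Both files need the same number of feature rows: the excerpt above has 6 for TrainData but 3 for
# TestData, and predict() raises a ValueError when the feature counts differ.
clf = svm.SVC(decision_function_shape='ovo')
clf.fit(TrainData, HeadTrain)
dec = clf.decision_function(Combine)
predictions = clf.predict(Combine)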
But it does not work as expected. Can anyone help me fix this? I'm new to this.
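The question asks for a way to split data by class label. A minimal sketch, assuming the data is already loaded as an (n_samples, n_features) array X with a 1-D label array y: instead of hard-coding a column range for each class, select each class's samples with a comparison on the labels and send a fixed number of them to the training set. The helper name split_by_class and the choice of taking the first samples of each class are illustrative assumptions.

```python
import numpy as np


def split_by_class(X, y, n_train_per_class):
    """Split (X, y) so every class contributes n_train_per_class samples to training.

    Samples are taken in their original order; the remaining samples of each
    class go to the test set. Raises ValueError if a class has too few samples.
    """
    train_idx, test_idx = [], []
    for label in np.unique(y):
        idx = np.flatnonzero(y == label)  # positions of this class, in original order
        if len(idx) <= n_train_per_class:
            raise ValueError(
                f"class {label} has {len(idx)} samples; need more than {n_train_per_class}"
            )
        train_idx.extend(idx[:n_train_per_class])
        test_idx.extend(idx[n_train_per_class:])
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]


# Usage: labels shaped like the TrainData label row (5 classes x 9 samples),
# with placeholder 2-feature rows standing in for the real measurements.
y = np.repeat([1, 2, 3, 4, 5], 9)
X = np.arange(y.size * 2, dtype=float).reshape(y.size, 2)
X_tr, X_te, y_tr, y_te = split_by_class(X, y, n_train_per_class=6)
print(X_tr.shape, X_te.shape)  # (30, 2) (15, 2)
```

Because both parts are cut from one pooled array, they are guaranteed to have the same number of features, which a separately prepared test file does not guarantee.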
Answer 1: You can use the cvpartition function in MATLAB.
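In Python, a comparable stratified hold-out split (the role cvpartition(group, 'HoldOut', p) plays in MATLAB) is available as scikit-learn's train_test_split with the stratify argument, which keeps each class's share the same in both parts. A minimal sketch on placeholder data shaped like the training file (5 classes × 9 samples); the feature values are synthetic, not the question's.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data: 5 classes x 9 samples, mirroring the TrainData label row.
y = np.repeat([1, 2, 3, 4, 5], 9)
X = np.arange(y.size * 6, dtype=float).reshape(y.size, 6)  # synthetic 6-feature rows

# stratify=y keeps the class proportions equal in both parts; an integer
# test_size is an absolute sample count; random_state makes the shuffle reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=15, stratify=y, random_state=0
)
print(np.bincount(y_train)[1:])  # [6 6 6 6 6]
print(np.bincount(y_test)[1:])   # [3 3 3 3 3]
```

Unlike a fixed per-class cut, this shuffles within each class, so which samples land in the test part depends on random_state.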