k-Nearest Neighbors in Python -- from Machine Learning in Action (《机器学习实战》)
'''
Created on Nov 06, 2017
kNN: k Nearest Neighbors

Input:  inX: vector to compare to existing dataset (1xN)
        dataSet: size m data set of known vectors (NxM)
        labels: data set labels (1xM vector)
        k: number of neighbors to use for comparison (should be an odd number)

Output: the most popular class label

@author: Liu Chuanfeng
'''
import operator
import numpy as np
import matplotlib.pyplot as plt

def classify0(inX, dataSet, labels, k):
    dataSetSize = dataSet.shape[0]
    # Euclidean distance from inX to every row of dataSet
    diffMat = np.tile(inX, (dataSetSize, 1)) - dataSet
    sqDiffMat = diffMat ** 2
    sqDistances = sqDiffMat.sum(axis=1)
    distances = sqDistances ** 0.5
    sortedDistIndicies = distances.argsort()
    # majority vote among the k nearest neighbors
    classCount = {}
    for i in range(k):
        voteIlabel = labels[sortedDistIndicies[i]]
        classCount[voteIlabel] = classCount.get(voteIlabel, 0) + 1
    sortedClassCount = sorted(classCount.items(), key=operator.itemgetter(1), reverse=True)
    return sortedClassCount[0][0]

def file2matrix(filename):
    # parse a tab-separated file: three feature columns, one integer label column
    fr = open(filename)
    arrayLines = fr.readlines()
    numberOfLines = len(arrayLines)
    returnMat = np.zeros((numberOfLines, 3))
    classLabelVector = []
    index = 0
    for line in arrayLines:
        line = line.strip()
        listFromLine = line.split('\t')
        returnMat[index, :] = listFromLine[0:3]
        classLabelVector.append(int(listFromLine[-1]))
        index += 1
    return returnMat, classLabelVector

def autoNorm(dataSet):
    # min-max scaling: map every feature column into [0, 1]
    maxVals = dataSet.max(0)
    minVals = dataSet.min(0)
    ranges = maxVals - minVals
    m = dataSet.shape[0]
    normDataSet = (dataSet - np.tile(minVals, (m, 1))) / np.tile(ranges, (m, 1))
    return normDataSet, ranges, minVals

def datingClassTest():
    # hold out the first 10% of the rows as a test set
    hoRatio = 0.10
    datingDataMat, datingLabels = file2matrix('datingTestSet2.txt')
    normMat, ranges, minVals = autoNorm(datingDataMat)
    m = normMat.shape[0]
    numTestVecs = int(m * hoRatio)
    errorCount = 0.0
    for i in range(numTestVecs):
        classifyResult = classify0(normMat[i, :], normMat[numTestVecs:m, :], datingLabels[numTestVecs:m], 3)
        print('the classifier came back with: %d, the real answer is: %d' % (classifyResult, datingLabels[i]))
        if classifyResult != datingLabels[i]:
            errorCount += 1.0
        print('the total error rate is: %.1f%%' % (errorCount / float(numTestVecs) * 100))

def classifyPerson():
    resultList = ['not at all', 'in small doses', 'in large doses']
    percentTats = float(input("percentage of time spent playing video games?"))
    ffMiles = float(input("frequent flier miles earned per year?"))
    iceCream = float(input("liters of ice cream consumed per year?"))
    datingDataMat, datingLabels = file2matrix('datingTestSet2.txt')
    normMat, ranges, minVals = autoNorm(datingDataMat)
    # normalize the query point with the same ranges/minVals as the training data
    inArr = np.array([ffMiles, percentTats, iceCream])
    classifyResult = classify0((inArr - minVals) / ranges, normMat, datingLabels, 3)
    print("You will probably like this person:", resultList[classifyResult - 1])

# Unit test of func: file2matrix()
#datingDataMat, datingLabels = file2matrix('datingTestSet2.txt')
#print(datingDataMat)
#print(datingLabels)

# Usage of figure construction with matplotlib
#fig = plt.figure()
#ax = fig.add_subplot(111)
#ax.scatter(datingDataMat[:,1], datingDataMat[:,2], 15.0*np.array(datingLabels), 15.0*np.array(datingLabels))
#plt.show()

# Unit test of func: autoNorm()
#normMat, ranges, minVals = autoNorm(datingDataMat)
#print(normMat)
#print(ranges)
#print(minVals)

datingClassTest()
classifyPerson()
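As a quick sanity check, the distance-plus-majority-vote logic of classify0 can be exercised on a tiny hand-made dataset, without the datingTestSet2.txt file. The points and labels below are invented for illustration; only NumPy is required:

```python
import operator
import numpy as np

def classify0(inX, dataSet, labels, k):
    # Euclidean distance from inX to every row of dataSet
    diffMat = np.tile(inX, (dataSet.shape[0], 1)) - dataSet
    distances = ((diffMat ** 2).sum(axis=1)) ** 0.5
    sortedDistIndicies = distances.argsort()
    # majority vote among the k nearest neighbors
    classCount = {}
    for i in range(k):
        voteIlabel = labels[sortedDistIndicies[i]]
        classCount[voteIlabel] = classCount.get(voteIlabel, 0) + 1
    sortedClassCount = sorted(classCount.items(), key=operator.itemgetter(1), reverse=True)
    return sortedClassCount[0][0]

# toy dataset: two points near (1, 1) labeled 'A', two near (0, 0) labeled 'B'
group = np.array([[1.0, 1.1], [1.0, 1.0], [0.0, 0.0], [0.0, 0.1]])
labels = ['A', 'A', 'B', 'B']

print(classify0([0.0, 0.0], group, labels, 3))  # → B
print(classify0([1.0, 1.2], group, labels, 3))  # → A
```

A query at the origin falls among the 'B' points, so two of its three nearest neighbors vote 'B'; a query near (1, 1.2) is classified 'A' for the symmetric reason.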
Output:
the classifier came back with: 3, the real answer is: 3
the total error rate is: 0.0%
the classifier came back with: 2, the real answer is: 2
the total error rate is: 0.0%
the classifier came back with: 1, the real answer is: 1
the total error rate is: 0.0%
...
the classifier came back with: 2, the real answer is: 2
the total error rate is: 4.0%
the classifier came back with: 1, the real answer is: 1
the total error rate is: 4.0%
the classifier came back with: 3, the real answer is: 1
the total error rate is: 5.0%
percentage of time spent playing video games?10
frequent flier miles earned per year?10000
liters of ice cream consumed per year?0.5
You will probably like this person: in small doses
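The three features in this run live on very different scales (thousands of flier miles vs. fractions of a liter of ice cream), which is exactly why autoNorm's min-max scaling matters: without it, the miles column would dominate the Euclidean distance. A self-contained sketch, using an invented 3x3 sample matrix in place of the real dating data:

```python
import numpy as np

def autoNorm(dataSet):
    # per-column min/max -> scale every feature into [0, 1]
    minVals = dataSet.min(0)
    maxVals = dataSet.max(0)
    ranges = maxVals - minVals
    m = dataSet.shape[0]
    normDataSet = (dataSet - np.tile(minVals, (m, 1))) / np.tile(ranges, (m, 1))
    return normDataSet, ranges, minVals

# invented sample: flier miles, gaming %, liters of ice cream
data = np.array([[40000.0,  8.0, 0.5],
                 [14000.0,  3.5, 1.7],
                 [75000.0, 13.0, 0.9]])
normData, ranges, minVals = autoNorm(data)
print(normData.min(0))  # every column now starts at 0.0
print(normData.max(0))  # and ends at 1.0
```

Note that classifyPerson applies the same ranges and minVals to the user's input vector, so the query point is mapped into the same [0, 1] space as the training set before classify0 is called.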
Reference:
《机器学习实战》 (Machine Learning in Action, Peter Harrington)