(Theory and code combined) The KNN (nearest neighbor) algorithm ⭐

Posted by 土味儿大谢


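KNN:

  • A non-parametric, lazy learning method: there is no training phase, so all distance computation happens at prediction time, which makes prediction slow
  • With a large training set, the computational cost is high, because every prediction compares the query against every training sample
  • With imbalanced classes, prediction accuracy on rare classes is low
  • The model offers limited interpretability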

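KNN (core idea: like attracts like, so similar samples tend to share a class)

The k-nearest neighbor method (k-NN) can be used for both classification and regression.
k = 1: the nearest-neighbor rule (note: prone to overfitting, since a single noisy neighbor decides the prediction; usually best avoided)
k > 1: k-nearest neighbors
k must not be too large either, or the model tends to underfit.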
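The text notes that k-NN also handles regression and that k controls the fit. The following is a minimal sketch of k-NN regression, assuming the prediction is the plain mean of the k nearest targets. The function name `knn_regress` and the toy data are hypothetical and chosen only for illustration.

```python
import numpy as np

def knn_regress(query, X, y, k):
    """Predict a numeric target as the mean target of the k nearest training points."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Euclidean distance from the query to every training point
    dist = np.sqrt(np.sum((X - np.asarray(query, dtype=float)) ** 2, axis=1))
    # Indices of the k smallest distances
    nearest = np.argsort(dist)[:k]
    return float(y[nearest].mean())

# Hypothetical toy data: one feature, target equal to the feature
X_train = [[1.0], [2.0], [3.0], [10.0]]
y_train = [1.0, 2.0, 3.0, 10.0]

for k in (1, 2, 4):
    print(k, knn_regress([2.2], X_train, y_train, k))
```

With k = 1 the prediction simply copies the single closest target (2.0). With k = 4 every training point is a neighbor, so the prediction collapses to the global mean (4.0) regardless of the query, which is the underfitting behavior described above.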

1. Distance metrics

1.1 Euclidean distance (L2 distance)

1.2 Manhattan distance (L1 distance)

1.3 Cosine of the angle between vectors
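The three metrics listed above can be sketched with NumPy as follows. The function names are illustrative only. Each docstring states the standard definition, since the formulas themselves do not appear in this text.

```python
import numpy as np

def euclidean(a, b):
    """L2 distance: square root of the sum of squared coordinate differences."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.sqrt(np.sum((a - b) ** 2)))

def manhattan(a, b):
    """L1 distance: sum of absolute coordinate differences."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.sum(np.abs(a - b)))

def cosine_similarity(a, b):
    """Cosine of the angle between a and b: dot product divided by the product of norms."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(euclidean([0, 0], [3, 4]))          # 3-4-5 triangle
print(manhattan([0, 0], [3, 4]))
print(cosine_similarity([1, 2], [2, 4]))  # parallel vectors
```

Unlike the two distances, the cosine is a similarity: larger means closer. A common way to use it as a distance is to take 1 minus the similarity. It is undefined for a zero vector, which this sketch does not guard against.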

2. Worked example

[Figure: distance measurement used to decide which category a movie belongs to]

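3. Steps of the k-nearest neighbor algorithm

  1. Compute the distance between the current point and every point in the labeled dataset
  2. Sort the points by increasing distance
  3. Select the k points closest to the current point
  4. Count how often each class occurs among those k points
  5. Return the most frequent class among the k points as the predicted class of the current point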

4. Hand-written KNN implementations

4.1 Movie classification

4.1.1 With numpy

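import collections  # Counter, used for the majority vote

import numpy as np

# Training set: four samples with two features each
dataset = np.array([[1, 101], [5, 89], [108, 5], [115, 8]])
labels = ['romance', 'romance', 'action', 'action']

test_data = np.array([101, 20])  # sample whose class we want to predict

k = 3  # hyperparameter, chosen by the user

# Euclidean distance from the test sample to every training sample
distances = np.sum((test_data - dataset) ** 2, axis=1) ** 0.5
# argsort() returns the indices that sort the distances in ascending order;
# the first k indices select the labels of the k nearest samples
k_labels = [labels[index] for index in distances.argsort()[:k]]
# The most frequent label among the k neighbors is the predicted class
label = collections.Counter(k_labels).most_common(1)[0][0]
print(label)


# The same logic wrapped in a function
def classify(test, dataset, labels, k):
    """Return the majority label among the k training samples nearest to test.

    test: query sample; dataset: training features;
    labels: training labels; k: number of neighbors (hyperparameter).
    """
    dist = np.sum((test - dataset) ** 2, axis=1) ** 0.5  # Euclidean distances
    k_labels = [labels[index] for index in dist.argsort()[:k]]  # k nearest labels
    return collections.Counter(k_labels).most_common(1)[0][0]  # majority vote


# kNN classification
test_class = classify(test_data, dataset, labels, 3)
print(test_class)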
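To make the five algorithm steps visible on the movie data, the sketch below keeps each intermediate result in its own variable instead of chaining the calls. It uses the same four samples and the same query; the English label strings and the variable names are choices made here for illustration.

```python
import collections

import numpy as np

dataset = np.array([[1, 101], [5, 89], [108, 5], [115, 8]])
labels = ["romance", "romance", "action", "action"]
test_data = np.array([101, 20])
k = 3

# Step 1: distance from the query to every labeled point
distances = np.sqrt(np.sum((dataset - test_data) ** 2, axis=1))
# Step 2: indices of the points in order of increasing distance
order = np.argsort(distances)
# Step 3: keep the k nearest
nearest = order[:k]
# Step 4: count how often each class occurs among them
votes = collections.Counter(labels[i] for i in nearest)
# Step 5: the most frequent class is the prediction
prediction = votes.most_common(1)[0][0]

print(np.round(distances, 2))
print(order.tolist())
print(dict(votes))
print(prediction)
```

The squared distances are 16561, 13977, 274 and 340, so the two action movies are the closest points and the romance movie at (5, 89) is third. The vote is therefore 2 to 1 in favor of the action class.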

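4.1.2 With pandas (recommended)

import numpy as np
import pandas as pd

# Reuses dataset and labels from section 4.1.1
df = pd.DataFrame(dataset, columns=['fight_scenes', 'kiss_scenes'])
df['label'] = labels

test_data = [101, 20]  # sample whose class we want to predict

k = 3  # hyperparameter, chosen by the user

# Euclidean distance from the test sample to every row
df['distance'] = np.sum((df.iloc[:, :2].values - test_data) ** 2, axis=1) ** 0.5
# Sort by distance, keep the labels of the k nearest rows, count them,
# and take the most frequent label
print(df.sort_values(by='distance')['label'].head(k).value_counts().index[0])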

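4.2 Predicting match quality on a dating site

↓ Below is the content of datingTestSet.txt; copy it into a text editor and save it under that file name. Each row holds three numeric features and a class label, separated by tabs.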

40920	8.326976	0.953952	largeDoses
14488	7.153469	1.673904	smallDoses
26052	1.441871	0.805124	didntLike
75136	13.147394	0.428964	didntLike
38344	1.669788	0.134296	didntLike
72993	10.141740	1.032955	didntLike
35948	6.830792	1.213192	largeDoses
42666	13.276369	0.543880	largeDoses
67497	8.631577	0.749278	didntLike
35483	12.273169	1.508053	largeDoses
50242	3.723498	0.831917	didntLike
63275	8.385879	1.669485	didntLike
5569	4.875435	0.728658	smallDoses
51052	4.680098	0.625224	didntLike
77372	15.299570	0.331351	didntLike
43673	1.889461	0.191283	didntLike
61364	7.516754	1.269164	didntLike
69673	14.239195	0.261333	didntLike
15669	0.000000	1.250185	smallDoses
28488	10.528555	1.304844	largeDoses
6487	3.540265	0.822483	smallDoses
37708	2.991551	0.833920	didntLike
22620	5.297865	0.638306	smallDoses
28782	6.593803	0.187108	largeDoses
19739	2.816760	1.686209	smallDoses
36788	12.458258	0.649617	largeDoses
5741	0.000000	1.656418	smallDoses
28567	9.968648	0.731232	largeDoses
6808	1.364838	0.640103	smallDoses
41611	0.230453	1.151996	didntLike
36661	11.865402	0.882810	largeDoses
43605	0.120460	1.352013	didntLike
15360	8.545204	1.340429	largeDoses
63796	5.856649	0.160006	didntLike
10743	9.665618	0.778626	smallDoses
70808	9.778763	1.084103	didntLike
72011	4.932976	0.632026	didntLike
5914	2.216246	0.587095	smallDoses
14851	14.305636	0.632317	largeDoses
33553	12.591889	0.686581	largeDoses
44952	3.424649	1.004504	didntLike
17934	0.000000	0.147573	smallDoses
27738	8.533823	0.205324	largeDoses
29290	9.829528	0.238620	largeDoses
42330	11.492186	0.263499	largeDoses
36429	3.570968	0.832254	didntLike
39623	1.771228	0.207612	didntLike
32404	3.513921	0.991854	didntLike
27268	4.398172	0.975024	didntLike
5477	4.276823	1.174874	smallDoses
14254	5.946014	1.614244	smallDoses
68613	13.798970	0.724375	didntLike
41539	10.393591	1.663724	largeDoses
7917	3.007577	0.297302	smallDoses
21331	1.031938	0.486174	smallDoses
8338	4.751212	0.064693	smallDoses
5176	3.692269	1.655113	smallDoses
18983	10.448091	0.267652	largeDoses
68837	10.585786	0.329557	didntLike
13438	1.604501	0.069064	smallDoses
48849	3.679497	0.961466	didntLike
12285	3.795146	0.696694	smallDoses
7826	2.531885	1.659173	smallDoses
5565	9.733340	0.977746	smallDoses
10346	6.093067	1.413798	smallDoses
1823	7.712960	1.054927	smallDoses
9744	11.470364	0.760461	largeDoses
16857	2.886529	0.934416	smallDoses
39336	10.054373	1.138351	largeDoses
65230	9.972470	0.881876	didntLike
2463	2.335785	1.366145	smallDoses
27353	11.375155	1.528626	largeDoses
16191	0.000000	0.605619	smallDoses
12258	4.126787	0.357501	smallDoses
42377	6.319522	1.058602	didntLike
25607	8.680527	0.086955	largeDoses
77450	14.856391	1.129823	didntLike
58732	2.454285	0.222380	didntLike
46426	7.292202	0.548607	largeDoses
32688	8.745137	0.857348	largeDoses
64890	8.579001	0.683048	didntLike
8554	2.507302	0.869177	smallDoses
28861	11.415476	1.505466	largeDoses
42050	4.838540	1.680892	didntLike
32193	10.339507	0.583646	largeDoses
64895	6.573742	1.151433	didntLike
2355	6.539397	0.462065	smallDoses
0	2.209159	0.723567	smallDoses
70406	11.196378	0.836326	didntLike
57399	4.229595	0.128253	didntLike
41732	9.505944	0.005273	largeDoses
11429	8.652725	1.348934	largeDoses
75270	17.101108	0.490712	didntLike
5459	7.871839	0.717662	smallDoses
73520	8.262131	1.361646	didntLike
40279	9.015635	1.658555	largeDoses
21540	9.215351	0.806762	largeDoses
17694	6.375007	0.033678	smallDoses
22329	2.262014	1.022169	didntLike
46570	5.677110	0.709469	didntLike
42403	11.293017	0.207976	largeDoses
33654	6.590043	1.353117	didntLike
9171	4.711960	0.194167	smallDoses
28122	8.768099	1.108041	largeDoses
34095	11.502519	0.545097	largeDoses
1774	4.682812	0.578112	smallDoses
40131	12.446578	0.300754	largeDoses
13994	12.908384	1.657722	largeDoses
77064	12.601108	0.974527	didntLike
11210	3.929456	0.025466	smallDoses
6122	9.751503	1.182050	largeDoses
15341	3.043767	0.888168	smallDoses
44373	4.391522	0.807100	didntLike
28454	11.695276	0.679015	largeDoses
63771	7.879742	0.154263	didntLike
9217	5.613163	0.933632	smallDoses
69076	9.140172	0.851300	didntLike
24489	4.258644	0.206892	didntLike
16871	6.799831	1.221171	smallDoses
39776	8.752758	0.484418	largeDoses
5901	1.123033	1.180352	smallDoses
40987	10.833248	1.585426	largeDoses
7479	3.051618	0.026781	smallDoses
38768	5.308409	0.030683	largeDoses
4933	1.841792	0.028099	smallDoses
32311	2.261978	1.605603	didntLike
26501	11.573696	1.061347	largeDoses
37433	8.038764	1.083910	largeDoses
23503	10.734007	0.103715	largeDoses
68607	9.661909	0.350772	didntLike
27742	9.005850	0.548737	largeDoses
11303	0.000000	0.539131	smallDoses
0	5.757140	1.062373	smallDoses
32729	9.164656	1.624565	largeDoses
24619	1.318340	1.436243	didntLike
42414	14.075597	0.695934	largeDoses
20210	10.107550	1.308398	largeDoses
33225	7.960293	1.219760	largeDoses
54483	6.317292	0.018209	didntLike
18475	12.664194	0.595653	largeDoses
33926	2.906644	0.581657	didntLike
43865	2.388241	0.913938	didntLike
26547	6.024471	0.486215	largeDoses
44404	7.226764	1.255329	largeDoses
16674	4.183997	1.275290	smallDoses
8123	11.850211	1.096981	largeDoses
42747	11.661797	1.167935	largeDoses
56054	3.574967	0.494666	didntLike
10933	0.000000	0.107475	smallDoses
18121	7.937657	0.904799	largeDoses
11272	3.365027	1.014085	smallDoses
16297	0.000000	0.367491	smallDoses
28168	13.860672	1.293270	largeDoses
40963	10.306714	1.211594	largeDoses
31685	7.228002	0.670670	largeDoses
55164	4.508740	1.036192	didntLike
17595	0.366328	0.163652	smallDoses
1862	3.299444	0.575152	smallDoses
57087	0.573287	0.607915	didntLike
63082	9.183738	0.012280	didntLike
51213	7.842646	1.060636	largeDoses
6487	4.750964	0.558240	smallDoses
4805	11.438702	1.556334	largeDoses
30302	8.243063	1.122768	largeDoses
68680	7.949017	0.271865	didntLike
17591	7.875477	0.227085	smallDoses
74391	9.569087	0.364856	didntLike
37217	7.750103	0.869094	largeDoses
42814	0.000000	1.515293	didntLike
14738	3.396030	0.633977	smallDoses
19896	11.916091	0.025294	largeDoses
14673	0.460758	0.689586	smallDoses
32011	13.087566	0.476002	largeDoses
58736	4.589016	1.672600	didntLike
54744	8.397217	1.534103	didntLike
29482	5.562772	1.689388	didntLike
27698	10.905159	0.619091	largeDoses
11443	1.311441	1.169887	smallDoses
56117	10.647170	0.980141	largeDoses
39514	0.000000	0.481918	didntLike
26627	8.503025	0.830861	largeDoses
16525	0.436880	1.395314	smallDoses
24368	6.127867	1.102179	didntLike
22160	12.112492	0.359680	largeDoses
6030	1.264968	1.141582	smallDoses
6468	6.067568	1.327047	smallDoses
22945	8.010964	1.681648	largeDoses
18520	3.791084	0.304072	smallDoses
34914	11.773195	1.262621	largeDoses
6121	8.339588	1.443357	smallDoses
38063	2.563092	1.464013	didntLike
23410	5.954216	0.953782	didntLike
35073	9.288374	0.767318	largeDoses
52914	3.976796	1.043109	didntLike
16801	8.585227	1.455708	largeDoses
9533	1.271946	0.796506	smallDoses
16721	0.000000	0.242778	smallDoses
5832	0.000000	0.089749	smallDoses
44591	11.521298	0.300860	largeDoses
10143	1.139447	0.415373	smallDoses
21609	5.699090	1.391892	smallDoses
23817	2.449378	1.322560	didntLike
15640	0.000000	1.228380	smallDoses
8847	3.168365	0.053993	smallDoses
50939	10.428610	1.126257	largeDoses
28521	2.943070	1.446816	didntLike
32901	10.441348	0.975283	largeDoses
42850	12.478764	1.628726	largeDoses
13499	5.856902	0.363883	smallDoses
40345	2.476420	0.096075	didntLike
43547	1.826637	0.811457	didntLike
70758	4.324451	0.328235	didntLike
19780	1.376085	1.178359	smallDoses
44484	5.342462	0.394527	didntLike
54462	11.835521	0.693301	largeDoses
20085	12.423687	1.424264	largeDoses
42291	12.161273	0.071131	largeDoses
47550	8.148360	1.649194	largeDoses
11938	1.531067	1.549756	smallDoses
40699	3.200912	0.309679	didntLike
70908	8.862691	0.530506	didntLike
73989	6.370551	0.369350	didntLike
11872	2.468841	0.145060	smallDoses
48463	11.054212	0.141508	largeDoses
15987	2.037080	0.715243	smallDoses
70036	13.364030	0.549972	didntLike
32967	10.249135	0.192735	largeDoses
63249	10.464252	1.669767	didntLike
42795	9.424574	0.013725	largeDoses
14459	4.458902	0.268444	smallDoses
19973	0.000000	0.575976	smallDoses
5494	9.686082	1.029808	largeDoses
67902	13.649402	1.052618	didntLike
25621	13.181148	0.273014	largeDoses
27545	3.877472	0.401600	didntLike
58656	1.413952	0.451380	didntLike
7327	4.248986	1.430249	smallDoses
64555	8.779183	0.845947	didntLike
8998	4.156252	0.097109	smallDoses
11752	5.580018	0.158401	smallDoses
76319	15.040440	1.366898	didntLike
27665	12.793870	1.307323	largeDoses
67417	3.254877	0.669546	didntLike
21808	10.725607	0.588588	largeDoses
15326	8.256473	0.765891	smallDoses
20057	8.033892	1.618562	largeDoses
79341	10.702532	0.204792	didntLike
15636	5.062996	1.132555	smallDoses
35602	10.772286	0.668721	largeDoses
28544	1.892354	0.837028	didntLike
57663	1.019966	0.372320	didntLike
78727	15.546043	0.729742	didntLike
68255	11.638205	0.409125	didntLike
14964	3.427886	0.975616	smallDoses
21835	11.246174	1.475586	largeDoses
7487	0.000000	0.645045	smallDoses
8700	0.000000	1.424017	smallDoses
26226	8.242553	0.279069	largeDoses
65899	8.700060	0.101807	didntLike
6543	0.812344	0.260334	smallDoses
46556	2.448235	1.176829	didntLike
71038	13.230078	0.616147	didntLike
47657	0.236133	0.340840	didntLike
19600	11.155826	0.335131	largeDoses
37422	11.029636	0.505769	largeDoses
1363	2.901181	1.646633	smallDoses
26535	3.924594	1.143120	didntLike
47707	2.524806	1.292848	didntLike
38055	3.527474	1.449158	didntLike
6286	3.384281	0.889268	smallDoses
10747	0.000000	1.107592	smallDoses
44883	11.898890	0.406441	largeDoses
56823	3.529892	1.375844	didntLike
68086	11.442677	0.696919	didntLike
70242	10.308145	0.422722	didntLike
11409	8.540529	0.727373	smallDoses
67671	7.156949	1.691682	didntLike
61238	0.720675	0.847574	didntLike
17774	0.229405	1.038603	smallDoses
53376	3.399331	0.077501	didntLike
30930	6.157239	0.580133	didntLike
28987	1.239698	0.719989	didntLike
13655	6.036854	0.016548	smallDoses
7227	5.258665	0.933722	smallDoses
40409	12.393001	1.571281	largeDoses
13605	9.627613	0.935842	smallDoses
26400	11.130453	0.597610	largeDoses
13491	8.842595	0.349768	largeDoses
30232	10.690010	1.456595	largeDoses
43253	5.714718	1.674780	largeDoses
55536	3.052505	1.335804	didntLike
8807	0.000000	0.059025	smallDoses
25783	9.945307	1.287952	largeDoses
22812	2.719723	1.142148	didntLike
77826	11.154055	1.608486	didntLike
38172	2.687918	0.660836	didntLike
31676	10.037847	0.962245	largeDoses
74038	12.404762	1.112080	didntLike
44738	10.237305	0.633422	largeDoses
17410	4.745392	0.662520	smallDoses
5688	4.639461	1.569431	smallDoses
36642	3.149310	0.639669	didntLike
29956	13.406875	1.639194	largeDoses
60350	6.068668	0.881241	didntLike
23758	9.477022	0.899002	largeDoses
25780	3.897620	0.560201	smallDoses
11342	5.463615	1.203677	smallDoses
36109	3.369267	1.575043	didntLike
14292	5.234562	0.825954	smallDoses
11160	0.000000	0.722170	smallDoses
23762	12.979069	0.504068	largeDoses
39567	5.376564	0.557476	didntLike
25647	13.527910	1.586732	largeDoses
14814	2.196889	0.784587	smallDoses
73590	10.691748	0.007509	didntLike
35187	1.659242	0.447066	didntLike
49459	8.369667	0.656697	largeDoses
31657	13.157197	0.143248	largeDoses
6259	8.199667	0.908508	smallDoses
33101	4.441669	0.439381	largeDoses
27107	9.846492	0.644523	largeDoses
17824	0.019540	0.977949	smallDoses
43536	8.253774	0.748700	largeDoses
67705	6.038620	1.509646	didntLike
35283	6.091587	1.694641	largeDoses
71308	8.986820	1.225165	didntLike
31054	11.508473	1.624296	largeDoses
52387	8.807734	0.713922	largeDoses
40328	0.000000	0.816676	didntLike
34844	8.889202	1.665414	largeDoses
11607	3.178117	0.542752	smallDoses
64306	7.013795	0.139909	didntLike
32721	9.605014	0.065254	largeDoses
33170	1.230540	1.331674	didntLike
37192	10.412811	0.890803	largeDoses
13089	0.000000	0.567161	smallDoses
66491	9.699991	0.122011	didntLike
15941	0.000000	0.061191	smallDoses
4272	4.455293	0.272135	smallDoses
48812	3.020977	1.502803	didntLike
28818	8.099278	0.216317	largeDoses
35394	1.157764	1.603217	didntLike
71791	10.105396	0.121067	didntLike
40668	11.230148	0.408603	largeDoses
39580	9.070058	0.011379	largeDoses
11786	0.566460	0.478837	smallDoses
19251	0.000000	0.487300	smallDoses
56594	8.956369	1.193484	largeDoses
54495	1.523057	0.620528	didntLike
11844	2.749006
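The listing breaks off here and the author's code for this dataset is not part of this text, so the following is only a sketch of one plausible preprocessing step, not the original solution. In the rows above the first column runs into the tens of thousands while the other two stay below about 20, so an unscaled Euclidean distance would be dominated by the first column. Min-max scaling each column to [0, 1] is one common remedy. The names `load_dating` and `min_max_scale` are hypothetical, and the sample string reuses the first four rows of the listing plus the incomplete last row.

```python
import io

import numpy as np

# First rows of the listing above, plus the incomplete final row
SAMPLE = (
    "40920\t8.326976\t0.953952\tlargeDoses\n"
    "14488\t7.153469\t1.673904\tsmallDoses\n"
    "26052\t1.441871\t0.805124\tdidntLike\n"
    "75136\t13.147394\t0.428964\tdidntLike\n"
    "11844\t2.749006\n"
)

def load_dating(stream):
    """Parse tab-separated rows of three floats and a label; skip incomplete rows."""
    features, labels = [], []
    for line in stream:
        parts = line.strip().split("\t")
        if len(parts) != 4:
            continue  # blank or truncated row
        features.append([float(value) for value in parts[:3]])
        labels.append(parts[3])
    return np.array(features), labels

def min_max_scale(X):
    """Rescale every column to the range [0, 1]."""
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    span[span == 0] = 1.0  # avoid division by zero for constant columns
    return (X - lo) / span, lo, span

X, y = load_dating(io.StringIO(SAMPLE))
X_scaled, lo, span = min_max_scale(X)
print(X.shape, y)
print(X_scaled)
```

To read the saved file instead of the sample string, pass an open file object, for example `load_dating(open("datingTestSet.txt", encoding="utf-8"))`. A query sample has to be scaled with the same `lo` and `span` before its distances to the training rows are computed.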
