How the size of n_neighbors affects the prediction accuracy and generalization of the k-nearest neighbors classifier

Posted 2020-12-03 yszd

Code:

# -*- coding: utf-8 -*-
"""
Created on Thu Jul 12 09:36:49 2018

@author: zhen

Analyze how the size of n_neighbors affects the prediction accuracy and
generalization ability of the k-nearest neighbors algorithm.
"""
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
import matplotlib.pyplot as plt

cancer = load_breast_cancer()

# Stratified split so both sets keep the class proportions of the full data
x_train, x_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, stratify=cancer.target, random_state=66)

training_accuracy = []
test_accuracy = []

# Try n_neighbors values from 1 to 10
neighbors_settings = range(1, 11)

for n_neighbors in neighbors_settings:
    # Build and fit the model
    clf = KNeighborsClassifier(n_neighbors=n_neighbors)
    clf.fit(x_train, y_train)
    # Record accuracy on the training set
    training_accuracy.append(clf.score(x_train, y_train))
    # Record accuracy on the test set (generalization)
    test_accuracy.append(clf.score(x_test, y_test))

plt.plot(neighbors_settings, training_accuracy, label="training accuracy")
plt.plot(neighbors_settings, test_accuracy, label="test accuracy")

plt.xlabel("n_neighbors")
plt.ylabel("Accuracy")

plt.legend()
# Display the figure when run as a script (not needed in an inline notebook)
plt.show()

Result:

[Figure: training accuracy and test accuracy plotted against n_neighbors]

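Summary: With a single nearest neighbor, predictions on the training set are perfect (accuracy at or near 100%, since each training point is its own nearest neighbor). As the number of neighbors grows, the model becomes simpler and training accuracy drops. A simpler model usually generalizes better up to a point, though too many neighbors can oversimplify it. For a good balance between prediction accuracy and generalization, the best performance is at around n_neighbors = 6.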
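The choice of n_neighbors can also be read off numerically instead of from the plot. The following is a minimal sketch, not the author's code: it repeats the same split and loop, and the helper name `accuracy_by_k` and the tie-breaking rule (smallest k wins) are assumptions made for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier


def accuracy_by_k(k_values, random_state=66):
    """Hypothetical helper: return (train, test) accuracy lists, one entry per k."""
    cancer = load_breast_cancer()
    x_train, x_test, y_train, y_test = train_test_split(
        cancer.data, cancer.target, stratify=cancer.target,
        random_state=random_state)
    train_acc, test_acc = [], []
    for k in k_values:
        clf = KNeighborsClassifier(n_neighbors=k).fit(x_train, y_train)
        train_acc.append(clf.score(x_train, y_train))
        test_acc.append(clf.score(x_test, y_test))
    return train_acc, test_acc


k_values = list(range(1, 11))
train_acc, test_acc = accuracy_by_k(k_values)

# Position of the highest test accuracy; max() keeps the first (smallest) k on ties
best_index = max(range(len(k_values)), key=lambda i: test_acc[i])
best_k = k_values[best_index]

for k, tr, te in zip(k_values, train_acc, test_acc):
    print(f"n_neighbors={k:2d}  train={tr:.3f}  test={te:.3f}")
print("best n_neighbors by test accuracy:", best_k)
```

Picking the value by test accuracy mirrors what the article does. For a less optimistic estimate, the same comparison is commonly made on a separate validation split or with cross-validation, so that the test set stays untouched.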
