Issues with plotting the decision boundaries for the Iris Dataset with KNearestNeighbors

Posted 2023-03-12


【Posted】: 2020-09-25 01:54:48 【Question】:

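I am trying to plot the decision boundaries of Scikit-learn's KNeighborsClassifier for the Iris dataset. However, the plot I get does not make much sense to me.

I would expect the boundary between the dark blue and light blue regions to run in the direction of the green line I drew on the picture.

The code I used to generate it can be found below. It was inspired by Plot the decision boundaries of a VotingClassifier.

What am I missing or not understanding?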

# -*- coding: utf-8 -*-
"""
Created on Sat May 30 14:22:05 2020

@author: KamKam

Plotting the decision boundaries for KNearestNeighbours.
"""
# Import required modules.
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.neighbors import KNeighborsClassifier
import numpy as np
from matplotlib.colors import ListedColormap

n_neighbors = [1, 3, 9]

# Load the iris dataset.
iris = datasets.load_iris()
X = iris.data[:, 2:4] # Slice features to only contain petal length and petal width.
y = iris.target


# Set up the data such that it can be inserted into one plot.
# Count the number of each target that are in the dataset.
ylen = y.shape[0]
unique, counts = np.unique(y, return_counts=True)

# Create empty arrays for each of the targets. We only require them to have 2
# features because we are only plotting in 2D.
X0 = np.zeros((counts[0], 2))
X1 = np.zeros((counts[1], 2))
X2 = np.zeros((counts[2], 2))

countX0, countX1, countX2 = 0, 0, 0 # Initialize counters for iterating
# through and adding data to the X arrays.
# Insert data into the newly created arrays.
for i in range(ylen):
    if y[i] == 0:
        X0[countX0, :] = X[i, :]
        countX0 += 1
    elif y[i] == 1:
        X1[countX1, :] = X[i, :]
        countX1 += 1
    else:
        X2[countX2, :] = X[i, :]
        countX2 += 1

h = 0.02 # Step size of the mesh.
plotCount = 0 # Counter for each of the plots that we will be creating.

# Create colour maps.
cmap_light = ListedColormap(['orange', 'cyan', 'cornflowerblue'])
cmap_bold = ListedColormap(['darkorange', 'c', 'darkblue'])

# Initialize plotting. Close all the currently open plots, initialize the 
# figure and subplot commands
plt.close('all')
fig, axs = plt.subplots(1, 3)
axs = axs.ravel()

for j in n_neighbors:
    # Create the instance of the Neighbours classifier and fit the data.
    knn = KNeighborsClassifier(n_neighbors=j)
    knn.fit(X, y)

    # Plot the decision boundary. For that, we will assign a color for each
    # point in the mesh [x_min, x_max]x[y_min, y_max]
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h), 
                         np.arange(y_min, y_max, h))
    Z = knn.predict(np.c_[xx.ravel(), yy.ravel()])

    # Put the result into a color plot
    Z = Z.reshape(xx.shape)
    axs[plotCount].pcolormesh(xx, yy, Z, cmap=cmap_bold)

    # Plot the training points.
    axs[plotCount].scatter(X0[:,0], X0[:,1], c='k', marker='o', 
                           label=iris.target_names[0])
    axs[plotCount].scatter(X1[:,0], X1[:,1], c='r', marker='o', 
                           label=iris.target_names[1])
    axs[plotCount].scatter(X1[:,0], X2[:,1], c='y', marker='o', 
                           label=iris.target_names[2])
    axs[plotCount].set_xlabel('Petal Width')
    axs[plotCount].set_ylabel('Petal Length')
    axs[plotCount].legend()
    axs[plotCount].set_title('n_neighbours = ' + str(j))
    plotCount += 1

fig.suptitle('Petal Width vs Length')
plt.show()


【Answer 1】:

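Introducing the arrays X0, X1 and X2 seems to overcomplicate things, and it makes the code hard to write in a Pythonic way. It also hides a slip in the third scatter call, which plots X1[:, 0] against X2[:, 1], so the points of the third class are drawn in the wrong place.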

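Some things that are best avoided in Python:

The extra variable `plotCount` is only used to step through the axes; it can be dropped and replaced with `for j, ax in zip(n_neighbors, axs)`.

The contents of `X0`, `X1` and `X2` can be obtained directly via `X[:, 0][y == y_val], X[:, 1][y == y_val]`, which also makes it easy to write the scatter plots in a single loop. You can read more about numpy's advanced indexing in this doc.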

import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.neighbors import KNeighborsClassifier
import numpy as np
from matplotlib.colors import ListedColormap

n_neighbors = [1, 3, 9]

# Load the iris dataset.
iris = datasets.load_iris()
X = iris.data[:, 2:4]  # Slice features to only contain petal length and petal width.
y = iris.target

# Set up the data such that it can be inserted into one plot.
# Count the number of each target that are in the dataset.
ylen = y.shape[0]
unique, counts = np.unique(y, return_counts=True)

h = 0.02  # Step size of the mesh.

# Create colour maps.
#cmap_light = ListedColormap(['orange', 'cyan', 'cornflowerblue'])
cmap_bold = ListedColormap(['darkorange', 'c', 'darkblue'])

# Initialize plotting. Close all the currently open plots, initialize the
# figure and subplot commands
plt.close('all')
fig, axs = plt.subplots(1, 3)
axs = axs.ravel()

for j, ax in zip(n_neighbors, axs):
    # Create the instance of the Neighbours classifier and fit the data.
    knn = KNeighborsClassifier(n_neighbors=j)
    knn.fit(X, y)

    # Plot the decision boundary. For that, we will assign a color for each
    # point in the mesh [x_min, x_max]x[y_min, y_max]
    x_min, x_max = X[:, 0].min() - h, X[:, 0].max() + h
    y_min, y_max = X[:, 1].min() - h, X[:, 1].max() + h
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                         np.arange(y_min, y_max, h))
    Z = knn.predict(np.c_[xx.ravel(), yy.ravel()])

    # Put the result into a color plot
    Z = Z.reshape(xx.shape)
    ax.pcolormesh(xx, yy, Z, cmap=cmap_bold)

    # Plot the training points.
    for y_val, (color, name) in enumerate(zip(['k', 'r', 'y'], iris.target_names)):
        ax.scatter(X[:, 0][y == y_val], X[:, 1][y == y_val], c=color, marker='o', label=name)

    # X[:, 0] is petal length and X[:, 1] is petal width (iris columns 2 and 3).
    ax.set_xlabel('Petal Length')
    ax.set_ylabel('Petal Width')
    ax.legend()
    ax.set_title(f'n_neighbours = {j}')

fig.suptitle('Petal Width vs Length')
plt.show()
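
As a small sanity check of the boolean-mask indexing used in the scatter loop, here is a minimal sketch. It uses a tiny made-up array in place of the iris data, and the helper names `split_with_loop` and `split_with_mask` are hypothetical, chosen only for this illustration. It shows that `X[y == label]` selects the same rows as the counter-based loop in the question, and what happens when the x column of one class is paired with the y column of another, as in the question's third scatter call.

```python
import numpy as np

# Toy stand-in for the two petal columns and the targets.
# These values are made up for illustration; they are not loaded from sklearn.
X = np.array([[1.4, 0.2], [1.3, 0.2],
              [4.7, 1.4], [4.5, 1.5],
              [6.0, 2.5], [5.1, 1.9]])
y = np.array([0, 0, 1, 1, 2, 2])


def split_with_loop(X, y, label):
    """Collect the rows of X whose target equals label, one row at a time."""
    rows = [X[i] for i in range(len(y)) if y[i] == label]
    return np.array(rows)


def split_with_mask(X, y, label):
    """Select the same rows in one step with a boolean mask."""
    return X[y == label]


# Both approaches select identical rows for every class.
for label in np.unique(y):
    assert np.array_equal(split_with_loop(X, y, label),
                          split_with_mask(X, y, label))

X1 = split_with_mask(X, y, 1)
X2 = split_with_mask(X, y, 2)

# Pairing class 1's first column with class 2's second column gives
# points that do not belong to either class.
mixed = np.column_stack([X1[:, 0], X2[:, 1]])
print("class 2 rows:\n", X2)
print("mixed rows:\n", mixed)
```

With the mask there are no per-class arrays to keep in sync, so there is no opportunity to pair columns from two different classes by accident.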

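【Comments】:

Thanks @JohanC. It looks like I ended up plotting the decision boundaries for the wrong data. I will read the doc you shared. It is great how you made the code so concise with such small changes. All the best!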
