KeyError when performing naive Bayes and decision tree classification

Posted: 2021-10-31 15:20:01

Question:

I want to classify the iris dataset using naive Bayes and decision trees. I get a KeyError that I don't understand and can't resolve.

from sklearn import datasets, naive_bayes, tree, metrics
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
import warnings
import random

# Get raw data and labels from the iris dataset
labelled_data = list(zip(iris_df, iris.target))

def sample_data(training_frac=0.5, iris_data=iris_df, iris_labels=iris.target):
    # separate data into training and testing sets
    training_size = int(training_frac * len(iris_data))
    
    training_idx = random.sample(range(0, len(iris_data)), k=training_size)
    testing_idx = [idx for idx in range(0, len(iris_data)) if idx not in training_idx]
    
    assert(len(training_idx) + len(testing_idx) == len(iris_data))
    
    training_set = [iris_data[idx] for idx in training_idx]
    training_labels = [iris_labels[idx] for idx in training_idx]

    testing_set = [iris_data[idx] for idx in testing_idx]
    testing_labels = [iris_labels[idx] for idx in testing_idx]
    
    return (training_set, training_labels), (testing_set, testing_labels)
# run the designated classifier
def run_classifier(classifier, training, testing):
    classifier.fit(*training)

    expect = testing[1]
    predict = classifier.predict(testing[0])
    
    return expect, predict

# collect data on training size plateau
def simulate():
    # progress through range of testing data sizes
    nb_acc = []
    tree_acc = []
    training_fracs = [x/1000 for x in range(500, 850, 25)]

    for i in training_fracs:
        nb = naive_bayes.CategoricalNB()
        dt = tree.DecisionTreeClassifier()
        training, testing = sample_data(i)
        
        nb_expect, nb_predict = run_classifier(nb, training, testing)
        dt_expect, dt_predict = run_classifier(dt, training, testing)
        
        nb_acc.append(metrics.accuracy_score(nb_expect, nb_predict))
        tree_acc.append(metrics.accuracy_score(dt_expect, dt_predict))
        
    return nb_acc, tree_acc, training_fracs
        
nb_acc, tree_acc, fracs = simulate()
    
print(f"Naive Bayes accuracy @ 50% training: {nb_acc[0]}")
print(f"Decision Tree accuracy @ 50% training: {tree_acc[0]}")

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   3079             try:
-> 3080                 return self._engine.get_loc(casted_key)
   3081             except KeyError as err:

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 18

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
in <module>
     28     return nb_acc, tree_acc, training_fracs
     29 
---> 30 nb_acc, tree_acc, fracs = simulate()
     31 
     32 print(f"Naive Bayes accuracy @ 50% training: {nb_acc[0]}")

in simulate()
     18         nb = naive_bayes.CategoricalNB()
     19         dt = tree.DecisionTreeClassifier()
---> 20         training, testing = sample_data(i)
     21 
     22         nb_expect, nb_predict = run_classifier(nb, training, testing)

in sample_data(training_frac, iris_data, iris_labels)
     11     assert(len(training_idx) + len(testing_idx) == len(iris_data))
     12 
---> 13     training_set = [iris_data[idx] for idx in training_idx]
     14     training_labels = [iris_labels[idx] for idx in training_idx]
     15 

in <listcomp>(.0)
     11     assert(len(training_idx) + len(testing_idx) == len(iris_data))
     12 
---> 13     training_set = [iris_data[idx] for idx in training_idx]
     14     training_labels = [iris_labels[idx] for idx in training_idx]
     15 

~\anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
   3022             if self.columns.nlevels > 1:
   3023                 return self._getitem_multilevel(key)
-> 3024             indexer = self.columns.get_loc(key)
   3025             if is_integer(indexer):
   3026                 indexer = [indexer]

~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   3080                 return self._engine.get_loc(casted_key)
   3081             except KeyError as err:
-> 3082                 raise KeyError(key) from err
   3083 
   3084         if tolerance is not None:

KeyError: 18

Comments:

Where do you import the iris dataset? It isn't shown here; perhaps you left out the part that initializes iris and iris_df.

Answer 1:

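The KeyError is raised inside sample_data. The traceback shows iris_data[idx] going through DataFrame.__getitem__, which looks up idx as a column label rather than as a row position, so the integer 18 is not found. The assertion on the line above it passes; it is the list comprehension that fails. For the train/test split, just use: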

from sklearn.model_selection import train_test_split

# X is the feature matrix and y the labels (here: iris_df and iris.target)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42)
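To make the cause and the fix concrete, here is a minimal sketch. It rests on assumptions: the question never shows how iris and iris_df are created, so the setup below (load_iris plus a DataFrame built from iris.data) is a guess at it. GaussianNB is also swapped in for the question's CategoricalNB, since CategoricalNB expects discrete category codes while the iris features are continuous; that substitution is mine, not part of the original code.

```python
import pandas as pd
from sklearn import datasets, naive_bayes, tree, metrics
from sklearn.model_selection import train_test_split

# Assumed setup (not shown in the question).
iris = datasets.load_iris()
iris_df = pd.DataFrame(iris.data, columns=iris.feature_names)

# df[18] asks for a COLUMN labelled 18; the columns are feature names,
# so this raises the KeyError seen in the traceback.
try:
    iris_df[18]
    raised_key_error = False
except KeyError:
    raised_key_error = True

# Row access by position goes through .iloc instead.
row_18 = iris_df.iloc[18]

# Replace the hand-written sampling with train_test_split (50% training).
X_train, X_test, y_train, y_test = train_test_split(
    iris_df, iris.target, train_size=0.5, random_state=42,
    stratify=iris.target)

# Fit both classifiers on the same split and record their accuracy.
classifiers = {
    "naive_bayes": naive_bayes.GaussianNB(),  # swapped in for CategoricalNB
    "decision_tree": tree.DecisionTreeClassifier(random_state=0),
}
accuracies = {}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    accuracies[name] = metrics.accuracy_score(y_test, clf.predict(X_test))

print(accuracies)
```

If you would rather keep the original sample_data function, changing iris_data[idx] to iris_data.iloc[idx] in both list comprehensions should remove the KeyError, again assuming iris_df is a pandas DataFrame.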
