KeyError when performing naive Bayes and decision tree classification
【Posted】2021-10-31 15:20:01
【Question】I want to classify the iris dataset using naive Bayes and decision trees. I get a KeyError that I don't understand and can't resolve.
```python
from sklearn import datasets, naive_bayes, tree, metrics
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
import warnings
import random

# Get raw data and labels from the iris dataset
labelled_data = list(zip(iris_df, iris.target))

def sample_data(training_frac=0.5, iris_data=iris_df, iris_labels=iris.target):
    # separate data into training and testing sets
    training_size = int(training_frac * len(iris_data))
    training_idx = random.sample(range(0, len(iris_data)), k=training_size)
    testing_idx = [idx for idx in range(0, len(iris_data)) if idx not in training_idx]
    assert(len(training_idx) + len(testing_idx) == len(iris_data))

    training_set = [iris_data[idx] for idx in training_idx]
    training_labels = [iris_labels[idx] for idx in training_idx]
    testing_set = [iris_data[idx] for idx in testing_idx]
    testing_labels = [iris_labels[idx] for idx in testing_idx]
    return (training_set, training_labels), (testing_set, testing_labels)

# run the designated classifier
def run_classifier(classifier, training, testing):
    classifier.fit(*training)
    expect = testing[1]
    predict = classifier.predict(testing[0])
    return expect, predict

# collect data on training size plateau
def simulate():
    # progress through a range of training data sizes
    nb_acc = []
    tree_acc = []
    training_fracs = [x/1000 for x in range(500, 850, 25)]
    for i in training_fracs:
        nb = naive_bayes.CategoricalNB()
        dt = tree.DecisionTreeClassifier()
        training, testing = sample_data(i)

        nb_expect, nb_predict = run_classifier(nb, training, testing)
        dt_expect, dt_predict = run_classifier(dt, training, testing)
        nb_acc.append(metrics.accuracy_score(nb_expect, nb_predict))
        tree_acc.append(metrics.accuracy_score(dt_expect, dt_predict))
    return nb_acc, tree_acc, training_fracs

nb_acc, tree_acc, fracs = simulate()

print(f"Naive Bayes accuracy @ 50% training: {nb_acc[0]}")
print(f"Decision Tree accuracy @ 50% training: {tree_acc[0]}")
```
```
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   3079             try:
-> 3080                 return self._engine.get_loc(casted_key)
   3081             except KeyError as err:

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 18

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
 in
     28     return nb_acc, tree_acc, training_fracs
     29 
---> 30 nb_acc, tree_acc, fracs = simulate()
     31 
     32 print(f"Naive Bayes accuracy @ 50% training: {nb_acc[0]}")

 in simulate()
     18         nb = naive_bayes.CategoricalNB()
     19         dt = tree.DecisionTreeClassifier()
---> 20         training, testing = sample_data(i)
     21 
     22         nb_expect, nb_predict = run_classifier(nb, training, testing)

 in sample_data(training_frac, iris_data, iris_labels)
     11     assert(len(training_idx) + len(testing_idx) == len(iris_data))
     12 
---> 13     training_set = [iris_data[idx] for idx in training_idx]
     14     training_labels = [iris_labels[idx] for idx in training_idx]
     15 

 in (.0)
     11     assert(len(training_idx) + len(testing_idx) == len(iris_data))
     12 
---> 13     training_set = [iris_data[idx] for idx in training_idx]
     14     training_labels = [iris_labels[idx] for idx in training_idx]
     15 

~\anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
   3022             if self.columns.nlevels > 1:
   3023                 return self._getitem_multilevel(key)
-> 3024             indexer = self.columns.get_loc(key)
   3025             if is_integer(indexer):
   3026                 indexer = [indexer]

~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   3080                 return self._engine.get_loc(casted_key)
   3081             except KeyError as err:
-> 3082                 raise KeyError(key) from err
   3083 
   3084         if tolerance is not None:

KeyError: 18
```
【Comments on the question】
Where do you load the iris dataset? It isn't in the code here; perhaps you left out the part that initializes `iris` and `iris_df`.
【Answer 1】
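The KeyError is raised in `sample_data`, on the line `training_set = [iris_data[idx] for idx in training_idx]`; the assertion just before it passes. `iris_data` is a DataFrame, and `iris_data[idx]` looks up a column label rather than a row, so a row number such as 18 is not found. Selecting rows by position with `iris_data.iloc[...]` avoids this.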
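A minimal sketch of `sample_data` with positional row selection. It assumes `iris_df` was built as `pd.DataFrame(iris.data, columns=iris.feature_names)`, which the question does not show. The logic changes are using `.iloc` for rows, indexing the NumPy label array directly, and using a set for the membership test. The first few lines reproduce the original error for comparison.

```python
import random

import pandas as pd
from sklearn import datasets

# Assumption: the question does not show how iris_df is built; this is one common way.
iris = datasets.load_iris()
iris_df = pd.DataFrame(iris.data, columns=iris.feature_names)

# Reproduce the error: [] on a DataFrame looks up a column label, and there is no column 18.
try:
    iris_df[18]
except KeyError as err:
    print("KeyError:", err)


def sample_data(training_frac=0.5, iris_data=iris_df, iris_labels=iris.target):
    """Randomly split rows into training and testing sets by integer position."""
    n = len(iris_data)
    training_size = int(training_frac * n)
    training_idx = random.sample(range(n), k=training_size)
    chosen = set(training_idx)  # O(1) membership test instead of scanning a list
    testing_idx = [idx for idx in range(n) if idx not in chosen]
    assert len(training_idx) + len(testing_idx) == n

    # .iloc selects rows by position; .to_numpy() gives plain arrays for scikit-learn
    training_set = iris_data.iloc[training_idx].to_numpy()
    testing_set = iris_data.iloc[testing_idx].to_numpy()
    # iris.target is a NumPy array, so a list of positions indexes it directly
    training_labels = iris_labels[training_idx]
    testing_labels = iris_labels[testing_idx]
    return (training_set, training_labels), (testing_set, testing_labels)
```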
For the train/test split itself, just use:

```python
from sklearn.model_selection import train_test_split

# X: feature matrix, y: labels
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42)
```
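The `train_test_split` suggestion can be carried through the question's whole `simulate` loop. This is a sketch under a few assumptions that go beyond the original post: it loads the data with `datasets.load_iris(return_X_y=True)` instead of the unseen `iris_df`; it swaps in `GaussianNB` for `CategoricalNB`, because `CategoricalNB` expects each feature to be encoded as non-negative integer category codes while the iris features are continuous measurements; and it adds `stratify=y` and a fixed `random_state` so that class proportions are kept and runs are repeatable.

```python
from sklearn import datasets, metrics, naive_bayes, tree
from sklearn.model_selection import train_test_split


def simulate(seed=42):
    """Accuracy of naive Bayes and a decision tree as the training fraction grows."""
    X, y = datasets.load_iris(return_X_y=True)
    training_fracs = [x / 1000 for x in range(500, 850, 25)]
    nb_acc, tree_acc = [], []
    for frac in training_fracs:
        # train_test_split does the row selection, so no manual indexing is needed
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, train_size=frac, random_state=seed, stratify=y)
        # GaussianNB is swapped in for CategoricalNB because the features are continuous
        nb = naive_bayes.GaussianNB().fit(X_train, y_train)
        dt = tree.DecisionTreeClassifier(random_state=seed).fit(X_train, y_train)
        nb_acc.append(metrics.accuracy_score(y_test, nb.predict(X_test)))
        tree_acc.append(metrics.accuracy_score(y_test, dt.predict(X_test)))
    return nb_acc, tree_acc, training_fracs


nb_acc, tree_acc, fracs = simulate()
print(f"Naive Bayes accuracy @ 50% training: {nb_acc[0]:.3f}")
print(f"Decision Tree accuracy @ 50% training: {tree_acc[0]:.3f}")
```

Returning the fractions alongside both accuracy lists keeps the same shape as the question's `simulate`, so any plotting code written against it should still apply.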