使用 scikit wine 数据集制作 sns.pairplot
Posted
技术标签:
【中文标题】使用 scikit wine 数据集制作 sns.pairplot【英文标题】:Making a sns.pairplot using scikit wine dataset 【发布时间】:2021-11-06 12:49:44 【问题描述】:这看起来很简单,但是我在网上找不到解决方案。
我正在尝试在 Python 中创建一个 sns.pairplot。我已经下载了 wine 数据集,保留了我需要的功能,并运行了绘图。
%matplotlib inline
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import load_wine
# Load the wine dataset
wine = datasets.load_wine()
wine = list(zip(wine.data, wine.target))
wine = load_wine()
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
wine = load_wine
data = load_wine()
df = pd.DataFrame(data.data, columns=data.feature_names)
#This is the code that should run the plot
b=sns.pairplot(df, vars = df.columns[1 :], hue = "target", height = 2.5)
但我收到此错误:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
2894 try:
-> 2895 return self._engine.get_loc(casted_key)
2896 except KeyError as err:
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'target'
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)
<ipython-input-108-1107acc27949> in <module>
----> 1 b=sns.pairplot(df, vars = df.columns[1 :], hue = "target", height = 2.5)
2
3 plt.show()
~\anaconda3\lib\site-packages\seaborn\_decorators.py in inner_f(*args, **kwargs)
44 )
45 kwargs.update(k: arg for k, arg in zip(sig.parameters, args))
---> 46 return f(**kwargs)
47 return inner_f
48
~\anaconda3\lib\site-packages\seaborn\axisgrid.py in pairplot(data, hue, hue_order, palette, vars, x_vars, y_vars, kind, diag_kind, markers, height, aspect, corner, dropna, plot_kws, diag_kws, grid_kws, size)
1923 # Set up the PairGrid
1924 grid_kws.setdefault("diag_sharey", diag_kind == "hist")
-> 1925 grid = PairGrid(data, vars=vars, x_vars=x_vars, y_vars=y_vars, hue=hue,
1926 hue_order=hue_order, palette=palette, corner=corner,
1927 height=height, aspect=aspect, dropna=dropna, **grid_kws)
~\anaconda3\lib\site-packages\seaborn\_decorators.py in inner_f(*args, **kwargs)
44 )
45 kwargs.update(k: arg for k, arg in zip(sig.parameters, args))
---> 46 return f(**kwargs)
47 return inner_f
48
~\anaconda3\lib\site-packages\seaborn\axisgrid.py in __init__(self, data, hue, hue_order, palette, hue_kws, vars, x_vars, y_vars, corner, diag_sharey, height, aspect, layout_pad, despine, dropna, size)
1212 index=data.index)
1213 else:
-> 1214 hue_names = categorical_order(data[hue], hue_order)
1215 if dropna:
1216 # Filter NA from the list of unique hue names
~\anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
2900 if self.columns.nlevels > 1:
2901 return self._getitem_multilevel(key)
-> 2902 indexer = self.columns.get_loc(key)
2903 if is_integer(indexer):
2904 indexer = [indexer]
~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
2895 return self._engine.get_loc(casted_key)
2896 except KeyError as err:
-> 2897 raise KeyError(key) from err
2898
2899 if tolerance is not None:
KeyError: 'target'
链接到这个问题的解决方案:How to convert a Scikit-learn dataset to a Pandas dataset 不幸的是在这里似乎不起作用。
我还尝试了“类”而不是目标。会不会是上面的'zip'功能不正常,导致程序无法识别'target'?
提前谢谢你!
【问题讨论】:
【参考方案1】:根据您输入的内容,它的工作原理是这样的。
from sklearn.datasets import load_iris
wine = load_wine
data = load_wine()
df = pd.DataFrame(data.data, columns=data.feature_names)
#This is the code that should run the plot
b=sns.pairplot(df, vars = df.columns[1 :], height = 2.5)
问题是您想如何突出功能,为什么? 您从列表中删除酒精,因此目标根本不会对齐。 第二件事是它是功能明智的配对图而不是目标/类。 所以总而言之,我不明白你在这里要做什么
【讨论】:
以上是关于使用 scikit wine 数据集制作 sns.pairplot的主要内容,如果未能解决你的问题,请参考以下文章
scikit-learn:FeatureUnion 包含手工制作的功能
PCA详解-并用scikit-learn实现PCA压缩红酒数据集
get_feature_names() 不适用于在 scikit learn 中使用 CountVectorizer() 制作的稀疏矩阵