Making a sns.pairplot using scikit wine dataset

Posted: 2021-11-06 12:49:44

Question:

This seems simple, but I can't find a solution online.

I'm trying to create a sns.pairplot in Python. I have loaded the wine dataset, kept the features I need, and run the plot.

%matplotlib inline

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.datasets import load_wine

# Load the wine dataset and pair each sample with its class label
wine = load_wine()
wine = list(zip(wine.data, wine.target))

# Build a DataFrame from the feature matrix only (no "target" column)
data = load_wine()
df = pd.DataFrame(data.data, columns=data.feature_names)

# This is the code that should run the plot
b = sns.pairplot(df, vars=df.columns[1:], hue="target", height=2.5)

But I get this error:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   2894             try:
-> 2895                 return self._engine.get_loc(casted_key)
   2896             except KeyError as err:

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'target'

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
<ipython-input-108-1107acc27949> in <module>
----> 1 b=sns.pairplot(df, vars = df.columns[1 :], hue = "target", height = 2.5)
      2 
      3 plt.show()

~\anaconda3\lib\site-packages\seaborn\_decorators.py in inner_f(*args, **kwargs)
     44             )
     45         kwargs.update({k: arg for k, arg in zip(sig.parameters, args)})
---> 46         return f(**kwargs)
     47     return inner_f
     48 

~\anaconda3\lib\site-packages\seaborn\axisgrid.py in pairplot(data, hue, hue_order, palette, vars, x_vars, y_vars, kind, diag_kind, markers, height, aspect, corner, dropna, plot_kws, diag_kws, grid_kws, size)
   1923     # Set up the PairGrid
   1924     grid_kws.setdefault("diag_sharey", diag_kind == "hist")
-> 1925     grid = PairGrid(data, vars=vars, x_vars=x_vars, y_vars=y_vars, hue=hue,
   1926                     hue_order=hue_order, palette=palette, corner=corner,
   1927                     height=height, aspect=aspect, dropna=dropna, **grid_kws)

~\anaconda3\lib\site-packages\seaborn\_decorators.py in inner_f(*args, **kwargs)
     44             )
     45         kwargs.update({k: arg for k, arg in zip(sig.parameters, args)})
---> 46         return f(**kwargs)
     47     return inner_f
     48 

~\anaconda3\lib\site-packages\seaborn\axisgrid.py in __init__(self, data, hue, hue_order, palette, hue_kws, vars, x_vars, y_vars, corner, diag_sharey, height, aspect, layout_pad, despine, dropna, size)
   1212                                       index=data.index)
   1213         else:
-> 1214             hue_names = categorical_order(data[hue], hue_order)
   1215             if dropna:
   1216                 # Filter NA from the list of unique hue names

~\anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
   2900             if self.columns.nlevels > 1:
   2901                 return self._getitem_multilevel(key)
-> 2902             indexer = self.columns.get_loc(key)
   2903             if is_integer(indexer):
   2904                 indexer = [indexer]

~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   2895                 return self._engine.get_loc(casted_key)
   2896             except KeyError as err:
-> 2897                 raise KeyError(key) from err
   2898 
   2899         if tolerance is not None:

KeyError: 'target'

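The solution linked in this question, How to convert a Scikit-learn dataset to a Pandas dataset, unfortunately doesn't seem to work here.

I also tried "class" instead of target. Could it be that the 'zip' call above isn't working properly, so the program can't recognize 'target'?

Thanks in advance!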

Answer 1:

Based on what you posted, it works like this.

import pandas as pd
import seaborn as sns
from sklearn.datasets import load_wine

data = load_wine()
df = pd.DataFrame(data.data, columns=data.feature_names)

# Pair plot of the features only; no hue, because df has no "target" column
b = sns.pairplot(df, vars=df.columns[1:], height=2.5)

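The question is how you want to hue the features, and why. You drop alcohol from the list (df.columns[1:]), so the target is not going to align at all. Second, this is a feature-wise pair plot, not a plot over target/class. So all in all, I don't understand what you are trying to do here.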
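If the goal is to color the points by wine class, one possible approach is sketched below. The KeyError in the traceback comes from df containing only the feature columns, so hue="target" has nothing to look up; the zip call never touches df. This sketch assumes the class labels in data.target are wanted as the hue. The helper names wine_frame_with_target and plot_wine_pairs are hypothetical and used only for illustration.

```python
import pandas as pd
import seaborn as sns
from sklearn.datasets import load_wine


def wine_frame_with_target() -> pd.DataFrame:
    """Return the wine features plus a "target" column holding the class label."""
    data = load_wine()
    df = pd.DataFrame(data.data, columns=data.feature_names)
    # data.target is row-aligned with data.data, so direct assignment is enough
    df["target"] = data.target
    return df


def plot_wine_pairs(df: pd.DataFrame) -> sns.PairGrid:
    """Pair plot of every feature except the first (alcohol), colored by class."""
    # Keep "target" out of vars so it is used only for coloring
    feature_cols = [c for c in df.columns[1:] if c != "target"]
    return sns.pairplot(df, vars=feature_cols, hue="target", height=2.5)


if __name__ == "__main__":
    import matplotlib.pyplot as plt

    df = wine_frame_with_target()
    plot_wine_pairs(df)
    plt.show()
```

With twelve features the grid is large and may take a while to render, so passing a shorter list of columns to vars is probably more practical. In scikit-learn 0.23 and later, load_wine(as_frame=True).frame should give a DataFrame that already includes the target column, which would replace the first helper.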
