Plotting error bars from a dataframe using Seaborn FacetGrid

Posted 2023-03-12


【Title】: Plotting error bars from a dataframe using Seaborn FacetGrid 【Posted】: 2014-09-12 17:24:52 【Question】:

I'd like to plot error bars from a column in a pandas dataframe on a Seaborn FacetGrid

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
df = pd.DataFrame({'A' : ['foo', 'bar', 'foo', 'bar']*2,
                   'B' : ['one', 'one', 'two', 'three',
                          'two', 'two', 'one', 'three'],
                   'C' : np.random.randn(8),
                   'D' : np.random.randn(8)})
df

Example dataframe

    A       B        C           D
0   foo     one      0.445827   -0.311863
1   bar     one      0.862154   -0.229065
2   foo     two      0.290981   -0.835301
3   bar     three    0.995732    0.356807
4   foo     two      0.029311    0.631812
5   bar     two      0.023164   -0.468248
6   foo     one     -1.568248    2.508461
7   bar     three   -0.407807    0.319404

This code works for fixed-size error bars:

g = sns.FacetGrid(df, col="A", hue="B", height=5)  # `height` was called `size` before seaborn 0.9
g.map(plt.errorbar, "C", "D", yerr=0.5, fmt='o');

But I can't get it to work using values from the dataframe

df['E'] = abs(df['D']*0.5)
g = sns.FacetGrid(df, col="A", hue="B", height=5)
g.map(plt.errorbar, "C", "D", yerr=df['E']);

or

g = sns.FacetGrid(df, col="A", hue="B", height=5)
g.map(plt.errorbar, "C", "D", yerr='E');

Both produce a flood of errors

EDIT:

After a lot of reading of the matplotlib docs and various *** answers, here is a pure matplotlib solution

# define a color palette index based on column 'B'
df['cind'] = pd.Categorical(df['B']).codes  # `.labels` in old pandas versions

# the categories in column 'A', sorted
cats = sorted(df['A'].unique())

# get the seaborn colour palette and convert to array
cp = sns.color_palette()
cpa = np.array(cp)

# draw a subplot for each category in column "A"
fig, axs = plt.subplots(nrows=1, ncols=len(cats), sharey=True)
for i, ax in enumerate(axs):
    df_sub = df[df['A'] == cats[i]]
    col = cpa[df_sub['cind'].to_numpy()]
    ax.scatter(df_sub['C'], df_sub['D'], c=col)
    # fmt='none' draws only the error bars, with no markers or connecting line
    eb = ax.errorbar(df_sub['C'], df_sub['D'], yerr=df_sub['E'], fmt='none')
    # eb.lines is (data line, cap lines, bar line collections)
    _, _, (bars,) = eb.lines
    bars.set_color(col)

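Apart from the labels and the axis limits, it's OK. It draws a separate subplot for each category in column 'A', colored by the category in column 'B'. (Note that the random data differs from the data above.)

I'd still like a pandas/seaborn solution if anyone has any ideas?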

【Question comments】:

【Answer 1】:

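When using FacetGrid.map, anything that refers to the data DataFrame must be passed as a positional argument. This will work in your case because yerr is the third positional argument of plt.errorbar, but to demonstrate I'm going to use the tips dataset: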

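from scipy import stats
tips_all = sns.load_dataset("tips")
# observed=True keeps only the smoker/size combinations that occur in the data
tips_grouped = tips_all.groupby(["smoker", "size"], observed=True)
tips = tips_grouped.mean(numeric_only=True)  # skip the categorical columns
# 1.96 standard errors: approximate half-width of a 95% confidence interval
tips["CI"] = tips_grouped.total_bill.apply(stats.sem) * 1.96
tips = tips.reset_index()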

Then I can plot using FacetGrid and errorbar:

g = sns.FacetGrid(tips, col="smoker", height=5)  # `height` was called `size` before seaborn 0.9
g.map(plt.errorbar, "size", "total_bill", "CI", marker="o")

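Keep in mind, however, that seaborn has plotting functions that go from a full dataset to a plot with error bars (using bootstrapping), so for many applications this may not be necessary. For example, you could use factorplot (renamed catplot in seaborn 0.9):

sns.catplot(x="size", y="total_bill", col="smoker",
            data=tips_all, kind="point")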

or lmplot:

sns.lmplot(x="size", y="total_bill", col="smoker",
           data=tips_all, fit_reg=False, x_estimator=np.mean)

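【Discussion】:

The positional-argument bit is the key. In a measurement context, Type A uncertainties (statistical) are easy to evaluate with factorplot and lmplot, although you have to dig into the API docs to check exactly which measure of the data distribution is being plotted and how it is computed (68% confidence limits via bootstrap?). It would be nice if this were more up front in the docs. I need to plot Type B uncertainties, which I can do as shown here. Thanks

The default CI is 95% (you can see it in the function signature), but they all take a ci keyword argument that you can set to 68 if you want standard errors.

@mwaskom Is there a solution for asymmetric error bars? Imagine I have two dataframe columns giving the CI min/max. Is there a way to pass those to plt.errorbar through g.map?

You should be able to write a wrapper function that accepts the vectors (x, y, err_lower, err_upper) and calls plt.errorbar correctly.

【Answer 2】: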

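You don't show what df['E'] actually is, or whether it is a list of the same length as df['C'] and df['D'].

The yerr keyword argument (kwarg) takes either a single value, which is applied to every element of the lists for keys C and D in the dataframe, or a list of values of the same length as those lists.

So C, D, and E must all be associated with lists of the same length, or C and D must be lists of the same length and E must be associated with a single float or int. If that single float or int is inside a list, you must extract it, e.g. df['E'][0].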
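The two accepted forms of yerr described above can be sketched with plain matplotlib as follows. The data values are made up, and the final call is there only to show that a yerr whose length does not match the data is rejected.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(4)
y = np.array([1.0, 2.0, 1.5, 3.0])  # made-up values

fig, (ax_scalar, ax_array) = plt.subplots(1, 2, sharey=True)

# A single number: the same error bar is drawn on every point
ax_scalar.errorbar(x, y, yerr=0.5, fmt="o")

# One value per point: the array must be as long as x and y
ax_array.errorbar(x, y, yerr=np.abs(y) * 0.5, fmt="o")

# Three error values for four points: matplotlib raises a ValueError
_, ax_bad = plt.subplots()
try:
    ax_bad.errorbar(x, y, yerr=[0.1, 0.2, 0.3], fmt="o")
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
```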

matplotlib code with a yerr example: http://matplotlib.org/1.2.1/examples/pylab_examples/errorbar_demo.html

Bar plot API documentation describing yerr: http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.bar

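【Discussion】:

df['E'] = abs(df['D']*0.5), on the first line of the fourth code block. I think the problem is that seaborn's map function passes the entire df['E'] list to matplotlib's errorbar function, rather than just the part that applies to that subplot.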
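The length mismatch this comment describes can be checked with pandas alone. The sketch below rebuilds the question's dataframe with a seeded generator (an assumption, so the values differ from the example table) and compares the size of each facet/hue subset with the full df['E'] column that `yerr=df['E']` would forward unchanged.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar"] * 2,
    "B": ["one", "one", "two", "three", "two", "two", "one", "three"],
    "C": rng.standard_normal(8),
    "D": rng.standard_normal(8),
})
df["E"] = (df["D"] * 0.5).abs()

# With col="A" and hue="B", the plotting function is called on one
# (A, B) subset of the rows at a time
subset_sizes = df.groupby(["A", "B"]).size()

for (a, b), n_points in subset_sizes.items():
    # A keyword argument such as yerr=df['E'] is not subset, so it keeps
    # the full column length while x and y shrink to n_points
    print(f"A={a}, B={b}: {n_points} point(s) vs {len(df['E'])} error values")
```

Every subset is shorter than the full column, which is consistent with the errors reported in the question and with the fix of passing "E" positionally.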
