How to add error bars on a grouped barplot from a column

Posted 2023-03-12


Question (asked 2017-06-20 09:35:58):

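I have a pandas DataFrame df with four columns: Candidate, Sample_Set, Values, and Error. The Candidate column has, for example, three unique entries: [X, Y, Z], and we have three sample sets, so Sample_Set also has three unique values: [1, 2, 3]. The df looks roughly like this.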

import pandas as pd

data = {'Candidate': ['X', 'Y', 'Z', 'X', 'Y', 'Z', 'X', 'Y', 'Z'],
        'Sample_Set': [1, 1, 1, 2, 2, 2, 3, 3, 3],
        'Values': [20, 10, 10, 200, 101, 99, 1999, 998, 1003],
        'Error': [5, 2, 3, 30, 30, 30, 10, 10, 10]}
df = pd.DataFrame(data)

# display(df)
  Candidate  Sample_Set  Values  Error
0         X           1      20      5
1         Y           1      10      2
2         Z           1      10      3
3         X           2     200     30
4         Y           2     101     30
5         Z           2      99     30
6         X           3    1999     10
7         Y           3     998     10
8         Z           3    1003     10

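I am using seaborn to create a grouped barplot with x="Candidate", y="Values", hue="Sample_Set". All is well until I try to add error bars along the y axis using the values in the column named Error. I am using the following code.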

import seaborn as sns

# note: factorplot was renamed to catplot (and size to height) in seaborn 0.9
ax = sns.factorplot(x="Candidate", y="Values", hue="Sample_Set", data=df,
                    size=8, kind="bar")

How can I incorporate the error values?

I would appreciate a solution, or a more elegant way to accomplish this task.

Comments:

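seaborn is generally an extension of matplotlib, so whatever you can't achieve in seaborn you can hack onto your output (ax) with matplotlib's tools. Do you have a visual example of what you mean by "error bar"? Do you mean the ones in a bar plot?

Thanks @ResMar. Yes, the black vertical lines in the bar plot you referenced. I am using matplotlib functionality. I just need to extract the numeric x and y values of the resulting grouped bar plot.

Someone may surprise me with an answer, but I don't think there is an "elegant" way to do this. seaborn generates those error bars in a barplot by aggregating many observations, whereas your data is already pre-aggregated.

Answer 1:

seaborn plots generate error bars when they aggregate data, but this data is already aggregated and has a dedicated error column.

The easiest solution is to use pandas to create the bar chart with pandas.DataFrame.plot and kind='bar'. matplotlib is the default plotting backend, and the plotting API has a yerr parameter that accepts any of the following:

As a DataFrame or dict of errors, with column names matching the columns attribute of the plotting DataFrame or matching the name attribute of the Series.
As a str indicating which of the columns of the plotting DataFrame contain the error values.
As raw values (list, tuple, or np.ndarray), which must be the same length as the plotting DataFrame/Series.

This can be done by reshaping the dataframe from long form to wide form with pandas.DataFrame.pivot.

See the pandas User Guide: Plotting with error bars.

Tested in python 3.8.12, pandas 1.3.4, matplotlib 3.4.3.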

# reshape the dataframe into a wide format for Values
vals = df.pivot(index='Candidate', columns='Sample_Set', values='Values')

# display(vals)
Sample_Set   1    2     3
Candidate                
X           20  200  1999
Y           10  101   998
Z           10   99  1003

# reshape the dataframe into a wide format for Errors
yerr = df.pivot(index='Candidate', columns='Sample_Set', values='Error')

# display(yerr)
Sample_Set  1   2   3
Candidate            
X           5  30  10
Y           2  30  10
Z           3  30  10

# plot vals with yerr
ax = vals.plot(kind='bar', yerr=yerr, logy=True, rot=0, figsize=(6, 5))
_ = ax.legend(title='Sample Set', bbox_to_anchor=(1, 1.02), loc='upper left')
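
The pivot-then-plot steps above can be wrapped in a small helper. This is only a sketch: the function name plot_grouped_with_errors and its parameter names are made up for illustration and are not part of pandas, and the Agg backend line is an assumption for running without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in a notebook
import pandas as pd


def plot_grouped_with_errors(df, x, hue, y, err, **plot_kwargs):
    """Hypothetical helper: pivot long-form data and draw grouped bars with yerr."""
    # Both wide frames share the same index (x) and columns (hue), so pandas
    # can match every error value to its bar by label.
    vals = df.pivot(index=x, columns=hue, values=y)
    errs = df.pivot(index=x, columns=hue, values=err)
    ax = vals.plot(kind="bar", yerr=errs, rot=0, **plot_kwargs)
    return ax, vals, errs


# same data as in the question
df = pd.DataFrame({
    "Candidate": ["X", "Y", "Z", "X", "Y", "Z", "X", "Y", "Z"],
    "Sample_Set": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "Values": [20, 10, 10, 200, 101, 99, 1999, 998, 1003],
    "Error": [5, 2, 3, 30, 30, 30, 10, 10, 10],
})

ax, vals, errs = plot_grouped_with_errors(
    df, x="Candidate", hue="Sample_Set", y="Values", err="Error", logy=True
)
ax.legend(title="Sample Set")
```

Because the values and the errors are pivoted with the same index and columns, the row order of the long-form df does not matter here.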


Answer 2:

I suggest extracting the position coordinates from the patches attribute and then plotting the error bars.

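import matplotlib.pyplot as plt
import seaborn as sns

ax = sns.barplot(data=df, x="Candidate", y="Values", hue="Sample_Set")
# bar centres and heights; this relies on the rows of df being in the same order as ax.patches
x_coords = [p.get_x() + 0.5*p.get_width() for p in ax.patches]
y_coords = [p.get_height() for p in ax.patches]
plt.errorbar(x=x_coords, y=y_coords, yerr=df["Error"], fmt="none", c="k")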
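
The snippet above gives correct results only if the rows of df["Error"] happen to be in the same order as ax.patches. Here is a minimal sketch of one way to make that ordering explicit. It assumes that seaborn draws the bars one hue level at a time (all Sample_Set 1 bars, then 2, then 3) and that the same order and hue_order lists are also passed to sns.barplot. errors_in_patch_order is a hypothetical helper, not a seaborn function.

```python
import pandas as pd


def errors_in_patch_order(df, x, hue, err, order, hue_order):
    """Hypothetical helper: errors listed hue level by hue level, x categories inside."""
    wide = df.pivot(index=x, columns=hue, values=err)
    # rows follow `order`, columns follow `hue_order`
    wide = wide.reindex(index=order, columns=hue_order)
    # transpose so that each hue level becomes one contiguous run of values
    return wide.to_numpy().T.ravel()


# same data as in the question
df = pd.DataFrame({
    "Candidate": ["X", "Y", "Z", "X", "Y", "Z", "X", "Y", "Z"],
    "Sample_Set": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "Values": [20, 10, 10, 200, 101, 99, 1999, 998, 1003],
    "Error": [5, 2, 3, 30, 30, 30, 10, 10, 10],
})

# shuffle the rows to show that the input order no longer matters
shuffled = df.sample(frac=1, random_state=0)
yerr = errors_in_patch_order(
    shuffled, "Candidate", "Sample_Set", "Error",
    order=["X", "Y", "Z"], hue_order=[1, 2, 3],
)
```

The resulting array can be passed as yerr to plt.errorbar in place of df["Error"], with order=["X", "Y", "Z"] and hue_order=[1, 2, 3] given to sns.barplot as well. If the seaborn version in use adds extra patches to ax.patches, those would still need to be filtered out before computing the coordinates.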


Answer 3:

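As @ResMar pointed out in the comments, there seems to be no built-in functionality in seaborn to easily set individual error bars.

If you care more about the result than about the way to get there, the following (not so elegant) solution may help. It is based on matplotlib.pyplot.bar; the seaborn import is only used to get the same style.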

import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import pandas as pd

def grouped_barplot(df, cat,subcat, val , err):
    u = df[cat].unique()
    x = np.arange(len(u))
    subx = df[subcat].unique()
    offsets = (np.arange(len(subx))-np.arange(len(subx)).mean())/(len(subx)+1.)
    width= np.diff(offsets).mean()
    for i,gr in enumerate(subx):
        dfg = df[df[subcat] == gr]
        plt.bar(x+offsets[i], dfg[val].values, width=width, 
                label="{} {}".format(subcat, gr), yerr=dfg[err].values)
    plt.xlabel(cat)
    plt.ylabel(val)
    plt.xticks(x, u)
    plt.legend()
    plt.show()


cat = "Candidate"
subcat = "Sample_Set"
val = "Values"
err = "Error"

# call the function with df from the question
grouped_barplot(df, cat, subcat, val, err )
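
To see how grouped_barplot places the bars, here is the offset arithmetic from the function worked through for the three sample sets in the question. It only re-runs the two expressions from the answer; the name n_sub is introduced here for illustration.

```python
import numpy as np

n_sub = 3  # three unique Sample_Set values in the question's data
idx = np.arange(n_sub)                         # [0, 1, 2]
offsets = (idx - idx.mean()) / (n_sub + 1.0)   # centred on 0, scaled by 1/(n_sub + 1)
width = float(np.diff(offsets).mean())         # spacing between neighbouring bars

print(offsets.tolist())  # [-0.25, 0.0, 0.25]
print(width)             # 0.25
```

With these numbers each group of three bars spans 0.75 of the unit distance between x ticks, which leaves a gap of 0.25 between neighbouring groups.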

Note that by simply swapping the category and subcategory,

cat = "Sample_Set"
subcat = "Candidate"

you get a different grouping, with the bars grouped by Sample_Set and one bar per Candidate within each group.


Answer 4:

You can get close to what you need with the pandas plotting functionality (see this answer):

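# df is the DataFrame from the question; passing the column name keeps each group's errors aligned with its own rows
bars = df.groupby("Candidate").plot(kind='bar', x="Sample_Set", y="Values", yerr="Error")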

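This is not exactly what you asked for, but it is pretty close. Unfortunately, ggplot2 for Python currently does not render error bars properly. Personally, I would resort to R ggplot2 in this case: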

data <- read.csv("~/repos/tmp/test.csv")
data
library(ggplot2)
ggplot(data, aes(x=Candidate, y=Values, fill=factor(Sample_Set))) + 
  geom_bar(position=position_dodge(), stat="identity") +
  geom_errorbar(aes(ymin=Values-Error, ymax=Values+Error), width=.1, position=position_dodge(.9))

