Summarising data frame with barplot of means and their standard deviation, ggplot2

Posted 2023-03-24

Question (asked 2021-04-13 05:37:43):

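I have a dataset with 8 columns and 152 rows. My goal is to use ggplot2 to create a bar chart of the mean of each column together with its standard deviation (these vary a lot). I can easily create a scatter plot, but the barplot fails with several error messages, including:

Error in barplot.default(GRPA) : 'height' must be a vector or a matrix

Any suggestions or example code would be great.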

A sample of the data:

structure(list(ALP.B = c(80L, 37L, 52L, 36L, 39L, 48L, 71L, 81L, 
77L, 38L, 56L, 33L, 64L, 70L, 43L, 45L, 59L, 42L, 59L, 45L), 
    ALT.B = c(13L, 15L, 10L, 13L, 18L, 8L, 12L, 13L, 18L, 13L, 
    10L, 28L, 10L, 13L, 12L, 28L, 15L, 7L, 11L, 13L), AST.B = c(14L, 
    16L, 13L, 13L, 12L, 13L, 18L, 16L, 19L, 14L, 15L, 21L, 15L, 
    13L, 12L, 16L, 23L, 12L, 14L, 12L), TBL.B = c(12.654, 6.498, 
    4.788, 6.84, 14.364, 6.156, 9.063, 10.773, 7.353, 7.182, 
    7.866, 8.721, 13.338, 7.866, 11.628, 10.089, 5.301, 9.918, 
    7.353, 7.182), ALP.M = c(87L, 37L, 55L, 35L, 37L, 50L, 74L, 
    89L, 83L, 36L, 58L, 32L, 78L, 78L, 43L, 51L, 60L, 47L, 50L, 
    51L), ALT.M = c(22L, 25L, 10L, 11L, 21L, 8L, 10L, 17L, 21L, 
    16L, 13L, 27L, 14L, 18L, 13L, 41L, 14L, 8L, 13L, 14L), AST.M = c(22L, 
    23L, 13L, 12L, 15L, 13L, 15L, 13L, 22L, 17L, 18L, 27L, 16L, 
    15L, 13L, 23L, 22L, 12L, 13L, 15L), TBL.M = c(23.085, 8.037, 
    6.498, 8.037, 16.758, 5.985, 7.524, 7.866, 8.379, 7.866, 
    8.208, 13.338, 15.732, 8.208, 14.706, 15.39, 7.866, 7.353, 
    9.918, 7.866)), row.names = c(NA, 20L), class = "data.frame")

My code is bare-bones because I have tried so many variations:

ggplot(colMeans(GRPA), aes(x="drug", y="value")) + 
  geom_bar(stat = "identity")

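Comments on the question:

Welcome to SO! To help us help you, could you make your problem reproducible by sharing a sample of your data, the code you tried and the packages you used? See how to make a minimal reproducible example.

Updated, please let me know if this is OK.

If you want to post data, just type dput(NAME_OF_DATASET) into the console and copy and paste the output starting with structure(.... into your post. If your dataset has many observations, you can use dput(head(NAME_OF_DATASET, 20)) for the first 20 rows. Also, please post the code you tried and what caused the problem.

Sorry, this really isn't working, I have no clue (:

All good. Now we have the data as a nice dput(). (;

Answer 1: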

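There are several issues with your code. First, ggplot2 works with data frames, but you are passing it a vector, colMeans(GRPA). Second, if you want to pass variable names to ggplot2, don't put them in quotes: aes(x = "drug") maps the constant string "drug", not a column called drug.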
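To see both problems concretely: colMeans() returns a named numeric vector, which ggplot() cannot use as its data argument. A minimal sketch, assuming a small excerpt of GRPA (the first three rows of ALP.B and ALT.B from the posted data), wraps the means in a data frame and maps unquoted column names. It plots only the means, without standard deviations.

```r
library(ggplot2)

# Excerpt of the posted data: first three rows of two columns
GRPA <- data.frame(ALP.B = c(80L, 37L, 52L), ALT.B = c(13L, 15L, 10L))

# colMeans() returns a named numeric vector, not a data frame
GRPA_means <- colMeans(GRPA)

# Turn the vector into a data frame so ggplot() has columns to map
means_df <- data.frame(drug = names(GRPA_means), value = unname(GRPA_means))

# Column names inside aes() are unquoted; quoted strings would be
# treated as constant values rather than as variables
p <- ggplot(means_df, aes(x = drug, y = value)) +
  geom_col()
print(p)
```

The approach below also produces the standard deviations, so it is the more useful route for the full question.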

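To get the result you want, it is best to reshape your dataset to long (tidy) format, for example with tidyr::pivot_longer(). Afterwards you can use dplyr to compute the mean and standard deviation for each drug. This summary dataset can then be plotted easily with ggplot2: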

library(dplyr)
library(tidyr)
library(ggplot2)

# Reshape to long format, then compute the mean and SD per drug
# (.groups = "drop" returns an ungrouped result and silences the summarise() message)
GRPA_summary <- GRPA %>% 
  pivot_longer(everything(), names_to = "drug", values_to = "value") %>% 
  group_by(drug) %>% 
  summarise(mean = mean(value), sd = sd(value), .groups = "drop")

# Bars show the means (geom_col() is geom_bar(stat = "identity"));
# error bars span mean ± 1 SD
ggplot(GRPA_summary, aes(x = drug, y = mean)) + 
  geom_col() + 
  geom_errorbar(aes(ymin = mean - sd, ymax = mean + sd), width = 0.2)
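The error in the question comes from base R's barplot(). It accepts a vector or matrix of heights but not a data frame. If you would rather stay in base graphics, here is a minimal sketch. It uses a small excerpt of GRPA as a stand-in for the full dataset: the first five rows of ALP.B and ALT.B from the posted data. The sketch passes colMeans() directly to barplot() and draws the ±1 SD bars with arrows().

```r
# Excerpt of the posted data: first five rows of two columns
GRPA <- data.frame(
  ALP.B = c(80L, 37L, 52L, 36L, 39L),
  ALT.B = c(13L, 15L, 10L, 13L, 18L)
)

# colMeans() gives a named vector, which barplot() accepts as bar heights
means <- colMeans(GRPA)
sds <- apply(GRPA, 2, sd)

# barplot() returns the x midpoints of the bars; widen ylim for the error bars
mids <- barplot(means, ylim = c(0, max(means + sds)))

# Draw mean ± 1 SD with flat caps at both ends (angle = 90, code = 3)
arrows(mids, means - sds, mids, means + sds, angle = 90, code = 3, length = 0.05)
```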

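Comments:

Thank you so much, I get it now, this is a lifesaver.

You're welcome. Always happy to help people get started. (; PS: I just added the code for the error bars. (;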
