Histogram with grouped density lines in ggplot2

Posted: 2022-01-17 16:04:24

Question:

This is probably an easy one for anyone more familiar with ggplot2 than I am. I have data of this type, with increase_max grouped by role, which has two levels:

df <- structure(list(role = c("Recipient", "Speaker", "Recipient", 
                           "Recipient", "Recipient", "Speaker", "Recipient", "Recipient", 
                           "Speaker", "Speaker", "Recipient", "Speaker", "Recipient", "Recipient", 
                           "Recipient", "Speaker", "Recipient", "Speaker", "Recipient", 
                           "Speaker", "Recipient", "Recipient", "Speaker", "Recipient", 
                           "Recipient", "Speaker", "Speaker", "Speaker", "Recipient", "Speaker", 
                           "Speaker", "Recipient", "Speaker", "Recipient", "Recipient", 
                           "Speaker", "Recipient", "Recipient", "Recipient", "Speaker", 
                           "Speaker", "Recipient", "Speaker", "Recipient", "Speaker", "Recipient", 
                           "Speaker", "Speaker", "Recipient", "Recipient", "Speaker", "Recipient", 
                           "Recipient", "Speaker", "Recipient", "Recipient", "Recipient", 
                           "Speaker", "Recipient", "Speaker", "Recipient", "Speaker", "Recipient", 
                           "Recipient", "Speaker", "Recipient", "Recipient", "Speaker", 
                           "Recipient", "Recipient", "Recipient", "Speaker", "Recipient", 
                           "Speaker", "Recipient", "Speaker", "Recipient", "Recipient", 
                           "Recipient", "Recipient", "Speaker", "Recipient", "Recipient", 
                           "Recipient", "Speaker", "Recipient", "Speaker", "Recipient", 
                           "Recipient", "Speaker", "Recipient", "Recipient", "Speaker", 
                           "Recipient", "Recipient", "Recipient", "Speaker", "Recipient", 
                           "Speaker", "Recipient"), increase_max = c(0.008, 0.118, NA, NA, 
                                                                     NA, 0.209, NA, 0.001, 0.111, NA, NA, NA, NA, NA, 0.007, 0.002, 
                                                                     0.006, 0.255, 0.009, NA, 0.004, 0.232, NA, 0.007, 0.004, 0.095, 
                                                                     0.09, NA, 0.002, NA, 0.05, NA, 0.02, 0.045, 0.002, NA, NA, 0.005, 
                                                                     0.012, NA, 0.037, NA, 0.066, NA, 0.019, 0.002, 0.136, NA, 0.003, 
                                                                     NA, 0.128, 0.004, 0.003, NA, NA, NA, 0.03, 0.042, NA, 0.138, 
                                                                     0.139, 0.126, 0.002, NA, 0.005, NA, 0.002, 0.01, 0.001, NA, 0.005, 
                                                                     0.003, NA, NA, 0.002, NA, 0.005, NA, NA, 0.015, 0.007, 0.021, 
                                                                     NA, NA, NA, NA, NA, 0.171, 0.02, 0.036, 0.026, 0.001, 0.033, 
                                                                     0.127, 0.339, 0.075, 0.037, 0.083, NA, 0.041)), class = c("tbl_df", 
                                                                                                                               "tbl", "data.frame"), row.names = c(NA, -100L))
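Before plotting, it can help to check how many usable values each group contributes. The following is a minimal base R sketch, not part of the original question. It uses a small hypothetical data frame (toy) with the same two columns as df, and assumes the 0.05 to 0.5 range that the plotting code in this question filters on:

```r
# Hypothetical toy data with the same columns as df (not the real data)
toy <- data.frame(
  role = c("Speaker", "Recipient", "Speaker", "Recipient", "Speaker", "Recipient"),
  increase_max = c(0.118, 0.008, NA, 0.232, 0.209, NA)
)

# Keep non-missing values inside the range of interest
in_range <- !is.na(toy$increase_max) &
  toy$increase_max >= 0.05 & toy$increase_max <= 0.5

# Count the remaining observations per role
counts <- table(toy$role[in_range])
print(counts)
```

Replacing toy with df gives the per-group counts for the real data.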

The way I currently make the plot works, at least basically, but it is clumsy and overly complicated:

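library(dplyr)    # provides %>%, filter() and pull()
library(ggplot2)

# variable 1: non-missing increase_max values for speakers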
speaker_0 <- df %>%
  filter(!is.na(increase_max)
         & role == "Speaker") %>%
  pull(increase_max)

# variable 2:
recipient_0 <- df %>%
  filter(!is.na(increase_max)
         & role == "Recipient") %>%
  pull(increase_max)

# subset both variables on certain range:
speaker <- data.frame(Max_EDA_increase = speaker_0[speaker_0 >= 0.05 & speaker_0 <= 0.5])
recipient <- data.frame(Max_EDA_increase = recipient_0[recipient_0 >= 0.05 & recipient_0 <= 0.5])

# bind together:
both <- rbind(speaker, recipient)

# plot histogram with density lines:
ggplot(both, aes(x = Max_EDA_increase)) + 
  geom_histogram(aes(y = after_stat(density)), data = speaker, fill = "red", alpha = 0.35, binwidth = 0.05) + 
  geom_line(data = speaker, color = "red", stat = "density", alpha = 0.35) +
  geom_histogram(aes(y = after_stat(density)), data = recipient, fill = "blue", alpha = 0.35, binwidth = 0.05) +
  geom_line(data = recipient, color = "blue", stat = "density", alpha = 0.35)

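[Figure: the resulting plot]

I'm sure there must be a more straightforward way to make this plot, one that also adds a legend to distinguish the two groups and their density lines.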


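Answer 1:

I think the way to make this less clumsy is to not split the data by role. You can filter the data once and then set fill = role and colour = role inside aes(). Because the colours are then mapped to a variable rather than hard-coded per layer, ggplot2 also draws the legend for you.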

library(ggplot2)

# Omitted for brevity
# df <- structure(...)

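# Filter once: drop missing values and keep the 0.05 to 0.5 range
df2 <- subset(df, !is.na(increase_max) & 
                increase_max >= 0.05 & 
                increase_max <= 0.5)

ggplot(df2, aes(x = increase_max)) +
  # position = "identity" overlays the two histograms instead of stacking them
  geom_histogram(aes(y = after_stat(density), fill = role),
                 binwidth = 0.05, position = "identity",
                 alpha = 0.35) +
  geom_density(aes(colour = role)) +
  # One manual scale for both fill and colour; values follow the
  # alphabetical level order (Recipient = blue, Speaker = red)
  scale_colour_manual(
    aesthetics = c("fill", "colour"),
    values = c("blue", "red")
  )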

Created on 2021-12-14 by the reprex package (v2.0.1)
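The positional values = c("blue", "red") in the answer relies on the alphabetical order of the role levels (Recipient, then Speaker). A hedged variant, assuming you would rather tie each colour to its group by name and give the legend a title, is sketched below. The toy data, the object name role_colours, and the legend title "Role" are illustrative choices, not part of the original answer:

```r
library(ggplot2)

# Hypothetical toy data standing in for the filtered df2
set.seed(1)
toy <- data.frame(
  role = rep(c("Speaker", "Recipient"), each = 30),
  increase_max = c(runif(30, 0.05, 0.35), runif(30, 0.05, 0.25))
)

# Named vector: each colour is tied to a group regardless of level order
role_colours <- c(Speaker = "red", Recipient = "blue")

p <- ggplot(toy, aes(x = increase_max)) +
  # position = "identity" overlays the two histograms instead of stacking them
  geom_histogram(aes(y = after_stat(density), fill = role),
                 binwidth = 0.05, position = "identity", alpha = 0.35) +
  geom_density(aes(colour = role)) +
  # One scale call covers both aesthetics, so fill and colour
  # share the same title and labels and can merge into one legend
  scale_colour_manual(
    name = "Role",
    aesthetics = c("fill", "colour"),
    values = role_colours
  )

print(p)
```

Using a named vector means the colours stay attached to the right group even if role is later converted to a factor with a different level order.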

Comments:

Nice! This also works as a single piece of code:

df %>%
  filter(!is.na(increase_max) & increase_max >= 0.05 & increase_max <= 0.5) %>%
  ggplot(aes(x = increase_max)) +
  geom_histogram(aes(y = after_stat(density), fill = role),
                 binwidth = 0.05, position = "identity", alpha = 0.35) +
  geom_density(aes(colour = role)) +
  scale_colour_manual(
    aesthetics = c("fill", "colour"),
    values = c("blue", "red")
  )

Yes, but that is mainly a style preference, and it requires loading an extra library just to avoid one temporary variable. In R 4.1 you can also use the base pipe: subset(df, ...) |> ggplot(aes(...)) + .... Before R 4.1 you can write ggplot(subset(df, ...), aes(...)) + ... instead.
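The base-pipe variant mentioned in the last comment can be written out in full roughly as follows. This is a sketch only, assuming R 4.1 or later for the |> operator; the toy data frame is hypothetical and stands in for df:

```r
library(ggplot2)

# Hypothetical toy data standing in for df (includes NA and out-of-range values)
toy <- data.frame(
  role = c("Speaker", "Recipient", "Speaker", "Recipient", "Speaker", "Recipient"),
  increase_max = c(0.118, 0.008, NA, 0.232, 0.209, 0.095)
)

# subset() filters once; the native pipe passes the result to ggplot()
# as its first argument, so no temporary variable or extra package is needed
p <- subset(toy, !is.na(increase_max) &
              increase_max >= 0.05 &
              increase_max <= 0.5) |>
  ggplot(aes(x = increase_max)) +
  geom_histogram(aes(y = after_stat(density), fill = role),
                 binwidth = 0.05, position = "identity", alpha = 0.35) +
  geom_density(aes(colour = role))
```

Because |> binds more tightly than +, the piped ggplot() call is evaluated first and the layers are then added to it, just as with %>%.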
