How to obtain a dodged geom_bar (ggplot2) from different columns of a df

Asked: 2021-12-05 02:02:17

Question:

Given the data frame df:

     ID               gene1                   gene2
    4602              TET2                    TET2
    4602              TP53                    TP53
    4602              TET2                    TET2
    5095             ASXL1                   ASXL1
    5095            DNMT3A                  DNMT3A
    5095              NPM1                    <NA>

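I have been trying to get a dodged bar plot showing the counts (bars) of the columns gene1 and gene2. gene1 is the standard method, while gene2 is another mutation caller that should be compared against gene1. As you can see, in sample 5095 gene2 detected only 2 mutations, and the 3rd was not reproduced.

How can I make a bar plot with two bars for each ID, showing the counts in gene1 and gene2?

Here is the dput():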

structure(list(ID = c(4602, 4602, 4602, 5095, 5095, 5095, 5095, 
4649, 4649, 4649, 5069, 5069, 5069, 5146, 5132, 5132, 5132, 5132, 
5132, 5132, 4297, 4297, 4297, 4297, 4297, 4345, 4345, 4345, 4345, 
4345, 4356, 4356, 4356, 4356, 4385, 4385, 4385, 4385, 4385, 4385, 
4437, 4437, 4437, 4437, 4437, 4437, 4442, 4442, 4442, 4442, 4442, 
4479, 4479, 4479, 4479, 4479, 4479, 4479, 4479, 4479, 4479, 4479, 
4479, 4479, 4479, 4479, 4487, 4487, 4487, 4487, 4487, 4487, 4537, 
4537, 4537, 4537, 4537, 4537, 4621, 4621, 4621, 4621, 4621, 4621, 
4621, 4624, 4624, 4624, 4624, 4624, 4665, 4736, 4736, 4736, 4736, 
4736, 4895, 4895, 4895, 4895, 4895, 4903, 4903, 4903, 4903, 4691, 
4691, 4691, 4691, 4261, 4261, 4261, 4261, 4394, 4394, 4394, 4394, 
4424, 4424, 4424, 4424, 4943, 4943, 4943, 5073, 5169, 5169), 
    gene1 = c("TET2", "TP53", "TET2", "ASXL1", "DNMT3A", "NPM1", 
    "PTPN11", "TP53", "TP53", "TET2", "DNMT3A", "TET2", "TET2", 
    "negative", "JAK2", "ASXL1", "BRAF", "CBL", "TET2", "TET2", 
    "DNMT3A", "IDH1", "NPM1", "CREBBP", "FLT3", "DNMT3A", "FLT3", 
    "NPM1", "BCOR", "KIT", "DNMT3A", "IDH1", "NRAS", "BCOR", 
    "KRAS", "NPM1", "PTPN11", "ETV6", "PHF6", "TET2", "DNMT3A", 
    "KRAS", "NPM1", "WT1", "TET2", "WT1", "DNMT3A", "FLT3", "NPM1", 
    "NRAS", "WT1", "DNMT3A", "IDH2", "NPM1", "SRSF2", "ATRX", 
    "CUX1", "CUX1", "FLT3", "GNAS", "PHF6", "PIGA", "PIGA", "PRPF40B", 
    "PTPN11", "TET2", "IDH1", "IDH2", "RUNX1", "U2AF1", "TET2", 
    "TP53", "DNMT3A", "IDH2", "ATRX", "GATA2", "STAG2", "TP53", 
    "IDH2", "SRSF2", "ASXL1", "GATA1", "KDM6A", "STAG2", "TP53", 
    "IDH2", "JAK2", "SRSF2", "ASXL1", "RIT1", "KRAS", "NPM1", 
    "NRAS", "NRAS", "BCOR", "MYD88", "FLT3", "NPM1", "NRAS", 
    "TET2", "TET2", "DNMT3A", "IDH1", "NPM1", "CREBBP", "DNMT3A", 
    "IDH1", "IDH2", "NPM1", "FLT3", "FLT3", "GATA2", "SH2B3", 
    "FLT3", "NPM1", "KDM6A", "SMC1A", "IDH2", "SRSF2", "ASXL2", 
    "RUNX1", "IDH2", "JAK2", "NPM1", "JAK2", "SRSF2", "STAG2"
    ), gene2 = c("TET2", "TP53", "TET2", "ASXL1", "DNMT3A", NA, 
    "PTPN11", "TP53", "TP53", "TET2", "DNMT3A", NA, "TET2", "PTEN", 
    NA, NA, "BRAF", "CBL", "TET2", "TET2", "JAK2", "SRSF2", NA, 
    "DNMT3A", "IDH1", "NPM1", NA, "FLT3", "DNMT3A", "FLT3", "NPM1", 
    NA, NA, "DNMT3A", "IDH1", "NRAS", "BCOR", "KRAS", "NPM1", 
    "PTPN11", "ETV6", "PHF6", "TET2", "DNMT3A", "KRAS", "NPM1", 
    NA, "TET2", NA, "DNMT3A", "FLT3", "NPM1", "NRAS", NA, NA, 
    "IDH2", "NPM1", "SRSF2", NA, "CALR", NA, NA, NA, NA, NA, 
    NA, NA, NA, NA, "IDH1", "IDH2", "RUNX1", "U2AF1", "TET2", 
    NA, "DNMT3A", "IDH2", NA, NA, NA, NA, "IDH2", "SRSF2", "ASXL1", 
    NA, NA, "KMT2D", "TP53", "IDH2", "JAK2", "SRSF2", "ASXL1", 
    NA, "KRAS", "NPM1", "NRAS", "NRAS", NA, NA, "FLT3", "NPM1", 
    "NRAS", "TET2", "TET2", "DNMT3A", "IDH1", "NPM1", "CREBBP", 
    "DNMT3A", "IDH1", "IDH2", "NPM1", "FLT3", "FLT3", NA, NA, 
    "FLT3", "NPM1", NA, "SMC1A", "IDH2", "SRSF2", NA, "RUNX1", 
    "IDH2", "JAK2", "NPM1")), class = "data.frame", row.names = c(NA, 
-127L))

Answer 1:

Here is another one:

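library(dplyr)    # group_by(), add_count(), ungroup(), %>%
library(tidyr)    # pivot_longer()
library(ggplot2)

df %>%
  group_by(ID) %>%
  # n = number of rows sharing the same ID / gene1 / gene2 combination
  add_count(gene1, gene2) %>%
  # stack gene1 and gene2 into the default columns `name` and `value`
  pivot_longer(
    cols = contains("gene")
  ) %>%
  ungroup() %>%
  ggplot(aes(factor(name), n, fill = value, group = value, label = value)) +
  geom_col() +
  facet_wrap(. ~ ID, scales = "free_y") +
  geom_text(size = 3, position = position_stack(vjust = 0.5)) +
  theme_classic() +
  xlab("") +
  guides(fill = "none")  # fill = FALSE is deprecated in recent ggplot2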

Last try, this should work now. This time we count before pivoting:

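# Assumes dplyr, tidyr and ggplot2 are already loaded
df %>%
  group_by(ID) %>%
  # count each ID / gene1 / gene2 combination before reshaping
  add_count(gene1, gene2) %>%
  pivot_longer(
    cols = contains("gene")
  ) %>%
  ungroup() %>%
  # one panel per ID, one stacked bar per method (gene1, gene2)
  ggplot(aes(factor(name), n, fill = value, group = value)) +
  geom_col() +
  facet_wrap(. ~ ID, scales = "free_y") +
  theme(legend.position = "bottom") +
  guides(fill = guide_legend(nrow = 2))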
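
For the layout originally asked for (ID on the x-axis, two adjacent bars per ID), the following is a minimal sketch rather than the answer's own method. It assumes that a missing value in gene2 means "not detected" and should not be counted. It uses only the six rows shown at the top of the question, and the names `method`, `gene`, `counts` and `p` are illustrative.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Toy data: the six rows shown in the question
# (gene2 is NA where the second caller did not reproduce the mutation).
df <- data.frame(
  ID    = c(4602, 4602, 4602, 5095, 5095, 5095),
  gene1 = c("TET2", "TP53", "TET2", "ASXL1", "DNMT3A", "NPM1"),
  gene2 = c("TET2", "TP53", "TET2", "ASXL1", "DNMT3A", NA)
)

# Reshape to long format, drop missing calls, then tally once per ID and method.
counts <- df %>%
  pivot_longer(
    cols = c(gene1, gene2),
    names_to = "method",
    values_to = "gene",
    values_drop_na = TRUE
  ) %>%
  count(ID, method, name = "n")

# Two bars side by side for every ID; preserve = "single" keeps the bar
# width constant when one method has no calls for an ID.
p <- ggplot(counts, aes(x = factor(ID), y = n, fill = method)) +
  geom_col(position = position_dodge(preserve = "single")) +
  labs(x = "ID", y = "Number of mutations", fill = "Method") +
  theme_classic()

print(p)
```

Here count() collapses each ID/method group to a single row, whereas add_count() keeps every original row and repeats n on each of them. In the full data set, placeholder entries such as "negative" in gene1 would be counted as a mutation unless they are filtered out first.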

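Comments:

This seems to be it, but I need the ID on the x-axis.

Please see my update. We can adjust it based on your final result!

I checked your last update, but for each ID (e.g. 4602) I need two adjacent bars, one from gene1 and one from gene2. What you did conveys roughly the same information, but it is harder to interpret.

We are basically there, your last update is what I was looking for. The only problem is that when I apply it to my whole df (127 rows, 3 columns), it looks right for most IDs but gives a huge count in some cases. I will edit the question to add the dput() so you can see what I mean. Check the bar for ID 4479.

Awesome! Thanks a lot.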
