How to obtain a dodged geom_bar (ggplot2) from different columns of a df
Posted: 2021-12-05 02:02:17

Question

Given the data frame `df`:
```
  ID  gene1  gene2
4602   TET2   TET2
4602   TP53   TP53
4602   TET2   TET2
5095  ASXL1  ASXL1
5095 DNMT3A DNMT3A
5095   NPM1   <NA>
```
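I have been trying to get a dodged bar chart showing the counts (bars) of the columns `gene1` and `gene2`. `gene1` comes from the standard method, while `gene2` is another mutation detector that should be compared against `gene1`. As you can see, in sample 5095 only 2 mutations were detected in `gene2`, and the 3rd one was not replicated.

How can I make a bar chart with two bars for each `ID`, showing the counts in `gene1` and `gene2`?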
Here is the `dput()` output:
```r
structure(list(ID = c(4602, 4602, 4602, 5095, 5095, 5095, 5095,
4649, 4649, 4649, 5069, 5069, 5069, 5146, 5132, 5132, 5132, 5132,
5132, 5132, 4297, 4297, 4297, 4297, 4297, 4345, 4345, 4345, 4345,
4345, 4356, 4356, 4356, 4356, 4385, 4385, 4385, 4385, 4385, 4385,
4437, 4437, 4437, 4437, 4437, 4437, 4442, 4442, 4442, 4442, 4442,
4479, 4479, 4479, 4479, 4479, 4479, 4479, 4479, 4479, 4479, 4479,
4479, 4479, 4479, 4479, 4487, 4487, 4487, 4487, 4487, 4487, 4537,
4537, 4537, 4537, 4537, 4537, 4621, 4621, 4621, 4621, 4621, 4621,
4621, 4624, 4624, 4624, 4624, 4624, 4665, 4736, 4736, 4736, 4736,
4736, 4895, 4895, 4895, 4895, 4895, 4903, 4903, 4903, 4903, 4691,
4691, 4691, 4691, 4261, 4261, 4261, 4261, 4394, 4394, 4394, 4394,
4424, 4424, 4424, 4424, 4943, 4943, 4943, 5073, 5169, 5169),
gene1 = c("TET2", "TP53", "TET2", "ASXL1", "DNMT3A", "NPM1",
"PTPN11", "TP53", "TP53", "TET2", "DNMT3A", "TET2", "TET2",
"negative", "JAK2", "ASXL1", "BRAF", "CBL", "TET2", "TET2",
"DNMT3A", "IDH1", "NPM1", "CREBBP", "FLT3", "DNMT3A", "FLT3",
"NPM1", "BCOR", "KIT", "DNMT3A", "IDH1", "NRAS", "BCOR",
"KRAS", "NPM1", "PTPN11", "ETV6", "PHF6", "TET2", "DNMT3A",
"KRAS", "NPM1", "WT1", "TET2", "WT1", "DNMT3A", "FLT3", "NPM1",
"NRAS", "WT1", "DNMT3A", "IDH2", "NPM1", "SRSF2", "ATRX",
"CUX1", "CUX1", "FLT3", "GNAS", "PHF6", "PIGA", "PIGA", "PRPF40B",
"PTPN11", "TET2", "IDH1", "IDH2", "RUNX1", "U2AF1", "TET2",
"TP53", "DNMT3A", "IDH2", "ATRX", "GATA2", "STAG2", "TP53",
"IDH2", "SRSF2", "ASXL1", "GATA1", "KDM6A", "STAG2", "TP53",
"IDH2", "JAK2", "SRSF2", "ASXL1", "RIT1", "KRAS", "NPM1",
"NRAS", "NRAS", "BCOR", "MYD88", "FLT3", "NPM1", "NRAS",
"TET2", "TET2", "DNMT3A", "IDH1", "NPM1", "CREBBP", "DNMT3A",
"IDH1", "IDH2", "NPM1", "FLT3", "FLT3", "GATA2", "SH2B3",
"FLT3", "NPM1", "KDM6A", "SMC1A", "IDH2", "SRSF2", "ASXL2",
"RUNX1", "IDH2", "JAK2", "NPM1", "JAK2", "SRSF2", "STAG2"
), gene2 = c("TET2", "TP53", "TET2", "ASXL1", "DNMT3A", NA,
"PTPN11", "TP53", "TP53", "TET2", "DNMT3A", NA, "TET2", "PTEN",
NA, NA, "BRAF", "CBL", "TET2", "TET2", "JAK2", "SRSF2", NA,
"DNMT3A", "IDH1", "NPM1", NA, "FLT3", "DNMT3A", "FLT3", "NPM1",
NA, NA, "DNMT3A", "IDH1", "NRAS", "BCOR", "KRAS", "NPM1",
"PTPN11", "ETV6", "PHF6", "TET2", "DNMT3A", "KRAS", "NPM1",
NA, "TET2", NA, "DNMT3A", "FLT3", "NPM1", "NRAS", NA, NA,
"IDH2", "NPM1", "SRSF2", NA, "CALR", NA, NA, NA, NA, NA,
NA, NA, NA, NA, "IDH1", "IDH2", "RUNX1", "U2AF1", "TET2",
NA, "DNMT3A", "IDH2", NA, NA, NA, NA, "IDH2", "SRSF2", "ASXL1",
NA, NA, "KMT2D", "TP53", "IDH2", "JAK2", "SRSF2", "ASXL1",
NA, "KRAS", "NPM1", "NRAS", "NRAS", NA, NA, "FLT3", "NPM1",
"NRAS", "TET2", "TET2", "DNMT3A", "IDH1", "NPM1", "CREBBP",
"DNMT3A", "IDH1", "IDH2", "NPM1", "FLT3", "FLT3", NA, NA,
"FLT3", "NPM1", NA, "SMC1A", "IDH2", "SRSF2", NA, "RUNX1",
"IDH2", "JAK2", "NPM1")), class = "data.frame", row.names = c(NA,
-127L))
```
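Taken literally (IDs on the x-axis, two side-by-side bars per ID), the request can be met by reshaping to long format, dropping the `NA` calls, and counting rows per ID and column. The sketch below is one possible approach, not the answer's code; it assumes dplyr, tidyr and ggplot2 are installed, rebuilds only the six-row excerpt shown above, and uses the column names `method` and `gene` and the object names `counts` and `p`, which are illustrative choices.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# The six-row excerpt from the question, rebuilt by hand
df <- data.frame(
  ID    = c(4602, 4602, 4602, 5095, 5095, 5095),
  gene1 = c("TET2", "TP53", "TET2", "ASXL1", "DNMT3A", "NPM1"),
  gene2 = c("TET2", "TP53", "TET2", "ASXL1", "DNMT3A", NA)
)

# One row per call; NA calls are dropped so they are not counted
counts <- df %>%
  pivot_longer(cols = c(gene1, gene2), names_to = "method",
               values_to = "gene", values_drop_na = TRUE) %>%
  count(ID, method)

# Two dodged bars per ID: one for gene1, one for gene2
p <- ggplot(counts, aes(x = factor(ID), y = n, fill = method)) +
  geom_col(position = position_dodge()) +
  labs(x = "ID", y = "Number of mutations", fill = NULL)
```

Calling `print(p)` draws the chart; for the excerpt, `counts$n` is 3 and 3 for ID 4602 and 3 and 2 for ID 5095. The same pipeline should apply unchanged to the full `dput()` data.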
Answer 1

Here is another one:
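```r
library(dplyr)
library(tidyr)
library(ggplot2)

df %>%
  group_by(ID) %>%
  add_count(gene1, gene2) %>%  # n = rows sharing this gene1/gene2 pair within the ID
  pivot_longer(
    cols = contains("gene")
  ) %>%
  ungroup() %>%
  # note: geom_col() stacks the n of every row, so repeated pairs are counted more than once
  ggplot(aes(factor(name), n, fill = value, group = value, label = value)) +
  geom_col() +
  facet_wrap(~ ID, scales = "free_y") +
  geom_text(size = 3, position = position_stack(vjust = 0.5)) +
  theme_classic() +
  xlab("") +
  guides(fill = "none")  # guides(fill = FALSE) is deprecated since ggplot2 3.3.4
```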
Last attempt; this should work now. This time we count before pivoting:
```r
library(dplyr)
library(tidyr)
library(ggplot2)

df %>%
  group_by(ID) %>%
  add_count(gene1, gene2) %>%  # count within each ID before pivoting
  pivot_longer(
    cols = contains("gene")
  ) %>%
  ungroup() %>%
  ggplot(aes(factor(name), n, fill = value, group = value)) +
  geom_col() +
  facet_wrap(~ ID, scales = "free_y") +
  theme(legend.position = "bottom") +
  guides(fill = guide_legend(nrow = 2))  # two-row legend at the bottom
```
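Both snippets attach the per-ID count `n` to every row with `add_count()` and then let `geom_col()` stack those values. A small check on the three 4602 rows shows why some bars end up too tall: a gene1/gene2 pair that occurs k times carries n = k on each of its k rows, so the stacked segment is k × k instead of k. This sketch writes the count as `add_count(ID, gene1, gene2)`, which should be equivalent to the answer's `group_by(ID)` followed by `add_count(gene1, gene2)`; the names `long`, `stacked` and `actual` are illustrative.

```r
library(dplyr)
library(tidyr)

# The three 4602 rows from the question's excerpt
df <- data.frame(
  ID    = c(4602, 4602, 4602),
  gene1 = c("TET2", "TP53", "TET2"),
  gene2 = c("TET2", "TP53", "TET2")
)

# Same steps as the answer, without the plotting
long <- df %>%
  add_count(ID, gene1, gene2) %>%
  pivot_longer(cols = contains("gene"))

# What geom_col() would stack per column: the sum of n over all rows
stacked <- long %>%
  group_by(name) %>%
  summarise(height = sum(n))

# The real number of non-NA calls per column
actual <- df %>%
  summarise(gene1 = sum(!is.na(gene1)), gene2 = sum(!is.na(gene2)))

stacked  # height is 5 for both gene1 and gene2
actual   # 3 calls in each column
```

The two TET2 rows contribute 2 + 2 and the TP53 row contributes 1, which gives 5 rather than 3.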
Comments:
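This seems to be it, but I need the IDs on the x-axis.

Please see my update. We can adjust it toward your final result!

I checked your last update, but I need two adjacent bars per ID (e.g. for 4602): one from gene1 and one from gene2. What you did conveys roughly the same information, but it is harder to interpret.

We're basically there; your last update is what I was looking for. The only problem is that when I apply it to my whole df (127 rows, 3 columns), it looks right for most IDs, but in some cases it gives huge counts. I will edit the question to add the dput() so you can see what I mean. Check the bars for ID 4479.

Awesome! Thanks a lot.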
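The inflated bars for ID 4479 are consistent with the stacking of `n`: that ID also has many `NA` entries in `gene2`, and in the answer's code those rows are stacked as an `NA` segment too. One way to keep the faceted layout while getting true counts is to pivot first, drop the `NA` calls with `values_drop_na = TRUE`, and let `geom_bar()` count rows instead of summing `n`. This is a hedged sketch, not the answerer's final code; it assumes dplyr, tidyr and ggplot2 are installed and uses the six-row excerpt from the question, and the names `long`, `p` and `heights` are illustrative.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# The six-row excerpt from the question, rebuilt by hand
df <- data.frame(
  ID    = c(4602, 4602, 4602, 5095, 5095, 5095),
  gene1 = c("TET2", "TP53", "TET2", "ASXL1", "DNMT3A", "NPM1"),
  gene2 = c("TET2", "TP53", "TET2", "ASXL1", "DNMT3A", NA)
)

# Pivot first and drop missing calls, so each mutation is exactly one row
long <- df %>%
  pivot_longer(cols = contains("gene"), values_drop_na = TRUE)

p <- ggplot(long, aes(x = name, fill = value)) +
  geom_bar() +  # counts rows, so each mutation contributes exactly 1
  facet_wrap(~ ID, scales = "free_y") +
  theme(legend.position = "bottom") +
  guides(fill = guide_legend(nrow = 2))

# Total bar heights that geom_bar() will draw per ID and column
heights <- count(long, ID, name)
```

Because `geom_bar()` counts rows rather than summing a precomputed column, repeated gene pairs no longer inflate the bars, and the 5095 `gene2` bar is 2 high because the missing NPM1 call is dropped.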