使用 ggplot 绘制 300 多个单位的热图

Posted

技术标签:

【中文标题】使用 ggplot 绘制 300 多个单位的热图【英文标题】:Heatmap using ggplot for 300+ units 【发布时间】:2021-10-11 08:18:48 【问题描述】:

我正在尝试使用我在网上找到的以下代码 (https://simplystatistics.org/2019/08/28/you-can-replicate-almost-any-plot-with-ggplot2/) 通过 ggplot 生成热图:

library(dslabs)
data(us_contagious_diseases)
the_disease <- "Measles"
dat <- us_contagious_diseases %>%
  filter(!state%in%c("Hawaii","Alaska") & disease == the_disease) %>%
  mutate(rate = count / population * 10000 * 52 / weeks_reporting) 

jet.colors <- colorRampPalette(c("#F0FFFF", "cyan", "#007FFF", "yellow", "#FFBF00", "orange", "red", "#7F0000"), bias = 2.25)

dat %>% mutate(state = reorder(state, desc(state))) %>%
  ggplot(aes(year, state, fill = rate)) +
  geom_tile(color = "white", size = 0.35) +
  scale_x_continuous(expand = c(0,0)) +
  scale_fill_gradientn(colors = jet.colors(16), na.value = 'white') +
  geom_vline(xintercept = 1963, col = "black") +
  theme_minimal() + 
  theme(panel.grid = element_blank()) +
        coord_cartesian(clip = 'off') +
        ggtitle(the_disease) +
        ylab("") +
        xlab("") +  
        theme(legend.position = "bottom", text = element_text(size = 8)) + 
        annotate(geom = "text", x = 1963, y = 50.5, label = "Vaccine introduced", size = 3, hjust = 0)

结果:

这一切都很好,但在我的数据集中我有 320 个单位而不是 50 个。我使用上面示例中的数据创建了一个类似的场景:

library(dslabs)
data(us_contagious_diseases)
the_disease <- "Measles"
dat <- us_contagious_diseases %>%
  filter(!state%in%c("Hawaii","Alaska") & disease == the_disease) %>%
  mutate(rate = count / population * 10000 * 52 / weeks_reporting) 

dat1 <- dat

levels(dat1$state) <- c("State1_1","State1_2","State1_3","State1_4","State1_5","State1_6","State1_7","State1_8","State1_9","State1_10",
                        "State1_11","State1_12","State1_13","State1_14","State1_15","State1_16","State1_17","State1_18","State1_19","State1_20",
                        "State1_21","State1_22","State1_23","State1_24","State1_25","State1_26","State1_27","State1_28","State1_29","State1_30",
                        "State1_31","State1_32","State1_33","State1_34","State1_35","State1_36","State1_37","State1_38","State1_39","State1_40",             
                        "State1_41","State1_42","State1_43","State1_44","State1_45","State1_46","State1_47","State1_48","State1_49","State1_50","State1_51")                      

dat2 <- dat

levels(dat2$state) <- c("State2_1","State2_2","State2_3","State2_4","State2_5","State2_6","State2_7","State2_8","State2_9","State2_10",
                        "State2_11","State2_12","State2_13","State2_14","State2_15","State2_16","State2_17","State2_18","State2_19","State2_20",
                        "State2_21","State2_22","State2_23","State2_24","State2_25","State2_26","State2_27","State2_28","State2_29","State2_30",
                        "State2_31","State2_32","State2_33","State2_34","State2_35","State2_36","State2_37","State2_38","State2_39","State2_40",             
                        "State2_41","State2_42","State2_43","State2_44","State2_45","State2_46","State2_47","State2_48","State2_49","State2_50","State2_51")                      

dat3 <- dat

levels(dat3$state) <- c("State3_1","State3_2","State3_3","State3_4","State3_5","State3_6","State3_7","State3_8","State3_9","State3_10",
                        "State3_11","State3_12","State3_13","State3_14","State3_15","State3_16","State3_17","State3_18","State3_19","State3_20",
                        "State3_21","State3_22","State3_23","State3_24","State3_25","State3_26","State3_27","State3_28","State3_29","State3_30",
                        "State3_31","State3_32","State3_33","State3_34","State3_35","State3_36","State3_37","State3_38","State3_39","State3_40",             
                        "State3_41","State3_42","State3_43","State3_44","State3_45","State3_46","State3_47","State3_48","State3_49","State3_50","State3_51")                      

dat4 <- dat

levels(dat4$state) <- c("State4_1","State4_2","State4_3","State4_4","State4_5","State4_6","State4_7","State4_8","State4_9","State4_10",
                        "State4_11","State4_12","State4_13","State4_14","State4_15","State4_16","State4_17","State4_18","State4_19","State4_20",
                        "State4_21","State4_22","State4_23","State4_24","State4_25","State4_26","State4_27","State4_28","State4_29","State4_30",
                        "State4_31","State4_32","State4_33","State4_34","State4_35","State4_36","State4_37","State4_38","State4_39","State4_40",             
                        "State4_41","State4_42","State4_43","State4_44","State4_45","State4_46","State4_47","State4_48","State4_49","State4_50","State4_51")                      

dat5 <- dat

levels(dat5$state) <- c("State5_1","State5_2","State5_3","State5_4","State5_5","State5_6","State5_7","State5_8","State5_9","State5_10",
                        "State5_11","State5_12","State5_13","State5_14","State5_15","State5_16","State5_17","State5_18","State5_19","State5_20",
                        "State5_21","State5_22","State5_23","State5_24","State5_25","State5_26","State5_27","State5_28","State5_29","State5_30",
                        "State5_31","State5_32","State5_33","State5_34","State5_35","State5_36","State5_37","State5_38","State5_39","State5_40",             
                        "State5_41","State5_42","State5_43","State5_44","State5_45","State5_46","State5_47","State5_48","State5_49","State5_50","State5_51")                      

dat <- rbind(dat,dat1,dat2,dat3,dat4,dat5)
jet.colors <- colorRampPalette(c("#F0FFFF", "cyan", "#007FFF", "yellow", "#FFBF00", "orange", "red", "#7F0000"), bias = 2.25)

dat %>% mutate(state = reorder(state, desc(state))) %>%
  ggplot(aes(year, state, fill = rate)) +
  geom_tile(color = "white", size = 0.35) +
  scale_x_continuous(expand = c(0,0)) +
  scale_fill_gradientn(colors = jet.colors(16), na.value = 'white') +
  geom_vline(xintercept = 1963, col = "black") +
  theme_minimal() + 
  theme(panel.grid = element_blank()) +
  coord_cartesian(clip = 'off') +
  ggtitle(the_disease) +
  ylab("") +
  xlab("") +  
  theme(legend.position = "bottom", text = element_text(size = 8)) + 
  annotate(geom = "text", x = 1963, y = 50.5, label = "Vaccine introduced", size = 3, hjust = 0)

结果:

在这种情况下,热图太长,不向下滚动就无法在屏幕上看到。关于如何在屏幕上放置所有这 320 个单位而不使正方形太小,有什么想法吗?

【问题讨论】:

这真的取决于你想在这里展示什么。无论如何,320 个数据点对于人类观察者来说实在是太多了。甚至是您用作灵感的原始热图 has been criticised on datavis grounds。一般来说,答案是:汇总数据,绘制摘要。 感谢康拉德。不确定我是否理解您分享的链接中的批评。我浏览了链接并看到了一种不同的数据呈现方式,但并不认为这是对热图的批评。热图有什么问题? 问题是缺少摘要。该图的热图版本都将 50 多个状态显示为单独的数据点,但它们不提供统计摘要的可视化。这使得麻疹图特别难以解释,因为它显示病例数与疫苗的引入之间没有因果关系。它所显示的只是“一段时间后”的下降(也没有显示病例数的惊人稳定性)。相比之下,包含汇总统计数据清楚地表明,病例下降立即随着疫苗的引入而开始。 感谢康拉德。我可以澄清一下 - “统计摘要”的可视化是什么意思?在您共享的链接中,我认为这些图中的任何一个都没有统计摘要?我试图理解您所说的“包含汇总统计数据清楚地表明病例下降立即开始于疫苗的引入”的意思。 第一个图包含一个统计摘要:黄色回归线。 【参考方案1】:

根据屏幕(或纸张大小),它的实用性存在实际限制。看来您正面临这个限制。

您可以旋转情节,因为屏幕往往比高度更宽,但这会扰乱情节明显的从左到右的时间顺序。

我认为您唯一的其他现实选择是将绘图按其他变量拆分,或者以某种方式汇总 y 轴上的类别,并隐藏 y 轴上的大部分标签。在您创建的示例中,您可以将它们分组为地理区域而不是单个州,例如东海岸、西海岸、南部等。任何适用于您的真实数据的东西。如果 y 轴上的所有标签都很重要,则必须拆分图,因为它们目前基本上不可读。

【讨论】:

以上是关于使用 ggplot 绘制 300 多个单位的热图的主要内容,如果未能解决你的问题,请参考以下文章

绘制时间序列热图时提高 ggplotly 的性能

R语言可视化分别使用lattice包和ggplot2包可视化热图(heatmap)并绘制热力图对应的系统树图(dendrogram)实战

R 数据可视化 —— 聚类热图 pheatmap

ggplot2绘制环状聚类热图

使用 R 创建热图

ggplot中带有圆圈而不是瓷砖的热图