Ordering a stacked bar graph by second variable changing over time

【Title】: Ordering a stacked bar graph by second variable changing over time 【Posted】: 2021-12-17 06:01:38 【Question】:

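I've looked at a number of answers (here, here, here), but none of them yields the result I want. I have a dataset of volume by industry over time. I want stacked bars that are ordered by volume within each month. This means the stacked bars should have a different order in each month whenever the relative volumes change in that month.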

Here is a truncated sample of the data:

test <- structure(list(Date = structure(c(18506, 18506, 18506, 18506, 
18506, 18506, 18536, 18536, 18536, 18536, 18536, 18536, 18567, 
18567, 18567, 18567, 18567, 18567), class = "Date"), Industry = c("Investment", 
"Telecoms", "Mortgage & Loans", "Banking", "Insurance", "Credit Cards", 
"Telecoms", "Investment", "Mortgage & Loans", "Banking", "Credit Cards", 
"Insurance", "Investment", "Telecoms", "Mortgage & Loans", "Credit Cards", 
"Insurance", "Banking"), volume = c(775349, 811294, 3144684, 
4427814, 7062691, 9377254, 1210194, 1735033, 3539406, 6952688, 
8858649, 9076391, 670934, 869452, 3542294, 5132132, 6953113, 
6954535)), row.names = c(NA, -18L), groups = structure(list(Date = structure(c(18506, 
18536, 18567), class = "Date"), .rows = structure(list(1:6, 7:12, 
    13:18), ptype = integer(0), class = c("vctrs_list_of", "vctrs_vctr", 
"list"))), row.names = c(NA, -3L), class = c("tbl_df", "tbl", 
"data.frame"), .drop = TRUE), class = c("grouped_df", "tbl_df", 
"tbl", "data.frame"))

Here is the code for the current graph:

#A. Library
library(dplyr)
library(ggplot2)
library(ggtext)
library(scales)

#B. Graph
graph <- test %>%
    
    ggplot(aes(x=Date)) +
    
    ##1. Bar graph
    geom_bar(aes(x=Date, y=volume, fill = Industry), stat="identity") +
    
    ##2. Graph title and Axis labels
    ggtitle(label = "**Volume**",
            subtitle = "By Industry") +
    ylab("Volume (Millions)") + 
    xlab("") +
    
    ##3. Scales
    scale_fill_manual(values=c("#e3120b", "#336666", "#FB9851", "#acc8d4", 
                               "#dbcc98", "#36E2BD")) +
    scale_x_date(date_breaks = "month", labels = scales::label_date_short()) +
    scale_y_continuous(labels = unit_format(unit = "M", scale = 1e-6, 
                                            accuracy = 1)) + 
    
    ##4. Theme
    guides(fill = guide_legend(ncol = 2, nrow = 3)) +
    theme_minimal() +
    theme(text = element_text(family = "Georgia"),
          panel.border=element_blank(), 
          axis.line=element_line(), 
          plot.title = element_markdown(color="black", size=14, hjust = .5),
          plot.subtitle = element_text(hjust = .5),
          axis.title.x = element_text(size = 9, color = "grey30"), 
          axis.title.y = element_text(size = 9, color = "grey30"), 
          legend.box.background = element_rect(color="black", size=.5),
          legend.title = element_blank(),
          legend.text = element_text(size = 6),
          legend.position = "bottom",
          strip.background = element_rect(linetype="solid"),
          panel.grid.minor.y = element_line(color = NA),
          panel.grid.minor.x = element_line(color = NA),
          plot.caption = ggtext::element_markdown(hjust = 1, size = 7, 
                                                  color = "#7B7D7D"))  

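As I understand it, ggplot orders stacked bars by factor level order. I tried test %>% arrange(Date, volume), but then got stuck on how to change the factor order by month rather than setting one static order for the factor. I could create a separate bar for each month with a separate factor, but that becomes cumbersome if I want to add multiple years to the chart.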
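To illustrate the limitation described above, here is a minimal sketch using base R only. The `toy` data frame and its values are made up for illustration and are not part of the question's data. A single call to `reorder()` produces one global level order, so every month would be stacked the same way even though the per-month rankings differ.

```r
# Toy data (hypothetical, for illustration only): two months, three industries
toy <- data.frame(
  month    = rep(c("2020_9", "2020_10"), each = 3),
  Industry = rep(c("Banking", "Insurance", "Telecoms"), times = 2),
  volume   = c(5, 9, 1,   # 2020_9:  Insurance > Banking > Telecoms
               8, 2, 3)   # 2020_10: Banking > Telecoms > Insurance
)

# One global ordering: levels sorted by total volume across all months
toy$Industry <- reorder(toy$Industry, toy$volume, FUN = sum)

# A factor has exactly one set of levels, shared by every month
levels(toy$Industry)
```

Because the levels belong to the factor as a whole, a per-month order needs a separate factor (or a separate data subset) for each month.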

Any help is appreciated!

【Comments】:

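"I could create a separate bar for each month with a separate factor" - I don't think you will get around this. But I also don't think it needs to be too cumbersome.

@tjebo As an example, how would you handle the three months of data I have in the test dataset? Maybe it is less clunky than I think.

【Answer 1】: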

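I took the liberty of boiling your example down to the essentials. As per my comment, I don't think there is a way around defining the factor levels for each month separately. But you can do this in a function, create a list, and make use of the fact that a list of layers can be added to a ggplot object.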

This approach scales: no matter how many months you have, the code stays the same. :)
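The list behaviour can be seen in isolation with a minimal sketch. The `toy` data frame and the names `layer_list` and `p` are hypothetical and only serve the illustration. Each element of the list is an ordinary `geom_col()` layer with its own data, and adding the list to `ggplot()` adds every layer it contains.

```r
library(ggplot2)

# Toy data (hypothetical, for illustration only)
toy <- data.frame(x = c("a", "b"), y = c(1, 2))

# Build one layer per subset and collect the layers in a plain list
layer_list <- lapply(split(toy, toy$x), function(d) {
  geom_col(data = d, mapping = aes(x = x, y = y))
})

# Adding the list adds each layer in turn
p <- ggplot() + layer_list

# Number of layers now attached to the plot
length(p$layers)
```

Since each layer carries its own data, each layer can also carry its own factor levels, which is what the code below relies on.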

library(tidyverse)
library(lubridate)

test <- 
  test %>% 
  ## it's probably not necessary to order the data and 
  ## create the factor levels explicitly, but it gives more control
  arrange(Date) %>%
  mutate(year_mo = fct_inorder(paste(year(Date), month(Date), sep = "_")))

## split the new data by month and create different factor levels
ls_test <- 
  test %>%
  split(., .$year_mo) %>%
  map(function(x) { x$Industry <- fct_reorder(x$Industry, x$volume); x })

## make your geom_col list (geom_col is equivalent to geom_bar(stat = "identity"))
ls_p_col <- map(ls_test, function(x)
  geom_col(data = x, mapping = aes(x=year_mo, y=volume, fill = Industry))
)

# Voilà!
ggplot() +
  ls_p_col +
  scale_fill_brewer() +
  scale_x_discrete(limits = unique(test$year_mo)) # to force the correct order of your x
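One way to check that the split step really gives each month its own level order is to inspect the levels per piece. The sketch below uses base R (`split()`, `lapply()`, `reorder()`) in place of the tidyverse helpers, and the `toy` data is made up for illustration. It mirrors the idea of the answer but is not the answer's exact code.

```r
# Toy data (hypothetical, for illustration only): two months, three industries
toy <- data.frame(
  year_mo  = rep(c("2020_9", "2020_10"), each = 3),
  Industry = rep(c("Banking", "Insurance", "Telecoms"), times = 2),
  volume   = c(5, 9, 1,   # 2020_9
               8, 2, 3)   # 2020_10
)

# Split by month, then reorder Industry by volume within each piece
pieces <- lapply(split(toy, toy$year_mo), function(x) {
  x$Industry <- reorder(x$Industry, x$volume)
  x
})

# Each month now carries its own level order (ascending by volume)
month_levels <- lapply(pieces, function(x) levels(x$Industry))
month_levels
```

If the level vectors differ between months, the layers built from those pieces can be stacked in a different order per month.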
