ggplot bar plot with facet-dependent order of categories

Posted 2023-02-16

[Title]: ggplot bar plot with facet-dependent order of categories [Asked]: 2013-09-08 14:00:17 [Question]:

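I have seen many questions about how to (re)order the categories in a bar plot (often linked to Order Bars in ggplot2 bar graph).

What I am after is just a bit different, and I have not found a good way to do it: I have a faceted bar plot, and I would like to order the x axis independently for each facet, according to another variable (in my case, that variable is just the y value itself, i.e. I simply want the bars to go in increasing length within each facet).

A simple example, following e.g. Order Bars in ggplot2 bar graph: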

library(ggplot2)

df <- data.frame(name=c('foo','bar','foo','bar'),period=c('old','old','recent','recent'),val=c(1.23,2.17,4.15,3.65))
# reorder() builds a single ordering of name that is shared by every facet
p = ggplot(data = df, aes(x = reorder(name, val), y = val))
p = p + geom_bar(stat='identity')
p = p + facet_grid(~period)
p

We get the following:

Whereas what I want is:

[Comments on the question]:

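Oh my! Are you writing a sequel to How to Lie with Statistics? The only way to do this is to make separate plots and combine them with grid.arrange from the gridExtra package. But I agree that it generally will not produce a very good plot. (You will find this a lot in ggplot: if something is really hard to do, it is probably because it is trying to stop you from doing something silly. Not always, but a lot...)

Yes, thanks. Not very useful, but thanks anyway. In the context where we use it, this is an important plot, and the order of the categories is deliberate. I boiled it down to a minimal example here, but in our application we rank a dozen signals by the additivity they achieve, and having the bars all over the place in some facets is not acceptable.

I understand the motivation; it is just that most people misunderstand why faceting is designed the way it is. Facets are explicitly meant for the case where every panel shares the same scales. There are cases where you want multiple plots that do not share a common scale, but faceting is not the right tool for that. You are essentially talking about multiple separate plots, hence grid.arrange. But most people simply assume that faceting = arranging multiple roughly similar plots.

Well, honestly, the order of categories on a discrete_scale (e.g. alphabetical, or some overall order by the mean of y) is somewhat arbitrary anyway, so the notion that multiple facets must share the same categorical scale feels a bit contrived to me. To me it makes more sense to decide that x is a rank by some metric when displaying categories, and to let the labels fall wherever they may in each facet. In that sense, the common scale shared by all facets is the numeric metric. It is a bit like drawing text labels in a scatter plot.

[Answer 1]: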

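OK, all philosophizing aside, and in case anyone is interested, here is an ugly hack to do it. The idea is to use different labels (think paste(period, name), except that I replace the period with 0 spaces, 1 space, etc., so that they do not show). I need this plot, and I do not want to arrange grobs and the like, because I may want to share a common legend, etc.

The atomic example given earlier becomes: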

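library(plyr)     # ddply()
library(ggplot2)

df <- data.frame(name=c('foo','bar','foo','bar'),
  period=c('old','old','recent','recent'),
  val=c(1.23,2.17,4.15,3.65),
  stringsAsFactors=FALSE)
# n = index of the facet (1 for 'old', 2 for 'recent')
df$n = as.numeric(factor(df$period))
# prefix each name with n-1 spaces so the label is unique per facet but looks the same
df = ddply(df,.(period,name),transform, x=paste(c(rep(' ',n-1), name), collapse=''))
# order the padded labels by val
df$x = factor(df$x, levels=df[order(df$val), 'x'])
p = ggplot(data = df, aes(x = x, y = val))
p = p + geom_bar(stat='identity')
p = p + facet_grid(~period, scales='free_x')
p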
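
To see why the hack works, it can help to look at the padded labels on their own, without plotting. The following is a minimal base R sketch of the same idea; it uses strrep() instead of the ddply() call above (my substitution, not the answer's code), and the printed levels show that every bar gets its own level, sorted by val.

```r
# Same toy data as in the answer
df <- data.frame(name   = c("foo", "bar", "foo", "bar"),
                 period = c("old", "old", "recent", "recent"),
                 val    = c(1.23, 2.17, 4.15, 3.65),
                 stringsAsFactors = FALSE)

# Facet index: "old" -> 1, "recent" -> 2
n <- as.integer(factor(df$period))

# Prefix each name with n - 1 spaces: unique per facet, visually identical
df$x <- paste0(strrep(" ", n - 1), df$name)

# One level per bar, ordered by val
df$x <- factor(df$x, levels = df$x[order(df$val)])
print(levels(df$x))
```

Because every level appears in exactly one facet, free x scales then drop the unused levels per panel and keep the remaining ones in val order.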

Another example, still a bit silly but closer to my actual use case:

# mpg ships with ggplot2; average the combined city + highway mileage
df <- ddply(mpg, .(year, manufacturer), summarize, mixmpg = mean(cty+hwy))
df$manufacturer = as.character(df$manufacturer)
df$n = as.numeric(factor(df$year))
# pad the manufacturer name with n-1 spaces, one padding width per facet
df = ddply(df, .(year,manufacturer), transform,
     x=paste(c(rep(' ',n-1), manufacturer), collapse=''))
df$x = factor(df$x, levels=df[order(df$mixmpg), 'x'])
p = ggplot(data = df, aes(x = x, y = mixmpg))
p = p + geom_bar(stat='identity')
p = p + facet_grid(~year, scales='free_x')
p = p + theme(axis.text.x=element_text(angle=90,hjust=1,vjust=.5,colour='gray50'))
p

Close your eyes, think of the Empire, and try to enjoy.

[Comments]:

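I upvoted the answer because I think it is cool that this can be done without grid.arrange, but again I believe this can be very tricky, because our expectation of faceted plots is that the categories are arranged the same way across facets. This may be an innate expectation or a historical one, but the expectation exists, and violating it can be misleading.

I agree with @TylerRinker on both counts and voted accordingly. Another option that (IMHO) may be less confusing would be to suppress the axis labels entirely, and either use only the fill aesthetic (if there are just a few bars) or label the bars inside the plot, above each bar.

Thanks. Essentially, you suggest making x the rank (which is a consistent numeric value) and drawing the category text somewhere within each bar rather than as an axis label. That could be a problem if the bars for some categories are small, but I am always open to a diversity of opinions. Perhaps you could give an example, e.g. using the mpg data, so that we can see what it looks like. As a Tufte aficionado, bar plots would not be my first choice anyway, but they fit what Tyler calls "historical expectations" (in this case, my company's)...

[Answer 2]: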

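This is an old question, but it is being used as a dupe target. So it may be worthwhile to suggest a solution that uses a more recent enhancement of the ggplot2 package, namely the labels parameter of scale_x_discrete(). This avoids both using duplicate levels, which is deprecated, and manipulating factor labels by prepending a varying number of spaces.

Prepare the data

Here, the mpg dataset is used to allow comparison with this answer. For data manipulation, the data.table package is used here, but feel free to use whatever package you prefer.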

library(data.table)   # version 1.10.4
library(ggplot2)      # version 2.2.1
# aggregate data
df <- as.data.table(mpg)[, .(mixmpg = mean(cty + hwy)), by = .(year, manufacturer)]
# create dummy var which reflects order when sorted alphabetically
df[, ord := sprintf("%02i", frank(df, mixmpg, ties.method = "first"))]

Create the plot

# `ord` is plotted on x-axis instead of `manufacturer`
ggplot(df, aes(x = ord, y = mixmpg)) +
  # geom_col() is replacement for geom_bar(stat = "identity")
  geom_col() +
  # independent x-axis scale in each facet, 
  # drop absent factor levels (actually not required here)
  facet_wrap(~ year, scales = "free_x", drop = TRUE) +
  # use named character vector to replace x-axis labels
  scale_x_discrete(labels = df[, setNames(as.character(manufacturer), ord)]) + 
  # replace x-axis title
  xlab(NULL) +
  # rotate x-axis labels
  theme(axis.text.x = element_text(angle = 90, hjust=1, vjust=.5))
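
The same approach can be sketched with dplyr instead of data.table. This is my own minimal version under the assumption that dplyr (1.0 or later, for the .groups argument) and ggplot2 are installed; the helper name x_labels is made up for illustration. It builds the zero-padded rank ord and the named label vector in two explicit steps.

```r
library(dplyr)
library(ggplot2)

# Aggregate, then rank all bars by mixmpg; the zero-padded rank sorts
# alphabetically in the same order as numerically
df <- mpg %>%
  group_by(year, manufacturer) %>%
  summarise(mixmpg = mean(cty + hwy), .groups = "drop") %>%
  mutate(ord = sprintf("%02i", as.integer(rank(mixmpg, ties.method = "first"))))

# Named character vector: names are the values of ord, entries are the
# manufacturer labels to print on the axis instead
x_labels <- setNames(as.character(df$manufacturer), df$ord)

p <- ggplot(df, aes(x = ord, y = mixmpg)) +
  geom_col() +
  facet_wrap(~ year, scales = "free_x") +
  scale_x_discrete(labels = x_labels) +
  xlab(NULL) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = .5))
p
```

Since ord is unique per row, each facet keeps only its own ranks on the x axis, and the label lookup restores the manufacturer names.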

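[Comments]:

The same solution, but using dplyr instead of data.table: gist.github.com/holgerbrandl/2b216b2e3ec51d48b2be4d9f46f0ff5e

[Answer 3]:

According to this answer, there are a few different ways to achieve the OP's goal.

(1) A reorder_within() function reorders name within each period facet.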

library(tidyverse)
library(forcats)

df <- data.frame(
  name = c("foo", "bar", "foo", "bar"),
  period = c("old", "old", "recent", "recent"),
  val = c(1.23, 2.17, 4.15, 3.65)
)

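# Append the facet value to each category so that levels are unique per facet,
# then order those levels by `by`
reorder_within <- function(x, by, within, fun = mean, sep = "___", ...) {
  new_x <- paste(x, within, sep = sep)
  stats::reorder(new_x, by, FUN = fun)
}

# Strip the appended facet value again when drawing the axis labels
scale_x_reordered <- function(..., sep = "___") {
  reg <- paste0(sep, ".+$")
  ggplot2::scale_x_discrete(labels = function(x) gsub(reg, "", x), ...)
}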

ggplot(df, aes(reorder_within(name, val, period), val)) +
  geom_col() +
  scale_x_reordered() +
  facet_grid(period ~ ., scales = "free", space = "free") +
  coord_flip() +
  theme_minimal() +
  theme(panel.grid.major.y = element_blank())
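
Rather than defining the two helpers by hand, functions with the same names are also exported by the tidytext package. The sketch below assumes tidytext is installed and that its reorder_within() and scale_x_reordered() behave like the hand-written versions above; the column name name_in_period is made up for illustration.

```r
library(ggplot2)
library(tidytext)  # assumed installed; exports reorder_within() and scale_x_reordered()

df <- data.frame(
  name = c("foo", "bar", "foo", "bar"),
  period = c("old", "old", "recent", "recent"),
  val = c(1.23, 2.17, 4.15, 3.65)
)

# One level per (name, period) pair, ordered by val
df$name_in_period <- reorder_within(df$name, df$val, df$period)

p <- ggplot(df, aes(name_in_period, val)) +
  geom_col() +
  scale_x_reordered() +                    # strips the "___period" suffix from labels
  facet_wrap(~ period, scales = "free_x")  # each facet keeps only its own levels
p
```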

Or (2), a similar idea

### https://trinkerrstuff.wordpress.com/2016/12/23/ordering-categories-within-ggplot2-facets/
df %>% 
  mutate(name = reorder(name, val)) %>%
  group_by(period, name) %>% 
  arrange(desc(val)) %>% 
  ungroup() %>% 
  mutate(name = factor(paste(name, period, sep = "__"), 
                       levels = rev(paste(name, period, sep = "__")))) %>%
  ggplot(aes(name, val)) +
  geom_col() +
  facet_grid(period ~., scales = "free", space = 'free') +
  scale_x_discrete(labels = function(x) gsub("__.+$", "", x)) +
  coord_flip() +
  theme_minimal() +
  theme(panel.grid.major.y = element_blank()) + 
  theme(axis.ticks.y = element_blank())

Or (3) order the whole data frame, and also order the categories within each facet group (period)!

### https://drsimonj.svbtle.com/ordering-categories-within-ggplot2-facets
df2 <- df %>% 
  # 1. Remove any grouping
  ungroup() %>% 
  # 2. Arrange by
  #   i.  facet group (period)
  #   ii. value (val)
  arrange(period, val) %>%
  # 3. Add order column of row numbers
  mutate(order = row_number())
df2
#>   name period  val order
#> 1  foo    old 1.23     1
#> 2  bar    old 2.17     2
#> 3  bar recent 3.65     3
#> 4  foo recent 4.15     4

ggplot(df2, aes(order, val)) +
  geom_col() +
  facet_grid(period ~ ., scales = "free", space = "free") +
  coord_flip() +
  theme_minimal() +
  theme(panel.grid.major.y = element_blank())

# To finish we need to replace the numeric values on each x-axis 
# with the appropriate labels
ggplot(df2, aes(order, val)) +
  geom_col() +
  scale_x_continuous(
    breaks = df2$order,
    labels = df2$name) +
  # scale_y_continuous(expand = c(0, 0)) +
  facet_grid(period ~ ., scales = "free", space = "free") +
  coord_flip() +
  theme_minimal() +
  theme(panel.grid.major.y = element_blank()) + 
  theme(legend.position = "bottom",
        axis.ticks.y = element_blank())

Created on 2018-11-05 by the reprex package (v0.2.1.9000)

[Comments]:

The space argument no longer seems to exist in facet_wrap.

[Answer 4]:

Try this, it is really simple (just ignore the warning)

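library(ggplot2)

df <- data.frame(name = c('foo', 'bar', 'foo', 'bar'),
                 period = c('old', 'old', 'recent', 'recent'),
                 val = c(1.23, 2.17, 4.15, 3.65))

# sort by facet, then by value within the facet
d1 <- df[order(df$period, df$val), ]
# one factor value per bar; the labels repeat ('foo' and 'bar' occur twice),
# which is what triggers the warning about duplicated levels
sn <- factor(x = 1:4, labels = d1$name)
d1$sn <- sn
p <- ggplot(data = d1, aes(x = sn, y = val))
p <- p + geom_bar(stat = 'identity')
p <- p + facet_wrap(~ period, scales = 'free_x')
p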
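
A variant of this idea that does not rely on duplicated factor labels is sketched below. It assumes a current R version, where, as far as I know, factor() merges duplicated labels into a single level instead of warning (worth checking on your installation), so the per-facet order would be lost with the code above. The helper name x_labels is made up; the bar names are supplied through the labels argument of scale_x_discrete() instead.

```r
library(ggplot2)

df <- data.frame(name = c("foo", "bar", "foo", "bar"),
                 period = c("old", "old", "recent", "recent"),
                 val = c(1.23, 2.17, 4.15, 3.65))

# Sort by facet, then by value within the facet
d1 <- df[order(df$period, df$val), ]

# One unique level per bar ("1", "2", ...), in the sorted order
d1$sn <- factor(seq_len(nrow(d1)))

# Map each level back to the name that should be printed on the axis
x_labels <- setNames(as.character(d1$name), levels(d1$sn))

p <- ggplot(d1, aes(x = sn, y = val)) +
  geom_col() +
  facet_wrap(~ period, scales = "free_x") +
  scale_x_discrete(labels = x_labels) +
  xlab(NULL)
p
```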

[Comments]:

For the sake of completeness: the warning to be ignored reads: duplicated levels in factors are deprecated.
