Plot several categorical data on a line graph or scatter plot of date by count

【Title】: Plot several categorical data on a line graph or scatter plot of date by count 【Posted】: 2021-07-06 02:02:36 【Question】:

I have some data like this:

            year     car_type
    1       1993     sport
    2       1994     sport
    3       1945     family
    4       1955     off-road
    5       1998     sport
    6       1966     off-road
    7       2001     super
    8       1999     super
    9       2010     super
    10      1988     off-road
    11      1988     off-road
    12      1988     sport
    13      2014     sport
    14      2056     super
    15      2022     family
    16      2022     family
    17      2008     family
    18      2001     off-road
    19      2018     super
    20      2008     family
    21      2020     sport
    22      2013     sport
    23      2014     super
    24      2015     off-road
    25      2014     off-road
    26      2013     sport
    27      2013     super
    28      2014     super
    29      2020     off-road
    30      2020     sport

Note: both year and car_type can occur more than once.

I want to draw a line graph or scatter plot where the x-axis is the year and the y-axis is the number of times a car occurs in that year (any car_type).

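From https://r-graphics.org/recipe-line-graph-multiple-line I can work out how to draw multiple lines, but I don't know how to draw a line graph of a variable against its number of occurrences. So the x-axis is the date and the y-axis is the number of occurrences on that date. The same goes for a scatter plot.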
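To make the target concrete, here is a minimal base R sketch of "count per year, then plot". It is only one possible reading of the question: the data frame name `cars` and the result name `per_year` are assumptions for illustration, and the id column is left out.

```r
# Data from the question (id column omitted); `cars` is an assumed name
cars <- data.frame(
  year = c(1993, 1994, 1945, 1955, 1998, 1966, 2001, 1999, 2010, 1988,
           1988, 1988, 2014, 2056, 2022, 2022, 2008, 2001, 2018, 2008,
           2020, 2013, 2014, 2015, 2014, 2013, 2013, 2014, 2020, 2020),
  car_type = c("sport", "sport", "family", "off-road", "sport",
               "off-road", "super", "super", "super", "off-road",
               "off-road", "sport", "sport", "super", "family",
               "family", "family", "off-road", "super", "family",
               "sport", "sport", "super", "off-road", "off-road",
               "sport", "super", "super", "off-road", "sport")
)

# Number of rows per year, regardless of car_type
per_year <- as.data.frame(table(year = cars$year), responseName = "n")

# table() returns the years as a factor; convert back to numbers for plotting
per_year$year <- as.numeric(as.character(per_year$year))

# Line-and-point plot of count against year
plot(per_year$year, per_year$n, type = "b",
     xlab = "year", ylab = "count")
```

Years with no rows are simply absent from `per_year`, so the line connects neighbouring observed years directly rather than dropping to zero in between.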

I can express the same idea as a stacked bar chart:

However, that does not show how these cars occur over time. Any help would be much appreciated.


【Answer 1】:

Perhaps you are interested in this kind of solution?

library(tidyverse)
library(lubridate) # for working with dates
library(scales)    # to access breaks/formatting functions

df %>%
  # count all cars per year, regardless of car_type
  count(year, name = "N") %>%
  arrange(year) %>%
  # turn the numeric year into a Date (1 January of that year)
  mutate(year = lubridate::ymd(year, truncated = 2L)) %>%
  ggplot(aes(x = year, y = N)) +
  geom_line(color = "steelblue", size = 1) +
  geom_point() +
  scale_x_date(breaks = date_breaks("5 year"), date_labels = "%Y") +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 60, hjust = 1)) +
  xlab("year") +
  ylab("Cars(N)") +
  ylim(0, 6) +
  ggtitle("Cars per year")


Data:

df <- data.frame(id = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 
       11, 12, 13, 14, 15, 16, 17, 18, 19, 
       20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30),
year = c(1993, 1994, 1945, 1955, 1998, 1966, 2001, 1999,
         2010, 1988, 1988, 1988, 2014, 2056, 2022, 2022, 2008, 2001, 2018, 
         2008, 2020, 2013, 2014, 2015, 2014, 2013, 2013, 2014, 2020, 2020), 
car_type = c("sport", "sport", "family", "off-road", "sport", 
             "off-road", "super", "super", "super", "off-road", "off-road", 
             "sport", "sport", "super", "family", "family", "family", "off-road", 
             "super", "family", "sport", "sport", "super", "off-road", "off-road",
             "sport", "super", "super", "off-road", "sport"))


【Answer 2】:

Here is a version based on your question that draws a scatter plot from the data in the question.

library(ggplot2)
library(dplyr)

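The problem with a simple scatter plot is that when one axis is discrete, identical points are drawn on top of each other, as in this first example.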

ggplot(df)+
  geom_point(aes(year, car)) 

To make the chart more meaningful, you can summarise the data into the number of cars per category and year, like this:


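df1 <-
  df %>%
  group_by(year, car) %>%
  # .groups = "drop" returns an ungrouped result and silences the regrouping message
  summarise(count = n(), .groups = "drop")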
 
ggplot(df1)+
  geom_point(aes(year, car, size = count))+
  scale_size_continuous(breaks = unique(df1$count))
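As a possible shortcut, ggplot2's `geom_count()` counts the observations at each location and maps that count to point size, so a separate summarising step may not be needed. The sketch below assumes a data frame named `cars` with the column called `car` (as in this answer's data) and uses only the 2013, 2014 and 2020 rows from the question.

```r
library(ggplot2)

# The 2013, 2014 and 2020 rows from the question; `cars` is an assumed name
cars <- data.frame(
  year = c(2013, 2013, 2013, 2014, 2014, 2014, 2014, 2020, 2020, 2020),
  car = c("sport", "sport", "super", "sport", "super",
          "off-road", "super", "sport", "off-road", "sport")
)

# geom_count() draws one point per distinct (year, car) pair,
# sized by how many rows share that pair
p <- ggplot(cars, aes(year, car)) +
  geom_count()

print(p)
```

The explicit `summarise()` route is still useful when the counts themselves are needed as a data frame, for example to set the legend breaks as done above.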

Data

df <- structure(list(
  id = 2:30,
  year = c(1994L, 1945L, 1955L, 1998L, 1966L, 2001L, 1999L, 2010L, 1988L,
           1988L, 1988L, 2014L, 2056L, 2022L, 2022L, 2008L, 2001L, 2018L,
           2008L, 2020L, 2013L, 2014L, 2015L, 2014L, 2013L, 2013L, 2014L,
           2020L, 2020L),
  car = c("sport", "family", "off-road", "sport", "off-road", "super",
          "super", "super", "off-road", "off-road", "sport", "sport",
          "super", "family", "family", "family", "off-road", "super",
          "family", "sport", "sport", "super", "off-road", "off-road",
          "sport", "super", "super", "off-road", "sport")
), class = "data.frame", row.names = c(NA, -29L))

Created on 2021-04-10 by the reprex package (v2.0.0)


【Answer 3】:

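In ggplot2, a layer has two important components: a geom and a stat. Some layers, such as geom_bar(), automatically come with a non-identity stat attached, in this case stat_count(). If you want to replicate the behaviour of geom_bar() with geom_line(), you need to give the layer the right stat.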

library(ggplot2)

# Assuming 'data' is a data.frame with the data you've posted
ggplot(data, aes(year, colour = car_type)) +
  geom_line(stat = "count")
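The same stat can be attached to a point layer as well, which gives the scatter-plot variant asked about in the question. The following is a sketch under the assumption that the data frame is called `cars` and, for brevity, holds only the 2013, 2014 and 2020 rows from the question.

```r
library(ggplot2)

# The 2013, 2014 and 2020 rows from the question; `cars` is an assumed name
cars <- data.frame(
  year = c(2013, 2013, 2013, 2014, 2014, 2014, 2014, 2020, 2020, 2020),
  car_type = c("sport", "sport", "super", "sport", "super",
               "off-road", "super", "sport", "off-road", "sport")
)

# Both layers use stat_count(), so y is the number of rows
# per year within each car_type
p <- ggplot(cars, aes(year, colour = car_type)) +
  geom_line(stat = "count") +
  geom_point(stat = "count")

print(p)

# Inspect the counts that the stat computed for the line layer
counts <- layer_data(p, 1)
counts[, c("x", "count", "group")]
```

Because the counting happens inside the layer, no pre-aggregated data frame is created; `layer_data()` is one way to check what was actually plotted.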
