Plot several categorical data on a line graph or scatter plot of date by count
【Title】: Plot several categorical data on a line graph or scatter plot of date by count 【Posted】: 2021-07-06 02:02:36 【Question】: I have some data like this:
year car_type
1 1993 sport
2 1994 sport
3 1945 family
4 1955 off-road
5 1998 sport
6 1966 off-road
7 2001 super
8 1999 super
9 2010 super
10 1988 off-road
11 1988 off-road
12 1988 sport
13 2014 sport
14 2056 super
15 2022 family
16 2022 family
17 2008 family
18 2001 off-road
19 2018 super
20 2008 family
21 2020 sport
22 2013 sport
23 2014 super
24 2015 off-road
25 2014 off-road
26 2013 sport
27 2013 super
28 2014 super
29 2020 off-road
30 2020 sport
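Note: both year and car_type can appear more than once.
I want to draw a line graph or scatter plot where the x-axis is the year and the y-axis is the number of cars that appear in that year (for any car_type).
I can work out how to draw multiple lines from https://r-graphics.org/recipe-line-graph-multiple-line, but I don't know how to draw a line graph of a variable against its number of occurrences. So the x-axis is the date and y is the number of occurrences on that date. The same goes for the scatter plot.
I can express the same idea as a stacked bar chart (figure not reproduced here).
However, that does not show how these cars appear over time. Any help would be much appreciated.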
【Answer 1】: Maybe you are interested in this kind of solution?
library(tidyverse)
library(lubridate) # for working with dates
library(scales)    # to access breaks/formatting functions

df %>%
  # one row per year, with the number of cars (any car_type) in column N
  count(year, name = "N") %>%
  arrange(year) %>%
  # turn the numeric year into a Date (1 January of that year)
  mutate(year = lubridate::ymd(year, truncated = 2L)) %>%
  ggplot(aes(x = year, y = N)) +
  # in ggplot2 >= 3.4.0, use linewidth = 1 instead of size = 1 for lines
  geom_line(color = "steelblue", size = 1) +
  geom_point() +
  scale_x_date(breaks = date_breaks("5 years"), date_labels = "%Y") +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 60, hjust = 1)) +
  xlab("year") +
  ylab("Cars(N)") +
  ylim(0, 6) +
  ggtitle("Cars per year")
Data:
df <- data.frame(id = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
11, 12, 13, 14, 15, 16, 17, 18, 19,
20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30),
year = c(1993, 1994, 1945, 1955, 1998, 1966, 2001, 1999,
2010, 1988, 1988, 1988, 2014, 2056, 2022, 2022, 2008, 2001, 2018,
2008, 2020, 2013, 2014, 2015, 2014, 2013, 2013, 2014, 2020, 2020),
car_type = c("sport", "sport", "family", "off-road", "sport",
"off-road", "super", "super", "super", "off-road", "off-road",
"sport", "sport", "super", "family", "family", "family", "off-road",
"super", "family", "sport", "sport", "super", "off-road", "off-road",
"sport", "super", "super", "off-road", "sport"))
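The pipeline in this answer collapses all car types into a single yearly total. If one line per car_type is wanted instead, a minimal sketch could count per year and type and map car_type to colour. The name per_type and the axis labels are illustrative choices, not part of the original answer, and the data frame repeats the 30 rows from the question.

```r
library(dplyr)
library(ggplot2)

# Same 30 rows as in the question (the id column is omitted)
df <- data.frame(
  year = c(1993, 1994, 1945, 1955, 1998, 1966, 2001, 1999, 2010, 1988,
           1988, 1988, 2014, 2056, 2022, 2022, 2008, 2001, 2018, 2008,
           2020, 2013, 2014, 2015, 2014, 2013, 2013, 2014, 2020, 2020),
  car_type = c("sport", "sport", "family", "off-road", "sport", "off-road",
               "super", "super", "super", "off-road", "off-road", "sport",
               "sport", "super", "family", "family", "family", "off-road",
               "super", "family", "sport", "sport", "super", "off-road",
               "off-road", "sport", "super", "super", "off-road", "sport")
)

# One row per (year, car_type) combination that occurs, with its count in n
# (per_type is a hypothetical name chosen for this sketch)
per_type <- count(df, year, car_type)

# One line and one set of points per car_type
p <- ggplot(per_type, aes(x = year, y = n, colour = car_type)) +
  geom_line() +
  geom_point() +
  labs(x = "year", y = "Cars (N)", colour = "car type")
p
```

Combinations that never occur are simply absent from per_type rather than recorded as zero, so each line connects only the years in which that type appears.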
【Answer 2】: Here is a version based on your question, drawing a scatter plot from the data in the question.
library(ggplot2)
library(dplyr)
The problem with a simple scatter plot is that, when one axis is discrete, points overlap, as in the first example.
ggplot(df)+
geom_point(aes(year, car))
To make the plot more informative, you can summarise the data as the number of cars per category and year, like this:
# count rows per (year, car) pair; .groups = "drop" returns an ungrouped result
df1 <-
  df %>%
  group_by(year, car) %>%
  summarise(count = n(), .groups = "drop")
ggplot(df1)+
geom_point(aes(year, car, size = count))+
scale_size_continuous(breaks = unique(df1$count))
Data
df <- structure(list(id = 2:30, year = c(1994L, 1945L, 1955L, 1998L,
1966L, 2001L, 1999L, 2010L, 1988L, 1988L, 1988L, 2014L, 2056L,
2022L, 2022L, 2008L, 2001L, 2018L, 2008L, 2020L, 2013L, 2014L,
2015L, 2014L, 2013L, 2013L, 2014L, 2020L, 2020L), car = c("sport",
"family", "off-road", "sport", "off-road", "super", "super",
"super", "off-road", "off-road", "sport", "sport", "super", "family",
"family", "family", "off-road", "super", "family", "sport", "sport",
"super", "off-road", "off-road", "sport", "super", "super", "off-road",
"sport")), class = "data.frame", row.names = c(NA, -29L))
Created on 2021-04-10 by the reprex package (v2.0.0)
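【Answer 3】: In ggplot2, a layer has two important components: the geom and the stat. Some layers, such as geom_bar(), automatically come with a non-identity stat, in this case stat_count(). If you want to replicate the behaviour of geom_bar() with geom_line(), you need to give the layer the right stat.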
library(ggplot2)
# Assuming 'data' is a data.frame with the data you've posted
ggplot(data, aes(year, colour = car_type)) +
geom_line(stat = "count")
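To see what the count stat hands to the geom, one option is to inspect the built layer with layer_data(). The sketch below uses a small made-up data frame standing in for 'data' (it is not the question's data), and the name computed is hypothetical.

```r
library(ggplot2)

# Small made-up data frame standing in for 'data'
data <- data.frame(
  year = c(1988, 1988, 1988, 2013, 2013, 2014),
  car_type = c("off-road", "off-road", "sport", "sport", "sport", "super")
)

p <- ggplot(data, aes(year, colour = car_type)) +
  geom_line(stat = "count")

# layer_data() returns the layer's data after the stat has run:
# one row per (year, car_type) pair, with the tally in the count column
computed <- layer_data(p)
computed[, c("x", "count")]
```

The counts are computed per colour group, so every row of the input contributes to exactly one entry and the count column sums to the number of input rows.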