Find the average of every factor in dataframe column based on month of dates column [duplicate]
【Posted】: 2022-01-22 22:51:54
【Question】: I have the dataframe below and I would like to find the average of every level/factor of the Area column based on the month of Closed_Date. So basically I want a new dataframe with the columns Area, Date (year and month) and Average Sold Price.
subs<-structure(list(Sold_Pr = c(6500, 173000, 60000, 73000, 155000,
105000, 140000, 39900, 73500, 46000, 99900, 180000, 164000, 120000,
206000, 160000, 67400, 215000, 145000, 175000, 350000, 425000,
435000, 490000, 545000, 585000, 170000, 229900, 652000, 472500,
520000, 690000, 320000, 560000, 710000, 632000, 680000, 439000,
770000, 725000, 580000, 775000, 490000, 470000, 605000, 640000,
563000, 575000, 620000, 520000), Area = structure(c(1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L), .Label = c("411",
"415", "981", "8001", "8002", "8003", "8004", "8005", "8006",
"8007", "8008", "8009", "8010", "8011", "8012", "8013", "8014",
"8015", "8016", "8017", "8018", "8019", "8020", "8021", "8022",
"8023", "8024", "8025", "8026", "8027", "8028", "8029", "8030",
"8031", "8034", "8035", "8037", "8038", "8039", "8040", "8041",
"8042", "8043", "8044", "8045", "8046", "8047", "8048", "8049",
"8050", "8051", "8052", "8053", "8055", "8056", "8057", "8058",
"8059", "8060", "8061", "8062", "8063", "8064", "8065", "8066",
"8067", "8068", "8069", "8070", "8071", "8072", "8073", "8074",
"8075", "8076", "8077"), class = "factor"), Closed_Date = structure(c(18668,
18933, 18716, 18740, 18639, 18845, 18708, 18676, 18733, 18695, 18715, 18709, 18794, 18803, 18750, 18787, 18906, 18810, 18855,
18870, 18626, 18786, 18808, 18864, 18961, 18914, 18865, 18704,
18661, 18747, 18676, 18659, 18696, 18802, 18689, 18873, 18836,
18809, 18823, 18851, 18967, 18893, 18660, 18626, 18810, 18655,
18661, 18719, 18647, 18863), class = "Date")), row.names = c(NA,
50L), class = c("tbl_df", "tbl", "data.frame"))
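【Question comments】:
Once you understand how to convert from Date to a year-month string (using my substr or ThomasIsCoding's format), this is really just "mean by group", which makes it a duplicate. Hope the answers help! (You can still accept or upvote the answers.) If I've missed something, please @ping me and I can reopen/un-dupe.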
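The year-month conversion mentioned in the comment can be sketched in isolation. This is a minimal illustration, assuming three day counts copied from Closed_Date in the question; the names d, ym_format and ym_substr are made up for the example.

```r
# Three values from Closed_Date (days since 1970-01-01), rebuilt as a Date vector
d <- as.Date(c(18626, 18668, 18716), origin = "1970-01-01")

# Option 1: format() with a year-month pattern
ym_format <- format(d, "%Y-%m")

# Option 2: first seven characters of the ISO date string
# (substr(d, 1, 7) also works, because substr() coerces with as.character())
ym_substr <- substr(as.character(d), 1, 7)

ym_format
# [1] "2020-12" "2021-02" "2021-03"

# Both approaches yield the same grouping key
identical(ym_format, ym_substr)
```

Either string can then serve as the grouping column for a by-group mean.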
【Answer 1】:
Do you mean something like the below?
> aggregate(cbind(Average_Sold_Pr = Sold_Pr) ~ Area + cbind(Date = format(Closed_Date, "%Y-%m")), subs, mean)
   Area    Date Average_Sold_Pr
 1  415 2020-12        350000.0
 2 8002 2020-12        470000.0
 3  411 2021-01        155000.0
 4 8002 2021-01        630000.0
 5  411 2021-02         23200.0
 6 8001 2021-02        620666.7
 7 8002 2021-02        526500.0
 8  411 2021-03        105180.0
 9 8001 2021-03        419966.7
10  411 2021-04         73250.0
11 8001 2021-04        472500.0
12 8002 2021-04        575000.0
13  411 2021-05        206000.0
14  411 2021-06        148000.0
15  415 2021-06        430000.0
16 8001 2021-06        560000.0
17  411 2021-07        215000.0
18 8001 2021-07        629666.7
19 8002 2021-07        605000.0
20  411 2021-08        141666.7
21  415 2021-08        490000.0
22  981 2021-08        170000.0
23 8001 2021-08        725000.0
24 8002 2021-08        520000.0
25 8001 2021-09        703500.0
26  411 2021-10         67400.0
27  415 2021-10        585000.0
28  411 2021-11        173000.0
29  415 2021-11        545000.0
30 8001 2021-12        580000.0
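The aggregate() call builds the Date column inside the formula. A possibly easier-to-read variant computes the column first and then aggregates. The sketch below is self-contained and uses only five rows whose values are taken from subs; sample_rows and res are illustrative names, not part of the original answer.

```r
# Five rows with values from `subs` (dates given as days since 1970-01-01)
sample_rows <- data.frame(
  Sold_Pr     = c(6500, 39900, 155000, 350000, 470000),
  Area        = factor(c("411", "411", "411", "415", "8002")),
  Closed_Date = as.Date(c(18668, 18676, 18639, 18626, 18626),
                        origin = "1970-01-01")
)

# Step 1: derive the year-month key as its own column
sample_rows$Date <- format(sample_rows$Closed_Date, "%Y-%m")

# Step 2: mean of Sold_Pr for each Area/Date combination
res <- aggregate(Sold_Pr ~ Area + Date, data = sample_rows, FUN = mean)

# Step 3: rename the value column to match the output above
names(res)[names(res) == "Sold_Pr"] <- "Average_Sold_Pr"
res
```

On this sample the two February rows for Area 411 collapse into one row with a mean of 23200, which agrees with row 5 of the full output.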
【Comments】:
How can I convert Date to a Date format so that I can plot it with ggplot?
as.Date(paste0(subs$Date, "-01")) would be a good start.
【Answer 2】:
library(dplyr)

subs %>%
  mutate(Date = substr(Closed_Date, 1, 7)) %>%   # year-month key, e.g. "2021-02"
  group_by(Date, Area) %>%
  summarize(Sold_Pr = mean(Sold_Pr), n = n()) %>%
  ungroup()
# # A tibble: 30 x 4
# Date Area Sold_Pr n
# <chr> <fct> <dbl> <int>
# 1 2020-12 415 350000 1
# 2 2020-12 8002 470000 1
# 3 2021-01 411 155000 1
# 4 2021-01 8002 630000 2
# 5 2021-02 411 23200 2
# 6 2021-02 8001 620667. 3
# 7 2021-02 8002 526500 2
# 8 2021-03 411 105180 5
# 9 2021-03 8001 419967. 3
# 10 2021-04 411 73250 2
# # ... with 20 more rows
(I added n to show that rows are being aggregated; you don't need to keep it in your code.)
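Regarding the plotting comment under Answer 1: the year-month strings can be turned back into Date values by appending a day, as the commenter suggested. The sketch below assumes the conversion is meant for the aggregated result rather than for subs itself (subs has no Date column); res is a hypothetical stand-in holding three rows from the output above.

```r
# Hypothetical stand-in for the aggregated result (three rows from the output above)
res <- data.frame(
  Area            = factor(c("415", "411", "411")),
  Date            = c("2020-12", "2021-01", "2021-02"),
  Average_Sold_Pr = c(350000, 155000, 23200)
)

# Append "-01" so each month is represented by its first day, then parse as Date
res$Date <- as.Date(paste0(res$Date, "-01"))

class(res$Date)
# [1] "Date"

# With ggplot2 installed, a date axis then works directly, for example:
# library(ggplot2)
# ggplot(res, aes(Date, Average_Sold_Pr, colour = Area)) + geom_point()
```

Using the first of the month is an arbitrary convention; any fixed day would place the points consistently on the axis.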