Find the average of every factor in dataframe column based on month of dates column [duplicate]

Posted: 2022-01-22 22:51:54

Question:

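I have the data frame below and I would like to find the average of every level/factor of the Area column based on the month of Closed_Date. So basically I want a new data frame with the columns Area, Date (year and month) and Average Sold Price.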

subs<-structure(list(Sold_Pr = c(6500, 173000, 60000, 73000, 155000, 
105000, 140000, 39900, 73500, 46000, 99900, 180000, 164000, 120000, 
206000, 160000, 67400, 215000, 145000, 175000, 350000, 425000, 
435000, 490000, 545000, 585000, 170000, 229900, 652000, 472500, 
520000, 690000, 320000, 560000, 710000, 632000, 680000, 439000, 
770000, 725000, 580000, 775000, 490000, 470000, 605000, 640000, 
563000, 575000, 620000, 520000), Area = structure(c(1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 
4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L), .Label = c("411", 
"415", "981", "8001", "8002", "8003", "8004", "8005", "8006", 
"8007", "8008", "8009", "8010", "8011", "8012", "8013", "8014", 
"8015", "8016", "8017", "8018", "8019", "8020", "8021", "8022", 
"8023", "8024", "8025", "8026", "8027", "8028", "8029", "8030", 
"8031", "8034", "8035", "8037", "8038", "8039", "8040", "8041", 
"8042", "8043", "8044", "8045", "8046", "8047", "8048", "8049", 
"8050", "8051", "8052", "8053", "8055", "8056", "8057", "8058", 
"8059", "8060", "8061", "8062", "8063", "8064", "8065", "8066", 
"8067", "8068", "8069", "8070", "8071", "8072", "8073", "8074", 
"8075", "8076", "8077"), class = "factor"), Closed_Date = structure(c(18668, 
18933, 18716, 18740, 18639, 18845, 18708, 18676, 18733, 18695, 18715, 18709, 18794, 18803, 18750, 18787, 18906, 18810, 18855, 
18870, 18626, 18786, 18808, 18864, 18961, 18914, 18865, 18704, 
18661, 18747, 18676, 18659, 18696, 18802, 18689, 18873, 18836, 
18809, 18823, 18851, 18967, 18893, 18660, 18626, 18810, 18655, 
18661, 18719, 18647, 18863), class = "Date")), row.names = c(NA, 
50L), class = c("tbl_df", "tbl", "data.frame"))

Comments:

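Once you get past the conversion from Date to a year-month string (using my substr or ThomasIsCoding's format), this really is just "mean by group", which is a duplicate. I hope the answers help! (You can still accept or upvote answers.) If I have missed something, please @ping me and I can reopen/un-dupe it.

Answer 1: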

Do you mean something like the following?

> aggregate(cbind(Average_Sold_Pr = Sold_Pr) ~ Area + cbind(Date = format(Closed_Date, "%Y-%m")), subs, mean)
   Area    Date Average_Sold_Pr
1   415 2020-12        350000.0
2  8002 2020-12        470000.0
3   411 2021-01        155000.0
4  8002 2021-01        630000.0
5   411 2021-02         23200.0
6  8001 2021-02        620666.7
7  8002 2021-02        526500.0
8   411 2021-03        105180.0
9  8001 2021-03        419966.7
10  411 2021-04         73250.0
11 8001 2021-04        472500.0
12 8002 2021-04        575000.0
13  411 2021-05        206000.0
14  411 2021-06        148000.0
15  415 2021-06        430000.0
16 8001 2021-06        560000.0
17  411 2021-07        215000.0
18 8001 2021-07        629666.7
19 8002 2021-07        605000.0
20  411 2021-08        141666.7
21  415 2021-08        490000.0
22  981 2021-08        170000.0
23 8001 2021-08        725000.0
24 8002 2021-08        520000.0
25 8001 2021-09        703500.0
26  411 2021-10         67400.0
27  415 2021-10        585000.0
28  411 2021-11        173000.0
29  415 2021-11        545000.0
30 8001 2021-12        580000.0
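
The one-line call above can also be read as two separate steps: derive a year-month key from the date column, then average within each Area/key pair. The following is only a minimal sketch of that reading, not the answerer's code. The data frame `toy` and its values are hypothetical and merely reuse the column names of `subs`.

```r
# Hypothetical mini data set with the same column names as `subs`
toy <- data.frame(
  Sold_Pr     = c(100, 300, 200, 400),
  Area        = factor(c("411", "411", "411", "415")),
  Closed_Date = as.Date(c("2021-02-03", "2021-02-20", "2021-03-05", "2021-02-11"))
)

# Step 1: derive a year-month key ("YYYY-MM") from the Date column
toy$Date <- format(toy$Closed_Date, "%Y-%m")

# Step 2: average Sold_Pr within each Area / Date combination
avg <- aggregate(Sold_Pr ~ Area + Date, data = toy, FUN = mean)

# Rename the value column to match the output shown above
names(avg)[names(avg) == "Sold_Pr"] <- "Average_Sold_Pr"
print(avg)
```

Storing the key as its own column first makes it easy to inspect before aggregating, at the cost of one extra line.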

Comments:

How can I convert Date to Date format in order to plot it with ggplot?

as.Date(paste0(subs$Date, "-01")) would be a good start.

Answer 2:

library(dplyr)  # provides %>%, mutate(), group_by(), summarize(), n()

subs %>%
  mutate(Date = substr(Closed_Date, 1, 7)) %>%
  group_by(Date, Area) %>%
  summarize(Sold_Pr = mean(Sold_Pr), n = n()) %>%
  ungroup()
# # A tibble: 30 x 4
#    Date    Area  Sold_Pr     n
#    <chr>   <fct>   <dbl> <int>
#  1 2020-12 415   350000      1
#  2 2020-12 8002  470000      1
#  3 2021-01 411   155000      1
#  4 2021-01 8002  630000      2
#  5 2021-02 411    23200      2
#  6 2021-02 8001  620667.     3
#  7 2021-02 8002  526500      2
#  8 2021-03 411   105180      5
#  9 2021-03 8001  419967.     3
# 10 2021-04 411    73250      2
# # ... with 20 more rows

(I added n to show that rows are being aggregated; you do not need to keep it in your code.)
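
Both answers depend on turning a Date into a "YYYY-MM" key, and a comment on the first answer suggests pasting "-01" onto that key to get a Date back for plotting. The sketch below illustrates both points in base R. The vector `closed` is a hypothetical stand-in for `subs$Closed_Date`, and the other variable names are made up for illustration.

```r
# Hypothetical dates, standing in for subs$Closed_Date
closed <- as.Date(c("2020-12-30", "2021-02-10", "2021-11-02"))

# Two ways to build the year-month key. substr() first coerces the Date
# to its "YYYY-MM-DD" character form, then keeps the first 7 characters.
key_substr <- substr(closed, 1, 7)
key_format <- format(closed, "%Y-%m")
print(all(key_substr == key_format))

# Turn the key back into a Date (first day of each month), which a
# date axis in a plot can use directly
month_start <- as.Date(paste0(key_format, "-01"))
print(month_start)
```

Using the first day of the month is an arbitrary but common choice; any fixed day would give a valid Date for each group.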
