Rolling join in data.table (R)

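Hello, I would like to perform a rolling join in R using the data.table package. Joining on the "Date" column produces multiple matches, so I would like to use data.table's by argument with the "Field" column to prevent data from different fields being joined together. Example data: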

d1<-structure(list(Field = c("6", "W62", "6", "6", "12S", "19-1", 
"6", "6", "12S", "7", "6", "12S", "W62", "6", "12S", "W62", "12S", 
"6", "6", "7", "12S", "12S", "W62", "7", "12S", "6", "12S", "7", 
"12S", "7", "6", "7", "12S", "7", "6", "6", "6", "6", "12S", 
"7", "7", "6", "6", "12S", "7", "12S", "12S", "12S", "19-1", 
"6"), Date = structure(c(16994, 17240, 17240, 17401, 17048, 17417, 
17387, 17394, 17382, 17414, 17029, 17403, 17045, 17359, 17179, 
17281, 17152, 16972, 16987, 17042, 17282, 17415, 17281, 17266, 
17179, 17190, 17057, 17380, 17280, 17178, 17178, 17343, 17373, 
17043, 17190, 17343, 17253, 16981, 17079, 17043, 17270, 17366, 
16981, 17357, 17366, 17415, 17079, 17190, 17385, 17008), class = "Date"), 
    NlbsAcre = c(NA, 18874.6557383659, 2477.08251404958, NA, 
    NA, 19658.0054165823, NA, NA, 12621.0827111083, NA, NA, 16764.41968227, 
    16764.9745173044, NA, 7671.24950330348, 21341.6444661863, 
    5197.26333885612, NA, NA, NA, 39560.8958554292, 18162.4040880297, 
    22578.1487456647, 15842.9161753361, 3613.95523726973, 2601.07083566694, 
    17766.9873538952, NA, 44728.1837479613, 2279.60909695434, 
    2014.7720270382, NA, 14847.7006686211, NA, 3082.31758038481, 
    NA, 2427.53558465175, NA, 23641.2999848709, NA, NA, NA, NA, 
    5928.31591997149, NA, 22162.2028819815, 18972.2228621189, 
    6534.4257935542, 12630.9231775315, NA)), .Names = c("Field", 
"Date", "NlbsAcre"), class = c("data.table", "data.frame"), row.names = c(NA, 
-50L))
# The printed .internal.selfref pointer is not valid R and was dropped; setDT() restores it
data.table::setDT(d1)

d2<-structure(list(Field = c("6", "W62", "7", "12S", "19-1", "12S", 
"6", "6", "19-1", "19-1", "6", "7", "W62", "19-1", "12S", "7", 
"19-1", "7", "12S", "12S", "12S", "7", "6", "7", "6", "7", "W62", 
"19-1", "6", "6", "12S", "12S", "6", "6", "12S", "6", "12S", 
"19-1", "6", "W62", "W62", "6", "7", "7", "6", "19-1", "W62", 
"6", "12S", "7"), Date = structure(c(16993, 17140, 17208, 17443, 
17063, 16948, 17415, 16926, 17316, 16922, 16981, 17043, 17219, 
17252, 17392, 17244, 17179, 17017, 17042, 17031, 17013, 17104, 
17273, 16954, 17364, 16993, 17168, 17028, 17208, 16966, 17241, 
16945, 17038, 17169, 17379, 17183, 17238, 17054, 17244, 16952, 
17044, 17359, 17219, 17303, 17007, 17151, 16926, 17178, 17382, 
17364), class = "Date"), TotN = c(79.244802845739, 94.193700050628, 
21.075505564932, 692.152760834712, 224.689064446728, 172.576578578436, 
47.406177406404, 102.53239575903, 818.80997295717, 476.174916307807, 
125.828033450364, 58.270026966444, 75.465909993456, 435.049246131543, 
337.913876678769, 31.714327953234, 305.353940577156, 72.621457768224, 
393.815453005314, 428.540114240892, 318.97091713563, 73.888113736431, 
79.0380747113805, 147.493527174027, 65.5311189906495, 59.269732271703, 
119.390398108236, 110.706003557451, 21.96790939404, 149.060445984684, 
128.143343232486, 208.621943093862, 75.770138571561, 47.496596179338, 
132.723654607278, 43.92222198012, 145.150910469252, 215.88105225024, 
21.393670871196, 72.969536052, 86.335878117078, 103.524169592979, 
19.920230115264, 44.968722966108, 62.244487239885, 338.593490463303, 
96.7285416279375, 45.537296152302, 422.630318314444, 58.5336350807685
)), .Names = c("Field", "Date", "TotN"), class = c("data.table", 
"data.frame"), row.names = c(NA, -50L))
# The printed .internal.selfref pointer is not valid R and was dropped; setDT() restores it
data.table::setDT(d2)

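Here is my attempt to perform a rolling join on the "Date" column, grouped by the "Field" column. Obviously I could split the data by Field and process each piece separately, but I would like to avoid that option.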

>d1[d2, roll = "nearest", on = .(Date),by=.(Field)]

Error in `[.data.table`(d1, d2, roll = "nearest", on = .(Date), by = .(Field)) : 
  'by' or 'keyby' is supplied but not j
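Answer

You are almost there.

You can join on multiple columns at once. So you can include Field in the on clause (Date comes last because it is the column used for the rolling join):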

library(data.table)
d1[d2, roll = "nearest", on = .(Field, Date)]
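
To make the behaviour easier to see than with the 50-row example data, here is a minimal sketch on two small, made-up tables. The tables x and i and the columns val and obs are hypothetical and not part of the question. It illustrates that Field has to match exactly, while Date, listed last in on, is rolled to the nearest value within the matching Field.

```r
library(data.table)

# Hypothetical toy tables (not the data from the question)
x <- data.table(
  Field = c("A", "A", "B"),
  Date  = as.Date(c("2017-01-01", "2017-01-10", "2017-01-02")),
  val   = c(1, 2, 3)
)
i <- data.table(
  Field = c("A", "B"),
  Date  = as.Date(c("2017-01-08", "2017-01-09")),
  obs   = c(10, 20)
)

# Field is matched exactly; Date (last in `on`) is rolled to the nearest value
res <- x[i, roll = "nearest", on = .(Field, Date)]
print(res)
```

The row for Field "B" picks up val from the only "B" row in x, even though a row of Field "A" is closer in time. This is the separation the question tried to get with by.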

To make the result easier to verify, it can be ordered:

d1[d2, roll = "nearest", on = .(Field, Date)][order(Field, Date)]
    Field       Date  NlbsAcre      TotN
 1:   12S 2016-05-24        NA 208.62194
 2:   12S 2016-05-27        NA 172.57658
 3:   12S 2016-07-31        NA 318.97092
 4:   12S 2016-08-18        NA 428.54011
 5:   12S 2016-08-29        NA 393.81545
 6:   12S 2017-03-13 44728.184 145.15091
 7:   12S 2017-03-16 44728.184 128.14334
 8:   12S 2017-08-01 12621.083 132.72365
 9:   12S 2017-08-04 12621.083 422.63032
10:   12S 2017-08-14 12621.083 337.91388
11:   12S 2017-10-04 22162.203 692.15276
12:  19-1 2016-05-01 12630.923 476.17492
13:  19-1 2016-08-15 12630.923 110.70600
14:  19-1 2016-09-10 12630.923 215.88105
15:  19-1 2016-09-19 12630.923 224.68906
16:  19-1 2016-12-16 12630.923 338.59349
17:  19-1 2017-01-13 12630.923 305.35394
18:  19-1 2017-03-27 12630.923 435.04925
19:  19-1 2017-05-30 12630.923 818.80997
20:     6 2016-05-05        NA 102.53240
21:     6 2016-06-14        NA 149.06045
22:     6 2016-06-29        NA 125.82803
23:     6 2016-06-29        NA 125.82803
24:     6 2016-07-11        NA  79.24480
25:     6 2016-07-25        NA  62.24449
26:     6 2016-08-25        NA  75.77014
27:     6 2017-01-03  2014.772  47.49660
28:     6 2017-01-12  2014.772  45.53730
29:     6 2017-01-17  2014.772  43.92222
30:     6 2017-02-11  3082.318  21.96791
31:     6 2017-03-19  2477.083  21.39367
32:     6 2017-04-17  2427.536  79.03807
33:     6 2017-07-12        NA 103.52417
34:     6 2017-07-17        NA  65.53112
35:     6 2017-09-06        NA  47.40618
36:     7 2016-06-02        NA 147.49353
37:     7 2016-07-11        NA  59.26973
38:     7 2016-08-04        NA  72.62146
39:     7 2016-08-30        NA  58.27003
40:     7 2016-08-30        NA  58.27003
41:     7 2016-10-30        NA  73.88811
42:     7 2017-02-11  2279.609  21.07551
43:     7 2017-02-22  2279.609  19.92023
44:     7 2017-03-19 15842.916  31.71433
45:     7 2017-05-17        NA  44.96872
46:     7 2017-07-17        NA  58.53364
47:   W62 2016-05-05 16764.975  96.72854
48:   W62 2016-05-31 16764.975  72.96954
49:   W62 2016-08-31 16764.975  86.33588
50:   W62 2016-12-05 16764.975  94.19370
51:   W62 2017-01-02 18874.656 119.39040
52:   W62 2017-02-22 18874.656  75.46591
    Field       Date  NlbsAcre      TotN
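
The Date column in the joined result comes from the table passed as i (d2 above), so the output does not show which date was matched in the other table. One possible way to check this, sketched here on made-up data, is to copy the date into an extra column before joining. The tables x and i and the columns matched_date and gap_days are hypothetical names chosen for illustration.

```r
library(data.table)

# Hypothetical toy tables (not the data from the question)
x <- data.table(
  Field = c("A", "A"),
  Date  = as.Date(c("2017-01-01", "2017-01-10")),
  val   = c(1, 2)
)
i <- data.table(
  Field = "A",
  Date  = as.Date("2017-01-08"),
  obs   = 10
)

# Copy x's own date so that it is still visible after the join
x[, matched_date := Date]

res <- x[i, roll = "nearest", on = .(Field, Date)]

# Signed distance in days between the looked-up date and the matched date
res[, gap_days := as.numeric(Date - matched_date)]
print(res)
```

Large absolute values of gap_days would flag rows where the nearest available date is far from the requested one.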
