在data.table R中滚动连接
Posted
tags:
篇首语:本文由小常识网(cha138.com)小编为大家整理,主要介绍了在data.table R中滚动连接相关的知识,希望对你有一定的参考价值。
您好我想使用data.table
包在R中执行滚动连接。在“日期”列上加入时有多个匹配,因此我想在“字段”列中使用by
中的data.table
参数,以防止来自不同字段的数据连接在一起。示例数据
d1<-structure(list(Field = c("6", "W62", "6", "6", "12S", "19-1",
"6", "6", "12S", "7", "6", "12S", "W62", "6", "12S", "W62", "12S",
"6", "6", "7", "12S", "12S", "W62", "7", "12S", "6", "12S", "7",
"12S", "7", "6", "7", "12S", "7", "6", "6", "6", "6", "12S",
"7", "7", "6", "6", "12S", "7", "12S", "12S", "12S", "19-1",
"6"), Date = structure(c(16994, 17240, 17240, 17401, 17048, 17417,
17387, 17394, 17382, 17414, 17029, 17403, 17045, 17359, 17179,
17281, 17152, 16972, 16987, 17042, 17282, 17415, 17281, 17266,
17179, 17190, 17057, 17380, 17280, 17178, 17178, 17343, 17373,
17043, 17190, 17343, 17253, 16981, 17079, 17043, 17270, 17366,
16981, 17357, 17366, 17415, 17079, 17190, 17385, 17008), class = "Date"),
NlbsAcre = c(NA, 18874.6557383659, 2477.08251404958, NA,
NA, 19658.0054165823, NA, NA, 12621.0827111083, NA, NA, 16764.41968227,
16764.9745173044, NA, 7671.24950330348, 21341.6444661863,
5197.26333885612, NA, NA, NA, 39560.8958554292, 18162.4040880297,
22578.1487456647, 15842.9161753361, 3613.95523726973, 2601.07083566694,
17766.9873538952, NA, 44728.1837479613, 2279.60909695434,
2014.7720270382, NA, 14847.7006686211, NA, 3082.31758038481,
NA, 2427.53558465175, NA, 23641.2999848709, NA, NA, NA, NA,
5928.31591997149, NA, 22162.2028819815, 18972.2228621189,
6534.4257935542, 12630.9231775315, NA)), .Names = c("Field",
"Date", "NlbsAcre"), class = c("data.table", "data.frame"), row.names = c(NA,
-50L), .internal.selfref = <pointer: 0x0000000006540788>)
d2<-structure(list(Field = c("6", "W62", "7", "12S", "19-1", "12S",
"6", "6", "19-1", "19-1", "6", "7", "W62", "19-1", "12S", "7",
"19-1", "7", "12S", "12S", "12S", "7", "6", "7", "6", "7", "W62",
"19-1", "6", "6", "12S", "12S", "6", "6", "12S", "6", "12S",
"19-1", "6", "W62", "W62", "6", "7", "7", "6", "19-1", "W62",
"6", "12S", "7"), Date = structure(c(16993, 17140, 17208, 17443,
17063, 16948, 17415, 16926, 17316, 16922, 16981, 17043, 17219,
17252, 17392, 17244, 17179, 17017, 17042, 17031, 17013, 17104,
17273, 16954, 17364, 16993, 17168, 17028, 17208, 16966, 17241,
16945, 17038, 17169, 17379, 17183, 17238, 17054, 17244, 16952,
17044, 17359, 17219, 17303, 17007, 17151, 16926, 17178, 17382,
17364), class = "Date"), TotN = c(79.244802845739, 94.193700050628,
21.075505564932, 692.152760834712, 224.689064446728, 172.576578578436,
47.406177406404, 102.53239575903, 818.80997295717, 476.174916307807,
125.828033450364, 58.270026966444, 75.465909993456, 435.049246131543,
337.913876678769, 31.714327953234, 305.353940577156, 72.621457768224,
393.815453005314, 428.540114240892, 318.97091713563, 73.888113736431,
79.0380747113805, 147.493527174027, 65.5311189906495, 59.269732271703,
119.390398108236, 110.706003557451, 21.96790939404, 149.060445984684,
128.143343232486, 208.621943093862, 75.770138571561, 47.496596179338,
132.723654607278, 43.92222198012, 145.150910469252, 215.88105225024,
21.393670871196, 72.969536052, 86.335878117078, 103.524169592979,
19.920230115264, 44.968722966108, 62.244487239885, 338.593490463303,
96.7285416279375, 45.537296152302, 422.630318314444, 58.5336350807685
)), .Names = c("Field", "Date", "TotN"), class = c("data.table",
"data.frame"), row.names = c(NA, -50L), .internal.selfref = <pointer: 0x0000000006540788>)
示例这是我尝试通过“字段”列在“日期”列上执行滚动连接。显然我可以通过Field分别处理数据并单独处理,但我想避免使用该选项。
>d1[d2, roll = "nearest", on = .(Date),by=.(Field)]
Error in `[.data.table`(d1, d2, roll = "nearest", on = .(Date), by = .(Field)) :
'by' or 'keyby' is supplied but not j
答案
你快到了。
您可以同时加入多个列。所以,你可以在Field
子句中包含on
(Date
是最后的,因为它将用于滚动连接):
library(data.table)
d1[d2, roll = "nearest", on = .(Field, Date)]
为了更好地验证,可以订购结果
d1[d2, roll = "nearest", on = .(Field, Date)][order(Field, Date)]
Field Date NlbsAcre TotN 1: 12S 2016-05-24 NA 208.62194 2: 12S 2016-05-27 NA 172.57658 3: 12S 2016-07-31 NA 318.97092 4: 12S 2016-08-18 NA 428.54011 5: 12S 2016-08-29 NA 393.81545 6: 12S 2017-03-13 44728.184 145.15091 7: 12S 2017-03-16 44728.184 128.14334 8: 12S 2017-08-01 12621.083 132.72365 9: 12S 2017-08-04 12621.083 422.63032 10: 12S 2017-08-14 12621.083 337.91388 11: 12S 2017-10-04 22162.203 692.15276 12: 19-1 2016-05-01 12630.923 476.17492 13: 19-1 2016-08-15 12630.923 110.70600 14: 19-1 2016-09-10 12630.923 215.88105 15: 19-1 2016-09-19 12630.923 224.68906 16: 19-1 2016-12-16 12630.923 338.59349 17: 19-1 2017-01-13 12630.923 305.35394 18: 19-1 2017-03-27 12630.923 435.04925 19: 19-1 2017-05-30 12630.923 818.80997 20: 6 2016-05-05 NA 102.53240 21: 6 2016-06-14 NA 149.06045 22: 6 2016-06-29 NA 125.82803 23: 6 2016-06-29 NA 125.82803 24: 6 2016-07-11 NA 79.24480 25: 6 2016-07-25 NA 62.24449 26: 6 2016-08-25 NA 75.77014 27: 6 2017-01-03 2014.772 47.49660 28: 6 2017-01-12 2014.772 45.53730 29: 6 2017-01-17 2014.772 43.92222 30: 6 2017-02-11 3082.318 21.96791 31: 6 2017-03-19 2477.083 21.39367 32: 6 2017-04-17 2427.536 79.03807 33: 6 2017-07-12 NA 103.52417 34: 6 2017-07-17 NA 65.53112 35: 6 2017-09-06 NA 47.40618 36: 7 2016-06-02 NA 147.49353 37: 7 2016-07-11 NA 59.26973 38: 7 2016-08-04 NA 72.62146 39: 7 2016-08-30 NA 58.27003 40: 7 2016-08-30 NA 58.27003 41: 7 2016-10-30 NA 73.88811 42: 7 2017-02-11 2279.609 21.07551 43: 7 2017-02-22 2279.609 19.92023 44: 7 2017-03-19 15842.916 31.71433 45: 7 2017-05-17 NA 44.96872 46: 7 2017-07-17 NA 58.53364 47: W62 2016-05-05 16764.975 96.72854 48: W62 2016-05-31 16764.975 72.96954 49: W62 2016-08-31 16764.975 86.33588 50: W62 2016-12-05 16764.975 94.19370 51: W62 2017-01-02 18874.656 119.39040 52: W62 2017-02-22 18874.656 75.46591 Field Date NlbsAcre TotN
以上是关于在data.table R中滚动连接的主要内容,如果未能解决你的问题,请参考以下文章
R语言data.table进行滚动数据连接,滚动连接通常用于分析涉及时间的数据(例如商业销售活动和对应的广告投放的安排之之间的关系)实战:实战和动画说明滚动数据连接的形式及方法
使用 data.table 包滚动平均值到 R 中的多个变量
R语言data.table导入数据实战:data.table进行多表数据连接(mergejoin)内连接左连接外连接