Iterator issue with foreach splitting data.tables: Error in : undefined columns selected
Posted: 2015-12-25 16:40:31
Question:
I'm finding this issue with the foreach package hard to debug, because my reproducible example runs fine, but here is a brief description of the problem and what I am trying to achieve.
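I'm using some code originally posted by Steve Weston that splits a data.table on a key column, which is a factor. The iterator walks through "chunks" of the data.table and gives access to both the split of the table and the index value (the key) used to generate that split.
Although this approach has worked for me on various occasions before, this time I get an error message in the foreach loop.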
Error in : undefined columns selected
The code that triggers the problem is as follows:
library(foreach)
library(data.table)
str(dat.in)
names(dat.in)
class(dat.in$fc.item)
key(dat.in)
fc = foreach(dt.sub = isplitDT(dat.in, levels(dat.in$fc.item))) %do% {
  # code to execute on each core/iteration
  print(dt.sub$key[1])
  dt.sub$value
}
The data was exported with dput and can be found at the bottom of the question.
I inspected my dat.in object, with the following results:
> str(dat.in)
Classes ‘data.table’ and 'data.frame': 313 obs. of 3 variables:
$ fc.item: Factor w/ 1 level "A": 1 1 1 1 1 1 1 1 1 1 ...
$ period : num 1 2 3 4 5 6 7 8 9 10 ...
$ y : int 287718 343083 291241 298469 300267 356797 225253 294265 337773 318346 ...
- attr(*, ".internal.selfref")=<externalptr>
- attr(*, "sorted")= chr "fc.item"
> names(dat.in)
[1] "fc.item" "period" "y"
> class(dat.in$fc.item)
[1] "factor"
> key(dat.in)
[1] "fc.item"
So I created a reproducible example to match my scenario; the code is shown below:
library(foreach)
library(data.table)
# generate data and set key of the data.table
dt = data.table(item = as.factor(paste0("item-", sort(rep(rep(1:10),10)))),
                t = rep(1:10,10),
                y = as.integer(abs(rnorm(100, 0,10))))
setkeyv(dt,"item")
## helper functions written by Steve Weston
isplitDT = function(x, vals) {
  ival <- iter(vals)
  nextEl <- function() {
    val <- nextElem(ival)
    # value: rows of x whose key matches val; key: the value used for the split
    list(value=x[val], key=val)
  }
  obj <- list(nextElem=nextEl)
  class(obj) <- c('abstractiter', 'iter')
  obj
}
dtcomb = function(...) {
  rbindlist(list(...))
}
############################################
## main function to split-process-combine using isplitDT and dtcomb
result = foreach(dt.sub = isplitDT(dt, levels(dt$item)),
                 .combine = "dtcomb") %do% {
  print(dt.sub$key[1])
  dt.sub$value
}
print(paste("Did it work =", sum(result == dt) == 300))
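My difficulty is that this code works perfectly well, and I cannot see how it differs from the foreach loop that failed earlier. I would be grateful if someone could point out what I am doing wrong: I'm guessing it is a very silly mistake on my part!
The data for the original problem is here: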
> dput(dat.in)
structure(list(fc.item = structure(c(1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L), .Label = "A", class = "factor"), period = c(1, 2,
3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19,
20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35,
36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51,
52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67,
68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83,
84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99,
100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112,
113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125,
126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138,
139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151,
152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164,
165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177,
178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190,
191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203,
204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216,
217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229,
230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242,
243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255,
256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268,
269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281,
282, 283, 284, 285, 286, 287, 288, 289, 290, 291, 292, 293, 294,
295, 296, 297, 298, 299, 300, 301, 302, 303, 304, 305, 306, 307,
308, 309, 310, 311, 312, 313), y = c(287718L, 343083L, 291241L,
298469L, 300267L, 356797L, 225253L, 294265L, 337773L, 318346L,
270013L, 294559L, 265521L, 292651L, 326301L, 274133L, 225154L,
377162L, 341432L, 308449L, 271186L, 272062L, 296231L, 264176L,
272708L, 279367L, 265335L, 313174L, 273261L, 327539L, 322067L,
260082L, 317229L, 268120L, 231941L, 322187L, 255401L, 261383L,
232523L, 333930L, 291594L, 325835L, 282851L, 309369L, 306474L,
331198L, 333453L, 282738L, 223454L, 343898L, 404772L, 420113L,
363688L, 283529L, 304850L, 304265L, 260494L, 286632L, 291025L,
234396L, 249829L, 243722L, 281929L, 252805L, 291330L, 217721L,
233124L, 291646L, 214542L, 272663L, 246599L, 248463L, 276895L,
238617L, 353554L, 240288L, 260862L, 215496L, 264241L, 251804L,
317853L, 261112L, 241778L, 274305L, 260939L, 284144L, 238942L,
268412L, 207012L, 322499L, 216205L, 283388L, 210637L, 283405L,
232547L, 317938L, 232847L, 254665L, 293350L, 356068L, 272952L,
262610L, 449750L, 369915L, 294255L, 267604L, 244032L, 263057L,
226927L, 249796L, 235638L, 254442L, 226594L, 255157L, 219919L,
260555L, 202837L, 282846L, 242090L, 324165L, 195997L, 247319L,
214422L, 211885L, 238364L, 228117L, 243929L, 183895L, 204071L,
228919L, 227446L, 244663L, 225126L, 251333L, 199212L, 205160L,
205272L, 211975L, 201057L, 240099L, 203967L, 276464L, 180230L,
256560L, 185168L, 209131L, 209283L, 266414L, 221112L, 247453L,
285895L, 310151L, 236241L, 246656L, 371197L, 346882L, 308349L,
218239L, 222147L, 240713L, 227690L, 195599L, 254913L, 203627L,
209650L, 182243L, 213345L, 239517L, 194998L, 220132L, 248232L,
187663L, 182200L, 180731L, 188778L, 218335L, 234029L, 192304L,
183598L, 165051L, 207673L, 168798L, 187578L, 175816L, 192978L,
212731L, 208684L, 176274L, 210670L, 227207L, 203419L, 183886L,
215670L, 158552L, 209275L, 186366L, 228439L, 176090L, 252070L,
203126L, 235651L, 216970L, 222579L, 224996L, 241870L, 194938L,
292197L, 283827L, 281966L, 157419L, 256606L, 184074L, 223767L,
206831L, 196338L, 177536L, 179195L, 180747L, 228955L, 253872L,
254636L, 172384L, 181243L, 228535L, 178251L, 166644L, 193261L,
191703L, 158698L, 184620L, 188777L, 171378L, 176349L, 168550L,
173176L, 198650L, 176989L, 163293L, 164869L, 165503L, 185504L,
172217L, 164511L, 160720L, 175902L, 171150L, 140939L, 155618L,
157323L, 171457L, 165290L, 140833L, 158788L, 162213L, 201366L,
248834L, 170899L, 159564L, 231487L, 281335L, 268906L, 134745L,
155222L, 133268L, 223074L, 211489L, 167485L, 139614L, 178060L,
186616L, 141583L, 172486L, 175021L, 187544L, 153492L, 245626L,
168411L, 166539L, 148776L, 191410L, 135434L, 153281L, 203938L,
155049L, 149193L, 168851L, 168000L, 143976L, 167995L, 172333L,
143025L, 168156L, 175161L, 184271L, 148113L, 153620L, 178359L,
143852L, 139743L, 159931L, 181351L, 170455L, 140985L, 136863L,
167934L, 162680L, 181756L, 212960L, 149715L, 168102L, 175952L,
275313L, 276390L)), .Names = c("fc.item", "period", "y"), row.names = c(NA,
-313L), class = c("data.table", "data.frame"), .internal.selfref = <pointer: 0x103804778>, sorted = "fc.item")
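Comments:
Please advise how I should improve the question to attract interest. If I don't hear anything today I will have to set up a bounty, as this is a major stumbling block in my work!
I can't find the function iter when trying to run your code.
You're right, it is in the iterators package, which I thought was made available through foreach.
foreach used to "depend" on the iterators package, but it now "imports" iterators. This means that, for example, when your code uses "iter", you have to load iterators explicitly.
Could you attach your sessionInfo? I wonder whether one of your packages might be out of date (data.table in particular).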
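The comment above about foreach importing iterators can be illustrated with a minimal sketch. The vector vals is a made-up stand-in for the factor levels in the question, not part of the asker's data; the only point is that iter() and nextElem() become visible once iterators is attached explicitly.

```r
library(foreach)
library(iterators)  # attach explicitly so iter() and nextElem() are visible

# vals is a hypothetical stand-in for levels(dat.in$fc.item)
vals <- c("A", "B", "C")
ival <- iter(vals)

# each call to nextElem() returns the next element of vals
first  <- nextElem(ival)
second <- nextElem(ival)
print(c(first, second))
```

With this in place, a helper such as isplitDT, which calls iter() and nextElem() directly, should no longer fail with a "could not find function" error.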
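Answer 1:
I think the full example is too long for the comments. Apart from the setkey issue in your MRE (you posted it as key, unless I have misunderstood your post), I cannot reproduce the problem. Could that be your issue? Consider this alternative: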
> key(dat.in)
NULL
> setkey(dat.in,fc.item)
> foreach(dt.sub = isplitDT(dat.in, levels(dat.in$fc.item))) %do% {
    # code to execute on each core/iteration
    print(dt.sub$key[1])
    dt.sub$value
  }
[1] "A"
[[1]]
fc.item period y
1: A 1 287718
2: A 2 343083
3: A 3 291241
4: A 4 298469
5: A 5 300267
---
309: A 309 149715
310: A 310 168102
311: A 311 175952
312: A 312 275313
313: A 313 276390
> isplitDT = function(x, vals) {
    ival <- iter(vals)
    nextEl <- function() {
      val <- nextElem(ival)
      list(value=x[val], key=val)
    }
    obj <- list(nextElem=nextEl)
    class(obj) <- c('abstractiter', 'iter')
    obj
  }
> dtcomb = function(...) {
    rbindlist(list(...))
  }
> ############################################
>
> ## main function to split-process-combine using isplitDT and dtcomb
> result = foreach(dt.sub = isplitDT(dt, levels(dt$item)),
                   .combine = "dtcomb") %do% {
    print(dt.sub$key[1])
    dt.sub$value
  }
[1] "item-1"
[1] "item-10"
[1] "item-2"
[1] "item-3"
[1] "item-4"
[1] "item-5"
[1] "item-6"
[1] "item-7"
[1] "item-8"
[1] "item-9"
> print(paste("Did it work =", sum(result == dt) == 300))
[1] "Did it work = TRUE"
Note that in this example I set the key to fc.item (because period threw an error).
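Comments:
Thanks. There is no problem with the MRE, only when using my data. key() was there to show that my table is keyed. I will fix my data (the dput output contains various structures and pointers) so that you can try it. Will come back to this later today.
I still can't seem to reproduce it. After calling "foreach" and storing the result in an object named fc, I get a data.frame/data.table with 313 observations...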
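For readers who hit the same message: one way "undefined columns selected" can come out of x[val] is when x is treated as a plain data.frame, because a single character index then selects a column instead of looking up the key. The thread does not confirm that this was the cause here, so the sketch below only demonstrates the difference on a small made-up table (toy, toy.df, sub and msg are illustrative names, not from the question).

```r
library(data.table)

# toy is a hypothetical table, not the asker's dat.in
toy <- data.table(fc.item = factor(c("A", "A", "B")), y = 1:3)
setkey(toy, fc.item)

# keyed data.table: a character value in i is looked up in the key column
sub <- toy["A"]
print(nrow(sub))

# plain data.frame: the same syntax tries to select a column named "A"
toy.df <- as.data.frame(toy)
msg <- tryCatch(toy.df["A"], error = function(e) conditionMessage(e))
print(msg)
```

If the failing session behaves like the second case, checking class(dat.in), key(dat.in) and whether data.table is attached in that session would be a reasonable next step.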