Iterator issue with foreach splitting data.tables: Error in : undefined columns selected

Posted: 2015-12-25 16:40:31

Question:

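I'm having a hard time debugging this issue with the foreach package, because my reproducible example works fine. Here is a brief description of the problem and of what I'm trying to achieve.

I'm using some code originally posted by Steve Weston that splits a data.table on a key column, which is a factor. The iterator walks through "chunks" of the data.table and gives access to both the split of the table and the index value used to generate the split (the key).

Although this approach has worked for me on various occasions before, this time I get an error message inside the foreach loop.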

Error in  : undefined columns selected

The code that triggers the problem is as follows:

library(foreach)
library(data.table)

str(dat.in)
names(dat.in)
class(dat.in$fc.item)
key(dat.in)
library(foreach)
fc = foreach(dt.sub = isplitDT(dat.in, levels(dat.in$fc.item))) %do% {
    # code to execute on each core/iteration
    print(dt.sub$key[1])
    dt.sub$value
}

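The data has been exported with dput and can be found at the bottom of the question.

I have inspected my dat.in object, with the following results: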

> str(dat.in)
Classes ‘data.table’ and 'data.frame':  313 obs. of  3 variables:
 $ fc.item: Factor w/ 1 level "A": 1 1 1 1 1 1 1 1 1 1 ...
 $ period : num  1 2 3 4 5 6 7 8 9 10 ...
 $ y      : int  287718 343083 291241 298469 300267 356797 225253 294265 337773 318346 ...
 - attr(*, ".internal.selfref")=<externalptr> 
 - attr(*, "sorted")= chr "fc.item"
> names(dat.in)
[1] "fc.item" "period"  "y"      
> class(dat.in$fc.item)
[1] "factor"
> key(dat.in)
[1] "fc.item"

So I created a reproducible example to match my scenario; the code is shown below:

library(foreach)
library(data.table)

# generate data and set key of the data.table
dt = data.table(item = as.factor(paste0("item-", sort(rep(rep(1:10),10)))),
                t = rep(1:10,10), 
                y = as.integer(abs(rnorm(100, 0,10))))
setkeyv(dt,"item")

## helper functions written by Steve Weston
isplitDT = function(x, vals) {
    ival <- iter(vals)
    nextEl <- function() {
        val <- nextElem(ival)
        list(value=x[val], key=val)
    }
    obj <- list(nextElem=nextEl)
    class(obj) <- c('abstractiter', 'iter')
    obj
}

dtcomb = function(...) {
    rbindlist(list(...))
}

############################################

## main function to split-process-combine using isplitDT and dtcomb
result = foreach(dt.sub = isplitDT(dt, levels(dt$item)),
        .combine = "dtcomb") %do% {
    print(dt.sub$key[1])
    dt.sub$value
}

print(paste("Did it work =", sum(result == dt) == 300))

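My difficulty is that this code works perfectly well, and I can't see how it differs from the foreach loop that failed earlier. If anyone can point out what I'm doing wrong, I would be grateful: I'm guessing it's a very silly mistake on my part!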


The data for the original problem is here:

> dput(dat.in)
structure(list(fc.item = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L), .Label = "A", class = "factor"), period = c(1, 2, 
3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 
20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 
36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 
52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 
68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 
84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 
100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 
113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 
126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 
139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 
152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 
165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 
178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 
191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 
204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 
217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 
230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 
243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 
256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 
269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281, 
282, 283, 284, 285, 286, 287, 288, 289, 290, 291, 292, 293, 294, 
295, 296, 297, 298, 299, 300, 301, 302, 303, 304, 305, 306, 307, 
308, 309, 310, 311, 312, 313), y = c(287718L, 343083L, 291241L, 
298469L, 300267L, 356797L, 225253L, 294265L, 337773L, 318346L, 
270013L, 294559L, 265521L, 292651L, 326301L, 274133L, 225154L, 
377162L, 341432L, 308449L, 271186L, 272062L, 296231L, 264176L, 
272708L, 279367L, 265335L, 313174L, 273261L, 327539L, 322067L, 
260082L, 317229L, 268120L, 231941L, 322187L, 255401L, 261383L, 
232523L, 333930L, 291594L, 325835L, 282851L, 309369L, 306474L, 
331198L, 333453L, 282738L, 223454L, 343898L, 404772L, 420113L, 
363688L, 283529L, 304850L, 304265L, 260494L, 286632L, 291025L, 
234396L, 249829L, 243722L, 281929L, 252805L, 291330L, 217721L, 
233124L, 291646L, 214542L, 272663L, 246599L, 248463L, 276895L, 
238617L, 353554L, 240288L, 260862L, 215496L, 264241L, 251804L, 
317853L, 261112L, 241778L, 274305L, 260939L, 284144L, 238942L, 
268412L, 207012L, 322499L, 216205L, 283388L, 210637L, 283405L, 
232547L, 317938L, 232847L, 254665L, 293350L, 356068L, 272952L, 
262610L, 449750L, 369915L, 294255L, 267604L, 244032L, 263057L, 
226927L, 249796L, 235638L, 254442L, 226594L, 255157L, 219919L, 
260555L, 202837L, 282846L, 242090L, 324165L, 195997L, 247319L, 
214422L, 211885L, 238364L, 228117L, 243929L, 183895L, 204071L, 
228919L, 227446L, 244663L, 225126L, 251333L, 199212L, 205160L, 
205272L, 211975L, 201057L, 240099L, 203967L, 276464L, 180230L, 
256560L, 185168L, 209131L, 209283L, 266414L, 221112L, 247453L, 
285895L, 310151L, 236241L, 246656L, 371197L, 346882L, 308349L, 
218239L, 222147L, 240713L, 227690L, 195599L, 254913L, 203627L, 
209650L, 182243L, 213345L, 239517L, 194998L, 220132L, 248232L, 
187663L, 182200L, 180731L, 188778L, 218335L, 234029L, 192304L, 
183598L, 165051L, 207673L, 168798L, 187578L, 175816L, 192978L, 
212731L, 208684L, 176274L, 210670L, 227207L, 203419L, 183886L, 
215670L, 158552L, 209275L, 186366L, 228439L, 176090L, 252070L, 
203126L, 235651L, 216970L, 222579L, 224996L, 241870L, 194938L, 
292197L, 283827L, 281966L, 157419L, 256606L, 184074L, 223767L, 
206831L, 196338L, 177536L, 179195L, 180747L, 228955L, 253872L, 
254636L, 172384L, 181243L, 228535L, 178251L, 166644L, 193261L, 
191703L, 158698L, 184620L, 188777L, 171378L, 176349L, 168550L, 
173176L, 198650L, 176989L, 163293L, 164869L, 165503L, 185504L, 
172217L, 164511L, 160720L, 175902L, 171150L, 140939L, 155618L, 
157323L, 171457L, 165290L, 140833L, 158788L, 162213L, 201366L, 
248834L, 170899L, 159564L, 231487L, 281335L, 268906L, 134745L, 
155222L, 133268L, 223074L, 211489L, 167485L, 139614L, 178060L, 
186616L, 141583L, 172486L, 175021L, 187544L, 153492L, 245626L, 
168411L, 166539L, 148776L, 191410L, 135434L, 153281L, 203938L, 
155049L, 149193L, 168851L, 168000L, 143976L, 167995L, 172333L, 
143025L, 168156L, 175161L, 184271L, 148113L, 153620L, 178359L, 
143852L, 139743L, 159931L, 181351L, 170455L, 140985L, 136863L, 
167934L, 162680L, 181756L, 212960L, 149715L, 168102L, 175952L, 
275313L, 276390L)), .Names = c("fc.item", "period", "y"), row.names = c(NA, 
-313L), class = c("data.table", "data.frame"), .internal.selfref = <pointer: 0x103804778>, sorted = "fc.item") 

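Comments:

Please advise how I should improve the question to attract interest. If I don't hear anything today, I'll have to set up a bounty, as this is a major stumbling block for my work!

I couldn't find the function iter when trying to run your code.

You're right, it's in the iterators package, which I thought was made available through foreach.

foreach used to "depend" on the iterators package, but now it "imports" iterators. This means that when your code uses, for example, "iter", you have to load iterators explicitly.

Can you attach your sessionInfo? I'm wondering whether one of your packages might be out of date (especially data.table).

Answer 1: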

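I figured a full example was too long for the comments. Apart from the setkey issue in your MRE (you posted it as key, unless I misread your post), I cannot reproduce the problem. Could that be your issue? Consider this alternative: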

> key(dat.in)
NULL
> setkey(dat.in,fc.item)
> foreach(dt.sub = isplitDT(dat.in, levels(dat.in$fc.item))) %do% {
     # code to execute on each core/iteration
     print(dt.sub$key[1])
     dt.sub$value
 }
[1] "A"
[[1]]
     fc.item period      y
  1:       A      1 287718
  2:       A      2 343083
  3:       A      3 291241
  4:       A      4 298469
  5:       A      5 300267
 ---                      
309:       A    309 149715
310:       A    310 168102
311:       A    311 175952
312:       A    312 275313
313:       A    313 276390

> isplitDT = function(x, vals) {
     ival <- iter(vals)
     nextEl <- function() {
         val <- nextElem(ival)
         list(value=x[val], key=val)
     }
     obj <- list(nextElem=nextEl)
     class(obj) <- c('abstractiter', 'iter')
     obj
 }
> dtcomb = function(...) {
     rbindlist(list(...))
 }
> ############################################
> 
> ## main function to split-process-combine using isplitDT and dtcomb
> result = foreach(dt.sub = isplitDT(dt, levels(dt$item)),
                  .combine = "dtcomb") %do% {
     print(dt.sub$key[1])
     dt.sub$value
 }
[1] "item-1"
[1] "item-10"
[1] "item-2"
[1] "item-3"
[1] "item-4"
[1] "item-5"
[1] "item-6"
[1] "item-7"
[1] "item-8"
[1] "item-9"
> print(paste("Did it work =", sum(result == dt) == 300))
[1] "Did it work = TRUE"

Note that in this example I set the key to fc.item (because period threw an error).
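As a side note on the error text itself: the message "undefined columns selected" is the one base R raises when a data.frame is indexed with a single character value that is not a column name. The sketch below reproduces that message in base R only. It is one possible way the message can arise if x[val] inside isplitDT ends up being handled by data.frame indexing rather than by a keyed data.table lookup; it is not confirmed as the cause in this thread. The helper check_is_dt is a hypothetical name introduced for illustration.

```r
# A plain data.frame treats a single character index as a COLUMN selector,
# so a value such as "A" is looked up among the column names, not the rows.
df <- data.frame(fc.item = factor(rep("A", 3)), period = 1:3)

msg <- tryCatch(
  {
    df["A"]          # there is no column called "A"
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)           # in an English locale: "undefined columns selected"

# Hypothetical guard (not part of the original isplitDT): confirm the object
# really carries the data.table class before relying on x[val].
check_is_dt <- function(x) inherits(x, "data.table")
print(check_is_dt(df))   # FALSE for a plain data.frame
```

Running str(), class() and key() on the object immediately before the foreach call, as the question does, is the matching check on the real data.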

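Comments:

Thanks. There is no problem with the MRE, only when using my data. key() was there to show that my table is keyed. I'll fix my data (dput outputs all sorts of structures and pointers) so that you can try it. Will get back to this later today.

I still can't seem to reproduce it. After calling "foreach" and storing the result in an object named fc, I get a data.frame/data.table with 313 observations...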
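Two points from the thread can be combined in a small sketch: the iterators package is loaded explicitly (as noted in the comments, foreach only imports it), and the subset is taken by an explicitly named column so that it does not depend on which key is set. This is a hedged variant, not Steve Weston's original helper; the function name isplitDT_by and its by argument are assumptions introduced here.

```r
library(data.table)
library(iterators)   # provides iter() and nextElem(); load it explicitly
library(foreach)

# Hypothetical variant of isplitDT: subsets on a named column instead of
# relying on the table's key.
isplitDT_by <- function(x, vals, by) {
  stopifnot(is.data.table(x), by %in% names(x))
  ival <- iter(vals)
  nextEl <- function() {
    val <- nextElem(ival)             # signals StopIteration when exhausted
    idx <- which(x[[by]] == val)      # row numbers where the column matches
    list(value = x[idx], key = val)
  }
  obj <- list(nextElem = nextEl)
  class(obj) <- c("abstractiter", "iter")
  obj
}

# Small made-up table, for illustration only
dt <- data.table(item = factor(rep(c("a", "b"), each = 3)), y = 1:6)

res <- foreach(dt.sub = isplitDT_by(dt, levels(dt$item), "item")) %do% {
  nrow(dt.sub$value)
}
print(unlist(res))
```

Computing the row index with which() before subsetting trades the speed of a keyed lookup for independence from the key, which may be acceptable while debugging.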
