How to batch enumerables in Ruby

Posted: 2020-04-28 00:50:55

Question:

To get a better understanding of Ruby's Enumerable, I have something like the following:

FileReader.read(very_big_file)
          .lazy
          .flat_map { |line| get_array_of_similar_words } # array.size is ~10
          .each_slice(100) # wait for 100 items
          .map { |array| process_100_items }

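Each flat_map call emits an array of ~10 items. I expected the each_slice call to batch those items into groups of 100, i.e. to wait until 100 items are available before passing them to the final .map call, but that is not what happens.

How can I get something similar to the buffer function from reactive programming?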

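Comments:

"but that is not what happens" - what does happen?

Also, don't use map here; you mean each.

What is FileReader? Ruby has File.read, but it does not return an enumerator.

@SergioTulentsev: each_slice will try to split the 10-item enumerable into batches of 100, so it returns the 10-item enumerable unchanged. Whether each or map is used does not matter for the question.

1.upto(3).lazy.flat_map { |i| [i, i] }.each_slice(3).to_a returns [[1, 1, 2], [2, 3, 3]], which looks correct to me. Perhaps you oversimplified your example?

Answer 1: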

To see how lazy affects the computation, let's look at an example. First, construct a file:

str =<<~_
Now is the
time for all
good Ruby coders
to come to
the aid of
their bowling
team
_

fname = 't' 
File.write(fname, str)
  #=> 82

Then specify the slice size:

slice_size = 4

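Now I will read the file line by line, split each line into words, remove duplicate words within the line, and feed those words into a running buffer. As soon as the buffer holds 4 words, those 4 words are taken as a slice and mapped to the longest word among them. The code below does this. To show how the computation progresses, I have salted the code with puts statements. Note that IO::foreach without a block returns an enumerator.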

IO.foreach(fname).
   lazy.
   tap { |o| puts "o1 = #{o}" }.
   flat_map { |line|
     puts "line = #{line}"
     puts "line.split.uniq = #{line.split.uniq} "
     line.split.uniq }.
   tap { |o| puts "o2 = #{o}" }.
   each_slice(slice_size).
   tap { |o| puts "o3 = #{o}" }.
   map { |arr|
     puts "arr = #{arr}, arr.max = #{arr.max_by(&:size)}"
     arr.max_by(&:size) }.
   tap { |o| puts "o3 = #{o}" }.
   to_a
  #=> ["time", "good", "coders", "bowling", "team"]

The following is displayed:

o1 = #<Enumerator::Lazy:0x00005992b1ab6970>
o2 = #<Enumerator::Lazy:0x00005992b1ab6880>
o3 = #<Enumerator::Lazy:0x00005992b1ab6678>
o3 = #<Enumerator::Lazy:0x00005992b1ab6420>
line = Now is the
line.split.uniq = ["Now", "is", "the"] 
line = time for all
line.split.uniq = ["time", "for", "all"] 
arr = ["Now", "is", "the", "time"], arr.max = time
line = good Ruby coders
line.split.uniq = ["good", "Ruby", "coders"] 
arr = ["for", "all", "good", "Ruby"], arr.max = good
line = to come to
line.split.uniq = ["to", "come"] 
line = the aid of
line.split.uniq = ["the", "aid", "of"] 
arr = ["coders", "to", "come", "the"], arr.max = coders
line = their bowling
line.split.uniq = ["their", "bowling"] 
arr = ["aid", "of", "their", "bowling"], arr.max = bowling
line = team
line.split.uniq = ["team"] 
arr = ["team"], arr.max = team
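The interleaving in the output above (a slice is processed as soon as its last word arrives, before the next line is read) can be reproduced with a smaller, file-free sketch. The range, the `log` array and the doubling block are illustrative assumptions, not part of the original answer:

```ruby
# Record the order in which the blocks of a lazy pipeline run.
log = []

sums = (1..4).lazy
             .flat_map { |n| log << "read #{n}"; [n, n] }  # two items per input
             .each_slice(4)                                 # buffer four items
             .map { |batch| log << "batch #{batch.inspect}"; batch.sum }
             .to_a                                          # forces evaluation

puts log
p sums
```

Each "batch" entry is logged right after the second "read" that completes it, which mirrors how the `arr = ...` lines are mixed in with the `line = ...` lines above.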

If the line lazy. is removed, the return value is the same, but the following is displayed (the trailing .to_a is now redundant):

o1 = #<Enumerator:0x00005992b1a438f8>
line = Now is the
line.split.uniq = ["Now", "is", "the"] 
line = time for all
line.split.uniq = ["time", "for", "all"] 
line = good Ruby coders
line.split.uniq = ["good", "Ruby", "coders"] 
line = to come to
line.split.uniq = ["to", "come"] 
line = the aid of
line.split.uniq = ["the", "aid", "of"] 
line = their bowling
line.split.uniq = ["their", "bowling"] 
line = team
line.split.uniq = ["team"] 
o2 = ["Now", "is", "the", "time", "for", "all", "good", "Ruby",
      "coders", "to", "come", "the", "aid", "of", "their",
      "bowling", "team"]
o3 = #<Enumerator:0x00005992b1a41a08>
arr = ["Now", "is", "the", "time"], arr.max = time
arr = ["for", "all", "good", "Ruby"], arr.max = good
arr = ["coders", "to", "come", "the"], arr.max = coders
arr = ["aid", "of", "their", "bowling"], arr.max = bowling
arr = ["team"], arr.max = team
o3 = ["time", "good", "coders", "bowling", "team"]
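Applied to the numbers in the question (about 10 items per line, batches of 100), the same pipeline shape can be sketched as follows. `similar_words` is a hypothetical stand-in for the question's `get_array_of_similar_words`, the `lines` array stands in for the big file, and `batch.size` stands in for `process_100_items`; none of these come from the original post:

```ruby
# Hypothetical stand-in for the question's get_array_of_similar_words:
# returns 10 strings for each input line.
def similar_words(line)
  Array.new(10) { |i| "#{line}-#{i}" }
end

lines = (1..25).map { |n| "line#{n}" }  # stand-in for the lines of a big file
lines_read = 0

batches = lines.lazy
               .flat_map { |line| lines_read += 1; similar_words(line) }
               .each_slice(100)             # buffer until 100 items are available
               .map { |batch| batch.size }  # stand-in for process_100_items

first_batch = batches.first(1)     # evaluates only as far as the first batch
reads_for_first_batch = lines_read # how many lines that required
all_batches = batches.to_a         # re-runs the pipeline over all 25 lines

p first_batch
p reads_for_first_batch
p all_batches
```

In this sketch the slices span the boundaries of the 10-item arrays, and asking for only the first batch stops reading the source once 100 items have been produced. The last batch is smaller when the total is not a multiple of 100.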
