How to batch enumerables in Ruby
【Title】: How to batch enumerables in ruby
【Posted】: 2020-04-28 00:50:55
【Question】: In an effort to learn about Ruby's Enumerable, I have something like the following:
FileReader.read(very_big_file)
  .lazy
  .flat_map { |line| get_array_of_similar_words(line) } # array.size is ~10
  .each_slice(100) # wait for 100 items
  .map { |array| process_100_items(array) }
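Although each flat_map call emits an array of ~10 items, I expected the each_slice call to batch the items in groups of 100, but that is not what happens. In other words, I want it to wait until 100 items are available and only then pass them to the final .map call.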
How can I achieve something similar to the buffer function in reactive programming?
【Comments】:
"but that is not what happens" - what happens instead? Also, don't use map; you mean each.
What is FileReader? Ruby has File.read, but it does not return an enumerator.
@SergioTulentsev. each_slice will try to split the 10-item enumerable into batches of 100, so it returns the 10-item enumerable unchanged. Whether each or map is used is irrelevant to the question.
1.upto(3).lazy.flat_map { |i| [i, i] }.each_slice(3).to_a returns [[1, 1, 2], [2, 3, 3]], which seems correct to me. Perhaps you oversimplified your example?
【Answer 1】:
To see how lazy affects the calculation, let's look at an example. First, construct a file:
str =<<~_
Now is the
time for all
good Ruby coders
to come to
the aid of
their bowling
team
_
fname = 't'
File.write(fname, str)
#=> 82
and specify the slice size:
slice_size = 4
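Now I will read the file one line at a time, split each line into words, remove duplicate words, and append the remaining words to an array. As soon as the array contains at least 4 words, I will take the first 4 and map them to the longest of those 4 words. The code to do that follows. To show how the calculation progresses, I will salt the code with puts statements. Note that IO::foreach without a block returns an enumerator.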
IO.foreach(fname).
  lazy.
  tap { |o| puts "o1 = #{o}" }.
  flat_map { |line|
    puts "line = #{line}"
    puts "line.split.uniq = #{line.split.uniq} "
    line.split.uniq }.
  tap { |o| puts "o2 = #{o}" }.
  each_slice(slice_size).
  tap { |o| puts "o3 = #{o}" }.
  map { |arr|
    puts "arr = #{arr}, arr.max = #{arr.max_by(&:size)}"
    arr.max_by(&:size) }.
  tap { |o| puts "o3 = #{o}" }.
  to_a
  #=> ["time", "good", "coders", "bowling", "team"]
The following is displayed:
o1 = #<Enumerator::Lazy:0x00005992b1ab6970>
o2 = #<Enumerator::Lazy:0x00005992b1ab6880>
o3 = #<Enumerator::Lazy:0x00005992b1ab6678>
o3 = #<Enumerator::Lazy:0x00005992b1ab6420>
line = Now is the
line.split.uniq = ["Now", "is", "the"]
line = time for all
line.split.uniq = ["time", "for", "all"]
arr = ["Now", "is", "the", "time"], arr.max = time
line = good Ruby coders
line.split.uniq = ["good", "Ruby", "coders"]
arr = ["for", "all", "good", "Ruby"], arr.max = good
line = to come to
line.split.uniq = ["to", "come"]
line = the aid of
line.split.uniq = ["the", "aid", "of"]
arr = ["coders", "to", "come", "the"], arr.max = coders
line = their bowling
line.split.uniq = ["their", "bowling"]
arr = ["aid", "of", "their", "bowling"], arr.max = bowling
line = team
line.split.uniq = ["team"]
arr = ["team"], arr.max = team
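The interleaving in this output (a slice is processed as soon as its fourth word arrives) suggests that a lazy chain pulls only as many lines as it needs. A minimal sketch of that idea follows. It is an illustration rather than part of the original answer: a StringIO stands in for the file, and lines_read is a name introduced here to record which lines were actually consumed.

```ruby
require 'stringio'

text = "Now is the\ntime for all\ngood Ruby coders\nto come to\n"
lines_read = [] # illustrative bookkeeping, not part of the original answer

first_result = StringIO.new(text).each_line.lazy
  .flat_map { |line|
    lines_read << line.chomp # record every line actually pulled from the source
    line.split.uniq
  }
  .each_slice(4)
  .map { |arr| arr.max_by(&:size) }
  .first(1) # ask for one result only

p first_result # ["time"]
p lines_read   # ["Now is the", "time for all"]
```

Only two of the four lines are read, because the first slice of 4 words is complete once the second line has been split.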
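If the lazy. line is removed from the IO.foreach chain, the return value is the same, but the following is displayed (the .to_a at the end is now superfluous):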
o1 = #<Enumerator:0x00005992b1a438f8>
line = Now is the
line.split.uniq = ["Now", "is", "the"]
line = time for all
line.split.uniq = ["time", "for", "all"]
line = good Ruby coders
line.split.uniq = ["good", "Ruby", "coders"]
line = to come to
line.split.uniq = ["to", "come"]
line = the aid of
line.split.uniq = ["the", "aid", "of"]
line = their bowling
line.split.uniq = ["their", "bowling"]
line = team
line.split.uniq = ["team"]
o2 = ["Now", "is", "the", "time", "for", "all", "good", "Ruby",
"coders", "to", "come", "the", "aid", "of", "their",
"bowling", "team"]
o3 = #<Enumerator:0x00005992b1a41a08>
arr = ["Now", "is", "the", "time"], arr.max = time
arr = ["for", "all", "good", "Ruby"], arr.max = good
arr = ["coders", "to", "come", "the"], arr.max = coders
arr = ["aid", "of", "their", "bowling"], arr.max = bowling
arr = ["team"], arr.max = team
o3 = ["time", "good", "coders", "bowling", "team"]
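Applied to the question, the same pattern gives a buffer-like batching step. The following is a minimal sketch under stated assumptions, not the asker's actual code: similar_words and process_batch are hypothetical stand-ins for get_array_of_similar_words and process_100_items, and a StringIO stands in for the very large file.

```ruby
require 'stringio'

BATCH_SIZE = 100

# Hypothetical stand-in for get_array_of_similar_words: 10 items per line.
def similar_words(line)
  word = line.chomp
  (1..10).map { |i| "#{word}-#{i}" }
end

# Hypothetical stand-in for process_100_items: here it just reports the batch size.
def process_batch(batch)
  batch.size
end

# 25 lines produce 250 items, which should arrive as batches of 100, 100 and 50.
source = StringIO.new((1..25).map { |n| "word#{n}\n" }.join)

batch_sizes = source.each_line.lazy
  .flat_map { |line| similar_words(line) }
  .each_slice(BATCH_SIZE)
  .map { |batch| process_batch(batch) }
  .to_a

p batch_sizes # [100, 100, 50]
```

With a real file, File.foreach(path).lazy could take the place of the StringIO line enumerator. The final batch may hold fewer than BATCH_SIZE items, so the processing step should not assume a full batch.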