How to match all occurrences of a regex

Posted 2023-03-14

【Title】How to match all occurrences of a regex 【Posted】2010-09-09 23:17:09 【Question】:

Is there a quick way to find every match of a regular expression in Ruby? I've looked through the Regex object in the Ruby STL and searched on Google to no avail.

【Comments on the question】:

I read this as "how can I search a string for all regex patterns" and got horribly confused...

【Answer 1】:

Using scan should do the trick:

string.scan(/regex/)
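As a rough sketch of what scan returns: it walks the string from left to right and collects non-overlapping matches. If overlapping matches are wanted, one option is to wrap the pattern in a zero-width lookahead with a capture group. The sample string and variable names here are illustrative only.

```ruby
# Illustrative sample text.
text = "match me!"

# scan consumes each match before looking for the next one,
# so the results never overlap.
non_overlapping = text.scan(/.../)
# => ["mat", "ch ", "me!"]

# A zero-width lookahead consumes nothing, so the scanner advances one
# character at a time; the capture group inside it records each window.
overlapping = text.scan(/(?=(...))/).flatten
# => ["mat", "atc", "tch", "ch ", "h m", " me", "me!"]

p non_overlapping
p overlapping
```

Because the lookahead pattern contains a group, scan returns an array of one-element arrays, which is why flatten is applied.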

【Comments】:

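But what about this case? "match me!".scan(/.../) returns [ "mat", "ch ", "me!" ], whereas all occurrences of /.../ would be [ "mat", "atc", "tch", "ch ", ... ]

No, it wouldn't. /.../ is a normal greedy regexp. It won't backtrack over content that has already been matched. You could try a lazy regexp, but even that probably won't be enough. Have a look at the Regexp documentation at ruby-doc.org/core-1.9.3/Regexp.html to express your regexp correctly :)

This seems like a Ruby WTF... Why is this on String rather than on Regexp with the other regexp stuff? It isn't even mentioned anywhere in the docs for Regexp.

I guess that's because it is defined and called on String rather than on Regexp... But it does actually make sense. You can write a regular expression that captures all matches using Regexp#match and iterate over the captured groups. Here you write a partial match function and want it applied multiple times to a given string, which is not Regexp's responsibility. I suggest checking the implementation of scan for a better understanding: ruby-doc.org/core-1.9.3/String.html#method-i-scan

@MichaelDickens: In this case, you can use /(?=(...))/.

【Answer 2】: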

To find all the matching strings, use String's scan method.

str = "A 54mpl3 string w1th 7 numb3rs scatter36 ar0und"
str.scan(/\d+/)
#=> ["54", "3", "1", "7", "3", "36", "0"]

If you want MatchData, which is the type of the object returned by the Regexp match method, use:

str.to_enum(:scan, /\d+/).map { Regexp.last_match }
#=> [#<MatchData "54">, #<MatchData "3">, #<MatchData "1">, #<MatchData "7">, #<MatchData "3">, #<MatchData "36">, #<MatchData "0">]

The benefit of using MatchData is that you can use methods like offset:

match_datas = str.to_enum(:scan, /\d+/).map { Regexp.last_match }
match_datas[0].offset(0)
#=> [2, 4]
match_datas[1].offset(0)
#=> [7, 8]
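If relying on Regexp.last_match feels too implicit, a plain loop over Regexp#match with a start position collects equivalent MatchData objects for simple patterns like this one. This is only a minimal sketch; the helper name all_match_data is hypothetical and not part of Ruby's core library.

```ruby
# Hypothetical helper: returns a MatchData for every non-overlapping
# match of pattern in text, scanning from left to right.
def all_match_data(text, pattern)
  results = []
  pos = 0
  while (m = pattern.match(text, pos))
    results << m
    # Continue after the match; step one extra character after an
    # empty match so the loop cannot stall.
    pos = m.end(0) == m.begin(0) ? m.end(0) + 1 : m.end(0)
  end
  results
end

str = "A 54mpl3 string w1th 7 numb3rs scatter36 ar0und"
matches = all_match_data(str, /\d+/)

p matches.map { |m| m[0] }
# => ["54", "3", "1", "7", "3", "36", "0"]
p matches[0].offset(0)
# => [2, 4]
```

Patterns that depend on where the previous match ended (for example ones using \G) may behave differently from scan, so this is best treated as an alternative for straightforward cases.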

See these questions if you'd like to know more:

“How do I get the match data for all occurrences of a Ruby regular expression in a string?”
“Ruby regular expression matching enumerator with named capture support”
“How to find out the starting point for each match in ruby”

Reading about the special variables $&, $', $1, $2 in Ruby will be helpful too.
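For orientation, here is a small sketch of what those special variables hold after a successful match. The sample string and the local variable names are illustrative; $` (the text before the match) is included alongside the variables named above.

```ruby
str = "A 54mpl3 string w1th 7 numb3rs"

# =~ sets the special variables when the match succeeds.
if str =~ /(\d+)([a-z]+)/
  whole   = $&   # the entire matched text
  before  = $`   # everything before the match
  after   = $'   # everything after the match
  digits  = $1   # first capture group
  letters = $2   # second capture group
end

p [whole, before, after, digits, letters]
# => ["54mpl", "A ", "3 string w1th 7 numb3rs", "54", "mpl"]
```

These variables are all derived from $~, the most recent MatchData, which is the same object Regexp.last_match returns.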

【Comments】:

【Answer 3】:

If you have a regexp with groups:

str="A 54mpl3 string w1th 7 numbers scatter3r ar0und"
re=/(\d+)[m-t]/

You can use String's scan method to find the matching groups:

str.scan re
#> [["54"], ["1"], ["3"]]

To find the full text matched by the pattern:

str.to_enum(:scan,re).map {$&}
#> ["54m", "1t", "3r"]
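When both the captured group and the full match are needed without changing the regexp, passing a block to scan is another possibility. A minimal sketch, assuming the same str and re as in this answer; the pairs array is just an illustrative name.

```ruby
str = "A 54mpl3 string w1th 7 numbers scatter3r ar0und"
re  = /(\d+)[m-t]/

pairs = []
# With a block, scan yields the captures of each match in turn, and
# Regexp.last_match still refers to that match inside the block.
str.scan(re) do |captures|
  pairs << [Regexp.last_match(0), captures.first]
end

p pairs
# => [["54m", "54"], ["1t", "1"], ["3r", "3"]]
```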

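【Comments】:

str.scan(/\d+[m-t]/) # => ["54m", "1t", "3r"] is more idiomatic than str.to_enum(:scan,re).map {$&}

Maybe you misunderstood. The regular expression in the example of the user I replied to was /(\d+)[m-t]/, not /\d+[m-t]/. Writing re = /(\d+)[m-t]/; str.scan(re) is the same as str.scan(/(\d+)[m-t]/), but I get #> [["54"], ["1"], ["3"]] and not ["54m", "1t", "3r"]. The question was: if I have a regular expression with a group and want to capture all the patterns without changing the regular expression (leaving the group in place), how can I do it? In this sense, a possible solution was: str.to_enum(:scan,re).map {$&}

【Answer 4】: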

You can use string.scan(your_regex).flatten. If your regex contains groups, the result comes back as a single flat array.

string = "A 54mpl3 string w1th 7 numbers scatter3r ar0und"
your_regex = /(\d+)[m-t]/
string.scan(your_regex).flatten
=> ["54", "1", "3"]

The regex can use named groups as well.

string = 'group_photo.jpg'
regex = /\A(?<name>.*)\.(?<ext>.*)\z/
string.scan(regex).flatten
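To make the result of the named-group example concrete, here is a brief sketch that also reads the groups by name through MatchData. It assumes Ruby 2.4 or later for MatchData#named_captures; the variable names flat and by_name are illustrative.

```ruby
string = 'group_photo.jpg'
regex  = /\A(?<name>.*)\.(?<ext>.*)\z/

# scan returns one array of captures per match; flatten merges them.
flat = string.scan(regex).flatten
# => ["group_photo", "jpg"]

# MatchData keeps the group names, which scan's plain arrays do not.
by_name = regex.match(string).named_captures
# => {"name"=>"group_photo", "ext"=>"jpg"}

p flat
p by_name
```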

You can also use gsub; it's just one more way if you want MatchData.

str.gsub(/\d/).map { Regexp.last_match }
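The gsub variant works because String#gsub, when called with only a pattern (no replacement and no block), returns an Enumerator over the matches instead of a new string. A small sketch under that assumption, using the sample string from the earlier answer and \d+ rather than \d so that runs of digits stay together:

```ruby
str = "A 54mpl3 string w1th 7 numb3rs scatter36 ar0und"

# No replacement and no block: gsub hands back an Enumerator.
enum = str.gsub(/\d+/)

# Each step of the enumeration performs one match, so Regexp.last_match
# inside the block is the MatchData for that step.
matches = enum.map { Regexp.last_match }

texts  = matches.map { |m| m[0] }
starts = matches.map { |m| m.begin(0) }
p texts
# => ["54", "3", "1", "7", "3", "36", "0"]
p starts.first(2)
# => [2, 7]
```

The original string is left untouched, since nothing is ever substituted.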

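【Comments】:

Remove the grouping from your_regex = /(\d+)[m-t]/ and you won't need to use flatten. Your final example uses last_match, which in this case is probably safe, but it is a global and could be overwritten if any other regex is matched before last_match is called. Instead, depending on the pattern and your needs, it is probably safer to use string.match(regex).captures # => ["group_photo", "jpg"] or string.scan(/\d+/) # => ["54", "3", "1", "7", "3", "0"], as shown in other answers.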
