Keep REGEX delimiter with greedy token

Posted: 2021-08-01 08:24:54

【Question】:

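Good morning,

I'm writing an equation evaluator in Java and use a REGEX to identify values, including scientific notation. I found it in one of the feeds (and slightly adapted it); it looks like this: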

[\d.]+(?:E-?\d+)?

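The problem I'm experiencing is that I want to keep the delimited value. How can I achieve this? I played around with it on regex101.com, but when I use lookahead and lookbehind it complains about the greedy token.

I found several other REGEXes on ***, but none that keep the delimiter.

Thanks in advance!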

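【Question comments】:

What do you mean by "The problem I'm experiencing is that I want to keep the delimited value."? Can you add some examples to the question?

@Thefourthbird, for example, if I write: String eqn = "cos(2123.324E3)*ln(e^x)+123.345E-6*sin(sin(sin(x)))"; String[] eqnSplit = eqn.split("([\\d.]+(?:E-?\\d+)?)"); I get the following result: [cos(, )*ln(e^x)+, *sin(sin(sin(x)))] So the values, which are the delimiters in this case (if my terminology is correct), are removed. However, I still want those values.

【Answer 1】: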

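Instead of using split, you can collect matches with an alternation: match either the scientific-notation pattern, or a run of characters at which that pattern does not start.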

[\d.]+(?:E-?\d+)?|(?:(?![\d.]+(?:E-?\d+)?).)+

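The pattern matches:

[\d.]+(?:E-?\d+)?    Your scientific-notation pattern
|    Or
(?:    Non-capturing group
(?![\d.]+(?:E-?\d+)?).    Negative lookahead; match a single character when the scientific notation is not directly to the right
)+    Close the non-capturing group and repeat it 1+ times to match at least one character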


For example:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

String regex = "[\\d.]+(?:E-?\\d+)?|(?:(?![\\d.]+(?:E-?\\d+)?).)+";
String string = "cos(2123.324E3)*ln(e^x)+123.345E-6*sin(sin(sin(x)))";

Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
Matcher matcher = pattern.matcher(string);

// Each match is either a number or the text between two numbers.
while (matcher.find()) {
    System.out.println(matcher.group(0));
}

Output:

cos(
2123.324E3
)*ln(e^x)+
123.345E-6
*sin(sin(sin(x)))
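
For an evaluator it may be more convenient to get the tokens back as a list instead of printing them. The following is a minimal sketch of that idea, not part of the original answer; the class name EquationTokenizer and the method tokenize are hypothetical names chosen for illustration. It reuses the same alternation and builds it from a single NUMBER constant so the two copies of the number pattern cannot drift apart.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
final class EquationTokenizer {
    // Number pattern from the question: digits/dots with an optional E exponent.
    private static final String NUMBER = "[\\d.]+(?:E-?\\d+)?";

    // Either a number, or a run of characters at which no number starts.
    private static final Pattern TOKEN =
            Pattern.compile(NUMBER + "|(?:(?!" + NUMBER + ").)+");

    // Returns numbers and the text between them, in their original order.
    static List<String> tokenize(String equation) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = TOKEN.matcher(equation);
        while (matcher.find()) {
            tokens.add(matcher.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String eqn = "cos(2123.324E3)*ln(e^x)+123.345E-6*sin(sin(sin(x)))";
        // prints [cos(, 2123.324E3, )*ln(e^x)+, 123.345E-6, *sin(sin(sin(x)))]
        System.out.println(tokenize(eqn));
    }
}
```

Because every character of a single-line input falls into one of the two alternatives, joining the tokens reproduces the input string, which is a quick sanity check that nothing was dropped.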

【Comments】:

I ran 1,000 iterations with both the split and the matcher approach, and the matcher approach ran faster.

【Answer 2】:

It's probably not the fastest thing in the world, but you could do something like this:

/**
 * Splits a string while keeping the supplied split delimiter(s) in the
 * resulting array elements.<br><br>
 * <p>
 * This method builds a Regular Expression (RegEx) and passes it to
 * String.split() to acquire the desired array
 * content.<br><br>
 *
 * @param inputString       (String) The string to split.<br>
 *
 * @param delimiterPosition (Integer) An integer value of either 0, 1, or 2.
 *                          The specific value determines how the detected
 *                          delimiter types are placed within the array:<pre>
 *
 *      0       Delimiter as separate element:
 *              a;b;c;d = [a, ;, b, ;, c, ;, d]
 *              Core regex is: .split("((?&lt;=;)|(?=;))")
 *              Lookahead and Lookbehind used.
 *
 *      1       Delimiter at end of each element except last:
 *              a;b;c;d = [a;, b;, c;, d]
 *              Core regex is: .split("(?&lt;=;)")
 *              Lookbehind used only.
 *
 *      2       Delimiter at beginning of each element except first:
 *              a;b;c;d = [a, ;b, ;c, ;d]
 *              Core regex is: .split("(?=;)")
 *              Lookahead used only.</pre><br>
 *
 * If nothing is supplied then each character of the supplied input string
 * is split into the string array.<br><br>
 *
 * If any supplied delimiters or delimiter characters happen to be RegEx
 * Meta Characters such as: ( ) [ ]   \ ^ $ | ? * + . &lt; &gt; - = ! for
 * example then those delimiters must be Escaped with a Double Backslash
 * (ie: "\\+" ) when supplied otherwise an exception will occur.<br>
 *
 * @param delimiters        (1D String Array or one to multiple comma
 *                          delimited String Entries) Any number of string
 *                          delimiters can be supplied as long as they are
 *                          separated with a comma (,).<br>
 *
 * @return (String[]) The array produced by splitting the input string with
 *         the generated Regular Expression (RegEx).
 */
public static String[] SplitAndKeepDelimiters(String inputString, int delimiterPosition, String... delimiters) {
    // No delimiters supplied: split into single characters.
    if (delimiters.length < 1) {
        return inputString.split("");
    }

    // build regex: one lookaround alternative per delimiter, joined with '|'
    String regEx = "";
    for (int i = 0; i < delimiters.length; i++) {
        switch (delimiterPosition) {
            case 0: // delimiter as its own element
                regEx += regEx.isEmpty() ? "((?<=" + delimiters[i] + ")|(?=" + delimiters[i] + "))"
                        : "|((?<=" + delimiters[i] + ")|(?=" + delimiters[i] + "))";
                break;
            case 1: // delimiter kept at the end of each element
                regEx += regEx.isEmpty() ? "(?<=" + delimiters[i] + ")"
                        : "|(?<=" + delimiters[i] + ")";
                break;
            case 2: // delimiter kept at the beginning of each element
                regEx += regEx.isEmpty() ? "(?=" + delimiters[i] + ")"
                        : "|(?=" + delimiters[i] + ")";
                break;
        }
    }
    return inputString.split(regEx);
}

The method above lets you split on multiple delimiters.
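
The three delimiter positions described in the Javadoc can be tried directly with String.split(). The sketch below is an illustration and not the answer's method: the class KeepDelimiterDemo and the helpers splitAround, splitAfter and splitBefore are hypothetical names, and it assumes the delimiter is a literal string, so it escapes it with Pattern.quote() instead of requiring the caller to add backslashes.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical demo class, for illustration only.
final class KeepDelimiterDemo {
    // Position 0: the delimiter becomes its own element (lookbehind or lookahead).
    static String[] splitAround(String input, String literalDelimiter) {
        String d = Pattern.quote(literalDelimiter);
        return input.split("(?<=" + d + ")|(?=" + d + ")");
    }

    // Position 1: the delimiter stays at the end of each element (lookbehind).
    static String[] splitAfter(String input, String literalDelimiter) {
        return input.split("(?<=" + Pattern.quote(literalDelimiter) + ")");
    }

    // Position 2: the delimiter stays at the start of each element (lookahead).
    static String[] splitBefore(String input, String literalDelimiter) {
        return input.split("(?=" + Pattern.quote(literalDelimiter) + ")");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitAround("a;b;c;d", ";"))); // [a, ;, b, ;, c, ;, d]
        System.out.println(Arrays.toString(splitAfter("a;b;c;d", ";")));  // [a;, b;, c;, d]
        System.out.println(Arrays.toString(splitBefore("a;b;c;d", ";"))); // [a, ;b, ;c, ;d]
        System.out.println(Arrays.toString(splitAround("1+2+3", "+")));   // [1, +, 2, +, 3]
    }
}
```

Quoting the delimiter is a design choice for literal delimiters only; if the delimiter is meant to be a regex itself, it has to be passed through unquoted, as the method above does.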
