Java Regular Expressions

Posted 2020-07-18 Litmmp

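All regular-expression classes in the JDK live in the java.util.regex package.

Basic usage

Pattern pattern = Pattern.compile("regular expression"); // compile the regular expression
Matcher matcher = pattern.matcher("target string"); // bind the matcher to the target string
// use the matcher object to perform the actual operations ...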
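
As a minimal, self-contained sketch of the two steps above (compile, then bind), the following program pulls the first run of digits out of a string. The class name BasicUsage, the helper firstNumber, the pattern "\\d+" and the sample input are illustrative assumptions; find() and group() are the Matcher methods described in the sections that follow.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BasicUsage {
    // Returns the first run of digits in the input, or null if there is none.
    // (Hypothetical helper, used only for illustration.)
    static String firstNumber(String input) {
        Pattern pattern = Pattern.compile("\\d+");  // step 1: compile the expression
        Matcher matcher = pattern.matcher(input);   // step 2: bind it to the target string
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstNumber("order 42 shipped")); // 42
        System.out.println(firstNumber("no digits here"));   // null
    }
}
```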

Matching

The following is an excerpt of the java.util.regex.Matcher source (method bodies elided), listing the commonly used methods related to matching:

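public final class Matcher implements MatchResult {

    public boolean matches() {...} // match the entire input

    public boolean lookingAt() {...} // match at the start of the input

    public boolean find() {...} // find the next match

    public boolean find(int start) {...} // find the next match, starting at the given index

    public String group() { ... } // return the matched text; equivalent to group(0)

    public String group(int group) { ... } // return the text captured by the group with this number

    public String group(String name) { ... } // return the text captured by the group with this name

    public int groupCount() { ... } // number of capturing groups in the pattern

    ...

}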

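Three kinds of matching

1. Whole-input matching: matches()

This behaves roughly as if the pattern were wrapped in the anchors ^ and $. For example, given the pattern "\w+", matches() treats it much like "^\w+$".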

Pattern pattern = Pattern.compile("\\w+"); // matches one word
Matcher matcher1 = pattern.matcher("hello");
System.out.println(matcher1.matches()); // true
Matcher matcher2 = pattern.matcher("hello world");
System.out.println(matcher2.matches()); // false
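
To make the comparison with ^ and $ concrete, the sketch below runs matches() with the plain pattern next to find() with an explicitly anchored pattern on the same two inputs. The class and method names are illustrative assumptions, and the comparison is only shown for these simple single-line inputs.

```java
import java.util.regex.Pattern;

class MatchesVsAnchors {
    private static final Pattern PLAIN = Pattern.compile("\\w+");
    private static final Pattern ANCHORED = Pattern.compile("^\\w+$");

    // matches() requires the whole input to match the plain pattern.
    static boolean wholeMatch(String input) {
        return PLAIN.matcher(input).matches();
    }

    // find() with explicit anchors searches for a match spanning the input.
    static boolean anchoredFind(String input) {
        return ANCHORED.matcher(input).find();
    }

    public static void main(String[] args) {
        for (String input : new String[] {"hello", "hello world"}) {
            System.out.println(input + ": " + wholeMatch(input) + " " + anchoredFind(input));
        }
        // hello: true true
        // hello world: false false
    }
}
```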

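2. Prefix matching: lookingAt()

This behaves as if the pattern began with the anchor ^: the match must start at the beginning of the input but does not have to reach its end. In effect the method turns "\w+" into "^\w+".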

Pattern pattern = Pattern.compile("\\w+"); // matches one word
Matcher matcher1 = pattern.matcher("hello");
System.out.println(matcher1.lookingAt()); // true
Matcher matcher2 = pattern.matcher(" hello");
System.out.println(matcher2.lookingAt()); // false
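
The difference from matches() shows up when the input continues past the match. In the sketch below (class and helper names are assumptions made for illustration), lookingAt() succeeds on "hello world" and group() returns only the leading word, while matches() on the same input fails.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookingAtDemo {
    private static final Pattern WORD = Pattern.compile("\\w+");

    // Returns the word at the very start of the input,
    // or null if the input does not begin with a word character.
    static String leadingWord(String input) {
        Matcher matcher = WORD.matcher(input);
        return matcher.lookingAt() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(leadingWord("hello world"));               // hello
        System.out.println(leadingWord(" hello"));                    // null
        System.out.println(WORD.matcher("hello world").matches());    // false
    }
}
```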

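3. Iterative matching: find(); find(int start);

A regular expression can often match several times within one text. For example, "\w+" matches twice in "hello world": "hello" and "world".

If you think of the matches {"hello", "world"} as a List<String>, find(...) plays a role similar to hasNext() on a Java iterator: it reports whether the pattern has another match in the target string. Unlike hasNext(), a successful find() also advances to that match, so each call moves on to the next one.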

Pattern pattern = Pattern.compile("\\w+"); // matches one word
Matcher matcher = pattern.matcher("hello world");
int result = 0;
while (matcher.find()) {
    result++;
}
System.out.println(result); // 2
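
The loop above only counts matches, and find(int start) is not exercised. The following sketch collects the matched text into a list and shows find(int start), which resets the matcher and searches from the given index. The class name FindDemo and its helper methods are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindDemo {
    private static final Pattern WORD = Pattern.compile("\\w+");

    // Collects every match by calling find() until it returns false.
    static List<String> allWords(String text) {
        List<String> words = new ArrayList<>();
        Matcher matcher = WORD.matcher(text);
        while (matcher.find()) {
            words.add(matcher.group());
        }
        return words;
    }

    // Returns the first match found at or after index start, or null if none.
    static String firstWordFrom(String text, int start) {
        Matcher matcher = WORD.matcher(text);
        return matcher.find(start) ? matcher.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(allWords("hello world"));          // [hello, world]
        System.out.println(firstWordFrom("hello world", 1));  // ello
        System.out.println(firstWordFrom("hello world", 5));  // world
    }
}
```

Note that find(int start) throws IndexOutOfBoundsException when start is negative or greater than the input length, so callers working with computed offsets may want to check the index first.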

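Retrieving match results

The methods above only test whether the pattern matches the target text. To obtain the matched text you also need the group(...) methods of Matcher. Before looking at group(...), a word on regex groups.

Regex groups

A group is a part of the regular expression enclosed in parentheses. The whole expression counts as group 0; the parenthesized groups are then numbered 1, 2, 3, ... from left to right, in the order of their opening parentheses.

Groups come in two kinds: capturing and non-capturing. As the names suggest, text matched by a capturing group is captured and given a group number, while text matched by a non-capturing group is neither captured nor numbered. Every non-capturing group starts with ?, but a group that starts with ? is not necessarily non-capturing (a named group such as (?<name>...) is a capturing group).

Only capturing groups are discussed here. For example: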

Pattern: (A(B(C)D))(EFG)
Group 0: (A(B(C)D))(EFG)
Group 1: (A(B(C)D))
Group 2: (B(C)D)
Group 3: (C)
Group 4: (EFG)
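
To illustrate the remark that a group starting with ? may or may not capture, here is a small sketch with one non-capturing group, one ordinary capturing group and one named capturing group. The pattern, the group name "tail" and the class name are assumptions chosen for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupKinds {
    // (?:ab)       non-capturing: not numbered
    // (cd)         capturing: group 1
    // (?<tail>ef)  named capturing: group 2, also reachable by name
    private static final Pattern PATTERN = Pattern.compile("(?:ab)(cd)(?<tail>ef)");

    // Number of capturing groups in the pattern (group 0 is not counted).
    static int captureCount() {
        return PATTERN.matcher("").groupCount();
    }

    // Returns group 0, group 1, group 2 and the named group, or null if there is no match.
    static String[] groups(String input) {
        Matcher m = PATTERN.matcher(input);
        if (!m.matches()) {
            return null;
        }
        return new String[] {m.group(0), m.group(1), m.group(2), m.group("tail")};
    }

    public static void main(String[] args) {
        System.out.println(captureCount());                      // 2
        System.out.println(String.join(", ", groups("abcdef"))); // abcdef, cd, ef, ef
    }
}
```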

group(...)

After any of the four methods matches(), lookingAt(), find(), or find(int start) succeeds, you can call group(...) to retrieve the result.

Here is one example; the others work the same way:

Pattern pattern = Pattern.compile("(A(B(C)D))(EFG)"); // four capturing groups
Matcher matcher = pattern.matcher("ABCDEFG");
while (matcher.find()) {
    // groupCount() does not count group 0, so the upper bound must be inclusive
    for (int i = 0; i <= matcher.groupCount(); i++) {
        System.out.println("Group " + i + ": " + matcher.group(i));
    }
}

// Output
Group 0: ABCDEFG
Group 1: ABCD
Group 2: BCD
Group 3: C
Group 4: EFG
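
group(...) is only valid after one of the matching methods has succeeded; calling it before any match attempt, or after a failed one, throws IllegalStateException. The sketch below, with assumed class and method names, guards the call accordingly.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupGuard {
    private static final Pattern WORD = Pattern.compile("\\w+");

    // Calls group() only when matches() has succeeded.
    static String wholeWordOrNull(String input) {
        Matcher matcher = WORD.matcher(input);
        return matcher.matches() ? matcher.group() : null;
    }

    // Demonstrates that group() fails when no match has been attempted yet.
    static boolean groupBeforeMatchThrows() {
        Matcher matcher = WORD.matcher("hello");
        try {
            matcher.group();
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(wholeWordOrNull("hello"));        // hello
        System.out.println(wholeWordOrNull("hello world"));  // null
        System.out.println(groupBeforeMatchThrows());        // true
    }
}
```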

Replacement

public final class Matcher implements MatchResult {

    public static String quoteReplacement(String s) { ... } // escape special characters so they are treated as literals

    public Matcher appendReplacement(StringBuffer sb, String replacement) { ... } // incremental replacement

    public StringBuffer appendTail(StringBuffer sb) { ... } // after the appendReplacement calls, append the rest of the input to the result

    public String replaceAll(String replacement) { ... } // replace every match

    public String replaceFirst(String replacement) { ... } // replace the first match

    ...

}

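1. quoteReplacement(String s): escapes the special characters "\" and "$" so that they are treated as literals in a replacement string. It is rarely needed.

"\" is the escape character.

"$" refers to a group inside the replacement argument of appendReplacement(sb, replacement) (for example, $0 is group 0, $1 is group 1, ...).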

For example (appendReplacement should be called only once per match, so each replacement string is applied with its own Matcher):

public static void main(String[] args) {
    String string = "Our days have bloomed";
    Pattern pattern = Pattern.compile("[aeiou]");
    String[] replacements = {
        Matcher.quoteReplacement("\\"), // 1: a literal backslash; 1 and 2 are equivalent
        "\\\\",                         // 2: the same replacement, escaped by hand
        "$0-index"                      // 3: $0 is the text matched by group 0
    };
    for (String replacement : replacements) {
        StringBuffer sbuf = new StringBuffer();
        Matcher matcher = pattern.matcher(string);
        while (matcher.find()) {
            matcher.appendReplacement(sbuf, replacement);
        }
        matcher.appendTail(sbuf);
        System.out.println(sbuf);
    }
}

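2. replaceAll(String replacement) and replaceFirst(String replacement) also exist on String, and the results are the same. Choose according to the situation: for a one-off replacement the String methods are more convenient, but they compile the pattern on every call, so when the same expression is used repeatedly it is cheaper to compile a Pattern once and reuse it.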

public static void main(String[] args) {
    String string = "Our days have bloomed";

    Pattern pattern = Pattern.compile("[aeiou]");
    Matcher matcher = pattern.matcher(string);

    String replace = matcher.replaceFirst("*");
    System.out.println(replace); // O*r days have bloomed
    replace = string.replaceFirst("[aeiou]", "*");
    System.out.println(replace); // O*r days have bloomed

    replace = matcher.replaceAll("*");
    System.out.println(replace); // O*r d*ys h*v* bl**m*d
    replace = string.replaceAll("[aeiou]", "*");
    System.out.println(replace); // O*r d*ys h*v* bl**m*d
}
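
When the same expression is applied to many strings, one common arrangement is to compile the Pattern once and reuse it. The sketch below assumes a hypothetical class VowelMasker that holds the compiled pattern in a static field; the names and inputs are illustrative only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class VowelMasker {
    // Compiled once and shared; Pattern instances are immutable and safe to reuse.
    private static final Pattern VOWEL = Pattern.compile("[aeiou]");

    // Replaces every lowercase vowel with '*' using the precompiled pattern.
    static String mask(String input) {
        return VOWEL.matcher(input).replaceAll("*");
    }

    // Applies the same pattern to many inputs without recompiling it.
    static List<String> maskAll(List<String> inputs) {
        List<String> result = new ArrayList<>();
        for (String input : inputs) {
            result.add(mask(input));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(maskAll(Arrays.asList("Our days", "have bloomed")));
        // [O*r d*ys, h*v* bl**m*d]
    }
}
```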

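3. appendReplacement(StringBuffer sb, String replacement) and appendTail(StringBuffer sb) are normally used together to perform incremental replacement: the replacement is computed separately for each match, so the process is under your control, unlike the two methods above, which apply a single replacement string to every match.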

public static void main(String[] args) {
    String string = "Our days have bloomed";
    StringBuffer sbuf = new StringBuffer();
    Pattern compile = Pattern.compile("[aeiou]");
    Matcher matcher = compile.matcher(string);
    while (matcher.find()) {
        // the replacement is derived from the current match
        matcher.appendReplacement(sbuf, matcher.group().toUpperCase());
    }
    matcher.appendTail(sbuf);
    System.out.println(sbuf);
}

// Result: OUr dAys hAvE blOOmEd
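
When the replacement text comes from data rather than from a literal, it may contain "$" or "\", which appendReplacement would otherwise interpret as a group reference or an escape. The sketch below combines appendReplacement with quoteReplacement to fill placeholders of the form {name} from a map. The class TemplateFill, the placeholder syntax and the sample values are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TemplateFill {
    // Group 1 captures the placeholder name between the braces.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\{(\\w+)\\}");

    // Replaces each {name} with its value from the map; unknown names are left unchanged.
    static String fill(String template, Map<String, String> values) {
        StringBuffer out = new StringBuffer();
        Matcher matcher = PLACEHOLDER.matcher(template);
        while (matcher.find()) {
            String value = values.getOrDefault(matcher.group(1), matcher.group());
            // quoteReplacement keeps "$" and "\" in the value literal
            matcher.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> values = new HashMap<>();
        values.put("price", "$5");
        System.out.println(fill("Total: {price}, code: {unknown}", values));
        // Total: $5, code: {unknown}
    }
}
```

Without the quoteReplacement call, the value "$5" would be read as a reference to group 5, which does not exist in this pattern, and appendReplacement would throw an exception.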
