Regular Expressions (Java)

Posted by lamsey16



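Concept:

A regular expression (often abbreviated in code as regex, regexp, or RE) is a concept from computer science.

Regular expressions are typically used to search for and replace text that matches a certain pattern (rule).

Uses:

They are commonly used in conditional checks to test whether a string has a particular format (matching), and for searching and replacing within strings.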

 

A regular expression is a string that contains characters with special meanings; these special characters are called metacharacters.
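A small sketch of the difference a metacharacter makes (the class `MetacharDemo` and its method names are made up for illustration): the unescaped `.` matches any single character, while the escaped `\.` matches only a literal dot. In Java source the backslash itself has to be doubled, so that pattern is written `"a\\.c"`.

```java
public class MetacharDemo {
    // '.' is a metacharacter: it matches any single character
    static boolean dotMatches(String input) {
        return input.matches("a.c");
    }

    // "\\." reaches the regex engine as \. which matches only a literal '.'
    static boolean literalDotMatches(String input) {
        return input.matches("a\\.c");
    }

    public static void main(String[] args) {
        System.out.println(dotMatches("abc"));        // true
        System.out.println(dotMatches("a.c"));        // true
        System.out.println(literalDotMatches("abc")); // false
        System.out.println(literalDotMatches("a.c")); // true
    }
}
```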

Classes involved:

java.lang.String

java.util.regex.Pattern --- the pattern

java.util.regex.Matcher --- the matcher, which performs the match and holds its result

Example: "." stands for any single character, so "abc" is matched by "...".

public class RegExp {
    public static void main(String[] args){
        // A brief introduction to regular expressions: prints true
        System.out.println("abc".matches("..."));
    }
}
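The example above does not show that each `.` stands for exactly one character, or that by default it does not match a line terminator. The sketch below (the class `DotDemo` and the helper `matchesDotAll` are illustrative names) checks both points and turns on `Pattern.DOTALL` so that `.` also matches newlines.

```java
import java.util.regex.Pattern;

public class DotDemo {
    // Matches the whole input, with '.' allowed to match line terminators too
    static boolean matchesDotAll(String regex, String input) {
        return Pattern.compile(regex, Pattern.DOTALL).matcher(input).matches();
    }

    public static void main(String[] args) {
        // Each '.' consumes exactly one character, so the counts must agree
        System.out.println("abc".matches("..."));          // true
        System.out.println("abc".matches(".."));           // false: one character left over
        System.out.println("ab".matches("..."));           // false: one character short

        // By default '.' does not match a line terminator such as a newline
        System.out.println("a\nc".matches("..."));         // false

        // With Pattern.DOTALL, '.' matches line terminators as well
        System.out.println(matchesDotAll("...", "a\nc"));  // true
    }
}
```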

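"\d" --- matches any single digit 0-9. Because the backslash is also the escape character in Java string literals, it must itself be escaped, so in Java source the pattern is written "\\d".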

public class RegExp {
    public static void main(String[] args){
        // A brief introduction to regular expressions
        p("abc".matches("..."));// matching: prints true
        // "\\d" matches one digit
        p("d1234w".replaceAll("\\d", "-"));// replacement, note the escaped backslash: prints d----w
    }
    public static void p(Object o){
        System.out.println(o);
    }
}
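To see why the doubling matters, compare the two escapings side by side. In this sketch (`BackslashDemo` and its helpers are hypothetical names), the Java literal `"\\d"` reaches the regex engine as `\d` (any digit), whereas `"\\\\d"` reaches it as `\\d`, which means a literal backslash followed by the letter d.

```java
public class BackslashDemo {
    // Java literal "\\d" reaches the regex engine as \d: any digit
    static String replaceDigits(String text, String replacement) {
        return text.replaceAll("\\d", replacement);
    }

    // Java literal "\\\\d" reaches the regex engine as \\d: a literal backslash, then 'd'
    static String replaceBackslashD(String text, String replacement) {
        return text.replaceAll("\\\\d", replacement);
    }

    public static void main(String[] args) {
        String text = "a\\d1"; // four characters: a, backslash, d, 1
        System.out.println(replaceDigits("d1234w", "-")); // d----w
        System.out.println(replaceDigits(text, "X"));     // a\dX
        System.out.println(replaceBackslashD(text, "X")); // aX1
    }
}
```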

 

Introduction to the classes:

Pattern

Definition (from the Javadoc):

A compiled representation of a regular expression.

A regular expression, specified as a string, must first be compiled into an instance of this class. The resulting pattern can then be used to create a Matcher object that can match arbitrary character sequences against the regular expression. All of the state involved in performing a match resides in the matcher, so many matchers can share the same pattern.

A typical invocation sequence is thus

 Pattern p = Pattern.compile("a*b");
 Matcher m = p.matcher("aaaaab");
 boolean b = m.matches();

A matches method is defined by this class as a convenience for when a regular expression is used just once. This method compiles an expression and matches an input sequence against it in a single invocation. The statement

 boolean b = Pattern.matches("a*b", "aaaaab");

is equivalent to the three statements above, though for repeated matches it is less efficient since it does not allow the compiled pattern to be reused.

 

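The form below is more efficient when the same regular expression is used repeatedly, because it is compiled only once; Pattern and Matcher also provide many more methods.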

 Pattern p = Pattern.compile("a*b");
 Matcher m = p.matcher("aaaaab");
 boolean b = m.matches();
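As the Javadoc notes, a compiled Pattern can be shared by many matchers. A minimal sketch, assuming the pattern is reused across several inputs (the class and method names are illustrative): compile once into a static constant, then create a fresh Matcher per input. Pattern objects are immutable and safe to share between threads, whereas a Matcher holds match state and is not, which is why each call gets its own.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReusePatternDemo {
    // Compiled once and shared; Pattern objects are immutable and thread-safe
    private static final Pattern A_STAR_B = Pattern.compile("a*b");

    // A fresh Matcher per input: Matcher objects hold match state and are not thread-safe
    static boolean isAStarB(String input) {
        Matcher m = A_STAR_B.matcher(input);
        return m.matches();
    }

    public static void main(String[] args) {
        String[] inputs = {"aaaaab", "b", "aaa", "ab"};
        for (String s : inputs) {
            // Same result as Pattern.matches("a*b", s), without recompiling the regex each time
            System.out.println(s + " -> " + isAStarB(s));
        }
    }
}
```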

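[a-z] stands for a single letter in the range a to z.

[] denotes a range (a character class): it matches exactly one of the characters it describes.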

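Quantifiers

? --- zero times or once

* --- zero or more times

+ --- one or more times

{n} --- exactly n times

{n,} --- at least n times

{n,m} --- at least n but no more than m times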
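A quick sketch exercising each quantifier with `String.matches()`, which requires the whole string to match (the `check` helper is a hypothetical name that just prints and returns the result):

```java
public class QuantifierDemo {
    // Prints and returns whether the WHOLE input matches the regex
    static boolean check(String regex, String input) {
        boolean result = input.matches(regex);
        System.out.println("\"" + input + "\" vs " + regex + " -> " + result);
        return result;
    }

    public static void main(String[] args) {
        check("a?", "");         // true:  ? allows zero occurrences
        check("a?", "aa");       // false: ? allows at most one
        check("a*", "");         // true:  * allows zero
        check("a*", "aaaa");     // true:  ... or any number
        check("a+", "");         // false: + needs at least one
        check("a{3}", "aaa");    // true:  exactly three
        check("a{3,}", "aa");    // false: fewer than three
        check("a{3,}", "aaaaa"); // true:  three or more
        check("a{1,3}", "aaaa"); // false: more than three
    }
}
```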

 

// Ranges

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegExp {
    public static void main(String[] args){
        
        // Ranges
        p("a".matches("[abc]"));        // true: one of a, b, c
        p("a".matches("[^abc]"));       // false: any character except a, b, c
        p("A".matches("[a-zA-Z]"));     // true: any letter
        p("A".matches("[a-z]|[A-Z]"));  // true: a-z or A-Z, i.e. any letter
        p("A".matches("[a-z[A-Z]]"));   // true: union, same as above
        p("A".matches("[A-Z&&[REG]]")); // false: intersection, must be in A-Z AND one of R, E, G
        p("R".matches("[A-Z&&[REG]]")); // true
        
    }
    public static void p(Object o){
        System.out.println(o);
    }
}

 

//Predefined character classes

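"\\".matches("\\\\") --- to match a single backslash, the regex must be written with four backslashes. The Java string literal "\\\\" is the two-character string \\, which the regex engine reads as an escaped (that is, literal) backslash. With only two backslashes ("\\") the regex receives a lone \ and a PatternSyntaxException is thrown; with one or three, the last backslash escapes the closing quote, so the code does not even compile.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegExp {
    public static void main(String[] args){
    
        // Getting to know \s \w \d
        p(" \n\r\t".matches("\\s{4}"));                  // true: four whitespace characters
        p(" ".matches("\\S"));                           // false: a space is whitespace
        p("a_8".matches("\\w{3}"));                      // true: letters, digits and _ are word characters
        p("abc888&^%".matches("[a-z]{1,3}\\d+[&^#%]+")); // true
        p("\\".matches("\\\\"));                         // true: one literal backslash
        
    }
    public static void p(Object o){
        System.out.println(o);
    }
}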
Predefined character classes
. Any character (may or may not match line terminators)
\d A digit: [0-9]
\D A non-digit: [^0-9]
\h A horizontal whitespace character: [ \t\xA0\u1680\u180e\u2000-\u200a\u202f\u205f\u3000]
\H A non-horizontal whitespace character: [^\h]
\s A whitespace character: [ \t\n\x0B\f\r]
\S A non-whitespace character: [^\s]
\v A vertical whitespace character: [\n\x0B\f\r\x85\u2028\u2029]
\V A non-vertical whitespace character: [^\v]
\w A word character: [a-zA-Z_0-9]
\W A non-word character: [^\w]
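The \h and \v classes in the table were added in Java 8, so this sketch assumes Java 8 or later. It reports which of a few predefined classes match a single character (`classesOf` is an illustrative helper, not a library method). Note that the non-breaking space (U+00A0) counts as horizontal whitespace (\h) but not as \s, which by default covers only the ASCII whitespace characters listed above.

```java
import java.util.ArrayList;
import java.util.List;

public class PredefinedClassDemo {
    // A few predefined classes, written as Java string literals
    private static final String[] CLASSES = {"\\d", "\\s", "\\h", "\\v", "\\w"};

    // Returns which of the classes above match the single-character input
    static List<String> classesOf(String ch) {
        List<String> result = new ArrayList<>();
        for (String c : CLASSES) {
            if (ch.matches(c)) {
                result.add(c);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(classesOf("5"));      // [\d, \w]
        System.out.println(classesOf("_"));      // [\w]
        System.out.println(classesOf("\t"));     // [\s, \h]
        System.out.println(classesOf("\n"));     // [\s, \v]
        System.out.println(classesOf("\u00A0")); // [\h]  (non-breaking space)
        System.out.println(classesOf("@"));      // []
    }
}
```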

find()

Attempts to find the next subsequence of the input sequence that matches the pattern.

reset()

Resetting a matcher discards all of its explicit state information and sets its append position to zero.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegExp {
    public static void main(String[] args){
        
        // matches(), find() and lookingAt()
        Pattern p = Pattern.compile("\\d{3,5}");
        String s = "123-45623-789-00";
        Matcher m = p.matcher(s);
        p(m.matches());            // false: matches() requires the whole input to match
        m.reset();                 // matches() and find() share the matcher's state; reset before switching to find()
        p(m.find());               // true: "123"
        p(m.start()+"-"+ m.end()); // 0-3
        p(m.find());               // true: "45623"
        p(m.start()+"-"+ m.end()); // 4-9
        p(m.find());               // true: "789"
        p(m.start()+"-"+ m.end()); // 10-13
        p(m.lookingAt());          // true: lookingAt() always tries from the start of the input
        p(m.lookingAt());
        p(m.lookingAt());
        p(m.lookingAt());
        
        
    }
    public static void p(Object o){
        System.out.println(o);
    }
}
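Because one Matcher carries state between calls, the difference between the three methods is easiest to see with a fresh Matcher each time. A sketch, assuming the same \d{3,5} pattern (`MatchModesDemo` and its helpers are illustrative names): matches() needs the entire input to match, lookingAt() needs a match starting at the beginning, and find() scans for the next match anywhere, so looping over find() collects every match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchModesDemo {
    private static final Pattern DIGITS = Pattern.compile("\\d{3,5}");

    // matches(): the ENTIRE input must match
    static boolean whole(String s) {
        return DIGITS.matcher(s).matches();
    }

    // lookingAt(): a match must start at the beginning, but need not reach the end
    static boolean prefix(String s) {
        return DIGITS.matcher(s).lookingAt();
    }

    // find() in a loop: collect every match as "text@start-end"
    static List<String> all(String s) {
        List<String> found = new ArrayList<>();
        Matcher m = DIGITS.matcher(s);
        while (m.find()) {
            found.add(m.group() + "@" + m.start() + "-" + m.end());
        }
        return found;
    }

    public static void main(String[] args) {
        String s = "123-45623-789-00";
        System.out.println(whole(s));  // false
        System.out.println(prefix(s)); // true
        System.out.println(all(s));    // [123@0-3, 45623@4-9, 789@10-13]
    }
}
```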

Find and replace

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegExp {
    public static void main(String[] args){
        
        // Replacement: see the description of appendReplacement() in the API documentation
        Pattern p = Pattern.compile("java",Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher("java Java Java I love Java  u hate JAVA sfarwwfr");
        // p(m.replaceAll("JAVA")); // would replace every match with JAVA
        StringBuffer buf = new StringBuffer();
        int i = 0;
        while(m.find()){  // find the next match
            i++;
            if (i%2 == 0) { // even-numbered matches become java, odd-numbered ones JAVA
                m.appendReplacement(buf, "java");
            } else {
                m.appendReplacement(buf, "JAVA");
            }
        }
        m.appendTail(buf); // after the appendReplacement() calls, append the rest of the input
        p(buf);            // JAVA java JAVA I love java  u hate JAVA sfarwwfr
    }
    public static void p(Object o){
        System.out.println(o);
    }
}
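On Java 9 or later, Matcher also offers replaceAll(Function&lt;MatchResult, String&gt;), which computes each replacement in a callback and replaces the manual appendReplacement()/appendTail() loop. A sketch of the same odd/even alternation, assuming Java 9+ (the class and method names are illustrative):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AlternateCaseReplace {
    // Odd-numbered matches become "JAVA", even-numbered ones "java"
    static String alternate(String input) {
        Pattern p = Pattern.compile("java", Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(input);
        int[] count = {0}; // effectively final holder so the lambda can update the counter
        return m.replaceAll(match -> (++count[0] % 2 == 1) ? "JAVA" : "java");
    }

    public static void main(String[] args) {
        System.out.println(alternate("java Java Java I love Java  u hate JAVA sfarwwfr"));
        // JAVA java JAVA I love java  u hate JAVA sfarwwfr
    }
}
```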

Grouping

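Matcher.group() --- Returns the input subsequence matched by the previous match.

Capturing groups are numbered by counting their opening parentheses from left to right. In the expression ((A)(B(C))), for example, there are four such groups:

1 ((A)(B(C)))
2 (A)
3 (B(C))
4 (C)

Parentheses thus create separate groups, and group(n) returns what group n matched, e.g. group(1), group(2). Group 0 always stands for the entire match, so group() is equivalent to group(0).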
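A sketch that prints every group for a concrete instance of ((A)(B(C))), with A, B and C replaced by the letters a, b and c (`GroupNumberingDemo` and `groups` are illustrative names):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupNumberingDemo {
    // Returns group(0) .. group(groupCount()) of a whole-input match, or null if there is no match
    static String[] groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.matches()) {
            return null;
        }
        String[] result = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            result[i] = m.group(i);
        }
        return result;
    }

    public static void main(String[] args) {
        // Same shape as ((A)(B(C))), with A, B, C replaced by the letters a, b, c
        String[] g = groups("((a)(b(c)))", "abc");
        for (int i = 0; i < g.length; i++) {
            System.out.println(i + " " + g[i]);
        }
        // 0 abc, 1 abc, 2 a, 3 bc, 4 c
    }
}
```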

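import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegExp {
    public static void main(String[] args){
    
        
        // Groups
        Pattern p = Pattern.compile("(\\d{3,5})|([a-z]{2})");
        String s = "123aa-34345bb-234cc-00";
        Matcher m = p.matcher(s);
        while (m.find()) {
            // group(2) is the two-letter alternative; it is null whenever the digit alternative matched
            p(m.group(2)); // null, aa, null, bb, null, cc
        }
    }
    public static void p(Object o){
        System.out.println(o);
    }
}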
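Since the sample string pairs digits with letters ("123aa", "34345bb", ...), it may help to see the same two groups without the |. In this illustrative variant (not the original code; the class and method names are hypothetical), both groups take part in every match, so group(1) and group(2) are never null:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DigitsLettersDemo {
    // Illustrative variant without '|': 3-5 digits immediately followed by two lowercase letters
    private static final Pattern DIGITS_THEN_LETTERS = Pattern.compile("(\\d{3,5})([a-z]{2})");

    // Collects "group(1)/group(2)" for every match found in the input
    static List<String> pairs(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = DIGITS_THEN_LETTERS.matcher(input);
        while (m.find()) {
            out.add(m.group(1) + "/" + m.group(2));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(pairs("123aa-34345bb-234cc-00")); // [123/aa, 34345/bb, 234/cc]
    }
}
```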

A summary of several important points:

 
