Java Regular Expressions, Part 2

Posted 2021-03-06, Lucas小毛驴 blog


1. Regular expression exercises

Match a QQ number entered by the user (rule: 5 to 10 characters long, digits only, must not start with 0)

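import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexTest {
    public static void main(String[] args) {
        Scanner in = new Scanner(System.in);

        while (true) {
            System.out.print("Input: ");
            String qq = in.nextLine();

            // A backslash must be doubled inside a Java string literal: \\d, not \d.
            // [1-9] is the first digit, \\d{4,9} adds 4 to 9 more, giving 5 to 10 digits.
            String regex = "^[1-9]\\d{4,9}$";
            Pattern p = Pattern.compile(regex);
            Matcher m = p.matcher(qq);
            // The pattern is anchored with ^ and $, so there is at most one match per line.
            if (m.find()) {
                System.out.println("QQ:" + m.group());
            }
        }
    }
}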

Match a mobile phone number (rule: exactly 11 digits, starting with 1, and the second digit must be one of 3, 5, 7, 8)

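import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexTest {
    public static void main(String[] args) {
        Scanner in = new Scanner(System.in);

        while (true) {
            System.out.print("Input: ");
            String phone = in.nextLine();

            // The trailing $ is required: without it, any input that merely
            // starts with 11 valid digits would be reported as a match.
            String regex = "^1[3578][0-9]{9}$";
            Pattern p = Pattern.compile(regex);
            Matcher m = p.matcher(phone);
            if (m.find()) {
                System.out.println("phone:" + m.group());
            }
        }
    }
}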
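
Both exercises validate a whole input string, so they can also be written with `Matcher.matches()`, which only succeeds when the entire input matches and therefore needs no `^` or `$`. The following is a minimal sketch of that alternative; the class name `NumberValidators` and the method names are made up for illustration and do not come from the original post.

```java
import java.util.regex.Pattern;

class NumberValidators {
    // Compile once and reuse: Pattern instances are immutable and thread-safe.
    private static final Pattern QQ = Pattern.compile("[1-9]\\d{4,9}");
    private static final Pattern PHONE = Pattern.compile("1[3578]\\d{9}");

    // matches() requires the whole input to match, so no anchors are needed.
    static boolean isValidQq(String s) {
        return s != null && QQ.matcher(s).matches();
    }

    static boolean isValidPhone(String s) {
        return s != null && PHONE.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidQq("12345"));          // true
        System.out.println(isValidQq("012345"));         // false: starts with 0
        System.out.println(isValidPhone("13812345678")); // true
        System.out.println(isValidPhone("12812345678")); // false: second digit is 2
    }
}
```

Keeping the patterns in `static final` fields avoids recompiling them on every call, which matters once validation runs in a loop.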

String splitting (rule: split on #, returning the pieces with the delimiters removed)

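public class RegexTest {
    public static void main(String[] args) {
        String str = "abc##java#hello###world";

        // "#+" treats any run of one or more # characters as a single delimiter.
        String[] ret = str.split("#+");
        for (String s : ret) {
            System.out.println(s);
        }
        // Prints abc, java, hello, world on separate lines.
    }
}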
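
`String.split` has two edge cases that the input above does not exercise: a delimiter at the very start of the string yields a leading empty string, and trailing empty strings are dropped unless a negative limit is passed. The sketch below shows both; `SplitEdgeCases` and `nonEmptyTokens` are hypothetical names used only for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SplitEdgeCases {
    // Splits on runs of '#' and discards any empty pieces.
    static List<String> nonEmptyTokens(String input) {
        List<String> tokens = new ArrayList<>();
        for (String piece : input.split("#+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // A delimiter at the start produces a leading empty string.
        System.out.println(Arrays.toString("#abc##java".split("#+")));       // [, abc, java]
        // Trailing empty strings are removed by default.
        System.out.println(Arrays.toString("abc##java###".split("#+")));     // [abc, java]
        // A negative limit keeps them.
        System.out.println(Arrays.toString("abc##java###".split("#+", -1))); // [abc, java, ]
        // Filtering afterwards gives the same result in every case.
        System.out.println(nonEmptyTokens("#abc##java###"));                 // [abc, java]
    }
}
```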

2. Common applications of regular expressions

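Matching Chinese characters

The range \u4e00-\u9fa5 covers the commonly used CJK Unified Ideographs and is a widespread way to test whether a character is Chinese, so the character class [\u4e00-\u9fa5] can be used to match Chinese text.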

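import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexTest1 {
    public static void main(String[] args) {
        Scanner in = new Scanner(System.in);
        while (true) {
            System.out.print("Input: ");
            String str = in.nextLine();
            // The backslashes are required: without them the class would match
            // the literal characters u, 4, e, 0, ... instead of the Unicode range.
            Pattern p = Pattern.compile("[\\u4e00-\\u9fa5]+");
            Matcher m = p.matcher(str);
            // Print every run of consecutive Chinese characters.
            while (m.find()) {
                System.out.println(m.group());
            }
        }
    }
}

Output:

Input: hello张三abcd
张三
Input: hello最最最abs是不是
最最最
是不是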
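
As an alternative to a hard-coded code point range, Java's regex engine can match by Unicode script: `\p{IsHan}` matches any character whose script is Han, which also includes ideographs outside the range above. This is a sketch of that variant, not something the original post uses; the class name `HanScriptMatch` and the method `hanRuns` are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HanScriptMatch {
    // \p{IsHan} selects code points by Unicode script rather than by range.
    private static final Pattern HAN = Pattern.compile("\\p{IsHan}+");

    // Returns every run of consecutive Han characters in the input.
    static List<String> hanRuns(String input) {
        List<String> runs = new ArrayList<>();
        Matcher m = HAN.matcher(input);
        while (m.find()) {
            runs.add(m.group());
        }
        return runs;
    }

    public static void main(String[] args) {
        System.out.println(hanRuns("hello张三abcd"));        // [张三]
        System.out.println(hanRuns("hello最最最abs是不是"));  // [最最最, 是不是]
    }
}
```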

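Matching numeric ranges

Matching a single digit with 0-9 is straightforward, but ranges of multi-digit numbers are easy to get wrong. For example, to match the years 1990 through 2000 it is tempting to write [1990-2000]. That is a character class, though, so it matches exactly one character: any one of 0, 1, 2, or 9.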

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexTest2 {
    public static void main(String[] args) {
        // Wrong: a character class made of 1, 9, 9, the range 0-2, and 0, 0, 0.
        Pattern p = Pattern.compile("[1990-2000]");

        String str = "1";
        Matcher m = p.matcher(str);
        System.out.println(m.matches());    // prints: true

        String str1 = "1995";
        Matcher m1 = p.matcher(str1);
        System.out.println(m1.matches());   // prints: false
    }
}

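To match a numeric range with a regular expression, first pin down the minimum and maximum values, then write patterns that cover the values in between.

A correct version looks like this: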

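import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexTest2 {
    public static void main(String[] args) {
        // 199[0-9] covers 1990 through 1999 (so a separate 1990 branch is redundant),
        // and 2000 is the upper bound. matches() already requires the whole input
        // to match, so ^ and $ are not needed.
        Pattern p = Pattern.compile("199[0-9]|2000");
        String str = "1";
        Matcher m = p.matcher(str);
        System.out.println(m.matches());    // prints: false

        String str1 = "1995";
        Matcher m1 = p.matcher(str1);
        System.out.println(m1.matches());   // prints: true
    }
}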
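
The same decomposition works for a wider range. As a sketch, assume a hypothetical target of 1990 through 2020: split it into 1990-1999, 2000-2019, and 2020, write one alternative per piece, and cross-check the result against a plain integer comparison. The class name `YearRange` is made up for this example.

```java
import java.util.regex.Pattern;

class YearRange {
    // 199[0-9]     -> 1990..1999
    // 20[01][0-9]  -> 2000..2019
    // 2020         -> the upper bound
    private static final Pattern YEAR = Pattern.compile("199[0-9]|20[01][0-9]|2020");

    static boolean inRange(String s) {
        return YEAR.matcher(s).matches();
    }

    public static void main(String[] args) {
        // Compare the regex against the obvious integer check for every year tested.
        for (int y = 1900; y <= 2100; y++) {
            boolean expected = y >= 1990 && y <= 2020;
            if (inRange(String.valueOf(y)) != expected) {
                throw new AssertionError("Mismatch at " + y);
            }
        }
        System.out.println("regex agrees with the integer check for 1900..2100");
    }
}
```

When the input is already known to be a number, parsing it with `Integer.parseInt` and comparing is usually simpler; the regex form is mainly useful when the range is one part of a larger pattern.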

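Matching img tags

Real-world img tags are not always written cleanly: there may be extra whitespace, single quotes instead of double quotes, or no quotes at all. A pattern that tolerates these variations can be written as follows: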

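import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexTest3 {
    public static void main(String[] args) {
        // Double quotes inside a Java string literal must be escaped as \".
        String str = "<img  src='aaa.jpg' /><img src=bbb.png/><img src=\"ccc.png\"/>" +
                "<img src='ddd.exe'/><img src='eee.jpn'/>";

        // ['\"]? allows an optional single or double quote around the file name.
        // \\. matches a literal dot; an unescaped . would match any character.
        Pattern p = Pattern.compile("<img\\s+src=['\"]?(\\w+\\.(jpg|png))['\"]?\\s*/>");
        Matcher m = p.matcher(str);
        while (m.find()) {
            System.out.println(m.group(1));
        }
        // Prints:
        // aaa.jpg
        // bbb.png
        // ccc.png
    }
}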
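
The pattern above assumes src is the first attribute, the tag is lowercase, and the file name has no path. The sketch below relaxes those assumptions with a case-insensitive pattern that skips over other attributes and accepts a path. It is only an illustration under those stated assumptions: the class name `ImgSrcExtractor` is made up, and for arbitrary HTML a real parser is the more robust choice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ImgSrcExtractor {
    // <img\b          the tag name, in any letter case
    // [^>]*?          other attributes, never crossing the end of the tag
    // \bsrc\s*=\s*    the src attribute, with optional spaces around '='
    // ['"]?           an optional opening quote
    // ([\w./-]+?\.(?:jpg|png))   a path ending in .jpg or .png
    // (?=['"\s/>])    which must be followed by a quote, whitespace, '/' or '>'
    private static final Pattern IMG_SRC = Pattern.compile(
            "<img\\b[^>]*?\\bsrc\\s*=\\s*['\"]?([\\w./-]+?\\.(?:jpg|png))(?=['\"\\s/>])",
            Pattern.CASE_INSENSITIVE);

    static List<String> extract(String html) {
        List<String> sources = new ArrayList<>();
        Matcher m = IMG_SRC.matcher(html);
        while (m.find()) {
            sources.add(m.group(1));
        }
        return sources;
    }

    public static void main(String[] args) {
        String html = "<IMG alt=\"x\" SRC=\"dir/a.jpg\"><img src=bbb.png/><img src='ddd.exe'/>";
        System.out.println(extract(html)); // [dir/a.jpg, bbb.png]
    }
}
```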

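Matching email addresses

First, the rules used here for a valid email address (a simplified rule set, not the full RFC grammar):

It must contain exactly one @
The first character cannot be @ or .
The sequences @. and .@ are not allowed
It cannot end with @ or .
+ is allowed in the part before @
+ cannot be the first character or appear immediately before @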

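import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexTest4 {
    public static void main(String[] args) {
        // Backslashes are doubled for the Java string literal.
        // [-+.]? allows one -, + or . after each alphanumeric run. The original
        // class contained a literal | and no +, which contradicted the rules above.
        String regex = "^([0-9a-zA-Z]+[-+.]?)+[0-9a-zA-Z]@([0-9a-zA-Z]+(-[0-9a-zA-Z]+)?\\.)+[0-9a-zA-Z]{2,3}$";
        Pattern p = Pattern.compile(regex);
        Matcher m = p.matcher("hello.sss-ksssssk@qq.com");
        System.out.println(m.matches()); // prints: true
    }
}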
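
To see how the rules listed above translate into accepted and rejected inputs, the sketch below runs a handful of sample addresses through a pattern. The pattern here is a hypothetical variant, not the one from the original post: it writes the local part as alphanumeric runs joined by single separators, which avoids the nested optional quantifier and the heavy backtracking it can cause on long non-matching input. The class name `EmailRuleCheck` and the sample addresses are made up for illustration.

```java
import java.util.regex.Pattern;

class EmailRuleCheck {
    // Local part: alphanumeric runs joined by a single '-', '+' or '.'.
    // Domain: one or more labels, each followed by '.', then a 2-3 character suffix.
    private static final Pattern EMAIL = Pattern.compile(
            "[0-9a-zA-Z]+([-+.][0-9a-zA-Z]+)*"
            + "@([0-9a-zA-Z]+(-[0-9a-zA-Z]+)?\\.)+[0-9a-zA-Z]{2,3}");

    static boolean isValid(String s) {
        return s != null && EMAIL.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "hello.sss-ksssssk@qq.com", // valid
            "first+tag@example.com",    // valid: + inside the local part
            "abc@@qq.com",              // invalid: two @
            ".abc@qq.com",              // invalid: starts with .
            "abc.@qq.com",              // invalid: contains .@
            "abc@.qq.com",              // invalid: contains @.
            "abc@qq.com.",              // invalid: ends with .
            "+abc@qq.com",              // invalid: starts with +
            "abc+@qq.com"               // invalid: contains +@
        };
        for (String s : samples) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```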

Matching URLs

A reasonably good pattern found online:

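import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexTest5 {
    public static void main(String[] args) {
        String str = "https://www.baidu.com/";

        // Scheme, then "://", then one or more URL characters. The final character
        // class leaves out punctuation such as . , ; ! ? : so a URL cannot end with it.
        Pattern p = Pattern.compile("(https?|ftp|file)://[-A-Za-z0-9+&@#/%?=~_|!:,.;]+[-A-Za-z0-9+&@#/%=~_|]");
        Matcher m = p.matcher(str);
        System.out.println(m.matches()); // prints: true
    }
}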
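
The same pattern can also pull URLs out of running text with `find()`. Because its last character class excludes sentence punctuation, a comma or period that follows a URL is left out of the match. The sketch below demonstrates this; the class name `UrlExtractor` and the sample sentence (including the example.org address) are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UrlExtractor {
    // Same pattern as in the post, compiled once for reuse.
    private static final Pattern URL = Pattern.compile(
            "(https?|ftp|file)://[-A-Za-z0-9+&@#/%?=~_|!:,.;]+[-A-Za-z0-9+&@#/%=~_|]");

    // Collects every URL found anywhere in the text.
    static List<String> extract(String text) {
        List<String> urls = new ArrayList<>();
        Matcher m = URL.matcher(text);
        while (m.find()) {
            urls.add(m.group());
        }
        return urls;
    }

    public static void main(String[] args) {
        String text = "Visit https://www.baidu.com/, or ftp://example.org/file.txt.";
        // The trailing comma and period are not part of the matches.
        System.out.println(extract(text)); // [https://www.baidu.com/, ftp://example.org/file.txt]
    }
}
```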

Greedy and non-greedy matching

For example, extracting the text inside div tags:

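import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexTest6 {
    public static void main(String[] args) {
        String str = "<div>Article title</div><div>Publish date</div>";

        // Greedy: .+ consumes as much as possible, so the match runs to the last </div>.
        Pattern p = Pattern.compile("<div>(?<title>.+)</div>");
        Matcher m = p.matcher(str);
        while (m.find()) {
            System.out.println(m.group("title"));
        }
        // Prints: Article title</div><div>Publish date

        System.out.println("-------------");

        // Non-greedy: .+? consumes as little as possible, so each match stops at the first </div>.
        Pattern p1 = Pattern.compile("<div>(?<title>.+?)</div>");
        Matcher m1 = p1.matcher(str);
        while (m1.find()) {
            System.out.println(m1.group("title"));
        }
        // Prints:
        // Article title
        // Publish date
    }
}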
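
A common alternative to a non-greedy quantifier is a negated character class: `[^<]+` can never run past the next `<`, so it stops at the first closing tag without relying on backtracking. The sketch below assumes the div contains plain text with no nested tags; the class name `DivText` is made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DivText {
    // [^<]+ matches one or more characters that are not '<',
    // so the captured text cannot extend into the next tag.
    private static final Pattern DIV = Pattern.compile("<div>([^<]+)</div>");

    static List<String> texts(String html) {
        List<String> result = new ArrayList<>();
        Matcher m = DIV.matcher(html);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(texts("<div>Title</div><div>Date</div>")); // [Title, Date]
    }
}
```

The trade-off is that this version finds nothing when the div holds nested markup, whereas the non-greedy pattern would still return the raw inner text.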
