Regex: filter special characters (such as Japanese) but keep emojis


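I have a regex, /[^\x00-\x7F]/g, that detects any special characters such as Japanese characters, but it also filters out emojis. I would like to know how to add something to this regex so that it no longer detects them. Here is a regex that detects emojis; is it possible to "invert" it? (\u00a9|\u00ae|[\u2000-\u3300]|\ud83c[\ud000-\udfff]|\ud83d[\ud000-\udfff]|\ud83e[\ud000-\udfff])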

Answer

Easier said than done.

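First, the regex (\u00a9|\u00ae|[\u2000-\u3300]|\ud83c[\ud000-\udfff]|\ud83d[\ud000-\udfff]|\ud83e[\ud000-\udfff]) may well capture all emojis, but it also captures Japanese characters (and other content) that you want to filter out: hiragana and katakana, for example, fall inside the range [\u2000-\u3300] (try it!). So the suggested pattern does not really help. After some searching, I decided to go with the emoji-regex definition by Mathias Bynens (see text.js or index.js, depending on your environment), which seems narrower and more accurate, though less concise (demo).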
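As a quick check of that claim, here is a minimal sketch that runs the question's emoji pattern against Japanese text. It assumes the escapes in the post were meant to be written with backslashes (\u00a9 and so on); the sample strings and variable names are made up for illustration.

```javascript
// The emoji pattern from the question, assuming the escapes were meant
// to be written with backslashes (\u00a9, \ud83c, ...).
const proposedEmoji = /(\u00a9|\u00ae|[\u2000-\u3300]|\ud83c[\ud000-\udfff]|\ud83d[\ud000-\udfff]|\ud83e[\ud000-\udfff])/g;

// Hypothetical sample strings.
const kana = "こんにちは";      // hiragana, U+3040..U+309F
const kanji = "日本語";         // CJK ideographs, above U+3300
const emoji = "\u{1F600}";      // grinning face, stored as \ud83d\ude00

// String.prototype.match returns null when nothing matches.
const kanaHits = kana.match(proposedEmoji) || [];
const kanjiHits = kanji.match(proposedEmoji) || [];
const emojiHits = emoji.match(proposedEmoji) || [];

console.log(kanaHits.length, kanjiHits.length, emojiHits.length); // prints: 5 0 1
```

Every kana character is reported as an "emoji", so using this pattern to decide what to keep would also keep part of the Japanese text.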

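Second, the usual approach would be a negative lookahead that removes the emojis from the filter range, as a workaround for engines such as ECMAScript/JS that do not support character class subtraction. However, this does not work well here (even if the regex engine at hand supported subtraction), because we already have a negated character set, and negation takes precedence over subtraction (see: here). So I had to use another trick: Deleting the Matches, (KeepThis)|DeleteThis. The alternation tries the emoji pattern first and captures it in group 1; any other non-ASCII character falls through to the second alternative. Replacing every match with $1 then keeps the emojis and deletes the rest.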
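The (KeepThis)|DeleteThis idea can be sketched as follows. This is not the answer's pattern: as a short stand-in for the emoji-regex definition it uses the Unicode property escape \p{Extended_Pictographic}, which assumes an engine that supports property escapes with the u flag. The function name stripNonAsciiKeepEmoji is made up for the example, and the stand-in is not meant to cover every sequence the full emoji set handles (flags and skin-tone modifiers, for instance, may need extra alternatives).

```javascript
// (KeepThis)|DeleteThis: group 1 captures what must survive, the second
// alternative matches what should be deleted.
// Stand-in emoji pattern (an assumption, not the emoji-regex definition):
// a pictographic code point, optionally followed by variation selectors
// or ZWJ-joined pictographs.
const keepEmojiOrDelete =
  /(\p{Extended_Pictographic}(?:\uFE0F|\u200D\p{Extended_Pictographic})*)|[^\x00-\x7F]/gu;

// Hypothetical helper name.
function stripNonAsciiKeepEmoji(text) {
  // "$1" is the emoji when group 1 matched and empty otherwise,
  // so non-ASCII characters that are not emojis are removed.
  return text.replace(keepEmojiOrDelete, "$1");
}

console.log(stripNonAsciiKeepEmoji("Hello 日本語 \u{1F600}!"));
```

The order of the alternatives matters: if [^\x00-\x7F] came first, it would consume the emoji before the capturing group had a chance to match it.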

Simple Demo | Full Emoji Set Demo

const regex = /(\uD83C\uDFF4\uDB40\uDC67\uDB40\uDC62(?:\uDB40\uDC77\uDB40\uDC6C\uDB40\uDC73|\uDB40\uDC73\uDB40\uDC63\uDB40\uDC74|\uDB40\uDC65\uDB40\uDC6E\uDB40\uDC67)\uDB40\uDC7F|(?:\uD83E\uDDD1\uD83C\uDFFB\u200D\uD83E\uDD1D\u200D\uD83E\uDDD1|\uD83D\uDC69\uD83C\uDFFC\u200D\uD83E\uDD1D\u200D\uD83D\uDC69)\uD83C\uDFFB|\uD83D\uDC68(?:\uD83C\uDFFC\u200D(?:\uD83E\uDD1D\u200D\uD83D\uDC68\uD83C\uDFFB|\uD83C[\uDF3E\uDF73\uDF93\uDFA4\uDFA8\uDFEB\uDFED]|\uD83D[\uDCBB\uDCBC\uDD27\uDD2C\uDE80\uDE92]|\uD83E[\uDDAF-\uDDB3\uDDBC\uDDBD])|\uD83C\uDFFF\u200D(?:\uD83E\uDD1D\u200D\uD83D\uDC68(?:\uD83C[\uDFFB-\uDFFE])|\uD83C[\uDF3E\uDF73\uDF93\uDFA4\uDFA8\uDFEB\uDFED]|\uD83D[\uDCBB\uDCBC\uDD27\uDD2C\uDE80\uDE92]|\uD83E[\uDDAF-\uDDB3\uDDBC\uDDBD])|\uD83C\uDFFE\u200D(?:\uD83E\uDD1D\u200D\uD83D\uDC68(?:\uD83C[\uDFFB-\uDFFD])|\uD83C[\uDF3E\uDF73\uDF93\uDFA4\uDFA8\uDFEB\uDFED]|\uD83D[\uDCBB\uDCBC\uDD27\uDD2C\uDE80\uDE92]|\uD83E[\uDDAF-\uDDB3\uDDBC\uDDBD])|\uD83C\uDFFD\u200D(?:\uD83E\uDD1D\u200D\uD83D\uDC68(?:\uD83C[\uDFFB\uDFFC])|\uD83C[\uDF3E\uDF73\uDF93\uDFA4\uDFA8\uDFEB\uDFED]|\uD83D[\uDCBB\uDCBC\uDD27\uDD2C\uDE80\uDE92]|\uD83E[\uDDAF-\uDDB3\uDDBC\uDDBD])|\u200D(?:\u2764\uFE0F\u200D(?:\uD83D\uDC8B\u200D)?\uD83D\uDC68|(?:\uD83D[\uDC68\uDC69])\u200D(?:\uD83D\uDC66\u200D\uD83D\uDC66|\uD83D\uDC67\u200D(?:\uD83D[\uDC66\uDC67]))|\uD83D\uDC66\u200D\uD83D\uDC66|\uD83D\uDC67\u200D(?:\uD83D[\uDC66\uDC67])|(?:\uD83D[\uDC68\uDC69])\u200D(?:\uD83D[\uDC66\uDC67])|[\u2695\u2696\u2708]\uFE0F|\uD83D[\uDC66\uDC67]|\uD83C[\uDF3E\uDF73\uDF93\uDFA4\uDFA8\uDFEB\uDFED]|\uD83D[\uDCBB\uDCBC\uDD27\uDD2C\uDE80\uDE92]|\uD83E[\uDDAF-\uDDB3\uDDBC\uDDBD])|(?:\uD83C\uDFFB\u200D[\u2695\u2696\u2708]|\uD83C\uDFFF\u200D[\u2695\u2696\u2708]|\uD83C\uDFFE\u200D[\u2695\u2696\u2708]|\uD83C\uDFFD\u200D[\u2695\u2696\u2708]|\uD83C\uDFFC\u200D[\u2695\u2696\u2708])\uFE0F|\uD83C\uDFFB\u200D(?:\uD83C[\uDF3E\uDF73\uDF93\uDFA4\uDFA8\uDFEB\uDFED]|\uD83D[\uDCBB\uDCBC\uDD27\uDD2C\uDE80\uDE92]|\uD83E[\uDDAF-\uDDB3\uDDBC\uDDBD])|\uD83C[\uDFFB-\uDFFF])|\uD83E\uDDD1(?:\uD83C\uDFFF\u200D\uD83E\uDD1D\u200D\uD83E\uDDD1(?:\uD83C[\uDFFB-\uDFFF])|\u200D\uD83E\uDD1D\u200D\uD83E\uDDD1)|\uD83D\uDC69(?:\uD83C\uDFFE\u200D(?:\uD83E\uDD1D\u200D\uD83D\uDC68(?:\uD83C[\uDFFB-\uDFFD\uDFFF])|\uD83C[\uDF3E\uDF73\uDF93\uDFA4\uDFA8\uDFEB\uDFED]|\uD83D[\uDCBB\uDCBC\uDD27\uDD2C\uDE80\uDE92]|\uD83E[\uDDAF-\uDDB3\uDDBC\uDDBD])|\uD83C\uDFFD\u200D(?:\uD83E\uDD1D\u200D\uD83D\uDC68(?:\uD83C[\uDFFB\uDFFC\uDFFE\uDFFF])|\uD83C[\uDF3E\uDF73\uDF93\uDFA4\uDFA8\uDFEB\uDFED]|\uD83D[\uDCBB\uDCBC\uDD27\uDD2C\uDE80\uDE92]|\uD83E[\uDDAF-\uDDB3\uDDBC\uDDBD])|\uD83C\uDFFC\u200D(?:\uD83E\uDD1D\u200D\uD83D\uDC68(?:\uD83C[\uDFFB\uDFFD-\uDFFF])|\uD83C[\uDF3E\uDF73\uDF93\uDFA4\uDFA8\uDFEB\uDFED]|\uD83D[\uDCBB\uDCBC\uDD27\uDD2C\uDE80\uDE92]|\uD83E[\uDDAF-\uDDB3\uDDBC\uDDBD])|\uD83C\uDFFB\u200D(?:\uD83E\uDD1D\u200D\uD83D\uDC68(?:\uD83C[\uDFFC-\uDFFF])|\uD83C[\uDF3E\uDF73\uDF93\uDFA4\uDFA8\uDFEB\uDFED]|\uD83D[\uDCBB\uDCBC\uDD27\uDD2C\uDE80\uDE92]|\uD83E[\uDDAF-\uDDB3\uDDBC\uDDBD])|\u200D(?:\u2764\uFE0F\u200D(?:\uD83D\uDC8B\u200D(?:\uD83D[\uDC68\uDC69])|\uD83D[\uDC68\uDC69])|\uD83C[\uDF3E\uDF73\uDF93\uDFA4\uDFA8\uDFEB\uDFED]|\uD83D[\uDCBB\uDCBC\uDD27\uDD2C\uDE80\uDE92]|\uD83E[\uDDAF-\uDDB3\uDDBC\uDDBD])|\uD83C\uDFFF\u200D(?:\uD83C[\uDF3E\uDF73\uDF93\uDFA4\uDFA8\uDFEB\uDFED]|\uD83D[\uDCBB\uDCBC\uDD27\uDD2C\uDE80\uDE92]|\uD83E[\uDDAF-\uDDB3\uDDBC\uDDBD]))|(?:\uD83E\uDDD1\uD83C\uDFFE\u200D\uD83E\uDD1D\u200D\uD83E\uDDD1|\uD83D\uDC69\uD83C\uDFFF\u200D\uD83E\uDD1D\u200D(?:\uD83D[\uDC68\uDC69]))(?:\uD83C[\uDFFB-\uDFFE])|(?:\uD83E\uDDD1\uD83C\uDFFD\u200D\uD83E\uDD1D\u200D\uD83E\uDDD1|\uD83D\uDC69\uD83C\uDFFE\u200D\uD83E\uDD1D\u200D\uD83D\uDC69)(?:\uD83C[\uDFFB-\uDFFD])|(?:\uD83E\uDDD1\uD83C\uDFFC\u200D\uD83E\uDD1D\u200D\uD83E\uDDD1|\uD83D\uDC69\uD83C\uDFFD\u200D\uD83E\uDD1D\u200D\uD83D\uDC69)(?:\uD83C[\uDFFB\uDFFC])|\uD83D\uDC69\u200D\uD83D\uDC69\u200D(?:\uD83D\uDC66\u200D\uD83D\uDC66|\uD83D\uDC67\u200D(?:\uD83D[\uDC66\uDC67]))|\uD83D\uDC69\u200D\uD83D\uDC66\u200D\uD83D\uDC66|\uD83D\uDC69\u200D\uD83D\uDC69\u200D(?:\uD83D[\uDC66\uDC67])|(?:\uD83D\uDC41\uFE0F\u200D\uD83D\uDDE8|\uD83D\uDC69(?:\uD83C\uDFFF\u200D[\u2695\u2696\u2708]|\uD83C\uDFFE\u200D[\u2695\u2696\u2708]|\uD83C\uDFFD\u200D[\u2695\u2696\u2708]|\uD83C\uDFFC\u200D[\u2695\u2696\u2708]|\uD83C\uDFFB\u200D[\u2695\u2696\u2708]|\u200D[\u2695\u2696\u2708])|(?:\uD83C[\uDFC3\uDFC4\uDFCA]|\uD83D[\uDC6E\uDC71\uDC73\uDC77\uDC81\uDC82\uDC86\uDC87\uDE45-\uDE47\uDE4B\uDE4D\uDE4E\uDEA3\uDEB4-\uDEB6]|\uD83E[\uDD26\uDD37-\uDD39\uDD3D\uDD3E\uDDB8\uDDB9\uDDCD-\uDDCF\uDDD6-\uDDDD])(?:\uD83C[\uDFFB-\uDFFF])\u200D[\u2640\u2642]|(?:\u26F9|\uD83C[\uDFCB\uDFCC]|\uD83D\uDD75)(?:\uFE0F\u200D[\u2640\u2642]|(?:\uD83C[\uDFFB-\uDFFF])\u200D[\u2640\u2642])|\uD83C\uDFF4\u200D\u2620|(?:\uD83C[\uDFC3\uDFC4\uDFCA]|\uD83D[\uDC6E\uDC6F\uDC71\uDC73\uDC77\uDC81\uDC82\uDC86\uDC87\uDE45-\uDE47\uDE4B\uDE4D\uDE4E\uDEA3\uDEB4-\uDEB6]|\uD83E[\uDD26\uDD37-\uDD39\uDD3C-\uDD3E\uDDB8\uDDB9\uDDCD-\uDDCF\uDDD6-\uDDDF])\u200D[\u2640\u2642])\uFE0F|\uD83D\uDC69\u200D\uD83D\uDC67\u200D(?:\uD83D[\uDC66\uDC67])|\uD83C\uDFF3\uFE0F\u200D\uD83C\uDF08|\uD83D\uDC69\u200D\uD83D\uDC67|\uD83D\uDC69\u200D\uD83D\uDC66|\uD83D\uDC15\u200D\uD83E\uDDBA|\uD83C\uDDFD\uD83C\uDDF0|\uD83C\uDDF6\uD83C\uDDE6|\uD83C\uDDF4\uD83C\uDDF2|\uD83E\uDDD1(?:\uD83C[\uDFFB-\uDFFF])|\uD83D\uDC69(?:\uD83C[\uDFFB-\uDFFF])|\uD83C\uDDFF(?:\uD83C[\uDDE6\uDDF2\uDDFC])|\uD83C\uDDFE(?:\uD83C[\uDDEA\uDDF9])|\uD83C\uDDFC(?:\uD83C[\uDDEB\uDDF8])|\uD83C\uDDFB(?:\uD83C[\uDDE6\uDDE8\uDDEA\uDDEC\uDDEE\uDDF3\uDDFA])|\uD83C\uDDFA(?:\uD83C[\uDDE6\uDDEC\uDDF2\uDDF3\uDDF8\uDDFE\uDDFF])|\uD83C\uDDF9(?:\uD83C[\uDDE6\uDDE8\uDDE9\uDDEB-\uDDED\uDDEF-\uDDF4\uDDF7\uDDF9\uDDFB\uDDFC\uDDFF])|\uD83C\uDDF8(?:\uD83C[\uDDE6-\uDDEA\uDDEC-\uDDF4\uDDF7-\uDDF9\uDDFB\uDDFD-\uDDFF])|\uD83C\uDDF7(?:\uD83C[\uDDEA\uDDF4\uDDF8\uDDFA\uDDFC])|\uD83C\uDDF5(?:\uD83C[\uDDE6\uDDEA-\uDDED\uDDF0-\uDDF3\uDDF7-\uDDF9\uDDFC\uDDFE])|\uD83C\uDDF3(?:\uD83C[\uDDE6\uDDE8\uDDEA-\uDDEC\uDDEE\uDDF1\uDDF4\uDDF5\uDDF7\uDDFA\uDDFF])|\uD83C\uDDF2(?:\uD83C[\uDDE6\uDDE8-\uDDED\uDDF0-\uDDFF])|\uD83C\uDDF1(?:\uD83C[\uDDE6-\uDDE8\uDDEE\uDDF0\uDDF7-\uDDFB\uDDFE])|\uD83C\uDDF0(?:\uD83C[\uDDEA\uDDEC-\uDDEE\uDDF2\uDDF3\uDDF5\uDDF7\uDDFC\uDDFE\uDDFF])|\uD83C\uDDEF(?:\uD83C[\uDDEA\uDDF2\uDDF4\uDDF5])|\uD83C\uDDEE(?:\uD83C[\uDDE8-\uDDEA\uDDF1-\uDDF4\uDDF
