Python regular expressions
Posted 编程坑太多
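Regular expressions are mostly used as filters. First decide what information you want to pick out, then work out what distinguishes it from everything else (its identifying features).
Then translate those features into a pattern using the table of regular expression symbols.
Almost every other programming language has regular expressions too; on Linux they most commonly show up with grep for text processing.
You rarely need to memorize a Python library: to look something up, run import x and then dir(x).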
#for linux
$ grep '^From:' mbox-short.txt
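The same filter can be sketched in Python with re.search. The sample lines below are made up for illustration and only stand in for the contents of mbox-short.txt; with a real file you would loop over the open file handle instead.

```python
import re

# Hypothetical sample lines standing in for mbox-short.txt
lines = [
    "From: stephen.marquard@uct.ac.za",
    "Subject: [sakai] svn commit",
    "X-From: not a header we want",
    "From: louis@media.berkeley.edu",
]

# '^From:' only matches when the line starts with "From:"
from_lines = [line for line in lines if re.search(r"^From:", line)]

for line in from_lines:
    print(line)
```

Because of the ^ anchor, the "X-From:" line is skipped even though it contains "From:".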
Below are some common symbols and usages of Python's re module, taken from py4e (Python for Everybody).
^ Matches the beginning of the line.
$ Matches the end of the line.
. Matches any character (a wildcard).
\s Matches a whitespace character.
\S Matches a non-whitespace character (opposite of \s).
* Applies to the immediately preceding character and indicates to match zero or more of the preceding character(s).
*? Applies to the immediately preceding character and indicates to match zero or more of the preceding character(s) in "non-greedy mode".
+ Applies to the immediately preceding character and indicates to match one or more of the preceding character(s).
+? Applies to the immediately preceding character and indicates to match one or more of the preceding character(s) in "non-greedy mode".
[aeiou] Matches a single character as long as that character is in the specified set. In this example, it would match "a", "e", "i", "o", or "u", but no other characters.
[a-z0-9] You can specify ranges of characters using the minus sign. This example is a single character that must be a lowercase letter or a digit.
[^A-Za-z] When the first character in the set notation is a caret, it inverts the logic. This example matches a single character that is anything other than an uppercase or lowercase letter.
( ) When parentheses are added to a regular expression, they are ignored for the purpose of matching, but allow you to extract a particular subset of the matched string rather than the whole string when using findall().
\b Matches the empty string, but only at the start or end of a word.
\B Matches the empty string, but not at the start or end of a word.
\d Matches any decimal digit; equivalent to the set [0-9].
\D Matches any non-digit character; equivalent to the set [^0-9].
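A few of the symbols in the table can be tried out with re.findall. This is a small sketch; the sample strings are invented for illustration and are not from py4e.

```python
import re

text = "Order 66 shipped to room B12 at 9am"

# \d+ : one or more decimal digits
digits = re.findall(r"\d+", text)

# \S+ : runs of non-whitespace characters, i.e. the words
words = re.findall(r"\S+", text)

# [aeiou] : single lowercase vowels
vowels = re.findall(r"[aeiou]", "regex")

# \b...\b : whole-word match; the "room" inside "bedroom" is skipped
whole = re.findall(r"\broom\b", "room bedroom room")

# ( ) : findall returns only the parenthesized part of each match
hosts = re.findall(r"@(\S+)", "From: a@x.org to b@y.com")

print(digits)  # ['66', '12', '9']
print(vowels)  # ['e', 'e']
print(whole)   # ['room', 'room']
print(hosts)   # ['x.org', 'y.com']
```

Raw strings (the r"..." prefix) keep Python from interpreting backslashes such as \b before the pattern reaches the re module.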
greedy matching
The notion that the "+" and "*" characters in a regular expression expand outward to match the largest possible string.
Use dir to list the names a library defines
>>> import re
>>> dir(re)
[.. 'compile', 'copy_reg', 'error', 'escape', 'findall',
'finditer', 'match', 'purge', 'search', 'split', 'sre_compile',
'sre_parse', 'sub', 'subn', 'sys', 'template']
>>> help (re.search)
Help on function search in module re:
[Screenshots of the help output: MATCH1.png, MATCH2.png]
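As a rough illustration of how re.search behaves: it scans the string and returns a match object for the first match, or None when nothing matches. The sample line is an assumed mbox-style header, not output from the original post.

```python
import re

# Assumed sample line in mbox "From" style
line = "From: stephen.marquard@uct.ac.za Sat Jan  5 09:14:16 2008"

# \S+@\S+ : non-whitespace, an at-sign, then more non-whitespace
match = re.search(r"\S+@\S+", line)
if match:
    print(match.group())   # the text that matched
    print(match.start())   # index where the match begins

# A pattern that does not occur gives None, so test before calling .group()
no_match = re.search(r"^Subject:", line)
print(no_match)
```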
Finding all matches
re.findall returns every non-overlapping match in the string as a list.
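A minimal sketch of re.findall, using an address pattern in the style of py4e. The text and addresses are invented for illustration.

```python
import re

# Hypothetical text with two addresses, one followed by a comma
text = "Contact alice@example.com or bob@example.org, not carol at example dot net"

# Start with a letter or digit and end with a letter, so the
# trailing comma is not swallowed into the match
emails = re.findall(r"[a-zA-Z0-9]\S*@\S*[a-zA-Z]", text)
print(emails)

# When nothing matches, findall returns an empty list rather than None
nothing = re.findall(r"\d+", "no digits here")
print(nothing)
```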
Greedy matching expands outward to the longest string that still matches.
[Image: greedy-matching.png]
Non-greedy matching stops at the shortest string that matches.
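The difference is easiest to see by running the same pattern with and without the ? modifier. This sketch follows the colon example used in py4e.

```python
import re

text = "From: Using the : character"

# Greedy: .+ runs on to the last colon in the line
greedy = re.findall(r"^F.+:", text)

# Non-greedy: .+? stops at the first colon it can
lazy = re.findall(r"^F.+?:", text)

print(greedy)  # ['From: Using the :']
print(lazy)    # ['From:']
```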