Matching backslashes in Python

Posted 2020-11-25 FLYMOOD


For background on matching backslashes, see:

https://www.cnblogs.com/mzc1997/p/7689235.html

That article introduces two concepts, string escaping and regex escaping, which I think are the key to understanding backslashes.

1. String escaping

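Because sequences such as \n and \t carry special meaning inside a string, the character "\" acts as an escape character (string escaping). There are two ways to cancel its escaping role:

1) Add another escape character in front of it: '\\'
2) Use a raw string, for example r'\n' (a backslash followed by the letter n). Note that a raw string cannot end in a single backslash, so r'\' on its own is a syntax error.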
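The two forms above can be checked with len(). This is a minimal sketch; the variable names are illustrative only and do not come from the original post.

```python
# Escaped form: the literal '\\' holds exactly one backslash character.
escaped = '\\'
print(len(escaped))                  # 1

# Without escaping, '\n' is a single newline character.
newline = '\n'
print(len(newline))                  # 1

# A raw string keeps the backslash: r'\n' is a backslash followed by 'n'.
raw = r'\n'
print(len(raw))                      # 2
print(raw == '\\n')                  # True

# A raw string cannot end in a single backslash, so slice one out instead.
single_backslash = r'\\'[0]
print(single_backslash == escaped)   # True
```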

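2. Regex escaping

In a regular expression, sequences such as \d and \s carry special meaning, so "\" is also an escape character there (regex escaping). There is only one way to cancel its escaping role:

Use two backslashes: \\

Perl regular expressions do in fact use \\ to match a literal \.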

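With that, the regular expression for matching a literal \ is easy to understand. A regular expression is also a string, so it needs both string escaping and regex escaping, which gives two ways to write it:

'\\\\' and

r'\\'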
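A short sketch showing that the two spellings are the same pattern and that both match a single literal backslash. The sample text is made up for illustration.

```python
import re

text = 'C:\\temp'            # the text C:\temp, containing one backslash

# Both literals produce the same two-character regex \\ (an escaped backslash).
normal_pattern = '\\\\'
raw_pattern = r'\\'
print(normal_pattern == raw_pattern)      # True

# Each pattern finds the single backslash in the text.
print(len(re.findall(normal_pattern, text)))   # 1
print(len(re.findall(raw_pattern, text)))      # 1

# re.escape builds the same pattern from a literal backslash.
print(re.escape('\\') == raw_pattern)     # True
```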

Replacing \\ in a string with \

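    import re

    a = r'3\\8\\9'
    print(a)
    # 3\\8\\9

    # The pattern r'\\\\' matches two literal backslashes. The replacement
    # '\\\\' is two backslashes, which re.sub reads as one escaped backslash.
    c = re.sub(r'\\\\', '\\\\', a)
    c
    # Out[213]: '3\\8\\9'

    print(c)
    # 3\8\9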
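For comparison, here is a sketch of the same replacement without a replacement template. It uses str.replace and a function replacement for re.sub, neither of which appears in the original post, so treat it as an alternative rather than the author's method.

```python
import re

a = r'3\\8\\9'                       # the text 3\\8\\9

# str.replace does no regex or template processing, so one level of
# escaping is enough: r'\\' is two backslashes, '\\' is one.
b = a.replace(r'\\', '\\')
print(b)                             # 3\8\9

# The same result with re.sub, using a function as the replacement so the
# returned text is inserted as-is instead of being parsed as a template.
c = re.sub(r'\\\\', lambda m: '\\', a)
print(c == b)                        # True
```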

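Example:

Here is another backslash example. A string d is obtained by scraping a web page with bs (BeautifulSoup) and is then converted into a dictionary.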

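    import re

    # soup is assumed to be a BeautifulSoup object for the fetched page
    d = soup.find('div', {'class': 'page-box house-lst-page-box'}).get('page-data')
    print("original is ", d, " type is ", type(d))
    print("asic code is ", d.encode('ascii'))
    # decoding with unicode-escape turns each \' sequence into a plain '
    d = d.encode('ascii').decode('unicode-escape')
    print("before is ", d, " type is ", type(d))
    # strip the remaining single quotes
    d = re.sub(r"'", '', d)
    print("after sub is ", d, " type is ", type(d))
    # exec() runs the text as a statement and discards its value, so d is unchanged
    exec(d)
    print("after exec is ", d, " type is ", type(d))
    # eval() evaluates the text as an expression and returns a dict
    a = eval(d)
    print("after eval is ", a, " type is ", type(a))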

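The results are as follows.

The string as scraped contains \ characters whose job is to escape the ' characters. Encoding it to ASCII confirms this: in the bytes repr each backslash is displayed doubled (\\) and each quote is displayed as \'.

d.encode('ascii').decode('unicode-escape') removes the backslashes from the original string.

The string still contains single quotes, which re.sub removes.

After eval, the string is converted into a dictionary.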

    original is  \'{"totalPage":2,"curPage":1}\'  type is  <class 'str'>
    asic code is  b'\\\'{"totalPage":2,"curPage":1}\\\''
    before is  '{"totalPage":2,"curPage":1}'  type is  <class 'str'>
    after sub is  {"totalPage":2,"curPage":1}  type is  <class 'str'>
    after exec is  {"totalPage":2,"curPage":1}  type is  <class 'str'>
    after eval is  {'totalPage': 2, 'curPage': 1}  type is  <class 'dict'>
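Since the page-data payload shown above is JSON, one possible alternative to eval is json.loads, which parses the data without executing it. The sketch below is an assumption-laden variant, not the author's method: the value of d is hard-coded to mirror the "original is" line, and page_data is a name introduced here for illustration.

```python
import json

# Hypothetical value mirroring the "original is" line: \'{...}\'
d = '\\\'{"totalPage":2,"curPage":1}\\\''

# Strip the wrapping backslashes and single quotes, then parse the rest as JSON.
page_data = json.loads(d.strip("\\'"))
print(page_data)             # {'totalPage': 2, 'curPage': 1}
print(type(page_data))       # <class 'dict'>
```

This skips the encode/decode and re.sub steps, at the cost of assuming the payload is always valid JSON wrapped only in backslashes and single quotes.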

strip() and replace()

To remove newlines from a string: replace('\n', '')

To remove leading and trailing whitespace from a string: strip()
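A small sketch contrasting the two calls on a made-up sample string: replace() acts on every occurrence, while strip() only touches the two ends.

```python
s = '  first line\nsecond line \n'

# replace() removes every newline, wherever it occurs.
no_newlines = s.replace('\n', '')
print(repr(no_newlines))     # '  first linesecond line '

# strip() removes whitespace (spaces, tabs, newlines) from both ends only,
# so the newline in the middle survives.
trimmed = s.strip()
print(repr(trimmed))         # 'first line\nsecond line'
```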
