Pandas remove characters between brackets [duplicate]

【Posted】: 2019-11-14 04:07:48 【Question】:

I want to remove the characters between [ and ]. Currently I'm doing:

df['Text'] = df['Text'].str.replace(r"\[.*\]", "", regex=True)

But the output isn't what I want. Before, the text is [image] This document; afterwards it is ******* This document, where each * stands for a space.

How can I get rid of this whitespace?

Edit 1

The Text column of df looks like this:

ID    Text
0     REAL ESTATE LEASE THIS INDUSTRIAL REAL ESTAT...
5     Lease AureementMade and signed on the \ of Aug...
6     FIRST AMENDMENT OF LEASEDATE: August 31, 2001L...
8     [image: image0.jpg] Jack[image: image1.jb2] ...
9     [image: image0.jpg] ABC SALES Meeting 97...
14    FIRST AMENDMENT OF LEASETHIS FIRST AMENDMENT O...
17    [image: image0.tif] Deep ML LEASE SERVI...
22    [image: image0.jpg] F 15 083 EX [image: image1...
26    LEASE AGREEMENT—GROSS LEASEBASIC LEASE PROVISI...
28    [image: image0.jpg] 17. Medical VERIFICATION...
31    [image: image0.jpg]  [image: image1.jb2] PLL 3...
32    SUBLEASETHIS SUBLEASE this “Sublease” made as ...
34    [image: image0.tif] Lease Agreement May 10, 20...
35    13057968.3  1 Initials:  _____  _____  SECOND ...
42    [image: image0.jpg] Jack Dowson Buy Real MI...
46     Deep – Machine Learning LEASE   B...

I'd like to get:

ID    Text
0     REAL ESTATE LEASE THIS INDUSTRIAL REAL ESTAT...
5     Lease AureementMade and signed on the \ of Aug...
6     FIRST AMENDMENT OF LEASEDATE: August 31, 2001L...
8     Jack ...
9     ABC SALES Meeting 97...
14    FIRST AMENDMENT OF LEASETHIS FIRST AMENDMENT O...
17    Deep ML LEASE SERVI...
22    F 15 083 EX ...
26    LEASE AGREEMENT—GROSS LEASEBASIC LEASE PROVISI...
28    17. Medical VERIFICATION...
31    PLL 3...
32    SUBLEASETHIS SUBLEASE this “Sublease” made as ...
34    Lease Agreement May 10, 20...
35    13057968.3  1 Initials:  _____  _____  SECOND ...
42    Jack Dowson Buy Real MI...
46    Deep – Machine Learning LEASE   B...
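One way to get from the first table to the second, sketched under the assumption that every Text value is an ordinary string and every [ has a matching ]. The rows are copied from the question (truncation included), and the name BRACKETS is only illustrative. The pattern differs from the one in the question in two ways: [^\]]* cannot run past a closing bracket, so text sitting between two tags (Jack in row 8) survives, and the whitespace around and between a run of tags is part of the match, so it is replaced by a single space instead of being left behind. Rows without brackets (35, 46) keep their internal spacing and only lose leading/trailing whitespace.

```python
import pandas as pd

# One or more [...] chunks plus the whitespace around and between them.
# [^\]]* cannot cross a "]", so text between two separate tags is kept.
BRACKETS = r"\s*(?:\[[^\]]*\]\s*)+"

df = pd.DataFrame({
    "ID": [8, 9, 17, 31, 35, 46],
    "Text": [
        "[image: image0.jpg] Jack[image: image1.jb2] ...",
        "[image: image0.jpg] ABC SALES Meeting 97...",
        "[image: image0.tif] Deep ML LEASE SERVI...",
        "[image: image0.jpg]  [image: image1.jb2] PLL 3...",
        "13057968.3  1 Initials:  _____  _____  SECOND ...",
        " Deep – Machine Learning LEASE   B...",
    ],
})

# Replace each run with one space (so neighbouring words stay apart), then
# trim what is left at the start/end. regex=True is required in pandas >= 2.0,
# where str.replace no longer treats the pattern as a regex by default.
df["Text"] = df["Text"].str.replace(BRACKETS, " ", regex=True).str.strip()
print(df)
```

The non-capturing group (?:...) is only there so that + can repeat "tag plus trailing whitespace"; nothing is captured.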

【Question comments】:

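Please take the time to read this post on how to provide a great pandas example, as well as how to provide a minimal, complete, and verifiable example, and revise your question accordingly. These tips on how to ask a good question may also be useful.

df['Text'] = df['Text'].str.replace(r"\[.*\]","").str.strip()?

If I use @Rakesh's solution, it removes the whole line.

【Answer 1】: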

It looks like you need .str.strip().

For example:

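import pandas as pd

df = pd.DataFrame({"ID": [1, 2, 3], "Text": ["[image: 123.jpg] This document", "[image: image.jpg] Readers of the article", "The agreement between [image: image.jpg] two parties"]})
# Replace each [...] chunk together with the whitespace around it by one space, then trim the ends.
df["Text"] = df["Text"].str.replace(r"(\s*\[.*?\]\s*)", " ", regex=True).str.strip()
print(df["Text"])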

Output:

0                        This document
1               Readers of the article
2    The agreement between two parties
Name: Text, dtype: object
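Why the \s* parts of the pattern matter: .str.strip() only trims the two ends of each string, so if the match covers just the [...] chunk, the spaces that were on either side of it stay in the middle of the text. A small sketch on the third row above (assuming pandas >= 2.0, hence the explicit regex=True):

```python
import pandas as pd

s = pd.Series(["The agreement between [image: image.jpg] two parties"])

# Deleting only the [...] chunk leaves both surrounding spaces: "between  two".
naive = s.str.replace(r"\[.*?\]", "", regex=True).str.strip()

# Letting the match also consume the surrounding whitespace, and substituting
# a single space, keeps exactly one space between the neighbouring words.
fixed = s.str.replace(r"\s*\[.*?\]\s*", " ", regex=True).str.strip()

print(repr(naive[0]))   # 'The agreement between  two parties'
print(repr(fixed[0]))   # 'The agreement between two parties'
```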

【Comments】:

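Note that there are two spaces between the words "between" and "two", so this suggestion does not work. str.strip() removes leading and trailing whitespace from the whole text, not before/after each match.

@Valdi_Bo Thanks, I hadn't noticed that.

【Answer 2】: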

Add an optional space ( ?) to your regex, so the whole regex (the part that matches) becomes:

r'\[.*\] ?'

Another tip: your regex is wrapped in parentheses (a capturing group). They aren't needed, so you can remove them.
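A caveat worth checking against the question's data: .* is greedy, so on a row with more than one tag the pattern above matches from the first [ to the last ], which is a likely reason a pattern built on \[.*\] wipes out most of such a row. The sketch below uses Python's re.sub directly, on the assumption that Series.str.replace(..., regex=True) follows the same re semantics; it compares the greedy pattern with a lazy .*? variant and confirms that the capturing parentheses make no difference to what is removed.

```python
import re

# Row 8 from the question: two bracketed tags with "Jack" in between.
text = "[image: image0.jpg] Jack[image: image1.jb2] ..."

# Greedy .* runs to the LAST "]" on the line, so "Jack" is removed too.
greedy = re.sub(r"\[.*\] ?", "", text)

# Lazy .*? stops at the first "]", so each tag is removed separately.
lazy = re.sub(r"\[.*?\] ?", "", text)

# Wrapping the pattern in a capturing group does not change what is removed.
grouped = re.sub(r"(\[.*?\] ?)", "", text)

print(repr(greedy))     # '...'
print(repr(lazy))       # 'Jack...'
print(grouped == lazy)  # True
```

The lazy result also shows a trade-off of the optional-space approach: because "Jack" touches the second tag with no space before it, consuming the space after the tag glues "Jack" to the following text.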
