Reversal of string.contains in Python, pandas

Posted 2023-02-23

[Title] Reversal of string.contains in Python, pandas [Posted] 2014-01-30 01:27:35 [Question]:

I have something like this in my code:

df2 = df[df['A'].str.contains("Hello|World")]

However, I want all the rows that do not contain Hello or World. How can I reverse this most efficiently?

[Question comments]:

[Answer 1]:

You can use the tilde ~ to flip the boolean values:

>>> import pandas as pd
>>> df = pd.DataFrame({"A": ["Hello", "this", "World", "apple"]})
>>> df.A.str.contains("Hello|World")
0     True
1    False
2     True
3    False
Name: A, dtype: bool
>>> ~df.A.str.contains("Hello|World")
0    False
1     True
2    False
3     True
Name: A, dtype: bool
>>> df[~df.A.str.contains("Hello|World")]
       A
1   this
3  apple

[2 rows x 1 columns]
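
As a supplementary sketch (not part of the original answer): on a boolean Series, ~ is the element-wise NOT operator, so every True becomes False and vice versa. If the column can contain missing values, str.contains returns a missing value for those rows, and passing na=False keeps the mask strictly boolean before it is inverted. The data below, including the None entry, is made up for illustration.

```python
import pandas as pd

# Hypothetical data with one missing value, for illustration only.
df = pd.DataFrame({"A": ["Hello", "this", None, "World", "apple"]})

# na=False treats missing values as "does not contain",
# so the mask holds only True/False.
mask = df["A"].str.contains("Hello|World", na=False)

# ~ inverts the mask element-wise; indexing keeps the rows where it is True.
result = df[~mask]
print(result)
```

With na=False, the row holding the missing value counts as "not containing" the words and is therefore kept in the result; drop it separately if that is not what you want.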

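Whether this is the most efficient way, I don't know; you'd have to time it against your other options. Sometimes using a regular expression is slower than something like df[~(df.A.str.contains("Hello") | (df.A.str.contains("World")))], but I'm bad at guessing where the crossover is.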
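
One way to time the two options yourself is sketched below. It is only a rough harness under assumptions: the Series is the four sample values repeated, the helper names are invented here, and the second variant passes regex=False (a choice made for this sketch, not something the answer above specifies) so that the words are matched as plain substrings. Actual timings depend on your data and pandas version, so none are quoted.

```python
import timeit

import pandas as pd

# Hypothetical column: the four sample values repeated to get a larger Series.
s = pd.Series(["Hello", "this", "World", "apple"] * 25_000)


def with_regex():
    # One regex alternation, then invert.
    return ~s.str.contains("Hello|World")


def with_two_literals():
    # Two plain substring tests (regex=False) combined with |, then invert.
    return ~(
        s.str.contains("Hello", regex=False)
        | s.str.contains("World", regex=False)
    )


# Both approaches should select the same rows before their speed is compared.
assert with_regex().equals(with_two_literals())

for fn in (with_regex, with_two_literals):
    seconds = timeit.timeit(fn, number=5)
    print(f"{fn.__name__}: {seconds:.3f} s for 5 runs")
```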

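[Comments]:

Much better than a complicated negative lookaround test. However, I have no experience with Pandas myself, so I don't know which approach is faster.

The regex lookaround test took noticeably longer (about 30 s vs 20 s), and the two methods apparently give slightly different results (3663K results vs 3504K, out of ~3G original rows; I haven't looked into the specifics yet).

@DSM I've seen this ~ symbol many times, especially in JavaScript. I haven't seen it in Python. What exactly does it mean?

[Answer 2]: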

The .contains() method uses regular expressions, so you can use a negative lookahead test to determine that a word is not contained:

df['A'].str.contains(r'^(?:(?!Hello|World).)*$')

This expression matches any string in which neither the word Hello nor the word World is found anywhere in the string.

Demo:

>>> import pandas as pd
>>> df = pd.DataFrame({"A": ["Hello", "this", "World", "apple"]})
>>> df['A'].str.contains(r'^(?:(?!Hello|World).)*$')
0    False
1     True
2    False
3     True
Name: A, dtype: bool
>>> df[df['A'].str.contains(r'^(?:(?!Hello|World).)*$')]
       A
1   this
3  apple
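
The sketch below compares the lookahead pattern with the plain inverted mask on made-up data. It illustrates one way the two can disagree: without the re.DOTALL flag, . does not match a newline, so a multi-line string that contains neither word is rejected by the lookahead pattern but accepted by the inverted test. Whether this explains the differing counts reported in the comments on the first answer is unknown; it is only an example. The Series uses dtype=object on the assumption that this keeps matching on Python's re module, since other string dtypes may use a regex engine without lookahead support.

```python
import pandas as pd

# Made-up values; the last one spans two lines and contains neither word.
s = pd.Series(
    ["Hello", "this", "World", "apple", "first line\nsecond line"],
    dtype=object,  # assumption: keeps matching on Python's re module
)

# Inverted containment test.
negated = ~s.str.contains("Hello|World")

# Negative-lookahead pattern from the answer above.
lookahead = s.str.contains(r"^(?:(?!Hello|World).)*$")

comparison = pd.DataFrame({"negated": negated, "lookahead": lookahead})
print(comparison)

# Rows where the two approaches disagree (here: the multi-line string).
print(s[negated != lookahead])
```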

[Comments]:

I got this warning from it:

C:\Python27\lib\site-packages\pandas\core\strings.py:176: UserWarning: This pattern has match groups. To actually get the groups, use str.extract.

Made the group non-capturing.
