Find and replace substrings in a Pandas dataframe, ignoring case

Posted 2023-02-23

【Title】: Find and replace substrings in a Pandas dataframe ignore case 【Posted】: 2019-01-22 15:54:30 【Question】:

df.replace('Number', 'NewWord', regex=True)

How can I replace Number, number, or NUMBER with NewWord?

【Question comments】:

【Answer 1】:

Same as you would with a standard regular expression: use the i flag.

df = df.replace('(?i)Number', 'NewWord', regex=True)

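Of course, the limitation of df.replace is that the flag has to be passed as part of the regex string (rather than as a flags argument). If you were using str.replace, you could use case=False or flags=re.IGNORECASE.

【Comments】: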
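To make the contrast concrete, here is a minimal sketch of the three spellings on a small made-up frame (the sample data and variable names are assumptions for illustration). `regex=True` is passed explicitly to `str.replace` because its default has differed between pandas versions.

```python
import re

import pandas as pd

# Illustrative data, not from the original question.
df = pd.DataFrame({"col": ["this is a Number", "and another NuMBer", "number"]})

# DataFrame.replace: the flag has to live inside the pattern as (?i).
via_inline_flag = df.replace(r"(?i)number", "NewWord", regex=True)

# Series.str.replace: the flag can be passed as a keyword argument instead.
via_case = df["col"].str.replace("number", "NewWord", case=False, regex=True)
via_flags = df["col"].str.replace(
    "number", "NewWord", flags=re.IGNORECASE, regex=True
)

print(via_inline_flag["col"].tolist())
print(via_case.tolist())
print(via_flags.tolist())
```

All three should produce the same replaced strings; the only difference is where the ignore-case flag is specified.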

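I actually hadn't thought of passing the flag through the pattern /shakesheadinshame.

@piRSquared I think yours is fine, as long as the OP only needs to worry about those two cases.

Yes, I like yours because it captures the spirit of what I guess they wanted. Same goes for sacul's.

If I use a list for the replacement, can I still use the flag? For example: df = df[col].replace('(?i)['No, Number'], ['NewWord'])?

【Answer 2】: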

Simply use case=False in str.replace.

Example:

import pandas as pd
df = pd.DataFrame({'col': ['this is a Number', 'and another NuMBer', 'number']})

>>> df
                  col
0    this is a Number
1  and another NuMBer
2              number

df['col'] = df['col'].str.replace('Number', 'NewWord', case=False)

>>> df
                   col
0    this is a NewWord
1  and another NewWord
2              NewWord

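[Edit]: If there are multiple columns in which you want to look for the substring, you can select all columns with object dtype and apply the solution above to them. Example: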

>>> df
                  col                col2  col3
0    this is a Number  numbernumbernumber     1
1  and another NuMBer                   x     2
2              number                   y     3

str_columns = df.select_dtypes('object').columns

df[str_columns] = (df[str_columns]
                   .apply(lambda x: x.str.replace('Number', 'NewWord', case=False)))

>>> df
                   col                   col2  col3
0    this is a NewWord  NewWordNewWordNewWord     1
1  and another NewWord                      x     2
2              NewWord                      y     3
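For anyone who wants to run the multi-column case end to end, here is a self-contained sketch that rebuilds the frame shown above. Two details are assumptions rather than part of the answer: `"string"` is included alongside `"object"` in `select_dtypes` so that frames using the dedicated string dtype are also covered, and `regex=True` is passed explicitly.

```python
import pandas as pd

# Rebuild the three-column example frame from the answer.
df = pd.DataFrame({
    "col": ["this is a Number", "and another NuMBer", "number"],
    "col2": ["numbernumbernumber", "x", "y"],
    "col3": [1, 2, 3],
})

# Select only the text columns; the integer column col3 is skipped.
# Including "string" is an assumption to cover the dedicated string dtype.
str_columns = df.select_dtypes(include=["object", "string"]).columns

# Apply the case-insensitive replacement column by column.
df[str_columns] = df[str_columns].apply(
    lambda s: s.str.replace("number", "NewWord", case=False, regex=True)
)

print(df)
```

Because only the selected columns are written back, `col3` keeps its integer values.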

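【Comments】:

This is the correct answer, assuming the OP is only worried about a single column, so +1. However, if you want to do this across multiple columns, you may need to 1) apply str.replace or 2) stack + str.replace.

True, thanks, I've added an example for that scenario. The problem with the stack approach is that it changes numeric dtypes.

【Answer 3】: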

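Brute force. This only works when the entire string is 'Number' or 'NUMBER'. It will not replace those words inside a larger string. And, of course, it is limited to just those two spellings.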

df.replace(['Number', 'NUMBER'], 'NewWord')

More brute force. In case it wasn't obvious, this is far inferior to @coldspeed's answer.

import re

df.applymap(lambda x: re.sub('number', 'NewWord', x, flags=re.IGNORECASE))

Or, taking a hint from @coldspeed's answer:

df.applymap(lambda x: re.sub('(?i)number', 'NewWord', x))
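One caveat with the element-wise approach: `re.sub` raises a `TypeError` when it is handed a non-string value, so the calls above assume every cell is a string. A minimal sketch of a guarded variant follows. The helper `replace_if_str` and the sample frame are hypothetical, and it maps over each column with `Series.map` rather than calling `applymap`, which is a swapped-in technique and not what the answer used.

```python
import re

import pandas as pd

# Illustrative frame mixing a text column and a numeric column.
df = pd.DataFrame({
    "col": ["this is a Number", "NUMBER"],
    "n": [1, 2],
})

# Compile once so the ignore-case flag is attached to the pattern.
pattern = re.compile("number", flags=re.IGNORECASE)


def replace_if_str(value):
    # Hypothetical helper: only strings go through the regex;
    # every other value is returned unchanged.
    if isinstance(value, str):
        return pattern.sub("NewWord", value)
    return value


# Apply the helper to every cell, one column at a time.
result = df.apply(lambda column: column.map(replace_if_str))

print(result)
```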

【Comments】:

【Answer 4】:

If the text you want to convert is in a specific column of the dataframe, this solution will work:

    df['COL_n'] = df['COL_n'].str.lower() 
    df['COL_n'] = df['COL_n'].replace('number', 'NewWord', regex=True)
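Note that lowercasing first changes the case of everything in the column, not just the matched word. The sketch below, on made-up data, compares that approach with a case-insensitive `str.replace` that leaves the rest of each string alone; the series and variable names are assumptions for illustration.

```python
import pandas as pd

# Illustrative data with mixed case outside the target word.
s = pd.Series(["This is a Number", "NUMBER Two"])

# Approach from this answer: lowercase the whole string, then replace.
lowered = s.str.lower().replace("number", "NewWord", regex=True)

# Alternative: match case-insensitively and keep the surrounding text as-is.
preserved = s.str.replace("number", "NewWord", case=False, regex=True)

print(lowered.tolist())    # ['this is a NewWord', 'NewWord two']
print(preserved.tolist())  # ['This is a NewWord', 'NewWord Two']
```

If the original capitalization of the surrounding text matters, the second form is probably the safer choice.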

【Comments】: