Apply a method from a list of methods to a pandas DataFrame

Posted 2023-03-11

【Title】: Apply a method from a list of methods to pandas dataframe 【Posted】: 2018-10-25 19:59:22 【Question】:

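This is my first question, so please bear with me.

My problem is as follows:

Suppose we have a pandas DataFrame and want to dynamically apply some pd.Series methods to a set of columns of that DataFrame. Why doesn't the following example work?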

import pandas as pd

testframe=pd.DataFrame.from_dict({'col1': [1,2] ,'col2': [3,4] })
funcdict={'col1':[pd.Series.astype,str.replace],'col2':[pd.Series.astype,str.replace]}
argdict= {'col1':[['str'],['1','A']],'col2':[['str'],['3','B']]}

for col in testframe.columns:
    for func in funcdict[col]:
            idx=funcdict[col].index(func)
            testframe[col]=testframe[col].func(*argdict[col][idx])

The expected result is

  col1 col2
0  'A'  'B'
1  '2'  '4'

but instead I get

AttributeError: 'Series' object has no attribute 'func'

Notably,

testframe['col1']=testframe['col1'].astype(*argdict['col1'][0])

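works as expected, so the failure seems to come from going through func rather than writing astype directly, even though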

print(func)

yields the desired output: 'function NDFrame.astype at 0x00000186954EB840' etc.

【Comments】:

【Answer 1】:

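Your syntax for calling the method is incorrect. There are two ways to call a method in Python.

Direct

As you've seen, this works. Note that astype here does not refer to some other object; it is the actual name of a method belonging to pd.Series.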

testframe['col1'] = testframe['col1'].astype(*argdict['col1'][0])

Functional

The functional approach makes it explicit that astype is the name of the method.

from operator import methodcaller

testframe['col1'] = methodcaller('astype', *argdict['col1'][0])(testframe['col1'])

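Trying testframe[col].func(...) will never work, because func is not the name of a pd.Series method. Attribute access looks up the literal name after the dot, not the value of a variable that happens to have that name.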
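To make the difference concrete, here is a minimal sketch using only the standard library. A plain string stands in for the Series and str.upper stands in for the stored method; both are illustrative choices, not part of the question.

```python
from operator import methodcaller

func = str.upper   # a function object held in a variable
text = "abc"       # stand-in for a Series; the lookup rules are the same

# The dot looks for an attribute literally named "func" on the object.
try:
    text.func()
except AttributeError as exc:
    error = str(exc)

# Three ways that actually use the value stored in `func`:
via_call = func(text)                                  # call it, passing the object
via_getattr = getattr(text, func.__name__)()           # look the name up as a string
via_methodcaller = methodcaller(func.__name__)(text)   # same lookup, as a callable

print(error)
print(via_call, via_getattr, via_methodcaller)
```

One caveat when carrying this over to the question's funcdict: str.replace is a method of the built-in str type, so its __name__ is 'replace', which on a Series resolves to Series.replace rather than Series.str.replace.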

【Comments】:

This solved my problem, by using func.__name__ as the first argument to methodcaller. Didn't know that function.. thx!

【Answer 2】:

You can use rgetattr to get attributes from the Series testframe[col]. For example,

In [74]: s = pd.Series(['1','2'])

In [75]: rgetattr(s, 'str.replace')('1', 'A')
Out[75]: 
0    A
1    2
dtype: object

import functools
import pandas as pd

def rgetattr(obj, attr, *args):
    # Like getattr, but follows a dotted attribute name such as 'str.replace'.
    # Any extra args are passed to getattr as its default value.
    def _getattr(obj, attr):
        return getattr(obj, attr, *args)
    return functools.reduce(_getattr, [obj] + attr.split('.'))

testframe = pd.DataFrame.from_dict({'col1': [1, 2], 'col2': [3, 4]})

# Method names are stored as strings; dotted names go through the .str accessor.
funcdict = {'col1': ['astype', 'str.replace'],
            'col2': ['astype', 'str.replace']}

argdict = {'col1': [['str'], ['1', 'A']], 'col2': [['str'], ['3', 'B']]}

for col in testframe.columns:
    # Pair each method name with its argument list.
    for attr, args in zip(funcdict[col], argdict[col]):
        testframe[col] = rgetattr(testframe[col], attr)(*args)
print(testframe)

yields

  col1 col2
0    A    B
1    2    4
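As a variation on the loop above, each method name can be stored next to its arguments, so the two parallel dicts cannot drift out of step. The sketch below also uses operator.attrgetter from the standard library, which accepts dotted names, in place of a hand-written helper. The names steps and apply_steps are made up for this illustration.

```python
from operator import attrgetter

import pandas as pd

testframe = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})

# Hypothetical layout: each column maps to an ordered list of
# (dotted method name, positional arguments) pairs.
steps = {
    'col1': [('astype', ['str']), ('str.replace', ['1', 'A'])],
    'col2': [('astype', ['str']), ('str.replace', ['3', 'B'])],
}

def apply_steps(frame, steps):
    """Return a copy of frame with each column's steps applied in order."""
    result = frame.copy()
    for col, col_steps in steps.items():
        for name, args in col_steps:
            # attrgetter('str.replace')(series) is series.str.replace
            method = attrgetter(name)(result[col])
            result[col] = method(*args)
    return result

converted = apply_steps(testframe, steps)
print(converted)
```

Working on a copy leaves the original frame untouched, which makes it easier to rerun the steps while experimenting.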

getattr is a built-in Python function for getting a named attribute from an object when the name is given as a string. For example, given

In [92]: s = pd.Series(['1','2']); s
Out[92]: 
0    1
1    2
dtype: object

we can obtain s.str using

In [85]: getattr(s, 'str')
Out[85]: <pandas.core.strings.StringMethods at 0x7f334a847208>
In [91]: s.str == getattr(s, 'str')
Out[91]: True

To obtain s.str.replace, we would need

In [88]: getattr(getattr(s, 'str'), 'replace')
Out[88]: <bound method StringMethods.replace of <pandas.core.strings.StringMethods object at 0x7f334a847208>>

In [90]: s.str.replace == getattr(getattr(s, 'str'), 'replace')
Out[90]: True

However, if we specify

funcdict = {'col1': ['astype', 'str.replace'],
            'col2': ['astype', 'str.replace']}

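then we need some way to handle both the cases that need a single call to getattr (e.g. getattr(testframe[col], 'astype')) and those that need several calls (e.g. getattr(getattr(testframe[col], 'str'), 'replace')).

To unify the two cases under one simple syntax, we can use rgetattr, a recursive drop-in replacement for getattr that can handle dotted chains of attribute names given as a string, such as 'str.replace'.

The recursion is handled by reduce. The docs give as an example that reduce(lambda x, y: x+y, [1, 2, 3, 4, 5]) computes ((((1+2)+3)+4)+5). Similarly, you can imagine the + being replaced by getattr, so that rgetattr(s, 'str.replace') computes getattr(getattr(s, 'str'), 'replace').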
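The fold can be written out by hand to check that claim. The sketch below uses the standard library's os module as a stand-in for a Series (an assumption made only so the snippet runs without pandas), with 'path.join' playing the role of 'str.replace'.

```python
import functools
import os

def rgetattr(obj, attr, *args):
    """Same helper as above: getattr that follows dotted names."""
    def _getattr(obj, attr):
        return getattr(obj, attr, *args)
    return functools.reduce(_getattr, [obj] + attr.split('.'))

# reduce folds the list from the left, one element at a time.
total = functools.reduce(lambda x, y: x + y, [1, 2, 3, 4, 5])  # ((((1+2)+3)+4)+5)

# The same fold with getattr, spelled out for 'path.join':
step1 = getattr(os, 'path')      # os.path
step2 = getattr(step1, 'join')   # os.path.join

same_function = rgetattr(os, 'path.join') is step2

# Extra positional arguments become getattr's default value,
# so a missing name yields the default instead of raising.
missing = rgetattr(os, 'path.no_such_name', None)

print(total, same_function, missing)
```

A name without dots goes through the same code path: the list passed to reduce then has just two elements, so getattr is called once.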

【Comments】:

I'm still quite new to Python, so I'll need some time to digest the details, but this looks very elegant (+1)
