How to use the map function on a multi-index DataFrame with pandas? [duplicate]

Posted 2023-03-11

[Asked]: 2021-06-24 18:15:47

[Question]:

I have a DataFrame like the one shown below:

import numpy as np
import pandas as pd

df = pd.DataFrame({'source_code': ['11', '11', '12', '13', '14', np.nan],
                   'source_description': ['test1', 'test1', 'test2', 'test3', np.nan, 'test5'],
                   'key_id': [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan]})

I also have a hash_file DataFrame, shown below:

hash_file = pd.DataFrame({'source_id': ['11', '12', '13', '14', '15'],
                          'source_code': ['test1', 'test2', 'test3', 'test4', 'test5'],
                          'hash_id': [911, 512, 713, 814, 616]})
# Series of hash_id values indexed by the (source_id, source_code) pair
id_file = hash_file.set_index(['source_id', 'source_code'])['hash_id']

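There are no duplicates in id_file: each (source_id, source_code) pair is always unique.

Now I want to fill in the key_id column of df by matching the source_code and source_description columns of df against the source_id and source_code columns of hash_file.

So I tried the following: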

df['key_id'] = df['source_code','source_description'].map(id_file)

This raised an error:

KeyError: ('source_code', 'source_description')

So I tried another approach:

df['key_id'] = df[['source_code','source_description']].map(id_file)

This threw another error:

AttributeError: 'DataFrame' object has no attribute 'map'
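For readers wondering why both attempts fail, here is a minimal sketch on a cut-down copy of the data. The variable names `raised`, `sub`, and `is_frame` are made up for illustration. The first attempt passes a tuple, which pandas treats as the label of one column; the second selects a sub-DataFrame rather than a Series. As far as I know, pandas 2.1 and later do provide `DataFrame.map`, but it applies a function element-wise, so it would not perform this two-column lookup either.

```python
import pandas as pd

df = pd.DataFrame({'source_code': ['11', '12'],
                   'source_description': ['test1', 'test2']})

# df['a', 'b'] is the same as df[('a', 'b')]:
# pandas looks for ONE column whose label is that tuple.
try:
    df['source_code', 'source_description']
    raised = False
except KeyError:
    raised = True

# A list of labels selects a sub-DataFrame, not a Series,
# so Series.map is not available on the result.
sub = df[['source_code', 'source_description']]
is_frame = isinstance(sub, pd.DataFrame)
```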

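I expect my output to look like the table below. Note that there may be NA values in between, and the match must be case-insensitive: the index of id_file has to be compared with the columns of df without regard to case.

I would like to use only the map method, but any other elegant approach is also welcome.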

source_code source_description  key_id
11            test1              911
11            test1              911
12            test2              512
13            test3              713
14             NaN               814
NaN           test5              616

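[Comments]:

Could you elaborate on "there might be NA in between"? Do you mean that either the source_id or the source_code column can be NA? In that case, how would you like the values to be mapped to df?

I updated the output. The code should compare both columns to get the key_ids. If one of the columns is NA, it should look at the other column and try to find a match based on it.

I prefer map over merge because it works fine for a single column and takes only one line of code. It is also easy for non-programmers to understand. I would like to do the same with map for multiple key columns, hence map over merge.

[Answer 1]: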

This looks like a fairly standard merge, with some renaming:

(df.merge(hash_file, left_on=['source_code', 'source_description'], right_on=['source_id', 'source_code'])
   .drop(columns=['key_id', 'source_id', 'source_code_y'])
   .rename(columns={'source_code_x': 'source_code', 'hash_id': 'key_id'})
)

Output


    source_code source_description  key_id
0   11          test1               911
1   11          test1               911
2   12          test2               512
3   13          test3               713
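The merge output above contains only the four matching rows, because `merge` defaults to an inner join. A minimal sketch of a variant that keeps every row of df, assuming the same sample data: renaming the hash_file columns first (the intermediate name `lookup` is made up here) lets `on=` be used directly, and `how='left'` leaves NaN where no pair matches.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'source_code': ['11', '11', '12', '13', '14', np.nan],
                   'source_description': ['test1', 'test1', 'test2', 'test3', np.nan, 'test5']})
hash_file = pd.DataFrame({'source_id': ['11', '12', '13', '14', '15'],
                          'source_code': ['test1', 'test2', 'test3', 'test4', 'test5'],
                          'hash_id': [911, 512, 713, 814, 616]})

# Rename the lookup columns so they line up with the column names in df.
lookup = hash_file.rename(columns={'source_id': 'source_code',
                                   'source_code': 'source_description',
                                   'hash_id': 'key_id'})

# how='left' keeps all rows of df; rows without a matching pair get NaN.
out = df.merge(lookup, on=['source_code', 'source_description'], how='left')
```

This still does not fill the two rows where one of the key columns is NA; it only stops them from being dropped.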

Using `map` (for the updated input values in the question):

df['key_id'] = df.set_index(['source_code','source_description']).index.map(id_file)

Output

    source_code source_description  key_id
0   11          test1               911.0
1   11          test1               911.0
2   12          test2               512.0
3   13          test3               713.0
4   14          NaN                 NaN
5   NaN         test5               NaN
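The output above still leaves NaN where one key column is missing, while the question asks for a fallback to the other column and for case-insensitive matching. The following is only a sketch of one way that could be done, not the answerer's method. It assumes that source_id and source_code are each unique on their own in hash_file (the question only guarantees that the pair is unique), and the dictionary names `by_pair`, `by_id`, and `by_code` are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'source_code': ['11', '11', '12', '13', '14', np.nan],
                   'source_description': ['test1', 'test1', 'test2', 'test3', np.nan, 'test5']})
hash_file = pd.DataFrame({'source_id': ['11', '12', '13', '14', '15'],
                          'source_code': ['test1', 'test2', 'test3', 'test4', 'test5'],
                          'hash_id': [911, 512, 713, 814, 616]})

# Lowercase both sides so the comparison ignores case (NaN stays NaN).
code = df['source_code'].str.lower()
desc = df['source_description'].str.lower()
h_id = hash_file['source_id'].str.lower()
h_code = hash_file['source_code'].str.lower()

# Plain dicts: one keyed by the pair, one per single column.
# ASSUMPTION: each single column is unique, otherwise later entries win.
by_pair = dict(zip(zip(h_id, h_code), hash_file['hash_id']))
by_id = dict(zip(h_id, hash_file['hash_id']))
by_code = dict(zip(h_code, hash_file['hash_id']))

# Map the (code, description) tuples; unknown pairs become NaN.
pairs = pd.Series(list(zip(code, desc)), index=df.index)
key_id = pairs.map(lambda pair: by_pair.get(pair, np.nan))

# Fallback rule (an assumption): use the other column only where one is missing.
key_id = key_id.where(desc.notna(), code.map(by_id))    # description missing
key_id = key_id.where(code.notna(), desc.map(by_code))  # code missing
df['key_id'] = key_id
```

On the sample data this fills the last two rows with 814 and 616. Rows where both columns are present but the pair is unknown are deliberately left as NaN rather than matched on a single column.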

[Comments]:

Thanks @piterbarg, but I want to do this with map rather than merge.

@TheGreat Please see the edit.
