Creating new column by mapping to dictionary (with string contain match)

Posted 2023-03-17

[Title]: Creating new column by mapping to dictionary (with string contain match) [Posted]: 2021-12-19 08:52:12 [Question]:

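I am trying to create a Factor column in df1 based on the dictionary df2. However, the Code columns used for the mapping are not exactly identical; the dictionary contains only part of each Code string.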

import pandas as pd

# Data to be enriched; each Code contains a dictionary key plus extra characters
df1 = pd.DataFrame({
    'Date':['2021-01-01', '2021-01-01', '2021-01-01', '2021-01-02', '2021-01-02', '2021-01-02', '2021-01-02', '2021-01-03'],
    'Ratings':[9.0, 8.0, 5.0, 3.0, 2, 3, 6, 5],
    'Code':['R:EST 5R', 'R:EKG EK', 'R:EKG EK', 'R:EST 5R', 'R:EKGP', 'R:EST 5R', 'R:OID_P', 'R:OID_P']})

# Lookup table ("dictionary") from partial Code to Factor
df2 = pd.DataFrame({
    'Code':['R:EST', 'R:EKG', 'R:OID'],
    'Factor':[1, 1.3, 0.9]})

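So far I have not been able to map the dataframes correctly, because the columns are not exactly the same. The Code column does not necessarily start with "R:".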

df1['Factor'] = df1['Code'].map(df2.set_index('Code')['Factor'])
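A quick check of why the exact-key lookup falls short may help here. The sketch below uses a shortened copy of the data (three rows only, chosen for illustration) and the helper names `lookup` and `exact`, which are not part of the original post. Because `Series.map` looks up each value as a whole key, none of the longer Code strings are found and every row comes back as NaN.

```python
import pandas as pd

# Shortened copy of the question's data, for illustration only
df1 = pd.DataFrame({'Code': ['R:EST 5R', 'R:EKGP', 'R:OID_P']})
df2 = pd.DataFrame({'Code': ['R:EST', 'R:EKG', 'R:OID'],
                    'Factor': [1, 1.3, 0.9]})

# Series indexed by the partial codes, used as the lookup table
lookup = df2.set_index('Code')['Factor']

# map() requires an exact key match, so every lookup misses
exact = df1['Code'].map(lookup)
print(exact.isna().all())  # True
```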

This is what the preferred output looks like:

df3 = pd.DataFrame({
    'Date':['2021-01-01', '2021-01-01', '2021-01-01', '2021-01-02', '2021-01-02', '2021-01-02', '2021-01-02', '2021-01-03'],
    'Ratings':[9.0, 8.0, 5.0, 3.0, 2, 3, 6, 5],
    'Code':['R:EST 5R', 'R:EKG EK', 'R:EKG EK', 'R:EST 5R', 'R:EKGP', 'R:EST 5R', 'R:OID_P', 'R:OID_P'],
    'Factor':[1, 1.3, 1.3, 1, 1.3, 1, 0.9, 0.9]})

Thanks a lot!

[Question comments]:

[Answer 1]:

>>> df1['Code'].str[:5].map(df2.set_index('Code')['Factor'])
0    1.0
1    1.3
2    1.3
3    1.0
4    1.3
5    1.0
6    0.9
7    0.9
Name: Code, dtype: float64

>>> (df2.Code
         .apply(lambda x:df1.Code.str.contains(x))
         .T
         .idxmax(axis=1)
         .apply(lambda x:df2.Factor.iloc[x])
)

0    1.0
1    1.3
2    1.3
3    1.0
4    1.3
5    1.0
6    0.9
7    0.9
dtype: float64

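[Discussion]:

Thanks for your solution. The code is not necessarily always at the start of the string. Is there a way to do this with a string-contains function?

df2.Code.apply(lambda x: df1.Code.str.contains(x)).T.idxmax(axis=1).apply(lambda x: df2.Factor.iloc[x])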
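As an alternative to the contains-based chain, one possible approach is to pull the dictionary key out of each string with a regular expression and then map it exactly. This is only a sketch and not part of the original answer: the names `pattern` and `matched` are made up for illustration, and it assumes that at most one dictionary key occurs in each Code value. The key can sit anywhere in the string, and rows with no key get NaN instead of silently falling back to the first factor.

```python
import re
import pandas as pd

df1 = pd.DataFrame({
    'Date': ['2021-01-01', '2021-01-01', '2021-01-01', '2021-01-02',
             '2021-01-02', '2021-01-02', '2021-01-02', '2021-01-03'],
    'Ratings': [9.0, 8.0, 5.0, 3.0, 2, 3, 6, 5],
    'Code': ['R:EST 5R', 'R:EKG EK', 'R:EKG EK', 'R:EST 5R',
             'R:EKGP', 'R:EST 5R', 'R:OID_P', 'R:OID_P']})

df2 = pd.DataFrame({
    'Code': ['R:EST', 'R:EKG', 'R:OID'],
    'Factor': [1, 1.3, 0.9]})

# One capture group listing every dictionary key as an alternative;
# re.escape protects keys that contain regex metacharacters
pattern = '(' + '|'.join(re.escape(code) for code in df2['Code']) + ')'

# First key found anywhere in each string (NaN when nothing matches)
matched = df1['Code'].str.extract(pattern, expand=False)

# The extracted keys now match the dictionary exactly, so map() works
df1['Factor'] = matched.map(df2.set_index('Code')['Factor'])
print(df1)
```

If one key can be a prefix of another (a hypothetical case, not present in this data), sorting the keys by length in descending order before joining them would make the longer key win.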
