Parsing list stored as string in Pandas dataframe column

Posted 2023-02-23


【Title】Parsing list stored as string in Pandas dataframe column 【Posted】2021-10-26 20:29:50 【Question】:

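I am trying to parse a dataframe column whose values look like the one shown below. This is the result after running json_normalize on the raw dataset. The goal is to get the 'name': 'Org Lvl 4' entry so that I can extract the actual 'Org Lvl 4' name.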

index	org [dtype: Object]
0	[{'name': 'Org Lvl 1'}, {'name': 'Org Lvl 2'}, {'name': 'Org Lvl 3'}, {'name': 'Org Lvl 4'}]

I read that Pandas stores this as a string rather than a list, so I tried what others suggested, namely split, but I get the following error: AttributeError: Can only use .str accessor with string values!

Code:

df['org'] = df['org'].str.split(',').str[3]
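Before choosing a parsing strategy it can help to check what each cell actually holds, since text and real Python lists need different handling. The following is only a minimal sketch on made-up data; `df_demo`, `as_list` and `as_text` are hypothetical names, not part of the original dataset.

```python
import pandas as pd

# Hypothetical sample data: one column holds a real list of dicts,
# the other holds the string form of the same list.
records = [{'name': 'Org Lvl 1'}, {'name': 'Org Lvl 2'},
           {'name': 'Org Lvl 3'}, {'name': 'Org Lvl 4'}]
df_demo = pd.DataFrame({'as_list': [records], 'as_text': [str(records)]})

# Report the Python type of every cell, column by column.
cell_types = df_demo.apply(lambda col: col.map(lambda v: type(v).__name__))
print(cell_types)
```

If the cells turn out to be lists, no string splitting is needed; if they are text, they have to be parsed (for example with ast.literal_eval) before indexing.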

DataFrame:

df = pd.DataFrame({'org [dtype: Object]': {0: "[{'name': 'Org Lvl 1'}, {'name': 'Org Lvl 2'}, {'name': 'Org Lvl 3'}, {'name': 'Org Lvl 4'}]"}})

Update:

After trying the following, I can print "Org Lvl 4" for index 0, but now I need to apply it to the entire column.

import ast
print(df['org'].astype(str).map(ast.literal_eval)[0][3].get('name'))
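One possible way to extend the one-row lookup above to every row is to wrap it in a small helper and pass that helper to Series.map. This is a sketch under assumptions: `name_at_level`, `df_demo` and the `org_lvl_4` column are hypothetical names, and the helper assumes each cell is either a string that ast.literal_eval can parse or an already-parsed list.

```python
import ast
import pandas as pd

def name_at_level(cell, level=3):
    """Return the 'name' at position `level`, or None if it is missing."""
    # Parse only when the cell is still text; leave real lists untouched.
    orgs = ast.literal_eval(cell) if isinstance(cell, str) else cell
    if not isinstance(orgs, list) or len(orgs) <= level:
        return None
    return orgs[level].get('name')

# Hypothetical sample rows: the second one has fewer than four levels.
df_demo = pd.DataFrame({'org': [
    "[{'name': 'Org Lvl 1'}, {'name': 'Org Lvl 2'}, {'name': 'Org Lvl 3'}, {'name': 'Org Lvl 4'}]",
    "[{'name': 'Org Lvl 1'}, {'name': 'Org Lvl 2'}]",
]})

# Apply the helper to every cell of the column.
df_demo['org_lvl_4'] = df_demo['org'].map(name_at_level)
print(df_demo['org_lvl_4'])
```

Writing to a new column rather than overwriting org keeps the original data around for checking; rows without a fourth level end up with a missing value instead of raising an IndexError.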

Any ideas?

【Comments】:

【Answer 1】:

I was able to solve this with a lambda:

# axis=1 passes each row to the lambda; this assumes the cells already hold lists of dicts
df['org'] = df.apply(lambda row: list(row.org)[-1]['name'], axis=1)
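Since only one column is involved, the same idea can also be written with Series.map instead of a row-wise apply. This is just a sketch on made-up data, assuming the cells are already lists of dicts; `df_demo` and `deepest_org` are hypothetical names.

```python
import pandas as pd

# Hypothetical sample data with lists of different lengths, including an empty one.
df_demo = pd.DataFrame({'org': [
    [{'name': 'Org Lvl 1'}, {'name': 'Org Lvl 2'}, {'name': 'Org Lvl 3'}, {'name': 'Org Lvl 4'}],
    [{'name': 'Org Lvl 1'}],
    [],
]})

# Take the name of the last entry in each list; empty lists give None.
df_demo['deepest_org'] = df_demo['org'].map(
    lambda orgs: orgs[-1]['name'] if orgs else None
)
print(df_demo['deepest_org'])
```

Note that [-1] picks the deepest level present in each row, which is 'Org Lvl 4' only when the row actually has four levels.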

【Comments】:

【Answer 2】:

Not a solid answer, but here is something to work from...

See the answer to: How to flatten a pandas dataframe with some columns as json?

For example:

import ast
import pandas as pd

def list_of_dicts(ld):
    '''
    Parse the string form of a list of dicts and build a mapping
    from each dict's first value to itself, e.g. {'Org Lvl 1': 'Org Lvl 1', ...}
    '''
    return {list(d.values())[0]: list(d.values())[0] for d in ast.literal_eval(ld)}

df = pd.DataFrame({'org [dtype: Object]': {0: "[{'name': 'Org Lvl 1'}, {'name': 'Org Lvl 2'}, {'name': 'Org Lvl 3'}, {'name': 'Org Lvl 4'}]"}})

# Each key of the returned mapping becomes a column of B
B = pd.json_normalize(df['org [dtype: Object]'].apply(list_of_dicts).tolist()) #.add_prefix('dict_')

print(df, '\n\n')

print(B)

>>                                  org [dtype: Object]
>> 0  [{'name': 'Org Lvl 1'}, {'name': 'Org Lvl 2'},...


>>    Org Lvl 1  Org Lvl 2  Org Lvl 3  Org Lvl 4
>> 0  Org Lvl 1  Org Lvl 2  Org Lvl 3  Org Lvl 4

【Comments】:
