Python dataframe column of lists of dicts into columns with single elements
Posted: 2021-10-04 07:35:33
Question:
I tried asking this question in a different format, but the answers I got addressed specific parts of the problem rather than the whole of it. To avoid confusion, I am trying again and wording the question differently.
I have a dataframe in which a few columns hold regular data, but one column holds lists of dicts as its elements. Here is an example.
import pandas as pd

# Each cell of 'lists_of_stuff' is a list of dicts with varying keys.
list_of_dicts = [{'a': 'sam', 'b': 2}, {'a': 'diana', 'c': 'grape', 'd': 5}, {'a': 'jody', 'c': 7, 'e': 'foo', 'f': 9}]
list_of_dicts_2 = [{'a': 'joe', 'b': 2}, {'a': 'steve', 'c': 'pizza'}, {'a': 'alex', 'c': 7, 'e': 'doh'}]
df4 = pd.DataFrame({
    'other1': ['Susie', 'Rachel'],
    'lists_of_stuff': [list_of_dicts, list_of_dicts_2],
    'other2': [123, 456],
})
df4
   other1  lists_of_stuff  other2
0   Susie  [{'a': 'sam', 'b': 2}, {'a': 'diana', 'c': 'grape', 'd': 5}, {'a': 'jody', 'c': 7, 'e': 'foo', 'f': 9}]  123
1  Rachel  [{'a': 'joe', 'b': 2}, {'a': 'steve', 'c': 'pizza'}, {'a': 'alex', 'c': 7, 'e': 'doh'}]  456
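I am trying to split these dicts out into columns so that I end up with a simpler dataframe. Something like this (the column order may differ):
   other1  a_1  b    a_2      c    d   a_3  c_2    e    f  other2
0   Susie  sam  2  diana  grape    5  jody    7  foo    9     123
1  Rachel  joe  2  steve  pizza  NaN  alex    7  doh  NaN     456
Or like this: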
   other1      a    b      c    d    e    f  other2
0   Susie    sam    2    NaN  NaN  NaN  NaN     123
1   Susie  diana  NaN  grape    5  NaN  NaN     123
2   Susie   jody  NaN      7  NaN  foo    9     123
3  Rachel    joe    2    NaN  NaN  NaN  NaN     456
4  Rachel  steve  NaN  pizza  NaN  NaN  NaN     456
5  Rachel   alex  NaN      7  NaN  doh  NaN     456
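Two things that do not work are pd.DataFrame(df4['lists_of_stuff']) (it just shows the dataframe as is; i.e., it changes nothing) and pd.json_normalize(df4['lists_of_stuff']) (this raises an error). flatten_json and solutions involving Series did not produce workable results either.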
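What is the correct Python way to convert df4 into one of the proposed outputs?
(Yes, I asked almost the same question elsewhere: List of variable size dicts to a dataframe. That question was unclear, so I decided to try again with a new question rather than pile additions onto the other one and make it hard to follow.)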
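The first observation in the question can be reproduced with a small sketch. It assumes a reduced version of df4 (fewer dicts than in the question) and uses the illustrative name `wrapped`, which is not in the original. Passing the column to pd.DataFrame only wraps the Series in a one-column frame, so each cell is still a Python list of dicts and nothing has been expanded.

```python
import pandas as pd

# Reduced stand-in for the question's df4 (assumption: same layout, less data).
df4 = pd.DataFrame({
    "other1": ["Susie", "Rachel"],
    "lists_of_stuff": [
        [{"a": "sam", "b": 2}, {"a": "diana", "c": "grape", "d": 5}],
        [{"a": "joe", "b": 2}],
    ],
    "other2": [123, 456],
})

# Wrapping the Series in a DataFrame keeps one column of list objects.
wrapped = pd.DataFrame(df4["lists_of_stuff"])
print(wrapped.shape)                            # (2, 1)
print(type(wrapped.loc[0, "lists_of_stuff"]))   # <class 'list'>
```

The lists have to be split into one row (or one set of columns) per dict before any constructor can turn the keys into columns.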
Answer 1:
Try:
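# If the lists_of_stuff values are strings, parse them first with literal_eval:
# from ast import literal_eval
# df4["lists_of_stuff"] = df4["lists_of_stuff"].apply(literal_eval)
import pandas as pd

# One row per dict; the other columns are repeated for each dict.
df = df4.explode("lists_of_stuff")
# Expand each dict into its own columns and attach them to the remaining columns.
df = pd.concat([df, df.pop("lists_of_stuff").apply(pd.Series)], axis=1)
print(df)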
Prints:
   other1  other2      a    b      c    d    e    f
0   Susie     123    sam  2.0    NaN  NaN  NaN  NaN
0   Susie     123  diana  NaN  grape  5.0  NaN  NaN
0   Susie     123   jody  NaN      7  NaN  foo  9.0
1  Rachel     456    joe  2.0    NaN  NaN  NaN  NaN
1  Rachel     456  steve  NaN  pizza  NaN  NaN  NaN
1  Rachel     456   alex  NaN      7  NaN  doh  NaN
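As a possible variation on the step above (not part of the original answer), the dicts can be expanded with pd.json_normalize once the lists have been exploded, since at that point the input is a flat list of dicts. The sketch below assumes every list is non-empty and contains only dicts; the names `exploded`, `expanded`, and `df_alt` are illustrative.

```python
import pandas as pd

# Same data as in the question.
df4 = pd.DataFrame({
    "other1": ["Susie", "Rachel"],
    "lists_of_stuff": [
        [{"a": "sam", "b": 2}, {"a": "diana", "c": "grape", "d": 5},
         {"a": "jody", "c": 7, "e": "foo", "f": 9}],
        [{"a": "joe", "b": 2}, {"a": "steve", "c": "pizza"},
         {"a": "alex", "c": 7, "e": "doh"}],
    ],
    "other2": [123, 456],
})

# One row per dict, with a fresh 0..n-1 index so the pieces line up.
exploded = df4.explode("lists_of_stuff").reset_index(drop=True)
# json_normalize accepts a plain list of dicts and builds one column per key.
expanded = pd.json_normalize(exploded.pop("lists_of_stuff").tolist())
# Both frames now share the same default index, so they concatenate row by row.
df_alt = pd.concat([exploded, expanded], axis=1)
print(df_alt)
```

Resetting the index before concatenating avoids relying on the duplicated index labels that explode leaves behind.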
Edit: to reorder the columns:
#... code as above
df = df.reset_index(drop=True).reindex(
[*df.columns[:1]] + [*df.columns[2:]] + [*df.columns[1:2]], axis=1
)
print(df)
Prints:
   other1      a    b      c    d    e    f  other2
0   Susie    sam  2.0    NaN  NaN  NaN  NaN     123
1   Susie  diana  NaN  grape  5.0  NaN  NaN     123
2   Susie   jody  NaN      7  NaN  foo  9.0     123
3  Rachel    joe  2.0    NaN  NaN  NaN  NaN     456
4  Rachel  steve  NaN  pizza  NaN  NaN  NaN     456
5  Rachel   alex  NaN      7  NaN  doh  NaN     456
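The answer above produces the second (long) layout from the question. For the first (wide) layout, with one row per original row, a rough sketch is to flatten each list into a single dict whose keys carry the position of the dict they came from. This is an assumption about how the columns could be named: it yields a_1, b_1, a_2, c_2, ... rather than the exact mixed names shown in the question, and the helper `widen` is hypothetical.

```python
import pandas as pd

# Same data as in the question.
df4 = pd.DataFrame({
    "other1": ["Susie", "Rachel"],
    "lists_of_stuff": [
        [{"a": "sam", "b": 2}, {"a": "diana", "c": "grape", "d": 5},
         {"a": "jody", "c": 7, "e": "foo", "f": 9}],
        [{"a": "joe", "b": 2}, {"a": "steve", "c": "pizza"},
         {"a": "alex", "c": 7, "e": "doh"}],
    ],
    "other2": [123, 456],
})


def widen(dicts):
    """Hypothetical helper: merge a list of dicts into one dict,
    suffixing every key with the 1-based position of its dict."""
    flat = {}
    for pos, d in enumerate(dicts, start=1):
        for key, value in d.items():
            flat[f"{key}_{pos}"] = value
    return flat


# One flattened dict per original row; missing keys become NaN.
wide_part = pd.DataFrame(
    [widen(dicts) for dicts in df4["lists_of_stuff"]], index=df4.index
)
wide = pd.concat([df4.drop(columns="lists_of_stuff"), wide_part], axis=1)
print(wide)
```

Suffixing every key keeps the column names unique without having to decide which repeated keys need a suffix.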