Python dataframe column of lists of dicts into columns with single elements

Posted: 2021-10-04 07:35:33

Question:

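I tried asking this question in a different format, but the answers I got addressed specific parts of the problem rather than the whole thing. To avoid confusion, I am trying again and phrasing the question differently.

I have a dataframe in which several columns hold regular data, but one column holds lists of dicts as its elements. Here is an example.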

import pandas as pd

list_of_dicts = [{'a':'sam','b':2}, {'a':'diana','c':'grape','d':5}, {'a':'jody','c':7,'e':'foo','f':9}]
list_of_dicts_2 = [{'a':'joe','b':2}, {'a':'steve','c':'pizza'}, {'a':'alex','c':7,'e':'doh'}]

# Build the frame in one step; each cell of lists_of_stuff holds a whole list of dicts.
df4 = pd.DataFrame({
    'other1': ['Susie', 'Rachel'],
    'lists_of_stuff': [list_of_dicts, list_of_dicts_2],
    'other2': [123, 456],
})

df4
   other1  lists_of_stuff                                                                            other2
0  Susie   [{'a':'sam','b':2}, {'a':'diana','c':'grape','d':5}, {'a':'jody','c':7,'e':'foo','f':9}]  123
1  Rachel  [{'a':'joe','b':2}, {'a':'steve','c':'pizza'}, {'a':'alex','c':7,'e':'doh'}]              456

I am trying to split those dicts out into columns so that I end up with a simpler dataframe. Something like this (the column order may differ):

    other1 a_1   b   a_2   c     d   a_3      c_2   e   f   other2
0   Susie  sam   2   diana grape 5   jody     7     foo 9   123
1   Rachel joe   2   steve pizza NaN alex     7     doh NaN 456

Or like this:

    other1 a     b   c     d   e   f   other2
0   Susie  sam   2   NaN   NaN NaN NaN 123
1   Susie  diana NaN grape 5   NaN NaN 123
2   Susie  jody  NaN 7     NaN foo 9   123
3   Rachel joe   2   NaN   NaN NaN NaN 456
4   Rachel steve NaN pizza NaN NaN NaN 456
5   Rachel alex  NaN 7     NaN doh NaN 456

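Two things that have not worked are pd.DataFrame(df4['lists_of_stuff']) (it just shows the dataframe as is, i.e. it changes nothing) and pd.json_normalize(df4['lists_of_stuff']) (which throws an error). Also, flatten_json and solutions involving Series have not produced a workable result.

What is the right Pythonic way to convert df4 into one of the proposed outputs?

(Yes, I asked almost the same question elsewhere: "List of variable size dicts to a dataframe". That question was unclear, so I decided to try again with a new question rather than pile more onto the other one and make it hard to follow.)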


Answer 1:

Try:

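# df is the question's dataframe (df4 above); pandas is imported as pd.
# If the lists_of_stuff values are strings, parse them first with literal_eval:
# from ast import literal_eval
# df["lists_of_stuff"] = df["lists_of_stuff"].apply(literal_eval)

# One row per dict; the other columns are repeated for every list element.
df = df.explode("lists_of_stuff")
# Expand each dict into its own columns and attach them to the remaining columns.
df = pd.concat([df, df.pop("lists_of_stuff").apply(pd.Series)], axis=1)
print(df)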

Prints:

   other1  other2      a    b      c    d    e    f
0   Susie     123    sam  2.0    NaN  NaN  NaN  NaN
0   Susie     123  diana  NaN  grape  5.0  NaN  NaN
0   Susie     123   jody  NaN      7  NaN  foo  9.0
1  Rachel     456    joe  2.0    NaN  NaN  NaN  NaN
1  Rachel     456  steve  NaN  pizza  NaN  NaN  NaN
1  Rachel     456   alex  NaN      7  NaN  doh  NaN
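
As a variation on the approach above, here is a minimal sketch that swaps `.apply(pd.Series)` for `pd.json_normalize`, which tends to be faster on larger frames. It rebuilds the question's `df4` so that it runs on its own. The names `exploded`, `expanded`, and `long_df` are illustrative, and the sketch assumes every list is non-empty and contains only flat dicts.

```python
import pandas as pd

# Rebuild the question's example frame (assumed structure, braces restored).
df4 = pd.DataFrame({
    "other1": ["Susie", "Rachel"],
    "lists_of_stuff": [
        [{"a": "sam", "b": 2}, {"a": "diana", "c": "grape", "d": 5},
         {"a": "jody", "c": 7, "e": "foo", "f": 9}],
        [{"a": "joe", "b": 2}, {"a": "steve", "c": "pizza"},
         {"a": "alex", "c": 7, "e": "doh"}],
    ],
    "other2": [123, 456],
})

# One row per dict, with a fresh 0..n-1 index so the pieces line up below.
exploded = df4.explode("lists_of_stuff").reset_index(drop=True)

# Turn the dicts into columns; keys missing from a dict become NaN.
expanded = pd.json_normalize(exploded["lists_of_stuff"].tolist())

# Put the ordinary columns and the expanded dict columns side by side.
long_df = pd.concat([exploded.drop(columns="lists_of_stuff"), expanded], axis=1)
print(long_df)
```

Resetting the index before the `concat` matters here: `json_normalize` returns a frame with a plain 0..n-1 index, so both pieces need the same labels to align row by row.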

Edit: to reindex the columns:

#... code as above
df = df.reset_index(drop=True).reindex(
    [*df.columns[:1]] + [*df.columns[2:]] + [*df.columns[1:2]], axis=1
)
print(df)

Prints:

   other1      a    b      c    d    e    f  other2
0   Susie    sam  2.0    NaN  NaN  NaN  NaN     123
1   Susie  diana  NaN  grape  5.0  NaN  NaN     123
2   Susie   jody  NaN      7  NaN  foo  9.0     123
3  Rachel    joe  2.0    NaN  NaN  NaN  NaN     456
4  Rachel  steve  NaN  pizza  NaN  NaN  NaN     456
5  Rachel   alex  NaN      7  NaN  doh  NaN     456
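
The output above matches the second (long) layout from the question. For the first (wide) layout, with one row per original row, a rough sketch could look like the following. The helper `flatten_row` and the variable `wide` are hypothetical names, and the column naming is an assumption: every key is suffixed with the 1-based position of its dict in the list (`a_1`, `b_1`, `a_2`, `c_2`, ...), which is more regular than, but not identical to, the naming shown in the question.

```python
import pandas as pd

# Rebuild the question's example frame (assumed structure, braces restored).
df4 = pd.DataFrame({
    "other1": ["Susie", "Rachel"],
    "lists_of_stuff": [
        [{"a": "sam", "b": 2}, {"a": "diana", "c": "grape", "d": 5},
         {"a": "jody", "c": 7, "e": "foo", "f": 9}],
        [{"a": "joe", "b": 2}, {"a": "steve", "c": "pizza"},
         {"a": "alex", "c": 7, "e": "doh"}],
    ],
    "other2": [123, 456],
})


def flatten_row(dicts):
    """Merge a list of dicts into one dict, suffixing each key with the
    1-based position of the dict it came from (hypothetical helper)."""
    flat = {}
    for position, d in enumerate(dicts, start=1):
        for key, value in d.items():
            flat[f"{key}_{position}"] = value
    return flat


# One flat dict per original row, then one column per suffixed key.
flat_cols = pd.DataFrame(
    df4["lists_of_stuff"].map(flatten_row).tolist(), index=df4.index
)

# Keep the ordinary columns and append the flattened ones; gaps become NaN.
wide = pd.concat([df4.drop(columns="lists_of_stuff"), flat_cols], axis=1)
print(wide)
```

Suffixing by position keeps column names unique even when the same key appears in several dicts of one list, at the cost of many sparse columns when list lengths vary a lot.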
