Convert dictionaries with list of values into a dataframe
Posted: 2020-01-18 09:41:48

Question:
Suppose I have three dictionaries:
dictionary_col2 = {'MOB': [1, 2], 'ASP': [1, 2], 'YIP': [1, 2]}

dictionary_col3 = {'MOB': ['MOB_L001_R1_001.gz',
                           'MOB_L002_R1_001.gz'],
                   'ASP': ['ASP_L001_R1_001.gz',
                           'ASP_L002_R1_001.gz'],
                   'YIP': ['YIP_L001_R1_001.gz',
                           'YIP_L002_R1_001.gz']}

dictionary_col4 = {'MOB': ['MOB_L001_R2_001.gz',
                           'MOB_L002_R2_001.gz'],
                   'ASP': ['ASP_L001_R2_001.gz',
                           'ASP_L002_R2_001.gz'],
                   'YIP': ['YIP_L001_R2_001.gz',
                           'YIP_L002_R2_001.gz']}
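I want to convert the dictionaries above into a DataFrame. I tried the following:
import pandas as pd

df = pd.DataFrame([dictionary_col2, dictionary_col3, dictionary_col4])
df
The resulting DataFrame looks like this (each dictionary became one row):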
ASP MOB YIP
0 [1, 2] [1, 2] [1, 2]
1 [ASP_L001_R1_001.gz, ASP_L002_R1_001.gz] [MOB_L001_R1_001.gz, MOB_L002_R1_001.gz] [YIP_L001_R1_001.gz, YIP_L002_R1_001.gz]
2 [ASP_L001_R2_001.gz, ASP_L002_R2_001.gz] [MOB_L001_R2_001.gz, MOB_L002_R2_001.gz] [YIP_L001_R2_001.gz, YIP_L002_R2_001.gz]
My goal is a DataFrame with the following columns:
col1 col2 col3 col4
MOB 1 MOB_L001_R1_001.gz MOB_L001_R2_001.gz
MOB 2 MOB_L002_R1_001.gz MOB_L002_R2_001.gz
ASP 1 ASP_L001_R1_001.gz ASP_L001_R2_001.gz
ASP 2 ASP_L002_R1_001.gz ASP_L002_R2_001.gz
YIP 1 YIP_L001_R1_001.gz YIP_L001_R2_001.gz
YIP 2 YIP_L002_R1_001.gz YIP_L002_R2_001.gz
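A minimal sketch of the target layout in plain Python, assuming the three dictionaries share the same keys and that the lists under each key have equal length. It builds one tuple per (key, position) and hands the rows to pd.DataFrame; the column names col1 to col4 are taken from the desired output above. The dictionaries are rebuilt with comprehensions intended to reproduce the same contents as in the question.

```python
import pandas as pd

# Rebuild the question's dictionaries compactly (same contents as above).
keys = ['MOB', 'ASP', 'YIP']
dictionary_col2 = {k: [1, 2] for k in keys}
dictionary_col3 = {k: [f'{k}_L00{i}_R1_001.gz' for i in (1, 2)] for k in keys}
dictionary_col4 = {k: [f'{k}_L00{i}_R2_001.gz' for i in (1, 2)] for k in keys}

# One output row per (key, position): zip pairs the i-th element of each
# dictionary's list for the same key. zip stops at the shortest list, so this
# assumes the three lists for a key have equal length.
rows = [
    (key, c2, c3, c4)
    for key in dictionary_col2
    for c2, c3, c4 in zip(dictionary_col2[key], dictionary_col3[key], dictionary_col4[key])
]
df = pd.DataFrame(rows, columns=['col1', 'col2', 'col3', 'col4'])
print(df)
```

Because rows are generated in the key order of dictionary_col2, the result keeps the MOB, ASP, YIP order of the desired table.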
Any help or suggestions appreciated!
Comments:
Possible duplicate of Convert list of dictionaries to a pandas DataFrame

Answer 1:
With pandas 0.25.0 or later, you can combine concat with explode:
pd.concat([pd.Series(x).explode() for x in [dictionary_col2, dictionary_col3, dictionary_col4]], axis=1)
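A self-contained sketch of this answer using all three dictionaries. The keys= argument of pd.concat and the rename_axis/reset_index step are additions, not part of the original answer; they give the columns the col1 to col4 names from the question. The approach relies on the three exploded Series having identical indexes, which holds when the lists under each key have the same length.

```python
import pandas as pd

# Rebuild the question's dictionaries compactly (same contents as above).
keys = ['MOB', 'ASP', 'YIP']
dictionary_col2 = {k: [1, 2] for k in keys}
dictionary_col3 = {k: [f'{k}_L00{i}_R1_001.gz' for i in (1, 2)] for k in keys}
dictionary_col4 = {k: [f'{k}_L00{i}_R2_001.gz' for i in (1, 2)] for k in keys}

# explode() (pandas >= 0.25) puts each list element on its own row and repeats
# the dictionary key as the index label, so all three Series share one index.
parts = [pd.Series(d).explode() for d in (dictionary_col2, dictionary_col3, dictionary_col4)]

# keys= names the resulting columns; rename_axis + reset_index turns the
# repeated key index into an ordinary 'col1' column.
df = pd.concat(parts, axis=1, keys=['col2', 'col3', 'col4'])
df = df.rename_axis('col1').reset_index()
print(df)
```

Note that explode() leaves the columns with object dtype; something like df['col2'].astype(int) would restore an integer column if needed.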
Answer 2:
pd.DataFrame({'col2': pd.DataFrame(dictionary_col2).unstack(),
              'col3': pd.DataFrame(dictionary_col3).unstack(),
              'col4': pd.DataFrame(dictionary_col4).unstack()}).reset_index(level=0)
Returns:
level_0 col2 col3 col4
0 ASP 1 ASP_L001_R1_001.gz ASP_L001_R2_001.gz
1 ASP 2 ASP_L002_R1_001.gz ASP_L002_R2_001.gz
0 MOB 1 MOB_L001_R1_001.gz MOB_L001_R2_001.gz
1 MOB 2 MOB_L002_R1_001.gz MOB_L002_R2_001.gz
0 YIP 1 YIP_L001_R1_001.gz YIP_L001_R2_001.gz
1 YIP 2 YIP_L002_R1_001.gz YIP_L002_R2_001.gz
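A sketch of the same approach with the key column named col1 instead of level_0, assuming the question's dictionary names. Here droplevel(1) discards the list-position level that reset_index(level=0) would otherwise leave behind as the repeated 0/1 index. Row order (alphabetical in the output above) may vary with the pandas version; each row still pairs a key with its matching files.

```python
import pandas as pd

# Rebuild the question's dictionaries compactly (same contents as above).
keys = ['MOB', 'ASP', 'YIP']
dictionary_col2 = {k: [1, 2] for k in keys}
dictionary_col3 = {k: [f'{k}_L00{i}_R1_001.gz' for i in (1, 2)] for k in keys}
dictionary_col4 = {k: [f'{k}_L00{i}_R2_001.gz' for i in (1, 2)] for k in keys}

# pd.DataFrame(d) has the keys as columns and list positions (0, 1) as the
# index; unstack() turns it into a Series indexed by (key, position).
df = pd.DataFrame({
    'col2': pd.DataFrame(dictionary_col2).unstack(),
    'col3': pd.DataFrame(dictionary_col3).unstack(),
    'col4': pd.DataFrame(dictionary_col4).unstack(),
})

# Drop the position level, name the remaining key level 'col1', and move it
# into a regular column, leaving a clean 0..n-1 index.
df = df.droplevel(1).rename_axis('col1').reset_index()
print(df)
```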
Comments:
This solution fits my problem best because it also produces column names. Thanks!

Answer 3:
dict_list = [dictionary_col2, dictionary_col3, dictionary_col4]
df = pd.concat([pd.DataFrame.from_dict(x, orient='index').unstack() for x in dict_list], axis=1)
Output:
>>> df
0 1 2
0 MOB 1 MOB_L001_R1_001.gz MOB_L001_R2_001.gz
ASP 1 ASP_L001_R1_001.gz ASP_L001_R2_001.gz
YIP 1 YIP_L001_R1_001.gz YIP_L001_R2_001.gz
1 MOB 2 MOB_L002_R1_001.gz MOB_L002_R2_001.gz
ASP 2 ASP_L002_R1_001.gz ASP_L002_R2_001.gz
YIP 2 YIP_L002_R1_001.gz YIP_L002_R2_001.gz
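The output above is ordered by list position first (all 1s, then all 2s). A variant, sketched here as a swapped-in step rather than the answer's own code: calling stack() instead of unstack() on the orient='index' frame indexes each Series by (key, position), so rows come out grouped by key as in the question's desired table. The keys=, droplevel and rename_axis steps are also additions that flatten the result into col1 to col4. Recent pandas versions may emit a FutureWarning from stack() about its new implementation; the result should be the same for this data.

```python
import pandas as pd

# Rebuild the question's dictionaries compactly (same contents as above).
keys = ['MOB', 'ASP', 'YIP']
dictionary_col2 = {k: [1, 2] for k in keys}
dictionary_col3 = {k: [f'{k}_L00{i}_R1_001.gz' for i in (1, 2)] for k in keys}
dictionary_col4 = {k: [f'{k}_L00{i}_R2_001.gz' for i in (1, 2)] for k in keys}
dict_list = [dictionary_col2, dictionary_col3, dictionary_col4]

# With orient='index' the keys form the index and list positions the columns.
# stack() (rather than unstack()) yields a (key, position) index.
parts = [pd.DataFrame.from_dict(d, orient='index').stack() for d in dict_list]

# Name the columns, drop the position level, and expose the key as 'col1'.
df = pd.concat(parts, axis=1, keys=['col2', 'col3', 'col4'])
df = df.droplevel(1).rename_axis('col1').reset_index()
print(df)
```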
Answer 4:
IIUC, you can do:
pd.concat([pd.DataFrame(d).stack() for d in (dictionary_col2, dictionary_col3, dictionary_col4)], axis=1)
Output:
0 1 2
0 MOB 1 MOB_L001_R1_001.gz MOB_L001_R2_001.gz
ASP 1 ASP_L001_R1_001.gz ASP_L001_R2_001.gz
YIP 1 YIP_L001_R1_001.gz YIP_L001_R2_001.gz
1 MOB 2 MOB_L002_R1_001.gz MOB_L002_R2_001.gz
ASP 2 ASP_L002_R1_001.gz ASP_L002_R2_001.gz
YIP 2 YIP_L002_R1_001.gz YIP_L002_R2_001.gz