How to convert a nested dict into a DataFrame
Posted: 2021-08-30 20:52:37
Question: Suppose I have an API response:
{
    "fact": {
        "UP": [
            {
                "SCODE": "CNB",
                "SNAME": "Kanpur Central"
            },
            {
                "SCODE": "JHS",
                "SNAME": "Jhansi Junction"
            }
        ],
        "MP": [
            {
                "SCODE": "BPL",
                "SNAME": "Bhopal Junction"
            },
            {
                "SCODE": "JBP",
                "SNAME": "Jabalpur Junction"
            }
        ]
    }
}
I need to convert it into a DataFrame like this (expected output):
fact SCODE SNAME
UP CNB Kanpur Central
UP JHS Jhansi Junction
MP BPL Bhopal Junction
MP JBP Jabalpur Junction
What I tried: I used json_normalize(), but it does not produce the expected output:
pd.json_normalize(response).apply(pd.Series.explode)
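A quick way to see why this attempt falls short is to look at what json_normalize does to the raw response before any exploding happens. A minimal sketch, assuming the response has already been parsed into a Python dict: json_normalize joins nested dict keys into dotted column names and leaves lists untouched, so the state codes end up as column labels instead of values in a fact column.

```python
import pandas as pd

# The API response from the question, already parsed into a Python dict.
response = {
    "fact": {
        "UP": [{"SCODE": "CNB", "SNAME": "Kanpur Central"},
               {"SCODE": "JHS", "SNAME": "Jhansi Junction"}],
        "MP": [{"SCODE": "BPL", "SNAME": "Bhopal Junction"},
               {"SCODE": "JBP", "SNAME": "Jabalpur Junction"}],
    }
}

# Nested dict keys are joined with "." while the lists stay as cell values,
# so the whole response collapses into a single row with one column per state.
flat = pd.json_normalize(response)
print(flat.columns.tolist())  # ['fact.UP', 'fact.MP']
print(flat.shape)             # (1, 2)
```

Exploding those columns afterwards still leaves "UP" and "MP" in the column names, which is why the state has to be moved into the records themselves (or into the index), as the answers below do.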
Answer 1: One option is to reshape the data in plain Python:
import pandas as pd

df = pd.DataFrame([{'fact': k, **item}
                   for k, lst in response['fact'].items()
                   for item in lst])
fact SCODE SNAME
0 UP CNB Kanpur Central
1 UP JHS Jhansi Junction
2 MP BPL Bhopal Junction
3 MP JBP Jabalpur Junction
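The comprehension in this first option can be wrapped in a small reusable function. This is only a sketch: the name nested_to_frame and its group_key parameter are hypothetical, chosen for illustration, and it assumes every value under the outer key is a list of flat dicts.

```python
import pandas as pd


def nested_to_frame(payload, group_key="fact"):
    """Flatten {group_key: {group: [record, ...]}} into one row per record.

    Hypothetical helper: the group label goes into a column named group_key,
    and each record's own fields become the remaining columns.
    """
    rows = [
        {group_key: group, **record}  # prepend the group label to each record
        for group, records in payload[group_key].items()
        for record in records
    ]
    return pd.DataFrame(rows)


response = {
    "fact": {
        "UP": [{"SCODE": "CNB", "SNAME": "Kanpur Central"},
               {"SCODE": "JHS", "SNAME": "Jhansi Junction"}],
        "MP": [{"SCODE": "BPL", "SNAME": "Bhopal Junction"},
               {"SCODE": "JBP", "SNAME": "Jabalpur Junction"}],
    }
}

df = nested_to_frame(response)
print(df)
```

Because a list of dicts keeps the key order of its first record, the group column comes first and the rows follow the order of the original dict.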
A pandas option, via explode + apply(pd.Series):
df = (
pd.DataFrame(response)['fact']
.explode()
.apply(pd.Series)
.rename_axis('fact')
.reset_index()
)
fact SCODE SNAME
0 MP BPL Bhopal Junction
1 MP JBP Jabalpur Junction
2 UP CNB Kanpur Central
3 UP JHS Jhansi Junction
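To make each step of that chain visible, the sketch below rebuilds the same result while keeping the intermediate objects. It swaps .apply(pd.Series) for pd.DataFrame(s.tolist(), index=s.index), a common substitute that expands a Series of dicts into columns without constructing one pd.Series per row; that swap is an editorial choice, not part of the answer above. The row order of pd.DataFrame(response) may differ between pandas versions, as the sorted output above suggests.

```python
import pandas as pd

response = {
    "fact": {
        "UP": [{"SCODE": "CNB", "SNAME": "Kanpur Central"},
               {"SCODE": "JHS", "SNAME": "Jhansi Junction"}],
        "MP": [{"SCODE": "BPL", "SNAME": "Bhopal Junction"},
               {"SCODE": "JBP", "SNAME": "Jabalpur Junction"}],
    }
}

# Step 1: one column, "fact", indexed by state code; each cell holds
# that state's list of station dicts.
s = pd.DataFrame(response)["fact"]

# Step 2: explode gives one row per station dict, repeating the state
# code in the index for every dict that came from the same list.
exploded = s.explode()

# Step 3: expand each dict into SCODE/SNAME columns, keeping the index.
stations = pd.DataFrame(exploded.tolist(), index=exploded.index)

# Step 4: name the index and turn it into a regular "fact" column.
df = stations.rename_axis("fact").reset_index()
print(df)
```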
Answer 2: Using response from the OP.
You have to build a different structure first, because json_normalize works on a list of dicts, and each of those dicts needs to contain fact:
new_response = [{"fact": rfact, **r}
                for rfact in response["fact"]
                for r in response["fact"][rfact]]
Finally, you just need to apply the function:
final_result = pd.json_normalize(new_response)
fact SCODE SNAME
0 UP CNB Kanpur Central
1 UP JHS Jhansi Junction
2 MP BPL Bhopal Junction
3 MP JBP Jabalpur Junction
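A variation that lets json_normalize do the per-record flattening itself: wrap each state's list in a small dict, then pass record_path and meta, both documented json_normalize parameters. The intermediate key name "stations" is an arbitrary, hypothetical choice made for this sketch.

```python
import pandas as pd

response = {
    "fact": {
        "UP": [{"SCODE": "CNB", "SNAME": "Kanpur Central"},
               {"SCODE": "JHS", "SNAME": "Jhansi Junction"}],
        "MP": [{"SCODE": "BPL", "SNAME": "Bhopal Junction"},
               {"SCODE": "JBP", "SNAME": "Jabalpur Junction"}],
    }
}

# Wrap each group so the list becomes the record path and the state code
# becomes metadata that json_normalize repeats on every record.
wrapped = [{"fact": state, "stations": items}
           for state, items in response["fact"].items()]

df = pd.json_normalize(wrapped, record_path="stations", meta=["fact"])

# Meta columns are appended after the record columns, so reorder them.
df = df[["fact", "SCODE", "SNAME"]]
print(df)
```

This form scales better than building the flat dicts by hand when the records themselves contain further nested fields, since json_normalize flattens those as well.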
Answer 3: This is less efficient than working on the dictionary directly (the accepted answer does that well):
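import pandas as pd
from pandas import json_normalize as jn  # assumed: jn is an alias for json_normalize

data = {
    "fact": {
        "UP": [
            {
                "SCODE": "CNB",
                "SNAME": "Kanpur Central"
            },
            {
                "SCODE": "JHS",
                "SNAME": "Jhansi Junction"
            }
        ],
        "MP": [
            {
                "SCODE": "BPL",
                "SNAME": "Bhopal Junction"
            },
            {
                "SCODE": "JBP",
                "SNAME": "Jabalpur Junction"
            }
        ]
    }
}

keys = list(data['fact'])  # state codes, used as the outer index level
(pd.concat([jn(data['fact'][key]) for key in keys],
           keys=keys)
   .droplevel(-1)              # drop the per-state row numbers
   .rename_axis(index='fact')
   .reset_index()
)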
fact SCODE SNAME
0 UP CNB Kanpur Central
1 UP JHS Jhansi Junction
2 MP BPL Bhopal Junction
3 MP JBP Jabalpur Junction
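pd.concat also accepts a mapping of keys to DataFrames and uses the dict keys as the outer index level, which removes the need for a separate keys list. A sketch of that variant; the level name "row" is an arbitrary placeholder that is discarded at the end.

```python
import pandas as pd

data = {
    "fact": {
        "UP": [{"SCODE": "CNB", "SNAME": "Kanpur Central"},
               {"SCODE": "JHS", "SNAME": "Jhansi Junction"}],
        "MP": [{"SCODE": "BPL", "SNAME": "Bhopal Junction"},
               {"SCODE": "JBP", "SNAME": "Jabalpur Junction"}],
    }
}

# One small DataFrame per state, keyed by the state code.
frames = {state: pd.json_normalize(items) for state, items in data["fact"].items()}

# With a dict, concat takes its keys as the outer index level;
# names labels both levels of the resulting MultiIndex.
combined = pd.concat(frames, names=["fact", "row"])

# Move the state code into a column and discard the per-state row numbers.
df = combined.reset_index(level="fact").reset_index(drop=True)
print(df)
```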