How to convert nested dictionaries to a pandas DataFrame?
Posted: 2019-12-09 11:10:23
Question: I want to convert the result of an API call into a DataFrame. The call returns a nested dictionary, but the DataFrame I get from it is not what I need.
Besides json_normalize, I also tried pd.DataFrame.from_dict, so far without success. I also tried flattening the dictionary, but that did not work either.
I used the following call:
import requests

results = requests.get(url).json()  # url is the API endpoint
results
The output is:
{'result': {'totalrows': 3124,
  'rows': [{'rownum': 1,
    'values': [{'field': 'querydate', 'value': '7/31/2019 3:19 PM'},
     {'field': 'issueid', 'value': 472683},
     {'field': 'ticker', 'value': 'AAPL'},
     {'field': 'companyname', 'value': 'APPLE INC'},
     {'field': 'issuetitle', 'value': 'COM'},
     {'field': 'filerid', 'value': 1089387}]},
   {'rownum': 2,
    'values': [{'field': 'querydate', 'value': '7/31/2019 3:19 PM'},
     {'field': 'issueid', 'value': 472683},
     {'field': 'ticker', 'value': 'AAPL'},
     {'field': 'companyname', 'value': 'APPLE INC'},
     {'field': 'issuetitle', 'value': 'COM'},
     {'field': 'filerid', 'value': 1086893}]},
   {'rownum': 3,
    'values': [{'field': 'querydate', 'value': '7/31/2019 3:19 PM'},
     {'field': 'issueid', 'value': 472683},
     {'field': 'ticker', 'value': 'AAPL'},
     {'field': 'companyname', 'value': 'APPLE INC'},
     {'field': 'issuetitle', 'value': 'COM'},
     {'field': 'filerid', 'value': 1085803}]}]}}
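Then, to build the DataFrame, I used the following code:
from pandas import json_normalize  # pandas >= 1.0; older versions: from pandas.io.json import json_normalize

Owners = results['result']['rows']
df1 = json_normalize(Owners)
df1.head()
This is the output: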
   rownum  values
0       1  [{'field': 'querydate', 'value': '7/31/2019 3:19 PM'},
           {'field': 'issueid', 'value': 472683},
           {'field': 'ticker', 'value': 'AAPL'},
           {'field': 'companyname', 'value': 'APPLE INC'},
           {'field': 'issuetitle', 'value': 'COM'},
           {'field': 'filerid', 'value': 1089387}]
1       2  [{'field': 'querydate', 'value': '7/31/2019 3:19 PM'},
           {'field': 'issueid', 'value': 472683},
           {'field': 'ticker', 'value': 'AAPL'},
           {'field': 'companyname', 'value': 'APPLE INC'},
           {'field': 'issuetitle', 'value': 'COM'},
           {'field': 'filerid', 'value': 1086893}]
2       3  [{'field': 'querydate', 'value': '7/31/2019 3:19 PM'},
           {'field': 'issueid', 'value': 472683},
           {'field': 'ticker', 'value': 'AAPL'},
           {'field': 'companyname', 'value': 'APPLE INC'},
           {'field': 'issuetitle', 'value': 'COM'},
           {'field': 'filerid', 'value': 1085803}]
However, I would like to get a DataFrame in the following format:
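Answer 1:
You can use pandas.DataFrame.from_dict, but first you need to strip everything you do not need from the data. In practice you only want to keep the field and value entries of each row. You can do that with a list comprehension: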
# One dictionary per row: {field name: value}
data = [{field["field"]: field["value"] for field in row["values"]}
        for row in results["result"]["rows"]]
print(data)
# [{'querydate': '7/31/2019 3:19 PM',
#   'issueid': 472683,
#   'ticker': 'AAPL',
#   'companyname': 'APPLE INC',
#   'issuetitle': 'COM',
#   'filerid': 1089387},
#
#  {'querydate': '7/31/2019 3:19 PM',
#   'issueid': 472683,
#   'ticker': 'AAPL',
#   'companyname': 'APPLE INC',
#   'issuetitle': 'COM',
#   'filerid': 1086893},
#
#  {'querydate': '7/31/2019 3:19 PM',
#   'issueid': 472683,
#   'ticker': 'AAPL',
#   'companyname': 'APPLE INC',
#   'issuetitle': 'COM',
#   'filerid': 1085803}
# ]
Once you have this list of dictionaries, you can call the from_dict method:
import pandas as pd

df = pd.DataFrame.from_dict(data)
print(df)
# Newer pandas versions keep the dictionaries' key order instead of sorting the columns alphabetically.
#   companyname  filerid  issueid issuetitle          querydate ticker
# 0   APPLE INC  1089387   472683        COM  7/31/2019 3:19 PM   AAPL
# 1   APPLE INC  1086893   472683        COM  7/31/2019 3:19 PM   AAPL
# 2   APPLE INC  1085803   472683        COM  7/31/2019 3:19 PM   AAPL
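As an alternative to the list comprehension, json_normalize can also unpack the inner values lists when it is given record_path, after which a pivot turns each field into a column. The following is only a minimal sketch: it assumes pandas 1.0 or later (where pd.json_normalize is available) and uses a small hand-written results sample shaped like the API response in the question, not the real data.

```python
import pandas as pd

# Hand-written sample mimicking the structure of the API response (assumption).
results = {
    "result": {
        "totalrows": 2,
        "rows": [
            {"rownum": 1, "values": [
                {"field": "ticker", "value": "AAPL"},
                {"field": "filerid", "value": 1089387},
            ]},
            {"rownum": 2, "values": [
                {"field": "ticker", "value": "AAPL"},
                {"field": "filerid", "value": 1086893},
            ]},
        ],
    }
}

# Long format: one line per (field, value) pair, tagged with its rownum.
long_df = pd.json_normalize(results["result"]["rows"],
                            record_path="values", meta=["rownum"])

# Wide format: each distinct field becomes a column, one row per rownum.
wide_df = long_df.pivot(index="rownum", columns="field", values="value")
print(wide_df)
```

Because the value column mixes strings and integers, the pivoted columns come back with object dtype; converting them afterwards (for example with astype) may be needed.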
If you want rownum as a column (or as the index):
# Merge the {field: value} dictionary with a one-entry {'rownum': ...} dictionary
data = [{**{field["field"]: field["value"] for field in row["values"]},
         **{"rownum": row["rownum"]}}
        for row in results["result"]["rows"]]
df = pd.DataFrame.from_dict(data)
print(df)
#   companyname  filerid  issueid issuetitle          querydate  rownum ticker
# 0   APPLE INC  1089387   472683        COM  7/31/2019 3:19 PM       1   AAPL
# 1   APPLE INC  1086893   472683        COM  7/31/2019 3:19 PM       2   AAPL
# 2   APPLE INC  1085803   472683        COM  7/31/2019 3:19 PM       3   AAPL
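The snippet above keeps rownum as an ordinary column. To use it as the index instead, set_index can be applied to the resulting frame. A minimal sketch, using hand-written records (an assumption) in the shape the comprehension produces:

```python
import pandas as pd

# Hand-written records in the shape produced by the comprehension above (assumption).
data = [
    {"ticker": "AAPL", "filerid": 1089387, "rownum": 1},
    {"ticker": "AAPL", "filerid": 1086893, "rownum": 2},
]

# Move rownum out of the columns and use it as the row index.
df = pd.DataFrame(data).set_index("rownum")
print(df)
```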
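Comments:
Almost there, it is just missing rownum.
@Akaisteph7 To do that inside the list comprehension, I ended up merging two dicts. Maybe there is a better idea?
Thank you very much, @AlexandreB. It worked! I do not need rownum, but I did not mention that in my question.
Answer 2: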
A naive attempt with nested for loops...
import pandas as pd

# `json` is the dictionary listed under "JSON object" (the name shadows the standard json module)
records = []
for row in json["result"]["rows"]:
    rownum = row["rownum"]
    querydate = issueid = ticker = companyname = issuetitle = filerid = None
    for value_dict in row["values"]:
        if value_dict["field"] == "querydate":
            querydate = value_dict["value"]
        elif value_dict["field"] == "issueid":
            issueid = value_dict["value"]
        elif value_dict["field"] == "ticker":
            ticker = value_dict["value"]
        elif value_dict["field"] == "companyname":
            companyname = value_dict["value"]
        elif value_dict["field"] == "issuetitle":
            issuetitle = value_dict["value"]
        elif value_dict["field"] == "filerid":
            filerid = value_dict["value"]
    # DataFrame.append was removed in pandas 2.0, so collect the rows and build the frame once
    records.append({"rownum": rownum,
                    "querydate": querydate,
                    "issueid": issueid,
                    "ticker": ticker,
                    "companyname": companyname,
                    "issuetitle": issuetitle,
                    "filerid": filerid})
df = pd.DataFrame(records)
print(df)
JSON object:
json = {
    "result": {
        "totalrows": 3,
        "rows": [
            {
                "rownum": 1,
                "values": [
                    {"field": "querydate", "value": "7/31/2019 3:19 PM"},
                    {"field": "issueid", "value": 472683},
                    {"field": "ticker", "value": "AAPL"},
                    {"field": "companyname", "value": "APPLE INC"},
                    {"field": "issuetitle", "value": "COM"},
                    {"field": "filerid", "value": 1089387}
                ]
            },
            {
                "rownum": 2,
                "values": [
                    {"field": "querydate", "value": "7/31/2019 3:19 PM"},
                    {"field": "issueid", "value": 472683},
                    {"field": "ticker", "value": "AAPL"},
                    {"field": "companyname", "value": "APPLE INC"},
                    {"field": "issuetitle", "value": "COM"},
                    {"field": "filerid", "value": 1086893}
                ]
            },
            {
                "rownum": 3,
                "values": [
                    {"field": "querydate", "value": "7/31/2019 3:19 PM"},
                    {"field": "issueid", "value": 472683},
                    {"field": "ticker", "value": "AAPL"},
                    {"field": "companyname", "value": "APPLE INC"},
                    {"field": "issuetitle", "value": "COM"},
                    {"field": "filerid", "value": 1085803}
                ]
            }
        ]
    }
}
Output:
   rownum          querydate  issueid ticker companyname issuetitle  filerid
0       1  7/31/2019 3:19 PM   472683   AAPL   APPLE INC        COM  1089387
1       2  7/31/2019 3:19 PM   472683   AAPL   APPLE INC        COM  1086893
2       3  7/31/2019 3:19 PM   472683   AAPL   APPLE INC        COM  1085803
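The loop above spells out every field name in its own branch. If the set of fields is not known in advance, the same idea can be sketched generically by using each field value as the dictionary key. This is only an illustration under stated assumptions: payload is a hand-written sample (not the real API response), and its second row deliberately lacks filerid to show how a missing field ends up as NaN.

```python
import pandas as pd

# Hand-written sample; the second row deliberately lacks "filerid" (assumption).
payload = {
    "result": {
        "rows": [
            {"rownum": 1, "values": [
                {"field": "ticker", "value": "AAPL"},
                {"field": "filerid", "value": 1089387},
            ]},
            {"rownum": 2, "values": [
                {"field": "ticker", "value": "AAPL"},
            ]},
        ]
    }
}

records = []
for row in payload["result"]["rows"]:
    record = {"rownum": row["rownum"]}
    for item in row["values"]:
        # Use the field name itself as the column name.
        record[item["field"]] = item["value"]
    records.append(record)

# Fields missing from a row are filled with NaN by the DataFrame constructor.
df = pd.DataFrame(records)
print(df)
```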
Comments:
Thanks for your contribution! Thank you, @shash678.