多层次查询深度嵌套复杂的 JSON 数据
Posted
技术标签:
【中文标题】多层次查询深度嵌套复杂的 JSON 数据【英文标题】:Querying deeply nested and complex JSON data with multiple levels 【发布时间】:2021-09-26 16:35:10 【问题描述】:我正在努力分解从深度嵌套的复杂 JSON 数据中提取数据所需的方法。我有以下代码来获取 JSON。
import requests
import pandas as pd
import json
import pprint
import seaborn as sns
import matplotlib.pyplot as plt
base_url="https://data.sec.gov/api/xbrl/companyfacts/CIK0001627475.json"
headers='User-Agent': 'Myheaderdata'
first_response=requests.get(base_url,headers=headers)
response_dic=first_response.json()
print(response_dic)
base_df=pd.DataFrame(response_dic)
base_df.head()
它提供了一个显示 JSON 和 Pandas DataFrame 的输出。数据框有两列,第三列 (FACTS) 包含大量嵌套数据。
我想了解的是如何导航到该嵌套结构中,以检索某些数据。例如,我可能想要转到 DEI 级别或 US GAAP 级别并检索特定属性。假设 DEI > EntityCommonStockSharesOutstanding 并获取“标签”、“价值”和“FY”详细信息。
当我尝试如下使用get函数时;
data=[]
for response in response_dic:
data.append("EntityCommonStockSharesOutstanding":response.get('EntityCommonStockSharesOutstanding'))
new_df=pd.DataFrame(data)
new_df.head()
我最终得到以下属性错误;
AttributeError Traceback (most recent call last)
<ipython-input-15-15c1685065f0> in <module>
1 data=[]
2 for response in response_dic:
----> 3 data.append("EntityCommonStockSharesOutstanding":response.get('EntityCommonStockSharesOutstanding'))
4 base_df=pd.DataFrame(data)
5 base_df.head()
AttributeError: 'str' object has no attribute 'get'
【问题讨论】:
你看过response_dic的结构了吗?这是一个嵌套字典。你的循环,即for response in response_dic:
只是循环遍历它的键,这些键是字符串 cik、entityName、facts(不知道你为什么这样做)。要导航到“dei”中的“标签”,只需:response_dic['facts']['dei']['EntityCommonStockSharesOutstanding']['label']
,结果为“实体普通股,流通股”
【参考方案1】:
使用pd.json_normalize
:
例如:
entity1 = response_dic['facts']['dei']['EntityCommonStockSharesOutstanding']
entity2 = response_dic['facts']['dei']['EntityPublicFloat']
df1 = pd.json_normalize(entity1, record_path=['units', 'shares'],
meta=['label', 'description'])
df2 = pd.json_normalize(entity2, record_path=['units', 'USD'],
meta=['label', 'description'])
>>> df1
end val accn ... frame label description
0 2018-10-31 106299106 0001564590-18-028629 ... CY2018Q3I Entity Common Stock, Shares Outstanding Indicate number of shares or other units outst...
1 2019-02-28 106692030 0001627475-19-000007 ... NaN Entity Common Stock, Shares Outstanding Indicate number of shares or other units outst...
2 2019-04-30 107160359 0001627475-19-000015 ... CY2019Q1I Entity Common Stock, Shares Outstanding Indicate number of shares or other units outst...
3 2019-07-31 110803709 0001627475-19-000025 ... CY2019Q2I Entity Common Stock, Shares Outstanding Indicate number of shares or other units outst...
4 2019-10-31 112020807 0001628280-19-013517 ... CY2019Q3I Entity Common Stock, Shares Outstanding Indicate number of shares or other units outst...
5 2020-02-28 113931825 0001627475-20-000006 ... NaN Entity Common Stock, Shares Outstanding Indicate number of shares or other units outst...
6 2020-04-30 115142604 0001627475-20-000018 ... CY2020Q1I Entity Common Stock, Shares Outstanding Indicate number of shares or other units outst...
7 2020-07-31 120276173 0001627475-20-000031 ... CY2020Q2I Entity Common Stock, Shares Outstanding Indicate number of shares or other units outst...
8 2020-10-31 122073553 0001627475-20-000044 ... CY2020Q3I Entity Common Stock, Shares Outstanding Indicate number of shares or other units outst...
9 2021-01-31 124962279 0001627475-21-000015 ... CY2020Q4I Entity Common Stock, Shares Outstanding Indicate number of shares or other units outst...
10 2021-04-30 126144849 0001627475-21-000022 ... CY2021Q1I Entity Common Stock, Shares Outstanding Indicate number of shares or other units outst...
[11 rows x 10 columns]
>>> df2
end val accn fy fp form filed frame label description
0 2018-10-03 900000000 0001627475-19-000007 2018 FY 10-K 2019-03-07 CY2018Q3I Entity Public Float The aggregate market value of the voting and n...
1 2019-06-28 1174421292 0001627475-20-000006 2019 FY 10-K 2020-03-02 CY2019Q2I Entity Public Float The aggregate market value of the voting and n...
2 2020-06-30 1532720862 0001627475-21-000015 2020 FY 10-K 2021-02-24 CY2020Q2I Entity Public Float The aggregate market value of the voting and n...
【讨论】:
以上是关于多层次查询深度嵌套复杂的 JSON 数据的主要内容,如果未能解决你的问题,请参考以下文章