Querying deeply nested and complex JSON data with multiple levels


Posted: 2021-09-26 16:35:10

Question:

I'm struggling to work out the approach needed to extract data from deeply nested, complex JSON. I have the following code to fetch the JSON:

import requests
import pandas as pd
import json
import pprint
import seaborn as sns
import matplotlib.pyplot as plt

base_url="https://data.sec.gov/api/xbrl/companyfacts/CIK0001627475.json"
headers={'User-Agent': 'Myheaderdata'}  # data.sec.gov expects a descriptive User-Agent header
first_response=requests.get(base_url,headers=headers)
response_dic=first_response.json()
print(response_dic)
base_df=pd.DataFrame(response_dic)
base_df.head()

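This prints the JSON and shows a pandas DataFrame. The DataFrame has three columns (cik, entityName, facts); the third one, facts, contains a large amount of nested data.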
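A minimal sketch of why the facts column holds nested data, using a small hypothetical dict that only mimics the top-level shape of the companyfacts response (the entity name and inner contents are placeholders, not real SEC data). pandas turns the keys inside facts into the row index and leaves each cell of the facts column as an ordinary Python dict, so the DataFrame alone does not flatten anything.

```python
import pandas as pd

# Hypothetical stand-in mirroring the top-level keys of the companyfacts JSON
response_dic = {
    "cik": 1627475,
    "entityName": "Placeholder Inc.",  # placeholder, not the real name
    "facts": {
        "dei": {"EntityCommonStockSharesOutstanding": {"label": "placeholder"}},
        "us-gaap": {},
    },
}

base_df = pd.DataFrame(response_dic)
print(base_df.columns.tolist())  # ['cik', 'entityName', 'facts']
print(base_df.index.tolist())    # ['dei', 'us-gaap']

# Each cell of the 'facts' column is still a plain dict that must be indexed further
dei = base_df.loc["dei", "facts"]
print(type(dei).__name__)        # dict
```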

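What I want to understand is how to navigate into that nested structure to retrieve specific data. For example, I might want to go to the dei level or the us-gaap level and retrieve particular attributes, say dei > EntityCommonStockSharesOutstanding, and get its "label", "value" and "fy" details.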
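A minimal sketch of reaching those fields with plain dictionary indexing. It assumes the nesting facts > dei > concept > units > shares that the answer's json_normalize call relies on, and the field names val and fy that appear as columns in the answer's output; the dict itself is a trimmed, hypothetical stand-in with placeholder values.

```python
# Hypothetical, trimmed stand-in for response_dic; values are placeholders
response_dic = {
    "cik": 1627475,
    "facts": {
        "dei": {
            "EntityCommonStockSharesOutstanding": {
                "label": "Entity Common Stock, Shares Outstanding",
                "units": {
                    "shares": [
                        {"end": "2021-01-31", "val": 100, "fy": 2020},
                        {"end": "2021-04-30", "val": 110, "fy": 2021},
                    ]
                },
            }
        }
    },
}

# Chain the keys level by level: facts > dei > concept name
concept = response_dic["facts"]["dei"]["EntityCommonStockSharesOutstanding"]
label = concept["label"]

# The individual data points sit in a list under units > shares;
# .get("fy") returns None if a data point happens to lack that field
rows = [
    {"label": label, "val": fact["val"], "fy": fact.get("fy")}
    for fact in concept["units"]["shares"]
]
for row in rows:
    print(row)
```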

When I try to use the get method as follows:

data=[]
for response in response_dic:
    data.append({"EntityCommonStockSharesOutstanding": response.get('EntityCommonStockSharesOutstanding')})
new_df=pd.DataFrame(data)
new_df.head()

I end up with the following AttributeError:

AttributeError                            Traceback (most recent call last)
<ipython-input-15-15c1685065f0> in <module>
      1 data=[]
      2 for response in response_dic:
----> 3     data.append({"EntityCommonStockSharesOutstanding":response.get('EntityCommonStockSharesOutstanding')})
      4 base_df=pd.DataFrame(data)
      5 base_df.head()

AttributeError: 'str' object has no attribute 'get'
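A minimal sketch of where this error comes from, using a hypothetical stand-in dict with the same top-level keys: looping over a dict yields its keys, and those keys are strings, which have no get method. Iterating with .items() instead exposes the values, including the nested facts dict.

```python
# Hypothetical stand-in with the same top-level keys as response_dic
response_dic = {"cik": 1627475, "entityName": "Placeholder Inc.", "facts": {"dei": {}}}

# Iterating over a dict yields its keys; here every key is a str
keys = [key for key in response_dic]
print(keys)  # ['cik', 'entityName', 'facts']

# A str has no .get() method, which reproduces the AttributeError above
try:
    keys[0].get("EntityCommonStockSharesOutstanding")
    error_message = ""
except AttributeError as exc:
    error_message = str(exc)
print(error_message)  # 'str' object has no attribute 'get'

# .items() yields (key, value) pairs, so the nested dict is reachable
value_types = {key: type(value).__name__ for key, value in response_dic.items()}
print(value_types)  # {'cik': 'int', 'entityName': 'str', 'facts': 'dict'}
```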

Comments:

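Have you looked at the structure of response_dic? It is a nested dictionary. Your loop, for response in response_dic:, just iterates over its keys, which are the strings cik, entityName and facts (not sure why you are doing that). To get the "label" inside dei, simply use response_dic['facts']['dei']['EntityCommonStockSharesOutstanding']['label'], which gives "Entity Common Stock, Shares Outstanding".

Answer 1: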

Use pd.json_normalize:

For example:

entity1 = response_dic['facts']['dei']['EntityCommonStockSharesOutstanding']
entity2 = response_dic['facts']['dei']['EntityPublicFloat']

df1 = pd.json_normalize(entity1, record_path=['units', 'shares'],
                        meta=['label', 'description'])

df2 = pd.json_normalize(entity2, record_path=['units', 'USD'],
                        meta=['label', 'description'])
>>> df1
           end        val                  accn  ...      frame                                    label                                        description
0   2018-10-31  106299106  0001564590-18-028629  ...  CY2018Q3I  Entity Common Stock, Shares Outstanding  Indicate number of shares or other units outst...
1   2019-02-28  106692030  0001627475-19-000007  ...        NaN  Entity Common Stock, Shares Outstanding  Indicate number of shares or other units outst...
2   2019-04-30  107160359  0001627475-19-000015  ...  CY2019Q1I  Entity Common Stock, Shares Outstanding  Indicate number of shares or other units outst...
3   2019-07-31  110803709  0001627475-19-000025  ...  CY2019Q2I  Entity Common Stock, Shares Outstanding  Indicate number of shares or other units outst...
4   2019-10-31  112020807  0001628280-19-013517  ...  CY2019Q3I  Entity Common Stock, Shares Outstanding  Indicate number of shares or other units outst...
5   2020-02-28  113931825  0001627475-20-000006  ...        NaN  Entity Common Stock, Shares Outstanding  Indicate number of shares or other units outst...
6   2020-04-30  115142604  0001627475-20-000018  ...  CY2020Q1I  Entity Common Stock, Shares Outstanding  Indicate number of shares or other units outst...
7   2020-07-31  120276173  0001627475-20-000031  ...  CY2020Q2I  Entity Common Stock, Shares Outstanding  Indicate number of shares or other units outst...
8   2020-10-31  122073553  0001627475-20-000044  ...  CY2020Q3I  Entity Common Stock, Shares Outstanding  Indicate number of shares or other units outst...
9   2021-01-31  124962279  0001627475-21-000015  ...  CY2020Q4I  Entity Common Stock, Shares Outstanding  Indicate number of shares or other units outst...
10  2021-04-30  126144849  0001627475-21-000022  ...  CY2021Q1I  Entity Common Stock, Shares Outstanding  Indicate number of shares or other units outst...

[11 rows x 10 columns]


>>> df2
          end         val                  accn    fy  fp  form       filed      frame                label                                        description
0  2018-10-03   900000000  0001627475-19-000007  2018  FY  10-K  2019-03-07  CY2018Q3I  Entity Public Float  The aggregate market value of the voting and n...
1  2019-06-28  1174421292  0001627475-20-000006  2019  FY  10-K  2020-03-02  CY2019Q2I  Entity Public Float  The aggregate market value of the voting and n...
2  2020-06-30  1532720862  0001627475-21-000015  2020  FY  10-K  2021-02-24  CY2020Q2I  Entity Public Float  The aggregate market value of the voting and n...
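The answer hard-codes the unit key ('shares' for one concept, 'USD' for the other). A minimal sketch of generalizing that, assuming every concept keeps its data points in lists under units as in the two examples above: loop over whatever unit keys a concept has, run the same json_normalize call per unit, and stack the results. The helper name concept_to_frame and the sample dict are hypothetical, with placeholder values.

```python
import pandas as pd

def concept_to_frame(concept: dict) -> pd.DataFrame:
    """Flatten one concept, whatever its unit keys are (e.g. 'shares', 'USD')."""
    frames = []
    for unit in concept.get("units", {}):
        df = pd.json_normalize(concept, record_path=["units", unit],
                               meta=["label", "description"])
        df["unit"] = unit  # remember which unit list each row came from
        frames.append(df)
    return pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()

# Hypothetical trimmed stand-in for response_dic['facts']['dei']
dei = {
    "EntityCommonStockSharesOutstanding": {
        "label": "Entity Common Stock, Shares Outstanding",
        "description": "Placeholder description",
        "units": {"shares": [
            {"end": "2021-01-31", "val": 100, "fy": 2020},
            {"end": "2021-04-30", "val": 110, "fy": 2021},
        ]},
    },
    "EntityPublicFloat": {
        "label": "Entity Public Float",
        "description": "Placeholder description",
        "units": {"USD": [{"end": "2020-06-30", "val": 5000, "fy": 2020}]},
    },
}

# One frame per concept, tagged with the concept name, then stacked
combined = pd.concat(
    [concept_to_frame(c).assign(concept=name) for name, c in dei.items()],
    ignore_index=True,
)
print(combined[["concept", "unit", "end", "val", "fy"]])
```

Keeping the unit as its own column means share counts and dollar amounts stay distinguishable after several concepts are stacked into one DataFrame.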
