How to convert nested dictionaries to a pandas DataFrame?

Posted: 2019-12-09 11:10:23

【Question】:

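I want to convert the result of an API call into a DataFrame. The call returns a nested dictionary, and the DataFrame I get from it is not what I need.

Besides json_normalize, I also tried pd.DataFrame.from_dict, so far without success. I also tried flattening the dictionary, but that did not work either.

I used the following call:

import requests

# url is the API endpoint (not shown in the question)
results = requests.get(url).json()
results

The output is: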

{'result': {'totalrows': 3124,
  'rows': [{'rownum': 1,
    'values': [{'field': 'querydate', 'value': '7/31/2019 3:19 PM'},
     {'field': 'issueid', 'value': 472683},
     {'field': 'ticker', 'value': 'AAPL'},
     {'field': 'companyname', 'value': 'APPLE INC'},
     {'field': 'issuetitle', 'value': 'COM'},
     {'field': 'filerid', 'value': 1089387}]},
   {'rownum': 2,
    'values': [{'field': 'querydate', 'value': '7/31/2019 3:19 PM'},
     {'field': 'issueid', 'value': 472683},
     {'field': 'ticker', 'value': 'AAPL'},
     {'field': 'companyname', 'value': 'APPLE INC'},
     {'field': 'issuetitle', 'value': 'COM'},
     {'field': 'filerid', 'value': 1086893}]},
   {'rownum': 3,
    'values': [{'field': 'querydate', 'value': '7/31/2019 3:19 PM'},
     {'field': 'issueid', 'value': 472683},
     {'field': 'ticker', 'value': 'AAPL'},
     {'field': 'companyname', 'value': 'APPLE INC'},
     {'field': 'issuetitle', 'value': 'COM'},
     {'field': 'filerid', 'value': 1085803}]}]}}

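Then, to build the DataFrame, I used the following code:

# pandas >= 1.0; on older versions use: from pandas.io.json import json_normalize
from pandas import json_normalize

Owners = results['result']['rows']
df1 = json_normalize(Owners)
df1.head()

This is the output: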

   rownum  values
0       1  [{'field': 'querydate', 'value': '7/31/2019 3:19 PM'},
            {'field': 'issueid', 'value': 472683},
            {'field': 'ticker', 'value': 'AAPL'},
            {'field': 'companyname', 'value': 'APPLE INC'},
            {'field': 'issuetitle', 'value': 'COM'},
            {'field': 'filerid', 'value': 1089387}]

1       2  [{'field': 'querydate', 'value': '7/31/2019 3:19 PM'},
            {'field': 'issueid', 'value': 472683},
            {'field': 'ticker', 'value': 'AAPL'},
            {'field': 'companyname', 'value': 'APPLE INC'},
            {'field': 'issuetitle', 'value': 'COM'},
            {'field': 'filerid', 'value': 1086893}]

2       3  [{'field': 'querydate', 'value': '7/31/2019 3:19 PM'},
            {'field': 'issueid', 'value': 472683},
            {'field': 'ticker', 'value': 'AAPL'},
            {'field': 'companyname', 'value': 'APPLE INC'},
            {'field': 'issuetitle', 'value': 'COM'},
            {'field': 'filerid', 'value': 1085803}]

However, I would like to get a DataFrame in the following format:


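【Answer 1】:

You can use pandas.DataFrame.from_dict, but you first need to strip everything you don't need out of the data. In practice, you only want to keep the field name and the value from each row. You can do this with a list comprehension: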

# `results` is the parsed API response from the question.
# Build one flat dict per row, mapping each field name to its value.
data = [{field["field"]: field["value"] for field in row["values"]}
        for row in results["result"]["rows"]]
print(data)
# [{'querydate': '7/31/2019 3:19 PM',
#   'issueid': 472683,
#   'ticker': 'AAPL',
#   'companyname': 'APPLE INC',
#   'issuetitle': 'COM',
#   'filerid': 1089387},
#
#  {'querydate': '7/31/2019 3:19 PM',
#   'issueid': 472683,
#   'ticker': 'AAPL',
#   'companyname': 'APPLE INC',
#   'issuetitle': 'COM',
#   'filerid': 1086893},
#
#  {'querydate': '7/31/2019 3:19 PM',
#   'issueid': 472683,
#   'ticker': 'AAPL',
#   'companyname': 'APPLE INC',
#   'issuetitle': 'COM',
#   'filerid': 1085803}
# ]

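Once you have this list of dictionaries, you can call the from_dict method (the alphabetical column order in the output reflects an older pandas version; recent versions keep the keys' insertion order):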

df = pd.DataFrame.from_dict(data)
print(df)
#   companyname  filerid  issueid issuetitle          querydate ticker
# 0   APPLE INC  1089387   472683        COM  7/31/2019 3:19 PM   AAPL
# 1   APPLE INC  1086893   472683        COM  7/31/2019 3:19 PM   AAPL
# 2   APPLE INC  1085803   472683        COM  7/31/2019 3:19 PM   AAPL

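If you want rownum as a column (or as the index):

# Merge each row's field/value pairs with its rownum into a single dict.
data = [{**{field["field"]: field["value"] for field in row["values"]},
         **{"rownum": row["rownum"]}}
        for row in results["result"]["rows"]]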

df = pd.DataFrame.from_dict(data)
print(df)
#   companyname  filerid  issueid issuetitle          querydate  rownum ticker
# 0   APPLE INC  1089387   472683        COM  7/31/2019 3:19 PM       1   AAPL
# 1   APPLE INC  1086893   472683        COM  7/31/2019 3:19 PM       2   AAPL
# 2   APPLE INC  1085803   472683        COM  7/31/2019 3:19 PM       3   AAPL
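
The snippet above keeps rownum as an ordinary column. For the index variant, a minimal sketch might look like the following. The two-row `results` sample is made up to mirror the shape of the API response, and `df_indexed` is just an illustrative name; `DataFrame.set_index` is standard pandas.

```python
import pandas as pd

# Hypothetical two-row sample shaped like the API response in the question.
results = {
    "result": {
        "rows": [
            {"rownum": 1, "values": [{"field": "ticker", "value": "AAPL"},
                                     {"field": "filerid", "value": 1089387}]},
            {"rownum": 2, "values": [{"field": "ticker", "value": "AAPL"},
                                     {"field": "filerid", "value": 1086893}]},
        ]
    }
}

# One flat dict per row: the row number plus its field/value pairs.
data = [
    {"rownum": row["rownum"], **{f["field"]: f["value"] for f in row["values"]}}
    for row in results["result"]["rows"]
]

# Build the frame, then promote rownum from a column to the index.
df_indexed = pd.DataFrame(data).set_index("rownum")
print(df_indexed)
```

With rownum as the index, a single row can then be looked up by its number, for example `df_indexed.loc[2]`.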

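【Comments】:

Almost there, only rownum is missing.

@Akaisteph7 To do that inside the list comprehension, I ended up merging two dicts. Maybe someone has a better idea?

Thank you very much! @AlexandreB. It worked! I don't actually need rownum, but I didn't mention that in my question.

【Answer 2】: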

A naive attempt with nested for loops...

import pandas as pd

# `json` is the dict shown under "JSON object" below; define it before running this.
# (The name shadows the standard-library json module, which is not used here.)
rows = []

for row in json["result"]["rows"]:
    rownum = row["rownum"]
    querydate = issueid = ticker = companyname = issuetitle = filerid = None
    for value_dict in row["values"]:
        if value_dict["field"] == "querydate":
            querydate = value_dict["value"]
        elif value_dict["field"] == "issueid":
            issueid = value_dict["value"]
        elif value_dict["field"] == "ticker":
            ticker = value_dict["value"]
        elif value_dict["field"] == "companyname":
            companyname = value_dict["value"]
        elif value_dict["field"] == "issuetitle":
            issuetitle = value_dict["value"]
        elif value_dict["field"] == "filerid":
            filerid = value_dict["value"]
    # DataFrame.append was removed in pandas 2.0, so collect plain dicts
    # and build the DataFrame once after the loop.
    rows.append({"rownum": rownum,
                 "querydate": querydate,
                 "issueid": issueid,
                 "ticker": ticker,
                 "companyname": companyname,
                 "issuetitle": issuetitle,
                 "filerid": filerid})

df = pd.DataFrame(rows)
print(df)

JSON object:

json = {
    "result": {
        "totalrows": 3,
        "rows": [
            {
                "rownum": 1,
                "values": [
                    {
                        "field": "querydate",
                        "value": "7/31/2019 3:19 PM"
                    },
                    {
                        "field": "issueid",
                        "value": 472683
                    },
                    {
                        "field": "ticker",
                        "value": "AAPL"
                    },
                    {
                        "field": "companyname",
                        "value": "APPLE INC"
                    },
                    {
                        "field": "issuetitle",
                        "value": "COM"
                    },
                    {
                        "field": "filerid",
                        "value": 1089387
                    }
                ]
            },
            {
                "rownum": 2,
                "values": [
                    {
                        "field": "querydate",
                        "value": "7/31/2019 3:19 PM"
                    },
                    {
                        "field": "issueid",
                        "value": 472683
                    },
                    {
                        "field": "ticker",
                        "value": "AAPL"
                    },
                    {
                        "field": "companyname",
                        "value": "APPLE INC"
                    },
                    {
                        "field": "issuetitle",
                        "value": "COM"
                    },
                    {
                        "field": "filerid",
                        "value": 1086893
                    }
                ]
            },
            {
                "rownum": 3,
                "values": [
                    {
                        "field": "querydate",
                        "value": "7/31/2019 3:19 PM"
                    },
                    {
                        "field": "issueid",
                        "value": 472683
                    },
                    {
                        "field": "ticker",
                        "value": "AAPL"
                    },
                    {
                        "field": "companyname",
                        "value": "APPLE INC"
                    },
                    {
                        "field": "issuetitle",
                        "value": "COM"
                    },
                    {
                        "field": "filerid",
                        "value": 1085803
                    }
                ]
            }
        ]
    }
}

Output:

   rownum          querydate  issueid ticker companyname issuetitle  filerid
0       1  7/31/2019 3:19 PM   472683   AAPL   APPLE INC        COM  1089387
1       2  7/31/2019 3:19 PM   472683   AAPL   APPLE INC        COM  1086893
2       3  7/31/2019 3:19 PM   472683   AAPL   APPLE INC        COM  1085803
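
The loop in this answer names every field explicitly, so a field it does not list would be dropped without warning. One possible variant copies whatever field names appear in each row. This is only a sketch: the helper name `rows_to_frame` and the small `response` sample are illustrative assumptions, not part of the answer.

```python
import pandas as pd


def rows_to_frame(response):
    """Build a DataFrame from a response shaped like the JSON object above."""
    records = []
    for row in response["result"]["rows"]:
        record = {"rownum": row["rownum"]}
        for value_dict in row["values"]:
            # Use the field name as the column name, whatever it happens to be.
            record[value_dict["field"]] = value_dict["value"]
        records.append(record)
    # Build the frame once from the collected dicts instead of growing it per row.
    return pd.DataFrame(records)


# Hypothetical two-row sample with the same structure as the API result.
response = {
    "result": {
        "totalrows": 2,
        "rows": [
            {"rownum": 1, "values": [{"field": "ticker", "value": "AAPL"},
                                     {"field": "filerid", "value": 1089387}]},
            {"rownum": 2, "values": [{"field": "ticker", "value": "AAPL"},
                                     {"field": "filerid", "value": 1086893}]},
        ],
    }
}

df = rows_to_frame(response)
print(df)
```

If a row lacks a field that other rows have, pandas fills that cell with NaN rather than raising.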

【Comments】:

Thanks for your contribution! Thank you @shash678
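
Since the question started from json_normalize, here is a rough sketch of how that route might be made to work; it is not taken from either answer. `record_path` expands each row's values list into one line per field, `meta` carries rownum along, and `pivot` then turns the field names into columns. It assumes pandas 1.0 or later (where `pd.json_normalize` is a top-level function) and that no field repeats within a row. The sample data and the names `long_df` and `wide_df` are made up for illustration.

```python
import pandas as pd

# Hypothetical two-row sample shaped like the API response in the question.
results = {
    "result": {
        "rows": [
            {"rownum": 1, "values": [{"field": "ticker", "value": "AAPL"},
                                     {"field": "filerid", "value": 1089387}]},
            {"rownum": 2, "values": [{"field": "ticker", "value": "AAPL"},
                                     {"field": "filerid", "value": 1086893}]},
        ]
    }
}

# One line per (rownum, field) pair; columns are field, value, rownum.
long_df = pd.json_normalize(
    results["result"]["rows"], record_path="values", meta=["rownum"]
)

# Reshape so each field becomes a column, indexed by rownum.
# pivot raises a ValueError if a field occurs twice within the same row.
wide_df = long_df.pivot(index="rownum", columns="field", values="value")
print(wide_df)
```

Because a single value column holds both strings and numbers, the pivoted columns come out with object dtype; converting them afterwards (for example with `wide_df.infer_objects()`) may be worthwhile.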
