Create nested json object out of a list of dicts


Posted: 2019-04-02 18:32:31

Question:

I want to turn a list of dicts into a nested .json object. One key in each dict indicates whether that field should be nested in the .json file and, if so, where.

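I can nest the entries under the appropriate tables, but getting them nested further down, inside other fields, is throwing me for a loop.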

My data is in the following format:

table_list = [
    {"Table": "table1", "Field": "field1", "Description": "description1", "Type": "STR"},
    {"Table": "table1", "Field": "field2", "Description": "description2", "Type": "STR"},
    {"Table": "table1", "Field": "field3", "Description": "description3", "Type": "STR"},
    {"Table": "table1", "Field": "field4", "Description": "description4", "Type": "STR"},
    {"Table": "table1", "Field": "field5", "Description": "description5", "Type": "RECORD"},
    {"Table": "table1", "Field": "field5.nest1", "Description": "description6", "Type": "STR"},
    {"Table": "table1", "Field": "field5.nest2", "Description": "description7", "Type": "STR"},
    {"Table": "table1", "Field": "field5.nest3", "Description": "description8", "Type": "STR"},
    {"Table": "table1", "Field": "field5.nest4", "Description": "description9", "Type": "RECORD"},
    {"Table": "table1", "Field": "field5.nest4.nest1", "Description": "description10", "Type": "STR"},
    {"Table": "table1", "Field": "field5.nest4.nest2", "Description": "description11", "Type": "STR"},
    {"Table": "table2", "Field": "field1", "Description": "description1", "Type": "STR"}
]
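
The dotted Field value already encodes where each row belongs: every segment before the last dot names a parent RECORD field, and the last segment is the name that should appear in the output. A minimal sketch of that split, using a hypothetical helper split_field that is not part of the question's code:

```python
def split_field(dotted):
    """Hypothetical helper: return (parent path, leaf name) for a dotted Field value."""
    parts = dotted.split(".")
    return parts[:-1], parts[-1]


for name in ["field1", "field5.nest1", "field5.nest4.nest2"]:
    parents, leaf = split_field(name)
    # len(parents) is how many RECORD levels the row sits below its table
    print(name, "->", parents, leaf, "depth", len(parents))
```

A row with an empty parent path goes straight into the table's list; otherwise each parent name has to be looked up one level at a time.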

I would like the output in this format (apologies for any typos):


{
    "table1": [
        {
            "Field": "field1",
            "Description": "description1",
            "Mode": "NULLABLE",
            "Type": "STR"
        },
        {
            "Field": "field2",
            "Description": "description2",
            "Mode": "NULLABLE",
            "Type": "STR"
        },
        {
            "Field": "field3",
            "Description": "description3",
            "Mode": "NULLABLE",
            "Type": "STR"
        },
        {
            "Field": "field4",
            "Description": "description4",
            "Mode": "NULLABLE",
            "Type": "STR"
        },
        {
            "Field": "field5",
            "Description": "description5",
            "Mode": "REPEATED",
            "Type": "RECORD",
            "Fields": [
                {
                    "Field": "nest1",
                    "Description": "description6",
                    "Mode": "NULLABLE",
                    "Type": "STR"
                },
                {
                    "Field": "nest2",
                    "Description": "description7",
                    "Mode": "NULLABLE",
                    "Type": "STR"
                },
                {
                    "Field": "nest3",
                    "Description": "description8",
                    "Mode": "NULLABLE",
                    "Type": "STR"
                },
                {
                    "Field": "nest4",
                    "Description": "description9",
                    "Mode": "REPEATED",
                    "Type": "RECORD",
                    "Fields": [
                        {
                            "Field": "nest1",
                            "Description": "description10",
                            "Mode": "NULLABLE",
                            "Type": "STR"
                        },
                        {
                            "Field": "nest2",
                            "Description": "description11",
                            "Mode": "NULLABLE",
                            "Type": "STR"
                        }
                    ]
                }
            ]
        }
    ],
    "table2": [
        {
            "Field": "field1",
            "Description": "description1",
            "Mode": "NULLABLE",
            "Type": "STR"
        }
    ]
}

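I can't get nest1 and nest2 to create a new key inside the existing dict that holds an open list, which can then be appended to at varying depths. The nesting in this example only goes 3 levels deep, but I may need to go as deep as 15.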

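My code handles the first level using "Table", but drilling down into the fields to append to the right list is the hard part, and I haven't found a question with exactly the same problem.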

I've seen many people doing the reverse, flattening a nested structure, but I'm trying to create the nesting.

import json


def create_schema(file_to_read):
    all_tables = {}
    for row in file_to_read:
        if row['Table'] in all_tables.keys():
            all_tables[row['Table']].append({"Mode": "NULLABLE",
                                             "Field": row['Field'],
                                             "Type": row['Type'],
                                             "Description": row['Description']})
        else:
            all_tables[row['Table']] = []
            all_tables[row['Table']].append({"Mode": "NULLABLE",
                                             "Field": row['Field'],
                                             "Type": row['Type'],
                                             "Description": row['Description']})
    return json.dumps(all_tables, indent=4, sort_keys=True)
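
As an aside, the two branches in create_schema differ only in whether the table's list exists yet, and dict.setdefault covers both cases. A sketch with the same (still flat, not yet nested) behavior; the two input rows are made up for illustration:

```python
import json


def create_schema(rows):
    """Group rows by table. Like the question's version, this does not nest anything yet."""
    all_tables = {}
    for row in rows:
        # setdefault returns the table's existing list, or inserts and returns a new empty one
        all_tables.setdefault(row["Table"], []).append({
            "Mode": "NULLABLE",
            "Field": row["Field"],
            "Type": row["Type"],
            "Description": row["Description"],
        })
    return json.dumps(all_tables, indent=4, sort_keys=True)


rows = [
    {"Table": "table1", "Field": "field1", "Description": "description1", "Type": "STR"},
    {"Table": "table2", "Field": "field1", "Description": "description1", "Type": "STR"},
]
print(create_schema(rows))
```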

What I actually get from this function is:


{
    "table1": [
        {
            "Field": "field1",
            "Description": "description1",
            "Mode": "NULLABLE",
            "Type": "STR"
        },
        {
            "Field": "field2",
            "Description": "description2",
            "Mode": "NULLABLE",
            "Type": "STR"
        },
        {
            "Field": "field3",
            "Description": "description3",
            "Mode": "NULLABLE",
            "Type": "STR"
        },
        {
            "Field": "field4",
            "Description": "description4",
            "Mode": "NULLABLE",
            "Type": "STR"
        },
        {
            "Field": "field5",
            "Description": "description5",
            "Mode": "NULLABLE",
            "Type": "RECORD"
        },
        {
            "Field": "nest1",
            "Description": "description6",
            "Mode": "NULLABLE",
            "Type": "STR"
        },
        {
            "Field": "nest2",
            "Description": "description7",
            "Mode": "NULLABLE",
            "Type": "STR"
        },
        {
            "Field": "nest3",
            "Description": "description8",
            "Mode": "NULLABLE",
            "Type": "STR"
        },
        {
            "Field": "nest4",
            "Description": "description9",
            "Mode": "NULLABLE",
            "Type": "RECORD"
        },
        {
            "Field": "nest1",
            "Description": "description10",
            "Mode": "NULLABLE",
            "Type": "STR"
        },
        {
            "Field": "nest2",
            "Description": "description11",
            "Mode": "NULLABLE",
            "Type": "STR"
        }
    ],
    "table2": [
        {
            "Field": "field1",
            "Description": "description1",
            "Mode": "NULLABLE",
            "Type": "STR"
        }
    ]
}

(For context, this is meant to end up as a BigQuery JSON schema.)
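
If the end goal is a schema file BigQuery will accept, note that BigQuery's JSON schema format uses lowercase keys (name, type, mode, description, fields) and the type name STRING rather than STR. A sketch that renames keys on an already-nested structure, assuming STR is the only type name that needs mapping; to_bigquery and TYPE_MAP are hypothetical names:

```python
import json

# Assumption: STR is the only type name from the question that BigQuery spells differently.
TYPE_MAP = {"STR": "STRING"}


def to_bigquery(fields):
    """Recursively rename a nested field list to BigQuery's lowercase schema keys."""
    out = []
    for f in fields:
        bq = {
            "name": f["Field"],
            "type": TYPE_MAP.get(f["Type"], f["Type"]),
            "mode": f["Mode"],
            "description": f["Description"],
        }
        # .get() avoids creating a key on defaultdict nodes; empty lists are skipped
        if f.get("Fields"):
            bq["fields"] = to_bigquery(f["Fields"])
        out.append(bq)
    return out


nested = [{"Field": "field5", "Description": "description5", "Mode": "REPEATED", "Type": "RECORD",
           "Fields": [{"Field": "nest1", "Description": "description6", "Mode": "NULLABLE", "Type": "STR"}]}]
print(json.dumps(to_bigquery(nested), indent=2))
```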


Answer 1:

This should do what you want:

import json
from collections import defaultdict

d = defaultdict(list)
for t in table_list:
    # top-level field list for this table, created on first access
    field_list = d[t['Table']]
    field = t['Field'].split('.')
    # walk down through each parent RECORD; its 'Fields' list is auto-created by defaultdict
    for f in field[:-1]:
        field_list = next(el['Fields'] for el in field_list if el['Field'] == f)
    new_d = {'Field': field[-1], 'Description': t['Description'], 'Mode': 'NULLABLE' if t['Type'] == 'STR' else 'REPEATED', 'Type': t['Type']}
    field_list.append(defaultdict(list, new_d))

print(json.dumps(d, indent=4))
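
The reason el['Fields'] works above without any existence check is that each appended node is itself a defaultdict(list, ...): it keeps the keys it was built from, and a missing key is created as an empty list the first time it is read. A small standalone illustration:

```python
from collections import defaultdict

# Built from an existing mapping: these keys are kept as-is
node = defaultdict(list, {"Field": "field5", "Type": "RECORD"})
print(node["Field"])

# A missing key is created as an empty list on first access, so we can append right away
node["Fields"].append({"Field": "nest1"})
print(dict(node))
```

Because defaultdict subclasses dict, json.dumps serializes these nodes like ordinary dicts.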

Or, if you'd rather not use defaultdict:

d = {}
for t in table_list:
    if t['Table'] not in d:
        d[t['Table']] = []
    field_list = d[t['Table']]
    field = t['Field'].split('.')
    for f in field[:-1]:
        # find the parent RECORD and make sure it has a 'Fields' list
        inner = next(el for el in field_list if el['Field'] == f)
        if 'Fields' not in inner:
            inner['Fields'] = []
        field_list = inner['Fields']
    new_d = {'Field': field[-1], 'Description': t['Description'], 'Mode': 'NULLABLE' if t['Type'] == 'STR' else 'REPEATED', 'Type': t['Type']}
    field_list.append(new_d)

Output:


{
    "table1": [
        {
            "Field": "field1",
            "Description": "description1",
            "Mode": "NULLABLE",
            "Type": "STR"
        },
        {
            "Field": "field2",
            "Description": "description2",
            "Mode": "NULLABLE",
            "Type": "STR"
        },
        {
            "Field": "field3",
            "Description": "description3",
            "Mode": "NULLABLE",
            "Type": "STR"
        },
        {
            "Field": "field4",
            "Description": "description4",
            "Mode": "NULLABLE",
            "Type": "STR"
        },
        {
            "Field": "field5",
            "Description": "description5",
            "Mode": "REPEATED",
            "Type": "RECORD",
            "Fields": [
                {
                    "Field": "nest1",
                    "Description": "description6",
                    "Mode": "NULLABLE",
                    "Type": "STR"
                },
                {
                    "Field": "nest2",
                    "Description": "description7",
                    "Mode": "NULLABLE",
                    "Type": "STR"
                },
                {
                    "Field": "nest3",
                    "Description": "description8",
                    "Mode": "NULLABLE",
                    "Type": "STR"
                },
                {
                    "Field": "nest4",
                    "Description": "description9",
                    "Mode": "REPEATED",
                    "Type": "RECORD",
                    "Fields": [
                        {
                            "Field": "nest1",
                            "Description": "description10",
                            "Mode": "NULLABLE",
                            "Type": "STR"
                        },
                        {
                            "Field": "nest2",
                            "Description": "description11",
                            "Mode": "NULLABLE",
                            "Type": "STR"
                        }
                    ]
                }
            ]
        }
    ],
    "table2": [
        {
            "Field": "field1",
            "Description": "description1",
            "Mode": "NULLABLE",
            "Type": "STR"
        }
    ]
}
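
Both versions of the answer find each parent with next(...), which raises StopIteration if a child row (say field5.nest1) appears in table_list before its parent field5. If the input order isn't guaranteed, one option is to process rows sorted by depth. A sketch under that assumption; build_schema is a hypothetical name, it keeps the answer's Mode rule, and it still assumes every parent row exists somewhere in the input:

```python
import json


def build_schema(rows):
    """Nest rows by their dotted Field path, even if children come before parents."""
    d = {}
    # sorted() is stable: parents (fewer dots) are handled before their children,
    # and rows at the same depth keep their original relative order
    for t in sorted(rows, key=lambda r: r["Field"].count(".")):
        field_list = d.setdefault(t["Table"], [])
        *parents, leaf = t["Field"].split(".")
        for p in parents:
            # still raises StopIteration if a parent row is missing entirely
            inner = next(el for el in field_list if el["Field"] == p)
            field_list = inner.setdefault("Fields", [])
        field_list.append({
            "Field": leaf,
            "Description": t["Description"],
            "Mode": "NULLABLE" if t["Type"] == "STR" else "REPEATED",
            "Type": t["Type"],
        })
    return d


rows = [
    {"Table": "table1", "Field": "field5.nest1", "Description": "description6", "Type": "STR"},
    {"Table": "table1", "Field": "field5", "Description": "description5", "Type": "RECORD"},
]
print(json.dumps(build_schema(rows), indent=4))
```

The loop over parents has no fixed depth, so the 15-level case from the question is handled the same way as the 3-level example; the extra sort is negligible for schema-sized inputs.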

Comments:

Awesome! I'd never used defaultdict before, but I'm already in love with it!
