How to flatten a nested JSON recursively, with flatten_json

Posted: 2020-02-14 23:18:15

Question:

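This question is specific to using flatten_json from the GitHub repo: flatten.

The package is on PyPI as flatten-json 0.1.7 and can be installed with pip install flatten-json. This question is specific to the following component of the package: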
def flatten_json(nested_json: dict, exclude: list=[''], sep: str='_') -> dict:
    """
    Flatten a list of nested dicts.
    """
    out = dict()
    def flatten(x: (list, dict, str), name: str='', exclude=exclude):
        if type(x) is dict:
            for a in x:
                if a not in exclude:
                    # Extend the key path with this dict key.
                    flatten(x[a], f'{name}{a}{sep}')
        elif type(x) is list:
            i = 0
            for a in x:
                # Extend the key path with the list index.
                flatten(a, f'{name}{i}{sep}')
                i += 1
        else:
            # Leaf value: drop the trailing separator (assumes a one-character sep).
            out[name[:-1]] = x

    flatten(nested_json)
    return out
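
To make the key naming concrete, here is a minimal sketch of the function applied to a small made-up dict. The sample record is hypothetical, and the function is restated (with isinstance and enumerate instead of the original type checks and counter) only so the snippet runs on its own. List items contribute their index to the key, each part is joined with sep, and empty lists or dicts produce no keys at all.

```python
def flatten_json(nested_json, exclude=('',), sep='_'):
    """Flatten nested dicts and lists into a single-level dict."""
    out = {}

    def flatten(x, name=''):
        if isinstance(x, dict):
            for key in x:
                if key not in exclude:
                    flatten(x[key], f'{name}{key}{sep}')
        elif isinstance(x, list):
            for i, item in enumerate(x):
                flatten(item, f'{name}{i}{sep}')
        else:
            # Leaf value: drop the trailing separator (assumes len(sep) == 1).
            out[name[:-1]] = x

    flatten(nested_json)
    return out


# Hypothetical sample record, not taken from the question.
sample = {
    'id': 1,
    'tags': ['a', 'b'],
    'meta': {'owner': {'name': 'me'}, 'empty': []},
}

flat = flatten_json(sample)
print(flat)
# {'id': 1, 'tags_0': 'a', 'tags_1': 'b', 'meta_owner_name': 'me'}

# Keys listed in exclude are skipped together with everything below them.
flat_without_meta = flatten_json(sample, exclude=['meta'])
print(flat_without_meta)
# {'id': 1, 'tags_0': 'a', 'tags_1': 'b'}
```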

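Using recursion to flatten nested dicts

Thinking Recursively in Python
Flattening JSON objects in Python

How deeply can data be nested?:

flatten_json has been used to unpack a file that ended up with more than 100,000 columns.

Can a flattened JSON be unflattened?:

Yes, but this question does not cover it. If you install the flatten package, it provides an unflatten method, although I have not tested it.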

Answer 1:

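How to flatten a JSON or dict is a common question with many answers.

This answer focuses on using flatten_json to recursively flatten a nested dict or JSON.

Assumptions:

This answer assumes you have already loaded the JSON or dict into a variable (for example from a file or an API). In this case we will use data.

How data is passed to flatten_json:

It accepts a dict, as shown by the function's type hint.

The most common forms of data: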

Just a dict: {}
    flatten_json(data)
List of dicts: [{}, {}, {}]
    [flatten_json(x) for x in data]
JSON with top-level keys, where the values repeat: {1: {}, 2: {}, 3: {}}
    [flatten_json(data[key]) for key in data.keys()]
Other: {'key': [{}, {}, {}]}
    [flatten_json(x) for x in data['key']]
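
The four shapes above can be sketched with one small record reused in each container. The record and the variable names (as_dict, as_list, as_keyed, as_wrapped) are hypothetical, and the function is restated in compact form only so the snippet runs on its own. Every shape ends up as a list of flat dicts, which is the form a DataFrame or CSV writer expects.

```python
def flatten_json(nested_json, exclude=('',), sep='_'):
    """Flatten nested dicts and lists into a single-level dict."""
    out = {}

    def flatten(x, name=''):
        if isinstance(x, dict):
            for key in x:
                if key not in exclude:
                    flatten(x[key], f'{name}{key}{sep}')
        elif isinstance(x, list):
            for i, item in enumerate(x):
                flatten(item, f'{name}{i}{sep}')
        else:
            # Leaf value: drop the trailing separator (assumes len(sep) == 1).
            out[name[:-1]] = x

    flatten(nested_json)
    return out


# Hypothetical record reused in all four shapes.
record = {'a': 1, 'b': {'c': 2}}

as_dict = record                          # just a dict
as_list = [record, record]                # list of dicts
as_keyed = {'1': record, '2': record}     # top-level keys with repeating values
as_wrapped = {'key': [record, record]}    # list of dicts under a single key

rows_dict = [flatten_json(as_dict)]
rows_list = [flatten_json(x) for x in as_list]
rows_keyed = [flatten_json(as_keyed[key]) for key in as_keyed.keys()]
rows_wrapped = [flatten_json(x) for x in as_wrapped['key']]

print(rows_keyed)
# [{'a': 1, 'b_c': 2}, {'a': 1, 'b_c': 2}]
```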

Examples:

I usually flatten data into a pandas.DataFrame for further analysis. Load pandas with import pandas as pd. flatten_json returns a dict, which can be saved directly with the csv package.
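
As a rough sketch of the csv route, the flat dicts can be written with csv.DictWriter from the standard library. The records here are hypothetical, an in-memory buffer stands in for a real file, and the function is restated in compact form only so the snippet runs on its own. Because records may flatten to different key sets, the header is built as the union of all keys and missing values are left empty.

```python
import csv
import io


def flatten_json(nested_json, exclude=('',), sep='_'):
    """Flatten nested dicts and lists into a single-level dict."""
    out = {}

    def flatten(x, name=''):
        if isinstance(x, dict):
            for key in x:
                if key not in exclude:
                    flatten(x[key], f'{name}{key}{sep}')
        elif isinstance(x, list):
            for i, item in enumerate(x):
                flatten(item, f'{name}{i}{sep}')
        else:
            # Leaf value: drop the trailing separator (assumes len(sep) == 1).
            out[name[:-1]] = x

    flatten(nested_json)
    return out


# Hypothetical records; the second one lacks pos.y on purpose.
records = [
    {'id': 1, 'pos': {'x': 1, 'y': 2}},
    {'id': 2, 'pos': {'x': 3}},
]
rows = [flatten_json(r) for r in records]

# Union of all keys, kept in first-seen order, used as the CSV header.
fieldnames = list(dict.fromkeys(key for row in rows for key in row))

# StringIO stands in for open('out.csv', 'w', newline='').
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval='')
writer.writeheader()
writer.writerows(rows)

csv_text = buffer.getvalue()
print(csv_text)
```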

Data 1:

{
    "id": 1,
    "class": "c1",
    "owner": "myself",
    "metadata": {
        "m1": {
            "value": "m1_1",
            "timestamp": "d1"
        },
        "m2": {
            "value": "m1_2",
            "timestamp": "d2"
        },
        "m3": {
            "value": "m1_3",
            "timestamp": "d3"
        },
        "m4": {
            "value": "m1_4",
            "timestamp": "d4"
        }
    },
    "a1": {
        "a11": [

        ]
    },
    "m1": {},
    "comm1": "COMM1",
    "comm2": "COMM21529089656387",
    "share": "xxx",
    "share1": "yyy",
    "hub1": "h1",
    "hub2": "h2",
    "context": [

    ]
}

Flattened 1:

df = pd.DataFrame([flatten_json(data)])

 id class   owner metadata_m1_value metadata_m1_timestamp metadata_m2_value metadata_m2_timestamp metadata_m3_value metadata_m3_timestamp metadata_m4_value metadata_m4_timestamp  comm1               comm2 share share1 hub1 hub2
  1    c1  myself              m1_1                    d1              m1_2                    d2              m1_3                    d3              m1_4                    d4  COMM1  COMM21529089656387   xxx    yyy   h1   h2

Data 2:

[{
        'accuracy': 17,
        'activity': [{
                'activity': [{
                        'confidence': 100,
                        'type': 'STILL'
                    }
                ],
                'timestampMs': '1542652'
            }
        ],
        'altitude': -10,
        'latitudeE7': 3777321,
        'longitudeE7': -122423125,
        'timestampMs': '1542654',
        'verticalAccuracy': 2
    }, {
        'accuracy': 17,
        'activity': [{
                'activity': [{
                        'confidence': 100,
                        'type': 'STILL'
                    }
                ],
                'timestampMs': '1542652'
            }
        ],
        'altitude': -10,
        'latitudeE7': 3777321,
        'longitudeE7': -122423125,
        'timestampMs': '1542654',
        'verticalAccuracy': 2
    }, {
        'accuracy': 17,
        'activity': [{
                'activity': [{
                        'confidence': 100,
                        'type': 'STILL'
                    }
                ],
                'timestampMs': '1542652'
            }
        ],
        'altitude': -10,
        'latitudeE7': 3777321,
        'longitudeE7': -122423125,
        'timestampMs': '1542654',
        'verticalAccuracy': 2
    }
]

Flattened 2:

df = pd.DataFrame([flatten_json(x) for x in data])

 accuracy  activity_0_activity_0_confidence activity_0_activity_0_type activity_0_timestampMs  altitude  latitudeE7  longitudeE7 timestampMs  verticalAccuracy
       17                               100                      STILL                1542652       -10     3777321   -122423125     1542654                 2
       17                               100                      STILL                1542652       -10     3777321   -122423125     1542654                 2
       17                               100                      STILL                1542652       -10     3777321   -122423125     1542654                 2

Data 3:

{
    "1": {
        "VENUE": "JOEBURG",
        "COUNTRY": "HAE",
        "ITW": "XAD",
        "RACES": {
            "1": {
                "NO": 1,
                "TIME": "12:35"
            },
            "2": {
                "NO": 2,
                "TIME": "13:10"
            },
            "3": {
                "NO": 3,
                "TIME": "13:40"
            },
            "4": {
                "NO": 4,
                "TIME": "14:10"
            },
            "5": {
                "NO": 5,
                "TIME": "14:55"
            },
            "6": {
                "NO": 6,
                "TIME": "15:30"
            },
            "7": {
                "NO": 7,
                "TIME": "16:05"
            },
            "8": {
                "NO": 8,
                "TIME": "16:40"
            }
        }
    },
    "2": {
        "VENUE": "FOOBURG",
        "COUNTRY": "ABA",
        "ITW": "XAD",
        "RACES": {
            "1": {
                "NO": 1,
                "TIME": "12:35"
            },
            "2": {
                "NO": 2,
                "TIME": "13:10"
            },
            "3": {
                "NO": 3,
                "TIME": "13:40"
            },
            "4": {
                "NO": 4,
                "TIME": "14:10"
            },
            "5": {
                "NO": 5,
                "TIME": "14:55"
            },
            "6": {
                "NO": 6,
                "TIME": "15:30"
            },
            "7": {
                "NO": 7,
                "TIME": "16:05"
            },
            "8": {
                "NO": 8,
                "TIME": "16:40"
            }
        }
    }
}

Flattened 3:

df = pd.DataFrame([flatten_json(data[key]) for key in data.keys()])

   VENUE COUNTRY  ITW  RACES_1_NO RACES_1_TIME  RACES_2_NO RACES_2_TIME  RACES_3_NO RACES_3_TIME  RACES_4_NO RACES_4_TIME  RACES_5_NO RACES_5_TIME  RACES_6_NO RACES_6_TIME  RACES_7_NO RACES_7_TIME  RACES_8_NO RACES_8_TIME
 JOEBURG     HAE  XAD           1        12:35           2        13:10           3        13:40           4        14:10           5        14:55           6        15:30           7        16:05           8        16:40
 FOOBURG     ABA  XAD           1        12:35           2        13:10           3        13:40           4        14:10           5        14:55           6        15:30           7        16:05           8        16:40

Other examples:

    Python Pandas - Flatten Nested JSON
    handling nested json in pandas
    How to flatten a nested JSON from the NASA Weather Insight API in Python
