How to flatten a nested JSON recursively, with flatten_json
[Title]: How to flatten a nested JSON recursively, with flatten_json [Posted]: 2020-02-14 23:18:15
[Question]: This question is specific to using flatten_json from GitHub Repo: flatten.
The package is on PyPI as flatten-json 0.1.7 and can be installed with pip install flatten-json.
This question is specific to the following component of the package:
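def flatten_json(nested_json: dict, exclude: list=[''], sep: str='_') -> dict:
    """
    Flatten a list of nested dicts.
    """
    out = dict()

    def flatten(x: (list, dict, str), name: str='', exclude=exclude):
        if type(x) is dict:
            # Recurse into each key that is not excluded, extending the key path.
            for a in x:
                if a not in exclude:
                    flatten(x[a], f'{name}{a}{sep}')
        elif type(x) is list:
            # List items are addressed by their index in the key path.
            i = 0
            for a in x:
                flatten(a, f'{name}{i}{sep}')
                i += 1
        else:
            # Leaf value: drop the trailing separator (assumes a one-character sep).
            out[name[:-1]] = x

    flatten(nested_json)
    return out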
It uses recursion to flatten nested dicts:
Thinking Recursively in Python
Flattening JSON objects in Python
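To make the key naming concrete, here is a minimal sketch of the same recursive idea. It is not the package's code. The changes are my own assumptions: `isinstance` checks, `exclude=None` as the default, and stripping the trailing separator by its actual length. The listing in the question slices with `name[:-1]`, which only removes one character, so a separator such as `'__'` would leave a stray `_` at the end of every key.

```python
def flatten_json(nested_json, exclude=None, sep='_'):
    """Flatten nested dicts/lists into one dict whose keys are joined paths."""
    exclude = set(exclude or [])  # assumption: keys are hashable (e.g. str)
    out = {}

    def flatten(x, name=''):
        if isinstance(x, dict):
            for key, value in x.items():
                if key not in exclude:
                    flatten(value, f'{name}{key}{sep}')
        elif isinstance(x, list):
            for i, value in enumerate(x):
                flatten(value, f'{name}{i}{sep}')
        else:
            # Strip exactly one trailing separator, whatever its length.
            out[name[:len(name) - len(sep)]] = x

    flatten(nested_json)
    return out


sample = {'id': 1, 'meta': {'tags': ['a', 'b'], 'secret': 'x'}}

flat_default = flatten_json(sample)
flat_custom = flatten_json(sample, exclude=['secret'], sep='__')

print(flat_default)
# {'id': 1, 'meta_tags_0': 'a', 'meta_tags_1': 'b', 'meta_secret': 'x'}
print(flat_custom)
# {'id': 1, 'meta__tags__0': 'a', 'meta__tags__1': 'b'}
```

Note that empty lists and empty dicts produce no key at all, because the recursion never reaches a leaf value inside them.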
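How deeply can data be nested?:
flatten_json has been used to unpack files that ended up with more than 100000 columns.
Can the flattened JSON be unflattened?:
Yes, but this question does not cover it. If you install the flatten package, there is an unflatten method, although I have not tested it.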
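For a rough idea of what reversing the process involves, here is a small sketch. `unflatten_simple` is a hypothetical helper written for this illustration, not the package's unflatten, and it assumes that no original key contains the separator.

```python
def unflatten_simple(flat, sep='_'):
    """Rebuild nested dicts from keys joined by `sep` (hypothetical helper)."""
    nested = {}
    for path, value in flat.items():
        parts = path.split(sep)
        node = nested
        # Walk (or create) one dict level per path component except the last.
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return nested


flat = {'id': 1, 'metadata_m1_value': 'm1_1', 'metadata_m1_timestamp': 'd1'}
restored = unflatten_simple(flat)
print(restored)
# {'id': 1, 'metadata': {'m1': {'value': 'm1_1', 'timestamp': 'd1'}}}
```

The round trip is lossy in this sketch: list indices come back as string keys ('0', '1', ...) rather than list positions, and a key such as 'share_1' in the original data would be split into two levels.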
[Answer 1]: How to flatten a JSON or dict is a common question, and it has many answers.
This answer focuses on using flatten_json to recursively flatten a nested dict or JSON.
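Assumptions:
This answer assumes you already have the JSON or dict loaded into some variable (e.g. from a file, an API, etc.).
In this case we will use data.
How data is passed to flatten_json:
It accepts a dict, as shown by the function type hint.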
The most common forms of data:
Just a dict: {}
flatten_json(data)
A list of dicts: [{}, {}, {}]
[flatten_json(x) for x in data]
JSON with top-level keys, where the values repeat: {1: {}, 2: {}, 3: {}}
[flatten_json(data[key]) for key in data.keys()]
Other: {'key': [{}, {}, {}]}
[flatten_json(x) for x in data['key']]
Practical examples:
I typically flatten data into a pandas.DataFrame for further analysis.
Load pandas with import pandas as pd.
flatten_json returns a dict, which can be saved directly with the csv package.
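As a sketch of the csv route: the rows below are hand-written flat dicts standing in for what flatten_json would return (illustrative values modeled on Data 3 further down, with an extra column in the second row that I added on purpose). Flattened records do not always share the same keys, so the header is built as the union of all keys in first-seen order.

```python
import csv
import io

# Stand-ins for flatten_json output; the second row has one extra key.
rows = [
    {'VENUE': 'JOEBURG', 'RACES_1_NO': 1, 'RACES_1_TIME': '12:35'},
    {'VENUE': 'FOOBURG', 'RACES_1_NO': 1, 'RACES_1_TIME': '12:35', 'RACES_2_NO': 2},
]

# Union of all keys, keeping the order in which they first appear.
fieldnames = []
for row in rows:
    for key in row:
        if key not in fieldnames:
            fieldnames.append(key)

# An in-memory buffer keeps the sketch self-contained; for a real file use
# open('out.csv', 'w', newline='') instead.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval='')
writer.writeheader()
writer.writerows(rows)

csv_text = buffer.getvalue()
print(csv_text)
```

`restval=''` fills the cells a row has no key for, so shorter records simply get empty columns.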
Data 1:
{
    "id": 1,
    "class": "c1",
    "owner": "myself",
    "metadata": {
        "m1": {"value": "m1_1", "timestamp": "d1"},
        "m2": {"value": "m1_2", "timestamp": "d2"},
        "m3": {"value": "m1_3", "timestamp": "d3"},
        "m4": {"value": "m1_4", "timestamp": "d4"}
    },
    "a1": {
        "a11": []
    },
    "m1": {},
    "comm1": "COMM1",
    "comm2": "COMM21529089656387",
    "share": "xxx",
    "share1": "yyy",
    "hub1": "h1",
    "hub2": "h2",
    "context": []
}
Flatten 1:
df = pd.DataFrame([flatten_json(data)])
id class owner metadata_m1_value metadata_m1_timestamp metadata_m2_value metadata_m2_timestamp metadata_m3_value metadata_m3_timestamp metadata_m4_value metadata_m4_timestamp comm1 comm2 share share1 hub1 hub2
1 c1 myself m1_1 d1 m1_2 d2 m1_3 d3 m1_4 d4 COMM1 COMM21529089656387 xxx yyy h1 h2
Data 2:
[{
    'accuracy': 17,
    'activity': [{
        'activity': [{'confidence': 100, 'type': 'STILL'}],
        'timestampMs': '1542652'
    }],
    'altitude': -10,
    'latitudeE7': 3777321,
    'longitudeE7': -122423125,
    'timestampMs': '1542654',
    'verticalAccuracy': 2
}, {
    'accuracy': 17,
    'activity': [{
        'activity': [{'confidence': 100, 'type': 'STILL'}],
        'timestampMs': '1542652'
    }],
    'altitude': -10,
    'latitudeE7': 3777321,
    'longitudeE7': -122423125,
    'timestampMs': '1542654',
    'verticalAccuracy': 2
}, {
    'accuracy': 17,
    'activity': [{
        'activity': [{'confidence': 100, 'type': 'STILL'}],
        'timestampMs': '1542652'
    }],
    'altitude': -10,
    'latitudeE7': 3777321,
    'longitudeE7': -122423125,
    'timestampMs': '1542654',
    'verticalAccuracy': 2
}]
Flatten 2:
df = pd.DataFrame([flatten_json(x) for x in data])
accuracy activity_0_activity_0_confidence activity_0_activity_0_type activity_0_timestampMs altitude latitudeE7 longitudeE7 timestampMs verticalAccuracy
17 100 STILL 1542652 -10 3777321 -122423125 1542654 2
17 100 STILL 1542652 -10 3777321 -122423125 1542654 2
17 100 STILL 1542652 -10 3777321 -122423125 1542654 2
Data 3:
{
    "1": {
        "VENUE": "JOEBURG",
        "COUNTRY": "HAE",
        "ITW": "XAD",
        "RACES": {
            "1": {"NO": 1, "TIME": "12:35"},
            "2": {"NO": 2, "TIME": "13:10"},
            "3": {"NO": 3, "TIME": "13:40"},
            "4": {"NO": 4, "TIME": "14:10"},
            "5": {"NO": 5, "TIME": "14:55"},
            "6": {"NO": 6, "TIME": "15:30"},
            "7": {"NO": 7, "TIME": "16:05"},
            "8": {"NO": 8, "TIME": "16:40"}
        }
    },
    "2": {
        "VENUE": "FOOBURG",
        "COUNTRY": "ABA",
        "ITW": "XAD",
        "RACES": {
            "1": {"NO": 1, "TIME": "12:35"},
            "2": {"NO": 2, "TIME": "13:10"},
            "3": {"NO": 3, "TIME": "13:40"},
            "4": {"NO": 4, "TIME": "14:10"},
            "5": {"NO": 5, "TIME": "14:55"},
            "6": {"NO": 6, "TIME": "15:30"},
            "7": {"NO": 7, "TIME": "16:05"},
            "8": {"NO": 8, "TIME": "16:40"}
        }
    }
}
Flatten 3:
df = pd.DataFrame([flatten_json(data[key]) for key in data.keys()])
VENUE COUNTRY ITW RACES_1_NO RACES_1_TIME RACES_2_NO RACES_2_TIME RACES_3_NO RACES_3_TIME RACES_4_NO RACES_4_TIME RACES_5_NO RACES_5_TIME RACES_6_NO RACES_6_TIME RACES_7_NO RACES_7_TIME RACES_8_NO RACES_8_TIME
JOEBURG HAE XAD 1 12:35 2 13:10 3 13:40 4 14:10 5 14:55 6 15:30 7 16:05 8 16:40
FOOBURG ABA XAD 1 12:35 2 13:10 3 13:40 4 14:10 5 14:55 6 15:30 7 16:05 8 16:40
Other examples:
Python Pandas - Flatten Nested JSON
handling nested json in pandas
How to flatten a nested JSON from the NASA Weather Insight API in Python