Nested JSON Array to Python Pandas DataFrame
Posted: 2020-02-25 22:23:21

Question: I am trying to expand a nested JSON array in a pandas DataFrame.
This is the JSON I have:
[
  {
    "id": "0001",
    "name": "Stiven",
    "location": [
      {
        "country": "Colombia",
        "department": "Chocó",
        "city": "Quibdó"
      },
      {
        "country": "Colombia",
        "department": "Antioquia",
        "city": "Medellin"
      },
      {
        "country": "Colombia",
        "department": "Cundinamarca",
        "city": "Bogotá"
      }
    ]
  },
  {
    "id": "0002",
    "name": "Jhon Jaime",
    "location": [
      {
        "country": "Colombia",
        "department": "Valle del Cauca",
        "city": "Cali"
      },
      {
        "country": "Colombia",
        "department": "Putumayo",
        "city": "Mocoa"
      },
      {
        "country": "Colombia",
        "department": "Arauca",
        "city": "Arauca"
      }
    ]
  },
  {
    "id": "0003",
    "name": "Francisco",
    "location": [
      {
        "country": "Colombia",
        "department": "Atlántico",
        "city": "Barranquilla"
      },
      {
        "country": "Colombia",
        "department": "Bolívar",
        "city": "Cartagena"
      },
      {
        "country": "Colombia",
        "department": "La Guajira",
        "city": "Riohacha"
      }
    ]
  }
]
This is the DataFrame I have:
index  id    name        location
0      0001  Stiven      [{'country': 'Colombia', 'department': 'Chocó', 'city': 'Quibdó'}, {'country': 'Colombia', 'department': 'Antioquia', 'city': 'Medellin'}, {'country': 'Colombia', 'department': 'Cundinamarca', 'city': 'Bogotá'}]
1      0002  Jhon Jaime  [{'country': 'Colombia', 'department': 'Valle del Cauca', 'city': 'Cali'}, {'country': 'Colombia', 'department': 'Putumayo', 'city': 'Mocoa'}, {'country': 'Colombia', 'department': 'Arauca', 'city': 'Arauca'}]
2      0003  Francisco   [{'country': 'Colombia', 'department': 'Atlántico', 'city': 'Barranquilla'}, {'country': 'Colombia', 'department': 'Bolívar', 'city': 'Cartagena'}, {'country': 'Colombia', 'department': 'La Guajira', 'city': 'Riohacha'}]
I need to expand each id into one row per location, so that the DataFrame looks like this:
index  id    name        country   department       city
0      0001  Stiven      Colombia  Chocó            Quibdó
1      0001  Stiven      Colombia  Antioquia        Medellin
2      0001  Stiven      Colombia  Cundinamarca     Bogotá
3      0002  Jhon Jaime  Colombia  Valle del Cauca  Cali
4      0002  Jhon Jaime  Colombia  Putumayo         Mocoa
5      0002  Jhon Jaime  Colombia  Arauca           Arauca
6      0003  Francisco   Colombia  Atlántico        Barranquilla
7      0003  Francisco   Colombia  Bolívar          Cartagena
8      0003  Francisco   Colombia  La Guajira       Riohacha
Thanks in advance.
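Answer 1: If the JSON is loaded from a file, parse it with json.loads first; if the JSON comes directly from an API, that step may not be necessary (for example, when the client already returns parsed Python objects).
Use pandas.json_normalize with the record_path and meta parameters to convert the JSON into a DataFrame.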
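import json
from pathlib import Path

import pandas as pd

# path to the JSON file
p = Path(r'c:\path_to_file\test.json')

# read the file and parse the JSON text into Python lists and dicts
with p.open('r', encoding='utf-8') as f:
    data = json.loads(f.read())

# create the dataframe: one row per entry in "location",
# with "id" and "name" repeated from the parent record
df = pd.json_normalize(data, record_path='location', meta=['id', 'name'])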
# output
country   department       city          id    name
Colombia  Chocó            Quibdó        0001  Stiven
Colombia  Antioquia        Medellin      0001  Stiven
Colombia  Cundinamarca     Bogotá        0001  Stiven
Colombia  Valle del Cauca  Cali          0002  Jhon Jaime
Colombia  Putumayo         Mocoa         0002  Jhon Jaime
Colombia  Arauca           Arauca        0002  Jhon Jaime
Colombia  Atlántico        Barranquilla  0003  Francisco
Colombia  Bolívar          Cartagena     0003  Francisco
Colombia  La Guajira       Riohacha      0003  Francisco
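
The output above puts id and name last, while the question asks for them first. The following is a minimal sketch of one way to get that layout, assuming the JSON has already been parsed into Python lists and dicts. The sample data is abridged from the question, and the names nested, exploded, and flat are illustrative only. The second half shows a possible route when you start from the DataFrame in the question (a location column holding lists of dicts) rather than from the raw JSON.

```python
import pandas as pd

# Abridged sample from the question, already parsed into Python objects.
data = [
    {
        "id": "0001",
        "name": "Stiven",
        "location": [
            {"country": "Colombia", "department": "Chocó", "city": "Quibdó"},
            {"country": "Colombia", "department": "Antioquia", "city": "Medellin"},
        ],
    },
    {
        "id": "0002",
        "name": "Jhon Jaime",
        "location": [
            {"country": "Colombia", "department": "Valle del Cauca", "city": "Cali"},
        ],
    },
]

# One row per entry in "location"; "id" and "name" are repeated from the parent.
df = pd.json_normalize(data, record_path="location", meta=["id", "name"])

# Reorder the columns to match the layout requested in the question.
df = df[["id", "name", "country", "department", "city"]]

# Alternative (assumption: you only have the DataFrame, not the raw JSON).
# "nested" mimics the DataFrame shown in the question.
nested = pd.DataFrame(data)

# explode() turns each list element into its own row; reset the index so it
# lines up with the freshly numbered rows produced by json_normalize below.
exploded = nested.explode("location").reset_index(drop=True)

# Expand each location dict into columns and attach them next to id and name.
flat = pd.concat(
    [exploded[["id", "name"]], pd.json_normalize(exploded["location"].tolist())],
    axis=1,
)
```

Both df and flat should end up with the columns id, name, country, department, city. pd.json_normalize is available at the top level from pandas 1.0; older versions expose it as pandas.io.json.json_normalize.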