Nested JSON Array to Python Pandas DataFrame

【Title】Nested JSON Array to Python Pandas DataFrame 【Posted】2020-02-25 22:23:21 【Question】:

I'm trying to expand a nested JSON array into a pandas DataFrame.

This is the JSON I have:

[
    {
        "id": "0001",
        "name": "Stiven",
        "location": [
            {
                "country": "Colombia",
                "department": "Chocó",
                "city": "Quibdó"
            },
            {
                "country": "Colombia",
                "department": "Antioquia",
                "city": "Medellin"
            },
            {
                "country": "Colombia",
                "department": "Cundinamarca",
                "city": "Bogotá"
            }
        ]
    },
    {
        "id": "0002",
        "name": "Jhon Jaime",
        "location": [
            {
                "country": "Colombia",
                "department": "Valle del Cauca",
                "city": "Cali"
            },
            {
                "country": "Colombia",
                "department": "Putumayo",
                "city": "Mocoa"
            },
            {
                "country": "Colombia",
                "department": "Arauca",
                "city": "Arauca"
            }
        ]
    },
    {
        "id": "0003",
        "name": "Francisco",
        "location": [
            {
                "country": "Colombia",
                "department": "Atlántico",
                "city": "Barranquilla"
            },
            {
                "country": "Colombia",
                "department": "Bolívar",
                "city": "Cartagena"
            },
            {
                "country": "Colombia",
                "department": "La Guajira",
                "city": "Riohacha"
            }
        ]
    }
]

This is the DataFrame I currently have:

index   id    name         location
0       0001  Stiven       [{'country': 'Colombia', 'department': 'Chocó', 'city': 'Quibdó'}, {'country': 'Colombia', 'department': 'Antioquia', 'city': 'Medellin'}, {'country': 'Colombia', 'department': 'Cundinamarca', 'city': 'Bogotá'}]
1       0002  Jhon Jaime   [{'country': 'Colombia', 'department': 'Valle del Cauca', 'city': 'Cali'}, {'country': 'Colombia', 'department': 'Putumayo', 'city': 'Mocoa'}, {'country': 'Colombia', 'department': 'Arauca', 'city': 'Arauca'}]
2       0003  Francisco    [{'country': 'Colombia', 'department': 'Atlántico', 'city': 'Barranquilla'}, {'country': 'Colombia', 'department': 'Bolívar', 'city': 'Cartagena'}, {'country': 'Colombia', 'department': 'La Guajira', 'city': 'Riohacha'}]

I need to expand each id into one row per location, so that the DataFrame looks like this:

index   id    name         country   department       city
0       0001  Stiven       Colombia  Chocó            Quibdó
1       0001  Stiven       Colombia  Antioquia        Medellin
2       0001  Stiven       Colombia  Cundinamarca     Bogotá
3       0002  Jhon Jaime   Colombia  Valle del Cauca  Cali
4       0002  Jhon Jaime   Colombia  Putumayo         Mocoa
5       0002  Jhon Jaime   Colombia  Arauca           Arauca
6       0003  Francisco    Colombia  Atlántico        Barranquilla
7       0003  Francisco    Colombia  Bolívar          Cartagena 
8       0003  Francisco    Colombia  La Guajira       Riohacha   

Thanks in advance.

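【Answer 1】:
Use pandas.json_normalize with the record_path and meta parameters to convert the JSON into a DataFrame: record_path names the nested list to expand into rows, and meta lists the parent fields to repeat on each of those rows. If the JSON is loaded from a file, parse it with json.loads first; if it comes directly from an API, it may already be parsed into Python objects, so that step may not be necessary.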
import pandas as pd
from pathlib import Path
import json

# path to file
p = Path(r'c:\path_to_file\test.json')

# read json
with p.open('r', encoding='utf-8') as f:
    data = json.loads(f.read())

# create dataframe
df = pd.json_normalize(data, record_path='location', meta=['id', 'name'])

# output
  country       department          city    id        name
 Colombia            Chocó        Quibdó  0001      Stiven
 Colombia        Antioquia      Medellin  0001      Stiven
 Colombia     Cundinamarca        Bogotá  0001      Stiven
 Colombia  Valle del Cauca          Cali  0002  Jhon Jaime
 Colombia         Putumayo         Mocoa  0002  Jhon Jaime
 Colombia           Arauca        Arauca  0002  Jhon Jaime
 Colombia        Atlántico  Barranquilla  0003   Francisco
 Colombia          Bolívar     Cartagena  0003   Francisco
 Colombia       La Guajira      Riohacha  0003   Francisco
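
The output above has the location columns first, whereas the question asks for id and name first. As a minimal sketch (not part of the original answer), the example below uses a trimmed inline copy of the question's data instead of a file, reorders the columns, and also shows one possible way to get the same result when you only have the DataFrame with the nested location column. The variable names (raw, wanted, nested, exploded, flat) are made up for illustration. It assumes pandas 1.0 or later, where json_normalize is available as pd.json_normalize.

```python
import json

import pandas as pd

# Trimmed inline copy of the question's JSON (two people, three locations).
raw = """
[
    {"id": "0001", "name": "Stiven", "location": [
        {"country": "Colombia", "department": "Chocó", "city": "Quibdó"},
        {"country": "Colombia", "department": "Antioquia", "city": "Medellin"}
    ]},
    {"id": "0002", "name": "Jhon Jaime", "location": [
        {"country": "Colombia", "department": "Valle del Cauca", "city": "Cali"}
    ]}
]
"""
data = json.loads(raw)

# Column order requested in the question.
wanted = ["id", "name", "country", "department", "city"]

# Approach 1: normalize the parsed JSON directly, then reorder the columns.
df = pd.json_normalize(data, record_path="location", meta=["id", "name"])[wanted]

# Approach 2: start from a DataFrame whose "location" column holds lists of dicts.
nested = pd.DataFrame(data)
# One row per list element; reset the index so it lines up with the new columns.
exploded = nested.explode("location").reset_index(drop=True)
flat = pd.concat(
    [exploded[["id", "name"]], pd.json_normalize(exploded["location"].tolist())],
    axis=1,
)

print(df)
print(flat)
```

Both approaches should give the same rows here. The second one is only worth using when the original JSON is no longer at hand, since json_normalize on the parsed data does the whole job in one call.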
