Parsing nested JSON into multiple DataFrames using pandas (Python)

Posted: 2017-03-01 23:00:27

Question:

I have a nested JSON document like the one below and want to parse it into multiple DataFrames in Python. Please help.

{
    "tableName": "cases",
    "url": "EndpointVoid",
    "tableDataList": [{
        "_id": "100017252700",
        "title": "Test",
        "type": "TECH",
        "created": "2016-09-06T19:00:17.071Z",
        "createdBy": "193164275",
        "lastModified": "2016-10-04T21:50:49.539Z",
        "lastModifiedBy": "1074113719",
        "notes": [{
            "id": "30",
            "title": "Multiple devices",
            "type": "INCCL",
            "origin": "D",
            "componentCode": "PD17A",
            "issueCode": "IP321",
            "affectedProduct": "134322",
            "summary": "testing the json",
            "caller": {
                "email": "katie.slabiak@spps.org",
                "phone": "651-744-4522"
            }
        }, {
            "id": "50",
            "title": "EDU: Multiple Devices - Lightning-to-USB Cable",
            "type": "INCCL",
            "origin": "D",
            "componentCode": "PD17A",
            "issueCode": "IP321",
            "affectedProduct": "134322",
            "summary": "parsing json 2",
            "caller": {
                "email": "testing1@test.org",
                "phone": "123-345-1111"
            }
        }],
        "syncCount": 2316,
        "repair": [{
            "id": "D208491610",
            "created": "2016-09-06T19:02:48.000Z",
            "createdBy": "193164275",
            "lastModified": "2016-09-21T12:49:47.000Z"
        }, {
            "id": "D208491610"
        }, {
            "id": "D208491628",
            "created": "2016-09-06T19:03:37.000Z",
            "createdBy": "193164275",
            "lastModified": "2016-09-21T12:49:47.000Z"
        }],
        "enterpriseStatus": "8"
    }],
    "dateTime": 1475617849,
    "primaryKeys": ["$._id"],
    "primaryKeyVals": ["100017252700"],
    "operation": "UPDATE"
}

I want to parse this and create 3 tables/DataFrames/CSV files as shown below. Please help.

Output table in this format

Comments:

I think your JSON is invalid - please check http://jsonlint.com/

Thanks jezrael for letting me know.. it was a copy-paste error.. I just fixed the JSON file.

Answer 1:

I don't think this is the best approach, but I want to show you what is possible.

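import json

import pandas as pd
# pandas >= 1.0 exposes json_normalize at the top level;
# the older pandas.io.json import path is deprecated.
from pandas import json_normalize

# Load the sample JSON document into a Python dict
with open('your_sample.json') as f:
    dt = json.load(f)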

Table 1

# One row per case; the top-level "dateTime" is attached as metadata
cols = ['_id', 'title', 'type', 'created', 'createdBy', 'lastModified', 'lastModifiedBy', 'dateTime']
df1 = json_normalize(dt, 'tableDataList', 'dateTime')[cols]
print(df1)


            _id title  type                   created  createdBy  \
0  100017252700  Test  TECH  2016-09-06T19:00:17.071Z  193164275   

               lastModified lastModifiedBy    dateTime  
0  2016-10-04T21:50:49.539Z     1074113719  1475617849  

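Table 2

df2 = json_normalize(dt['tableDataList'], 'notes', '_id')
if 'caller' in df2.columns:
    # Older pandas keeps "caller" as a single column of dicts
    df2['phone'] = df2['caller'].map(lambda x: x['phone'])
    df2['email'] = df2['caller'].map(lambda x: x['email'])
else:
    # Newer pandas flattens it into "caller.email" / "caller.phone"
    df2 = df2.rename(columns={'caller.email': 'email', 'caller.phone': 'phone'})
df2 = df2[['_id', 'id', 'title', 'email', 'phone']]
print(df2)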


            _id  id                                           title  \
0  100017252700  30                                Multiple devices   
1  100017252700  50  EDU: Multiple Devices - Lightning-to-USB Cable   

                    email         phone  
0  katie.slabiak@spps.org  651-744-4522  
1       testing1@test.org  123-345-1111  
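The nested "caller" object can also be flattened directly by passing the note records to pd.json_normalize, which expands nested dicts into dotted column names. A minimal sketch, assuming pandas >= 1.0; the two records below are shortened stand-ins for the "notes" entries in the question, and the contact details are placeholders:

```python
import pandas as pd

# Two note records shaped like the "notes" entries in the question
# (values shortened; the contact details are placeholders)
notes = [
    {'id': '30', 'title': 'Multiple devices',
     'caller': {'email': 'first@example.org', 'phone': '555-0100'}},
    {'id': '50', 'title': 'Lightning-to-USB Cable',
     'caller': {'email': 'second@example.org', 'phone': '555-0101'}},
]

# Nested dicts are expanded into dotted column names
# such as "caller.email" and "caller.phone"
flat = pd.json_normalize(notes)

# Strip the "caller." prefix to get plain email/phone columns
flat = flat.rename(columns=lambda c: c.replace('caller.', ''))
print(flat[['id', 'title', 'email', 'phone']])
```

This avoids one lambda per nested field, which may be convenient when the nested object has many keys.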

Table 3

# dropna() removes repair entries with any missing field (here the one that only has an "id")
df3 = json_normalize(dt['tableDataList'], 'repair', '_id').dropna()
print(df3)


                    created  createdBy          id              lastModified  \
0  2016-09-06T19:02:48.000Z  193164275  D208491610  2016-09-21T12:49:47.000Z   
2  2016-09-06T19:03:37.000Z  193164275  D208491628  2016-09-21T12:49:47.000Z   

            _id  
0  100017252700  
2  100017252700  
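Since the question also asks for CSV output, each DataFrame could then be written out with to_csv. A minimal sketch, not part of the original answer: the small frames below are stand-ins for df1, df2 and df3, and the file names and temporary output directory are hypothetical.

```python
import os
import tempfile

import pandas as pd

# Small stand-in frames; in practice these would be df1, df2 and df3
tables = {
    'cases': pd.DataFrame({'_id': ['100017252700'], 'title': ['Test']}),
    'notes': pd.DataFrame({'_id': ['100017252700'] * 2, 'id': ['30', '50']}),
    'repairs': pd.DataFrame({'_id': ['100017252700'], 'id': ['D208491628']}),
}

# Hypothetical output location: a temporary directory
out_dir = tempfile.mkdtemp()
written = []
for name, frame in tables.items():
    path = os.path.join(out_dir, name + '.csv')
    # index=False keeps the pandas row index out of the file
    frame.to_csv(path, index=False)
    written.append(path)

# Read one file back as strings so the long IDs are not parsed as integers
check = pd.read_csv(written[1], dtype=str)
print(check)
```

Reading the IDs back with dtype=str is a design choice: values such as "100017252700" are identifiers, not numbers, so keeping them as strings avoids accidental numeric conversion.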

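Comments:

This code works. Basically, I am exporting data from MongoDB in JSON format. If I get multiple case records, the code does not work properly, and sometimes a few columns are not populated in the JSON, so I again run into the problem of a JSON key not being available.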
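For exports with several case records or unpopulated fields, one possible approach is to collect the rows with plain dictionary lookups, so that a missing key becomes an empty value instead of raising KeyError. A minimal sketch, not the answer author's method: the function name split_cases and the sample document are hypothetical, and the field names are taken from the JSON in the question.

```python
import pandas as pd


def split_cases(doc):
    """Split an export into case, note and repair DataFrames (hypothetical helper).

    Keys that are absent from a record become None/NaN instead of raising KeyError.
    """
    case_cols = ['_id', 'title', 'type', 'created', 'createdBy',
                 'lastModified', 'lastModifiedBy']
    case_rows, note_rows, repair_rows = [], [], []
    for case in doc.get('tableDataList', []):
        case_id = case.get('_id')
        # One row per case; absent fields are filled with None
        row = {col: case.get(col) for col in case_cols}
        row['dateTime'] = doc.get('dateTime')
        case_rows.append(row)
        # "notes" may be missing entirely, so fall back to an empty list
        for note in case.get('notes') or []:
            caller = note.get('caller') or {}
            note_rows.append({
                '_id': case_id,
                'id': note.get('id'),
                'title': note.get('title'),
                'email': caller.get('email'),
                'phone': caller.get('phone'),
            })
        # Each repair entry keeps whatever fields it has, plus the parent case id
        for repair in case.get('repair') or []:
            repair_rows.append({'_id': case_id, **repair})
    cases = pd.DataFrame(case_rows, columns=case_cols + ['dateTime'])
    notes = pd.DataFrame(note_rows, columns=['_id', 'id', 'title', 'email', 'phone'])
    repairs = pd.DataFrame(repair_rows)
    return cases, notes, repairs


# Made-up sample: the second case has no title, notes or repair entries
sample = {
    'dateTime': 1475617849,
    'tableDataList': [
        {'_id': 'A1', 'title': 'Test',
         'notes': [{'id': '30', 'title': 'n1',
                    'caller': {'email': 'a@example.org', 'phone': '555-0100'}}],
         'repair': [{'id': 'R1', 'created': '2016-09-06T19:02:48.000Z'},
                    {'id': 'R2'}]},
        {'_id': 'B2', 'type': 'TECH'},
    ],
}

cases, notes, repairs = split_cases(sample)
print(cases)
print(notes)
print(repairs)
```

Unlike the dropna() call used for Table 3, this keeps incomplete repair entries and leaves it to the caller to decide which rows to discard.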
