How to correctly parse nested json with Python where several keys are equal

Posted: 2020-05-18 17:27:50

Question:

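I'm having trouble with the flatten / json_normalize functions. I have a nested json with 6 "receipts" in it, but flattening this json gives me only 1 row with 1 receipt, which is the last one, and I need all 6 in my pandas DataFrame.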

[
  {
    "_index": "packets-2020-02-03",
    "_type": "receipts_file",
    "_score": null,
    "_source": {
      "layers": {
        "frame": {
          "frame.encap_type": "25",
          "frame.time": "Feb  3, 2019 00:17:14.004011000 MSK",
          "frame.offset_shift": "0.000000000",
          "frame.time_epoch": "2575325034.004011000",
          "frame.time_delta": "0.002843000",
          "frame.time_delta_displayed": "0.002843000",
          "frame.time_relative": "0.002852000",
          "frame.number": "4",
          "frame.len": "1294",
          "frame.cap_len": "1294",
          "frame.marked": "0",
          "frame.ignored": "0",
          "frame.protocols": "several"
        },
        "receipts": {
          "receipts.command_length": "238",
          "receipts.command_id": "0x00000005",
          "receipts.sequence_number": "47207",
          "receipts.data_coding": "0x00000000",
          "receipts.data_coding_tree": {
            "receipts.rps": "0x00000000",
            "Receipt Type 1 Data Coding": {
              "receipts.rps.rc_coding_group": "0x00000000",
              "receipts.rps.text_compression": "0",
              "receipts.rps.class_present": "0",
              "receipts.rps.charset": "0x00000000"
            },
            "Receipt Type 2 Data Coding": {
              "receipts.rps.rpk._coding_group": "0x00000000",
              "receipts.rps.rpk._language": "0x00000000"
            }
          },
          "receipts.rc_default_receipt_id": "0",
          "receipts.rc_length": "117",
          "receipts.receipt": "29831",
          "receipts.opt_params": {
            "receipts.opt_param": {
              "receipts.opt_param_tag": "0x00003002",
              "receipts.opt_param_len": "10",
              "receipts.vendor_op": "47912"
            },
            "receipts.opt_param": {
              "receipts.opt_param_tag": "0x00003001",
              "receipts.opt_param_len": "10",
              "receipts.vendor_op": "98982"
            },
            "receipts.opt_param": {
              "receipts.opt_param_tag": "0x00003004",
              "receipts.opt_param_len": "1",
              "receipts.vendor_op": "00"
            },
            "receipts.opt_param": {
              "receipts.opt_param_tag": "0x00003000",
              "receipts.opt_param_len": "4",
              "receipts.vendor_op": "23080"
            },
            "receipts.opt_param": {
              "receipts.opt_param_tag": "0x00003003",
              "receipts.opt_param_len": "10",
              "receipts.vendor_op": "29849"
            },
            "receipts.opt_param": {
              "receipts.opt_param_tag": "0x0000001e",
              "receipts.opt_param_len": "9",
              "receipts.receipted_receipt_id": "949BB6DE"
            },
            "receipts.opt_param": {
              "receipts.opt_param_tag": "0x00000427",
              "receipts.opt_param_len": "1",
              "receipts.receipt_state": "2"
            }
          }
        },
        "receipts": {
          "receipts.command_length": "241",
          "receipts.command_id": "0x00000005",
          "receipts.sequence_number": "47208",
          "receipts.data_coding": "0x00000000",
          "receipts.data_coding_tree": {
            "receipts.rps": "0x00000000",
            "Receipt Type 1 Data Coding": {
              "receipts.rps.rc_coding_group": "0x00000000",
              "receipts.rps.text_compression": "0",
              "receipts.rps.class_present": "0",
              "receipts.rps.charset": "0x00000000"
            },
            "Receipt Type 2 Data Coding": {
              "receipts.rps.rpk._coding_group": "0x00000000",
              "receipts.rps.rpk._language": "0x00000000"
            }
          },
          "receipts.rc_default_receipt_id": "0",
          "receipts.rc_length": "117",
          "receipts.receipt": "98341",
          "receipts.opt_params": {
            "receipts.opt_param": {
              "receipts.opt_param_tag": "0x00003002",
              "receipts.opt_param_len": "10",
              "receipts.vendor_op": "38220"
            },
            "receipts.opt_param": {
              "receipts.opt_param_tag": "0x00003001",
              "receipts.opt_param_len": "10",
              "receipts.vendor_op": "93813"
            },
            "receipts.opt_param": {
              "receipts.opt_param_tag": "0x00003004",
              "receipts.opt_param_len": "1",
              "receipts.vendor_op": "00"
            },
            "receipts.opt_param": {
              "receipts.opt_param_tag": "0x00003000",
              "receipts.opt_param_len": "4",
              "receipts.vendor_op": "98381"
            },
            "receipts.opt_param": {
              "receipts.opt_param_tag": "0x00003003",
              "receipts.opt_param_len": "10",
              "receipts.vendor_op": "77371"
            },
            "receipts.opt_param": {
              "receipts.opt_param_tag": "0x0000001e",
              "receipts.opt_param_len": "9",
              "receipts.receipted_receipt_id": "6DED391C"
            },
            "receipts.opt_param": {
              "receipts.opt_param_tag": "0x00000427",
              "receipts.opt_param_len": "1",
              "receipts.receipt_state": "2"
            }
          }
        },
        "receipts": {
          "receipts.command_length": "238",
          "receipts.command_id": "0x00000005",
          "receipts.sequence_number": "47209",
          "receipts.data_coding": "0x00000000",
          "receipts.data_coding_tree": {
            "receipts.rps": "0x00000000",
            "Receipt Type 1 Data Coding": {
              "receipts.rps.rc_coding_group": "0x00000000",
              "receipts.rps.text_compression": "0",
              "receipts.rps.class_present": "0",
              "receipts.rps.charset": "0x00000000"
            },
            "Receipt Type 2 Data Coding": {
              "receipts.rps.rpk._coding_group": "0x00000000",
              "receipts.rps.rpk._language": "0x00000000"
            }
          },
          "receipts.rc_default_receipt_id": "0",
          "receipts.rc_length": "117",
          "receipts.opt_params": {
            "receipts.opt_param": {
              "receipts.opt_param_tag": "0x00003002",
              "receipts.opt_param_len": "10",
              "receipts.vendor_op": "38717"
            },
            "receipts.opt_param": {
              "receipts.opt_param_tag": "0x00003001",
              "receipts.opt_param_len": "10",
              "receipts.vendor_op": "37788"
            },
            "receipts.opt_param": {
              "receipts.opt_param_tag": "0x00003004",
              "receipts.opt_param_len": "1",
              "receipts.vendor_op": "74818"
            },
            "receipts.opt_param": {
              "receipts.opt_param_tag": "0x00003000",
              "receipts.opt_param_len": "4",
              "receipts.vendor_op": "77812"
            },
            "receipts.opt_param": {
              "receipts.opt_param_tag": "0x00003003",
              "receipts.opt_param_len": "10",
              "receipts.vendor_op": "39999"
            },
            "receipts.opt_param": {
              "receipts.opt_param_tag": "0x0000001e",
              "receipts.opt_param_len": "9",
              "receipts.receipted_receipt_id": "273A872F"
            },
            "receipts.opt_param": {
              "receipts.opt_param_tag": "0x00000427",
              "receipts.opt_param_len": "1",
              "receipts.receipt_state": "2"
            }
          }
        },
        "receipts": {
          "receipts.command_length": "242",
          "receipts.command_id": "0x00000005",
          "receipts.sequence_number": "47210",
          "receipts.data_coding": "0x00000000",
          "receipts.data_coding_tree": {
            "receipts.rps": "0x00000000",
            "Receipt Type 1 Data Coding": {
              "receipts.rps.rc_coding_group": "0x00000000",
              "receipts.rps.text_compression": "0",
              "receipts.rps.class_present": "0",
              "receipts.rps.charset": "0x00000000"
            },
            "Receipt Type 2 Data Coding": {
              "receipts.rps.rpk._coding_group": "0x00000000",
              "receipts.rps.rpk._language": "0x00000000"
            }
          },
          "receipts.rc_default_receipt_id": "0",
          "receipts.rc_length": "118",
          "receipts.receipt": "69322",
          "receipts.opt_params": {
            "receipts.opt_param": {
              "receipts.opt_param_tag": "0x00003002",
              "receipts.opt_param_len": "10",
              "receipts.vendor_op": "83881"
            },
            "receipts.opt_param": {
              "receipts.opt_param_tag": "0x00003001",
              "receipts.opt_param_len": "10",
              "receipts.vendor_op": "73188"
            },
            "receipts.opt_param": {
              "receipts.opt_param_tag": "0x00003004",
              "receipts.opt_param_len": "1",
              "receipts.vendor_op": "00"
            },
            "receipts.opt_param": {
              "receipts.opt_param_tag": "0x00003000",
              "receipts.opt_param_len": "4",
              "receipts.vendor_op": "78881"
            },
            "receipts.opt_param": {
              "receipts.opt_param_tag": "0x00003003",
              "receipts.opt_param_len": "10",
              "receipts.vendor_op": "74388"
            },
            "receipts.opt_param": {
              "receipts.opt_param_tag": "0x0000001e",
              "receipts.opt_param_len": "9",
              "receipts.receipted_receipt_id": "949C60DF"
            },
            "receipts.opt_param": {
              "receipts.opt_param_tag": "0x00000427",
              "receipts.opt_param_len": "1",
              "receipts.receipt_state": "2"
            }
          }
        },
        "receipts": {
          "receipts.command_length": "238",
          "receipts.command_id": "0x00000005",
          "receipts.sequence_number": "47211",
          "receipts.data_coding": "0x00000000",
          "receipts.data_coding_tree": {
            "receipts.rps": "0x00000000",
            "Receipt Type 1 Data Coding": {
              "receipts.rps.rc_coding_group": "0x00000000",
              "receipts.rps.text_compression": "0",
              "receipts.rps.class_present": "0",
              "receipts.rps.charset": "0x00000000"
            },
            "Receipt Type 2 Data Coding": {
              "receipts.rps.rpk._coding_group": "0x00000000",
              "receipts.rps.rpk._language": "0x00000000"
            }
          },
          "receipts.rc_default_receipt_id": "0",
          "receipts.rc_length": "117",
          "receipts.receipt": "12281",
          "receipts.opt_params": {
            "receipts.opt_param": {
              "receipts.opt_param_tag": "0x00003002",
              "receipts.opt_param_len": "10",
              "receipts.vendor_op": "12727"
            },
            "receipts.opt_param": {
              "receipts.opt_param_tag": "0x00003001",
              "receipts.opt_param_len": "10",
              "receipts.vendor_op": "18828"
            },
            "receipts.opt_param": {
              "receipts.opt_param_tag": "0x00003004",
              "receipts.opt_param_len": "1",
              "receipts.vendor_op": "00"
            },
            "receipts.opt_param": {
              "receipts.opt_param_tag": "0x00003000",
              "receipts.opt_param_len": "4",
              "receipts.vendor_op": "38218"
            },
            "receipts.opt_param": {
              "receipts.opt_param_tag": "0x00003003",
              "receipts.opt_param_len": "10",
              "receipts.vendor_op": "47718"
            },
            "receipts.opt_param": {
              "receipts.opt_param_tag": "0x0000001e",
              "receipts.opt_param_len": "9",
              "receipts.receipted_receipt_id": "949BD094"
            },
            "receipts.opt_param": {
              "receipts.opt_param_tag": "0x00000427",
              "receipts.opt_param_len": "1",
              "receipts.receipt_state": "2"
            }
          }
        },
        "receipts": {
          "receipts.command_length": "25",
          "receipts.command_id": "0x80000004",
          "receipts.command_status": "0x00000000",
          "receipts.sequence_number": "35572",
          "receipts.receipt_id": "949C23B8"
        }
      }
    }
  }
]

I tried using this code:

import json
import pandas as pd
from flatten_json import flatten

i_file_name = 'example.json'

with open(i_file_name) as fd:
    json_data = json.load(fd)
json_data = (flatten(d, '.') for d in json_data)

df = pd.DataFrame(json_data)

df.head()

and this:

import pandas as pd

i_file_name = 'example.json'

df = pd.read_json(i_file_name)
df = pd.json_normalize(df['_source'])

df.head()

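Both gave me the same result: only 1 row instead of 6. I tried passing record_path / meta to json_normalize, but I couldn't work out how to set them. I'm fairly new to json parsing and couldn't find a similar question here. I know I need to specify the right keys, but I don't know how.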
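A likely explanation for the single row, sketched here with a tiny made-up document rather than the real file: Python's `json` module accepts an object with repeated keys but builds an ordinary dict from it, so each later duplicate overwrites the earlier one before pandas or flatten ever sees the data. The `seq` field is a shortened stand-in for the real field names.

```python
import json

# Tiny stand-in for the real file: two objects under the same "receipts" key.
raw = '{"layers": {"receipts": {"seq": "47207"}, "receipts": {"seq": "35572"}}}'

parsed = json.loads(raw)

# A dict can hold each key only once, so the last "receipts" wins.
print(parsed["layers"])  # {'receipts': {'seq': '35572'}}
```

If this is what is happening, no choice of record_path or meta can bring the other receipts back, because they are already gone after loading.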

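Edit:

Unfortunately, *** has limited support for tables in questions, so I will try to explain my expected output.

Right now I get only one row with these columns:

_index _type _score _source.layers.frame.* _source.source.receipts.*

where * means there are several columns under the same level.

receipts.* contains only 5 columns:

command_length command_id command_status sequence_number receipt_id

The 1 row I get contains the values for these columns from the last "receipts"-level record: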

"receipts": {
  "receipts.command_length": "25",
  "receipts.command_id": "0x80000004",
  "receipts.command_status": "0x00000000",
  "receipts.sequence_number": "35572",
  "receipts.receipt_id": "949C23B8"
}

But there are also other "receipts"-level records, for example:

"receipts": {
  "receipts.command_length": "238",
  "receipts.command_id": "0x00000005",
  "receipts.sequence_number": "47207",
  "receipts.data_coding": "0x00000000",
  "receipts.data_coding_tree": {
    "receipts.rps": "0x00000000",
    "Receipt Type 1 Data Coding": {
      "receipts.rps.rc_coding_group": "0x00000000",
      "receipts.rps.text_compression": "0",
      "receipts.rps.class_present": "0",
      "receipts.rps.charset": "0x00000000"
    },
    "Receipt Type 2 Data Coding": {
      "receipts.rps.rpk._coding_group": "0x00000000",
      "receipts.rps.rpk._language": "0x00000000"
    }
  },
  "receipts.rc_default_receipt_id": "0",
  "receipts.rc_length": "117",
  "receipts.receipt": "29831",
  "receipts.opt_params": {
    "receipts.opt_param": {
      "receipts.opt_param_tag": "0x00003002",
      "receipts.opt_param_len": "10",
      "receipts.vendor_op": "47912"
    },
    "receipts.opt_param": {
      "receipts.opt_param_tag": "0x00003001",
      "receipts.opt_param_len": "10",
      "receipts.vendor_op": "98982"
    },
    "receipts.opt_param": {
      "receipts.opt_param_tag": "0x00003004",
      "receipts.opt_param_len": "1",
      "receipts.vendor_op": "00"
    },
    "receipts.opt_param": {
      "receipts.opt_param_tag": "0x00003000",
      "receipts.opt_param_len": "4",
      "receipts.vendor_op": "23080"
    },
    "receipts.opt_param": {
      "receipts.opt_param_tag": "0x00003003",
      "receipts.opt_param_len": "10",
      "receipts.vendor_op": "29849"
    },
    "receipts.opt_param": {
      "receipts.opt_param_tag": "0x0000001e",
      "receipts.opt_param_len": "9",
      "receipts.receipted_receipt_id": "949BB6DE"
    },
    "receipts.opt_param": {
      "receipts.opt_param_tag": "0x00000427",
      "receipts.opt_param_len": "1",
      "receipts.receipt_state": "2"
    }
  }
},

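I want to see those rows in the pandas DataFrame as well, so the row I currently get should be the 6th row.

I kind of understand that my json is broken in some way, because it has 6 different keys with the same name (receipts), but maybe I can parse it differently so that I can import it into pandas correctly.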
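One way to "parse it differently" is the `object_pairs_hook` argument of `json.load` / `json.loads`, which hands every decoded object to a callback as a list of (key, value) pairs before any duplicates are dropped. The sketch below is only an illustration: `collect_duplicates` is a hypothetical helper name, and the sample string is a shortened stand-in for the real file.

```python
import json


def collect_duplicates(pairs):
    """Turn (key, value) pairs into a dict; a repeated key gets a list of its values."""
    result, counts = {}, {}
    for key, value in pairs:
        counts[key] = counts.get(key, 0) + 1
        if counts[key] == 1:
            result[key] = value                  # first occurrence: keep as-is
        elif counts[key] == 2:
            result[key] = [result[key], value]   # second occurrence: start a list
        else:
            result[key].append(value)            # later occurrences: extend it
    return result


# Shortened stand-in for the real document (field names are illustrative).
raw = '{"receipts": {"seq": "1"}, "receipts": {"seq": "2"}, "frame": {"number": "4"}}'

parsed = json.loads(raw, object_pairs_hook=collect_duplicates)
print(parsed["receipts"])  # [{'seq': '1'}, {'seq': '2'}]
```

The hook runs for every nested object, so the repeated receipts.opt_param entries inside each receipt would be collected into lists in the same way. A key that appears only once stays a plain value, which means downstream code has to handle both shapes.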

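Comments:

Which key are you interested in: receipts.opt_param? Can you list those 6 keys? Could you also add a sample DataFrame of the expected output?

Thanks! I edited my question to give more information about the expected output. I mean the "receipts"-level records, of which there are 6 in my example.

Something is happening that I don't quite understand. Check the data without pandas, just your regular python, and you'll find that the other keys (Receipt Type 2 Data Coding, ...) don't show up.

I don't think anything is wrong with python/pandas; it's definitely in the json, because there are non-unique keys. It just blew my mind. Unfortunately, I still need to parse it the hard way.

Answer 1: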

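I realized that I never answered my own question, although I did manage to solve it. I apologize for the code below, but it may help if you are trying to solve a similar problem. I decided I would rather show my silly code to the world than leave this without any solution.

First, I did what I mentioned in the question: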

import pandas as pd

i_file_name = 'example.json'

df = pd.read_json(i_file_name)
df = pd.json_normalize(df['_source'])

Then I converted it to json and imported it into pandas again:

df_json = df.to_json(orient='records')

df = pd.read_json(df_json, orient='columns')

Then I melted some of the layers:

df_melt = pd.melt(df, id_vars=['layers.frame.frame.time',
                               'layers.frame.frame.number'],
                  value_vars=['layers.receipts'])

After that, I created a new DataFrame from those melted values and kept the index, so that the 2 DataFrames could be joined later:

df_melt2 = pd.DataFrame(df_melt['value'].values.tolist(), index=df_melt.index)

Then I concatenated the 2 DataFrames and dropped the columns that were no longer needed:

df_melt_full = pd.concat([df_melt, df_melt2], axis=1)
df_melt_full = df_melt_full.drop(['value', 'variable'], axis=1)

After that, I melted it again (yes, this is my code from February, and I'm ashamed of it):

df_melt_full_melt = pd.melt(df_melt_full, 
                            id_vars=['layers.frame.frame.time',
                                     'layers.frame.frame.number']
                           )

Then I imported it again:

df_normalized = pd.json_normalize(df_melt_full_melt['value'])

And then, finally, I concatenated the 2 DataFrames and solved my problem:

df_final = pd.concat([df_melt, df_normalized], axis=1)
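For comparison, a shorter route might be to keep the duplicate keys at load time and then build one row per receipt directly. This is only a sketch under assumptions, not the approach used above: `collect_duplicates` and `receipt_rows` are hypothetical helpers, the sample is a cut-down two-receipt document, and only the top-level string fields of each receipt are copied.

```python
import json


def collect_duplicates(pairs):
    """Turn (key, value) pairs into a dict; a repeated key gets a list of its values."""
    result, counts = {}, {}
    for key, value in pairs:
        counts[key] = counts.get(key, 0) + 1
        if counts[key] == 1:
            result[key] = value
        elif counts[key] == 2:
            result[key] = [result[key], value]
        else:
            result[key].append(value)
    return result


def receipt_rows(packets):
    """Return one flat dict per "receipts" record, tagged with its frame fields."""
    rows = []
    for packet in packets:
        layers = packet["_source"]["layers"]
        receipts = layers.get("receipts", [])
        if isinstance(receipts, dict):  # a packet with a single receipt
            receipts = [receipts]
        for receipt in receipts:
            row = {
                "frame.time": layers["frame"]["frame.time"],
                "frame.number": layers["frame"]["frame.number"],
            }
            # Copy only the plain string fields; nested trees are skipped for brevity.
            row.update({k: v for k, v in receipt.items() if isinstance(v, str)})
            rows.append(row)
    return rows


# Hypothetical two-receipt sample with the same shape as the question's data.
raw = """
[{"_source": {"layers": {
    "frame": {"frame.time": "Feb  3, 2019 00:17:14.004011000 MSK", "frame.number": "4"},
    "receipts": {"receipts.sequence_number": "47207", "receipts.receipt": "29831"},
    "receipts": {"receipts.sequence_number": "35572", "receipts.receipt_id": "949C23B8"}
}}}]
"""

rows = receipt_rows(json.loads(raw, object_pairs_hook=collect_duplicates))
for row in rows:
    print(row["frame.number"], row["receipts.sequence_number"])
```

The resulting list of dicts can be passed to pd.DataFrame(rows); fields that a receipt lacks would simply come out as missing values in that row.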
