Accessing nested JSON using Python

Posted: 2022-01-22 22:49:42

Question:

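I'm having a hard time finding the right way to print my results from the JSON I have.

I've searched for many hours without finding an answer.

This is the JSON I have: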

json = {
  "Envelope": {
    "Body": {
      "GetCTProductsResponse": {
        "GetCTProductsResult": {
          "CTPRODUCT": [
            {
              "CODE": "TESLAAIR3",
              "PRODUCTGROUPCODE": "AIRPURIF",
              "NAME": "Tesla Air purifier AIR 3",
              "MANUFACTURER": "Tesla",
              "MANUFACTURERCODE": "TESLA",
              "QTTYINSTOCK": ">20",
              "TAX": 21,
              "PRICE": "69,9000",
              "RETAILPRICE": 0,
              "SHORT_DESCRIPTION": "",
              "WARRANTY": "24M",
              "EUR_ExchangeRate": "0,00",
              "BARCODE": "",
              "IMAGE_URL": ""
            },
            {
              "CODE": "SKV4140GL",
              "PRODUCTGROUPCODE": "AIRPURIF",
              "NAME": "Xiaomi MI SMART Antibacterial humidifier",
              "MANUFACTURER": "Xiaomi",
              "MANUFACTURERCODE": "XIAOMI",
              "QTTYINSTOCK": ">20",
              "TAX": 21,
              "PRICE": "39,0000",
              "RETAILPRICE": 0,
              "SHORT_DESCRIPTION": "",
              "WARRANTY": "2G",
              "EUR_ExchangeRate": "0,00",
              "BARCODE": "",
              "IMAGE_URL": "http://www.ct4partners.ba/UploadDownload/ProductImages/SKV4140GL_201117093216482.jpg"
            },
            {
              "CODE": "SKV4140GL",
              "PRODUCTGROUPCODE": "AIRPURIF",
              "NAME": "Xiaomi MI SMART Antibacterial humidifier",
              "MANUFACTURER": "Xiaomi",
              "MANUFACTURERCODE": "XIAOMI",
              "QTTYINSTOCK": ">20",
              "TAX": 21,
              "PRICE": "39,0000",
              "RETAILPRICE": 0,
              "SHORT_DESCRIPTION": "",
              "WARRANTY": "2G",
              "EUR_ExchangeRate": "0,00",
              "BARCODE": "",
              "IMAGE_URL": "http://www.ct4partners.ba/UploadDownload/ProductImages/SKV4140GL_201117093216098.jpg"
            },
            {
              "CODE": "SKV4140GL",
              "PRODUCTGROUPCODE": "AIRPURIF",
              "NAME": "Xiaomi MI SMART Antibacterial humidifier",
              "MANUFACTURER": "Xiaomi",
              "MANUFACTURERCODE": "XIAOMI",
              "QTTYINSTOCK": ">20",
              "TAX": 21,
              "PRICE": "39,0000",
              "RETAILPRICE": 0,
              "SHORT_DESCRIPTION": "",
              "WARRANTY": "2G",
              "EUR_ExchangeRate": "0,00",
              "BARCODE": "",
              "IMAGE_URL": "http://www.ct4partners.ba/UploadDownload/ProductImages/SKV4140GL_201117093215238.jpg"
            },
            {
              "CODE": "BHR4802GL",
              "PRODUCTGROUPCODE": "ZVUCNICI",
              "NAME": "Xiaomi Mi Portable Bluetooth Speaker Grey",
              "MANUFACTURER": "Xiaomi",
              "MANUFACTURERCODE": "XIAOMI",
              "QTTYINSTOCK": ">20",
              "TAX": 21,
              "PRICE": "17,0000",
              "RETAILPRICE": 0,
              "SHORT_DESCRIPTION": "",
              "WARRANTY": "2G",
              "EUR_ExchangeRate": "0,00",
              "BARCODE": "",
              "IMAGE_URL": "http://www.ct4partners.ba/UploadDownload/ProductImages/BHR4802GL_1.jpg"
            }
          ]
        }
      }
    }
  }
}

Using Python, I want to access the CODE variable ("CODE": "TESLAAIR3").

I've tried a lot of things (dumps, load, loads, etc.), but none of them solved my problem.

Thanks.
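For reference, the functions mentioned above do different jobs: json.loads turns JSON text (a str) into Python dicts and lists, json.dumps turns Python objects back into a str, and key indexing only works on the parsed dict. The payload below is a hypothetical, trimmed-down stand-in for the API response, not the real one.

```python
import json

# Hypothetical, trimmed-down stand-in for the API response text (a str, not a dict)
raw_text = '{"Envelope": {"Body": {"GetCTProductsResponse": {"GetCTProductsResult": {"CTPRODUCT": [{"CODE": "TESLAAIR3"}, {"CODE": "SKV4140GL"}]}}}}}'

# json.loads: JSON text -> Python objects (dict, list, str, int, ...)
data = json.loads(raw_text)

# Every level is now an ordinary dict or list, so plain indexing works
products = data["Envelope"]["Body"]["GetCTProductsResponse"]["GetCTProductsResult"]["CTPRODUCT"]
print(products[0]["CODE"])  # TESLAAIR3

# json.dumps goes the other way: Python objects -> JSON text (a str again)
round_trip = json.dumps(data)
print(type(round_trip).__name__)  # str
```

Note that calling json.loads on something that is already a dict raises a TypeError, so it is only needed when the data is still text.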

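Comments:

Is this stored in a .json file?
What you show is not JSON; it is a Python dict literal, representing the dict that json.load might return given a string such as '{"Envelope": {"Body": ...'.
No, it isn't; it comes from an API call. I've only shown a few products here, but the API returns about 4000 products (about 50,000 lines).

Answer 1: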

You can try:

a["Envelope"]["Body"]["GetCTProductsResponse"]["GetCTProductsResult"]["CTPRODUCT"][0]["CODE"]
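If that line fails (the comments below report KeyError: 'Envelope'), it helps to check what the object really is. Indexing a str or a list with a string key raises TypeError, not KeyError, so a KeyError suggests the object is already a dict whose top level simply has no "Envelope" key. The helper below is a hypothetical sketch (describe_top_level is an illustrative name) that parses raw text when needed and reports the type and top-level keys.

```python
import json

def describe_top_level(payload):
    """Parse payload if it is still JSON text, then report its type and top-level keys."""
    if isinstance(payload, (bytes, bytearray)):
        payload = payload.decode("utf-8")
    if isinstance(payload, str):
        payload = json.loads(payload)  # raw JSON text -> Python objects
    # Only a dict has keys worth listing; for anything else report None
    keys = list(payload) if isinstance(payload, dict) else None
    return type(payload).__name__, keys

# Both samples are hypothetical: one still text, one already parsed but rooted at "Body"
print(describe_top_level('{"Envelope": {"Body": {}}}'))  # ('dict', ['Envelope'])
print(describe_top_level({"Body": {"GetCTProductsResponse": {}}}))  # ('dict', ['Body'])
```

If the second kind of result shows up, start the indexing chain from whatever key is actually at the top.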

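Comments:

I get the following error: print(final_json["Envelope"]["Body"]["GetCTProductsResponse"] KeyError: 'Envelope'
Check what final_json prints. Try: print(final_json)
It prints the whole JSON, just like the one I posted.
The Python dict you posted does have Envelope as a key, so what you put in the question is not what you are actually using.
@BlerdiKoliq Check type(final_json). What does it give?

Answer 2: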

This will print all the codes:

for code in test["Envelope"]["Body"]["GetCTProductsResponse"]["GetCTProductsResult"]["CTPRODUCT"]:
    print(code["CODE"])
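A variant of the same loop, assuming the dict is named test as above: it collects the codes into a list using dict.get (so a product without a CODE key yields None instead of raising), and, since the sample repeats SKV4140GL, keeps only the distinct codes in their original order via dict.fromkeys. product_codes is a hypothetical name and the sample dict is a trimmed stand-in.

```python
def product_codes(data):
    """Return the CODE of every product, in order (None where a product has no CODE)."""
    products = data["Envelope"]["Body"]["GetCTProductsResponse"]["GetCTProductsResult"]["CTPRODUCT"]
    return [product.get("CODE") for product in products]

# Trimmed, hypothetical stand-in for the dict from the question
test = {"Envelope": {"Body": {"GetCTProductsResponse": {"GetCTProductsResult": {"CTPRODUCT": [
    {"CODE": "TESLAAIR3"}, {"CODE": "SKV4140GL"}, {"CODE": "SKV4140GL"}, {"CODE": "BHR4802GL"},
]}}}}}

codes = product_codes(test)
unique_codes = list(dict.fromkeys(codes))  # dicts keep insertion order (Python 3.7+)
print(codes)         # ['TESLAAIR3', 'SKV4140GL', 'SKV4140GL', 'BHR4802GL']
print(unique_codes)  # ['TESLAAIR3', 'SKV4140GL', 'BHR4802GL']
```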

Comments:

It doesn't work. I get the following error: print(final_json["Envelope"]["Body"]["GetCTProductsResponse"] KeyError: 'Envelope'
Strange, I copy-pasted your dict and it works for me.

Answer 3:

Another approach is as follows:

# `data` is the same nested dict as the one shown in the question

import pandas as pd

# json_normalize flattens nested dicts into dotted column names
results = pd.json_normalize(data)

Now define the following function:

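def flatten_nested_json_df(df):
    df = df.reset_index()

    # Columns whose cells are all lists (explode into rows) or all dicts (expand into columns).
    # Note: DataFrame.applymap is deprecated since pandas 2.1 in favour of DataFrame.map.
    s = (df.applymap(type) == list).all()
    list_columns = s[s].index.tolist()

    s = (df.applymap(type) == dict).all()
    dict_columns = s[s].index.tolist()

    # Keep flattening until no list or dict columns remain
    while len(list_columns) > 0 or len(dict_columns) > 0:
        new_columns = []

        for col in dict_columns:
            # Expand each dict into its own columns, prefixed with the parent column name
            horiz_exploded = pd.json_normalize(df[col]).add_prefix(f'{col}.')
            horiz_exploded.index = df.index
            df = pd.concat([df, horiz_exploded], axis=1).drop(columns=[col])
            new_columns.extend(horiz_exploded.columns)  # inplace

        for col in list_columns:
            # print(f"exploding: {col}")
            # One row per list element; the original index value is repeated on each row
            df = df.drop(columns=[col]).join(df[col].explode().to_frame())
            new_columns.append(col)

        # Only the newly created columns can still hold nested lists or dicts
        s = (df[new_columns].applymap(type) == list).all()
        list_columns = s[s].index.tolist()

        s = (df[new_columns].applymap(type) == dict).all()
        dict_columns = s[s].index.tolist()
    return df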

and then run:

results = pd.json_normalize(data)
flatten_nested_json_df(results)

It returns a DataFrame from which you can select whatever you want:

 index  \
0      0   
0      0   
0      0   
0      0   
0      0   

  Envelope.Body.GetCTProductsResponse.GetCTProductsResult.CTPRODUCT.CODE  \
0                                          TESLAAIR3                       
0                                          SKV4140GL                       
0                                          SKV4140GL                       
0                                          SKV4140GL                       
0                                          BHR4802GL                       

  Envelope.Body.GetCTProductsResponse.GetCTProductsResult.CTPRODUCT.PRODUCTGROUPCODE  \
0                                           AIRPURIF                                   
0                                           AIRPURIF                                   
0                                           AIRPURIF                                   
0                                           AIRPURIF                                   
0                                           ZVUCNICI                                   

  Envelope.Body.GetCTProductsResponse.GetCTProductsResult.CTPRODUCT.NAME  \
0                           Tesla Air purifier AIR 3                       
0           Xiaomi MI SMART Antibacterial humidifier                       
0           Xiaomi MI SMART Antibacterial humidifier                       
0           Xiaomi MI SMART Antibacterial humidifier                       
0          Xiaomi Mi Portable Bluetooth Speaker Grey                       

  Envelope.Body.GetCTProductsResponse.GetCTProductsResult.CTPRODUCT.MANUFACTURER  \
0                                              Tesla                               
0                                             Xiaomi                               
0                                             Xiaomi                               
0                                             Xiaomi                               
0                                             Xiaomi                               

  Envelope.Body.GetCTProductsResponse.GetCTProductsResult.CTPRODUCT.MANUFACTURERCODE  \
0                                              TESLA                                   
0                                             XIAOMI                                   
0                                             XIAOMI                                   
0                                             XIAOMI                                   
0                                             XIAOMI                                   

  Envelope.Body.GetCTProductsResponse.GetCTProductsResult.CTPRODUCT.QTTYINSTOCK  \
0                                             >20                              
0                                             >20                              
0                                             >20                              
0                                             >20                              
0                                             >20                              

   Envelope.Body.GetCTProductsResponse.GetCTProductsResult.CTPRODUCT.TAX  \
0                                                 21                       
0                                                 21                       
0                                                 21                       
0                                                 21                       
0                                                 21                       

  Envelope.Body.GetCTProductsResponse.GetCTProductsResult.CTPRODUCT.PRICE  \
0                                            69,9000                        
0                                            39,0000                        
0                                            39,0000                        
0                                            39,0000                        
0                                            17,0000                        

   Envelope.Body.GetCTProductsResponse.GetCTProductsResult.CTPRODUCT.RETAILPRICE  \
0                                                  0                               
0                                                  0                               
0                                                  0                               
0                                                  0                               
0                                                  0                               

  Envelope.Body.GetCTProductsResponse.GetCTProductsResult.CTPRODUCT.SHORT_DESCRIPTION  \
0                                                                                       
0                                                                                       
0                                                                                       
0                                                                                       
0                                                                                       

  Envelope.Body.GetCTProductsResponse.GetCTProductsResult.CTPRODUCT.WARRANTY  \
0                                                24M                           
0                                                 2G                           
0                                                 2G                           
0                                                 2G                           
0                                                 2G                           

  Envelope.Body.GetCTProductsResponse.GetCTProductsResult.CTPRODUCT.EUR_ExchangeRate  \
0                                               0,00                                   
0                                               0,00                                   
0                                               0,00                                   
0                                               0,00                                   
0                                               0,00                                   

  Envelope.Body.GetCTProductsResponse.GetCTProductsResult.CTPRODUCT.BARCODE  \
0                                                                             
0                                                                             
0                                                                             
0                                                                             
0                                                                             

  Envelope.Body.GetCTProductsResponse.GetCTProductsResult.CTPRODUCT.IMAGE_URL  
0                                                                              
0  http://www.ct4partners.ba/UploadDownload/Produ...                           
0  http://www.ct4partners.ba/UploadDownload/Produ...                           
0  http://www.ct4partners.ba/UploadDownload/Produ...                           
0  http://www.ct4partners.ba/UploadDownload/Produ...         

It also has the advantage of showing, in the column names, the path to what you need, so:

data['Envelope']['Body']['GetCTProductsResponse']['GetCTProductsResult']['CTPRODUCT'][0]['CODE']
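The dotted column names are just the key path joined with dots. Below is a minimal pure-Python sketch of that idea; it is not pandas' implementation (this version keeps list positions in the key instead of exploding lists into rows), flatten_paths is a hypothetical name, and the sample is trimmed, skipping some intermediate keys.

```python
def flatten_paths(obj, prefix=""):
    """Map dotted key paths (list positions included) to leaf values."""
    flat = {}
    if isinstance(obj, dict):
        items = obj.items()
    elif isinstance(obj, list):
        items = enumerate(obj)
    else:
        return {prefix: obj}  # a leaf value: store it under the path built so far
    for key, value in items:
        path = f"{prefix}.{key}" if prefix else str(key)
        flat.update(flatten_paths(value, path))
    return flat

# Trimmed, hypothetical sample (the real data has more levels between Body and CTPRODUCT)
sample = {"Envelope": {"Body": {"CTPRODUCT": [{"CODE": "TESLAAIR3"}, {"CODE": "SKV4140GL"}]}}}
for path, value in flatten_paths(sample).items():
    print(path, "->", value)
# Envelope.Body.CTPRODUCT.0.CODE -> TESLAAIR3
# Envelope.Body.CTPRODUCT.1.CODE -> SKV4140GL
```

Each dotted key reads directly as an indexing chain: Envelope.Body.CTPRODUCT.0.CODE corresponds to sample['Envelope']['Body']['CTPRODUCT'][0]['CODE'].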

Answer 4:

I suggest thinking about the structure of the object. This code is somewhat over-defensive, but it will help you pinpoint errors faster:

import json

json_string = "..."  # placeholder: your whole JSON text (a str) from the question
ctproducts = None
try:
    ct_products_object = json.loads(json_string)
    ct_envelope = ct_products_object.get("Envelope")  # see the note below
    ct_body = ct_envelope.get("Body")
    get_ct_products = ct_body.get("GetCTProductsResponse")
    get_ct_result = get_ct_products.get("GetCTProductsResult")
    ctproducts = get_ct_result.get("CTPRODUCT")
except json.JSONDecodeError as e:
    print(f"This is not a valid JSON file because {str(e)}")
    # Not a JSON file, handle that
except AttributeError as e:
    # .get() returned None for a missing key, and None has no .get() method
    print(str(e))
    # You got one of the keys wrong, handle that.
if ctproducts:
    for ctproduct in ctproducts:
        code = ctproduct.get("CODE")
        print(f"CODE: {code}")

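It looks like you are not actually getting the root of the JSON in your code, so you may want to skip ct_envelope and use ct_body = ct_products_object.get("Body"). If that raises an AttributeError, try get_ct_products = ct_products_object.get("GetCTProductsResponse"), and so on, until you find which part of the object is its actual root.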
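The trial-and-error procedure just described can be automated. This is a hypothetical sketch (get_products and PRODUCT_PATH are illustrative names): it tries the full key path from the question first, then drops leading keys one at a time until it finds one that exists at the top level, and reports which keys it ended up using.

```python
# Full key path from the question, outermost key first
PRODUCT_PATH = ["Envelope", "Body", "GetCTProductsResponse", "GetCTProductsResult", "CTPRODUCT"]

def get_products(obj, path=PRODUCT_PATH):
    """Return (keys_used, products), skipping leading keys that are absent at the top level."""
    if not isinstance(obj, dict):
        raise TypeError(f"expected a dict, got {type(obj).__name__}")
    for start, first_key in enumerate(path):
        if first_key in obj:
            node = obj
            for key in path[start:]:
                node = node[key]  # a KeyError here means the rest of the path differs too
            return path[start:], node
    raise KeyError(f"no key from {path} at the top level; top-level keys are {list(obj)}")

# Hypothetical object whose root is already the content of "Body"
body_only = {"GetCTProductsResponse": {"GetCTProductsResult": {"CTPRODUCT": [{"CODE": "TESLAAIR3"}]}}}
keys_used, products = get_products(body_only)
print(keys_used)            # ['GetCTProductsResponse', 'GetCTProductsResult', 'CTPRODUCT']
print(products[0]["CODE"])  # TESLAAIR3
```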

Answer 5:

You can write a function find_json that gives you the path to the first occurrence of the key "CODE", together with its value:

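def find_json(key, json, acc=[]):
    # acc is only concatenated, never mutated, so the list default is safe here
    if isinstance(json, list):
        for i, x in enumerate(json):
            if isinstance(x, (list, dict)):
                found = find_json(key, x, acc + [i])
                if found is not None:  # keep searching siblings if this branch had no match
                    return found
    elif isinstance(json, dict):
        for k, v in json.items():
            if k == key:
                return acc + [k], v
            elif isinstance(v, (list, dict)):
                found = find_json(key, v, acc + [k])
                if found is not None:
                    return found
    return None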

which you run as:

find_json('CODE', json)
# returning the path to the first 'CODE' in `json` and the value:
(['Envelope',
  'Body',
  'GetCTProductsResponse',
  'GetCTProductsResult',
  'CTPRODUCT',
  0,
  'CODE'],
 'TESLAAIR3')
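Because find_json returns the path as a plain list of keys and indices, you can also follow it back to the value without building code strings, using functools.reduce with operator.getitem. follow_path is a hypothetical helper name and the sample is a trimmed stand-in with the same shape as the question's dict.

```python
from functools import reduce
from operator import getitem

def follow_path(obj, path):
    """Apply obj[p0][p1]... for each key or index in path."""
    return reduce(getitem, path, obj)

# Trimmed, hypothetical sample with the same shape as the question's dict
sample = {"Envelope": {"Body": {"GetCTProductsResponse": {"GetCTProductsResult": {"CTPRODUCT": [{"CODE": "TESLAAIR3"}]}}}}}
path = ["Envelope", "Body", "GetCTProductsResponse", "GetCTProductsResult", "CTPRODUCT", 0, "CODE"]
print(follow_path(sample, path))  # TESLAAIR3
```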

We can improve the output by generating the code that accesses the first CODE, as follows:

from functools import reduce

def path_to_code(lst, dict_name="json"):
    # Build an indexing expression: string keys are quoted, list indices are not
    return reduce(lambda x, y: f'{x}["{y}"]' if isinstance(y, str) else f"{x}[{y}]", lst, dict_name)

def find_json(key, json, acc=[]):
    if isinstance(json, list):
        for i, x in enumerate(json):
            if isinstance(x, (list, dict)):
                found = find_json(key, x, acc + [i])
                if found is not None:
                    return found
    elif isinstance(json, dict):
        for k, v in json.items():
            if k == key:
                return path_to_code(acc + [k], "json"), v
            elif isinstance(v, (list, dict)):
                found = find_json(key, v, acc + [k])
                if found is not None:
                    return found
    return None

find_json('CODE', json)

# which outputs:
('json["Envelope"]["Body"]["GetCTProductsResponse"]["GetCTProductsResult"]["CTPRODUCT"][0]["CODE"]',
 'TESLAAIR3')

Finally, to find all paths to the "CODE" key in this JSON:

from functools import reduce

def path_to_code(lst, dict_name="json"):
    return reduce(lambda x, y: f'{x}["{y}"]' if isinstance(y, str) else f"{x}[{y}]", lst, dict_name)

def find_all_json(key, json):
    res = []
    def _find_all_json(key, json, acc=[]):
        if isinstance(json, list):
            for i, x in enumerate(json):
                if isinstance(x, (list, dict)):
                    _find_all_json(key, x, acc + [i])
        elif isinstance(json, dict):
            for k, v in json.items():
                if k == key:
                    res.append({"cmd": path_to_code(acc + [k], "json"), "val": v})
                # Recurse even after a match, in case the value contains the key again
                if isinstance(v, (list, dict)):
                    _find_all_json(key, v, acc + [k])
    _find_all_json(key, json)
    return res

find_all_json('CODE', json)
# [{'cmd': 'json["Envelope"]["Body"]["GetCTProductsResponse"]["GetCTProductsResult"]["CTPRODUCT"][0]["CODE"]',
#   'val': 'TESLAAIR3'},
#  {'cmd': 'json["Envelope"]["Body"]["GetCTProductsResponse"]["GetCTProductsResult"]["CTPRODUCT"][1]["CODE"]',
#   'val': 'SKV4140GL'},
#  {'cmd': 'json["Envelope"]["Body"]["GetCTProductsResponse"]["GetCTProductsResult"]["CTPRODUCT"][2]["CODE"]',
#   'val': 'SKV4140GL'},
#  {'cmd': 'json["Envelope"]["Body"]["GetCTProductsResponse"]["GetCTProductsResult"]["CTPRODUCT"][3]["CODE"]',
#   'val': 'SKV4140GL'},
#  {'cmd': 'json["Envelope"]["Body"]["GetCTProductsResponse"]["GetCTProductsResult"]["CTPRODUCT"][4]["CODE"]',
#   'val': 'BHR4802GL'}]
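A generator-based alternative, offered as a sketch rather than a drop-in replacement: it yields (path, value) pairs lazily instead of building code strings, so you can stop at the first match with next() or collect every match with list(). iter_key_paths is a hypothetical name, and only dict keys are compared against the search key, as in find_all_json.

```python
def iter_key_paths(key, obj, path=()):
    """Yield (path, value) for every dict key equal to key, depth-first, in document order."""
    if isinstance(obj, dict):
        for k, v in obj.items():
            if k == key:
                yield list(path) + [k], v
            # Recurse even after a match, in case the value contains the key again
            yield from iter_key_paths(key, v, path + (k,))
    elif isinstance(obj, list):
        for i, v in enumerate(obj):
            yield from iter_key_paths(key, v, path + (i,))
    # Scalars have no children, so nothing is yielded for them

# Trimmed, hypothetical sample
sample = {"CTPRODUCT": [{"CODE": "TESLAAIR3"}, {"CODE": "SKV4140GL"}]}
for found_path, value in iter_key_paths("CODE", sample):
    print(found_path, value)
# ['CTPRODUCT', 0, 'CODE'] TESLAAIR3
# ['CTPRODUCT', 1, 'CODE'] SKV4140GL
```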
