Accessing nested JSON using Python
【Posted】: 2022-01-22 22:49:42
【Question】: I'm having a hard time finding the right way to print my results from the JSON I have.
I've searched for many hours without finding an answer.
This is the JSON I have:
json = \
{
  "Envelope": {
    "Body": {
      "GetCTProductsResponse": {
        "GetCTProductsResult": {
          "CTPRODUCT": [
            {
              "CODE": "TESLAAIR3",
              "PRODUCTGROUPCODE": "AIRPURIF",
              "NAME": "Tesla Air purifier AIR 3",
              "MANUFACTURER": "Tesla",
              "MANUFACTURERCODE": "TESLA",
              "QTTYINSTOCK": ">20",
              "TAX": 21,
              "PRICE": "69,9000",
              "RETAILPRICE": 0,
              "SHORT_DESCRIPTION": "",
              "WARRANTY": "24M",
              "EUR_ExchangeRate": "0,00",
              "BARCODE": "",
              "IMAGE_URL": ""
            },
            {
              "CODE": "SKV4140GL",
              "PRODUCTGROUPCODE": "AIRPURIF",
              "NAME": "Xiaomi MI SMART Antibacterial humidifier",
              "MANUFACTURER": "Xiaomi",
              "MANUFACTURERCODE": "XIAOMI",
              "QTTYINSTOCK": ">20",
              "TAX": 21,
              "PRICE": "39,0000",
              "RETAILPRICE": 0,
              "SHORT_DESCRIPTION": "",
              "WARRANTY": "2G",
              "EUR_ExchangeRate": "0,00",
              "BARCODE": "",
              "IMAGE_URL": "http://www.ct4partners.ba/UploadDownload/ProductImages/SKV4140GL_201117093216482.jpg"
            },
            {
              "CODE": "SKV4140GL",
              "PRODUCTGROUPCODE": "AIRPURIF",
              "NAME": "Xiaomi MI SMART Antibacterial humidifier",
              "MANUFACTURER": "Xiaomi",
              "MANUFACTURERCODE": "XIAOMI",
              "QTTYINSTOCK": ">20",
              "TAX": 21,
              "PRICE": "39,0000",
              "RETAILPRICE": 0,
              "SHORT_DESCRIPTION": "",
              "WARRANTY": "2G",
              "EUR_ExchangeRate": "0,00",
              "BARCODE": "",
              "IMAGE_URL": "http://www.ct4partners.ba/UploadDownload/ProductImages/SKV4140GL_201117093216098.jpg"
            },
            {
              "CODE": "SKV4140GL",
              "PRODUCTGROUPCODE": "AIRPURIF",
              "NAME": "Xiaomi MI SMART Antibacterial humidifier",
              "MANUFACTURER": "Xiaomi",
              "MANUFACTURERCODE": "XIAOMI",
              "QTTYINSTOCK": ">20",
              "TAX": 21,
              "PRICE": "39,0000",
              "RETAILPRICE": 0,
              "SHORT_DESCRIPTION": "",
              "WARRANTY": "2G",
              "EUR_ExchangeRate": "0,00",
              "BARCODE": "",
              "IMAGE_URL": "http://www.ct4partners.ba/UploadDownload/ProductImages/SKV4140GL_201117093215238.jpg"
            },
            {
              "CODE": "BHR4802GL",
              "PRODUCTGROUPCODE": "ZVUCNICI",
              "NAME": "Xiaomi Mi Portable Bluetooth Speaker Grey",
              "MANUFACTURER": "Xiaomi",
              "MANUFACTURERCODE": "XIAOMI",
              "QTTYINSTOCK": ">20",
              "TAX": 21,
              "PRICE": "17,0000",
              "RETAILPRICE": 0,
              "SHORT_DESCRIPTION": "",
              "WARRANTY": "2G",
              "EUR_ExchangeRate": "0,00",
              "BARCODE": "",
              "IMAGE_URL": "http://www.ct4partners.ba/UploadDownload/ProductImages/BHR4802GL_1.jpg"
            }
          ]
        }
      }
    }
  }
}
Using Python, I want to access the CODE value ("CODE": "TESLAAIR3").
I've tried many things (dumps, load, loads, etc.), but none of them solved my problem.
Thank you.
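【Comments】:
Is this stored in a .json file?
What you show isn't JSON; it's a Python dict literal, representing the dict that json.loads might return given a string like '{"Envelope": {"Body": ...}}'.
No, it isn't; it comes from an API call. I've shown only a few products here, but the API returns about 4000 products (50,000 lines).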
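The distinction raised in the comments above (JSON text versus a Python dict) can be illustrated with a minimal sketch. The shortened sample text and the names `raw_text` and `parsed` are assumptions made for this illustration; they do not come from the question.

```python
import json

# Hypothetical, shortened response body: JSON is text, so it arrives as a str.
raw_text = '{"Envelope": {"Body": {"CTPRODUCT": [{"CODE": "TESLAAIR3"}]}}}'

# json.loads parses JSON text into Python objects (dict, list, str, int, ...).
parsed = json.loads(raw_text)

print(type(raw_text).__name__)  # str
print(type(parsed).__name__)    # dict

# Only the parsed dict can be indexed by key.
first_code = parsed["Envelope"]["Body"]["CTPRODUCT"][0]["CODE"]
print(first_code)               # TESLAAIR3
```

If the API client already returns a dict, no parsing step is needed and the lookup can be applied to it directly.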
【Answer 1】:
You can try:
a["Envelope"]["Body"]["GetCTProductsResponse"]["GetCTProductsResult"]["CTPRODUCT"][0]["CODE"]
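If a lookup like the one above raises `KeyError` on the very first key, one way to find out why is to look at the object's type and its top-level keys. The sketch below is only an illustration: the wrapper key `"d"` and the helper name `describe` are hypothetical, chosen to show an object whose root sits one level above `"Envelope"`.

```python
# Hypothetical object whose root is one level above "Envelope".
final_json = {"d": {"Envelope": {"Body": {"CTPRODUCT": [{"CODE": "TESLAAIR3"}]}}}}


def describe(obj):
    """Return a short description of the top level of a parsed object."""
    if isinstance(obj, dict):
        return f"dict with keys {list(obj.keys())}"
    if isinstance(obj, list):
        return f"list of length {len(obj)}"
    return type(obj).__name__


print(describe(final_json))       # dict with keys ['d']
print(describe(final_json["d"]))  # dict with keys ['Envelope']
```

Printing the whole object can hide an extra outer level in thousands of lines of output, whereas the key list makes it visible at a glance.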
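【Comments】:
I get the following error: print(final_json["Envelope"]["Body"]["GetCTProductsResponse"] KeyError: 'Envelope'
Check what final_json contains. Try: print(final_json)
It prints the whole JSON, just like the one I posted.
What you posted is a Python dict that does have Envelope as a key, so what you put in the question isn't what you are actually working with.
@BlerdiKoliq Check type(final_json). What does it give?
【Answer 2】: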
This will print all the codes:
for code in test["Envelope"]["Body"]["GetCTProductsResponse"]["GetCTProductsResult"]["CTPRODUCT"]:
    print(code["CODE"])
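As a variation on the loop above, offered as a sketch rather than as part of the answer, the codes can be collected into a list and de-duplicated while keeping their order. The shortened `test` dictionary is a stand-in for the full data, and the names `products`, `codes` and `unique_codes` are assumptions.

```python
# Shortened stand-in for the dictionary from the question.
test = {"Envelope": {"Body": {"GetCTProductsResponse": {"GetCTProductsResult": {"CTPRODUCT": [
    {"CODE": "TESLAAIR3"},
    {"CODE": "SKV4140GL"},
    {"CODE": "SKV4140GL"},
    {"CODE": "BHR4802GL"},
]}}}}}

products = test["Envelope"]["Body"]["GetCTProductsResponse"]["GetCTProductsResult"]["CTPRODUCT"]

# One entry per product, in the order the products appear.
codes = [product["CODE"] for product in products]

# dict.fromkeys keeps the first occurrence of each code and preserves order.
unique_codes = list(dict.fromkeys(codes))

print(codes)
print(unique_codes)
```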
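【Comments】:
It doesn't work without an id. I get the following error: print(final_json["Envelope"]["Body"]["GetCTProductsResponse"] KeyError: 'Envelope'
Strange, I copy-pasted your dictionary and it works for me.
【Answer 3】: Another approach is as follows: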
data = {
  "Envelope": {
    "Body": {
      "GetCTProductsResponse": {
        "GetCTProductsResult": {
          "CTPRODUCT": [
            {
              "CODE": "TESLAAIR3",
              "PRODUCTGROUPCODE": "AIRPURIF",
              "NAME": "Tesla Air purifier AIR 3",
              "MANUFACTURER": "Tesla",
              "MANUFACTURERCODE": "TESLA",
              "QTTYINSTOCK": ">20",
              "TAX": 21,
              "PRICE": "69,9000",
              "RETAILPRICE": 0,
              "SHORT_DESCRIPTION": "",
              "WARRANTY": "24M",
              "EUR_ExchangeRate": "0,00",
              "BARCODE": "",
              "IMAGE_URL": ""
            },
            {
              "CODE": "SKV4140GL",
              "PRODUCTGROUPCODE": "AIRPURIF",
              "NAME": "Xiaomi MI SMART Antibacterial humidifier",
              "MANUFACTURER": "Xiaomi",
              "MANUFACTURERCODE": "XIAOMI",
              "QTTYINSTOCK": ">20",
              "TAX": 21,
              "PRICE": "39,0000",
              "RETAILPRICE": 0,
              "SHORT_DESCRIPTION": "",
              "WARRANTY": "2G",
              "EUR_ExchangeRate": "0,00",
              "BARCODE": "",
              "IMAGE_URL": "http://www.ct4partners.ba/UploadDownload/ProductImages/SKV4140GL_201117093216482.jpg"
            },
            {
              "CODE": "SKV4140GL",
              "PRODUCTGROUPCODE": "AIRPURIF",
              "NAME": "Xiaomi MI SMART Antibacterial humidifier",
              "MANUFACTURER": "Xiaomi",
              "MANUFACTURERCODE": "XIAOMI",
              "QTTYINSTOCK": ">20",
              "TAX": 21,
              "PRICE": "39,0000",
              "RETAILPRICE": 0,
              "SHORT_DESCRIPTION": "",
              "WARRANTY": "2G",
              "EUR_ExchangeRate": "0,00",
              "BARCODE": "",
              "IMAGE_URL": "http://www.ct4partners.ba/UploadDownload/ProductImages/SKV4140GL_201117093216098.jpg"
            },
            {
              "CODE": "SKV4140GL",
              "PRODUCTGROUPCODE": "AIRPURIF",
              "NAME": "Xiaomi MI SMART Antibacterial humidifier",
              "MANUFACTURER": "Xiaomi",
              "MANUFACTURERCODE": "XIAOMI",
              "QTTYINSTOCK": ">20",
              "TAX": 21,
              "PRICE": "39,0000",
              "RETAILPRICE": 0,
              "SHORT_DESCRIPTION": "",
              "WARRANTY": "2G",
              "EUR_ExchangeRate": "0,00",
              "BARCODE": "",
              "IMAGE_URL": "http://www.ct4partners.ba/UploadDownload/ProductImages/SKV4140GL_201117093215238.jpg"
            },
            {
              "CODE": "BHR4802GL",
              "PRODUCTGROUPCODE": "ZVUCNICI",
              "NAME": "Xiaomi Mi Portable Bluetooth Speaker Grey",
              "MANUFACTURER": "Xiaomi",
              "MANUFACTURERCODE": "XIAOMI",
              "QTTYINSTOCK": ">20",
              "TAX": 21,
              "PRICE": "17,0000",
              "RETAILPRICE": 0,
              "SHORT_DESCRIPTION": "",
              "WARRANTY": "2G",
              "EUR_ExchangeRate": "0,00",
              "BARCODE": "",
              "IMAGE_URL": "http://www.ct4partners.ba/UploadDownload/ProductImages/BHR4802GL_1.jpg"
            }
          ]
        }
      }
    }
  }
}
and
import pandas as pd
import json
json_object = json.dumps(data)
results = pd.json_normalize(data)
Now, define the following function:
def flatten_nested_json_df(df):
    # Note: on pandas >= 2.1, DataFrame.map replaces the deprecated applymap.
    df = df.reset_index()
    # Columns whose cells are all lists / all dicts still need flattening.
    s = (df.applymap(type) == list).all()
    list_columns = s[s].index.tolist()
    s = (df.applymap(type) == dict).all()
    dict_columns = s[s].index.tolist()
    while len(list_columns) > 0 or len(dict_columns) > 0:
        new_columns = []
        for col in dict_columns:
            # Expand each dict column into one column per key, prefixed with the parent name.
            horiz_exploded = pd.json_normalize(df[col]).add_prefix(f'{col}.')
            horiz_exploded.index = df.index
            df = pd.concat([df, horiz_exploded], axis=1).drop(columns=[col])
            new_columns.extend(horiz_exploded.columns)  # inplace
        for col in list_columns:
            # print(f"exploding: {col}")
            # Expand each list column into one row per list element.
            df = df.drop(columns=[col]).join(df[col].explode().to_frame())
            new_columns.append(col)
        # Re-check only the newly created columns for further nesting.
        s = (df[new_columns].applymap(type) == list).all()
        list_columns = s[s].index.tolist()
        s = (df[new_columns].applymap(type) == dict).all()
        dict_columns = s[s].index.tolist()
    return df
and then do this:
results = pd.json_normalize(data)
flatten_nested_json_df(results)
It returns a DataFrame from which you can select anything you want:
index \
0 0
0 0
0 0
0 0
0 0
Envelope.Body.GetCTProductsResponse.GetCTProductsResult.CTPRODUCT.CODE \
0 TESLAAIR3
0 SKV4140GL
0 SKV4140GL
0 SKV4140GL
0 BHR4802GL
Envelope.Body.GetCTProductsResponse.GetCTProductsResult.CTPRODUCT.PRODUCTGROUPCODE \
0 AIRPURIF
0 AIRPURIF
0 AIRPURIF
0 AIRPURIF
0 ZVUCNICI
Envelope.Body.GetCTProductsResponse.GetCTProductsResult.CTPRODUCT.NAME \
0 Tesla Air purifier AIR 3
0 Xiaomi MI SMART Antibacterial humidifier
0 Xiaomi MI SMART Antibacterial humidifier
0 Xiaomi MI SMART Antibacterial humidifier
0 Xiaomi Mi Portable Bluetooth Speaker Grey
Envelope.Body.GetCTProductsResponse.GetCTProductsResult.CTPRODUCT.MANUFACTURER \
0 Tesla
0 Xiaomi
0 Xiaomi
0 Xiaomi
0 Xiaomi
Envelope.Body.GetCTProductsResponse.GetCTProductsResult.CTPRODUCT.MANUFACTURERCODE \
0 TESLA
0 XIAOMI
0 XIAOMI
0 XIAOMI
0 XIAOMI
Envelope.Body.GetCTProductsResponse.GetCTProductsResult.CTPRODUCT.QTTYINSTOCK \
0 >20
0 >20
0 >20
0 >20
0 >20
Envelope.Body.GetCTProductsResponse.GetCTProductsResult.CTPRODUCT.TAX \
0 21
0 21
0 21
0 21
0 21
Envelope.Body.GetCTProductsResponse.GetCTProductsResult.CTPRODUCT.PRICE \
0 69,9000
0 39,0000
0 39,0000
0 39,0000
0 17,0000
Envelope.Body.GetCTProductsResponse.GetCTProductsResult.CTPRODUCT.RETAILPRICE \
0 0
0 0
0 0
0 0
0 0
Envelope.Body.GetCTProductsResponse.GetCTProductsResult.CTPRODUCT.SHORT_DESCRIPTION \
0
0
0
0
0
Envelope.Body.GetCTProductsResponse.GetCTProductsResult.CTPRODUCT.WARRANTY \
0 24M
0 2G
0 2G
0 2G
0 2G
Envelope.Body.GetCTProductsResponse.GetCTProductsResult.CTPRODUCT.EUR_ExchangeRate \
0 0,00
0 0,00
0 0,00
0 0,00
0 0,00
Envelope.Body.GetCTProductsResponse.GetCTProductsResult.CTPRODUCT.BARCODE \
0
0
0
0
0
Envelope.Body.GetCTProductsResponse.GetCTProductsResult.CTPRODUCT.IMAGE_URL
0
0 http://www.ct4partners.ba/UploadDownload/Produ...
0 http://www.ct4partners.ba/UploadDownload/Produ...
0 http://www.ct4partners.ba/UploadDownload/Produ...
0 http://www.ct4partners.ba/UploadDownload/Produ...
It also has the advantage of showing the path to what you need in the column names, so:
data['Envelope']['Body']['GetCTProductsResponse']['GetCTProductsResult']['CTPRODUCT'][0]['CODE']
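Going the other way, a dotted column name can be turned back into a lookup on the original dictionary. This is a minimal sketch under stated assumptions: the helper `get_by_column_path` and its `row` parameter are hypothetical, the sample `data` is shortened, and the approach assumes that no key contains a dot.

```python
def get_by_column_path(data, column, row=0):
    """Follow a dotted column name through nested dicts.

    Whenever a list is met along the way, element `row` is taken.
    """
    current = data
    for key in column.split("."):
        if isinstance(current, list):
            current = current[row]
        current = current[key]
    return current


# Shortened stand-in for the full dictionary.
data = {"Envelope": {"Body": {"CTPRODUCT": [{"CODE": "TESLAAIR3"}, {"CODE": "SKV4140GL"}]}}}

print(get_by_column_path(data, "Envelope.Body.CTPRODUCT.CODE"))         # TESLAAIR3
print(get_by_column_path(data, "Envelope.Body.CTPRODUCT.CODE", row=1))  # SKV4140GL
```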
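【Comments】:
【Answer 4】: I suggest thinking about the structure of the object. This code is somewhat over-specified, but it will help you pinpoint any errors faster: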
import json

json_string = ""  # Replace with your whole JSON text from the question
ctproducts = None
try:
    ct_products_object = json.loads(json_string)
    ct_envelope = ct_products_object.get("Envelope")  # See below
    ct_body = ct_envelope.get("Body")
    get_ct_products = ct_body.get("GetCTProductsResponse")
    get_ct_result = get_ct_products.get("GetCTProductsResult")
    ctproducts = get_ct_result.get("CTPRODUCT")
except json.JSONDecodeError as e:
    print(f"This is not a valid JSON file because {str(e)}")
    # Not a json file, handle that
except AttributeError as e:
    # .get() returned None for a missing key, so the next .get() failed.
    print(str(e))
    # You got one of the keys wrong, handle that.
if ctproducts:
    for ctproduct in ctproducts:
        code = ctproduct.get("CODE")
        print(f"CODE: {code}")
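It looks like you aren't actually getting the root of the JSON document in your code, so you may want to skip ct_envelope and use ct_body = ct_products_object.get("Body"). If that raises an AttributeError, try get_ct_products = ct_products_object.get("GetCTProductsResponse"), and so on, until you find which part of the object is the root of your current object.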
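The trial-and-error search described above can also be automated. The following is a rough sketch, not the answer author's code: the helpers `walk` and `find_products` are hypothetical names, and the sample object is an assumed case in which the root is `"Body"` rather than `"Envelope"`.

```python
KEYS = ["Envelope", "Body", "GetCTProductsResponse", "GetCTProductsResult", "CTPRODUCT"]


def walk(obj, keys):
    """Follow `keys` through nested dicts; return None if any step is missing."""
    for key in keys:
        if not isinstance(obj, dict) or key not in obj:
            return None
        obj = obj[key]
    return obj


def find_products(obj, keys=KEYS):
    """Try the key chain from each starting point; return (root_key, products)."""
    for start in range(len(keys)):
        found = walk(obj, keys[start:])
        if found is not None:
            return keys[start], found
    return None, None


# Assumed example: the object starts at "Body" instead of "Envelope".
partial = {"Body": {"GetCTProductsResponse": {"GetCTProductsResult": {
    "CTPRODUCT": [{"CODE": "TESLAAIR3"}]}}}}

root_key, products = find_products(partial)
print(root_key)                      # Body
print([p["CODE"] for p in products])  # ['TESLAAIR3']
```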
【Comments】:
【Answer 5】: You can write a function find_json that gives you the path to the first occurrence of the key "CODE" together with its value:
def find_json(key, json, acc=[]):
    if type(json) == list:
        for i, x in enumerate(json):
            if type(x) == list or type(x) == dict:
                # Keep searching the remaining elements if this branch has no match.
                found = find_json(key, x, acc + [i])
                if found is not None:
                    return found
    elif type(json) == dict:
        for k, v in json.items():
            if k == key:
                return acc + [k], v
            elif type(v) == list or type(v) == dict:
                found = find_json(key, v, acc + [k])
                if found is not None:
                    return found
You run it like this:
find_json('CODE', json)
# returning the path to the first 'CODE' in `json` and the value:
(['Envelope',
'Body',
'GetCTProductsResponse',
'GetCTProductsResult',
'CTPRODUCT',
0,
'CODE'],
'TESLAAIR3')
We can improve the output by generating the code that accesses the first CODE:
from functools import reduce

def path_to_code(lst, dict_name="json"):
    # Quote string keys, leave integer list indexes bare.
    return reduce(lambda x, y: f"{x}[\"{y}\"]" if type(y) == str else f"{x}[{y}]", lst, dict_name)

def find_json(key, json, acc=[]):
    if type(json) == list:
        for i, x in enumerate(json):
            if type(x) == list or type(x) == dict:
                found = find_json(key, x, acc + [i])
                if found is not None:
                    return found
    elif type(json) == dict:
        for k, v in json.items():
            if k == key:
                return path_to_code(acc + [k], "json"), v
            elif type(v) == list or type(v) == dict:
                found = find_json(key, v, acc + [k])
                if found is not None:
                    return found

find_json('CODE', json)
# which outputs:
('json["Envelope"]["Body"]["GetCTProductsResponse"]["GetCTProductsResult"]["CTPRODUCT"][0]["CODE"]',
 'TESLAAIR3')
Finally, to find all paths to the "CODE" key in this json:
from functools import reduce

def path_to_code(lst, dict_name="json"):
    return reduce(lambda x, y: f"{x}[\"{y}\"]" if type(y) == str else f"{x}[{y}]", lst, dict_name)

def find_all_json(key, json):
    res = []
    def _find_all_json(key, json, acc=[]):
        nonlocal res
        if type(json) == list:
            for i, x in enumerate(json):
                if type(x) == list or type(x) == dict:
                    _find_all_json(key, x, acc + [i])
        elif type(json) == dict:
            for k, v in json.items():
                if k == key:
                    res = res + [{"cmd": path_to_code(acc + [k], "json"), "val": v}]
                if type(v) == list or type(v) == dict:
                    _find_all_json(key, v, acc + [k])
    _find_all_json(key, json)
    return res

find_all_json('CODE', json)
# [{'cmd': 'json["Envelope"]["Body"]["GetCTProductsResponse"]["GetCTProductsResult"]["CTPRODUCT"][0]["CODE"]',
#   'val': 'TESLAAIR3'},
#  {'cmd': 'json["Envelope"]["Body"]["GetCTProductsResponse"]["GetCTProductsResult"]["CTPRODUCT"][1]["CODE"]',
#   'val': 'SKV4140GL'},
#  {'cmd': 'json["Envelope"]["Body"]["GetCTProductsResponse"]["GetCTProductsResult"]["CTPRODUCT"][2]["CODE"]',
#   'val': 'SKV4140GL'},
#  {'cmd': 'json["Envelope"]["Body"]["GetCTProductsResponse"]["GetCTProductsResult"]["CTPRODUCT"][3]["CODE"]',
#   'val': 'SKV4140GL'},
#  {'cmd': 'json["Envelope"]["Body"]["GetCTProductsResponse"]["GetCTProductsResult"]["CTPRODUCT"][4]["CODE"]',
#   'val': 'BHR4802GL'}]
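The same search can also be written as a generator, which avoids the `nonlocal` accumulator and yields matches lazily. This is an alternative sketch, not the answer's code: the name `iter_key_paths` and the small `sample` structure are assumptions made for illustration.

```python
def iter_key_paths(key, node, path=()):
    """Yield (path, value) for every occurrence of `key` in nested dicts and lists."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                yield path + (k,), v
            # Keep descending, since the key may occur again deeper down.
            yield from iter_key_paths(key, v, path + (k,))
    elif isinstance(node, list):
        for i, item in enumerate(node):
            yield from iter_key_paths(key, item, path + (i,))


# Small hypothetical structure with the key at two different depths.
sample = {"a": {"items": [{"CODE": "X"}, {"CODE": "Y", "sub": {"CODE": "Z"}}]}}

results = list(iter_key_paths("CODE", sample))
for found_path, value in results:
    print(found_path, value)
```

Because the paths are tuples of keys and indexes, they can be formatted into an access expression afterwards or used directly to walk the structure.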
【Comments】: