使用 pandas json_normalize 扁平化包含多个嵌套列表的字典列表
Posted
技术标签:
【中文标题】使用 pandas json_normalize 扁平化包含多个嵌套列表的字典列表【英文标题】:Flattening List of Dict containing multiple nested lists using pandas json_normalize 【发布时间】:2022-01-17 11:43:33 【问题描述】:我们有以下列表 -
users_info = [
"Id": "21",
"Name": "ABC",
"Country":
"Name": "Country 1"
,
"Addresses":
"records": [
"addressId": "12",
"line1": "xyz, 102",
"city": "PQR"
,
"addressId": "13",
"line1": "YTR, 102",
"city": "NMS"
]
,
"Education":
"records": [
"Id": "45",
"Degree": "Bachelors"
,
"Id": "49",
"Degree": "Masters"
]
,
"Id": "26",
"Name": "PEW",
"Country":
"Name": "Country 2"
,
"Addresses":
"records": [
"addressId": "10",
"line1": "BTR, 12",
"city": "UYT"
,
"addressId": "123",
"line1": "MEQW, 6",
"city": "KJH"
]
,
"Education":
"records": [
"Id": "45",
"Degree": "Bachelors"
,
"Id": "49",
"Degree": "Masters"
]
,
"Id": "214",
"Name": "TUF",
"Country": None,
"Addresses":
"records": None
,
"Education":
"records": [
"Id": "45",
"Degree": "Bachelors"
,
"Id": "49",
"Degree": "Masters"
]
,
"Id": "2609",
"Name": "JJU",
"Country":
"Name": "Country 2"
,
"Addresses":
"records": [
"addressId": "10",
"line1": "BTR, 12",
"city": "UYT"
,
"addressId": "123",
"line1": "MEQW, 6",
"city": "KJH"
]
,
"Education": None
]
我想展平这个字典列表并创建 pandas 数据框 -
Id | Name | Country.Name | Addresses.addressId | Addresses.line1 | Addresses.city | Education.Id | Education.Degree
有两个字典列表 - 地址和教育。这些(国家、地址和教育)中的任何一个都可能无
如何使用上述数据创建数据框以获得所需的格式?
这是我尝试过的 -
dataset_1 = pandas.json_normalize([user for user in users_info],
record_path = ["Addresses"],
meta = ["Id", "Name", ["Country", "Name"]],
errors="ignore"
)
dataset_2 = pandas.json_normalize([user for user in users_info],
record_path = ["Education"],
meta = ["Id", "Name", ["Country", "Name"]],
errors="ignore"
)
dataset = pandas.concat([dataset_1, dataset_2], axis=1)
当 dataset_1 执行时,我得到 -
NoneType object is not iterable error
【问题讨论】:
There is possibilty that any of these (Country, Addresses and Education) can be None
- 您可以将其添加到示例数据中吗?样本中没有None
。
为无添加示例数据
【参考方案1】:
你可以使用:
df = pd.json_normalize(users_info)
df_addresses = df['Addresses.records'].explode().apply(pd.Series)
df_addresses.rename(columns=col:f'Addresses.col' for col in df_addresses.columns, inplace=True)
df_education = df['Education.records'].explode().apply(pd.Series)
df_education.rename(columns=col:f'Education.col' for col in df_education.columns, inplace=True)
cols = [col for col in df.columns if col not in ['Addresses.records', 'Education.records']]
df = df[cols].join(df_addresses).join(df_education)
df.dropna(axis=1, how='all', inplace=True)
print(df)
OUTPUT
Id Name Country.Name Addresses.addressId Addresses.line1 Addresses.city Education.Degree Education.Id
0 21 ABC Country 1 12 xyz, 102 PQR Bachelors 45
0 21 ABC Country 1 12 xyz, 102 PQR Masters 49
0 21 ABC Country 1 13 YTR, 102 NMS Bachelors 45
0 21 ABC Country 1 13 YTR, 102 NMS Masters 49
1 26 PEW Country 2 10 BTR, 12 UYT Bachelors 45
1 26 PEW Country 2 10 BTR, 12 UYT Masters 49
1 26 PEW Country 2 123 MEQW, 6 KJH Bachelors 45
1 26 PEW Country 2 123 MEQW, 6 KJH Masters 49
2 214 TUF NaN NaN NaN NaN Bachelors 45
2 214 TUF NaN NaN NaN NaN Masters 49
3 2609 JJU Country 2 10 BTR, 12 UYT NaN NaN
3 2609 JJU Country 2 123 MEQW, 6 KJH NaN NaN
【讨论】:
以上是关于使用 pandas json_normalize 扁平化包含多个嵌套列表的字典列表的主要内容,如果未能解决你的问题,请参考以下文章
Pandas json_normalize 会产生令人困惑的“KeyError”消息?
使用 pandas json_normalize 扁平化包含多个嵌套列表的字典列表
在带有数组的嵌套 Json 上使用 Pandas json_normalize