Pandas json_normalize returns KeyError
【Posted】: 2021-03-26 05:54:54
【Question】: I have a dataset loaded from a JSON file, in the following format:
data = {'data': {'content': [{'gender': 'Female',
    'id': 'covid-1004200003256',
    'state_code': '3272',
    'district_code': '3272040',
    'subdistrict_code': '3272040004',
    'latitude': -6.906,
    'longitude': 106.923,
    'state_name': 'KOTA SUKABUMI',
    'district_name': 'Gunungpuyuh',
    'subdistrict_name': 'Karamat',
    'stage': 'Isolated',
    'status': 'SUSPECT'},
   {'gender': 'Female',
    'id': 'covid-1004200003255',
    'state_code': '3272',
    'district_code': '3272040',
    'subdistrict_code': '3272040004',
    'latitude': -6.906,
    'longitude': 106.923,
    'state_name': 'KOTA SUKABUMI',
    'district_name': 'Gunungpuyuh',
    'subdistrict_name': 'Karamat',
    'stage': 'Isolated',
    'status': 'SUSPECT'}]}}
So I want to use json_normalize to build a DataFrame:
df = pd.json_normalize(data, 'content')
df.head(10)
But it returns:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-36-4d8ad8c8743a> in <module>()
----> 1 df = pd.json_normalize(data, 'content')
2 df.head(10)
3 frames
/usr/local/lib/python3.6/dist-packages/pandas/io/json/_normalize.py in _json_normalize(data, record_path, meta, meta_prefix, record_prefix, errors, sep, max_level)
334 records.extend(recs)
335
--> 336 _recursive_extract(data, record_path, {}, level=0)
337
338 result = DataFrame(records)
/usr/local/lib/python3.6/dist-packages/pandas/io/json/_normalize.py in _recursive_extract(data, path, seen_meta, level)
307 else:
308 for obj in data:
--> 309 recs = _pull_records(obj, path[0])
310 recs = [
311 nested_to_record(r, sep=sep, max_level=max_level)
/usr/local/lib/python3.6/dist-packages/pandas/io/json/_normalize.py in _pull_records(js, spec)
246 if has non iterable value.
247 """
--> 248 result = _pull_field(js, spec)
249
250 # GH 31507 GH 30145, GH 26284 if result is not list, raise TypeError if not
/usr/local/lib/python3.6/dist-packages/pandas/io/json/_normalize.py in _pull_field(js, spec)
237 result = result[field]
238 else:
--> 239 result = result[spec]
240 return result
241
KeyError: 'content'
Any ideas on how to fix this?
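【Answer 1】: Your command fails because you are passing a second-level nested key (content) as record_path. A string record_path is looked up directly on the object you pass in, so it has to be a first-level key of that object, and the only first-level key of data is 'data'.
So you need to pass data['data'] instead, as shown below: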
In [934]: df = pd.json_normalize(data['data'], 'content')
In [934]: df
Out[934]:
gender id state_code district_code subdistrict_code latitude longitude state_name district_name subdistrict_name stage status
0 Female covid-1004200003256 3272 3272040 3272040004 -6.906 106.923 KOTA SUKABUMI Gunungpuyuh Karamat Isolated SUSPECT
1 Female covid-1004200003255 3272 3272040 3272040004 -6.906 106.923 KOTA SUKABUMI Gunungpuyuh Karamat Isolated SUSPECT
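The failure and the fix can be reproduced with a small sketch. It uses a trimmed copy of the question's structure (only three fields per record, an assumption made for brevity). It also shows that record_path accepts a list of keys, so the full path can be given from the top-level object instead of indexing into it first.

```python
import pandas as pd

# Trimmed copy of the question's structure (fewer fields, for illustration only).
data = {
    "data": {
        "content": [
            {"gender": "Female", "id": "covid-1004200003256", "latitude": -6.906},
            {"gender": "Female", "id": "covid-1004200003255", "latitude": -6.906},
        ]
    }
}

# 'content' is not a top-level key of `data`, so this lookup raises KeyError.
try:
    pd.json_normalize(data, "content")
    failed = False
except KeyError:
    failed = True

# Option 1: step into the outer dict first, then name the record key.
df_a = pd.json_normalize(data["data"], "content")

# Option 2: keep the top-level object and give the full path as a list of keys.
df_b = pd.json_normalize(data, ["data", "content"])
```

Both options should produce the same two-row frame; the list form is convenient when the outer object also holds fields you want to pull in through the meta argument.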
【Answer 2】: Try passing the array of records directly:
df = pd.json_normalize(data['data']['content'])
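As a rough illustration of this approach, the sketch below indexes down to the list of records and normalizes it. The sample is a shortened stand-in for the question's data (three fields per record, chosen here for brevity). Because these records contain no nested dicts, the plain DataFrame constructor is expected to give the same result.

```python
import pandas as pd

# Shortened stand-in for the question's data (illustrative subset of fields).
data = {"data": {"content": [
    {"id": "covid-1004200003256", "stage": "Isolated", "status": "SUSPECT"},
    {"id": "covid-1004200003255", "stage": "Isolated", "status": "SUSPECT"},
]}}

records = data["data"]["content"]   # plain list of dicts
df = pd.json_normalize(records)     # one row per record, one column per key
df_plain = pd.DataFrame(records)    # same frame here, since the records are flat
```

json_normalize becomes the better choice once individual records contain nested dicts, since it flattens those into dotted column names.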