Pandas json_normalize returns KeyError

【Title】Pandas json_normalize returns KeyError 【Posted】2021-03-26 05:54:54 【Question】:

I have a dataset loaded from a JSON file, in the following format:

data = {'data': {'content': [{'gender': 'Female',
    'id': 'covid-1004200003256',
    'state_code': '3272',
    'district_code': '3272040',
    'subdistrict_code': '3272040004',
    'latitude': -6.906,
    'longitude': 106.923,
    'state_name': 'KOTA SUKABUMI',
    'district_name': 'Gunungpuyuh',
    'subdistrict_name': 'Karamat',
    'stage': 'Isolated',
    'status': 'SUSPECT'},
   {'gender': 'Female',
    'id': 'covid-1004200003255',
    'state_code': '3272',
    'district_code': '3272040',
    'subdistrict_code': '3272040004',
    'latitude': -6.906,
    'longitude': 106.923,
    'state_name': 'KOTA SUKABUMI',
    'district_name': 'Gunungpuyuh',
    'subdistrict_name': 'Karamat',
    'stage': 'Isolated',
    'status': 'SUSPECT'}
    ]}}

So I want to build a DataFrame from it using json_normalize:

df = pd.json_normalize(data, 'content')
df.head(10)

But it returns:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-36-4d8ad8c8743a> in <module>()
----> 1 df = pd.json_normalize(data, 'content')
      2 df.head(10)

3 frames
/usr/local/lib/python3.6/dist-packages/pandas/io/json/_normalize.py in _json_normalize(data, record_path, meta, meta_prefix, record_prefix, errors, sep, max_level)
    334                 records.extend(recs)
    335 
--> 336     _recursive_extract(data, record_path, {}, level=0)
    337 
    338     result = DataFrame(records)

/usr/local/lib/python3.6/dist-packages/pandas/io/json/_normalize.py in _recursive_extract(data, path, seen_meta, level)
    307         else:
    308             for obj in data:
--> 309                 recs = _pull_records(obj, path[0])
    310                 recs = [
    311                     nested_to_record(r, sep=sep, max_level=max_level)

/usr/local/lib/python3.6/dist-packages/pandas/io/json/_normalize.py in _pull_records(js, spec)
    246         if has non iterable value.
    247         """
--> 248         result = _pull_field(js, spec)
    249 
    250         # GH 31507 GH 30145, GH 26284 if result is not list, raise TypeError if not

/usr/local/lib/python3.6/dist-packages/pandas/io/json/_normalize.py in _pull_field(js, spec)
    237                 result = result[field]
    238         else:
--> 239             result = result[spec]
    240         return result
    241 

KeyError: 'content'

Any idea how to fix this?

【Answer 1】:

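Your command fails because content is a second-level nested key. A string record_path is looked up directly in the object you pass in, and the top level of data contains only the key 'data', so only a first-level key works there.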

So you need to pass data['data'], like this:

In [934]: df = pd.json_normalize(data['data'], 'content')

In [934]: df
Out[934]: 
   gender                   id state_code district_code subdistrict_code  latitude  longitude     state_name district_name subdistrict_name     stage   status
0  Female  covid-1004200003256       3272       3272040       3272040004    -6.906    106.923  KOTA SUKABUMI   Gunungpuyuh          Karamat  Isolated  SUSPECT
1  Female  covid-1004200003255       3272       3272040       3272040004    -6.906    106.923  KOTA SUKABUMI   Gunungpuyuh          Karamat  Isolated  SUSPECT
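
To see where the KeyError comes from, here is a minimal sketch that reproduces it and then applies the fix above. The sample dict is a trimmed-down stand-in for the question's data (only three fields kept), and the `failed` flag is just an illustrative name, not part of pandas.

```python
import pandas as pd

# Trimmed-down stand-in for the question's data (assumption: only a few fields kept).
data = {
    "data": {
        "content": [
            {"id": "covid-1004200003256", "stage": "Isolated", "status": "SUSPECT"},
            {"id": "covid-1004200003255", "stage": "Isolated", "status": "SUSPECT"},
        ]
    }
}

# The top level of `data` has a single key, "data", so "content" is not found there.
print(list(data.keys()))

failed = False  # hypothetical flag, only used to record that the lookup failed
try:
    pd.json_normalize(data, "content")
except KeyError as exc:
    failed = True
    print("KeyError:", exc)

# Stepping down one level makes "content" a first-level key of the object passed in.
df = pd.json_normalize(data["data"], "content")
print(df)
```

Printing the top-level keys first is a quick way to check which record_path values can work for a given object.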

【Answer 2】:

Try passing the array of records in directly:

df = pd.json_normalize(data['data']['content'])
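
For comparison, a short sketch of three ways to reach the same records. It relies on record_path also accepting a list of keys that is walked from the top level, as documented for pandas.json_normalize. The sample dict and the names `df_list`, `df_path` and `df_nested` are made up for illustration.

```python
import pandas as pd

# Small stand-in for the question's data (assumption: fields trimmed for brevity).
data = {
    "data": {
        "content": [
            {"id": "covid-1004200003256", "latitude": -6.906, "status": "SUSPECT"},
            {"id": "covid-1004200003255", "latitude": -6.906, "status": "SUSPECT"},
        ]
    }
}

# 1. Pass the list of records directly, as suggested above.
df_list = pd.json_normalize(data["data"]["content"])

# 2. Step down one level and give the remaining key as record_path.
df_path = pd.json_normalize(data["data"], record_path="content")

# 3. Keep the whole object and give the full path as a list of keys.
df_nested = pd.json_normalize(data, record_path=["data", "content"])

# All three should describe the same two rows.
print(df_list.equals(df_path), df_path.equals(df_nested))
```

The list form is handy when the original object has to stay intact, for example when meta fields from outer levels are needed as well.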
