Pandas not getting data from JSON API properly
【Posted】: 2020-09-23 18:46:40
【Question】: I am trying to load data from a JSON API into a pandas DataFrame, but pandas is not reading the data correctly. Here are my code and output:
import pandas as pd
import requests
r = requests.get('https://api.covid19india.org/raw_data5.json')
j = r.json()
df = pd.DataFrame.from_dict(j)
However, the output I get is not what I expect:
                                            raw_data
0  {'agebracket': '', 'contractedfromwhichpatient...
1  {'agebracket': '', 'contractedfromwhichpatient...
2  {'agebracket': '', 'contractedfromwhichpatient...
3  {'agebracket': '', 'contractedfromwhichpatient...
4  {'agebracket': '', 'contractedfromwhichpatient...
When I run df.info(), I get:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20409 entries, 0 to 20408
Data columns (total 1 columns):
raw_data 20409 non-null object
dtypes: object(1)
memory usage: 159.5+ KB
Can anyone help me solve this?
【Comments】:
Use j = r.json()['raw_data']
【Answer 1】:
Please try:
df = df['raw_data'].apply(pd.Series)
df.info()
Output:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20409 entries, 0 to 20408
Data columns (total 20 columns):
agebracket 20409 non-null object
contractedfromwhichpatientsuspected 20409 non-null object
currentstatus 20409 non-null object
dateannounced 20409 non-null object
detectedcity 20409 non-null object
detecteddistrict 20409 non-null object
detectedstate 20409 non-null object
entryid 20409 non-null object
gender 20409 non-null object
nationality 20409 non-null object
notes 20409 non-null object
numcases 20409 non-null object
patientnumber 20409 non-null object
source1 20409 non-null object
source2 20409 non-null object
source3 20409 non-null object
statecode 20409 non-null object
statepatientnumber 20409 non-null object
statuschangedate 20409 non-null object
typeoftransmission 20409 non-null object
dtypes: object(20)
memory usage: 3.1+ MB
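A minimal offline sketch of what this answer does. The `payload` dict is a made-up stand-in for the API response (two records with three of the real field names), and `expanded_fast` is an alternative not mentioned in the answer that builds the frame from the list of dicts directly:

```python
import pandas as pd

# Hypothetical stand-in for r.json(): one top-level key holding a list of records.
payload = {
    "raw_data": [
        {"agebracket": "", "detectedstate": "Kerala", "numcases": "1"},
        {"agebracket": "35", "detectedstate": "Delhi", "numcases": "2"},
    ]
}

# Reproduces the question: a single 'raw_data' column whose cells are dicts.
df = pd.DataFrame.from_dict(payload)

# The approach from this answer: expand each dict into its own columns.
expanded = df["raw_data"].apply(pd.Series)

# Alternative (an assumption, not part of the answer): build the frame
# from the list of dicts in one step instead of row by row.
expanded_fast = pd.DataFrame(df["raw_data"].tolist())

print(expanded.columns.tolist())
```

On a frame with tens of thousands of rows, `apply(pd.Series)` can be noticeably slow because it creates one Series per row, so the list-based variant may be worth trying.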
【Answer 2】: Use j = r.json()['raw_data'] to select the raw_data key from the JSON before building the DataFrame.
df.info()
Output:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20409 entries, 0 to 20408
Data columns (total 20 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 agebracket 20409 non-null object
1 contractedfromwhichpatientsuspected 20409 non-null object
2 currentstatus 20409 non-null object
3 dateannounced 20409 non-null object
4 detectedcity 20409 non-null object
5 detecteddistrict 20409 non-null object
6 detectedstate 20409 non-null object
7 entryid 20409 non-null object
8 gender 20409 non-null object
9 nationality 20409 non-null object
10 notes 20409 non-null object
11 numcases 20409 non-null object
12 patientnumber 20409 non-null object
13 source1 20409 non-null object
14 source2 20409 non-null object
15 source3 20409 non-null object
16 statecode 20409 non-null object
17 statepatientnumber 20409 non-null object
18 statuschangedate 20409 non-null object
19 typeoftransmission 20409 non-null object
dtypes: object(20)
memory usage: 3.1+ MB
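A small sketch of this approach that runs without network access. The `payload` dict is a hypothetical stand-in for `r.json()`, and the `pd.json_normalize` call is an alternative that is not part of the answer (it assumes pandas 1.0 or later):

```python
import pandas as pd

# Hypothetical stand-in for r.json(): the records sit under the 'raw_data' key.
payload = {
    "raw_data": [
        {"agebracket": "", "detectedstate": "Kerala", "numcases": "1"},
        {"agebracket": "35", "detectedstate": "Delhi", "numcases": "2"},
    ]
}

# Select the list of records first, as in j = r.json()['raw_data'].
records = payload["raw_data"]

# A list of dicts maps directly to one row per record, one column per key.
df = pd.DataFrame(records)

# Alternative (an assumption, not from the answer): let pandas pick out the
# record list itself via record_path.
df_norm = pd.json_normalize(payload, record_path="raw_data")

print(df.shape)
```

Selecting the key before constructing the DataFrame avoids the intermediate one-column frame entirely, which is why no extra expansion step is needed.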