How to read this JSON data into Python
【Posted】: 2018-09-20 00:13:18
【Question】: I retrieved some data from the World Health Organization in JSON format, and I want to read it into a pandas DataFrame.
I retrieved it from this page: WHO Measles First Dose Rate
    {u'dimension': [{u'display': u'Indicator', u'label': u'GHO'},
                    {u'display': u'PUBLISH STATES', u'label': u'PUBLISHSTATE'},
                    {u'display': u'Year', u'label': u'YEAR'},
                    {u'display': u'WHO region', u'label': u'REGION'},
                    {u'display': u'World Bank income group', u'label': u'WORLDBANKINCOMEGROUP'},
                    {u'display': u'Country', u'label': u'COUNTRY'}],
     u'fact': [{u'Value': u'25',
                u'dim': {u'COUNTRY': u'Afghanistan',
                         u'GHO': u'Measles-containing-vaccine first-dose (MCV1) immunization coverage among 1-year-olds (%)',
                         u'PUBLISHSTATE': u'Published',
                         u'REGION': u'Eastern Mediterranean',
                         u'WORLDBANKINCOMEGROUP': u'Low-income',
                         u'YEAR': u'1993'}},
               {u'Value': u'57',
                u'dim': {u'COUNTRY': u'Afghanistan',
                         u'GHO': u'Measles-containing-vaccine first-dose (MCV1) immunization coverage among 1-year-olds (%)',
                         u'PUBLISHSTATE': u'Published',
                         u'REGION': u'Eastern Mediterranean',
                         u'WORLDBANKINCOMEGROUP': u'Low-income',
                         u'YEAR': u'2013'}},
               {u'Value': u'62',
                u'dim': {u'COUNTRY': u'Angola',
                         u'GHO': u'Measles-containing-vaccine first-dose (MCV1) immunization coverage among 1-year-olds (%)',
                         u'PUBLISHSTATE': u'Published',
                         u'REGION': u'Africa',
                         u'WORLDBANKINCOMEGROUP': u'Upper-middle-income',
                         u'YEAR': u'1996'}},
               {u'Value': u'94',
                u'dim': {u'COUNTRY': u'Andorra',
                         u'GHO': u'Measles-containing-vaccine first-dose (MCV1) immunization coverage among 1-year-olds (%)',
                         u'PUBLISHSTATE': u'Published',
                         u'REGION': u'Europe',
                         u'WORLDBANKINCOMEGROUP': u'High-income',
                         u'YEAR': u'2005'}},
               {u'Value': u'34',
                u'dim': {u'COUNTRY': u'United Arab Emirates',
                         u'GHO': u'Measles-containing-vaccine first-dose (MCV1) immunization coverage among 1-year-olds (%)',
                         u'PUBLISHSTATE': u'Published',
                         u'REGION': u'Eastern Mediterranean',
                         u'WORLDBANKINCOMEGROUP': u'High-income',
                         u'YEAR': u'1980'}}]}
This is what I tried:
    import json

    import pandas as pd
    import requests

    # Setting up and loading the JSON into an object, ready to turn into a DataFrame
    url = "http://apps.who.int/gho/athena/data/GHO/WHS8_110.json?profile=simple&filter=COUNTRY:*"
    response = requests.get(url)
    response_json = response.content
    whoDataSetVaccinationRate = json.loads(response_json)

    # Attempt to load the JSON data into a pandas DataFrame
    whoDataSetVaccinationRateDF = pd.DataFrame(whoDataSetVaccinationRate['fact'],
                                               columns=['COUNTRY', 'YEAR', 'REGION'])
    whoDataSetVaccinationRateDF
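This seems to read the data, but I only get NaN values for COUNTRY and YEAR in the DataFrame.
I also realize that I want the data laid out differently in the DataFrame anyway, and I don't know how to do that.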
【Answer 1】: Use json_normalize together with pivot:
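    import json
    import urllib.request

    import pandas as pd

    # https://***.com/a/12965254
    url = "http://apps.who.int/gho/athena/data/GHO/WHS8_110.json?profile=simple&filter=COUNTRY:*"
    with urllib.request.urlopen(url) as response:
        data = json.loads(response.read().decode())

    # Flatten the nested 'dim' dicts into 'dim.COUNTRY', 'dim.YEAR', ... columns, then pivot.
    # On pandas older than 1.0, use `from pandas.io.json import json_normalize` instead.
    df = pd.json_normalize(data['fact']).pivot(index='dim.COUNTRY', columns='dim.YEAR', values='Value').astype(float)
    print(df.head())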
    dim.YEAR     1980  1981  1982  1983  1984  1985  1986  1987  1988  1989  ...  \
    dim.COUNTRY                                                              ...
    Afghanistan  11.0   NaN   8.0   9.0  14.0  14.0  14.0  31.0  34.0  22.0  ...
    Albania      90.0  90.0  93.0  96.0  96.0  96.0  96.0  96.0  96.0  96.0  ...
    Algeria       NaN   NaN   NaN   NaN   NaN  68.0  67.0  73.0  81.0  82.0  ...
    Andorra       NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN  ...
    Angola        NaN   NaN   NaN  26.0  35.0  44.0  44.0  55.0  56.0  48.0  ...

    dim.YEAR     2007  2008  2009  2010  2011  2012  2013  2014  2015  2016
    dim.COUNTRY
    Afghanistan  55.0  59.0  60.0  62.0  64.0  59.0  57.0  60.0  62.0  62.0
    Albania      97.0  98.0  97.0  99.0  99.0  98.0  99.0  98.0  97.0  96.0
    Algeria      92.0  88.0  92.0  95.0  95.0  95.0  95.0  95.0  95.0  94.0
    Andorra      94.0  98.0  98.0  99.0  99.0  98.0  95.0  96.0  96.0  97.0
    Angola       71.0  61.0  57.0  72.0  64.0  72.0  66.0  60.0  55.0  49.0

    [5 rows x 37 columns]
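To see why the attempt in the question produced only NaN and what json_normalize changes, here is a minimal offline sketch. The three records are a trimmed, hand-written sample in the shape of the WHO `fact` list (values taken from the excerpt in the question), the variable names `facts`, `naive`, `flat` and `wide` are illustrative only, and `pd.json_normalize` assumes pandas 1.0 or newer.

```python
import pandas as pd

# Trimmed sample in the same shape as the WHO "fact" list (assumption: only
# the keys needed for the illustration are kept).
facts = [
    {"Value": "25", "dim": {"COUNTRY": "Afghanistan", "YEAR": "1993"}},
    {"Value": "57", "dim": {"COUNTRY": "Afghanistan", "YEAR": "2013"}},
    {"Value": "62", "dim": {"COUNTRY": "Angola", "YEAR": "1996"}},
]

# The top-level keys are only "Value" and "dim", so requesting COUNTRY and
# YEAR as columns gives columns that contain nothing but NaN.
naive = pd.DataFrame(facts, columns=["COUNTRY", "YEAR"])

# json_normalize flattens the nested "dim" dict into dotted column names.
flat = pd.json_normalize(facts)

# One row per country, one column per year; missing combinations become NaN.
wide = flat.pivot(index="dim.COUNTRY", columns="dim.YEAR", values="Value").astype(float)
print(wide)
```

The pivot step requires each (country, year) pair to appear at most once; that holds for this sample, but it is an assumption about the full data set.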
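【Discussion】:
This is awesome! You really are a lifesaver - this is exactly what I was trying to do. The extra information you added about a better way to read the JSON is also very elegant code - thank you
Unfortunately, the code you provided is specific to Python 3 - I am restricted to Python 2.7. I am trying to adapt it with urllib2.urlopen, but without much luck :( Perhaps you could suggest an approach for 2.7?
@kiltannen - does the Python 2 solution from https://***.com/a/12965254 not work?
Thank you - I am working through it now. Will get back to you. Thanks for all the help
@kiltannen - Glad to hear it, great!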
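For the Python 2.7 question raised in the comments, one possible approach is to import `urlopen` from whichever module is available. The following is only a sketch, not the code from the linked answer: the helper names `parse_payload` and `fetch_json` are hypothetical, and the response is assumed to be UTF-8 encoded.

```python
import json

try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib2 import urlopen  # Python 2.7


def parse_payload(raw):
    """Decode raw response bytes (assumed UTF-8) and parse them as JSON."""
    return json.loads(raw.decode("utf-8"))


def fetch_json(url):
    """Download `url` and return the parsed JSON document."""
    response = urlopen(url)
    try:
        return parse_payload(response.read())
    finally:
        # urllib2 responses are not context managers, so close explicitly.
        response.close()
```

With this, `data = fetch_json(url)` would take the place of the `with urllib.request.urlopen(...)` block in the answer. On the older pandas releases that still support Python 2.7, json_normalize has to be imported from `pandas.io.json`.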