KeyError reading a nested JSON with Pandas
Posted: 2019-05-24 08:15:53
Question: I am trying to access the value of a nested attribute inside a JSON file. The file is:
"_id" : "$oid" : "5b9058462f38434ab0d85cd3" , "user_day_code" : "ead1db07fa526e19fe237115d5516fbdc5acb99057b885e8f662a147990b3c4b", "idplug_base" : 5, "track" : "type" : "FeatureCollection", "features" : [ "geometry" : "type" : "Point", "coordinates" : [ -3.7073786, 40.4237274997222 ] , "type" : "Feature", "properties" : "var" : "28015,ES,Madrid,Madrid,CALLE SAN BERNARDO 38,Madrid", "speed" : 1.75, "secondsfromstart" : 205 , "geometry" : "type" : "Point", "coordinates" : [ -3.709896, 40.4191897997222 ] , "type" : "Feature", "properties" : "var" : "28013,ES,Madrid,Madrid,CUSTA SANTO DOMINGO 6,Madrid", "speed" : 4.63, "secondsfromstart" : 85 ] , "user_type" : 1, "idunplug_base" : 17, "travel_time" : 263, "idunplug_station" : 40, "ageRange" : 0, "idplug_station" : 16, "unplug_hourTime" : "$date" : "2018-09-01T01:00:00.000+0200" , "zip_code" : ""
"_id" : "$oid" : "5b9058462f38434ab0d85ce9" , "user_day_code" : "420d9e220bd8816681162e15e9afcb1c69c5a756090728701083c5c0b23502f2", "idplug_base" : 12, "track" : "type" : "FeatureCollection", "features" : [ "geometry" : "type" : "Point", "coordinates" : [ -3.7022001, 40.4052982997222 ] , "type" : "Feature", "properties" : "var" : "28012,ES,Madrid,Madrid,GTA EMBAJADORES,Madrid", "speed" : 0.33, "secondsfromstart" : 351 , "geometry" : "type" : "Point", "coordinates" : [ -3.698618, 40.4061700997222 ] , "type" : "Feature", "properties" : "var" : "28012,ES,Madrid,Madrid,RONDA ATOCHA 30,Madrid", "speed" : 6.36, "secondsfromstart" : 291 , "geometry" : "type" : "Point", "coordinates" : [ -3.6949231, 40.4072785997222 ] , "type" : "Feature", "properties" : "var" : "28012,ES,Madrid,Madrid,RONDA ATOCHA,Madrid", "speed" : 4.77, "secondsfromstart" : 231 , "geometry" : "type" : "Point", "coordinates" : [ -3.6920543, 40.4081501 ] , "type" : "Feature", "properties" : "var" : "28012,ES,Madrid,Madrid,PLAZA EMPERADOR CARLOS V 1,Madrid", "speed" : 4.38, "secondsfromstart" : 170 ] , "user_type" : 1, "idunplug_base" : 26, "travel_time" : 382, "idunplug_station" : 85, "ageRange" : 2, "idplug_station" : 52, "unplug_hourTime" : "$date" : "2018-09-01T01:00:00.000+0200" , "zip_code" : "28009"
"_id" : "$oid" : "5b9058462f38434ab0d85ced" , "user_day_code" : "780f5c8157efe8e6dca44dbd689817d4b126364fca917f0e668bad9e7bf96939", "idplug_base" : 1, "track" : "type" : "FeatureCollection", "features" : [ "geometry" : "type" : "Point", "coordinates" : [ -3.69610249972222, 40.427829 ] , "type" : "Feature", "properties" : "var" : "28004,ES,Madrid,Madrid,PLAZA ALONSO MARTINEZ,Madrid", "speed" : 6.22, "secondsfromstart" : 200 , "geometry" : "type" : "Point", "coordinates" : [ -3.69482799972222, 40.4282634997222 ] , "type" : "Feature", "properties" : "var" : "28010,ES,Madrid,Madrid,CALLE FERNANDO EL SANTO 4,Madrid", "speed" : 0, "secondsfromstart" : 140 , "geometry" : "type" : "Point", "coordinates" : [ -3.69164359972222, 40.4280088 ] , "type" : "Feature", "properties" : "var" : "28010,ES,Madrid,Madrid,CALLE FERNANDO EL SANTO 20,Madrid", "speed" : 5.05, "secondsfromstart" : 80 ] , "user_type" : 1, "idunplug_base" : 11, "travel_time" : 305, "idunplug_station" : 109, "ageRange" : 4, "idplug_station" : 58, "unplug_hourTime" : "$date" : "2018-09-01T01:00:00.000+0200" , "zip_code" : "28004"
"_id" : "$oid" : "5b9058462f38434ab0d85cee" , "user_day_code" : "a225ab7b4b74954cd9fbe8cc2ec63390cd04e92cdd1a2fe1e58d42faea082b21", "idplug_base" : 1, "track" : "type" : "FeatureCollection", "features" : [ "geometry" : "type" : "Point", "coordinates" : [ -3.72050759972222, 40.4277548 ] , "type" : "Feature", "properties" : "var" : "28008,ES,Madrid,Madrid,PASEO PINTOR ROSALES 49P,Madrid", "speed" : 0.86, "secondsfromstart" : 258 , "geometry" : "type" : "Point", "coordinates" : [ -3.717881, 40.4274713 ] , "type" : "Feature", "properties" : "var" : "28008,ES,Madrid,Madrid,CALLE QUINTANA 17,Madrid", "speed" : 6.75, "secondsfromstart" : 199 , "geometry" : "type" : "Point", "coordinates" : [ -3.7142441, 40.4297779997222 ] , "type" : "Feature", "properties" : "var" : "28015,ES,Madrid,Madrid,CALLE SERRANO JOVER 4D,Madrid", "speed" : 7.08, "secondsfromstart" : 139 , "geometry" : "type" : "Point", "coordinates" : [ -3.71240559972222, 40.4341422997222 ] , "type" : "Feature", "properties" : "var" : "28015,ES,Madrid,Madrid,CALLE FERNANDO EL CATOLICO 47A,Madrid", "speed" : 5.25, "secondsfromstart" : 79 , "geometry" : "type" : "Point", "coordinates" : [ -3.7089558, 40.4340593 ] , "type" : "Feature", "properties" : "var" : "28015,ES,Madrid,Madrid,CALLE FERNANDO EL CATOLICO 21,Madrid", "speed" : 5.61, "secondsfromstart" : 19 ] , "user_type" : 1, "idunplug_base" : 1, "travel_time" : 262, "idunplug_station" : 168, "ageRange" : 4, "idplug_station" : 120, "unplug_hourTime" : "$date" : "2018-09-01T01:00:00.000+0200" , "zip_code" : "28015"
The code I am using is:
import pandas as pd

d100 = pd.read_json('test 1.json', lines=True)
d100["track"]["features"][0]["geometry"]["coordinates"]
This raises a KeyError, even though the key appears to be correct:
/home/cloudera/anaconda2/lib/python2.7/site-packages/pandas/core/indexes/base.pyc in get_value(self, series, key)
2558 try:
2559 return self._engine.get_value(s, k,
-> 2560 tz=getattr(series.dtype, 'tz', None))
2561 except KeyError as e1:
2562 if len(self) > 0 and self.inferred_type in ['integer', 'boolean']:
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_value()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_value()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
KeyError: 'features'
Any help?
Answer 1: Your d100["track"] is an entire column of d100, i.e. a Series:
In[1]: d100["track"]
Out[1]:
0    {'type': 'FeatureCollection', 'features': [{'geometry': {'type': 'Point', 'coord.....
1    {'type': 'FeatureCollection', 'features': [{'geometry': {'type': 'Point', 'coord.....
2    {'type': 'FeatureCollection', 'features': [{'geometry': {'type': 'Point', 'coord.....
3    {'type': 'FeatureCollection', 'features': [{'geometry': {'type': 'Point', 'coord.....
Name: track, dtype: object
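It has four items (rows), but you did not select any of them. A string key such as "features" is therefore looked up in the row index of the Series, where it does not exist, which is why the KeyError is raised.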
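To see why the lookup fails, here is a minimal sketch with a hand-built Series of dictionaries standing in for d100["track"]. The variable names and the shortened sample values are made up for illustration; only the shape of the data follows the question.

```python
import pandas as pd

# Hypothetical stand-in for d100["track"]: one dictionary per row.
track = pd.Series(
    [
        {"type": "FeatureCollection",
         "features": [{"geometry": {"coordinates": [-3.70, 40.42]}}]},
        {"type": "FeatureCollection",
         "features": [{"geometry": {"coordinates": [-3.71, 40.41]}}]},
    ],
    name="track",
)

# The row index only holds the labels 0 and 1, so "features" is not a
# valid label and the lookup raises KeyError.
try:
    track["features"]
    lookup_failed = False
except KeyError:
    lookup_failed = True

# Selecting a row first returns the dictionary, which does have the key.
first_coords = track.iloc[0]["features"][0]["geometry"]["coordinates"]

print(lookup_failed)
print(first_coords)
```

Using .iloc[0] here instead of [0] is a design choice, not something the original answer requires: it selects by position, so it keeps working if the DataFrame has been filtered or re-indexed.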
So instead of
d100["track"]["features"][0]["geometry"]["coordinates"]
use (for the individual items of the d100["track"] Series)
d100["track"][0]["features"][0]["geometry"]["coordinates"]
d100["track"][1]["features"][0]["geometry"]["coordinates"]
d100["track"][2]["features"][0]["geometry"]["coordinates"]
d100["track"][3]["features"][0]["geometry"]["coordinates"]
to obtain
[-3.7073786, 40.4237274997222]
[-3.7022000999999998, 40.4052982997222]
[-3.69610249972222, 40.427829]
[-3.72050759972222, 40.4277548]
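Rather than indexing each row by hand, the same lookup can be applied to every row at once. The following is a minimal sketch, assuming each cell of the track column is a dictionary shaped like the records in the question. The small DataFrame and the helper name first_coordinates are made up for illustration and are not part of the original answer.

```python
import pandas as pd

# Hypothetical stand-in for d100, using coordinates from the first two
# records in the question (other fields omitted for brevity).
d100 = pd.DataFrame({
    "track": [
        {"type": "FeatureCollection", "features": [
            {"geometry": {"type": "Point",
                          "coordinates": [-3.7073786, 40.4237274997222]}},
            {"geometry": {"type": "Point",
                          "coordinates": [-3.709896, 40.4191897997222]}},
        ]},
        {"type": "FeatureCollection", "features": [
            {"geometry": {"type": "Point",
                          "coordinates": [-3.7022001, 40.4052982997222]}},
        ]},
    ]
})


def first_coordinates(track):
    """Return the coordinates of the first feature, or None if there is none."""
    features = track.get("features") or []
    return features[0]["geometry"]["coordinates"] if features else None


# One coordinate pair per row: the first point of each track.
first = d100["track"].apply(first_coordinates)

# All points of each track: one list of coordinate pairs per row.
all_points = d100["track"].apply(
    lambda t: [f["geometry"]["coordinates"] for f in t["features"]]
)

print(first.tolist())
print(all_points.map(len).tolist())
```

The helper returns None for a track without features so that a single empty record does not abort the whole column with an IndexError.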