KeyError reading a nested JSON with Pandas

Posted: 2019-05-24 08:15:53

Question:

I am trying to access the value of a nested attribute inside a JSON file. The file is:

{ "_id" : { "$oid" : "5b9058462f38434ab0d85cd3" }, "user_day_code" : "ead1db07fa526e19fe237115d5516fbdc5acb99057b885e8f662a147990b3c4b", "idplug_base" : 5, "track" : { "type" : "FeatureCollection", "features" : [ { "geometry" : { "type" : "Point", "coordinates" : [ -3.7073786, 40.4237274997222 ] }, "type" : "Feature", "properties" : { "var" : "28015,ES,Madrid,Madrid,CALLE SAN BERNARDO 38,Madrid", "speed" : 1.75, "secondsfromstart" : 205 } }, { "geometry" : { "type" : "Point", "coordinates" : [ -3.709896, 40.4191897997222 ] }, "type" : "Feature", "properties" : { "var" : "28013,ES,Madrid,Madrid,CUSTA SANTO DOMINGO 6,Madrid", "speed" : 4.63, "secondsfromstart" : 85 } } ] }, "user_type" : 1, "idunplug_base" : 17, "travel_time" : 263, "idunplug_station" : 40, "ageRange" : 0, "idplug_station" : 16, "unplug_hourTime" : { "$date" : "2018-09-01T01:00:00.000+0200" }, "zip_code" : "" }
{ "_id" : { "$oid" : "5b9058462f38434ab0d85ce9" }, "user_day_code" : "420d9e220bd8816681162e15e9afcb1c69c5a756090728701083c5c0b23502f2", "idplug_base" : 12, "track" : { "type" : "FeatureCollection", "features" : [ { "geometry" : { "type" : "Point", "coordinates" : [ -3.7022001, 40.4052982997222 ] }, "type" : "Feature", "properties" : { "var" : "28012,ES,Madrid,Madrid,GTA EMBAJADORES,Madrid", "speed" : 0.33, "secondsfromstart" : 351 } }, { "geometry" : { "type" : "Point", "coordinates" : [ -3.698618, 40.4061700997222 ] }, "type" : "Feature", "properties" : { "var" : "28012,ES,Madrid,Madrid,RONDA ATOCHA 30,Madrid", "speed" : 6.36, "secondsfromstart" : 291 } }, { "geometry" : { "type" : "Point", "coordinates" : [ -3.6949231, 40.4072785997222 ] }, "type" : "Feature", "properties" : { "var" : "28012,ES,Madrid,Madrid,RONDA ATOCHA,Madrid", "speed" : 4.77, "secondsfromstart" : 231 } }, { "geometry" : { "type" : "Point", "coordinates" : [ -3.6920543, 40.4081501 ] }, "type" : "Feature", "properties" : { "var" : "28012,ES,Madrid,Madrid,PLAZA EMPERADOR CARLOS V 1,Madrid", "speed" : 4.38, "secondsfromstart" : 170 } } ] }, "user_type" : 1, "idunplug_base" : 26, "travel_time" : 382, "idunplug_station" : 85, "ageRange" : 2, "idplug_station" : 52, "unplug_hourTime" : { "$date" : "2018-09-01T01:00:00.000+0200" }, "zip_code" : "28009" }
{ "_id" : { "$oid" : "5b9058462f38434ab0d85ced" }, "user_day_code" : "780f5c8157efe8e6dca44dbd689817d4b126364fca917f0e668bad9e7bf96939", "idplug_base" : 1, "track" : { "type" : "FeatureCollection", "features" : [ { "geometry" : { "type" : "Point", "coordinates" : [ -3.69610249972222, 40.427829 ] }, "type" : "Feature", "properties" : { "var" : "28004,ES,Madrid,Madrid,PLAZA ALONSO MARTINEZ,Madrid", "speed" : 6.22, "secondsfromstart" : 200 } }, { "geometry" : { "type" : "Point", "coordinates" : [ -3.69482799972222, 40.4282634997222 ] }, "type" : "Feature", "properties" : { "var" : "28010,ES,Madrid,Madrid,CALLE FERNANDO EL SANTO 4,Madrid", "speed" : 0, "secondsfromstart" : 140 } }, { "geometry" : { "type" : "Point", "coordinates" : [ -3.69164359972222, 40.4280088 ] }, "type" : "Feature", "properties" : { "var" : "28010,ES,Madrid,Madrid,CALLE FERNANDO EL SANTO 20,Madrid", "speed" : 5.05, "secondsfromstart" : 80 } } ] }, "user_type" : 1, "idunplug_base" : 11, "travel_time" : 305, "idunplug_station" : 109, "ageRange" : 4, "idplug_station" : 58, "unplug_hourTime" : { "$date" : "2018-09-01T01:00:00.000+0200" }, "zip_code" : "28004" }
{ "_id" : { "$oid" : "5b9058462f38434ab0d85cee" }, "user_day_code" : "a225ab7b4b74954cd9fbe8cc2ec63390cd04e92cdd1a2fe1e58d42faea082b21", "idplug_base" : 1, "track" : { "type" : "FeatureCollection", "features" : [ { "geometry" : { "type" : "Point", "coordinates" : [ -3.72050759972222, 40.4277548 ] }, "type" : "Feature", "properties" : { "var" : "28008,ES,Madrid,Madrid,PASEO PINTOR ROSALES 49P,Madrid", "speed" : 0.86, "secondsfromstart" : 258 } }, { "geometry" : { "type" : "Point", "coordinates" : [ -3.717881, 40.4274713 ] }, "type" : "Feature", "properties" : { "var" : "28008,ES,Madrid,Madrid,CALLE QUINTANA 17,Madrid", "speed" : 6.75, "secondsfromstart" : 199 } }, { "geometry" : { "type" : "Point", "coordinates" : [ -3.7142441, 40.4297779997222 ] }, "type" : "Feature", "properties" : { "var" : "28015,ES,Madrid,Madrid,CALLE SERRANO JOVER 4D,Madrid", "speed" : 7.08, "secondsfromstart" : 139 } }, { "geometry" : { "type" : "Point", "coordinates" : [ -3.71240559972222, 40.4341422997222 ] }, "type" : "Feature", "properties" : { "var" : "28015,ES,Madrid,Madrid,CALLE FERNANDO EL CATOLICO 47A,Madrid", "speed" : 5.25, "secondsfromstart" : 79 } }, { "geometry" : { "type" : "Point", "coordinates" : [ -3.7089558, 40.4340593 ] }, "type" : "Feature", "properties" : { "var" : "28015,ES,Madrid,Madrid,CALLE FERNANDO EL CATOLICO 21,Madrid", "speed" : 5.61, "secondsfromstart" : 19 } } ] }, "user_type" : 1, "idunplug_base" : 1, "travel_time" : 262, "idunplug_station" : 168, "ageRange" : 4, "idplug_station" : 120, "unplug_hourTime" : { "$date" : "2018-09-01T01:00:00.000+0200" }, "zip_code" : "28015" }

The code I am using is:

import pandas as pd
d100 = pd.read_json('test 1.json', lines=True)
d100["track"]["features"][0]["geometry"]["coordinates"]

This raises a KeyError, even though the key appears to be correct:

/home/cloudera/anaconda2/lib/python2.7/site-packages/pandas/core/indexes/base.pyc in get_value(self, series, key)
2558         try:
2559             return self._engine.get_value(s, k,
-> 2560                                             tz=getattr(series.dtype, 'tz', None))
2561         except KeyError as e1:
2562             if len(self) > 0 and self.inferred_type in ['integer', 'boolean']:


pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_value()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_value()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

KeyError: 'features'

Any help?


Answer 1:

Your d100["track"] is an entire column of d100, i.e. a Series:

In[1]: d100["track"]
Out[1]:
0    {'type': 'FeatureCollection', 'features': [{'geometry': {'type': 'Point', 'coord.....
1    {'type': 'FeatureCollection', 'features': [{'geometry': {'type': 'Point', 'coord.....
2    {'type': 'FeatureCollection', 'features': [{'geometry': {'type': 'Point', 'coord.....
3    {'type': 'FeatureCollection', 'features': [{'geometry': {'type': 'Point', 'coord.....
Name: track, dtype: object

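It has four items (rows), but you did not select any of them, so ["features"] is looked up among the row labels of the Series (0 to 3) rather than inside the dictionaries, which raises the KeyError.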
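A minimal sketch of why the lookup fails. It builds a small DataFrame in memory instead of reading 'test 1.json'; the two abridged records are hypothetical stand-ins, assumed to have the same shape as the rows in the question.

```python
import pandas as pd

# Hypothetical, abridged stand-ins for two records of the question's file.
records = [
    {"track": {"type": "FeatureCollection", "features": [
        {"geometry": {"type": "Point",
                      "coordinates": [-3.7073786, 40.4237274997222]}}]}},
    {"track": {"type": "FeatureCollection", "features": [
        {"geometry": {"type": "Point",
                      "coordinates": [-3.7022001, 40.4052982997222]}}]}},
]
d100 = pd.DataFrame(records)

track = d100["track"]             # a Series holding one dict per row
index_labels = list(track.index)  # the row labels of that Series

# "features" is not one of the row labels, so the Series lookup fails.
try:
    track["features"]
    lookup_failed = False
except KeyError:
    lookup_failed = True

# Selecting a row first yields a plain dict, where "features" is a valid key.
first_row = track[0]
coords = first_row["features"][0]["geometry"]["coordinates"]
print(index_labels, lookup_failed, coords)
```

The first index after the column name therefore picks a row, and only the indices after it walk into the nested dictionary.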

So instead of

d100["track"]["features"][0]["geometry"]["coordinates"]

use (for the individual items of the d100["track"] Series)

d100["track"][0]["features"][0]["geometry"]["coordinates"] 
d100["track"][1]["features"][0]["geometry"]["coordinates"] 
d100["track"][2]["features"][0]["geometry"]["coordinates"] 
d100["track"][3]["features"][0]["geometry"]["coordinates"] 

to get

[-3.7073786, 40.4237274997222]
[-3.7022000999999998, 40.4052982997222]
[-3.69610249972222, 40.427829]
[-3.72050759972222, 40.4277548]
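If the first coordinate pair is wanted for every row rather than one row at a time, the same dictionary lookup can probably be applied to each item with Series.apply. The sketch below is not part of the original answer. It uses hypothetical abridged records in place of the real file, the helper name first_coordinates is made up for illustration, and it assumes every track contains at least one feature.

```python
import pandas as pd

# Hypothetical, abridged stand-ins for two records of the question's file.
records = [
    {"track": {"type": "FeatureCollection", "features": [
        {"geometry": {"type": "Point",
                      "coordinates": [-3.7073786, 40.4237274997222]}}]}},
    {"track": {"type": "FeatureCollection", "features": [
        {"geometry": {"type": "Point",
                      "coordinates": [-3.7022001, 40.4052982997222]}}]}},
]
d100 = pd.DataFrame(records)


def first_coordinates(track):
    """Return the coordinates of the first feature of one track dict."""
    return track["features"][0]["geometry"]["coordinates"]


# Run the per-row lookup on every item of the "track" column.
first_coords = d100["track"].apply(first_coordinates)
print(first_coords.tolist())
```

A track with an empty "features" list would raise an IndexError here, so real data may need a guard inside the helper.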
