Convert Pandas Dataframe to nested JSON

Posted: 2017-03-21 03:17:53

Question:

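I am new to Python and Pandas. I am trying to convert a Pandas Dataframe to nested JSON. The .to_json() function does not give me enough flexibility for what I want.

Here are some data points from the dataframe (in CSV format, comma-separated):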

,ID,Location,Country,Latitude,Longitude,timestamp,tide  
0,1,BREST,FRA,48.383,-4.495,1807-01-01,6905.0  
1,1,BREST,FRA,48.383,-4.495,1807-02-01,6931.0  
2,1,BREST,FRA,48.383,-4.495,1807-03-01,6896.0  
3,1,BREST,FRA,48.383,-4.495,1807-04-01,6953.0  
4,1,BREST,FRA,48.383,-4.495,1807-05-01,7043.0  
2508,7,CUXHAVEN 2,DEU,53.867,8.717,1843-01-01,7093.0  
2509,7,CUXHAVEN 2,DEU,53.867,8.717,1843-02-01,6688.0  
2510,7,CUXHAVEN 2,DEU,53.867,8.717,1843-03-01,6493.0  
2511,7,CUXHAVEN 2,DEU,53.867,8.717,1843-04-01,6723.0  
2512,7,CUXHAVEN 2,DEU,53.867,8.717,1843-05-01,6533.0  
4525,9,MAASSLUIS,NLD,51.918,4.25,1848-02-01,6880.0  
4526,9,MAASSLUIS,NLD,51.918,4.25,1848-03-01,6700.0  
4527,9,MAASSLUIS,NLD,51.918,4.25,1848-04-01,6775.0  
4528,9,MAASSLUIS,NLD,51.918,4.25,1848-05-01,6580.0  
4529,9,MAASSLUIS,NLD,51.918,4.25,1848-06-01,6685.0  
6540,8,WISMAR 2,DEU,53.898999999999994,11.458,1848-07-01,6957.0  
6541,8,WISMAR 2,DEU,53.898999999999994,11.458,1848-08-01,6944.0  
6542,8,WISMAR 2,DEU,53.898999999999994,11.458,1848-09-01,7084.0  
6543,8,WISMAR 2,DEU,53.898999999999994,11.458,1848-10-01,6898.0  
6544,8,WISMAR 2,DEU,53.898999999999994,11.458,1848-11-01,6859.0  
8538,10,SAN FRANCISCO,USA,37.806999999999995,-122.465,1854-07-01,6909.0  
8539,10,SAN FRANCISCO,USA,37.806999999999995,-122.465,1854-08-01,6940.0  
8540,10,SAN FRANCISCO,USA,37.806999999999995,-122.465,1854-09-01,6961.0  
8541,10,SAN FRANCISCO,USA,37.806999999999995,-122.465,1854-10-01,6952.0  
8542,10,SAN FRANCISCO,USA,37.806999999999995,-122.465,1854-11-01,6952.0  

There is a lot of repeated information, and I would like a JSON like this:

[
{
    "ID": 1,
    "Location": "BREST",
    "Latitude": 48.383,
    "Longitude": -4.495,
    "Country": "FRA",
    "Tide-Data": {
        "1807-02-01": 6931,
        "1807-03-01": 6896,
        "1807-04-01": 6953,
        "1807-05-01": 7043
    }
},
{
    "ID": 5,
    "Location": "HOLYHEAD",
    "Latitude": 53.31399999999999,
    "Longitude": -4.62,
    "Country": "GBR",
    "Tide-Data": {
        "1807-02-01": 6931,
        "1807-03-01": 6896,
        "1807-04-01": 6953,
        "1807-05-01": 7043
    }
}
]

How can I do this?

EDIT:

Code to reproduce the dataframe:

import json
import pandas as pd

# input json
json_str = '[{"ID":1,"Location":"BREST","Country":"FRA","Latitude":48.383,"Longitude":-4.495,"timestamp":"1807-01-01","tide":6905},{"ID":1,"Location":"BREST","Country":"FRA","Latitude":48.383,"Longitude":-4.495,"timestamp":"1807-02-01","tide":6931},{"ID":1,"Location":"BREST","Country":"DEU","Latitude":48.383,"Longitude":-4.495,"timestamp":"1807-03-01","tide":6896},{"ID":7,"Location":"CUXHAVEN 2","Country":"DEU","Latitude":53.867,"Longitude":-8.717,"timestamp":"1843-01-01","tide":7093},{"ID":7,"Location":"CUXHAVEN 2","Country":"DEU","Latitude":53.867,"Longitude":-8.717,"timestamp":"1843-02-01","tide":6688},{"ID":7,"Location":"CUXHAVEN 2","Country":"DEU","Latitude":53.867,"Longitude":-8.717,"timestamp":"1843-03-01","tide":6493}]'

# load json object
data_list = json.loads(json_str)

# create dataframe (pd.json_normalize needs pandas >= 1.0;
# older versions: from pandas.io.json import json_normalize)
df = pd.json_normalize(data_list)

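Comments:

pandas.DataFrame.to_json has many options. See whether you can get what you want with them; in particular, look at the orient option.

I can't see how. It repeats all the same information over and over, but I want the timestamp and tide columns to be nested.

If you want to nest timestamp and tide, it is best to do that before calling to_json. Sorry, I misunderstood the question at first.

But that is my question: how do I put them together?

Answer 1: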

UPDATE:

j = (df.groupby(['ID','Location','Country','Latitude','Longitude'])
       .apply(lambda x: x[['timestamp','tide']].to_dict('records'))
       .reset_index()
       .rename(columns={0:'Tide-Data'})
       .to_json(orient='records'))
     

Result (formatted):

In [103]: print(json.dumps(json.loads(j), indent=2, sort_keys=True))
[
  {
    "Country": "FRA",
    "ID": 1,
    "Latitude": 48.383,
    "Location": "BREST",
    "Longitude": -4.495,
    "Tide-Data": [
      {
        "tide": 6905.0,
        "timestamp": "1807-01-01"
      },
      {
        "tide": 6931.0,
        "timestamp": "1807-02-01"
      },
      {
        "tide": 6896.0,
        "timestamp": "1807-03-01"
      },
      {
        "tide": 6953.0,
        "timestamp": "1807-04-01"
      },
      {
        "tide": 7043.0,
        "timestamp": "1807-05-01"
      }
    ]
  },
  {
    "Country": "DEU",
    "ID": 7,
    "Latitude": 53.867,
    "Location": "CUXHAVEN 2",
    "Longitude": 8.717,
    "Tide-Data": [
      {
        "tide": 7093.0,
        "timestamp": "1843-01-01"
      },
      {
        "tide": 6688.0,
        "timestamp": "1843-02-01"
      },
      {
        "tide": 6493.0,
        "timestamp": "1843-03-01"
      },
      {
        "tide": 6723.0,
        "timestamp": "1843-04-01"
      },
      {
        "tide": 6533.0,
        "timestamp": "1843-05-01"
      }
    ]
  },
  {
    "Country": "DEU",
    "ID": 8,
    "Latitude": 53.899,
    "Location": "WISMAR 2",
    "Longitude": 11.458,
    "Tide-Data": [
      {
        "tide": 6957.0,
        "timestamp": "1848-07-01"
      },
      {
        "tide": 6944.0,
        "timestamp": "1848-08-01"
      },
      {
        "tide": 7084.0,
        "timestamp": "1848-09-01"
      },
      {
        "tide": 6898.0,
        "timestamp": "1848-10-01"
      },
      {
        "tide": 6859.0,
        "timestamp": "1848-11-01"
      }
    ]
  },
  {
    "Country": "NLD",
    "ID": 9,
    "Latitude": 51.918,
    "Location": "MAASSLUIS",
    "Longitude": 4.25,
    "Tide-Data": [
      {
        "tide": 6880.0,
        "timestamp": "1848-02-01"
      },
      {
        "tide": 6700.0,
        "timestamp": "1848-03-01"
      },
      {
        "tide": 6775.0,
        "timestamp": "1848-04-01"
      },
      {
        "tide": 6580.0,
        "timestamp": "1848-05-01"
      },
      {
        "tide": 6685.0,
        "timestamp": "1848-06-01"
      }
    ]
  },
  {
    "Country": "USA",
    "ID": 10,
    "Latitude": 37.807,
    "Location": "SAN FRANCISCO",
    "Longitude": -122.465,
    "Tide-Data": [
      {
        "tide": 6909.0,
        "timestamp": "1854-07-01"
      },
      {
        "tide": 6940.0,
        "timestamp": "1854-08-01"
      },
      {
        "tide": 6961.0,
        "timestamp": "1854-09-01"
      },
      {
        "tide": 6952.0,
        "timestamp": "1854-10-01"
      },
      {
        "tide": 6952.0,
        "timestamp": "1854-11-01"
      }
    ]
  }
]
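
For readers who find the groupby().apply() chain hard to follow, the same nesting can be sketched with an explicit loop over the groups. This is only an illustration under assumptions: the small frame below is a cut-down stand-in for the question's data, and the names station_cols and records are made up for this example. The list of records is passed back through pandas for serialisation so that NumPy scalar types in the group keys are handled by to_json.

```python
import json
import pandas as pd

# Small illustrative frame with the same columns as the question's data.
df = pd.DataFrame({
    'ID': [1, 1, 7, 7],
    'Location': ['BREST', 'BREST', 'CUXHAVEN 2', 'CUXHAVEN 2'],
    'Country': ['FRA', 'FRA', 'DEU', 'DEU'],
    'Latitude': [48.383, 48.383, 53.867, 53.867],
    'Longitude': [-4.495, -4.495, 8.717, 8.717],
    'timestamp': ['1807-01-01', '1807-02-01', '1843-01-01', '1843-02-01'],
    'tide': [6905.0, 6931.0, 7093.0, 6688.0],
})

# Columns that identify a station (hypothetical helper name).
station_cols = ['ID', 'Location', 'Country', 'Latitude', 'Longitude']

records = []
for station_values, group in df.groupby(station_cols):
    # One dict per station: the identifying columns first ...
    record = dict(zip(station_cols, station_values))
    # ... then the nested list of {timestamp, tide} measurements.
    record['Tide-Data'] = group[['timestamp', 'tide']].to_dict('records')
    records.append(record)

# Serialise through pandas rather than json.dumps, since the group keys
# may be NumPy scalars that the standard json module does not accept.
j = pd.DataFrame(records).to_json(orient='records')
print(json.dumps(json.loads(j), indent=2))
```

The loop is more verbose than the one-liner, but each station record is built in plain sight, which makes it easier to add or rename fields.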

Old answer:

You can do this using the groupby(), apply() and to_json() methods:

# as_index=False is dropped here: the comments report a ValueError
# from the original call on pandas 1.2.1
j = (df.groupby(['ID','Location','Country','Latitude','Longitude'])
       .apply(lambda x: dict(zip(x.timestamp,x.tide)))
       .reset_index()
       .rename(columns={0:'Tide-Data'})
       .to_json(orient='records'))

Output:

In [112]: print(json.dumps(json.loads(j), indent=2, sort_keys=True))
[
  {
    "Country": "FRA",
    "ID": 1,
    "Latitude": 48.383,
    "Location": "BREST",
    "Longitude": -4.495,
    "Tide-Data": {
      "1807-01-01": 6905.0,
      "1807-02-01": 6931.0,
      "1807-03-01": 6896.0,
      "1807-04-01": 6953.0,
      "1807-05-01": 7043.0
    }
  },
  {
    "Country": "DEU",
    "ID": 7,
    "Latitude": 53.867,
    "Location": "CUXHAVEN 2",
    "Longitude": 8.717,
    "Tide-Data": {
      "1843-01-01": 7093.0,
      "1843-02-01": 6688.0,
      "1843-03-01": 6493.0,
      "1843-04-01": 6723.0,
      "1843-05-01": 6533.0
    }
  },
  {
    "Country": "DEU",
    "ID": 8,
    "Latitude": 53.899,
    "Location": "WISMAR 2",
    "Longitude": 11.458,
    "Tide-Data": {
      "1848-07-01": 6957.0,
      "1848-08-01": 6944.0,
      "1848-09-01": 7084.0,
      "1848-10-01": 6898.0,
      "1848-11-01": 6859.0
    }
  },
  {
    "Country": "NLD",
    "ID": 9,
    "Latitude": 51.918,
    "Location": "MAASSLUIS",
    "Longitude": 4.25,
    "Tide-Data": {
      "1848-02-01": 6880.0,
      "1848-03-01": 6700.0,
      "1848-04-01": 6775.0,
      "1848-05-01": 6580.0,
      "1848-06-01": 6685.0
    }
  },
  {
    "Country": "USA",
    "ID": 10,
    "Latitude": 37.807,
    "Location": "SAN FRANCISCO",
    "Longitude": -122.465,
    "Tide-Data": {
      "1854-07-01": 6909.0,
      "1854-08-01": 6940.0,
      "1854-09-01": 6961.0,
      "1854-10-01": 6952.0,
      "1854-11-01": 6952.0
    }
  }
]
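
The JSON requested in the question shows integer tide values (6931), whereas the output above carries floats (6931.0). One possible way to get integers, sketched here on a cut-down stand-in for the question's data, is to cast the tide column before grouping. This assumes every tide value is present and whole-numbered, since astype(int) fails on missing values; station_cols is a made-up name for this example.

```python
import json
import pandas as pd

# Small illustrative frame with the same columns as the question's data.
df = pd.DataFrame({
    'ID': [1, 1, 7, 7],
    'Location': ['BREST', 'BREST', 'CUXHAVEN 2', 'CUXHAVEN 2'],
    'Country': ['FRA', 'FRA', 'DEU', 'DEU'],
    'Latitude': [48.383, 48.383, 53.867, 53.867],
    'Longitude': [-4.495, -4.495, 8.717, 8.717],
    'timestamp': ['1807-01-01', '1807-02-01', '1843-01-01', '1843-02-01'],
    'tide': [6905.0, 6931.0, 7093.0, 6688.0],
})

station_cols = ['ID', 'Location', 'Country', 'Latitude', 'Longitude']

j = (df.assign(tide=df['tide'].astype(int))      # assumes no missing tides
       .groupby(station_cols)
       # .tolist() yields plain Python ints for the nested mapping
       .apply(lambda g: dict(zip(g['timestamp'], g['tide'].tolist())))
       .reset_index()
       .rename(columns={0: 'Tide-Data'})
       .to_json(orient='records'))

print(json.dumps(json.loads(j), indent=2, sort_keys=True))
```

If the real data contains gaps, the float output shown above is the safer choice.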

PS: if you don't care about indentation, you can write directly to a JSON file:

# as_index=False is dropped here: the comments report a ValueError
# from the original call on pandas 1.2.1
(df.groupby(['ID','Location','Country','Latitude','Longitude'])
   .apply(lambda x: dict(zip(x.timestamp,x.tide)))
   .reset_index()
   .rename(columns={0:'Tide-Data'})
   .to_json('/path/to/file_name.json', orient='records'))
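
If indentation does matter, one option is to build the JSON string first and then re-serialise it with the standard json module. The sketch below assumes a cut-down stand-in for the question's data and writes to a temporary directory; out_path and the file name tides.json are placeholders, not part of the original answer.

```python
import json
import os
import tempfile
import pandas as pd

# Small illustrative frame with the same columns as the question's data.
df = pd.DataFrame({
    'ID': [1, 1, 7, 7],
    'Location': ['BREST', 'BREST', 'CUXHAVEN 2', 'CUXHAVEN 2'],
    'Country': ['FRA', 'FRA', 'DEU', 'DEU'],
    'Latitude': [48.383, 48.383, 53.867, 53.867],
    'Longitude': [-4.495, -4.495, 8.717, 8.717],
    'timestamp': ['1807-01-01', '1807-02-01', '1843-01-01', '1843-02-01'],
    'tide': [6905.0, 6931.0, 7093.0, 6688.0],
})

# Compact JSON string, built as in the updated answer.
j = (df.groupby(['ID', 'Location', 'Country', 'Latitude', 'Longitude'])
       .apply(lambda x: x[['timestamp', 'tide']].to_dict('records'))
       .reset_index()
       .rename(columns={0: 'Tide-Data'})
       .to_json(orient='records'))

# Placeholder output location; replace with a real path.
out_path = os.path.join(tempfile.mkdtemp(), 'tides.json')

# Parse the compact string and write it back out with indentation.
with open(out_path, 'w') as f:
    json.dump(json.loads(j), f, indent=2)
```

The extra parse step costs some time on large frames, so the direct to_json(path) call remains preferable when the file is only read by programs.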

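Comments:

@Felix, glad I could help :)

I just realized I need the data in this format: "Tide-Data": {"timestamp": "1848-07-01", "tide": "6957.0"}. Do I have to change your function?

@MaxU With newer versions of pandas (e.g. 1.2.1), this no longer works for me. I get this error: ValueError: 1 columns passed, passed data had n columns (n is 5 in my case). What changed in pandas, and how can this be made to work again?

With pandas 1.3.1, run the example like this: j = (df.groupby(['ID','Location','Country','Latitude','Longitude']).apply(lambda x: x[['timestamp','tide']].to_dict('records')).reset_index().rename(columns={0:'Tide-Data'}).to_json(orient='records'))

@mapsa, thanks for the hint. I have fixed the code in the answer, so it should now work with modern versions of Pandas.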
