Convert JSON into a Pandas DataFrame


【Title】: Transform JSON into Pandas DataFrame 【Posted】: 2021-12-13 16:18:37 【Question】:

I have JSON like the following, and I would like to convert it into a pandas dataframe with specific column names.


{
    "data": [
        {
            "id": 1,
            "name": "3Way Result",
            "suspended": false,
            "bookmaker": {
                "data": [
                    {
                        "id": 27802,
                        "name": "Ladbrokes",
                        "odds": {
                            "data": [
                                {
                                    "label": "1",
                                    "value": "1.61",
                                    "probability": "62.11%",
                                    "dp3": "1.610",
                                    "american": -164,
                                    "factional": null,
                                    "winning": null,
                                    "handicap": null,
                                    "total": null,
                                    "bookmaker_event_id": null,
                                    "last_update": {
                                        "date": "2021-10-01 16:41:27.000000",
                                        "timezone_type": 3,
                                        "timezone": "UTC"
                                    }
                                },
                                {
                                    "label": "X",
                                    "value": "3.90",
                                    "probability": "25.64%",
                                    "dp3": "3.900",
                                    "american": 290,
                                    "factional": null,
                                    "winning": null,
                                    "handicap": null,
                                    "total": null,
                                    "bookmaker_event_id": null,
                                    "last_update": {
                                        "date": "2021-10-01 16:41:27.000000",
                                        "timezone_type": 3,
                                        "timezone": "UTC"
                                    }
                                },
                                {
                                    "label": "2",
                                    "value": "5.20",
                                    "probability": "19.23%",
                                    "dp3": "5.200",
                                    "american": 420,
                                    "factional": null,
                                    "winning": null,
                                    "handicap": null,
                                    "total": null,
                                    "bookmaker_event_id": null,
                                    "last_update": {
                                        "date": "2021-10-01 16:41:27.000000",
                                        "timezone_type": 3,
                                        "timezone": "UTC"
                                    }
                                }
                            ]
                        }
                    },
                    {
                        "id": 70,
                        "name": "Pncl",
                        "odds": {
                            "data": [
                                {
                                    "label": "1",
                                    "value": "1.65",
                                    "probability": "60.61%",
                                    "dp3": "1.645",
                                    "american": -154,
                                    "factional": null,
                                    "winning": null,
                                    "handicap": null,
                                    "total": null,
                                    "bookmaker_event_id": null,
                                    "last_update": {
                                        "date": "2021-10-01 16:59:18.000000",
                                        "timezone_type": 3,
                                        "timezone": "UTC"
                                    }
                                },
                                {
                                    "label": "X",
                                    "value": "4.20",
                                    "probability": "23.81%",
                                    "dp3": "4.200",
                                    "american": 320,
                                    "factional": null,
                                    "winning": null,
                                    "handicap": null,
                                    "total": null,
                                    "bookmaker_event_id": null,
                                    "last_update": {
                                        "date": "2021-10-01 16:59:18.000000",
                                        "timezone_type": 3,
                                        "timezone": "UTC"
                                    }
                                },
                                {
                                    "label": "2",
                                    "value": "5.43",
                                    "probability": "18.42%",
                                    "dp3": "5.430",
                                    "american": 443,
                                    "factional": null,
                                    "winning": null,
                                    "handicap": null,
                                    "total": null,
                                    "bookmaker_event_id": null,
                                    "last_update": {
                                        "date": "2021-10-01 16:59:18.000000",
                                        "timezone_type": 3,
                                        "timezone": "UTC"
                                    }
                                }
                            ]
                        }
                    }
                ]
            }
        }
    ],
    "meta": {
        "plans": [
            {
                "name": "Football Free Plan",
                "features": "Standard",
                "request_limit": "180,60",
                "sport": "Soccer"
            }
        ],
        "sports": [
            {
                "id": 1,
                "name": "Soccer",
                "current": true
            }
        ]
    }
}

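Every column name should consist of the label value plus the bookmaker's name: I want to take the value in label and combine it with the bookmaker's name from name to form the column name. The float in value should then become the row of the dataframe.

Here is the expected output: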

   1_LadBrokes  X_LadBrokes  2_LadBrokes       last_update_LadBrokes  1_Pncl  X_Pncl  2_Pncl            last_update_Pncl
0         1.61          3.9          5.2  2021-10-01 16:41:27.000000    1.65     4.2    5.43  2021-10-01 16:59:18.000000

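【Comments】:

Does this link help? ***.com/questions/40588852/pandas-read-nested-json

@butterflyknife It helps, but the OP needs more to get to the result.

【Answer 1】:

Use pd.json_normalize, build two sub-dataframes (one for value, one for last_update), and concatenate them.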

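import pandas as pd

# `data` is the parsed JSON shown in the question (a Python dict).
# Flatten to one row per odds entry and carry the bookmaker name along as metadata.
out = pd.json_normalize(
          data=data['data'],
          record_path=['bookmaker', 'data', 'odds', 'data'],
          meta=[['bookmaker', 'data', 'name']]
      )[['label', 'value', 'last_update.date', 'bookmaker.data.name']]

# Odds values, indexed as "<label>_<bookmaker>"
df1 = out.set_index(out['label'] + '_' + out['bookmaker.data.name'])['value']
# Timestamps, indexed as "last_update_<bookmaker>"; drop_duplicates() compares
# values, so this relies on each bookmaker having its own distinct timestamp
df2 = out.set_index('bookmaker.data.name')['last_update.date'] \
         .add_prefix('last_update_').drop_duplicates()

# Stack both Series and turn the result into a single-row DataFrame
df = pd.concat([df1, df2]).to_frame().T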

Output:

>>> df
  1_Ladbrokes X_Ladbrokes 2_Ladbrokes 1_Pncl X_Pncl 2_Pncl       last_update_Ladbrokes            last_update_Pncl
0        1.61        3.90        5.20   1.65   4.20   5.43  2021-10-01 16:41:27.000000  2021-10-01 16:59:18.000000
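For anyone who wants to try this approach without the full API payload, here is a minimal self-contained sketch. The `data` dict is a cut-down stand-in for the question's JSON (same nesting, only two odds per bookmaker, most fields omitted). Two things are my own additions rather than part of the answer: the `astype(float)` call, since the question asks for floats, and deduplicating on the index instead of on values, which should keep one timestamp per bookmaker even if two bookmakers happened to share the same timestamp.

```python
import pandas as pd

# Trimmed stand-in for the question's JSON (assumption: same nesting, fewer fields).
data = {
    "data": [{
        "id": 1,
        "name": "3Way Result",
        "bookmaker": {"data": [
            {"id": 27802, "name": "Ladbrokes", "odds": {"data": [
                {"label": "1", "value": "1.61",
                 "last_update": {"date": "2021-10-01 16:41:27.000000"}},
                {"label": "X", "value": "3.90",
                 "last_update": {"date": "2021-10-01 16:41:27.000000"}},
            ]}},
            {"id": 70, "name": "Pncl", "odds": {"data": [
                {"label": "1", "value": "1.65",
                 "last_update": {"date": "2021-10-01 16:59:18.000000"}},
                {"label": "X", "value": "4.20",
                 "last_update": {"date": "2021-10-01 16:59:18.000000"}},
            ]}},
        ]},
    }]
}

# One row per odds entry, with the bookmaker name attached as metadata.
out = pd.json_normalize(
    data["data"],
    record_path=["bookmaker", "data", "odds", "data"],
    meta=[["bookmaker", "data", "name"]],
)

# Odds as floats, indexed "<label>_<bookmaker>".
values = out.set_index(out["label"] + "_" + out["bookmaker.data.name"])["value"].astype(float)

# Timestamps indexed "last_update_<bookmaker>"; keep the first entry per index label.
updates = out.set_index("bookmaker.data.name")["last_update.date"].add_prefix("last_update_")
updates = updates[~updates.index.duplicated()]

# Single-row frame: odds columns first, then the last_update columns.
df = pd.concat([values, updates]).to_frame().T
print(df)
```

Because floats and strings end up in the same Series before the transpose, the resulting columns have object dtype; if numeric dtypes matter, converting the odds columns afterwards is one option.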

【Comments】:

【Answer 2】:

You can achieve this with json_normalize + apply.

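import pandas as pd


def set_values(x):
    # After explode(), each row holds a single odds dict in "odds.data"
    data = x["odds.data"]
    label = data.get("label")
    value = data.get("value")
    last_update_date = data.get("last_update").get("date")
    name = x["name"]
    # Add columns named "<label>_<bookmaker>" and "last_update_<bookmaker>"
    x[f"{label}_{name}"] = value
    x[f"last_update_{name}"] = last_update_date
    return x


df = (
    pd.json_normalize(data["data"], record_path=["bookmaker", "data"])
    .explode("odds.data")          # one row per odds entry
    .apply(set_values, axis=1)     # each row gains its own columns; the others are NaN
    .drop(["odds.data", "id", "name"], axis=1)
    .ffill()                       # fill the gaps in both directions so that
    .bfill()                       # the first row carries every value
    .head(1)
)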

In [39]: df
Out[39]: 
  1_Ladbrokes 1_Pncl 2_Ladbrokes 2_Pncl X_Ladbrokes X_Pncl       last_update_Ladbrokes            last_update_Pncl
0        1.61   1.65        5.20   5.43        3.90   4.20  2021-10-01 16:41:27.000000  2021-10-01 16:59:18.000000
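The same single-row frame can also be assembled without `apply`, by walking the nested lists directly and collecting everything into one dict. This is an alternative sketch and not part of either answer: `build_row` is a hypothetical helper name, `sample` is a trimmed copy of the question's JSON, and taking the timestamp from the first odds entry assumes all odds of a bookmaker share one `last_update`, as they do in the question's data.

```python
import pandas as pd


def build_row(payload):
    """Collect '<label>_<bookmaker>' odds and 'last_update_<bookmaker>' into one dict."""
    row = {}
    for market in payload["data"]:
        for bookmaker in market["bookmaker"]["data"]:
            name = bookmaker["name"]
            odds = bookmaker["odds"]["data"]
            for odd in odds:
                # Odds arrive as strings; the question asks for floats.
                row[f"{odd['label']}_{name}"] = float(odd["value"])
            if odds:
                # Assumption: every odds entry of a bookmaker has the same timestamp.
                row[f"last_update_{name}"] = odds[0]["last_update"]["date"]
    return row


# Trimmed copy of the question's JSON (only the fields used above).
sample = {
    "data": [{
        "bookmaker": {"data": [
            {"name": "Ladbrokes", "odds": {"data": [
                {"label": "1", "value": "1.61",
                 "last_update": {"date": "2021-10-01 16:41:27.000000"}},
                {"label": "X", "value": "3.90",
                 "last_update": {"date": "2021-10-01 16:41:27.000000"}},
                {"label": "2", "value": "5.20",
                 "last_update": {"date": "2021-10-01 16:41:27.000000"}},
            ]}},
            {"name": "Pncl", "odds": {"data": [
                {"label": "1", "value": "1.65",
                 "last_update": {"date": "2021-10-01 16:59:18.000000"}},
                {"label": "X", "value": "4.20",
                 "last_update": {"date": "2021-10-01 16:59:18.000000"}},
                {"label": "2", "value": "5.43",
                 "last_update": {"date": "2021-10-01 16:59:18.000000"}},
            ]}},
        ]},
    }]
}

# A list with one dict yields a DataFrame with one row.
df = pd.DataFrame([build_row(sample)])
print(df)
```

Since dicts preserve insertion order, the columns come out grouped per bookmaker (odds first, then that bookmaker's last_update), which is the layout the question asks for.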

【Comments】:

Thanks, it works!
