Python pandas: fast way to flatten JSON into rows by a surrogate key

Posted 2023-02-23

Question (posted 2021-08-15 19:11:11):

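My knowledge of the relevant packages is fairly shallow, and I have been looking for a way to flatten data into rows. Given a list of dicts like this one, which uses a surrogate key named entry_id: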

data = [
    {
        "id": 1,
        "entry_id": 123,
        "type": "ticker",
        "value": "IBM"
    },
    {
        "id": 2,
        "entry_id": 123,
        "type": "company_name",
        "value": "International Business Machines"
    },
    {
        "id": 3,
        "entry_id": 123,
        "type": "cusip",
        "value": "01234567"
    },
    {
        "id": 4,
        "entry_id": 321,
        "type": "ticker",
        "value": "AAPL"
    },
    {
        "id": 5,
        "entry_id": 321,
        "type": "permno",
        "value": "123456"
    },
    {
        "id": 6,
        "entry_id": 321,
        "type": "company_name",
        "value": "Apple, Inc."
    },
    {
        "id": 7,
        "entry_id": 321,
        "type": "formation_date",
        "value": "1976-04-01"
    }
]

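I would like to flatten the data into one row per surrogate key entry_id, looking like this (missing values can be empty strings or None, either is fine):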

[
    {"entry_id": 123, "ticker": "IBM", "permno": "", "company_name": "International Business Machines", "cusip": "01234567", "formation_date": ""},
    {"entry_id": 321, "ticker": "AAPL", "permno": "123456", "company_name": "Apple, Inc.", "cusip": "", "formation_date": "1976-04-01"}
]

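I have tried DataFrame's groupby and json_normalize, but I cannot find the right level of voodoo to get the desired result. I could loop over the data in pure Python, but I am fairly sure that would not be a fast solution. I am not sure how to specify that type supplies the columns, value supplies the cell values, and entry_id is the grouping key. I am also open to packages other than pandas.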
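
For reference, the pure-Python loop mentioned above could look roughly like the sketch below. The helper name flatten_by_key, its fill parameter, and the shortened sample list are hypothetical and only serve as illustration; the sketch makes no claim about speed compared with a pandas approach.

```python
from collections import defaultdict

# Hypothetical, shortened sample in the same shape as the question's data.
data = [
    {"id": 1, "entry_id": 123, "type": "ticker", "value": "IBM"},
    {"id": 2, "entry_id": 123, "type": "cusip", "value": "01234567"},
    {"id": 3, "entry_id": 321, "type": "ticker", "value": "AAPL"},
]


def flatten_by_key(records, key="entry_id", fill=""):
    """Group records by `key`, turning each `type` into a column."""
    # Collect every type seen so that all output rows share the same columns.
    types = sorted({r["type"] for r in records})
    grouped = defaultdict(dict)
    for r in records:
        # A later record with the same (key, type) overwrites an earlier one.
        grouped[r[key]][r["type"]] = r["value"]
    return [
        {key: k, **{t: values.get(t, fill) for t in types}}
        for k, values in grouped.items()
    ]


rows = flatten_by_key(data)
print(rows)
```

Passing fill=None instead of the default empty string gives None for the missing cells.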

Answer 1:

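We can build a DataFrame from the given list of records, pivot it to reshape the data, fill the NaN values with empty strings using fillna, and then convert the pivoted frame to a list of dictionaries: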

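import pandas as pd
df = pd.DataFrame(data)
# Keyword arguments and 'records' are required by pandas 2.x; the positional form and 'r' shorthand were removed.
df.pivot(index='entry_id', columns='type', values='value').fillna('').reset_index().to_dict('records')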

[{'entry_id': 123,
  'company_name': 'International Business Machines',
  'cusip': '01234567',
  'formation_date': '',
  'permno': '',
  'ticker': 'IBM'},
 {'entry_id': 321,
  'company_name': 'Apple, Inc.',
  'cusip': '',
  'formation_date': '1976-04-01',
  'permno': '123456',
  'ticker': 'AAPL'}]
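
One caveat: DataFrame.pivot raises a ValueError when the same (entry_id, type) pair occurs more than once. The question's data has no such duplicates, so the following is only a sketch for that hypothetical case, using pivot_table with aggfunc="first" to keep the first value seen. The duplicated sample row is invented for illustration.

```python
import pandas as pd

# Hypothetical sample: entry_id 123 has two "ticker" rows, which would make
# DataFrame.pivot fail with a duplicate-entries error.
data = [
    {"id": 1, "entry_id": 123, "type": "ticker", "value": "IBM"},
    {"id": 2, "entry_id": 123, "type": "ticker", "value": "IBM.N"},
    {"id": 3, "entry_id": 321, "type": "ticker", "value": "AAPL"},
    {"id": 4, "entry_id": 321, "type": "permno", "value": "123456"},
]
df = pd.DataFrame(data)

rows = (
    # aggfunc="first" keeps the first value for each (entry_id, type) pair.
    df.pivot_table(index="entry_id", columns="type", values="value", aggfunc="first")
    .fillna("")          # missing cells become empty strings
    .reset_index()       # turn entry_id back into a regular column
    .to_dict("records")  # one dict per row
)
print(rows)
```

Whether keeping the first value is the right policy depends on the data; aggfunc="last" or a custom function are alternatives.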
