How to convert multiple dictionary keys in a Pandas Series to columns in a DataFrame?

Posted 2023-03-12

Asked: 2021-11-19 00:49:45

Question:

I have the following pandas DataFrame with 2 columns: Address and Transactions.

    Address                                     Transactions
0   0x88aDa02f6fCE2F1A835567B4999D62a7ebb70367  [{'type': 'outflow', 'amount': '250,000 VSO'}, {'type': inflow, 'amount': 100,000}]
1   0x00979Bd14bD5Eb5c424c5478d3BF4b6E9212bA7d  [{'type': 'inflow', 'amount': '9.1283802424254'}, {'type': inflow, 'amount': 100,000}]
2   0x5852346d9dC3d64d81dc82fdddd5Cc1211157cD5  [{'type': 'outflow', 'amount': '7,200 VSO'}, {'type': inflow, 'amount': 100,000}]

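Each address has multiple transactions. All transactions for an address are held in a list, and each transaction is a dictionary.

Each dictionary has two keys and two values: type and amount.

The code that creates the table above is: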

df_dict = pd.DataFrame(dict_all_txs_all_addresses.items(), columns=['Address', 'Transactions'])
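For anyone who wants to reproduce the table, here is a minimal runnable sketch. It assumes the sample dictionary given further down in the question, with the curly braces around each transaction restored; wrapping `.items()` in `list(...)` is my own precaution so the pairs are passed as an explicit sequence.

```python
import pandas as pd

# Sample data from the question: two addresses, two transactions each.
dict_all_txs_all_addresses = {
    "0x00979Bd14bD5Eb5c424c5478d3BF4b6E9212bA7d": [
        {"amount": "330,000 VSO", "type": "inflow"},
        {"amount": "150,000 VSO", "type": "inflow"},
    ],
    "0x88aDa02f6fCE2F1A833cd9B4999D62a7ebb70367": [
        {"amount": "250,000 VSO", "type": "outflow"},
        {"amount": "100,000 VSO", "type": "inflow"},
    ],
}

# Each (address, list_of_transactions) pair becomes one row.
df_dict = pd.DataFrame(
    list(dict_all_txs_all_addresses.items()),
    columns=["Address", "Transactions"],
)
print(df_dict)
```

Each cell in the Transactions column is still a Python list of dicts at this point, which is why it needs to be exploded and expanded afterwards.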

What I want to do: I would like to create a multi-indexed (possibly unnecessary?) table that looks like this:

    Address                                         Type                             Amount
    0   0x88aDa02f6fCE2F1A835567B4999D62a7ebb70367  outflow                          250,000 VSO
                                                    inflow                           100,000 VSO

    1   0x00979Bd14bD5Eb5c424c5478d3BF4b6E9212bA7d  inflow                           330,000 VSO
                                                    inflow                           150,000 VSO

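It shows each transaction on its own row while listing each address only once. Note that this mock table has 3 columns.

Perhaps this could be solved with df.groupby() instead of a multi-indexed df?

Here is a sample dictionary, for ease of reading and manipulation: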

dict_all_txs_all_addresses = {
        "0x00979Bd14bD5Eb5c424c5478d3BF4b6E9212bA7d": [
            {
                "amount": "330,000 VSO",
                "type": "inflow"
            },
            {
                "amount": "150,000 VSO",
                "type": "inflow"
            }
        ],
        "0x88aDa02f6fCE2F1A833cd9B4999D62a7ebb70367": [
            {
                "amount": "250,000 VSO",
                "type": "outflow"
            },
            {
                "amount": "100,000 VSO",
                "type": "inflow"
            }
        ]
}


Answer 1:

We can use pd.json_normalize here to get a tidy, workable format:

df = df.explode("Transactions", ignore_index=True)
df = pd.concat([df, pd.json_normalize(df.pop("Transactions"))], axis=1)

                                      Address       amount     type
0  0x00979Bd14bD5Eb5c424c5478d3BF4b6E9212bA7d  330,000 VSO   inflow
1  0x00979Bd14bD5Eb5c424c5478d3BF4b6E9212bA7d  150,000 VSO   inflow
2  0x88aDa02f6fCE2F1A833cd9B4999D62a7ebb70367  250,000 VSO  outflow
3  0x88aDa02f6fCE2F1A833cd9B4999D62a7ebb70367  100,000 VSO   inflow
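The two lines above can be run end to end with the sketch below. The sample frame is rebuilt from the question's dictionary, the names `exploded`, `flat`, and `tidy` are mine, and the `.tolist()` call is an extra step I added so that `pd.json_normalize` receives a plain list of dicts. Note that `ignore_index=True` in `explode` needs pandas 1.1 or newer.

```python
import pandas as pd

df = pd.DataFrame({
    "Address": [
        "0x00979Bd14bD5Eb5c424c5478d3BF4b6E9212bA7d",
        "0x88aDa02f6fCE2F1A833cd9B4999D62a7ebb70367",
    ],
    "Transactions": [
        [{"amount": "330,000 VSO", "type": "inflow"},
         {"amount": "150,000 VSO", "type": "inflow"}],
        [{"amount": "250,000 VSO", "type": "outflow"},
         {"amount": "100,000 VSO", "type": "inflow"}],
    ],
})

# One row per transaction; ignore_index=True renumbers the rows 0..n-1.
exploded = df.explode("Transactions", ignore_index=True)

# Expand each dict into its own columns. The result also has a 0..n-1 index,
# so it lines up row by row with `exploded` in the concat below.
flat = pd.json_normalize(exploded.pop("Transactions").tolist())

tidy = pd.concat([exploded, flat], axis=1)
print(tidy)
```

The renumbered index matters here: without `ignore_index=True`, the exploded rows would keep the repeated labels 0, 0, 1, 1 and the column-wise `concat` would no longer align them one to one with the normalized frame.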


Answer 2:

Explode the Transactions column, then use the apply(pd.Series) trick to expand it into multiple columns:

(df.set_index('Address')
   .explode('Transactions')
   .Transactions
   .apply(pd.Series)
   .set_index('type', append=True))

                                                         amount
Address                                    type                
0x00979Bd14bD5Eb5c424c5478d3BF4b6E9212bA7d inflow   330,000 VSO
                                           inflow   150,000 VSO
0x88aDa02f6fCE2F1A833cd9B4999D62a7ebb70367 outflow  250,000 VSO
                                           inflow   100,000 VSO

If you need all columns as regular columns rather than as the index, use reset_index instead of set_index:

df.set_index('Address').explode('Transactions').Transactions.apply(pd.Series).reset_index()

                                      Address       amount     type
0  0x00979Bd14bD5Eb5c424c5478d3BF4b6E9212bA7d  330,000 VSO   inflow
1  0x00979Bd14bD5Eb5c424c5478d3BF4b6E9212bA7d  150,000 VSO   inflow
2  0x88aDa02f6fCE2F1A833cd9B4999D62a7ebb70367  250,000 VSO  outflow
3  0x88aDa02f6fCE2F1A833cd9B4999D62a7ebb70367  100,000 VSO   inflow

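Comments:

The only problem I ran into with this approach is that the only column in the resulting df is amount. Maybe @Erfan's solution solves that.

@LuizScheuer The other two columns are in the index. Do you want all three as columns in the resulting df?

Yes, exactly. Ideally there would be no index (or the index would be the address).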
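For the layout asked for in the last comment (Address as the index, with type and amount both kept as ordinary columns), one possible sketch is below. It is not taken from either answer: it builds the frame directly from the exploded list of dicts instead of using apply(pd.Series), and it assumes every address has at least one transaction, since an empty list would explode to NaN and break the constructor. The names `s` and `result` are mine.

```python
import pandas as pd

df = pd.DataFrame({
    "Address": [
        "0x00979Bd14bD5Eb5c424c5478d3BF4b6E9212bA7d",
        "0x88aDa02f6fCE2F1A833cd9B4999D62a7ebb70367",
    ],
    "Transactions": [
        [{"amount": "330,000 VSO", "type": "inflow"},
         {"amount": "150,000 VSO", "type": "inflow"}],
        [{"amount": "250,000 VSO", "type": "outflow"},
         {"amount": "100,000 VSO", "type": "inflow"}],
    ],
})

# Series of dicts, indexed by Address, one entry per transaction.
s = df.set_index("Address")["Transactions"].explode()

# Build the columns from the dicts and reuse the Address index as-is.
result = pd.DataFrame(s.tolist(), index=s.index)

# Put the columns in the order shown in the question's mock table.
result = result[["type", "amount"]]
print(result)
```

Calling `result.reset_index()` afterwards gives the plain three-column version with a default integer index.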
