How to create hierarchical data frame using list of dictionaries

【Posted】: 2021-01-26 10:24:39

【Question】:

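I have the following list of dictionaries that I want to flatten using Python. The data originally comes from Xero and looks like this:

Here is the sample data I extracted using the API: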

my_dict = [
 {'RowType': 'Section', 'Title': 'Income', 'Rows': []},
 {'RowType': 'Section', 'Title': 'Income from Rents', 'Rows': []},
 {'RowType': 'Section',
  'Title': 'Rent Received',
  'Rows': [{'RowType': 'Row',
    'Cells': [{'Value': 'Contract Rent',
      'Attributes': [{'Value': '5', 'Id': 'account'},
       {'Value': '5', 'Id': 'groupID'}]},
     {'Value': '721093.92',
      'Attributes': [{'Value': '5', 'Id': 'account'},
       {'Value': '5', 'Id': 'groupID'}]}]},
   {'RowType': 'Row',
    'Cells': [{'Value': 'Rent  - Carparks',
      'Attributes': [{'Value': '95', 'Id': 'account'}]},
     {'Value': '3523.33',
      'Attributes': [{'Value': '95', 'Id': 'account'}]}]},
   {'RowType': 'Row',
    'Cells': [{'Value': 'Vacant Tenancies',
      'Attributes': [{'Value': '53', 'Id': 'account'}]},
     {'Value': '-22226.50',
      'Attributes': [{'Value': '53', 'Id': 'account'}]}]},
   {'RowType': 'SummaryRow',
    'Cells': [{'Value': 'Total Rent Received'}, {'Value': '702390.75'}]}]},
 {'RowType': 'Section',
  'Title': 'Rent Reductions',
  'Rows': [{'RowType': 'Row',
    'Cells': [{'Value': 'COVID-19 Rent reduction',
      'Attributes': [{'Value': '40', 'Id': 'account'}]},
     {'Value': '-132478.03',
      'Attributes': [{'Value': '40', 'Id': 'account'}]}]},
   {'RowType': 'Row',
    'Cells': [{'Value': 'Rent Holiday',
      'Attributes': [{'Value': '4d', 'Id': 'account'}]},
     {'Value': '-14451.58',
      'Attributes': [{'Value': '4d', 'Id': 'account'}]}]},
   {'RowType': 'SummaryRow',
    'Cells': [{'Value': 'Total Rent Reductions'}, {'Value': '-146929.61'}]}]}]

The desired output is as follows:

                      Name      Amount Hierarchy_level_3 Hierarchy_level_1  Hierarchy_level_2
0            Contract Rent   721093.92     Rent Received            Income  Income from Rents
1          Rent - Carparks     3523.33     Rent Received            Income  Income from Rents
2         Vacant Tenancies   -22226.50     Rent Received            Income  Income from Rents
3      Total Rent Received   702390.75
4  COVID-19 Rent reduction  -132478.03    Rent Reduction            Income  Income from Rents
...
Can anyone help me with this? The sample data here is in the format I get from the API. I don't know how to flatten it. I'm relatively new to Python.

【Comments】:

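Off-topic: beware of posting company-restricted or confidential data on SO (i.e. the internet). It may (or may not) cause problems between you and your employer. Of course, the data you posted may be entirely your own property, I don't know (?). You could replace the numbers with made-up data and the keys (cell names) with arbitrary but still relevant names.

The IDs are encrypted, and so is my personal data. This shouldn't be a problem.

【Answer 1】: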

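Assuming that the Hierarchy_level_3 of row 4 in your example is Rent Received rather than Rent Reduction, and that your example has a 4-level hierarchy, here is a solution. I added the level number and level name because I think these may be more useful than the "hierarchy levels", but feel free to remove them.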

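import pandas as pd

# Map each section's position to its title: Hierarchy_level_1, Hierarchy_level_2, ...
hierarchy = {f'Hierarchy_level_{i+1}': d['Title'] for i, d in enumerate(my_dict)}
all_data = []

for level, d in enumerate(my_dict):
    for row in d['Rows']:
        cells = row['Cells']
        all_data.append({
            'Name': cells[0]['Value'],    # first cell holds the row label
            'Amount': cells[1]['Value'],  # second cell holds the amount (a string)
            'Level': level,               # zero-based index of the section
            'Level_name': hierarchy[f'Hierarchy_level_{level+1}'],
            **hierarchy
        })
df = pd.DataFrame(all_data)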

Output:

                   Name      Amount  Level       Level_name Hierarchy_level_1  Hierarchy_level_2 Hierarchy_level_3 Hierarchy_level_4
0            Contract Rent   721093.92      2    Rent Received            Income  Income from Rents     Rent Received   Rent Reductions
1         Rent  - Carparks     3523.33      2    Rent Received            Income  Income from Rents     Rent Received   Rent Reductions
2         Vacant Tenancies   -22226.50      2    Rent Received            Income  Income from Rents     Rent Received   Rent Reductions
3      Total Rent Received   702390.75      2    Rent Received            Income  Income from Rents     Rent Received   Rent Reductions
4  COVID-19 Rent reduction  -132478.03      3  Rent Reductions            Income  Income from Rents     Rent Received   Rent Reductions
5             Rent Holiday   -14451.58      3  Rent Reductions            Income  Income from Rents     Rent Received   Rent Reductions
6    Total Rent Reductions  -146929.61      3  Rent Reductions            Income  Income from Rents     Rent Received   Rent Reductions

--- EDIT: since only a 3-level hierarchy is needed:

import pandas as pd

# Map each section's position to its title: Hierarchy_level_1, Hierarchy_level_2, ...
hierarchy = {f'Hierarchy_level_{i+1}': d['Title'] for i, d in enumerate(my_dict)}
all_data = []

for level, d in enumerate(my_dict):
    for row in d['Rows']:
        cells = row['Cells']
        all_data.append({
            'Name': cells[0]['Value'],
            'Amount': cells[1]['Value'],
            'Hierarchy_level_1': hierarchy['Hierarchy_level_1'],
            'Hierarchy_level_2': hierarchy['Hierarchy_level_2'],
            # The title of the section that owns this row
            'Hierarchy_level_3': hierarchy[f'Hierarchy_level_{level+1}'],
        })
df = pd.DataFrame(all_data)

Output:

                      Name      Amount Hierarchy_level_1  Hierarchy_level_2 Hierarchy_level_3
0            Contract Rent   721093.92            Income  Income from Rents     Rent Received
1         Rent  - Carparks     3523.33            Income  Income from Rents     Rent Received
2         Vacant Tenancies   -22226.50            Income  Income from Rents     Rent Received
3      Total Rent Received   702390.75            Income  Income from Rents     Rent Received
4  COVID-19 Rent reduction  -132478.03            Income  Income from Rents   Rent Reductions
5             Rent Holiday   -14451.58            Income  Income from Rents   Rent Reductions
6    Total Rent Reductions  -146929.61            Income  Income from Rents   Rent Reductions
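
For readers who want to try the 3-level version without the full API payload, here is a minimal self-contained sketch on a trimmed-down copy of the sample data. The names `sample` and `flatten_sections` are hypothetical, and the numeric conversion of `Amount` is an addition that is not part of the answer above. It assumes, as in the sample, that the first two sections are empty parent headings and every later section is a level-3 group.

```python
import pandas as pd

# Trimmed-down copy of the question's data (hypothetical name `sample`).
sample = [
    {'RowType': 'Section', 'Title': 'Income', 'Rows': []},
    {'RowType': 'Section', 'Title': 'Income from Rents', 'Rows': []},
    {'RowType': 'Section', 'Title': 'Rent Received', 'Rows': [
        {'RowType': 'Row', 'Cells': [
            {'Value': 'Contract Rent',
             'Attributes': [{'Value': '5', 'Id': 'account'}]},
            {'Value': '721093.92',
             'Attributes': [{'Value': '5', 'Id': 'account'}]}]},
        {'RowType': 'SummaryRow', 'Cells': [
            {'Value': 'Total Rent Received'}, {'Value': '702390.75'}]}]},
    {'RowType': 'Section', 'Title': 'Rent Reductions', 'Rows': [
        {'RowType': 'Row', 'Cells': [
            {'Value': 'Rent Holiday',
             'Attributes': [{'Value': '4d', 'Id': 'account'}]},
            {'Value': '-14451.58',
             'Attributes': [{'Value': '4d', 'Id': 'account'}]}]}]},
]


def flatten_sections(sections):
    """Hypothetical helper: one output row per entry in each section's 'Rows'."""
    # Assumption: sections[0] and sections[1] are the level-1 and level-2 headings.
    level_1, level_2 = sections[0]['Title'], sections[1]['Title']
    records = []
    for section in sections[2:]:
        for row in section['Rows']:
            cells = row['Cells']
            records.append({
                'Name': cells[0]['Value'],
                'Amount': cells[1]['Value'],
                'Hierarchy_level_1': level_1,
                'Hierarchy_level_2': level_2,
                'Hierarchy_level_3': section['Title'],
            })
    df = pd.DataFrame(records)
    # Amounts are strings in the sample; convert them so they can be summed.
    df['Amount'] = pd.to_numeric(df['Amount'])
    return df


df = flatten_sections(sample)
print(df)
```

Reading the level-3 name directly from the section that owns the row avoids the extra `Hierarchy_level_4` column, at the cost of hard-coding the assumption about the first two sections.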

【Discussion】:

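Thank you so much for your help. This is really helpful. Just one issue, though: Rent Reductions should be Hierarchy_level_3, not Hierarchy_level_4. In general, there should be at most three hierarchy levels.

You can add "Hierarchy_level_3": hierarchy[f'Hierarchy_level_{level+1}'] after **hierarchy; this will overwrite level 3 with the correct level.

Oh my god, I can't thank you enough for this. I literally spent hours trying to solve this. Thanks a lot. :)

One question for you: if I want to show the account ID in the data frame, how can I include it? Can you help?

You can add 'AccountId': cells[0].get('Attributes', [{'Value': 'N/A'}])[0]['Value']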
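
The one-liner in the last comment takes the `Value` of the first attribute and falls back to 'N/A' for cells without `Attributes` (such as the summary rows). A slightly more defensive sketch looks the attribute up by its `Id` instead of by position. It assumes the account attribute is the one whose `Id` is 'account', as in the sample data; `account_id` is a hypothetical helper name.

```python
def account_id(cell, default='N/A'):
    """Hypothetical helper: return the Value of the attribute whose Id is 'account'."""
    for attr in cell.get('Attributes', []):
        if attr.get('Id') == 'account':
            return attr['Value']
    # No Attributes key, or no attribute with Id == 'account'
    return default


# A regular row's first cell carries attributes; a summary row's does not.
row_cell = {'Value': 'Contract Rent',
            'Attributes': [{'Value': '5', 'Id': 'account'},
                           {'Value': '5', 'Id': 'groupID'}]}
summary_cell = {'Value': 'Total Rent Received'}

print(account_id(row_cell))      # 5
print(account_id(summary_cell))  # N/A
```

To use it, add 'AccountId': account_id(cells[0]) to the dictionary passed to all_data.append.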
