How to create a hierarchical data frame using a list of dictionaries
【Title】: How to create hierarchical data frame using list of dictionaries 【Posted】: 2021-01-26 10:24:39 【Question】: I have the following list of dictionaries that I want to flatten using Python. The data originally comes from Xero.
Here is the sample data I extracted using the API:
my_dict = [
    {'RowType': 'Section', 'Title': 'Income', 'Rows': []},
    {'RowType': 'Section', 'Title': 'Income from Rents', 'Rows': []},
    {'RowType': 'Section',
     'Title': 'Rent Received',
     'Rows': [{'RowType': 'Row',
               'Cells': [{'Value': 'Contract Rent',
                          'Attributes': [{'Value': '5', 'Id': 'account'},
                                         {'Value': '5', 'Id': 'groupID'}]},
                         {'Value': '721093.92',
                          'Attributes': [{'Value': '5', 'Id': 'account'},
                                         {'Value': '5', 'Id': 'groupID'}]}]},
              {'RowType': 'Row',
               'Cells': [{'Value': 'Rent - Carparks',
                          'Attributes': [{'Value': '95', 'Id': 'account'}]},
                         {'Value': '3523.33',
                          'Attributes': [{'Value': '95', 'Id': 'account'}]}]},
              {'RowType': 'Row',
               'Cells': [{'Value': 'Vacant Tenancies',
                          'Attributes': [{'Value': '53', 'Id': 'account'}]},
                         {'Value': '-22226.50',
                          'Attributes': [{'Value': '53', 'Id': 'account'}]}]},
              {'RowType': 'SummaryRow',
               'Cells': [{'Value': 'Total Rent Received'},
                         {'Value': '702390.75'}]}]},
    {'RowType': 'Section',
     'Title': 'Rent Reductions',
     'Rows': [{'RowType': 'Row',
               'Cells': [{'Value': 'COVID-19 Rent reduction',
                          'Attributes': [{'Value': '40', 'Id': 'account'}]},
                         {'Value': '-132478.03',
                          'Attributes': [{'Value': '40', 'Id': 'account'}]}]},
              {'RowType': 'Row',
               'Cells': [{'Value': 'Rent Holiday',
                          'Attributes': [{'Value': '4d', 'Id': 'account'}]},
                         {'Value': '-14451.58',
                          'Attributes': [{'Value': '4d', 'Id': 'account'}]}]},
              {'RowType': 'SummaryRow',
               'Cells': [{'Value': 'Total Rent Reductions'},
                         {'Value': '-146929.61'}]}]},
]
The desired output is as follows:
   Name                     Amount      Hierarchy_level_3  Hierarchy_level_1  Hierarchy_level_2
0  Contract Rent            721093.92   Rent Received      Income             Income from Rents
1  Rent - Carparks          3523.33     Rent Received      Income             Income from Rents
2  Vacant Tenancies         -22226.50   Rent Received      Income             Income from Rents
3  Total Rent Received      702390.75
4  COVID-19 Rent reduction  -132478.03  Rent Reduction     Income             Income from Rents
.  .                        .           .                  .                  .
.  .                        .           .                  .                  .
Can anyone help me with this? The sample data here is in the format I get from the API. I am not sure how to flatten it. I am fairly new to Python.
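【Comments】:
Off-topic: be careful about posting company-restricted or confidential data on SO (that is, on the internet). It may (or may not) cause problems between you and your employer. Of course, the data you posted may be entirely your own property; I cannot know. You could replace the numbers with made-up data and the keys (cell names) with arbitrary but still relevant names.
The IDs are encrypted, and so is my personal data. This shouldn't be a problem.
【Answer 1】: Assuming that Hierarchy_level_3 for row 4 in your example is Rent Received rather than Rent Reduction, and that your example has a 4-level hierarchy, here is a solution. I added the level number and level name because I think they may be more useful than the "hierarchy levels", but feel free to remove them.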
import pandas as pd

# Map every section title to a column name: Hierarchy_level_1, Hierarchy_level_2, ...
hierarchy = {f'Hierarchy_level_{i+1}': d['Title'] for i, d in enumerate(my_dict)}

all_data = []
for level, d in enumerate(my_dict):
    for row in d['Rows']:
        cells = row['Cells']
        all_data.append({
            'Name': cells[0]['Value'],
            'Amount': cells[1]['Value'],
            'Level': level,  # zero-based index of the section in my_dict
            'Level_name': hierarchy[f'Hierarchy_level_{level+1}'],
            **hierarchy,  # one column per section title
        })

df = pd.DataFrame(all_data)
Output:
   Name                     Amount      Level  Level_name       Hierarchy_level_1  Hierarchy_level_2  Hierarchy_level_3  Hierarchy_level_4
0  Contract Rent            721093.92   2      Rent Received    Income             Income from Rents  Rent Received      Rent Reductions
1  Rent - Carparks          3523.33     2      Rent Received    Income             Income from Rents  Rent Received      Rent Reductions
2  Vacant Tenancies         -22226.50   2      Rent Received    Income             Income from Rents  Rent Received      Rent Reductions
3  Total Rent Received      702390.75   2      Rent Received    Income             Income from Rents  Rent Received      Rent Reductions
4  COVID-19 Rent reduction  -132478.03  3      Rent Reductions  Income             Income from Rents  Rent Received      Rent Reductions
5  Rent Holiday             -14451.58   3      Rent Reductions  Income             Income from Rents  Rent Received      Rent Reductions
6  Total Rent Reductions    -146929.61  3      Rent Reductions  Income             Income from Rents  Rent Received      Rent Reductions
--- EDIT: since only a 3-level hierarchy is needed:
import pandas as pd

# Map every section title to a column name: Hierarchy_level_1, Hierarchy_level_2, ...
hierarchy = {f'Hierarchy_level_{i+1}': d['Title'] for i, d in enumerate(my_dict)}

all_data = []
for level, d in enumerate(my_dict):
    for row in d['Rows']:
        cells = row['Cells']
        all_data.append({
            'Name': cells[0]['Value'],
            'Amount': cells[1]['Value'],
            # The first two levels are fixed; the third is the section that owns the row.
            'Hierarchy_level_1': hierarchy['Hierarchy_level_1'],
            'Hierarchy_level_2': hierarchy['Hierarchy_level_2'],
            'Hierarchy_level_3': hierarchy[f'Hierarchy_level_{level+1}'],
        })

df = pd.DataFrame(all_data)
Output:
   Name                     Amount      Hierarchy_level_1  Hierarchy_level_2  Hierarchy_level_3
0  Contract Rent            721093.92   Income             Income from Rents  Rent Received
1  Rent - Carparks          3523.33     Income             Income from Rents  Rent Received
2  Vacant Tenancies         -22226.50   Income             Income from Rents  Rent Received
3  Total Rent Received      702390.75   Income             Income from Rents  Rent Received
4  COVID-19 Rent reduction  -132478.03  Income             Income from Rents  Rent Reductions
5  Rent Holiday             -14451.58   Income             Income from Rents  Rent Reductions
6  Total Rent Reductions    -146929.61  Income             Income from Rents  Rent Reductions
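The desired output in the question leaves the hierarchy columns empty for the summary row, whereas the output above fills them in. If blank values are wanted there, one possible variation is to branch on RowType. The following is only a sketch under that assumption; the names sections, records and is_summary are illustrative, and the sample is a trimmed copy of the question's data with the Attributes omitted.

```python
# Trimmed-down sample in the same shape as the question's data.
sections = [
    {'RowType': 'Section', 'Title': 'Income', 'Rows': []},
    {'RowType': 'Section', 'Title': 'Income from Rents', 'Rows': []},
    {'RowType': 'Section', 'Title': 'Rent Received', 'Rows': [
        {'RowType': 'Row',
         'Cells': [{'Value': 'Contract Rent'}, {'Value': '721093.92'}]},
        {'RowType': 'SummaryRow',
         'Cells': [{'Value': 'Total Rent Received'}, {'Value': '702390.75'}]},
    ]},
]

records = []
for section in sections:
    for row in section['Rows']:
        cells = row['Cells']
        is_summary = row['RowType'] == 'SummaryRow'
        records.append({
            'Name': cells[0]['Value'],
            'Amount': cells[1]['Value'],
            # Assumption: summary rows get empty hierarchy columns,
            # as in the desired output shown in the question.
            'Hierarchy_level_1': '' if is_summary else sections[0]['Title'],
            'Hierarchy_level_2': '' if is_summary else sections[1]['Title'],
            'Hierarchy_level_3': '' if is_summary else section['Title'],
        })
```

Passing records to pd.DataFrame, as in the answer, should then give a frame in which only the ordinary rows carry hierarchy values.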
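【Comments】:
Thank you very much for your help. This is really helpful. Just one issue, though: Rent Reductions should be Hierarchy_level_3, not Hierarchy_level_4. In general, there should be at most three hierarchy levels.
You can add "Hierarchy_level_3": hierarchy[f'Hierarchy_level_{level+1}'] after **hierarchy; this will overwrite level 3 with the correct level.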
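The override suggested in this comment relies on how Python builds dict literals: when a key appears again after a ** unpacking, the later value replaces the unpacked one. A minimal sketch of that behaviour, using titles from the question's data (the record variable is illustrative only):

```python
hierarchy = {
    'Hierarchy_level_1': 'Income',
    'Hierarchy_level_2': 'Income from Rents',
    'Hierarchy_level_3': 'Rent Received',
    'Hierarchy_level_4': 'Rent Reductions',
}
level = 3  # index of the 'Rent Reductions' section in my_dict

record = {
    'Name': 'Rent Holiday',
    **hierarchy,
    # A key written after the unpacking replaces the unpacked value.
    'Hierarchy_level_3': hierarchy[f'Hierarchy_level_{level+1}'],
}

print(record['Hierarchy_level_3'])  # Rent Reductions
```

Note that the Hierarchy_level_4 key is still present in the record after the override, so that column would have to be dropped separately if it is not wanted.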
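Oh my god, I can't thank you enough for this. I literally spent hours trying to solve this. Thanks a lot. :)
One more question for you: if I want to show the account ID in the data frame, how do I include it? Can you help?
You can add 'AccountId': cells[0].get('Attributes', [{'Value': 'N/A'}])[0]['Value']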
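To see what that expression does, here is a small sketch that wraps it in a helper and applies it to one ordinary cell and one summary cell from the question's data. The function name account_id is hypothetical, introduced only for illustration.

```python
def account_id(cell):
    """Return the first attribute value of a cell, or 'N/A' if it has no Attributes."""
    return cell.get('Attributes', [{'Value': 'N/A'}])[0]['Value']


row_cell = {'Value': 'Contract Rent',
            'Attributes': [{'Value': '5', 'Id': 'account'},
                           {'Value': '5', 'Id': 'groupID'}]}
summary_cell = {'Value': 'Total Rent Received'}  # summary cells carry no Attributes

print(account_id(row_cell))      # 5
print(account_id(summary_cell))  # N/A
```

The expression takes the first attribute whatever its Id is, so it assumes the 'account' attribute comes first, as it does in the sample data. It would also raise an IndexError if a cell had an empty Attributes list, a case the sample does not contain.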