Merging list of dictionaries to remove all duplicates
Posted: 2021-12-03 10:22:37
Question:
I'm trying to write some simple Python code that merges a list of dictionaries into a condensed list, because I have a lot of duplicates at the moment.
From this:
[
    {
        "module": "RECEIPT BISCUITS",
        "product_range": "ULKER BISCUITS",
        "receipt_category": "BISCUITS"
    },
    {
        "module": "RECEIPT BISCUITS",
        "product_range": "ULKER",
        "receipt_category": "BISCUITS"
    },
    {
        "module": "RECEIPT BISCUITS",
        "product_range": "ULKER BISCUITS GOLD",
        "receipt_category": "BISCUITS GOLD"
    },
    {
        "module": "RECEIPT COFFEE",
        "product_range": "BLACK GOLD",
        "receipt_category": "BLACK GOLD"
    }
]
To this:
[
    {
        "module": "RECEIPT BISCUITS",
        "product_range": ["ULKER BISCUITS", "ULKER"],
        "receipt_category": ["BISCUITS", "BISCUITS GOLD"]
    },
    {
        "module": "RECEIPT COFFEE",
        "product_range": ["BLACK GOLD"],
        "receipt_category": ["BLACK GOLD"]
    }
]
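Here module is the key used to group the entries, and the other two fields should be stored as lists even when there is only one value. This is in JSON format, by the way.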
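Comments:
What happened when you tried to do this?
This blog has some examples: geeksforgeeks.org/python-merging-two-list-of-dictionaries
@mkrieger1 Honestly, I have no idea how to do it. I only started using Python about a week ago, and getting the data into the shape shown in the first part was a miracle!
Answer 1:
collections.defaultdict to the rescue for your data-restructuring needs!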
import collections

data = [
    {"module": "RECEIPT BISCUITS", "product_range": "ULKER BISCUITS", "receipt_category": "BISCUITS"},
    {"module": "RECEIPT BISCUITS", "product_range": "ULKER", "receipt_category": "BISCUITS"},
    {"module": "RECEIPT BISCUITS", "product_range": "ULKER BISCUITS GOLD", "receipt_category": "BISCUITS GOLD"},
    {"module": "RECEIPT COFFEE", "product_range": "BLACK GOLD", "receipt_category": "BLACK GOLD"},
]

# Outer key: the `module` value; inner key: field name; inner value: list of collected values
grouped = collections.defaultdict(lambda: collections.defaultdict(list))
group_key = "module"

for datum in data:
    datum = datum.copy()  # Copy so we can .pop without consequence
    group = datum.pop(group_key)  # Get the key (`module` value)
    for key, value in datum.items():  # Loop over the rest and put them in the group
        grouped[group][key].append(value)

# Flatten each group back into a single dict: the group key plus its collected lists
collated = [
    {
        group_key: group,
        **values,
    }
    for (group, values) in grouped.items()
]

print(collated)
This prints out
[
    {'module': 'RECEIPT BISCUITS', 'product_range': ['ULKER BISCUITS', 'ULKER', 'ULKER BISCUITS GOLD'], 'receipt_category': ['BISCUITS', 'BISCUITS', 'BISCUITS GOLD']},
    {'module': 'RECEIPT COFFEE', 'product_range': ['BLACK GOLD'], 'receipt_category': ['BLACK GOLD']}
]
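Note that this does not deduplicate the collected values (receipt_category above contains 'BISCUITS' twice), because I wasn't sure whether the order of the values matters to you, and therefore whether a set (which does not preserve order) would be acceptable. Changing list to set and append to add will make the values unique.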
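If the order does matter, one possible variation is to collect the values as dict keys instead of in a set, since a dict keeps insertion order (Python 3.7+) and rejects duplicate keys. The sketch below is not part of the answer's code: the helper name collate_unique is made up for illustration, and it assumes every value is hashable (plain strings here). It also converts the result back to lists, because the standard json module cannot serialize a set.

```python
import collections
import json

data = [
    {"module": "RECEIPT BISCUITS", "product_range": "ULKER BISCUITS", "receipt_category": "BISCUITS"},
    {"module": "RECEIPT BISCUITS", "product_range": "ULKER", "receipt_category": "BISCUITS"},
    {"module": "RECEIPT BISCUITS", "product_range": "ULKER BISCUITS GOLD", "receipt_category": "BISCUITS GOLD"},
    {"module": "RECEIPT COFFEE", "product_range": "BLACK GOLD", "receipt_category": "BLACK GOLD"},
]


def collate_unique(records, group_key):
    """Group records by group_key, keeping each value once, in first-seen order.

    Hypothetical helper for illustration; assumes all values are hashable.
    """
    # The innermost dict acts as an ordered set: its keys are the collected values.
    grouped = collections.defaultdict(lambda: collections.defaultdict(dict))
    for record in records:
        group = record[group_key]
        for key, value in record.items():
            if key != group_key:
                grouped[group][key][value] = None  # Re-adding an existing key is a no-op
    # Turn each ordered set back into a plain list so the result is JSON-serializable
    return [
        {group_key: group, **{key: list(seen) for key, seen in fields.items()}}
        for group, fields in grouped.items()
    ]


collated = collate_unique(data, "module")
print(json.dumps(collated, indent=2))
```

With the sample data, receipt_category for RECEIPT BISCUITS comes out as ["BISCUITS", "BISCUITS GOLD"], while product_range keeps all three distinct values in their original order.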