Find duplicate values in list of dictionaries
Posted: 2022-01-22 15:58:23

Question: I need to find the dictionaries in a list that have the same value for a given key, and create a new list that keeps only the first of those dictionaries.
Example list:
lst_in = [{'First': 1, 'Second': 4}, {'First': 2, 'Second': 5}, {'First': 3, 'Second': 4}]
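The key whose values should be checked for duplicates is 'Second'. So in this example, the first and third dictionaries are duplicates. I have looked at Find duplicates in python list of dictionaries and python list of dictionaries find duplicates based on value, but couldn't find an exact answer. I'm only looking at the value of one key. The dictionaries will always have the same keys.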
Expected output:
lst_out = [{'First': 1, 'Second': 4}, {'First': 2, 'Second': 5}]
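Comments:
Your example has three dicts with the same set of keys and different values. I can see producing one output dict, but why two?
The second dict is also unique. I only need to remove a dict if another dict already has the same value for that key.

Answer 1: Some solutions and benchmarks.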
Solutions
Fun with a dict: go forwards to get the order, then backwards to get the first dict for each value.
lst_out = list({d['Second']: d
                for s in [1, -1]
                for d in lst_in[::s]}.values())
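To see why that single comprehension gets both the order and the values right, here is a minimal sketch that builds the same dict in two explicit passes (`reversed(lst_in)` plays the role of `lst_in[::-1]`). It relies on dicts preserving insertion order, which is guaranteed since Python 3.7, and on the fact that reassigning an existing key replaces its value without moving it.

```python
lst_in = [{'First': 1, 'Second': 4},
          {'First': 2, 'Second': 5},
          {'First': 3, 'Second': 4}]

merged = {}
# Forward pass: keys are inserted in order of first appearance (4, then 5),
# but the value stored for key 4 is the *last* dict with Second == 4.
for d in lst_in:
    merged[d['Second']] = d
print(merged)  # {4: {'First': 3, 'Second': 4}, 5: {'First': 2, 'Second': 5}}

# Backward pass: keys already exist, so their positions stay the same,
# while the value written last for each key is now its *first* dict.
for d in reversed(lst_in):
    merged[d['Second']] = d
print(merged)  # {4: {'First': 1, 'Second': 4}, 5: {'First': 2, 'Second': 5}}

lst_out = list(merged.values())
```

The trade-off is that every dict is written twice, which fits its position at the slow end of the dict-based variants in the timings.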
Or use setdefault to keep track of the first dict for each value:
tmp = {}
for d in lst_in:
    tmp.setdefault(d['Second'], d)
lst_out = list(tmp.values())
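This works because `dict.setdefault(key, default)` only stores `default` when the key is missing; if the key is already present it leaves the stored value alone and returns it. A small check of that behaviour:

```python
tmp = {}
# Key 4 is missing, so this dict is stored and returned.
first = tmp.setdefault(4, {'First': 1, 'Second': 4})
# Key 4 now exists, so this dict is ignored and the stored one is returned.
again = tmp.setdefault(4, {'First': 3, 'Second': 4})
print(first is again)  # True
print(tmp)             # {4: {'First': 1, 'Second': 4}}
```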
A fun and possibly faster version:
add = {}.setdefault
for d in lst_in:
    add(d['Second'], d)
lst_out = list(add.__self__.values())
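The "optimized" version binds the method once instead of looking up `tmp.setdefault` on every iteration, and it never gives the dict a name: `{}.setdefault` is a bound method, and a bound method exposes the object it is bound to as `__self__`. A quick illustration:

```python
add = {}.setdefault          # bound method of an otherwise unnamed dict
add('a', 1)
add('a', 2)                  # ignored: 'a' is already present
d = add.__self__             # the dict the method is bound to
print(type(d).__name__, d)   # dict {'a': 1}
```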
Benchmark
Times per call (the three fastest of five repeats) for a list of 1000 dicts with 100 different Second values (using Python 3.10.0):
361 μs 362 μs 364 μs dict_forward_backward
295 μs 297 μs 297 μs dict_setdefault
231 μs 231 μs 232 μs dict_setdefault_optimized
196 μs 196 μs 197 μs set_in_list_comprehension
190 μs 190 μs 190 μs set_in_list_comprehension_optimized
191 μs 191 μs 191 μs set_in_list_comprehension_optimized_2
201 μs 201 μs 201 μs set_with_loop
1747 μs 1751 μs 1774 μs with_lists
Benchmark code:
from timeit import repeat, default_timer as timer
from random import choices
lst_in = [{'First': i, 'Second': v}
          for i, v in enumerate(choices(range(100), k=1000))]
def dict_forward_backward(lst_in):
    return list({d['Second']: d
                 for s in [1, -1]
                 for d in lst_in[::s]}.values())
def dict_setdefault(lst_in):
    tmp = {}
    for d in lst_in:
        tmp.setdefault(d['Second'], d)
    return list(tmp.values())
def dict_setdefault_optimized(lst_in):
    add = {}.setdefault
    for d in lst_in:
        add(d['Second'], d)
    return list(add.__self__.values())
def set_in_list_comprehension(lst_in):
    return [s.add(v) or d
            for s in [set()]
            for d in lst_in
            for v in [d['Second']]
            if v not in s]
def set_in_list_comprehension_optimized(lst_in):
    return [add(v) or d
            for s in [set()]
            for add in [s.add]
            for d in lst_in
            for v in [d['Second']]
            if v not in s]
def set_in_list_comprehension_optimized_2(lst_in):
    s = set()
    add = s.add
    return [add(v) or d
            for d in lst_in
            for v in [d['Second']]
            if v not in s]
def set_with_loop(lst_in):
    found = set()
    lst_out = []
    for dct in lst_in:
        if dct['Second'] not in found:
            lst_out.append(dct)
            found.add(dct['Second'])
    return lst_out
def with_lists(lst_in):
    out = {'keep': [], 'counter': []}
    for dct in lst_in:
        if dct['Second'] not in out['counter']:
            out['keep'].append(dct)
            out['counter'].append(dct['Second'])
    return out['keep']
funcs = [
dict_forward_backward,
dict_setdefault,
dict_setdefault_optimized,
set_in_list_comprehension,
set_in_list_comprehension_optimized,
set_in_list_comprehension_optimized_2,
set_with_loop,
with_lists,
]
# Correctness
expect = funcs[0](lst_in)
for func in funcs[1:]:
    result = func(lst_in)
    print(result == expect, func.__name__)
print()
# Speed
for _ in range(3):
    for func in funcs:
        ts = sorted(repeat(lambda: func(lst_in), 'gc.enable(); gc.collect()', number=1000))[:3]
        print(*('%4d μs ' % (t * 1e3) for t in ts), func.__name__)
    print()
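The `set_in_list_comprehension*` functions fit a "seen" set into a list comprehension using two idioms. `for v in [d['Second']]` loops over a one-element list purely to bind a name inside the comprehension, and `s.add(v) or d` works because `set.add` returns `None`, which is falsy, so the expression performs the add and then evaluates to `d`. A sketch of the same logic spelled out as a plain loop:

```python
lst_in = [{'First': 1, 'Second': 4},
          {'First': 2, 'Second': 5},
          {'First': 3, 'Second': 4}]

s = set()
print(s.add(4))          # None: set.add returns nothing useful
print(s.add(5) or 'd')   # prints d: `None or x` evaluates to x

# set_in_list_comprehension, rewritten as an explicit loop:
seen = set()
lst_out = []
for d in lst_in:
    v = d['Second']          # what `for v in [d['Second']]` binds
    if v not in seen:        # the comprehension's `if v not in s` filter
        seen.add(v)          # side effect of `s.add(v)` ...
        lst_out.append(d)    # ... whose `None or d` result is the kept element
print(lst_out)
```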
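Comments:
@ManlaiA I hope you mean my second solution; the first one is really more for fun :-). Not sure what you mean by "it should be used like this", though...

Answer 2: Sets are great for those "have I seen this already?" problems.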
lst_in = [{'First': 1, 'Second': 4}, {'First': 2, 'Second': 5}, {'First': 3, 'Second': 4}]
found = set()
lst_out = []
for dct in lst_in:
    if dct['Second'] not in found:
        lst_out.append(dct)
        found.add(dct['Second'])
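The same set-based loop can be wrapped in a reusable function that takes the key name as a parameter. `unique_by_key` is a hypothetical helper name used only for illustration, not part of any library. Like the other set- and dict-based solutions, it assumes the values under that key are hashable.

```python
def unique_by_key(dicts, key):
    """Keep the first dict for each distinct value of dct[key], in original order.

    Hypothetical helper for illustration; values under `key` must be hashable.
    """
    seen = set()
    result = []
    for dct in dicts:
        value = dct[key]
        if value not in seen:
            seen.add(value)
            result.append(dct)
    return result


lst_in = [{'First': 1, 'Second': 4}, {'First': 2, 'Second': 5}, {'First': 3, 'Second': 4}]
print(unique_by_key(lst_in, 'Second'))  # [{'First': 1, 'Second': 4}, {'First': 2, 'Second': 5}]
print(unique_by_key(lst_in, 'First'))   # all 'First' values differ, so all three are kept
```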
Answer 3: You can keep track of the Second values and check against them when adding dictionaries to the final list:
lst_in = [{'First': 1, 'Second': 4}, {'First': 2, 'Second': 5}, {'First': 3, 'Second': 4}]
out = {'keep': [], 'counter': []}
for dct in lst_in:
    if dct['Second'] not in out['counter']:
        out['keep'].append(dct)
        out['counter'].append(dct['Second'])
print(out['keep'])
Output:
[{'First': 1, 'Second': 4}, {'First': 2, 'Second': 5}]
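One trade-off between tracking seen values in a list and in a set: `x in some_list` compares elements one by one with `==`, so each check is linear in the number of values seen so far, which is consistent with `with_lists` being the slowest entry in the first answer's benchmark. In exchange, the list version does not require hashable values. A small sketch, assuming (hypothetically) that the 'Second' values were lists:

```python
lst_in = [{'First': 1, 'Second': [4]},
          {'First': 2, 'Second': [5]},
          {'First': 3, 'Second': [4]}]

# List-based tracking only needs ==, so unhashable values such as lists work.
out = {'keep': [], 'counter': []}
for dct in lst_in:
    if dct['Second'] not in out['counter']:
        out['keep'].append(dct)
        out['counter'].append(dct['Second'])
print(out['keep'])  # [{'First': 1, 'Second': [4]}, {'First': 2, 'Second': [5]}]

# Set-based tracking needs hashable values; adding a list raises TypeError.
try:
    set().add([4])
except TypeError as exc:
    print('set-based tracking fails:', exc)
```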