How to delete consecutive duplicates in a list of lists efficiently?

Posted 2023-02-25


Question posted: 2019-08-13 05:12:27

Question:

I have a nested list:

l = [['GILTI', 'was', 'intended', 'to', 'to', 'stifle', 'multinationals', 'was'],
     ['like', 'technology', 'and', 'and', 'pharmaceutical', 'companies', 'like']]

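How can I detect two identical consecutive elements and delete one of them, without using set or similar operations? This should be the desired output: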

l = [['GILTI', 'was', 'intended', 'to', 'stifle', 'multinationals', 'was'],
     ['like', 'technology', 'and', 'pharmaceutical', 'companies', 'like']]

I tried using itertools groupby like this:

from itertools import groupby  
[i[0] for i in groupby(l)]
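For illustration, a small sketch of why this attempt leaves the data unchanged: groupby(l) iterates over the outer list, so each key is an entire sublist, and neighbouring words are never compared. The variable name attempt is just an illustrative name.

```python
from itertools import groupby

l = [['GILTI', 'was', 'intended', 'to', 'to', 'stifle', 'multinationals', 'was'],
     ['like', 'technology', 'and', 'and', 'pharmaceutical', 'companies', 'like']]

# groupby(l) walks the OUTER list, so each key is a whole sublist.
# The two sublists differ, so every group has exactly one member.
attempt = [i[0] for i in groupby(l)]

print(attempt == l)  # True: no word inside a sublist was removed
```

To compare words, groupby has to be applied to each sublist rather than to l itself.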

and also with an OrderedDict:

from collections import OrderedDict

temp_lis = []
for x in l:
    temp_lis.append(list(OrderedDict.fromkeys(x)))
temp_lis

Output:

[['GILTI', 'was', 'intended', 'to', 'stifle', 'multinationals'],
 ['like', 'technology', 'and', 'pharmaceutical', 'companies']]

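The second solution may look like it works, but it is wrong: it also removes duplicates that are not consecutive (for example the second 'was' and the second 'like'). How can I get the desired output above?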


Answer 1:

You can use groupby like this:

[[k for k, g in groupby(x)] for x in l]

If an element is repeated several times in a row, this keeps a single copy of it.

If you need to remove repeated consecutive elements entirely (keeping none of the copies), use:

[[k for k, g in groupby(x) if len(list(g)) == 1] for x in l]
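A small sketch contrasting the two variants on one flat list (the list words is made up for illustration). It also swaps len(list(g)) for sum(1 for _ in g), which counts the run without building a temporary list; both give the same result.

```python
from itertools import groupby

words = ['and', 'and', 'pharmaceutical', 'to', 'to', 'to', 'like']

# Variant 1: keep one element from every run of equal neighbours.
keep_one = [k for k, g in groupby(words)]

# Variant 2: drop every run longer than one element entirely.
# sum(1 for _ in g) counts the items in the run without materialising a list.
singles_only = [k for k, g in groupby(words) if sum(1 for _ in g) == 1]

print(keep_one)      # ['and', 'pharmaceutical', 'to', 'like']
print(singles_only)  # ['pharmaceutical', 'like']
```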

Example:

from itertools import groupby

l = [['GILTI', 'was', 'intended', 'to','to', 'stifle', 'multinationals', 'was'],
    ['like' ,'technology', 'and', 'and','pharmaceutical', 'companies', 'like']]

print([[k for k, g in groupby(x)] for x in l])

# [['GILTI', 'was', 'intended', 'to', 'stifle', 'multinationals', 'was'],
#  ['like', 'technology', 'and', 'pharmaceutical', 'companies', 'like']]
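A follow-up comment on this answer asks how to collapse only a particular repeated word, such as 'to'. One possible sketch, still based on groupby: collapse a run only when its key is in a given collection of target words, and keep every other run unchanged. The helper name collapse_only and its targets parameter are hypothetical, chosen for this illustration.

```python
from itertools import groupby

def collapse_only(seq, targets):
    """Collapse runs of equal neighbours, but only for words listed in targets."""
    out = []
    for key, group in groupby(seq):
        if key in targets:
            out.append(key)    # keep a single copy of a targeted run
        else:
            out.extend(group)  # keep every copy of any other run
    return out

l = [['GILTI', 'was', 'intended', 'to', 'to', 'stifle', 'multinationals', 'was'],
     ['like', 'technology', 'and', 'and', 'pharmaceutical', 'companies', 'like']]

result = [collapse_only(x, ('to',)) for x in l]
print(result)
# [['GILTI', 'was', 'intended', 'to', 'stifle', 'multinationals', 'was'],
#  ['like', 'technology', 'and', 'and', 'pharmaceutical', 'companies', 'like']]
```

Here the 'to', 'to' run is collapsed while the 'and', 'and' run is left alone.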

Comments:

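Thanks again for your help! What about a more specific solution? What if I am only interested in removing the 'to', 'to' sequence?
@aywoki, so you don't want two 'to's?
Yes, I'm just curious how to iterate in that case. This solution does solve the problem, though.

Answer 2: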

A custom generator solution:

def deduped(seq):
    first = True
    for el in seq:
        if first or el != prev:
            yield el
            prev = el
            first = False

[list(deduped(seq)) for seq in l]
# => [['GILTI', 'was', 'intended', 'to', 'stifle', 'multinationals', 'was'], 
#     ['like', 'technology', 'and', 'pharmaceutical', 'companies', 'like']]

Edit: the previous version could not handle None as the first element.
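A variant of the same generator that uses a unique sentinel instead of the first flag, as suggested in the comment below. Because a fresh object() compares unequal to every other value, the first element (even None) is always yielded. The module-level name _MISSING is an illustrative choice.

```python
_MISSING = object()  # unique sentinel: no element of seq can be this object

def deduped(seq):
    """Yield the elements of seq, skipping any that equal the one just before."""
    prev = _MISSING
    for el in seq:
        if el != prev:
            yield el
            prev = el

print(list(deduped([None, None, 'to', 'to', None])))  # [None, 'to', None]
```

Like the original, it works on any iterable and does a single pass, so it is O(n).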

Comments:

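Using a sentinel such as prev = object() would also solve the first-element problem.

Answer 3:

enumerate() adds a counter to an iterable and returns it as an enumerate object, so each element can be compared with the element at the next index.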

For example:

l = [['GILTI', 'was', 'intended', 'to', 'to', 'stifle', 'multinationals', 'was'],
     ['like', 'technology', 'and', 'and', 'pharmaceutical', 'companies', 'like']]
result = []

for sublist in l:
    new_list = []
    for index, x in enumerate(sublist):
        # skip x if a next element exists and is equal to it
        if index + 1 < len(sublist) and x == sublist[index + 1]:
            continue
        # x is the last element of its run, so keep it
        new_list.append(x)
    # add the cleaned sublist to the result
    result.append(new_list)

print(result)

Output:

[['GILTI', 'was', 'intended', 'to', 'stifle', 'multinationals', 'was'], 
['like', 'technology', 'and', 'pharmaceutical', 'companies', 'like']]
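The same neighbour comparison can be sketched without index arithmetic by pairing each element with its predecessor via zip. This keeps the first element of each run, whereas the loop above keeps the last one; for equal strings the result is identical. The function name dedupe_consecutive is hypothetical.

```python
def dedupe_consecutive(sublist):
    """Return a copy of sublist with runs of equal neighbours collapsed to one."""
    # Keep the first element (if any), then each element that differs from its predecessor.
    return sublist[:1] + [cur for prev, cur in zip(sublist, sublist[1:]) if cur != prev]

l = [['GILTI', 'was', 'intended', 'to', 'to', 'stifle', 'multinationals', 'was'],
     ['like', 'technology', 'and', 'and', 'pharmaceutical', 'companies', 'like']]

result = [dedupe_consecutive(x) for x in l]
print(result)
```

sublist[1:] makes a shallow copy of the list; for very long lists, itertools.islice(sublist, 1, None) could be passed to zip instead to avoid that copy.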

