Removing tuples from a list removes some but not all

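I must be missing something very obvious.

I have a list of tuples, each a (phrase, number) pair. I want to remove every tuple whose phrase contains a word from my stopword list.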

stopwords = ['for', 'with', 'and', 'in', 'on', 'down']
tup_list = [('faucet', 5185), ('kitchen', 2719), ('faucets', 2628),
            ('kitchen faucet', 1511), ('shower', 1471), ('bathroom', 1131),
            ('handle', 1048), ('for', 1035), ('cheap', 960), ('bronze', 807),
            ('tub', 797), ('sale', 771), ('sink', 762), ('with', 696),
            ('single', 620), ('kitchen faucets', 615), ('stainless faucet', 613),
            ('pull', 603), ('and', 477), ('in', 447), ('single handle', 430),
            ('for sale', 406), ('bathroom faucet', 392), ('on', 369),
            ('down', 363), ('head', 359), ('pull down', 357), ('wall', 351),
            ('faucet with', 350)]

for p,n in tup_list:
    print('p', p, p.split(), any(phrase in stopwords for phrase in p.split()))

print(len(tup_list))
for p,n in tup_list:
    if any(phrase in stopwords for phrase in p.split()):
        tup_list.remove((p,n))
        print('Removing', p)
print(len(tup_list))

print([item for item in tup_list if item[0] == 'in'])

When I run the code above, I get the following output:

p faucet ['faucet'] False
p kitchen ['kitchen'] False
p faucets ['faucets'] False
p kitchen faucet ['kitchen', 'faucet'] False
p shower ['shower'] False
p bathroom ['bathroom'] False
p handle ['handle'] False
p for ['for'] True
p cheap ['cheap'] False
p bronze ['bronze'] False
p tub ['tub'] False
p sale ['sale'] False
p sink ['sink'] False
p with ['with'] True
p single ['single'] False
p kitchen faucets ['kitchen', 'faucets'] False
p stainless faucet ['stainless', 'faucet'] False
p pull ['pull'] False
p and ['and'] True
p in ['in'] True
p single handle ['single', 'handle'] False
p for sale ['for', 'sale'] True
p bathroom faucet ['bathroom', 'faucet'] False
p on ['on'] True
p down ['down'] True
p head ['head'] False
p pull down ['pull', 'down'] True
p wall ['wall'] False
p faucet with ['faucet', 'with'] True
29
Removing for
Removing with
Removing and
Removing for sale
Removing on
Removing pull down
Removing faucet with
22
[('in', 447)]

My question: why isn't the tuple ('in', 447) removed? The printout shows p in ['in'] True, meaning 'in' is in the stopword list, so why doesn't tup_list.remove((p,n)) delete it?

Answer

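When you remove an item from a list, every later item shifts one position to the left, but the for loop still advances to the next index. The item that slides into the slot just vacated is therefore never examined. In your data ('in', 447) comes directly after ('and', 477), so removing 'and' moves 'in' into the position the loop has already visited, and it is skipped. ('down', 363) survives for the same reason, because it directly follows ('on', 369).

Here is one solution. It is not the most efficient, but it may suit your needs.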
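The skipping behaviour is easy to reproduce on a tiny list. The following is a minimal sketch, not part of the original question: the names items and visited are made up for illustration, and the two adjacent 'x' entries stand in for the adjacent 'and' / 'in' tuples.

```python
# Hypothetical miniature of the problem: two adjacent items that should both go.
items = ['a', 'x', 'x', 'b']
visited = []

for item in items:
    visited.append(item)      # record what the loop actually looks at
    if item == 'x':
        items.remove(item)    # shifts every later item one slot to the left

print(visited)  # the second 'x' is never visited
print(items)    # so one 'x' is still in the list
```

After the first 'x' is removed, the second 'x' moves to index 1, but the loop moves on to index 2, so it goes straight to 'b'.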

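# First pass: record the index of every tuple whose phrase contains a stopword.
remove_indices = []

for i, (p, n) in enumerate(tup_list):
    if any(word in stopwords for word in p.split()):
        remove_indices.append(i)
        print('Removing', p)

# Second pass: build a new list that keeps only the tuples not marked for removal.
tup_list = [item for i, item in enumerate(tup_list) if i not in remove_indices]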
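A shorter alternative, offered as a sketch rather than as the answerer's method, is to build the filtered list in a single pass with a list comprehension, so the list being iterated is never modified. The data below is a shortened sample of the question's list, storing the stopwords in a set is an assumption made here for faster lookups, and filtered is an illustrative name.

```python
# Assumption: a set instead of the original list, for O(1) membership tests.
stopwords = {'for', 'with', 'and', 'in', 'on', 'down'}

# Shortened sample of the question's data.
tup_list = [('faucet', 5185), ('and', 477), ('in', 447), ('for sale', 406),
            ('on', 369), ('down', 363), ('pull down', 357), ('wall', 351)]

# Keep a tuple only if none of the words in its phrase is a stopword.
filtered = [(p, n) for p, n in tup_list
            if not any(word in stopwords for word in p.split())]

print(filtered)  # [('faucet', 5185), ('wall', 351)]
```

If other code holds a reference to the same list object, assigning to tup_list[:] instead of a new name updates that list in place.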
