Remove the first N items that match a condition in a Python list

Posted 2023-02-15


【Title】: Remove the first N items that match a condition in a Python list

【Posted】: 2017-01-27 13:12:09

【Question】:

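If I have a function matchCondition(x), how can I remove the first n items in a Python list that match that condition?

One solution is to iterate over each item, mark it for deletion (for example, by setting it to None), and then filter the list with a comprehension. This requires iterating over the list twice and mutates the data. Is there a more idiomatic or more efficient way to do this?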

n = 3

def condition(x):
    return x < 5

data = [1, 10, 2, 9, 3, 8, 4, 7]
out = do_remove(data, n, condition)
print(out)  # [10, 9, 8, 4, 7] (1, 2, and 3 are removed, 4 remains)
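For reference, the mark-then-filter approach described in the question might look roughly like the sketch below. The helper name remove_first_n_two_pass is hypothetical, and two details are assumptions that differ from the description: it marks items with a private sentinel object instead of None (so that None can stay a legitimate value), and it works on a copy instead of mutating the input.

```python
def remove_first_n_two_pass(data, n, condition):
    """Mark-then-filter sketch (hypothetical helper, not from the original post)."""
    removed_marker = object()  # unique sentinel, so None stays a valid list value
    marked = list(data)        # copy, so the caller's list is left untouched
    removed = 0

    # Pass 1: mark the first n items that match the condition.
    for i, x in enumerate(marked):
        if removed == n:
            break
        if condition(x):
            marked[i] = removed_marker
            removed += 1

    # Pass 2: filter out the marked slots.
    return [x for x in marked if x is not removed_marker]


result = remove_first_n_two_pass([1, 10, 2, 9, 3, 8, 4, 7], 3, lambda x: x < 5)
print(result)  # [10, 9, 8, 4, 7]
```

The answers that follow avoid the second pass by counting removals while building the result.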


【Answer 1】:

Starting with Python 3.8 and the introduction of assignment expressions (PEP 572) (the := operator), we can use and increment a variable inside a list comprehension:

items = [1, 10, 2, 9, 3, 8, 4, 7]
total = 0
[x for x in items if not (x < 5 and (total := total + 1) <= 3)]
# [10, 9, 8, 4, 7]

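This:

- initializes a variable total to 0, which tracks how many matching items the list comprehension has seen so far;
- checks, for each item, whether it both:
  - matches the exclusion condition (x < 5), and
  - falls within the number of items we want to filter out, by:
    - incrementing total with an assignment expression (total := total + 1), and
    - at the same time comparing the new value of total with the maximum number of items to discard (3).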
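The same comprehension can be wrapped in a function so that the counter does not live at module level. This is only a sketch: the name drop_first_matches and its parameters are made up for illustration, and it requires Python 3.8+ for the := operator.

```python
def drop_first_matches(items, predicate, limit):
    """Return a new list without the first `limit` items satisfying `predicate`."""
    dropped = 0
    # Per PEP 572, := inside a comprehension binds in the enclosing function
    # scope, so `dropped` keeps its value from one iteration to the next.
    return [
        x for x in items
        if not (predicate(x) and (dropped := dropped + 1) <= limit)
    ]


print(drop_first_matches([1, 10, 2, 9, 3, 8, 4, 7], lambda x: x < 5, 3))
# [10, 9, 8, 4, 7]
```

Note that the predicate is still evaluated for every item, even after the limit has been reached.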


【Answer 2】:

Using a list comprehension:

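n = 3
data = [1, 10, 2, 9, 3, 8, 4, 7]
count = 0

def counter():
    # Record one more removed item; always truthy so it can sit in an `and` chain.
    global count
    count += 1
    return True

def condition(x):
    return x < 5

# Keep x unless it is one of the first n items that match the condition.
filtered = [x for x in data if not (count < n and condition(x) and counter())]
print(filtered)  # [10, 9, 8, 4, 7]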

Thanks to boolean short-circuiting, this also stops checking the condition once n elements have been found.
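To make the short-circuit claim concrete, here is a small sketch that records which items the condition is actually called on. The names drop_first_n and recording_condition are hypothetical; the loop is a plain-function equivalent of the counting idea, not the answer's exact code.

```python
def drop_first_n(data, n, condition):
    dropped = 0
    kept = []
    for x in data:
        # `dropped < n` is evaluated first; once it is False,
        # `and` never calls condition(x).
        if dropped < n and condition(x):
            dropped += 1
        else:
            kept.append(x)
    return kept


calls = []

def recording_condition(x):
    calls.append(x)  # remember every item the condition was asked about
    return x < 5


result = drop_first_n([1, 10, 2, 9, 3, 8, 4, 7], 3, recording_condition)
print(result)  # [10, 9, 8, 4, 7]
print(calls)   # [1, 10, 2, 9, 3]
```

The condition is never called on 8, 4, or 7, because three items have already been dropped by then.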

【Comments】:

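There is no need for the counter function, Python already has that built in: filtered = (x for i, x in enumerate(data) if i > n or condition(x))

That doesn't quite work, because enumerate counts indices, whereas this needs to keep track of the number of elements that have already met the condition.

【Answer 3】: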

Plain Python:

N = 3
data = [1, 10, 2, 9, 3, 8, 4, 7]

def matchCondition(x):
    return x < 5

c = 1
l = []
for x in data:
    if c > N or not matchCondition(x):
        l.append(x)
    else:
        c += 1

print(l)

If needed, this can easily be turned into a generator:

def filter_first(n, func, iterable):
    c = 1
    for x in iterable:
        if c > n or not func(x):
            yield x
        else:
            c += 1

print(list(filter_first(N, matchCondition, data)))
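One practical consequence of the generator form is that it can consume any iterable lazily, including an endless one. The sketch below repeats filter_first so it runs on its own; the is_even predicate and the use of itertools.count as an infinite source are assumptions chosen for illustration.

```python
from itertools import count, islice


def filter_first(n, func, iterable):
    # Same logic as the generator above, repeated so this snippet is runnable.
    c = 1
    for x in iterable:
        if c > n or not func(x):
            yield x
        else:
            c += 1


def is_even(x):
    return x % 2 == 0


# count() yields 0, 1, 2, ... forever; islice stops after five results.
first_five = list(islice(filter_first(2, is_even, count()), 5))
print(first_five)  # [1, 3, 4, 5, 6]
```

The first two even numbers (0 and 2) are dropped; 4 and 6 pass through because the quota is already used up.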


【Answer 4】:

One way, using itertools.filterfalse and itertools.count:

from itertools import count, filterfalse

data = [1, 10, 2, 9, 3, 8, 4, 7]
output = filterfalse(lambda L, c=count(): L < 5 and next(c) < 3, data)

Then list(output) gives you:

[10, 9, 8, 4, 7]
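As one of the comments points out, the same result can be obtained with the built-in filter and the negated condition. A rough sketch, wrapped in a hypothetical function drop_first_matches so that every call gets a fresh counter:

```python
from itertools import count


def drop_first_matches(data, limit=3):
    # The default c=count() is evaluated when the lambda is created, i.e. once
    # per call of this function, so the counter is not shared between calls.
    return list(filter(lambda x, c=count(): x >= 5 or next(c) >= limit, data))


print(drop_first_matches([1, 10, 2, 9, 3, 8, 4, 7]))  # [10, 9, 8, 4, 7]
```

Because `or` short-circuits, next(c) only advances for items below 5, which is what makes the counter track matches and not positions.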

【Comments】:

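@wcarroll For Python 2.x it's ifilterfalse.

@JonClements Just out of curiosity, is using a keyword argument in the lambda function signature (i.e. c=count()) the preferred way to create a local variable inside a lambda expression?

@wcarroll It's not particularly pleasant - but for something like this, it keeps the scope to something relevant...

It would be nice if we didn't have to keep checking the [first] condition every time once the maximum number of drops has been exceeded.

I've never heard of filterfalse - why use it instead of the built-in filter with a negated condition (in this case L >= 5 or next(c) >= 3)? Doesn't the existence of filterfalse break Python's golden rule, "there is only one right way to do anything"?

【Answer 5】: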

The accepted answer was a little too magical for my taste. Here is one where the flow is hopefully a bit clearer:

def matchCondition(x):
    return x < 5


def my_gen(L, drop_condition, max_drops=3):
    count = 0
    iterator = iter(L)
    for element in iterator:
        if drop_condition(element):
            count += 1
            if count >= max_drops:
                break
        else:
            yield element
    yield from iterator


example = [1, 10, 2, 9, 3, 8, 4, 7]

print(list(my_gen(example, drop_condition=matchCondition)))

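This is similar to the logic in davidism's answer, but instead of checking at every step whether the drop count has been exceeded, we simply short-circuit the rest of the loop.

Note: if you don't have yield from available, just replace it with another for loop over the remaining items in iterator.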
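The replacement mentioned in the note could look like the sketch below. The function name my_gen_no_yield_from is made up for illustration; everything else mirrors my_gen above.

```python
def my_gen_no_yield_from(L, drop_condition, max_drops=3):
    count = 0
    iterator = iter(L)
    for element in iterator:
        if drop_condition(element):
            count += 1
            if count >= max_drops:
                break
        else:
            yield element
    # Stands in for `yield from iterator`: the shared iterator resumes
    # exactly where the first loop stopped.
    for element in iterator:
        yield element


example = [1, 10, 2, 9, 3, 8, 4, 7]
print(list(my_gen_no_yield_from(example, drop_condition=lambda x: x < 5)))
# [10, 9, 8, 4, 7]
```

Both loops must iterate over the same iterator object; looping over L a second time would start again from the beginning.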


【Answer 6】:

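Write a generator that takes the iterable, a condition, and the number of items to drop. Iterate over the data and yield the items that don't meet the condition. If the condition is met, increment a counter and don't yield the value. Once the counter reaches the number you want to drop, always yield items.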

def iter_drop_n(data, condition, drop):
    dropped = 0

    for item in data:
        if dropped >= drop:
            yield item
            continue

        if condition(item):
            dropped += 1
            continue

        yield item

data = [1, 10, 2, 9, 3, 8, 4, 7]
out = list(iter_drop_n(data, lambda x: x < 5, 3))

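This does not require an extra copy of the list, iterates over the list only once, and calls the condition only once per item. Unless you actually want to see the whole list, leave off the list call on the result and iterate over the returned generator directly.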
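To illustrate what iterating over the generator directly buys you, the sketch below pulls only two results and then checks how much of the source was consumed. The generator is a condensed restatement of iter_drop_n so the snippet runs on its own; using an explicit iter(...) as the source is an assumption made purely to expose the laziness.

```python
def iter_drop_n(data, condition, drop):
    # Condensed restatement of the generator above.
    dropped = 0
    for item in data:
        if dropped < drop and condition(item):
            dropped += 1
            continue
        yield item


source = iter([1, 10, 2, 9, 3, 8, 4, 7])
gen = iter_drop_n(source, lambda x: x < 5, 3)

first_two = [next(gen), next(gen)]
print(first_two)  # [10, 9]

# Only the items needed to produce two results were read from the source.
remaining = list(source)
print(remaining)  # [3, 8, 4, 7]
```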


【Answer 7】:

If mutation is required:

def do_remove(ls, N, predicate):
    i, delete_count, l = 0, 0, len(ls)
    while i < l and delete_count < N:
        if predicate(ls[i]):
            ls.pop(i)  # remove item at i; the next item shifts into position i
            delete_count, l = delete_count + 1, l - 1
        else:
            i += 1
    return ls  # for convenience

data = [1, 10, 2, 9, 3, 8, 4, 7]
assert do_remove(data, 3, lambda x: x < 5) == [10, 9, 8, 4, 7]

【Comments】:

Note that this approach has O(N * len(ls)) complexity, which is far from optimal.
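If in-place mutation is still wanted, one way to avoid the repeated pop calls is to compact the list in a single pass and truncate it at the end. This is a sketch of a swapped-in technique (a write index), not the answer's method, and do_remove_linear is a hypothetical name.

```python
def do_remove_linear(ls, n, predicate):
    """Remove the first n items matching predicate, in place, in one pass."""
    write = 0    # next slot to fill with a kept item
    removed = 0
    for item in ls:
        if removed < n and predicate(item):
            removed += 1       # skip this item
        else:
            ls[write] = item   # write never runs ahead of the read position
            write += 1
    del ls[write:]             # drop the leftover tail
    return ls                  # for convenience, like do_remove


data = [1, 10, 2, 9, 3, 8, 4, 7]
do_remove_linear(data, 3, lambda x: x < 5)
print(data)  # [10, 9, 8, 4, 7]
```

The trade-off is that this version always walks the whole list, whereas the pop-based loop stops as soon as N items have been deleted.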
