How to retrieve partial matches from a list of strings

Posted 2023-02-27


【Title】: How to retrieve partial matches from a list of strings 【Posted】: 2021-01-15 12:08:10 【Question】:

For ways to retrieve partial matches in a numeric list, see:

How to return a subset of a list that matches a condition?

Python: Find in list

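If, however, you are looking for how to retrieve partial matches from a list of strings, the answers below briefly explain the best approaches.

SO: Python list lookup with partial match shows how to return a bool telling whether a list contains an element that partially matches (e.g. begins with, ends with, or contains) a given string. But how can you return the element itself, instead of True or False?

Example: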

l = ['ones', 'twos', 'threes']
wanted = 'three'

Here, the approach from the linked question returns True using:

any(s.startswith(wanted) for s in l)

So how can you return the element 'threes' instead?


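【Answer 1】:

Both startswith and in return a Boolean; the in operator is a membership test, which for strings means a substring test.

Either one can be applied with a list-comprehension or with filter.

Using a list-comprehension with in was the fastest implementation tested.

If matching should ignore case, consider mapping all the words to lowercase first: l = list(map(str.lower, l)).

Tested with Python 3.10.0.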
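The lowercase mapping suggested above rewrites the list, so the returned elements lose their original spelling. As a minimal sketch of an alternative, assuming case-insensitive matching is the goal, the comparison can be done on case-folded copies while the original strings are returned. The helper name partial_matches_ignore_case is hypothetical and not part of the original answer.

```python
def partial_matches_ignore_case(items, wanted):
    """Return the items that contain `wanted`, ignoring case.

    Hypothetical helper: only the comparison is case-folded,
    the returned strings keep their original spelling.
    """
    needle = wanted.casefold()
    return [s for s in items if needle in s.casefold()]


l = ['Ones', 'Twos', 'THREES']
print(partial_matches_ignore_case(l, 'three'))  # ['THREES']
```

str.casefold() is used instead of str.lower() because it is the method intended for caseless comparison; for plain ASCII words the two behave the same.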

`filter`:

Calling filter creates a filter object, so list() is used to show all the matching values in a list.

l = ['ones', 'twos', 'threes']
wanted = 'three'

# using startswith
result = list(filter(lambda x: x.startswith(wanted), l))

# using in
result = list(filter(lambda x: wanted in x, l))

print(result)
[out]:
['threes']

`list-comprehension`

l = ['ones', 'twos', 'threes']
wanted = 'three'

# using startswith
result = [v for v in l if v.startswith(wanted)]

# using in
result = [v for v in l if wanted in v]

print(result)
[out]:
['threes']
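The question also mentions matches at the end of a string. The same list-comprehension pattern covers that case with str.endswith. The sketch below bundles the three kinds of test behind one function; the name partial_matches and its mode parameter are assumptions made for illustration, not something from the answer.

```python
def partial_matches(items, wanted, mode="contains"):
    """Return every item that partially matches `wanted`.

    Hypothetical helper; `mode` selects the kind of partial match.
    """
    checks = {
        "startswith": lambda s: s.startswith(wanted),
        "endswith": lambda s: s.endswith(wanted),
        "contains": lambda s: wanted in s,
    }
    check = checks[mode]  # an unknown mode raises KeyError
    return [s for s in items if check(s)]


l = ['ones', 'twos', 'threes']
print(partial_matches(l, 'three', 'startswith'))  # ['threes']
print(partial_matches(l, 'os', 'endswith'))       # ['twos']
print(partial_matches(l, 'o'))                    # ['ones', 'twos']
```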

Which implementation is faster?

Tested in Jupyter Lab using the words corpus from nltk v3.6.5, which has 236736 words. Words with 'three':

['three', 'threefold', 'threefolded', 'threefoldedness', 'threefoldly', 'threefoldness', 'threeling', 'threeness', 'threepence', 'threepenny', 'threepennyworth', 'threescore', 'threesome']

from nltk.corpus import words

%timeit list(filter(lambda x: x.startswith(wanted), words.words()))
[out]:
64.8 ms ± 856 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit list(filter(lambda x: wanted in x, words.words()))
[out]:
54.8 ms ± 528 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit [v for v in words.words() if v.startswith(wanted)]
[out]:
57.5 ms ± 634 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit [v for v in words.words() if wanted in v]
[out]:
50.2 ms ± 791 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
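The %timeit magic only works in IPython or Jupyter. As a rough sketch of how a similar comparison could be run in a plain script, the standard timeit module can be used. The word list below is a synthetic stand-in (an assumption, not the nltk corpus), so the numbers will not match the ones above and depend on the machine.

```python
import timeit

# Synthetic stand-in for the corpus (assumption: not the nltk data).
words = [f"word{i}" for i in range(20_000)] + ["three", "threefold", "threesome"]
wanted = "three"

# The four variants compared in the answer, wrapped as zero-argument callables.
candidates = {
    "filter + startswith": lambda: list(filter(lambda x: x.startswith(wanted), words)),
    "filter + in": lambda: list(filter(lambda x: wanted in x, words)),
    "comprehension + startswith": lambda: [v for v in words if v.startswith(wanted)],
    "comprehension + in": lambda: [v for v in words if wanted in v],
}

for name, func in candidates.items():
    # Best of 3 repeats, each repeat making 5 calls.
    best = min(timeit.repeat(func, number=5, repeat=3))
    print(f"{name}: {best / 5 * 1000:.2f} ms per call")
```

Building the list once, outside the timed callables, keeps the measurement focused on the matching itself rather than on loading the data.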


【Answer 2】:

Instead of returning the result of the any() function, you can use a for loop to find the string:

def find_match(string_list, wanted):
    for string in string_list:
        if string.startswith(wanted):
            return string
    return None

>>> find_match(['ones', 'twos', 'threes'], "three")
'threes'


【Answer 3】:

A simple, direct answer:

test_list = ['one', 'two','threefour']
r = [s for s in test_list if s.startswith('three')]
print(r[0] if r else 'nomatch')

Result:

threefour

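I'm not sure what you want to do in the non-matching case. r[0] is exactly what you asked for when there is a match, but it raises an IndexError when there is none. The print statement handles this case, but you may want to handle it differently.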
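One way to handle the non-matching case differently is to fail loudly instead of printing a placeholder. The sketch below is only an illustration: the function name first_match and the choice of LookupError are assumptions, and returning None or a default value would be just as reasonable.

```python
def first_match(items, prefix):
    """Return the first item starting with `prefix`.

    Hypothetical helper: raises LookupError when nothing matches,
    instead of letting r[0] fail with an IndexError.
    """
    matches = [s for s in items if s.startswith(prefix)]
    if not matches:
        raise LookupError(f"no item starts with {prefix!r}")
    return matches[0]


test_list = ['one', 'two', 'threefour']
print(first_match(test_list, 'three'))  # threefour

try:
    first_match(test_list, 'blarg')
except LookupError as exc:
    print(exc)  # no item starts with 'blarg'
```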


【Answer 4】:

I'd say the most closely related solution is to use next instead of any:

>>> next((s for s in l if s.startswith(wanted)), 'mydefault')
'threes'
>>> next((s for s in l if s.startswith('blarg')), 'mydefault')
'mydefault'

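Just like any, it stops searching as soon as it finds a match, and it only takes O(1) space. The list-comprehension solutions, by contrast, always process the whole list and take O(n) space.

Oh, or just use any as is, but remember the last element checked: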
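The difference in how much of the list gets processed can be made visible by logging every element that is looked at. This is a small illustrative sketch; the inspected generator and the sample data are assumptions added for the demonstration.

```python
def inspected(items, log):
    """Yield items one by one, recording each item that is looked at."""
    for item in items:
        log.append(item)
        yield item


l = ['ones', 'twos', 'threes', 'fours']
wanted = 'tw'

# next() pulls items only until the first match is found.
seen_by_next = []
first = next((s for s in inspected(l, seen_by_next) if s.startswith(wanted)), None)

# A list comprehension always walks the whole input.
seen_by_list = []
all_matches = [s for s in inspected(l, seen_by_list) if s.startswith(wanted)]

print(first, seen_by_next)        # twos ['ones', 'twos']
print(all_matches, seen_by_list)  # ['twos'] ['ones', 'twos', 'threes', 'fours']
```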

>>> if any((match := s).startswith(wanted) for s in l):
        print(match)

threes
>>> if any((match := s).startswith('blarg') for s in l):
        print(match)

>>>

Another variation, which only assigns the matching element:

>>> if any(s.startswith(wanted) and (match := s) for s in l):
        print(match)

threes

(You might want to include something like or True if a matching s could be the empty string.)
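The empty-string caveat is easy to miss, so here is a small sketch of it. It needs Python 3.8 or later for the := operator. The two function names are hypothetical, and the placement of the parentheses around or True is one reading of the remark above: without them, the expression would parse as (... and ...) or True and always be true.

```python
def first_match_plain(items, wanted):
    """Variant from the answer: an empty matching string is falsy and gets skipped."""
    match = None
    if any(s.startswith(wanted) and (match := s) for s in items):
        return match
    return None


def first_match_safe(items, wanted):
    """Same idea, with `or True` so an empty matching string still counts."""
    match = None
    if any(s.startswith(wanted) and ((match := s) or True) for s in items):
        return match
    return None


items = ['', 'a']
print(repr(first_match_plain(items, '')))  # 'a'  (the empty string was skipped)
print(repr(first_match_safe(items, '')))   # ''   (the first match is kept)
```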


【Answer 5】:

This seems simple to me, so I may have misread the question, but you could just run it through a for loop with an if statement:

l = ['ones', 'twos', 'threes']
wanted = 'three'

def run():
    for s in l:
        if s.startswith(wanted):
            return s

print(run())

Output: threes

