Create combinations from list and remove if substring to delimiter characters is in more than 1 subelement of a list item


[Posted]: 2019-05-30 05:15:10

[Question]:

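I have a list, and I use itertools.combinations to create all of its combinations. Each element inside a list item is a string that can be split on the delimiter ":". I need to remove every combination in which the same substring (the characters up to the ":") appears in more than one element. In other words, the text before the ":" (perhaps a delimiter for a regex match?) has to be checked against every subelement of the combination. Or is there a better way to do this?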

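from itertools import combinations

inList = [['TEST1: sub1'],
['TEST1: sub2'],
['TEST1: sub3'],
['TESTING FOR FUN: randomtext'],
['TESTING FOR FUN: random text x2'],
['ABC123: dog']]
# r=3 as originally posted; the output shown below corresponds to r=2
outputList = list(combinations(inList,3))
outputList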

I get this result:

[(['TEST1: sub1'], ['TEST1: sub2']),
 (['TEST1: sub1'], ['TEST1: sub3']),
 (['TEST1: sub1'], ['TESTING FOR FUN: randomtext']),
 (['TEST1: sub1'], ['TESTING FOR FUN: random text x2']),
 (['TEST1: sub1'], ['ABC123: dog']),
 (['TEST1: sub2'], ['TEST1: sub3']),
 (['TEST1: sub2'], ['TESTING FOR FUN: randomtext']),
 (['TEST1: sub2'], ['TESTING FOR FUN: random text x2']),
 (['TEST1: sub2'], ['ABC123: dog']),
 (['TEST1: sub3'], ['TESTING FOR FUN: randomtext']),
 (['TEST1: sub3'], ['TESTING FOR FUN: random text x2']),
 (['TEST1: sub3'], ['ABC123: dog']),
 (['TESTING FOR FUN: randomtext'], ['TESTING FOR FUN: random text x2']),
 (['TESTING FOR FUN: randomtext'], ['ABC123: dog']),
 (['TESTING FOR FUN: random text x2'], ['ABC123: dog'])]

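But I want to remove the combinations in which the substring up to the delimiter ":" is the same in more than one subelement.

Desired output, after removing every combination whose prefix occurs more than once among its subelements: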

[(['TEST1: sub1'], ['TESTING FOR FUN: randomtext']),
 (['TEST1: sub1'], ['TESTING FOR FUN: random text x2']),
 (['TEST1: sub1'], ['ABC123: dog']),
 (['TEST1: sub2'], ['TESTING FOR FUN: randomtext']),
 (['TEST1: sub2'], ['TESTING FOR FUN: random text x2']),
 (['TEST1: sub2'], ['ABC123: dog']),
 (['TEST1: sub3'], ['TESTING FOR FUN: randomtext']),
 (['TEST1: sub3'], ['TESTING FOR FUN: random text x2']),
 (['TEST1: sub3'], ['ABC123: dog']),
 (['TESTING FOR FUN: randomtext'], ['ABC123: dog']),
 (['TESTING FOR FUN: random text x2'], ['ABC123: dog'])]

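*Notice that the first 2 items of the list were removed in the desired output. (The same applies to any other case where the substring before ": " repeats, regardless of the length of the string.)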

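[Comments]:

Your code asks for 3-length combinations, but the example output you posted doesn't match: every element is a 2-length combination. There is certainly a better way, but I'm not sure which one you need. Can you clarify?

@Paritosh Singh You're a genius! Good catch: the 3-length I copied and pasted came from a different test and is incorrect for this desired result. You're right. I tested with length 2 and got the expected result. This is exactly what I wanted, thank you very much!

[Answer 1]: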

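If the desired output is correct, you can break the problem into three separate steps:

First, the delimiter expresses a key-value relationship, so before doing anything else you can use a dictionary to group the items that share the same key.

Second, take all n-length combinations of those groups, so that each combination draws from n different keys.

Finally, for each of those combinations, use itertools.product to get every possible tuple across the groups in the combination.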

from itertools import combinations, product
from collections import defaultdict

inList = [['TEST1: sub1'],
['TEST1: sub2'],
['TEST1: sub3'],
['TESTING FOR FUN: randomtext'],
['TESTING FOR FUN: random text x2'],
['ABC123: dog']]


inDict = defaultdict(list)
for lst in inList:
    key = lst[0].partition(':')[0]
    inDict[key].append(lst)

print(inDict)
#Output:
defaultdict(list,
            {'TEST1': [['TEST1: sub1'], ['TEST1: sub2'], ['TEST1: sub3']],
             'TESTING FOR FUN': [['TESTING FOR FUN: randomtext'],
              ['TESTING FOR FUN: random text x2']],
             'ABC123': [['ABC123: dog']]})


temp = combinations(inDict.values(), 2) #2 length pairs from all dict values. change the number here as needed
result = []
for group in temp:
    result.extend(product(*group)) #calculate all products for each pair of lists. 

print(result)
#Output:
[(['TEST1: sub1'], ['TESTING FOR FUN: randomtext']),
 (['TEST1: sub1'], ['TESTING FOR FUN: random text x2']),
 (['TEST1: sub2'], ['TESTING FOR FUN: randomtext']),
 (['TEST1: sub2'], ['TESTING FOR FUN: random text x2']),
 (['TEST1: sub3'], ['TESTING FOR FUN: randomtext']),
 (['TEST1: sub3'], ['TESTING FOR FUN: random text x2']),
 (['TEST1: sub1'], ['ABC123: dog']),
 (['TEST1: sub2'], ['ABC123: dog']),
 (['TEST1: sub3'], ['ABC123: dog']),
 (['TESTING FOR FUN: randomtext'], ['ABC123: dog']),
 (['TESTING FOR FUN: random text x2'], ['ABC123: dog'])]
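
For comparison, the same result can also be reached by filtering the plain combinations directly, which is closer to how the question was phrased. This is only a minimal sketch, not part of the original answer: the helper names key_of and combos_with_distinct_keys are made up for illustration, and it assumes every list item holds exactly one string containing a ":".

```python
from itertools import combinations

inList = [['TEST1: sub1'],
          ['TEST1: sub2'],
          ['TEST1: sub3'],
          ['TESTING FOR FUN: randomtext'],
          ['TESTING FOR FUN: random text x2'],
          ['ABC123: dog']]


def key_of(item):
    # Hypothetical helper: text before the first ':' of the single
    # string stored in each one-element sub-list.
    return item[0].partition(':')[0]


def combos_with_distinct_keys(items, r):
    # Hypothetical helper: keep only the r-length combinations in which
    # no key (prefix before ':') occurs more than once.
    kept = []
    for combo in combinations(items, r):
        keys = [key_of(item) for item in combo]
        if len(set(keys)) == len(keys):
            kept.append(combo)
    return kept


filtered = combos_with_distinct_keys(inList, 2)
print(len(filtered))  # 11 pairs survive out of the 15 generated
```

The filter generates every combination before discarding the unwanted ones, so the grouping approach above should do less work when many items share a key. The filter keeps the order produced by combinations, which matches the desired output in the question.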

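[Comments]:

Can you explain the asterisk (*) in result.extend(product(*group))?

@Chris Yes, that is the so-called unpacking operator. I use it to unpack the tuple holding the 2 lists before passing it to product. It effectively strips away the tuple, so the call is equivalent to product(group[0], group[1]), but it can dynamically unpack however many arguments are passed. You may want to read more about it online.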
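
To make the unpacking concrete, here is a small illustration with made-up data (the variable names and values are not from the thread). It compares the explicit call, the unpacked call, and what happens when the star is left out.

```python
from itertools import product

# A tuple holding two lists, shaped like one `group` from the answer.
group = (['a', 'b'], ['x'])

# Passing the two lists explicitly...
explicit = list(product(group[0], group[1]))

# ...is the same as unpacking the tuple with *.
unpacked = list(product(*group))
print(unpacked)  # [('a', 'x'), ('b', 'x')]

# Without the star, product receives a single iterable (the tuple itself)
# and yields one-element tuples, each wrapping one of the original lists.
without_star = list(product(group))
print(without_star)  # [(['a', 'b'],), (['x'],)]
```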
