Remove consecutive duplicates from a 2D list, python?
【Posted】: 2014-04-19 21:15:43
【Question】: How can I remove consecutive duplicates from a 2D list based on a particular element (in this case the second element)?
I tried a few combinations with itertools, but had no luck.
Can anyone suggest how to solve this?
Input
192.168.1.232 >>>>> 173.194.36.64 , 14 , 15 , 16
192.168.1.232 >>>>> 173.194.36.64 , 14 , 15 , 17
192.168.1.232 >>>>> 173.194.36.119 , 23 , 30 , 31
192.168.1.232 >>>>> 173.194.36.98 , 24 , 40 , 41
192.168.1.232 >>>>> 173.194.36.98 , 24 , 40 , 62
192.168.1.232 >>>>> 173.194.36.74 , 25 , 42 , 43
192.168.1.232 >>>>> 173.194.36.74 , 25 , 42 , 65
192.168.1.232 >>>>> 173.194.36.74 , 26 , 44 , 45
192.168.1.232 >>>>> 173.194.36.74 , 26 , 44 , 66
192.168.1.232 >>>>> 173.194.36.78 , 27 , 46 , 47
Output
192.168.1.232 >>>>> 173.194.36.64 , 14 , 15 , 16
192.168.1.232 >>>>> 173.194.36.119 , 23 , 30 , 31
192.168.1.232 >>>>> 173.194.36.98 , 24 , 40 , 41
192.168.1.232 >>>>> 173.194.36.74 , 25 , 42 , 43
192.168.1.232 >>>>> 173.194.36.78 , 27 , 46 , 47
This is the expected output.
Update
What is shown above is a pretty-printed form of the list.
The actual list looks like this.
>>> for x in connection_frame:
...     print(x)
...
['192.168.1.232', '173.194.36.64', 14, 15, 16]
['192.168.1.232', '173.194.36.64', 14, 15, 17]
['192.168.1.232', '173.194.36.119', 23, 30, 31]
['192.168.1.232', '173.194.36.98', 24, 40, 41]
['192.168.1.232', '173.194.36.98', 24, 40, 62]
['192.168.1.232', '173.194.36.74', 25, 42, 43]
['192.168.1.232', '173.194.36.74', 25, 42, 65]
['192.168.1.232', '173.194.36.74', 26, 44, 45]
['192.168.1.232', '173.194.36.74', 26, 44, 66]
['192.168.1.232', '173.194.36.78', 27, 46, 47]
['192.168.1.232', '173.194.36.78', 27, 46, 67]
['192.168.1.232', '173.194.36.78', 28, 48, 49]
['192.168.1.232', '173.194.36.78', 28, 48, 68]
['192.168.1.232', '173.194.36.79', 29, 50, 51]
['192.168.1.232', '173.194.36.79', 29, 50, 69]
['192.168.1.232', '173.194.36.119', 32, 52, 53]
['192.168.1.232', '173.194.36.119', 32, 52, 74]
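【Comments on the question】:
Take a look at itertools.groupby.
You can remove the duplicates simply by turning them into a set.
What is the actual data type you are using? Are these rows strings, tuples, etc.?
I also believe the OP only wants to remove entries that are duplicates with respect to certain elements, not simply all duplicates.
@thecreator232, if order doesn't matter, how can we have meaningful consecutive entries?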
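【Answer 1】:
Since you want to preserve order and only pop consecutive entries, I don't know of any fancy built-in you could use. So here is the "brute force" approach:
>>> remList = []
>>> for i in range(len(connection_frame)):
...     if (i != len(connection_frame)-1) and (connection_frame[i][1] == connection_frame[i+1][1]):
...         remList.append(i)
...
>>> # pop from the end so the remaining indices stay valid
>>> for i in reversed(remList):
...     connection_frame.pop(i)
...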
['192.168.1.232', '173.194.36.119', 32, 52, 53]
['192.168.1.232', '173.194.36.79', 29, 50, 51]
['192.168.1.232', '173.194.36.78', 28, 48, 49]
['192.168.1.232', '173.194.36.78', 27, 46, 67]
['192.168.1.232', '173.194.36.78', 27, 46, 47]
['192.168.1.232', '173.194.36.74', 26, 44, 45]
['192.168.1.232', '173.194.36.74', 25, 42, 65]
['192.168.1.232', '173.194.36.74', 25, 42, 43]
['192.168.1.232', '173.194.36.98', 24, 40, 41]
['192.168.1.232', '173.194.36.64', 14, 15, 16]
>>>
>>> for conn in connection_frame:
...     print(conn)
...
['192.168.1.232', '173.194.36.64', 14, 15, 17]
['192.168.1.232', '173.194.36.119', 23, 30, 31]
['192.168.1.232', '173.194.36.98', 24, 40, 62]
['192.168.1.232', '173.194.36.74', 26, 44, 66]
['192.168.1.232', '173.194.36.78', 28, 48, 68]
['192.168.1.232', '173.194.36.79', 29, 50, 69]
['192.168.1.232', '173.194.36.119', 32, 52, 74]
>>>
Or, if you want to do it all in one go with a list comprehension:
>>> new_frame = [conn for conn in connection_frame if not connection_frame.index(conn) in [i for i in range(len(connection_frame)) if (i != len(connection_frame)-1) and (connection_frame[i][1] == connection_frame[i+1][1])]]
>>>
>>> for conn in new_frame:
...     print(conn)
...
['192.168.1.232', '173.194.36.64', 14, 15, 17]
['192.168.1.232', '173.194.36.119', 23, 30, 31]
['192.168.1.232', '173.194.36.98', 24, 40, 62]
['192.168.1.232', '173.194.36.74', 26, 44, 66]
['192.168.1.232', '173.194.36.78', 28, 48, 68]
['192.168.1.232', '173.194.36.79', 29, 50, 69]
['192.168.1.232', '173.194.36.119', 32, 52, 74]
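Note that the approach above keeps the last row of each run, whereas the expected output in the question keeps the first. A minimal sketch of a single-pass variant that builds a new list instead of popping from the original, assuming the rows are lists and the key is the element at index 1. The function name `drop_consecutive` and its `key_index` parameter are made up for illustration, and only the first few rows of the question's data are reproduced:

```python
def drop_consecutive(rows, key_index=1):
    """Return a new list keeping only the first row of each run of equal keys."""
    result = []
    for row in rows:
        # keep the row only if its key differs from the key of the last kept row
        if not result or result[-1][key_index] != row[key_index]:
            result.append(row)
    return result


connection_frame = [
    ['192.168.1.232', '173.194.36.64', 14, 15, 16],
    ['192.168.1.232', '173.194.36.64', 14, 15, 17],
    ['192.168.1.232', '173.194.36.119', 23, 30, 31],
    ['192.168.1.232', '173.194.36.98', 24, 40, 41],
    ['192.168.1.232', '173.194.36.98', 24, 40, 62],
]

for conn in drop_consecutive(connection_frame):
    print(conn)
```

Because the input list is never modified, there are no index shifts to worry about while iterating.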
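【Discussion】:
@thecreator232, what did you need to change? Let me know so I can update it here.
'if (connection_frame[i][1] == connection_frame[i+1][1]) and (connection_frame[i][2] == connection_frame[i+1][2] ) : connection_frame.remove(connection_frame[i+1])'
@thecreator232 OK, I see what you are doing. I would be careful about changing a list while you iterate over it: you may get some strange results, and it is generally considered bad form, so I won't change it here.
Thanks for the heads-up.
【Answer 2】:
Use itertools.groupby():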
import itertools
data = """192.168.1.232 >>>>> 173.194.36.64 , 14 , 15 , 16
192.168.1.232 >>>>> 173.194.36.64 , 14 , 15 , 17
192.168.1.232 >>>>> 173.194.36.119 , 23 , 30 , 31
192.168.1.232 >>>>> 173.194.36.98 , 24 , 40 , 41
192.168.1.232 >>>>> 173.194.36.98 , 24 , 40 , 62
192.168.1.232 >>>>> 173.194.36.74 , 25 , 42 , 43
192.168.1.232 >>>>> 173.194.36.74 , 25 , 42 , 65
192.168.1.232 >>>>> 173.194.36.74 , 26 , 44 , 45
192.168.1.232 >>>>> 173.194.36.74 , 26 , 44 , 66
192.168.1.232 >>>>> 173.194.36.78 , 27 , 46 , 47""".split("\n")
for k, g in itertools.groupby(data, lambda l: l.split()[2]):
    # each group is a run of adjacent lines with the same key; keep its first line
    print(next(g))
which prints
192.168.1.232 >>>>> 173.194.36.64 , 14 , 15 , 16
192.168.1.232 >>>>> 173.194.36.119 , 23 , 30 , 31
192.168.1.232 >>>>> 173.194.36.98 , 24 , 40 , 41
192.168.1.232 >>>>> 173.194.36.74 , 25 , 42 , 43
192.168.1.232 >>>>> 173.194.36.78 , 27 , 46 , 47
(This uses a list of strings, but it is easy to adapt to a list of lists.)
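A sketch of that adaptation, assuming the data is the list of lists shown in the question's update and that the key is the element at index 1. Only the first few rows are reproduced here:

```python
import itertools
from operator import itemgetter

connection_frame = [
    ['192.168.1.232', '173.194.36.64', 14, 15, 16],
    ['192.168.1.232', '173.194.36.64', 14, 15, 17],
    ['192.168.1.232', '173.194.36.119', 23, 30, 31],
    ['192.168.1.232', '173.194.36.98', 24, 40, 41],
    ['192.168.1.232', '173.194.36.98', 24, 40, 62],
]

# groupby merges only adjacent rows with equal keys; take the first row of each run
result = [next(group)
          for _, group in itertools.groupby(connection_frame, key=itemgetter(1))]

for row in result:
    print(row)
```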
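【Discussion】:
This works if the OP's data structure really is a list of strings, but he only indicated that it might be a list of strings (it could be [['192.168.1.1', '>>>>', ...], ...]), which would make the answer slightly more complicated. And yes, it will definitely remove non-consecutive duplicates.
@aruisdante: There is a remark about that at the end of the answer (you may need to reload to see it).
Yes, I saw it, but this will remove non-consecutive duplicates, which is not what the OP wants.
@aruisdante: No, this does not remove non-consecutive duplicates. Am I missing something here?
@aruisdante: If data is a list of lists, it is the same: result = (next(g) for _, g in groupby(data, key=lambda x: x[1]))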
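To check the point debated in the comments above, here is a small sketch on made-up rows (the values are illustrative, not taken from the question). `itertools.groupby` starts a new group every time the key changes, so a key that reappears later produces a second group rather than being dropped:

```python
from itertools import groupby

rows = [
    ['a', 'x', 1],
    ['a', 'x', 2],
    ['a', 'y', 3],
    ['a', 'x', 4],  # same key as the first run, but not adjacent to it
]

result = [next(g) for _, g in groupby(rows, key=lambda r: r[1])]
print(result)
# [['a', 'x', 1], ['a', 'y', 3], ['a', 'x', 4]]
```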
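【Answer 3】:
pandas' DataFrame.groupby is an alternative to itertools.groupby that also lets you keep track of the consecutive/non-consecutive elements of the original list, by providing row numbers instead of iterators. Something like this:
import pandas
df = pandas.DataFrame(connection_frame)
print(df)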
Out:
                  0                 1   2   3   4
0   '192.168.1.232'   '173.194.36.64'  14  15  16
1   '192.168.1.232'   '173.194.36.64'  14  15  17
2   '192.168.1.232'  '173.194.36.119'  23  30  31
3   '192.168.1.232'   '173.194.36.98'  24  40  41
4   '192.168.1.232'   '173.194.36.98'  24  40  62
5   '192.168.1.232'   '173.194.36.74'  25  42  43
6   '192.168.1.232'   '173.194.36.74'  25  42  65
7   '192.168.1.232'   '173.194.36.74'  26  44  45
8   '192.168.1.232'   '173.194.36.74'  26  44  66
9   '192.168.1.232'   '173.194.36.78'  27  46  47
10  '192.168.1.232'   '173.194.36.78'  27  46  67
11  '192.168.1.232'   '173.194.36.78'  28  48  49
12  '192.168.1.232'   '173.194.36.78'  28  48  68
13  '192.168.1.232'   '173.194.36.79'  29  50  51
14  '192.168.1.232'   '173.194.36.79'  29  50  69
15  '192.168.1.232'  '173.194.36.119'  32  52  53
16  '192.168.1.232'  '173.194.36.119'  32  52  74
You can then group them by the column labeled 2 (the third column, since the labels start at 0) and print the groups:
gps = df.groupby(2).groups
print(gps)
Out:
' 14': [0, 1],
' 23': [2],
' 24': [3, 4],
' 25': [5, 6],
' 26': [7, 8],
' 27': [9, 10],
' 28': [11, 12],
' 29': [13, 14],
' 32': [15, 16]
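See the individual row numbers? There are many ways to remove the consecutive duplicates within each list in gps. Here is one:
valid_rows = list()
for g in gps.values():
    # the first row number of a group always starts a run
    old_row = g[0]
    valid_rows.append(old_row)
    for row_id in range(1, len(g)):
        new_row = g[row_id]
        # a gap in the row numbers means a new, non-adjacent run
        if new_row - old_row != 1:
            valid_rows.append(new_row)
        old_row = new_row
print(valid_rows)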
Out: [5, 3, 9, 7, 0, 2, 15, 13, 11]
Finally, index the pandas DataFrame by valid_rows.
# .ix was removed in pandas 1.0; .loc does the same label-based selection here
print(df.loc[sorted(valid_rows)])
Out:
0   '192.168.1.232'   '173.194.36.64'  14  15  16
2   '192.168.1.232'  '173.194.36.119'  23  30  31
3   '192.168.1.232'   '173.194.36.98'  24  40  41
5   '192.168.1.232'   '173.194.36.74'  25  42  43
7   '192.168.1.232'   '173.194.36.74'  26  44  45
9   '192.168.1.232'   '173.194.36.78'  27  46  47
11  '192.168.1.232'   '173.194.36.78'  28  48  49
13  '192.168.1.232'   '173.194.36.79'  29  50  51
15  '192.168.1.232'  '173.194.36.119'  32  52  53
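The same row-number idea can be sketched without pandas, assuming `connection_frame` is a plain list of lists and grouping on the element at index 2 as above. The helper name `first_of_runs` and its `key_index` parameter are hypothetical. Like the pandas version, it treats two rows as part of one run only when they share a key and sit on adjacent row numbers:

```python
from collections import defaultdict


def first_of_runs(rows, key_index=2):
    # map each key to the row numbers where it occurs, in ascending order
    groups = defaultdict(list)
    for row_id, row in enumerate(rows):
        groups[row[key_index]].append(row_id)

    valid_rows = []
    for ids in groups.values():
        previous = None
        for row_id in ids:
            # a row starts a new run unless it directly follows a row with the same key
            if previous is None or row_id - previous != 1:
                valid_rows.append(row_id)
            previous = row_id
    return [rows[i] for i in sorted(valid_rows)]


connection_frame = [
    ['192.168.1.232', '173.194.36.64', 14, 15, 16],
    ['192.168.1.232', '173.194.36.64', 14, 15, 17],
    ['192.168.1.232', '173.194.36.119', 23, 30, 31],
    ['192.168.1.232', '173.194.36.98', 24, 40, 41],
    ['192.168.1.232', '173.194.36.98', 24, 40, 62],
]

for row in first_of_runs(connection_frame):
    print(row)
```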
【Discussion】: