How to merge multiple dictionaries from separate lists if they share any key-value pairs?

Posted 2023-02-25

【Posted】: 2015-06-23 23:30:00 【Question】:

How can I combine dictionaries from multiple lists if they share a common key-value pair?

For example, here are three lists of dictionaries:

l1 = [{'fruit':'banana','category':'B'},{'fruit':'apple','category':'A'}]
l2 = [{'type':'new','category':'A'},{'type':'old','category':'B'}]
l3 = [{'order':'2','type':'old'},{'order':'1','type':'new'}]

Desired result:

l = [{'fruit':'apple','category':'A','order':'1','type':'new'},{'fruit':'banana','category':'B','order':'2','type':'old'}]

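The tricky part is that I want the function to take only the lists as arguments, not the keys. I want to be able to pass in any number of lists of dictionaries without caring which key names overlap (in this case, the key names that tie all three together are 'category' and 'type').

I should note that the position of a dictionary within its list should not matter, since matching should be based only on shared elements.

Here is my attempt: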

def combine_lists(*args):
    base_list = args[0]
    L = []
    for sublist in args[1:]:
        L.extend(sublist)
    for D in base_list:
        for Dict in L:
            if any([tup in Dict.items() for tup in D.items()]): 
                D.update(Dict)
    return base_list

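【Comments】:

There is no common pair between l1 and l3. All dictionaries within a single list have the same keys. Is that guaranteed?

Yes, that is intentional, since dictionaries in l1 should be able to match dictionaries in l3 by way of dictionaries in l2 (for example, {'fruit':'banana','category':'B'} is merged with {'order':'2','type':'old'} because {'type':'old','category':'B'} connects them). Assume that all dictionaries within a single list have the same keys.

I suggest you take a hard look at your algorithm and data structures so that you don't have to deal with oddities like this.

This reminds me of union-find and connected components in a graph -based algorithms, though I haven't considered whether they fit this case. See also Replace list of list with "condensed" list of list while maintaining order, especially the condense_sets() function. Note: the dict.view*() methods return objects that support some set operations.

【Answer 1】: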

For this problem, it is convenient to think of each dictionary as a list of (key, value) tuples:

In [4]: list({'fruit':'apple','category':'A'}.items())
Out[4]: [('fruit', 'apple'), ('category', 'A')]
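As one of the comments points out, the view returned by `items()` supports some set operations in Python 3. A minimal sketch of how that can test whether two dictionaries share a pair (the helper name `share_pair` is made up for illustration, and it assumes all values are hashable):

```python
def share_pair(d1, d2):
    """Return True if d1 and d2 have at least one identical (key, value) pair."""
    # In Python 3, dict.items() is a set-like view, so `&` gives the
    # (key, value) pairs common to both dicts (values must be hashable).
    return bool(d1.items() & d2.items())


a = {'fruit': 'apple', 'category': 'A'}
b = {'type': 'new', 'category': 'A'}
c = {'type': 'old', 'category': 'B'}

print(share_pair(a, b))  # True: both contain ('category', 'A')
print(share_pair(a, c))  # False: no pair in common
```

This only answers the pairwise question. Linking l1 to l3 through l2 still needs the transitive grouping described next.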

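Since we want to join dictionaries that share a key-value pair, we can treat each (key, value) tuple as a node in a graph and connect the tuples that appear in the same dictionary with edges. Once you have the graph, the problem reduces to finding its connected components.

Using networkx,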

import itertools as IT
import networkx as nx

l1 = [{'fruit':'apple','category':'A'},{'fruit':'banana','category':'B'}]
l2 = [{'type':'new','category':'A'},{'type':'old','category':'B'}]
l3 = [{'order':'1','type':'new'},{'order':'2','type':'old'}]

data = [l1, l2, l3]
G = nx.Graph()
for dct in IT.chain.from_iterable(data):
    items = list(dct.items())
    # link the first (key, value) pair of the dict to each of the others
    node1 = items[0]
    for node2 in items[1:]:
        G.add_edge(node1, node2)

# nx.connected_component_subgraphs was removed in networkx 2.4;
# nx.connected_components yields each component as a set of nodes
for cc in nx.connected_components(G):
    print(dict(cc))

yields (key order may vary)

{'category': 'A', 'fruit': 'apple', 'type': 'new', 'order': '1'}
{'category': 'B', 'fruit': 'banana', 'type': 'old', 'order': '2'}

If you want to remove the networkx dependency, you could use, for example, pillmuncher's implementation:

import itertools as IT

def connected_components(neighbors):
    """
    https://***.com/a/13837045/190597 (pillmuncher)
    """
    seen = set()
    def component(node):
        nodes = set([node])
        while nodes:
            node = nodes.pop()
            seen.add(node)
            nodes |= neighbors[node] - seen
            yield node
    for node in neighbors:
        if node not in seen:
            yield component(node)

l1 = [{'fruit':'apple','category':'A'},{'fruit':'banana','category':'B'}]
l2 = [{'type':'new','category':'A'},{'type':'old','category':'B'}]
l3 = [{'order':'1','type':'new'},{'order':'2','type':'old'}]

data = [l1, l2, l3]
G = {}  # adjacency map: node -> set of neighboring nodes
for dct in IT.chain.from_iterable(data):
    items = list(dct.items())  # list() so the items can be indexed in Python 3
    node1 = items[0]
    for node2 in items[1:]:
        G.setdefault(node1, set()).add(node2)
        G.setdefault(node2, set()).add(node1)

for cc in connected_components(G):
    print(dict(cc))

which prints the same result as above.
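The question asks for a function that takes only the lists as arguments, and one comment mentions union-find. Below is a minimal standard-library sketch that combines the two. It is not taken from the answer above: the function name `merge_by_shared_pairs` and its helpers are hypothetical, it assumes all values are hashable, and if two linked dictionaries disagree on a key, the later value simply overwrites the earlier one.

```python
def merge_by_shared_pairs(*lists):
    """Merge dicts (from any number of lists) linked by shared (key, value) pairs."""
    parent = {}  # union-find forest over (key, value) tuples

    def find(x):
        parent.setdefault(x, x)
        root = x
        while parent[root] != root:
            root = parent[root]
        # path compression: point every node on the path at the root
        while parent[x] != root:
            parent[x], x = root, parent[x]
        return root

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # flatten the input lists, skipping empty dicts
    dicts = [d for lst in lists for d in lst if d]

    # all pairs inside one dict belong to the same group
    for d in dicts:
        items = list(d.items())
        find(items[0])  # registers dicts that have only one pair
        for item in items[1:]:
            union(items[0], item)

    # collect every dict into the merged dict of its group
    merged = {}
    for d in dicts:
        root = find(next(iter(d.items())))
        merged.setdefault(root, {}).update(d)
    return list(merged.values())


l1 = [{'fruit': 'banana', 'category': 'B'}, {'fruit': 'apple', 'category': 'A'}]
l2 = [{'type': 'new', 'category': 'A'}, {'type': 'old', 'category': 'B'}]
l3 = [{'order': '2', 'type': 'old'}, {'order': '1', 'type': 'new'}]

result = merge_by_shared_pairs(l1, l2, l3)
for d in result:
    print(d)
```

Because grouping happens over the whole set of pairs before anything is merged, the order in which the lists are passed should not affect which dictionaries end up together.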
