How can I delete the same columns (same-indexed elements) from a Python dictionary?

Posted 2023-03-11


【Posted】2020-01-13 16:14:40 【Question】

Suppose I have this:

d = {'a': [1, 2, 3, 4], 'b': ['10', '', '30', '40']}

I want this:

d = {'a': [1, 3, 4], 'b': ['10', '30', '40']}

If I see an empty element in b, i.e. d["b"][1], I want to delete it and also delete d["a"][1], the element at the same index.

Edit: I forgot to mention that the order of the elements must not change.

【Question comments】

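Are there only two keys in the dictionary, or do you want to generalize?
I want to generalize.
Can you explain a bit more? Are you always checking b for falsy values, or are there other values?
Using list is not a good idea

【Answer 1】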

Here's an idea. It looks like you are treating the dictionary as a data frame, since you are "joining" the lists by index.

So why not just use a library and do it concisely and efficiently?

import pandas as pd
df = pd.DataFrame(d)

This yields a DataFrame with columns a and b and one row per list index (0 to 3).

Then:

# keep only the rows in which no cell equals ''
df[~df.eq('').any(axis=1)]

   a   b
0  1  10
2  3  30
3  4  40

After all the operations, if you need your dictionary back:

df.to_dict('list')

{'a': [1, 3, 4], 'b': ['10', '30', '40']}

【Comments】

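This is probably the best answer. Python has libraries that make your life easier, so you might as well use them. Just two questions: 1) What is the ~ in df[~df.eq('').any(1)]? 2) I tried this and it works well, but I get "FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison result = method(y)" (Pandas 0.25.1).
@NorbertTóth That's fine! It isn't a pandas issue but a numpy one; it happens when you compare different types. Nothing to worry about, but for details take a look here.
@NorbertTóth ~ acts as the logical "not" operator on pandas Series.

【Answer 2】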

A general solution:

1. find which indexes are blank and put them in a unique list, sorted in reverse order
2. loop over the input values and delete those indexes

Sorting in descending order ensures that, when there are several blanks, the correct elements are deleted.

d = {'a': [1, 2, 3, 4], 'b': ['10', '', '30', '40']}

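# set comprehension -> each blank index appears once; note that `not x` also treats 0, None, etc. as empty
empty_indexes = sorted({i for v in d.values() for i, x in enumerate(v) if not x}, reverse=True)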

for v in d.values():
    for i in empty_indexes:
        try:
            v.pop(i)
        except IndexError:
            pass
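To see why the descending sort matters, here is a small sketch (the helper pop_indexes and the sample list are hypothetical, not from the answer) that pops the same blank positions once in ascending and once in descending order:

```python
def pop_indexes(values, indexes):
    """Pop each index in the given order and return what remains (works on a copy)."""
    remaining = list(values)
    for i in indexes:
        remaining.pop(i)
    return remaining


b = ['10', '', '30', '', '50']
blank = [i for i, x in enumerate(b) if x == '']  # [1, 3]

# Ascending: after popping index 1, the second blank has moved to index 2,
# so popping index 3 removes '50' instead of the blank.
ascending = pop_indexes(b, blank)
# Descending: popping index 3 first leaves index 1 where it was.
descending = pop_indexes(b, sorted(blank, reverse=True))

print(ascending)   # ['10', '30', '']
print(descending)  # ['10', '30', '50']
```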

Oneliner (inspired by pault in the comments):

dict(zip(d,[list(y) for y in zip(*(x for x in zip(*d.values()) if all(i!="" for i in x)))]))

To decipher it:

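- the inner zip transposes the values (one tuple per index)
- the generator comprehension keeps only the rows whose elements are all non-empty (if all(...))
- the middle zip transposes back to the original orientation
- zipping the keys with the values rebuilds the dictionary

There is no ordering problem: whatever the Python version, the keys are guaranteed to come out in the same order as the values.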
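The same oneliner, split into named intermediate steps so each transpose can be inspected (the variable names are illustrative only):

```python
d = {'a': [1, 2, 3, 4], 'b': ['10', '', '30', '40']}

# inner zip: one tuple per index, i.e. one "row" across all lists
rows = list(zip(*d.values()))
# [(1, '10'), (2, ''), (3, '30'), (4, '40')]

# filter: keep only rows with no empty string
kept = [x for x in rows if all(i != "" for i in x)]
# [(1, '10'), (3, '30'), (4, '40')]

# middle zip: back to one list per key
columns = [list(y) for y in zip(*kept)]
# [[1, 3, 4], ['10', '30', '40']]

# pair keys with columns; d and d.values() iterate in the same order
result = dict(zip(d, columns))
print(result)  # {'a': [1, 3, 4], 'b': ['10', '30', '40']}
```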

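The oneliner is hard to read, but it can be broken down into a loop. It doesn't need the indexes to be sorted and unique; in fact, it doesn't need indexes at all.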

Without the oneliner:

values = []  # init list of values
for y in zip(*d.values()):   # loop on assembled values
    if all(i != "" for i in y):  # filter out rows which contain empty strings
        values.append(y)

# transpose back / convert to list (since zip yields tuples)
values = [list(x) for x in zip(*values)]

# rebuild dictionary. Order of d and values is the same
d = dict(zip(d,values))
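One edge case worth knowing: if every row contains an empty string, zip(*values) yields nothing and dict(zip(d, values)) becomes {}, so the keys disappear. A minimal sketch that keeps the keys, using a hypothetical helper name drop_blank_rows:

```python
def drop_blank_rows(d):
    """Return a new dict keeping only the index positions where no list holds ''.

    Hypothetical helper: unlike dict(zip(d, values)), every key survives
    even when all rows are removed.
    """
    rows = [row for row in zip(*d.values()) if all(i != "" for i in row)]
    # column `col` of each kept row belongs to the col-th key
    return {key: [row[col] for row in rows] for col, key in enumerate(d)}


print(drop_blank_rows({'a': [1, 2, 3, 4], 'b': ['10', '', '30', '40']}))
# {'a': [1, 3, 4], 'b': ['10', '30', '40']}
print(drop_blank_rows({'a': [1, 2], 'b': ['', '']}))
# {'a': [], 'b': []}
```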

【Comments】

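Possibly, but that's how I imagined it.
@Jean-FrançoisFabre How about dict(zip(d.keys(), map(list, zip(*filter(lambda x: x[1], zip(*d.values()))))))? Is this a general solution? Either way, I'd drop d.keys() in favour of just d.
Actually, what I posted has a problem, but I think there is a zip-based approach... probably only for 3.6+, where dicts keep insertion order.
Also provided a non-oneliner version. pault probably prefers posting oneliners as comments rather than as full Python answers.
You should have posted it as an answer, though.

【Answer 3】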

As a general solution, assuming every list has the same length, you can use:

def drop_empty(d, key):
    '''
    Drops values from all lists in the dictionary `d` at the
    indices of the list given by `key` that are blank strings.
    '''
    indices = [i for i, v in enumerate(d[key]) if v == '']  # use the `key` argument, not a hard-coded 'b'
    for v in d.values():
        for ix in reversed(indices):
            v.pop(ix)
    return d

# test case, drops indices 1 and 4:
d = {'a': [1, 2, 3, 4, 5], 'b': ['10', '', '30', '40', ''], 'c': [0, 0, 1, 1, 2]}

drop_empty(d, 'b')
# returns:
{'a': [1, 3, 4], 'b': ['10', '30', '40'], 'c': [0, 1, 1]}
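If you would rather not mutate the input, a variant can build new lists from a set of bad positions; then the order in which indices are handled no longer matters. This is a sketch with a hypothetical name (drop_empty_copy), not part of the answer:

```python
def drop_empty_copy(d, key):
    """Return a new dict without the positions where d[key] == ''.

    Hypothetical non-mutating variant of drop_empty: it filters instead of
    popping, so no reversal is needed and `d` is left unchanged.
    """
    bad = {i for i, v in enumerate(d[key]) if v == ''}
    return {k: [x for i, x in enumerate(v) if i not in bad] for k, v in d.items()}


d = {'a': [1, 2, 3, 4, 5], 'b': ['10', '', '30', '40', ''], 'c': [0, 0, 1, 1, 2]}
print(drop_empty_copy(d, 'b'))
# {'a': [1, 3, 4], 'b': ['10', '30', '40'], 'c': [0, 1, 1]}
```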

【Comments】

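Why reverse the indices?
Changing an iterable while iterating over it is always dangerous. Working from the end reduces the chance of an IndexError.
That's not really the point: the code doesn't delete while iterating, but if you don't reverse, the first index is fine while all the later indices are shifted.

【Answer 4】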

d = {'a': [1, 2, 3, 4], 'b': ['10', '', '30', '40']}

bad_inds = [ind for ind in range(len(d['a'])) if not d['a'][ind] or not d['b'][ind]]

for ind in reversed(bad_inds):  # highest index first, so deletions don't shift later ones
    for value in d.values():
        del value[ind]

Output:

>>> d
{'a': [1, 3, 4], 'b': ['10', '30', '40']}
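The answer hard-codes the keys 'a' and 'b'. A sketch of the same idea for any number of equal-length lists (the extra key 'c' is made-up sample data) checks every list at each index and deletes from the end:

```python
d = {'a': [1, 2, 3, 4], 'b': ['10', '', '30', '40'], 'c': ['x', 'y', '', 'z']}

n = len(next(iter(d.values())))  # assumes every list has the same length
# An index is "bad" if the value there is falsy in any list
# (note: this also treats 0 and None as empty).
bad_inds = [ind for ind in range(n) if any(not v[ind] for v in d.values())]

# Delete from the highest index down so earlier deletions don't shift later ones.
for ind in reversed(bad_inds):
    for value in d.values():
        del value[ind]

print(d)  # {'a': [1, 4], 'b': ['10', '40'], 'c': ['x', 'z']}
```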

【Comments】

【Answer 5】

You can first collect all the good indices, then filter your values by them:

from operator import itemgetter

good_indices = [i for i, v in enumerate(zip(*d.values())) if all(v)]
# caveat: with exactly one good index, itemgetter returns a bare element instead of a tuple
d = {k: [*itemgetter(*good_indices)(v)] for k, v in d.items()}

print(d)

Output:

{'a': [1, 3, 4], 'b': ['10', '30', '40']}
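operator.itemgetter returns a single element when given one index and a tuple when given several, and calling itemgetter() with no arguments raises TypeError. So the comprehension above misbehaves when zero or one index survives (for example, unpacking the string '10' gives ['1', '0']). A sketch of a variant that avoids this by indexing with a plain list comprehension (keep_good is a hypothetical name):

```python
from operator import itemgetter

# itemgetter's return shape depends on how many indices it is given.
single = itemgetter(0)(['10', '', '30'])   # '10' (a bare str, not a tuple)
pair = itemgetter(0, 2)(['10', '', '30'])  # ('10', '30')
# [*single] would be ['1', '0']; the string is split into characters.


def keep_good(d):
    """Keep only the index positions where every list holds a truthy value.

    Hypothetical variant of the answer: a list comprehension behaves the
    same for zero, one, or many surviving indices.
    """
    good_indices = [i for i, row in enumerate(zip(*d.values())) if all(row)]
    return {k: [v[i] for i in good_indices] for k, v in d.items()}


print(keep_good({'a': [1, 2, 3, 4], 'b': ['10', '', '30', '40']}))
# {'a': [1, 3, 4], 'b': ['10', '30', '40']}
print(keep_good({'a': [1, 2], 'b': ['10', '']}))
# {'a': [1], 'b': ['10']}
```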

【Comments】
