Element-wise XOR in pandas


【Title】: Element-wise XOR in pandas 【Posted】: 2015-11-21 03:39:17 【Question】:

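I know that for a pandas Series the logical AND is & and the logical OR is |, but I am looking for an element-wise logical XOR. I suppose I could express it in terms of AND and OR, but I would rather use a real XOR if one is available.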

Thanks!


【Answer 1】:

Python XOR: a ^ b

NumPy logical XOR: np.logical_xor(a, b)
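As a minimal sketch of how both options look on pandas objects (the Series values are made up for illustration), the operator, the NumPy function, and the AND/OR formulation mentioned in the question should all agree on boolean data:

```python
import numpy as np
import pandas as pd

# Hypothetical example data; any two boolean Series with the same index work.
a = pd.Series([True, True, False, False])
b = pd.Series([True, False, True, False])

xor_op = a ^ b                     # element-wise XOR via the operator
xor_np = np.logical_xor(a, b)      # NumPy ufunc applied to the Series
xor_and_or = (a | b) & ~(a & b)    # XOR spelled out with AND, OR and NOT

print(xor_op.tolist())  # [False, True, True, False]
```

The parentheses in the AND/OR version matter, because & and | bind more tightly than comparison operators and ~ applies only to the expression directly after it.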

Performance test (the timings are equal):

1. Random boolean series of size 10000

In [7]: a = np.random.choice([True, False], size=10000)
In [8]: b = np.random.choice([True, False], size=10000)

In [9]: %timeit a ^ b
The slowest run took 7.61 times longer than the fastest. This could mean that an intermediate result is being cached
100000 loops, best of 3: 11 us per loop

In [10]: %timeit np.logical_xor(a,b)
The slowest run took 6.25 times longer than the fastest. This could mean that an intermediate result is being cached
100000 loops, best of 3: 11 us per loop

2. Random boolean series of size 1000

In [11]: a = np.random.choice([True, False], size=1000)
In [12]: b = np.random.choice([True, False], size=1000)

In [13]: %timeit a ^ b
The slowest run took 21.52 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 1.58 us per loop

In [14]: %timeit np.logical_xor(a,b)
The slowest run took 19.45 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 1.58 us per loop

3. Random boolean series of size 100

In [15]: a = np.random.choice([True, False], size=100)
In [16]: b = np.random.choice([True, False], size=100)

In [17]: %timeit a ^ b
The slowest run took 33.43 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 614 ns per loop

In [18]: %timeit np.logical_xor(a,b)
The slowest run took 45.49 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 616 ns per loop

4. Random boolean series of size 10

In [19]: a = np.random.choice([True, False], size=10)
In [20]: b = np.random.choice([True, False], size=10)

In [21]: %timeit a ^ b
The slowest run took 86.10 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 509 ns per loop

In [22]: %timeit np.logical_xor(a,b)
The slowest run took 40.94 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 511 ns per loop
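
The measurements above come from IPython's %timeit. To repeat them outside IPython, a rough sketch using the standard timeit module could look like this; the helper name compare, the seed, and the repetition count are assumptions made for illustration, and absolute timings will vary by machine and NumPy version:

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)  # seed chosen arbitrarily for repeatability


def compare(size, number=1000):
    """Return average seconds per call for a ^ b and np.logical_xor(a, b)."""
    a = rng.choice([True, False], size=size)
    b = rng.choice([True, False], size=size)
    # Both spellings must give the same answer on boolean input.
    assert np.array_equal(a ^ b, np.logical_xor(a, b))
    t_op = timeit.timeit(lambda: a ^ b, number=number)
    t_fn = timeit.timeit(lambda: np.logical_xor(a, b), number=number)
    return t_op / number, t_fn / number


for size in (10, 100, 1000, 10000):
    t_op, t_fn = compare(size)
    print(f"size={size:>5}: a ^ b {t_op * 1e6:.2f} us, "
          f"logical_xor {t_fn * 1e6:.2f} us")
```

The lambda wrappers add a small constant overhead to both measurements, so the comparison between the two stays fair even though the absolute numbers are slightly inflated.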

【Comments】:

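NumPy overloads the Python XOR operator ^ to dispatch to numpy.bitwise_xor, which on boolean arrays computes the same result as numpy.logical_xor. So readers should note that these timings match because, for boolean input, the two calls do essentially the same work.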
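
One caveat, sketched here with made-up arrays: on NumPy arrays ^ is a bitwise XOR, so it agrees with np.logical_xor for boolean data but not for integers, where np.logical_xor compares truthiness instead of bits.

```python
import numpy as np

# Boolean input: operator and function agree.
p = np.array([True, True, False, False])
q = np.array([True, False, True, False])
same_on_bools = np.array_equal(p ^ q, np.logical_xor(p, q))  # True

# Integer input (hypothetical values): the two diverge.
x = np.array([1, 2, 3])
y = np.array([1, 1, 0])
bitwise = x ^ y                  # bitwise XOR: [0, 3, 3]
logical = np.logical_xor(x, y)   # truthiness XOR: [False, False, True]
```

If the data might not be boolean, converting with astype(bool) before applying ^ keeps the two spellings equivalent.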
