pandas logical and operator with and without brackets produces different results [duplicate]
【Posted】: 2017-07-09 08:17:13 【Question】: I just noticed this:
df[df.condition1 & df.condition2]
df[(df.condition1) & (df.condition2)]
Why do these two lines produce different output?
I can't share the exact data, but I'll try to give as much detail as I can:
df[df.col1 == False & df.col2.isnull()] # returns 33 rows and the rule `df.col2.isnull()` is not in effect
df[(df.col1 == False) & (df.col2.isnull())] # returns 29 rows and both conditions are applied correctly
Thanks to @jezrael and @ayhan, here is what happens. Let me use the example provided by @jezrael:
import numpy as np
import pandas as pd
df = pd.DataFrame({'col1': [True, False, False, False],
                   'col2': [4, np.nan, np.nan, 1]})
print(df)
col1 col2
0 True 4.0
1 False NaN
2 False NaN
3 False 1.0
If we look at row 3:
col1 col2
3 False 1.0
and at the way I wrote the condition:
df.col1 == False & df.col2.isnull() # is equivalent to False == False & False
Because & has higher precedence than ==, the unbracketed False == False & False is equivalent to:
False == (False & False)
print(False == (False & False)) # prints True
With brackets:
print((False == False) & False) # prints False
I think the problem is easier to illustrate with numbers:
print(5 == 5 & 1) # prints False, because 5 & 1 returns 1 and 5==1 returns False
print(5 == (5 & 1)) # prints False, same reason as above
print((5 == 5) & 1) # prints 1, because 5 == 5 returns True, and True & 1 returns 1
So, lesson learned: always add brackets!!!
I wish I could split the answer points between @jezrael and @ayhan :(
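The grouping described above can also be checked without running any comparison, by asking Python how it parses the expression. This is a minimal sketch using the standard ast module; the helper name top_level_operation and the variable names a, b, c are made up for illustration.

```python
import ast


def top_level_operation(expression):
    """Return the name of the outermost AST node built for the expression."""
    tree = ast.parse(expression, mode="eval")
    return type(tree.body).__name__


# Without brackets the outermost node is the comparison,
# which means b & c was grouped first.
print(top_level_operation("a == b & c"))    # Compare

# With brackets the outermost node is the bitwise operation.
print(top_level_operation("(a == b) & c"))  # BinOp
```

The outermost node is the operation applied last, so Compare at the top means the & sub-expression was bound first.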
【Comments】:
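【Answer 1】: There is no difference between df[condition1 & condition2] and df[(condition1) & (condition2)]. The difference arises when you write the comparison expressions inline, because the operator & then takes precedence: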
df = pd.DataFrame(np.random.randint(0, 10, size=(5, 3)), columns=list('abc'))
df
Out:
a b c
0 5 0 3
1 3 7 9
2 3 5 2
3 4 7 6
4 8 8 1
condition1 = df['a'] > 3
condition2 = df['b'] < 5
df[condition1 & condition2]
Out:
a b c
0 5 0 3
df[(condition1) & (condition2)]
Out:
a b c
0 5 0 3
However, if you type it like this, you will see an error:
df[df['a'] > 3 & df['b'] < 5]
Traceback (most recent call last):
File "<ipython-input-7-9d4fd21246ca>", line 1, in <module>
df[df['a'] > 3 & df['b'] < 5]
File "/home/ayhan/anaconda3/lib/python3.5/site-packages/pandas/core/generic.py", line 892, in __nonzero__
.format(self.__class__.__name__))
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
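This is because 3 & df['b'] is evaluated first (this corresponds to False & df.col2.isnull() in your example). What remains is the chained comparison df['a'] > (3 & df['b']) < 5, which Python joins with an implicit and, and that needs the truth value of a Series. So you need to group the conditions in parentheses: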
df[(df['a'] > 3) & (df['b'] < 5)]
Out[8]:
a b c
0 5 0 3
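For a reproducible version of the session above, here is a small sketch that hard-codes the values from the printed frame instead of drawing them at random. The variable names raised, bracketed and queried are illustrative only, and the DataFrame.query call is shown as one possible alternative, not as something the answer itself recommends.

```python
import pandas as pd

# Same values as the frame printed above, hard-coded for reproducibility.
df = pd.DataFrame({'a': [5, 3, 3, 4, 8],
                   'b': [0, 7, 5, 7, 8],
                   'c': [3, 9, 2, 6, 1]})

# Unbracketed: parsed as df['a'] > (3 & df['b']) < 5, a chained comparison
# that needs the truth value of a Series and therefore raises ValueError.
try:
    df[df['a'] > 3 & df['b'] < 5]
    raised = False
except ValueError:
    raised = True
print(raised)  # True

# Bracketed: each comparison is evaluated first, then combined element-wise.
bracketed = df[(df['a'] > 3) & (df['b'] < 5)]
print(bracketed.index.tolist())  # [0]

# Alternative: in a query string, `and` binds more loosely than comparisons.
queried = df.query('a > 3 and b < 5')
print(queried.index.tolist())  # [0]
```

Naming the masks first, as with condition1 and condition2 earlier, sidesteps the precedence question in the same way.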
【Comments】:
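【Answer 2】: You are right, the results are different. I think this is an operator precedence issue; check the docs:
import numpy as np
import pandas as pd
df = pd.DataFrame({'col1': [True, False, False, False],
                   'col2': [4, np.nan, np.nan, 1]})
print(df)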
col1 col2
0 True 4.0
1 False NaN
2 False NaN
3 False 1.0
# operator & precedence
print (df[df.col1 == False & df.col2.isnull()])
col1 col2
1 False NaN
2 False NaN
3 False 1.0
# operator == precedence because in brackets
print (df[(df.col1 == False) & (df.col2.isnull())])
col1 col2
1 False NaN
2 False NaN
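To see why the unbracketed filter keeps row 3, the evaluation can be split into the two steps Python actually performs. This is a sketch on the same frame; the names inner, unbracketed, bracketed and negated are introduced here for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [True, False, False, False],
                   'col2': [4, np.nan, np.nan, 1]})

# Step 1: & binds first, so this sub-expression is computed on its own.
# False & anything is False, so the isnull() test is wiped out.
inner = False & df.col2.isnull()
print(inner.tolist())        # [False, False, False, False]

# Step 2: col1 is compared against that all-False Series,
# which is the same as asking "is col1 False?".
unbracketed = df.col1 == inner
print(unbracketed.tolist())  # [False, True, True, True]

# With brackets both conditions survive.
bracketed = (df.col1 == False) & (df.col2.isnull())
print(bracketed.tolist())    # [False, True, True, False]

# The unary ~ binds tighter than &, so this form needs no brackets.
negated = ~df.col1 & df.col2.isnull()
print(negated.tolist())      # [False, True, True, False]
```

The ~ form only works as a negation here because col1 has boolean dtype.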
It seems I found it in the docs, section 6.16. The operator & has higher precedence than ==:
Operator                                  Description
lambda                                    Lambda expression
if – else                                 Conditional expression
or                                        Boolean OR
and                                       Boolean AND
not x                                     Boolean NOT
in, not in, is, is not,                   Comparisons, including membership tests
<, <=, >, >=, !=, ==                      and identity tests
|                                         Bitwise OR
^                                         Bitwise XOR
&                                         Bitwise AND
(expressions...), [expressions...],       Binding or tuple display, list display,
{key: value...}, {expressions...}         dictionary display, set display
【Comments】:
So which operator takes precedence? I'm still confused.
The docs say: "The following table summarizes the operator precedence in Python, from lowest precedence (least binding) to highest precedence (most binding)." So I think & has higher precedence than ==. See the updated OP.