On 'and' vs '&' with pandas DataFrames (highly voted Stack Overflow answer)

Posted by rvin


Preface: This article was compiled by the editors of 小常识网 (cha138.com). It covers 'and' vs '&' with pandas DataFrames (a highly voted Stack Overflow answer), and we hope you find it useful.


What explains the difference in behavior of boolean and bitwise operations on lists vs numpy.arrays?

I'm getting confused about the appropriate use of '&' vs 'and' in Python, as illustrated in the following simple examples.

    >>> mylist1 = [True,  True,  True,  False,  True]
    >>> mylist2 = [False, True, False,  True, False]

    >>> len(mylist1) == len(mylist2)
    True

    # ---- Example 1 ----
    >>> mylist1 and mylist2
    [False, True, False, True, False]
    # I am confused: I would have expected [False, True, False, False, False]

    # ---- Example 2 ----
    >>> mylist1 & mylist2
    TypeError: unsupported operand type(s) for &: 'list' and 'list'
    # I am confused: Why not just like Example 1?

    # ---- Example 3 ----
    >>> import numpy as np

    >>> np.array(mylist1) and np.array(mylist2)
    ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
    # I am confused: Why not just like Example 4?

    # ---- Example 4 ----
    >>> np.array(mylist1) & np.array(mylist2)
    array([False,  True, False, False, False], dtype=bool)
    # This is the output I was expecting!

This answer and this answer both helped me understand that 'and' is a boolean operation but '&' is a bitwise operation.

I have read up on bitwise operations to understand the concept better, but I am struggling to use that information to make sense of my four examples above.

Note, in my particular situation, my desired output is a newlist where:

    len(newlist) == len(mylist1) 
    newlist[i] == (mylist1[i] and mylist2[i]) #for every element of newlist
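
For comparison, here is a minimal pure-Python sketch that builds newlist to exactly this specification without NumPy. It pairs the two lists with zip() and applies 'and' to each pair of elements rather than to the lists as a whole. Python 3 syntax is assumed; the question itself used Python 2.7.

```python
mylist1 = [True,  True,  True,  False,  True]
mylist2 = [False, True, False,  True, False]

# Apply 'and' to each pair of elements, not to the two lists as whole objects.
newlist = [x and y for x, y in zip(mylist1, mylist2)]

assert len(newlist) == len(mylist1)
print(newlist)  # [False, True, False, False, False]
```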

Example 4, above, led me to my desired output, so that is fine.

But I am left feeling confused about when/how/why I should use 'and' vs '&'. Why do lists and numpy arrays behave differently with these operators?

Can anyone help me understand the difference between boolean and bitwise operations to explain why they handle lists and numpy.arrays differently?

I just want to make sure I continue to use these operations correctly going forward. Thanks a lot for the help!

NumPy version 1.7.1

Python 2.7

References all inline with text.

EDITS

1) Thanks @delnan for pointing out that in my original examples I had an ambiguity that was masking my deeper confusion. I have updated my examples to clarify my question.

  • Example 1 only appears to give the correct output. It actually just returns the second list unaltered. Try some other lists, in particular anything where the second list contains a True in a position that's False in the first list: Boolean logic dictates a False output at that position, but you'll get a True. – user395760 Mar 25 '14 at 21:22
  • @delnan Thanks for noticing the ambiguity in my examples. I have updated my examples to highlight my confusion and focus on the aspect of this behavior that I do not understand. I'm clearly missing something important, because I did not expect the output of Example 1. – rysqui Mar 25 '14 at 21:37
  • In NumPy there's np.bitwise_and() and np.logical_and() and friends to avoid confusion. – Dietrich Mar 25 '14 at 21:54
  • In Example 1, mylist1 and mylist2 does not output the same result as mylist2 and mylist1, since what is being returned is the second list, as pointed out by delnan. – user2015487 Feb 16 '16 at 17:58
  • Possible duplicate of Python: Boolean operators vs Bitwise operators – Oliver Ni Nov 6 '16 at 16:09
Accepted answer (72 votes)

'and' tests whether both expressions are logically True, while '&' (when used with True/False values) tests whether both are True.

In Python, empty built-in objects are typically treated as logically False while non-empty built-ins are logically True. This facilitates the common use case where you want to do something if a list is empty and something else if the list is not. Note that this means that the list [False] is logically True:

    >>> if [False]:
    ...     print('True')
    ...
    True

So in Example 1, the first list is non-empty and therefore logically True, so 'and' returns the second list itself, unchanged. The truth value of the whole expression is therefore that of the second list. (In our case the second list is also non-empty and therefore logically True, but 'and' has no need to check that: it simply returns the second operand.)
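
The rule at work is that x and y evaluates to x if x is falsy and to y otherwise. Neither operand is converted to a bool, and y is not evaluated at all when x is falsy. The minimal sketch below uses empty and non-empty lists; the helper noisy() is purely illustrative.

```python
print(bool([]), bool([False]))   # False True: for a list, only emptiness matters
print([] and [1, 2])             # []: the first operand is falsy, so it is returned
print([False] and [1, 2])        # [1, 2]: the first operand is truthy, so the second is returned

calls = []

def noisy():
    """Hypothetical helper that records when it is evaluated."""
    calls.append("called")
    return [3]

result = [] and noisy()          # short-circuit: noisy() never runs
print(result, calls)             # [] []
```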

For Example 2, lists cannot meaningfully be combined in a bitwise fashion because they can contain arbitrary, unlike elements, so Python defines no '&' for them. Things that can be combined bitwise include True/False values and integers.
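
Here is a minimal sketch of what '&' does with the types listed above, and what happens with lists (Python 3 assumed). bool is a subclass of int in Python, which is why True and False can take part in bitwise operations.

```python
# & works bit by bit on integers; True/False behave as the integers 1/0.
print(True & False)   # False
print(True & True)    # True
print(6 & 3)          # 2, because 0b110 & 0b011 == 0b010

# Lists define no & operator, so Python raises TypeError (Example 2).
try:
    [True, False] & [False, True]
except TypeError as exc:
    msg = str(exc)
print(msg)            # unsupported operand type(s) for &: 'list' and 'list'
```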

NumPy objects, by contrast, support vectorized calculations. That is, they let you perform the same operations on multiple pieces of data.
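
The short illustration below shows vectorized calculation with two arbitrary example arrays, x and y. A single operator applies to every element, with no explicit Python loop.

```python
import numpy as np

x = np.array([1, 2, 3, 4])
y = np.array([10, 20, 30, 40])

# Each operator is applied element by element across the whole array.
print(x + y)   # [11 22 33 44]
print(x > 2)   # [False False  True  True]
```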

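Example 3 fails because NumPy arrays with more than one element have no truth value. NumPy refuses to guess whether you mean "any element is True" or "all elements are True", which prevents vector-based logic confusion. Because 'and' needs a single truth value for its first operand, it raises ValueError.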
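
The minimal sketch below reproduces the ambiguity, assuming a recent NumPy; the exact error wording may vary between versions. Truth-testing an array, which is what 'and' does implicitly to its first operand, raises ValueError. The methods .any() and .all() make the intended reduction explicit.

```python
import numpy as np

arr = np.array([True, False])

try:
    bool(arr)          # the same truth test 'and' applies to its first operand
except ValueError as exc:
    msg = str(exc)
print(msg)             # The truth value of an array with more than one element is ambiguous. ...

# State explicitly which reduction you mean:
print(arr.any())       # True: at least one element is True
print(arr.all())       # False: not every element is True

# A one-element array still has an unambiguous truth value.
print(bool(np.array([False])))  # False
```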

Example 4 is simply a vectorized bitwise 'and' operation: '&' is applied element by element.
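
As Dietrich's comment mentions, NumPy also provides np.logical_and and np.bitwise_and. For boolean arrays such as those in Example 4, '&' and np.logical_and agree. For integer arrays they do not, because '&' really operates on bits. In the short sketch below, the integer arrays i and j are arbitrary examples.

```python
import numpy as np

a = np.array([True, True, True, False, True])
b = np.array([False, True, False, True, False])

# On boolean arrays, & and np.logical_and give the same element-wise result.
print(a & b)                  # [False  True False False False]
print(np.logical_and(a, b))   # [False  True False False False]

# On integer arrays they differ: & (np.bitwise_and) combines the bits,
# while np.logical_and treats every nonzero value as True.
i = np.array([1, 2, 3])
j = np.array([2, 2, 1])
print(i & j)                  # [0 2 1]
print(np.logical_and(i, j))   # [ True  True  True]
```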

Bottom Line

  • If you are not dealing with arrays and are not performing math manipulations of integers, you probably want and.

  • If you have vectors of truth values that you wish to combine, use numpy with &.
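
The same rule applies to the pandas DataFrames in this page's title, because a boolean Series behaves like a NumPy array here. The minimal sketch below uses a made-up DataFrame df with hypothetical columns a and b. Each comparison needs its own parentheses because '&' binds more tightly than comparison operators such as '>'.

```python
import pandas as pd

# Hypothetical example data.
df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [10, 20, 30, 40]})

# Combine element-wise conditions with &, parenthesizing each comparison.
mask = (df["a"] > 1) & (df["b"] < 40)
print(df[mask])    # keeps the rows where a is 2 or 3

# 'and' tries to reduce each Series to a single bool, which pandas refuses.
try:
    (df["a"] > 1) and (df["b"] < 40)
except ValueError as exc:
    msg = str(exc)
print(msg)         # The truth value of a Series is ambiguous. ...
```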
