Summary of Indexing operation in DataFrame of Pandas

Posted johnyang

New pandas users often find DataFrame indexing confusing. This post walks through each indexing form in detail and closes with a summary of how they behave.

import pandas as pd
import numpy as np
df=pd.DataFrame(np.arange(16).reshape(4,4),index=['Ohio','Colorado','Utah','New York'],columns=['one','two','three','four']);df
          one  two  three  four
Ohio        0    1      2     3
Colorado    4    5      6     7
Utah        8    9     10    11
New York   12   13     14    15

(1) df[val]

  • When val is a single column label, df[val] selects that column and returns a Series.
df['one']
Ohio         0
Colorado     4
Utah         8
New York    12
Name: one, dtype: int32
  • When val is a list of column labels, df[val] selects those columns and returns a DataFrame.
df[['one','two']]
          one  two
Ohio        0    1
Colorado    4    5
Utah        8    9
New York   12   13
  • When val is a slice such as :num, df[val] selects rows. This is a convenience shorthand: with an integer slice it is equivalent to df.iloc[:num], the accessor dedicated to positional selection.
df[:2]
          one  two  three  four
Ohio        0    1      2     3
Colorado    4    5      6     7
df.iloc[:2] # Same as above
          one  two  three  four
Ohio        0    1      2     3
Colorado    4    5      6     7
df[1:3]
          one  two  three  four
Colorado    4    5      6     7
Utah        8    9     10    11
df.iloc[1:3]
          one  two  three  four
Colorado    4    5      6     7
Utah        8    9     10    11
  • When val is a boolean DataFrame, df[val] keeps the values where the mask is True and replaces the rest with NaN. Assigning to df[val] sets only the values where the mask is True.
df<5
            one    two  three   four
Ohio       True   True   True   True
Colorado   True  False  False  False
Utah      False  False  False  False
New York  False  False  False  False
df[df<5]
          one  two  three  four
Ohio      0.0  1.0    2.0   3.0
Colorado  4.0  NaN    NaN   NaN
Utah      NaN  NaN    NaN   NaN
New York  NaN  NaN    NaN   NaN
df[df<5]=0;df
          one  two  three  four
Ohio        0    0      0     0
Colorado    0    5      6     7
Utah        8    9     10    11
New York   12   13     14    15
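
The four forms of df[val] described above can be checked side by side. This is only a small sketch: it rebuilds the same frame from scratch, and the variable names (col, cols, rows, masked, zeroed) are my own, chosen for illustration. The mask assignment is done on a copy so the original frame is left unchanged.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(16).reshape(4, 4),
    index=["Ohio", "Colorado", "Utah", "New York"],
    columns=["one", "two", "three", "four"],
)

col = df["one"]            # single column label -> Series
cols = df[["one", "two"]]  # list of column labels -> DataFrame
rows = df[:2]              # integer slice -> rows, like df.iloc[:2]
masked = df[df < 5]        # boolean DataFrame -> same shape, NaN where the mask is False

# Assign through the mask on a copy (hypothetical name: zeroed).
zeroed = df.copy()
zeroed[zeroed < 5] = 0

print(type(col).__name__, type(cols).__name__)
print(rows.equals(df.iloc[:2]))
print(masked)
print(zeroed)
```

Note that masked holds floats, because NaN cannot be stored in an integer column.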

(2) df.loc[val]

  • When val is a single index label, df.loc[val] selects the corresponding row and returns a Series. When val is a list of index labels, it selects the corresponding rows and returns a DataFrame.
df.loc['Colorado']
one      0
two      5
three    6
four     7
Name: Colorado, dtype: int32
df.loc[['Colorado','New York']]
          one  two  three  four
Colorado    0    5      6     7
New York   12   13     14    15
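
A short sketch of the return types, assuming a freshly built frame (so the values are the original 0..15, not the ones modified above). The names row, rows and one_row_frame are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(16).reshape(4, 4),
    index=["Ohio", "Colorado", "Utah", "New York"],
    columns=["one", "two", "three", "four"],
)

row = df.loc["Colorado"]                 # single label -> Series indexed by column labels
rows = df.loc[["Colorado", "New York"]]  # list of labels -> DataFrame
one_row_frame = df.loc[["Colorado"]]     # one-element list -> still a DataFrame

print(type(row).__name__, row.name)
print(type(rows).__name__, rows.shape)
print(type(one_row_frame).__name__, one_row_frame.shape)
```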

(3) df.loc[:,val]

  • When val is a single column label, df.loc[:,val] selects the corresponding column and returns a Series. When val is a list of column labels, it selects the corresponding columns and returns a DataFrame.
df.loc[:,'two']
Ohio         0
Colorado     5
Utah         9
New York    13
Name: two, dtype: int32
df.loc[:,['two']] # Note: a list always returns a DataFrame, even if it contains only one element.
          two
Ohio        0
Colorado    5
Utah        9
New York   13
df.loc[:,['one','two']]
          one  two
Ohio        0    0
Colorado    0    5
Utah        8    9
New York   12   13
df[['one','two']] # Same as df.loc[:,['one','two']] above
          one  two
Ohio        0    0
Colorado    0    5
Utah        8    9
New York   12   13

(3) df.loc[val1,val2]

  • val1 may be a single index label or a list of index labels, and val2 may be a single column label or a list of column labels. df.loc[val1,val2] selects the data at their intersection. Either val1 or val2 may also be : to select everything along that axis.
df.loc['Ohio','one']
0
df.loc[['Ohio','Utah'],'one']
Ohio    0
Utah    8
Name: one, dtype: int32
df.loc['Ohio',['one','two']]
one    0
two    0
Name: Ohio, dtype: int32
df.loc[['Ohio','Utah'],['one','two']]
      one  two
Ohio    0    0
Utah    8    9
df.loc[:,:]
          one  two  three  four
Ohio        0    0      0     0
Colorado    0    5      6     7
Utah        8    9     10    11
New York   12   13     14    15
df.loc['Ohio',:]
one      0
two      0
three    0
four     0
Name: Ohio, dtype: int32
df.loc[:,'two']
Ohio         0
Colorado     5
Utah         9
New York    13
Name: two, dtype: int32
df.loc[:,['one','two']]
          one  two
Ohio        0    0
Colorado    0    5
Utah        8    9
New York   12   13
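
The combinations above differ mainly in what they return. As a rough sketch on a freshly built frame (values 0..15, variable names are mine): two single labels give a scalar, one list gives a Series, and two lists give a DataFrame.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(16).reshape(4, 4),
    index=["Ohio", "Colorado", "Utah", "New York"],
    columns=["one", "two", "three", "four"],
)

scalar = df.loc["Ohio", "one"]                    # label, label -> scalar
col_part = df.loc[["Ohio", "Utah"], "one"]        # list, label  -> Series (named after the column)
row_part = df.loc["Ohio", ["one", "two"]]         # label, list  -> Series (named after the row)
block = df.loc[["Ohio", "Utah"], ["one", "two"]]  # list, list   -> DataFrame

# df[[...]] and df.loc[:, [...]] select the same columns.
same_cols = df.loc[:, ["one", "two"]].equals(df[["one", "two"]])

print(scalar, col_part.name, row_part.name, block.shape, same_cols)
```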

(4) df.iloc[val]

  • df.iloc[val] works like df.loc[val], except that val must be an integer or a list of integers giving row positions rather than index labels.
df.iloc[1]
one      0
two      5
three    6
four     7
Name: Colorado, dtype: int32
df.iloc[[1,3]]
          one  two  three  four
Colorado    0    5      6     7
New York   12   13     14    15

(5) df.iloc[:,val]

  • Works like df.loc[:,val], except that val must be an integer or a list of integers giving column positions.
df
          one  two  three  four
Ohio        0    0      0     0
Colorado    0    5      6     7
Utah        8    9     10    11
New York   12   13     14    15
df.iloc[:,1]
Ohio         0
Colorado     5
Utah         9
New York    13
Name: two, dtype: int32
df.iloc[:,[1,3]]
          two  four
Ohio        0     0
Colorado    5     7
Utah        9    11
New York   13    15

(6) df.iloc[val1,val2]

  • Works like df.loc[val1,val2], except that val1 and val2 must be integers or lists of integers.
df.iloc[1,2]
6
df.iloc[1,[1,2,3]]
two      5
three    6
four     7
Name: Colorado, dtype: int32
df.iloc[[1,2],2]
Colorado     6
Utah        10
Name: three, dtype: int32
df.iloc[[1,2],[1,2]]
          two  three
Colorado    5      6
Utah        9     10
df.iloc[:,[1,2]]
          two  three
Ohio        0      0
Colorado    5      6
Utah        9     10
New York   13     14
df.iloc[[1,2],:]
          one  two  three  four
Colorado    0    5      6     7
Utah        8    9     10    11
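
To make the correspondence between df.iloc and df.loc concrete, here is a small sketch on a freshly built frame. It translates positions into labels through df.index and df.columns and compares the results. The variable names are illustrative. One difference worth keeping in mind with slices: df.iloc excludes the stop position, while df.loc includes the stop label.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(16).reshape(4, 4),
    index=["Ohio", "Colorado", "Utah", "New York"],
    columns=["one", "two", "three", "four"],
)

by_pos = df.iloc[[1, 2], [1, 2]]
by_label = df.loc[["Colorado", "Utah"], ["two", "three"]]

# Positions can be translated to labels through the index objects.
via_labels = df.loc[df.index[[1, 2]], df.columns[[1, 2]]]

# Positional slice 1:3 stops before position 3;
# label slice 'Colorado':'Utah' includes 'Utah'.
pos_slice = df.iloc[1:3]
label_slice = df.loc["Colorado":"Utah"]

print(by_pos.equals(by_label), by_pos.equals(via_labels), pos_slice.equals(label_slice))
```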

(7) df.at[val1,val2]

  • val1 must be a single index label and val2 must be a single column label.
df.at['Utah','one']
8
df.loc['Utah','one'] # Same as above
8
df.at[['Utah','Colorado'],'one'] # Raises an exception
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
D:\Anaconda\lib\site-packages\pandas\core\frame.py in _get_value(self, index, col, takeable)
   2538         try:
-> 2539             return engine.get_value(series._values, index)
   2540         except (TypeError, ValueError):

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_value()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_value()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

TypeError: '['Utah', 'Colorado']' is an invalid key

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
<ipython-input-77-c52a9db91739> in <module>()
----> 1 df.at[['Utah','Colorado'],'one']

D:\Anaconda\lib\site-packages\pandas\core\indexing.py in __getitem__(self, key)
   2140 
   2141         key = self._convert_key(key)
-> 2142         return self.obj._get_value(*key, takeable=self._takeable)
   2143 
   2144     def __setitem__(self, key, value):

D:\Anaconda\lib\site-packages\pandas\core\frame.py in _get_value(self, index, col, takeable)
   2543             # use positional
   2544             col = self.columns.get_loc(col)
-> 2545             index = self.index.get_loc(index)
   2546             return self._get_value(index, col, takeable=True)
   2547     _get_value.__doc__ = get_value.__doc__

D:\Anaconda\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   3076                                  'backfill or nearest lookups')
   3077             try:
-> 3078                 return self._engine.get_loc(key)
   3079             except KeyError:
   3080                 return self._engine.get_loc(self._maybe_cast_indexer(key))

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

TypeError: '['Utah', 'Colorado']' is an invalid key
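
The failure above can be reproduced without a long traceback by catching the exception. This is only a sketch: the exception class may vary between pandas versions (the traceback above shows TypeError), so the code catches Exception broadly and just records the class name. For several rows, df.loc is the accessor to use instead.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(16).reshape(4, 4),
    index=["Ohio", "Colorado", "Utah", "New York"],
    columns=["one", "two", "three", "four"],
)

value = df.at["Utah", "one"]  # single label, single label -> scalar

try:
    df.at[["Utah", "Colorado"], "one"]  # a list is not a valid key for .at
    raised = False
except Exception as exc:  # broad on purpose: the exact class depends on the pandas version
    raised = True
    print("df.at rejected the list key:", type(exc).__name__)

# .loc accepts the list and returns a Series.
several = df.loc[["Utah", "Colorado"], "one"]
print(value, raised, several.tolist())
```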

(8) df.iat[val1,val2]

  • Works like df.at, except that val1 and val2 must both be integer positions.
df.iat[2,2]
10
df
          one  two  three  four
Ohio        0    0      0     0
Colorado    0    5      6     7
Utah        8    9     10    11
New York   12   13     14    15
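
As a small sketch of how df.at and df.iat relate, the example below reads the same cell by label and by position, translates the label into a position with get_loc, and then writes single cells on a copy. The names work, r and c are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(16).reshape(4, 4),
    index=["Ohio", "Colorado", "Utah", "New York"],
    columns=["one", "two", "three", "four"],
)

by_label = df.at["Utah", "three"]  # label-based scalar access
by_pos = df.iat[2, 2]              # position-based scalar access to the same cell

# Translate labels into positions.
r = df.index.get_loc("Utah")
c = df.columns.get_loc("three")

# Both accessors can also set a single cell; done on a copy here.
work = df.copy()
work.at["Utah", "three"] = -1
work.iat[0, 0] = 100

print(by_label, by_pos, (r, c))
print(work)
```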

Conclusion

  • In df[val], val can be a column label or a list of column labels, which selects whole columns. It can also be a slice such as :num, which selects the corresponding rows, or a boolean DataFrame, which selects or sets values by mask.
  • df.loc is mainly used to select rows, or a combination of rows and columns. Its key takes one of these forms: a single index label, a list of index labels, or val1,val2, where each of val1 and val2 can be a single label, a list of labels, or :. In the last form it selects the intersection of index labels val1 and column labels val2.
  • df.iloc works like df.loc, except that it takes integer positions, either a single integer or a list of integers.
  • df.at[val1,val2] accepts only a single index label and a single column label. Likewise, df.iat[val1,val2] accepts only single integer positions.
