Summary of Indexing operation in DataFrame of Pandas
For new users of pandas, DataFrame indexing can be confusing, so I list each form of indexing in detail below and then summarize what the exploration shows.
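The construction of df is never shown here, so the following is only a guess: a 4x4 frame built from np.arange(16) reproduces the first outputs below (Ohio 0 1 2 3, Colorado 4 5 6 7, and so on). The use of np.arange is my assumption, not something the original states.

```python
import numpy as np
import pandas as pd

# Assumed construction: chosen only because it matches the outputs
# printed in the first examples of this article.
df = pd.DataFrame(
    np.arange(16).reshape(4, 4),
    index=["Ohio", "Colorado", "Utah", "New York"],
    columns=["one", "two", "three", "four"],
)
print(df)
```

The printed dtype (int32 in the outputs here, int64 on many other systems) depends on the platform and library versions.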
(1) df[val]
When val is a single column label, df[val] selects one column from the DataFrame and returns a Series.
df['one']
Ohio 0
Colorado 4
Utah 8
New York 12
Name: one, dtype: int32
When val is a list of column labels, df[val] selects those columns from the DataFrame and returns a DataFrame.
df[['one','two']]
          one  two
Ohio        0    1
Colorado    4    5
Utah        8    9
New York   12   13
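A small sketch of the difference in return type between a single label and a list of labels. The frame is rebuilt from the values printed above so the block runs on its own.

```python
import pandas as pd

df = pd.DataFrame(
    [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]],
    index=["Ohio", "Colorado", "Utah", "New York"],
    columns=["one", "two", "three", "four"],
)

single = df["one"]            # one label -> Series
several = df[["one", "two"]]  # list of labels -> DataFrame
wrapped = df[["one"]]         # one-element list -> still a DataFrame

print(type(single).__name__, type(several).__name__, type(wrapped).__name__)
# Series DataFrame DataFrame
```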
When val is a slice such as :num, df[val] selects rows instead of columns; this is a convenience shorthand. With integer bounds it is equivalent to df.iloc[:num], which is dedicated to positional row selection.
df[:2]
          one  two  three  four
Ohio        0    1      2     3
Colorado    4    5      6     7
df.iloc[:2] # the same as above
          one  two  three  four
Ohio        0    1      2     3
Colorado    4    5      6     7
df[1:3]
          one  two  three  four
Colorado    4    5      6     7
Utah        8    9     10    11
df.iloc[1:3]
          one  two  three  four
Colorado    4    5      6     7
Utah        8    9     10    11
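To check that the slice shorthand and df.iloc really agree, the two results can be compared directly. This is just a sketch on the same frame as above.

```python
import pandas as pd

df = pd.DataFrame(
    [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]],
    index=["Ohio", "Colorado", "Utah", "New York"],
    columns=["one", "two", "three", "four"],
)

by_getitem = df[1:3]    # integer slice inside [] selects rows by position
by_iloc = df.iloc[1:3]  # the explicit positional form

print(by_getitem.equals(by_iloc))  # True
print(list(by_getitem.index))      # ['Colorado', 'Utah']
```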
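When val is a boolean DataFrame, df[val] keeps the values where the mask is True and replaces the rest with NaN; assigning to df[val] sets the values where the mask is True.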
df<5
            one    two  three   four
Ohio       True   True   True   True
Colorado   True  False  False  False
Utah      False  False  False  False
New York  False  False  False  False
df[df<5]
          one  two  three  four
Ohio      0.0  1.0    2.0   3.0
Colorado  4.0  NaN    NaN   NaN
Utah      NaN  NaN    NaN   NaN
New York  NaN  NaN    NaN   NaN
df[df<5]=0;df
          one  two  three  four
Ohio        0    0      0     0
Colorado    0    5      6     7
Utah        8    9     10    11
New York   12   13     14    15
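The boolean-mask behaviour can be sketched as follows. The assignment is done on a copy (the name updated is mine) so that the original frame stays available for comparison.

```python
import pandas as pd

df = pd.DataFrame(
    [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]],
    index=["Ohio", "Colorado", "Utah", "New York"],
    columns=["one", "two", "three", "four"],
)

mask = df < 5        # boolean DataFrame with the same shape as df
selected = df[mask]  # cells where the mask is False become NaN
print(selected)

updated = df.copy()  # work on a copy so df itself is left untouched
updated[mask] = 0    # cells where the mask is True are set to 0
print(updated)
```

Note that the filtered result is printed as floats because NaN cannot be stored in an integer column.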
(2) df.loc[val]
When val is a single index label, df.loc[val] selects the corresponding row and returns a Series; when val is a list of index labels, it selects the corresponding rows and returns a DataFrame.
df.loc['Colorado']
one      0
two      5
three    6
four     7
Name: Colorado, dtype: int32
df.loc[['Colorado','New York']]
          one  two  three  four
Colorado    0    5      6     7
New York   12   13     14    15
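A runnable sketch of row selection by label. From this point on the frame is the one left behind by df[df<5]=0, so it is rebuilt from those printed values.

```python
import pandas as pd

# Values as printed after df[df<5]=0 earlier in the article.
df = pd.DataFrame(
    [[0, 0, 0, 0], [0, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]],
    index=["Ohio", "Colorado", "Utah", "New York"],
    columns=["one", "two", "three", "four"],
)

row = df.loc["Colorado"]                # single label -> Series
rows = df.loc[["Colorado", "New York"]]  # list of labels -> DataFrame

print(row.name, row.tolist())  # Colorado [0, 5, 6, 7]
print(rows.shape)              # (2, 4)
```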
(3) df.loc[:,val]
When val is a single column label, df.loc[:,val] selects the corresponding column and returns a Series; when val is a list of column labels, it selects the corresponding columns and returns a DataFrame.
df.loc[:,'two']
Ohio         0
Colorado     5
Utah         9
New York    13
Name: two, dtype: int32
df.loc[:,['two']] # Note: as long as val is a list, even one with a single element, the result is a DataFrame.
          two
Ohio        0
Colorado    5
Utah        9
New York   13
df.loc[:,['one','two']]
          one  two
Ohio        0    0
Colorado    0    5
Utah        8    9
New York   12   13
df[['one','two']] # The same as df.loc[:,['one','two']] above
          one  two
Ohio        0    0
Colorado    0    5
Utah        8    9
New York   12   13
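The column forms of df.loc, and their equivalence with plain df[...], can be checked with a short sketch on the same frame:

```python
import pandas as pd

df = pd.DataFrame(
    [[0, 0, 0, 0], [0, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]],
    index=["Ohio", "Colorado", "Utah", "New York"],
    columns=["one", "two", "three", "four"],
)

col = df.loc[:, "two"]                # single label -> Series
col_frame = df.loc[:, ["two"]]        # one-element list -> DataFrame
two_cols = df.loc[:, ["one", "two"]]  # list of labels -> DataFrame

print(type(col).__name__, type(col_frame).__name__)  # Series DataFrame
print(two_cols.equals(df[["one", "two"]]))           # True
```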
(3) df.loc[val1,val2]
val1 may be a single index label or a list of index labels, and val2 may be a single column label or a list of column labels; df.loc[val1,val2] selects the data at the intersection of the two. Either val1 or val2 can also be : to select everything along that axis.
df.loc['Ohio','one']
0
df.loc[['Ohio','Utah'],'one']
Ohio    0
Utah    8
Name: one, dtype: int32
df.loc['Ohio',['one','two']]
one    0
two    0
Name: Ohio, dtype: int32
df.loc[['Ohio','Utah'],['one','two']]
      one  two
Ohio    0    0
Utah    8    9
df.loc[:,:]
          one  two  three  four
Ohio        0    0      0     0
Colorado    0    5      6     7
Utah        8    9     10    11
New York   12   13     14    15
df.loc['Ohio',:]
one      0
two      0
three    0
four     0
Name: Ohio, dtype: int32
df.loc[:,'two']
Ohio         0
Colorado     5
Utah         9
New York    13
Name: two, dtype: int32
df.loc[:,['one','two']]
          one  two
Ohio        0    0
Colorado    0    5
Utah        8    9
New York   12   13
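As a rough rule, the result of df.loc[val1,val2] loses one dimension for every scalar argument. The sketch below illustrates this on the same frame; the variable names are mine.

```python
import pandas as pd

df = pd.DataFrame(
    [[0, 0, 0, 0], [0, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]],
    index=["Ohio", "Colorado", "Utah", "New York"],
    columns=["one", "two", "three", "four"],
)

scalar = df.loc["Ohio", "one"]                      # label, label -> scalar
col_part = df.loc[["Ohio", "Utah"], "one"]          # list, label  -> Series
row_part = df.loc["Ohio", ["one", "two"]]           # label, list  -> Series
block = df.loc[["Ohio", "Utah"], ["one", "two"]]    # list, list   -> DataFrame
everything = df.loc[:, :]                           # both axes in full

print(scalar)             # 0
print(col_part.tolist())  # [0, 8]
print(block.shape)        # (2, 2)
```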
(4) df.iloc[val]
This works the same way as df.loc[val], except that val must be an integer or a list of integers giving the row position rather than the row label.
df.iloc[1]
one 0
two 5
three 6
four 7
Name: Colorado, dtype: int32
df.iloc[[1,3]]
          one  two  three  four
Colorado    0    5      6     7
New York   12   13     14    15
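A short sketch showing that positional row selection returns the same data as the label-based form when position 1 happens to be the row labelled Colorado:

```python
import pandas as pd

df = pd.DataFrame(
    [[0, 0, 0, 0], [0, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]],
    index=["Ohio", "Colorado", "Utah", "New York"],
    columns=["one", "two", "three", "four"],
)

row = df.iloc[1]        # single position -> Series
rows = df.iloc[[1, 3]]  # list of positions -> DataFrame

print(row.equals(df.loc["Colorado"]))  # True
print(list(rows.index))                # ['Colorado', 'New York']
```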
(5) df.iloc[:,val]
The same as df.loc[:,val], except that val must be an integer or a list of integers giving the column position.
df
          one  two  three  four
Ohio        0    0      0     0
Colorado    0    5      6     7
Utah        8    9     10    11
New York   12   13     14    15
df.iloc[:,1]
Ohio         0
Colorado     5
Utah         9
New York    13
Name: two, dtype: int32
df.iloc[:,[1,3]]
          two  four
Ohio        0     0
Colorado    5     7
Utah        9    11
New York   13    15
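Positional column selection can be sketched the same way; the resulting Series or DataFrame still carries the column labels.

```python
import pandas as pd

df = pd.DataFrame(
    [[0, 0, 0, 0], [0, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]],
    index=["Ohio", "Colorado", "Utah", "New York"],
    columns=["one", "two", "three", "four"],
)

col = df.iloc[:, 1]        # single position -> Series named after the column
cols = df.iloc[:, [1, 3]]  # list of positions -> DataFrame

print(col.name)            # two
print(list(cols.columns))  # ['two', 'four']
```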
(6) df.iloc[val1,val2]
The same as df.loc[val1,val2], except that val1 and val2 must be integers or lists of integers (or :).
df.iloc[1,2]
6
df.iloc[1,[1,2,3]]
two 5
three 6
four 7
Name: Colorado, dtype: int32
df.iloc[[1,2],2]
Colorado 6
Utah 10
Name: three, dtype: int32
df.iloc[[1,2],[1,2]]
          two  three
Colorado    5      6
Utah        9     10
df.iloc[:,[1,2]]
          two  three
Ohio        0      0
Colorado    5      6
Utah        9     10
New York   13     14
df.iloc[[1,2],:]
          one  two  three  four
Colorado    0    5      6     7
Utah        8    9     10    11
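The combinations for df.iloc[val1,val2] mirror those of df.loc. A minimal sketch on the same frame:

```python
import pandas as pd

df = pd.DataFrame(
    [[0, 0, 0, 0], [0, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]],
    index=["Ohio", "Colorado", "Utah", "New York"],
    columns=["one", "two", "three", "four"],
)

scalar = df.iloc[1, 2]             # int, int   -> scalar
row_part = df.iloc[1, [1, 2, 3]]   # int, list  -> Series
col_part = df.iloc[[1, 2], 2]      # list, int  -> Series
block = df.iloc[[1, 2], [1, 2]]    # list, list -> DataFrame

print(scalar)                  # 6
print(row_part.tolist())       # [5, 6, 7]
print(col_part.tolist())       # [6, 10]
print(block.values.tolist())   # [[5, 6], [9, 10]]
```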
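(7) df.at[val1,val2]
val1 must be a single index label and val2 a single column label. Passing a list, as in df.at[['Utah','Colorado'],'one'], raises a TypeError in the pandas version used here: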
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
D:\Anaconda\lib\site-packages\pandas\core\frame.py in _get_value(self, index, col, takeable)
2538 try:
-> 2539 return engine.get_value(series._values, index)
2540 except (TypeError, ValueError):
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_value()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_value()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
TypeError: '['Utah', 'Colorado']' is an invalid key
During handling of the above exception, another exception occurred:
TypeError Traceback (most recent call last)
<ipython-input-77-c52a9db91739> in <module>()
----> 1 df.at[['Utah','Colorado'],'one']
D:\Anaconda\lib\site-packages\pandas\core\indexing.py in __getitem__(self, key)
2140
2141 key = self._convert_key(key)
-> 2142 return self.obj._get_value(*key, takeable=self._takeable)
2143
2144 def __setitem__(self, key, value):
D:\Anaconda\lib\site-packages\pandas\core\frame.py in _get_value(self, index, col, takeable)
2543 # use positional
2544 col = self.columns.get_loc(col)
-> 2545 index = self.index.get_loc(index)
2546 return self._get_value(index, col, takeable=True)
2547 _get_value.__doc__ = get_value.__doc__
D:\Anaconda\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
3076 'backfill or nearest lookups')
3077 try:
-> 3078 return self._engine.get_loc(key)
3079 except KeyError:
3080 return self._engine.get_loc(self._maybe_cast_indexer(key))
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
TypeError: '['Utah', 'Colorado']' is an invalid key
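For contrast with the failing call, here is a sketch of df.at used with scalar labels, followed by the list call wrapped in try/except. I believe the exact exception class differs between pandas versions (the traceback above shows TypeError), so the sketch catches Exception broadly and only records the class name; error_name is a name I made up for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    [[0, 0, 0, 0], [0, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]],
    index=["Ohio", "Colorado", "Utah", "New York"],
    columns=["one", "two", "three", "four"],
)

value = df.at["Utah", "one"]  # scalar label, scalar label -> single value
print(value)                  # 8

# A list of labels is not accepted by .at; the exception class is
# version-dependent, so it is caught broadly here.
error_name = None
try:
    df.at[["Utah", "Colorado"], "one"]
except Exception as exc:
    error_name = type(exc).__name__
print(error_name)
```

When several rows are needed, df.loc[['Utah','Colorado'],'one'] is the form that accepts a list.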
(8) df.iat[val1,val2]
The same as df.at, except that val1 and val2 must both be integer positions.
df.iat[2,2]
10
df
          one  two  three  four
Ohio        0    0      0     0
Colorado    0    5      6     7
Utah        8    9     10    11
New York   12   13     14    15
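A quick sketch confirming that df.iat, df.iloc and df.at all reach the same cell when position (2, 2) corresponds to the labels ('Utah', 'three'):

```python
import pandas as pd

df = pd.DataFrame(
    [[0, 0, 0, 0], [0, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]],
    index=["Ohio", "Colorado", "Utah", "New York"],
    columns=["one", "two", "three", "four"],
)

by_iat = df.iat[2, 2]           # positional scalar access
by_iloc = df.iloc[2, 2]         # general positional indexer, same cell
by_at = df.at["Utah", "three"]  # label-based scalar access, same cell

print(by_iat, by_iloc, by_at)   # 10 10 10
```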
Conclusion
In df[val], val can be a column label or a list of column labels, which selects whole columns; as a special case it can be a slice such as :val, which selects the corresponding rows. It can also be a boolean DataFrame, which is used to filter or set values.
Generally speaking, df.loc is mainly used to select rows or a combination of rows and columns, so its argument takes the following forms: a single row label, a list of row labels, or val1,val2, where val1 and val2 can each be a single label, a list of labels, or :. In the last form it selects the data at the intersection of index labels val1 and column labels val2.
df.iloc works the same way as df.loc, except that it takes integer positions, either a single integer or a list of integers.
df.at[val1,val2] accepts only a single label for each of val1 and val2, and the same applies to df.iat[val1,val2] with integer positions.