Selecting Data from a pandas DataFrame by Index in Python

Posted 2023-03-29 赵孝正

1. What Is an Index

1.1 Understanding Indexes

First, create a simple DataFrame.

import pandas as pd

myList = [['a', 10, 1.1],
          ['b', 20, 2.2],
          ['c', 30, 3.3],
          ['d', 40, 4.4]]
df1 = pd.DataFrame(data=myList)
print(df1)
--------------------------------
[out]:
   0   1    2
0  a  10  1.1
1  b  20  2.2
2  c  30  3.3
3  d  40  4.4

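A DataFrame has two kinds of index:

Row index (index): the vertical column of labels on the far left
Column index (columns): the horizontal row of labels across the top

By default, both are auto-incrementing integers starting at 0.

# Print the row index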
print(df1.index)
[out]:
RangeIndex(start=0, stop=4, step=1)
---------------------------------------
# Print the column index
print(df1.columns)
[out]:
RangeIndex(start=0, stop=3, step=1)
---------------------------------------
# Print all the values
print(df1.values)
[out]:
[['a' 10 1.1]
 ['b' 20 2.2]
 ['c' 30 3.3]
 ['d' 40 4.4]]

1.2 Custom Indexes

Use the index parameter to set the row index and the columns parameter to set the column index.

df2 = pd.DataFrame(myList,
                   index=['one', 'two', 'three', 'four'],
                   columns=['char', 'int', 'float'])
print(df2)
-----------------------------------------------------------
[out]:
      char  int  float
one      a   10    1.1
two      b   20    2.2
three    c   30    3.3
four     d   40    4.4

Print the row index and column index now:

# Print the row index
print(df2.index)
[out]:
Index(['one', 'two', 'three', 'four'], dtype='object')
--------------------------------------------------------
# Print the column index
print(df2.columns)
[out]:
Index(['char', 'int', 'float'], dtype='object')

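2. Basic Use of Indexes

2.1 Selecting by Column Index

2.1.1 Using [ ]

Select a single column:

print(df2['char'])
print(df2.char)
# Both forms produce the same output (attribute access only works when the
# column name is a valid Python identifier that does not clash with a DataFrame attribute)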
[out]:
one      a
two      b
three    c
four     d
Name: char, dtype: object

Note that the square brackets receive a single string, 'char'; a column selected this way is returned as a Series.

type(df2['char'])
[out]: pandas.core.series.Series

Select multiple columns:

print(df2[['char', 'int']])
[out]: 
      char  int
one      a   10
two      b   20
three    c   30
four     d   40

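Note that the square brackets now receive a list, ['char', 'int'], and the result is a DataFrame.
What if you want a single column but still want a DataFrame back? Pass a one-element list: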

print(df2[['char']])
[out]:
      char
one      a
two      b
three    c
four     d
---------------------------------------
type(df2[['char']])
[out]: pandas.core.frame.DataFrame

Note that df2[0] raises an error when used to select a column, unless the column labels are themselves integers, as in df1, where df1[0] works.

print(df1[0])
[out]:
0    a
1    b
2    c
3    d
Name: 0, dtype: object
-----------------------
print(df2[0])
[out]: 
KeyError: 0
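
When the column labels are strings, a column can still be fetched by position through .iloc. The following is a minimal sketch of that idea; df2 is rebuilt here so that the snippet runs on its own, and the variable names raised and first_col are only illustrative.

```python
import pandas as pd

my_list = [['a', 10, 1.1], ['b', 20, 2.2], ['c', 30, 3.3], ['d', 40, 4.4]]
df2 = pd.DataFrame(my_list,
                   index=['one', 'two', 'three', 'four'],
                   columns=['char', 'int', 'float'])

# df2[0] looks for a column *labelled* 0, which does not exist in df2.
try:
    df2[0]
    raised = False
except KeyError:
    raised = True

# Position-based access works regardless of what the column labels are.
first_col = df2.iloc[:, 0]

print(raised)              # True
print(first_col.tolist())  # ['a', 'b', 'c', 'd']
```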

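2.1.2 Using loc and iloc

# dat_df stands for any DataFrame with at least five columns
df = dat_df.iloc[:, [0, 2, 3, 4]]  # Select all rows and the columns at positions 0, 2, 3 and 4; the column names themselves can be arbitrary strings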
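
To make the column selection above concrete, here is a small sketch that applies both indexers to the df2 frame from section 1.2 (rebuilt here so the snippet is self-contained). It assumes nothing beyond standard pandas; the names by_label, by_position and single are only illustrative.

```python
import pandas as pd

my_list = [['a', 10, 1.1], ['b', 20, 2.2], ['c', 30, 3.3], ['d', 40, 4.4]]
df2 = pd.DataFrame(my_list,
                   index=['one', 'two', 'three', 'four'],
                   columns=['char', 'int', 'float'])

# .loc selects columns by label; ':' keeps every row.
by_label = df2.loc[:, ['char', 'float']]

# .iloc selects the same columns by integer position.
by_position = df2.iloc[:, [0, 2]]

# A single label (not wrapped in a list) returns a Series.
single = df2.loc[:, 'int']

print(by_label.equals(by_position))  # True
print(type(single).__name__)         # Series
```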

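2.2 Selecting by Row Index

2.2.1 Using [ : ]

Unlike column selection, [ ] here does not take a single string; it takes a slice written with a colon.

Select the rows with labels from 'two' through 'three':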

print(df2['two': 'three'])
[out]:
      char  int  float
two      b   20    2.2
three    c   30    3.3
# The result is a DataFrame
# Integer positions can also be used directly

Select only the row labelled 'two':

# The return type is still a DataFrame
print(df2['two': 'two'])
[out]:
      char  int  float
two      b   20    2.2

Besides row labels, [ ] also accepts row positions.

Select rows 1 through 3 (positions start at 0):

print(df2[1:4])
[out]:
      char  int  float
two      b   20    2.2
three    c   30    3.3
four     d   40    4.4
# The result is a DataFrame

As shown, the selected rows do not include the row at the rightmost position inside the brackets.

Select only row 1:

print(df2[1:2])
[out]:
    char  int  float
two    b   20    2.2
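
One detail worth checking by hand is how the two kinds of slice treat their endpoints: a label slice includes both ends, while a positional slice excludes the stop position. The sketch below rebuilds df2 so it is self-contained and compares the two; the variable names are only illustrative.

```python
import pandas as pd

my_list = [['a', 10, 1.1], ['b', 20, 2.2], ['c', 30, 3.3], ['d', 40, 4.4]]
df2 = pd.DataFrame(my_list,
                   index=['one', 'two', 'three', 'four'],
                   columns=['char', 'int', 'float'])

# Label slice: both 'two' and 'three' are included.
by_label = df2['two':'three']

# Positional slice: position 3 ('four') is excluded, so rows 1 and 2 remain.
by_position = df2[1:3]

print(by_label.index.tolist())     # ['two', 'three']
print(by_position.index.tolist())  # ['two', 'three']
```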

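2.2.2 Using .loc[] and .iloc[]

The difference is that .loc[] selects data by row and column labels, while .iloc[] selects by integer position, counting from 0. Both are indexers used with square brackets, not methods called with parentheses.

Select rows:
1. Using .loc[]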

print(df2.loc['one'])
[out]:
char       a
int       10
float    1.1
Name: one, dtype: object
-------------------------------------------
print(df2.loc[['one', 'three']])
[out]:
      char  int  float
one      a   10    1.1
three    c   30    3.3
-------------------------------------------
print(df2.loc['one': 'three'])
[out]:
      char  int  float
one      a   10    1.1
two      b   20    2.2
three    c   30    3.3

2. Using .iloc[]

print(df2.iloc[0])
[out]:
char       a
int       10
float    1.1
Name: one, dtype: object
-------------------------------------------
print(df2.iloc[[0, 2]])
[out]:
      char  int  float
one      a   10    1.1
three    c   30    3.3
-------------------------------------------
print(df2.iloc[1: 3])
[out]:
      char  int  float
two      b   20    2.2
three    c   30    3.3
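
Both indexers also accept a row selector and a column selector together, separated by a comma. The following sketch, with df2 rebuilt so it runs on its own, selects a single cell and a rectangular block both ways; the variable names are only illustrative.

```python
import pandas as pd

my_list = [['a', 10, 1.1], ['b', 20, 2.2], ['c', 30, 3.3], ['d', 40, 4.4]]
df2 = pd.DataFrame(my_list,
                   index=['one', 'two', 'three', 'four'],
                   columns=['char', 'int', 'float'])

# A single cell: row 'two' (position 1), column 'int' (position 1).
cell_by_label = df2.loc['two', 'int']
cell_by_position = df2.iloc[1, 1]

# A block: the label slice includes 'three'; the positional slice stops before 3.
block_by_label = df2.loc['one':'three', ['char', 'float']]
block_by_position = df2.iloc[0:3, [0, 2]]

print(cell_by_label, cell_by_position)           # 20 20
print(block_by_label.equals(block_by_position))  # True
```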

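3. Selecting Rows of a DataFrame by Column Conditions

# Select rows where a column equals a value: use ==

df.loc[df['column_name'] == some_value]

# Select rows where a column's value is one of several values: use isin

df.loc[df['column_name'].isin(some_values)]

# Combine multiple conditions: use & (wrap each comparison in parentheses)

df.loc[(df['column'] == some_value) & df['other_column'].isin(some_values)]

# Select rows where a column does not equal a value: use !=

df.loc[df['column_name'] != some_value]

# isin returns a boolean Series; to select the rows that do not match, negate it with ~

df.loc[~df['column_name'].isin(some_values)]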
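
The templates above use placeholder names, so here is a runnable sketch of the same five patterns. The DataFrame, its column names (city, score) and the values are made up purely for illustration.

```python
import pandas as pd

# Hypothetical sample data, only for demonstrating the filters.
df = pd.DataFrame({'city': ['Beijing', 'Shanghai', 'Beijing', 'Shenzhen'],
                   'score': [85, 92, 78, 92]})

equal = df.loc[df['city'] == 'Beijing']                 # rows 0 and 2
member = df.loc[df['score'].isin([92, 78])]             # rows 1, 2 and 3
both = df.loc[(df['city'] == 'Beijing')
              & df['score'].isin([92, 78])]             # row 2
not_equal = df.loc[df['city'] != 'Beijing']             # rows 1 and 3
not_member = df.loc[~df['score'].isin([92, 78])]        # row 0

print(both)
```

Each comparison produces a boolean Series aligned with the rows of df, and .loc keeps the rows where it is True. The parentheses around each comparison matter because & binds more tightly than == in Python.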

4. Getting the Row Index Labels That Match a Column Condition as a List

Given a condition on a DataFrame, find where the rows that satisfy it are located.

import pandas as pd
df = pd.DataFrame({'BoolCol': [1, 2, 3, 3, 4], 'attr': [22, 33, 22, 44, 66]},
                  index=[10, 20, 30, 40, 50])
print(df)
a = df[(df.BoolCol == 3) & (df.attr == 22)].index.tolist()
print(a)

Output:

    BoolCol  attr
10        1    22
20        2    33
30        3    22
40        3    44
50        4    66
[30]

Note:
df[(df.BoolCol==3)&(df.attr==22)].index returns a pandas Index object; call its tolist() method to convert it to a plain Python list.
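
The list obtained this way holds index labels (30 here), not 0-based row positions. If positions are what you need, one possible approach is numpy.flatnonzero on the boolean mask. This is a sketch under that assumption, not part of the original example; the names mask, labels and positions are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'BoolCol': [1, 2, 3, 3, 4], 'attr': [22, 33, 22, 44, 66]},
                  index=[10, 20, 30, 40, 50])

# Boolean mask: True for rows that satisfy both conditions.
mask = (df['BoolCol'] == 3) & (df['attr'] == 22)

# Index labels of the matching rows.
labels = df.index[mask.to_numpy()].tolist()

# 0-based positions of the matching rows.
positions = np.flatnonzero(mask.to_numpy()).tolist()

print(labels)     # [30]
print(positions)  # [2]
```

Labels are what .loc expects, and positions are what .iloc expects.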
