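This post collects the pandas code I find myself reaching for most often, together with the output of each snippet, to make the behavior easier to understand.
Where it helps, comments explain what the code does. If you would like the data files and code used in this post, show it some support and leave your email address in the comments; I will send them as soon as I see the request.
You can also treat this post as a small pandas handbook: when you need a particular feature, Ctrl+F will find it, so bookmark it now if you are not reading it yet.
A newer companion post is also worth bookmarking: 学习pandas全套代码【超详细】分箱操作、分组聚合、时间序列、数据可视化 (binning, group-by aggregation, time series, and data visualization).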
Part 1: pandas data structures
import numpy as np
import pandas as pd
The main pandas data structures are Series (one-dimensional data) and DataFrame (two-dimensional data).
1.1 Series
l = np.array([1,2,3,6,9])
s1 = pd.Series(data = l)
display(l,s1)
array([1, 2, 3, 6, 9])
0 1
1 2
2 3
3 6
4 9
dtype: int64
s2 = pd.Series(data = l,index = list('ABCDE'))
s2
A 1
B 2
C 3
D 6
E 9
dtype: int64
s3 = pd.Series(data = {'A':149,'B':130,'C':118,'D':99,'E':66})
s3
A 149
B 130
C 118
D 99
E 66
dtype: int64
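The index is what separates a Series from a plain NumPy array. The sketch below reuses the values of s2 and shows label lookup, positional lookup, and label alignment in arithmetic. The second Series (`other`) and its values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Same values and labels as s2 in the post.
s = pd.Series(data=np.array([1, 2, 3, 6, 9]), index=list('ABCDE'))

# .loc looks up by label, .iloc by integer position.
by_label = s.loc['D']
by_position = s.iloc[3]

# Arithmetic aligns on labels, not on position. Labels that appear in
# only one operand produce NaN in the result.
other = pd.Series({'A': 10, 'B': 20, 'Z': 30})  # hypothetical data
total = s + other

print(by_label, by_position)
print(total)
```

Because of this alignment, adding two Series with different labels never raises an error; it quietly yields NaN for the unmatched labels, which is worth checking for.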
1.2 DataFrame
df1 = pd.DataFrame(data = np.random.randint(0,151,size = (10,3)),
index = list('ABCDEFHIJK'),
columns=['Python','Math','En'],dtype=np.float16)
df1
|  | Python | Math | En |
|---|---|---|---|
| A | 113.0 | 37.0 | 70.0 |
| B | 92.0 | 22.0 | 11.0 |
| C | 0.0 | 9.0 | 66.0 |
| D | 40.0 | 145.0 | 23.0 |
| E | 25.0 | 133.0 | 108.0 |
| F | 124.0 | 16.0 | 130.0 |
| H | 121.0 | 85.0 | 133.0 |
| I | 84.0 | 125.0 | 39.0 |
| J | 111.0 | 36.0 | 137.0 |
| K | 55.0 | 26.0 | 85.0 |
df2 = pd.DataFrame(data = {'Python':[66,99,128],'Math':[88,65,137],'En':[100,121,45]})
df2
|  | Python | Math | En |
|---|---|---|---|
| 0 | 66 | 88 | 100 |
| 1 | 99 | 65 | 121 |
| 2 | 128 | 137 | 45 |
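The tables in this post come from unseeded np.random.randint calls, so rerunning the code gives different numbers each time. If you want output you can reproduce, one option is a seeded generator. This is a minimal sketch; the seed value 42 and the names `scores` and `scores_again` are my own choices, not part of the original code.

```python
import numpy as np
import pandas as pd

# A seeded generator returns the same numbers on every run.
rng = np.random.default_rng(seed=42)  # the seed value is arbitrary

# integers() excludes the upper bound by default, like np.random.randint.
scores = pd.DataFrame(
    data=rng.integers(0, 151, size=(10, 3)),
    index=list('ABCDEFHIJK'),
    columns=['Python', 'Math', 'En'],
)

# Rebuilding with the same seed yields an identical frame.
rng_again = np.random.default_rng(seed=42)
scores_again = pd.DataFrame(
    data=rng_again.integers(0, 151, size=(10, 3)),
    index=list('ABCDEFHIJK'),
    columns=['Python', 'Math', 'En'],
)
print(scores.equals(scores_again))
```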
Part 2: Viewing data
df = pd.DataFrame(data = np.random.randint(0,151,size = (100,3)),
columns=['Python','Math','En'])
df
|  | Python | Math | En |
|---|---|---|---|
| 0 | 133 | 139 | 141 |
| 1 | 82 | 17 | 130 |
| 2 | 51 | 51 | 145 |
| 3 | 127 | 70 | 11 |
| 4 | 93 | 60 | 91 |
| ... | ... | ... | ... |
| 95 | 57 | 133 | 96 |
| 96 | 91 | 21 | 134 |
| 97 | 76 | 109 | 113 |
| 98 | 99 | 82 | 29 |
| 99 | 28 | 54 | 88 |
100 rows × 3 columns
df.shape
(100, 3)
df.head(n = 3)
|  | Python | Math | En |
|---|---|---|---|
| 0 | 133 | 139 | 141 |
| 1 | 82 | 17 | 130 |
| 2 | 51 | 51 | 145 |
df.tail()
|  | Python | Math | En |
|---|---|---|---|
| 95 | 57 | 133 | 96 |
| 96 | 91 | 21 | 134 |
| 97 | 76 | 109 | 113 |
| 98 | 99 | 82 | 29 |
| 99 | 28 | 54 | 88 |
df.dtypes
Python int64
Math int64
En int64
dtype: object
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100 entries, 0 to 99
Data columns (total 3 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Python 100 non-null int64
1 Math 100 non-null int64
2 En 100 non-null int64
dtypes: int64(3)
memory usage: 2.5 KB
df.describe()
|  | Python | Math | En |
|---|---|---|---|
| count | 100.000000 | 100.000000 | 100.000000 |
| mean | 85.790000 | 77.410000 | 67.630000 |
| std | 41.375173 | 44.905309 | 43.883835 |
| min | 3.000000 | 0.000000 | 3.000000 |
| 25% | 54.500000 | 40.250000 | 31.250000 |
| 50% | 84.500000 | 81.000000 | 58.500000 |
| 75% | 123.000000 | 113.250000 | 103.000000 |
| max | 149.000000 | 149.000000 | 147.000000 |
df.values
array([[133, 139, 141],
[ 82, 17, 130],
[ 51, 51, 145],
[127, 70, 11],
[ 93, 60, 91],
[103, 110, 103],
[ 27, 133, 32],
[148, 99, 128],
[139, 97, 44],
[ 64, 85, 71],
[147, 94, 37],
[114, 12, 16],
[ 16, 54, 44],
[123, 3, 76],
[137, 97, 123],
[149, 113, 74],
[ 69, 38, 7],
[ 68, 122, 4],
[ 53, 13, 47],
[113, 127, 124],
[ 55, 139, 47],
[140, 114, 14],
[ 84, 111, 115],
[ 65, 5, 136],
[ 96, 50, 89],
[145, 130, 15],
[111, 30, 66],
[132, 122, 144],
[ 79, 5, 45],
[115, 29, 49],
[ 27, 55, 83],
[ 29, 74, 38],
[ 87, 100, 45],
[132, 147, 119],
[ 66, 90, 40],
[ 67, 108, 48],
[ 78, 28, 46],
[105, 137, 110],
[132, 119, 55],
[117, 23, 79],
[ 12, 29, 12],
[114, 58, 119],
[139, 0, 42],
[ 61, 69, 142],
[141, 73, 107],
[ 49, 12, 19],
[ 8, 1, 75],
[134, 60, 25],
[138, 80, 79],
[112, 115, 26],
[ 77, 4, 120],
[140, 100, 35],
[ 82, 129, 4],
[100, 8, 25],
[ 77, 97, 78],
[ 55, 113, 53],
[ 45, 73, 37],
[ 44, 0, 80],
[ 26, 74, 52],
[ 99, 75, 147],
[111, 8, 144],
[ 55, 146, 15],
[140, 106, 74],
[ 91, 78, 92],
[130, 108, 41],
[ 34, 41, 136],
[ 3, 139, 4],
[123, 93, 4],
[ 24, 103, 3],
[ 44, 122, 92],
[ 83, 45, 50],
[ 46, 149, 103],
[ 48, 127, 92],
[ 3, 51, 57],
[136, 136, 82],
[ 65, 102, 16],
[ 23, 61, 118],
[138, 15, 6],
[ 83, 91, 4],
[109, 24, 54],
[ 40, 43, 125],
[103, 123, 141],
[116, 113, 38],
[137, 71, 126],
[ 69, 143, 83],
[ 8, 60, 60],
[ 40, 22, 95],
[ 73, 19, 17],
[137, 129, 103],
[109, 142, 94],
[ 85, 105, 10],
[ 97, 107, 19],
[ 79, 12, 27],
[143, 74, 18],
[ 32, 114, 52],
[ 57, 133, 96],
[ 91, 21, 134],
[ 76, 109, 113],
[ 99, 82, 29],
[ 28, 54, 88]])
df.columns
Index(['Python', 'Math', 'En'], dtype='object')
df.index
RangeIndex(start=0, stop=100, step=1)
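Two small follow-ups to the inspection calls above, as a sketch on seeded random data of my own: describe() returns an ordinary DataFrame, so single statistics can be read out by label, and to_numpy() is the accessor the pandas documentation recommends in place of .values.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed, for repeatable numbers
df = pd.DataFrame(rng.integers(0, 151, size=(100, 3)),
                  columns=['Python', 'Math', 'En'])

# describe() returns a DataFrame indexed by statistic name.
summary = df.describe()
median_math = summary.loc['50%', 'Math']

# to_numpy() gives the underlying data as a NumPy array, like .values.
arr = df.to_numpy()

print(arr.shape)
print(median_math == df['Math'].median())
```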
Part 3: Data input and output
3.1 csv
df = pd.DataFrame(data = np.random.randint(0,151,size = (100,3)),
columns=['Python','Math','En'])
df
|  | Python | Math | En |
|---|---|---|---|
| 0 | 10 | 128 | 54 |
| 1 | 45 | 47 | 74 |
| 2 | 103 | 33 | 133 |
| 3 | 70 | 24 | 81 |
| 4 | 90 | 143 | 121 |
| ... | ... | ... | ... |
| 95 | 145 | 25 | 139 |
| 96 | 53 | 51 | 109 |
| 97 | 35 | 7 | 130 |
| 98 | 86 | 51 | 20 |
| 99 | 149 | 66 | 75 |
100 rows × 3 columns
df.to_csv('./data.csv',sep = ',',
index = True,
header=True)
df.to_csv('./data2.csv',sep = ',',
index = False,
header=False)
pd.read_csv('./data.csv',
index_col=0)
|  | Python | Math | En |
|---|---|---|---|
| 0 | 10 | 128 | 54 |
| 1 | 45 | 47 | 74 |
| 2 | 103 | 33 | 133 |
| 3 | 70 | 24 | 81 |
| 4 | 90 | 143 | 121 |
| ... | ... | ... | ... |
| 95 | 145 | 25 | 139 |
| 96 | 53 | 51 | 109 |
| 97 | 35 | 7 | 130 |
| 98 | 86 | 51 | 20 |
| 99 | 149 | 66 | 75 |
100 rows × 3 columns
pd.read_csv('./data2.csv',header =None)
|  | 0 | 1 | 2 |
|---|---|---|---|
| 0 | 10 | 128 | 54 |
| 1 | 45 | 47 | 74 |
| 2 | 103 | 33 | 133 |
| 3 | 70 | 24 | 81 |
| 4 | 90 | 143 | 121 |
| ... | ... | ... | ... |
| 95 | 145 | 25 | 139 |
| 96 | 53 | 51 | 109 |
| 97 | 35 | 7 | 130 |
| 98 | 86 | 51 | 20 |
| 99 | 149 | 66 | 75 |
100 rows × 3 columns
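To see why index_col=0 matters when reading data.csv back, here is a small round trip. It is a sketch that writes to an in-memory buffer (io.StringIO) instead of a file so that it leaves nothing on disk; the five-row frame and the seed are my own.

```python
import io

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed
df = pd.DataFrame(rng.integers(0, 151, size=(5, 3)),
                  columns=['Python', 'Math', 'En'])

# Write with the index and header, as for data.csv above.
buf = io.StringIO()
df.to_csv(buf, index=True, header=True)

# Without index_col=0 the saved index comes back as an extra column.
buf.seek(0)
naive = pd.read_csv(buf)

# With index_col=0 the first column becomes the index again.
buf.seek(0)
restored = pd.read_csv(buf, index_col=0)

print(list(naive.columns))
print(list(restored.columns))
```

The extra column in `naive` is the saved index, which is why the post passes index_col=0 for data.csv and header=None for data2.csv, which was written without either.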
3.2 Excel
df
|  | Python | Math | En |
|---|---|---|---|
| 0 | 10 | 128 | 54 |
| 1 | 45 | 47 | 74 |
| 2 | 103 | 33 | 133 |
| 3 | 70 | 24 | 81 |
| 4 | 90 | 143 | 121 |
| ... | ... | ... | ... |
| 95 | 145 | 25 | 139 |
| 96 | 53 | 51 | 109 |
| 97 | 35 | 7 | 130 |
| 98 | 86 | 51 | 20 |
| 99 | 149 | 66 | 75 |
100 rows × 3 columns
# Writing .xls relied on the xlwt engine, which pandas 2.0 removed; .xlsx uses openpyxl.
df.to_excel('./data.xlsx')
pd.read_excel('./data.xlsx',
index_col=0)
|  | Python | Math | En |
|---|---|---|---|
| 0 | 10 | 128 | 54 |
| 1 | 45 | 47 | 74 |
| 2 | 103 | 33 | 133 |
| 3 | 70 | 24 | 81 |
| 4 | 90 | 143 | 121 |
| ... | ... | ... | ... |
| 95 | 145 | 25 | 139 |
| 96 | 53 | 51 | 109 |
| 97 | 35 | 7 | 130 |
| 98 | 86 | 51 | 20 |
| 99 | 149 | 66 | 75 |
100 rows × 3 columns
3.3 HDF5
df.to_hdf('./data.h5',key = 'score')
df2 = pd.DataFrame(data = np.random.randint(6,100,size = (1000,5)),
columns=['计算机','化工','生物','工程','教师'])
df2
|  | 计算机 | 化工 | 生物 | 工程 | 教师 |
|---|---|---|---|---|---|
| 0 | 64 | 22 | 16 | 68 | 60 |
| 1 | 95 | 47 | 72 | 76 | 37 |
| 2 | 88 | 48 | 92 | 50 | 37 |
| 3 | 75 | 38 | 8 | 63 | 83 |
| 4 | 62 | 14 | 20 | 21 | 45 |
| ... | ... | ... | ... | ... | ... |
| 995 | 89 | 94 | 88 | 97 | 27 |
| 996 | 52 | 68 | 21 | 8 | 50 |
| 997 | 76 | 99 | 10 | 92 | 56 |
| 998 | 66 | 31 | 55 | 65 | 94 |
| 999 | 8 | 21 | 38 | 89 | 14 |
1000 rows × 5 columns
df2.to_hdf('./data.h5',key = 'salary')
pd.read_hdf('./data.h5',key = 'salary')
|  | 计算机 | 化工 | 生物 | 工程 | 教师 |
|---|---|---|---|---|---|
| 0 | 64 | 22 | 16 | 68 | 60 |
| 1 | 95 | 47 | 72 | 76 | 37 |
| 2 | 88 | 48 | 92 | 50 | 37 |
| 3 | 75 | 38 | 8 | 63 | 83 |
| 4 | 62 | 14 | 20 | 21 | 45 |
| ... | ... | ... | ... | ... | ... |
| 995 | 89 | 94 | 88 | 97 | 27 |
| 996 | 52 | 68 | 21 | 8 | 50 |
| 997 | 76 | 99 | 10 | 92 | 56 |
| 998 | 66 | 31 | 55 | 65 | 94 |
| 999 | 8 | 21 | 38 | 89 | 14 |
1000 rows × 5 columns
3.4 SQL
from sqlalchemy import create_engine
engine = create_engine('mysql+pymysql://root:12345678@localhost/pandas?charset=utf8')
df2.to_sql('salary',engine,index=False)
df3 = pd.read_sql('select * from salary limit 50',con = engine)
df3
|  | 计算机 | 化工 | 生物 | 工程 | 教师 |
|---|---|---|---|---|---|
| 0 | 64 | 22 | 16 | 68 | 60 |
| 1 | 95 | 47 | 72 | 76 | 37 |
| 2 | 88 | 48 | 92 | 50 | 37 |
| 3 | 75 | 38 | 8 | 63 | 83 |
| 4 | 62 | 14 | 20 | 21 | 45 |
| 5 | 95 | 41 | 84 | 37 | 16 |
| 6 | 34 | 45 | 11 | 93 | 94 |
| 7 | 99 | 10 | 57 | 30 | 63 |
| 8 | 59 | 60 | 12 | 37 | 93 |
| 9 | 41 | 58 | 15 | 67 | 70 |
| 10 | 19 | 8 | 63 | 96 | 64 |
| 11 | 75 | 61 | 78 | 49 | 89 |
| 12 | 86 | 84 | 31 | 68 | 27 |
| 13 | 42 | 98 | 24 | 20 | 85 |
| 14 | 95 | 15 | 97 | 80 | 87 |
| 15 | 52 | 22 | 44 | 35 | 74 |
| 16 | 67 | 20 | 65 | 24 | 10 |
| 17 | 9 | 46 | 41 | 62 | 66 |
| 18 | 86 | 76 | 80 | 72 | 19 |
| 19 | 61 | 81 | 64 | 26 | 6 |
| 20 | 77 | 92 | 84 | 18 | 7 |
| 21 | 87 | 16 | 75 | 14 | 34 |
| 22 | 23 | 82 | 92 | 42 | 32 |
| 23 | 61 | 89 | 28 | 21 | 40 |
| 24 | 22 | 12 | 38 | 89 | 14 |
| 25 | 77 | 12 | 46 | 89 | 12 |
| 26 | 60 | 45 | 52 | 71 | 67 |
| 27 | 29 | 76 | 94 | 26 | 91 |
| 28 | 14 | 60 | 82 | 88 | 60 |
| 29 | 56 | 36 | 44 | 60 | 37 |
| 30 | 63 | 77 | 43 | 42 | 82 |
| 31 | 25 | 71 | 36 | 51 | 21 |
| 32 | 76 | 86 | 87 | 83 | 8 |
| 33 | 93 | 59 | 86 | 25 | 78 |
| 34 | 73 | 40 | 12 | 86 | 66 |
| 35 | 10 | 30 | 54 | 13 | 71 |
| 36 | 9 | 48 | 58 | 75 | 85 |
| 37 | 81 | 41 | 61 | 12 | 55 |
| 38 | 80 | 68 | 66 | 92 | 84 |
| 39 | 53 | 36 | 84 | 26 | 66 |
| 40 | 19 | 62 | 63 | 47 | 45 |
| 41 | 89 | 39 | 91 | 31 | 86 |
| 42 | 57 | 43 | 53 | 48 | 19 |
| 43 | 66 | 16 | 23 | 19 | 10 |
| 44 | 46 | 28 | 78 | 81 | 21 |
| 45 | 38 | 53 | 76 | 49 | 8 |
| 46 | 55 | 94 | 70 | 6 | 44 |
| 47 | 56 | 33 | 92 | 17 | 84 |
| 48 | 69 | 68 | 23 | 87 | 90 |
| 49 | 12 | 47 | 32 | 80 | 15 |
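The MySQL example needs a running server plus the sqlalchemy and pymysql packages. If you only want to try to_sql and read_sql, one option is Python's built-in sqlite3 module, since pandas accepts a sqlite3 connection directly. This is a sketch under that substitution, not the setup used in the post; the in-memory database, the seed, and the names `salary` and `first_50` are my own.

```python
import sqlite3

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed
salary = pd.DataFrame(rng.integers(6, 100, size=(1000, 5)),
                      columns=['计算机', '化工', '生物', '工程', '教师'])

# An in-memory SQLite database stands in for the MySQL server.
conn = sqlite3.connect(':memory:')

# if_exists='replace' lets the cell be rerun; the default ('fail')
# raises an error when the table already exists.
salary.to_sql('salary', conn, index=False, if_exists='replace')

# Same query as in the post, run against SQLite instead of MySQL.
first_50 = pd.read_sql('select * from salary limit 50', con=conn)
conn.close()

print(first_50.shape)
```

The if_exists argument is worth knowing for the MySQL version too: running df2.to_sql('salary', engine, index=False) a second time fails because the table is already there.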
Part 4: Selecting data
4.1 Getting data
df = pd.DataFrame(np.random.randint(0,151,size = (10,3)),
index=list('ABCDEFHIJK'),columns=['Python','Math','En'])
df
|  | Python | Math | En |
|---|---|---|---|
| A | 88 | 52 | 48 |
| B | 78 | 62 | 94 |
| C | 9 | 14 | 71 |
| D | 86 | 15 | 21 |
| E | 1 | 71 | 71 |
| F | 123 | 138 | 55 |
| H | 59 | 17 | 140 |
| I | 68 | 8 | 58 |
| J | 100 | 70 | 63 |
| K | 79 | 37 | 72 |
df['Python']
A 88
B 78
C 9
D 86
E 1
F 123
H 59
I 68
J 100
K 79
Name: Python, dtype: int64
df.Python
A 88
B 78
C 9
D 86
E 1
F 123
H 59
I 68
J 100
K 79
Name: Python, dtype: int64
df[['Python','En']]
|  | Python | En |
|---|---|---|
| A | 88 | 48 |
| B | 78 | 94 |
| C | 9 | 71 |
| D | 86 | 21 |
| E | 1 | 71 |
| F | 123 | 55 |
| H | 59 | 140 |
| I | 68 | 58 |
| J | 100 | 63 |
| K | 79 | 72 |
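df.Python and df['Python'] return the same column, but the attribute form has limits. The sketch below uses a made-up frame with two awkward column names ('count' and 'Deep Learning') to show where only the bracket form works.

```python
import pandas as pd

# Hypothetical frame: 'count' clashes with a DataFrame method and
# 'Deep Learning' contains a space.
df = pd.DataFrame({'Python': [88, 78],
                   'count': [1, 2],
                   'Deep Learning': [90, 95]},
                  index=['A', 'B'])

# Attribute access and bracket access agree for a plain name.
same = df.Python.equals(df['Python'])

# 'count' collides with the DataFrame.count method, so the attribute
# returns the method rather than the column.
is_method = callable(df.count)

# Names with spaces can only be reached with brackets.
dl = df['Deep Learning']

print(same, is_method, list(dl))
```

For that reason the bracket form is the safer default in scripts, while the attribute form is a convenient shortcut in interactive sessions.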
4.2 Label-based selection
df.loc['A']
Python 88
Math 52
En 48
Name: A, dtype: int64
df.loc[['A','F','K']]
|  | Python | Math | En |
|---|---|---|---|
| A | 88 | 52 | 48 |
| F | 123 | 138 | 55 |
| K | 79 | 37 | 72 |
df.loc['A','Python']
88
df.loc[['A','C','F'],'Python']
A 88
C 9
F 123
Name: Python, dtype: int64
df.loc['A'::2,['Math','En']]
|  | Math | En |
|---|---|---|
| A | 52 | 48 |
| C | 14 | 71 |
| E | 71 | 71 |
| H | 17 | 140 |
| J | 70 | 63 |
df.loc['A':'D',:]
|  | Python | Math | En |
|---|---|---|---|
| A | 88 | 52 | 48 |
| B | 78 | 62 | 94 |
| C | 9 | 14 | 71 |
| D | 86 | 15 | 21 |
4.3 Position-based selection
df.iloc[0]
Python 88
Math 52
En 48
Name: A, dtype: int64
df.iloc[[0,2,4]]
|  | Python | Math | En |
|---|---|---|---|
| A | 88 | 52 | 48 |
| C | 9 | 14 | 71 |
| E | 1 | 71 | 71 |
df.iloc[0:4,[0,2]]
|  | Python | En |
|---|---|---|
| A | 88 | 48 |
| B | 78 | 94 |
| C | 9 | 71 |
| D | 86 | 21 |
df.iloc[3:8:2]
|  | Python | Math | En |
|---|---|---|---|
| D | 86 | 15 | 21 |
| F | 123 | 138 | 55 |
| I | 68 | 8 | 58 |
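One difference between the two selectors is easy to miss in the outputs above: a label slice with loc includes its end label, while a position slice with iloc excludes its end position, as ordinary Python slicing does. A minimal sketch using the first five rows of the frame from 4.1:

```python
import pandas as pd

df = pd.DataFrame({'Python': [88, 78, 9, 86, 1],
                   'Math': [52, 62, 14, 15, 71]},
                  index=list('ABCDE'))

by_label = df.loc['A':'D']    # the end label 'D' is included
by_position = df.iloc[0:3]    # position 3 (row 'D') is excluded

print(list(by_label.index))     # ['A', 'B', 'C', 'D']
print(list(by_position.index))  # ['A', 'B', 'C']
```

This is why df.loc['A':'D',:] returns four rows and df.iloc[0:4,[0,2]] also returns four rows, even though one slice ends at D and the other at position 4.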
4.4 Boolean indexing
cond = df.Python > 80
df[cond]
|  | Python | Math | En |
|---|---|---|---|
| A | 88 | 52 | 48 |
| D | 86 | 15 | 21 |
| F | 123 | 138 | 55 |
| J | 100 | 70 | 63 |
cond = df.mean(axis = 1) > 75
df[cond]
|  | Python | Math | En |
|---|---|---|---|
| B | 78 | 62 | 94 |
| F | 123 | 138 | 55 |
| J | 100 | 70 | 63 |
cond = (df.Python > 70) & (df.Math > 70)
df[cond]
|  | Python | Math | En |
|---|---|---|---|
| F | 123 | 138 | 55 |
cond = df.index.isin(['C','E','H','K'])
df[cond]
|  | Python | Math | En |
|---|---|---|---|
| C | 9 | 14 | 71 |
| E | 1 | 71 | 71 |
| H | 59 | 17 | 140 |
| K | 79 | 37 | 72 |
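The conditions above can be combined in a few more ways. The sketch below rebuilds the frame from 4.1 and shows & (and), | (or), ~ (not), and the query() method as an alternative spelling. The variable names are my own.

```python
import pandas as pd

# The frame from section 4.1.
df = pd.DataFrame({'Python': [88, 78, 9, 86, 1, 123, 59, 68, 100, 79],
                   'Math': [52, 62, 14, 15, 71, 138, 17, 8, 70, 37],
                   'En': [48, 94, 71, 21, 71, 55, 140, 58, 63, 72]},
                  index=list('ABCDEFHIJK'))

# Both conditions must hold. The parentheses are required because
# & binds more tightly than >.
both = df[(df.Python > 70) & (df.Math > 70)]

# Either condition (|) and negation (~).
either = df[(df.Python > 100) | (df.En > 100)]
not_high = df[~(df.Python > 80)]

# query() expresses the same filter as `both` in a string.
both_q = df.query('Python > 70 and Math > 70')

print(list(both.index))      # ['F']
print(list(either.index))    # ['F', 'H']
print(list(not_high.index))  # ['B', 'C', 'E', 'H', 'I', 'K']
```

Note that J has Math equal to 70, so the strict > comparison leaves it out of `both`.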
4.5 Assignment
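# Use a single .loc call; the chained form df['Python']['A'] = 150 can write to a temporary copy.
df.loc['A','Python'] = 150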
df
|  | Python | Math | En |
|---|---|---|---|
| A | 150 | 52 | 48 |
| B | 78 | 62 | 94 |
| C | 9 | 14 | 71 |
| D | 86 | 15 | 21 |
| E | 1 | 71 | 71 |
| F | 123 | 138 | 55 |
| H | 59 | 17 | 140 |
| I | 68 | 8 | 58 |
| J | 100 | 70 | 63 |
| K | 79 | 37 | 72 |
df['Java'] = np.random.randint(0,151,size = 10)
df
|  | Python | Math | En | Java |
|---|---|---|---|---|
| A | 150 | 52 | 48 | 65 |
| B | 78 | 62 | 94 | 25 |
| C | 9 | 14 | 71 | 82 |
| D | 86 | 15 | 21 | 139 |
| E | 1 | 71 | 71 | 67 |
| F | 123 | 138 | 55 | 145 |
| H | 59 | 17 | 140 | 53 |
| I | 68 | 8 | 58 | 141 |
| J | 100 | 70 | 63 | 11 |
| K | 79 | 37 | 72 | 127 |
df.loc[['C','D','E'],'Math'] = 147
df
|  | Python | Math | En | Java |
|---|---|---|---|---|
| A | 150 | 52 | 48 | 65 |
| B | 78 | 62 | 94 | 25 |
| C | 9 | 147 | 71 | 82 |
| D | 86 | 147 | 21 | 139 |
| E | 1 | 147 | 71 | 67 |
| F | 123 | 138 | 55 | 145 |
| H | 59 | 17 | 140 | 53 |
| I | 68 | 8 | 58 | 141 |
| J | 100 | 70 | 63 | 11 |
| K | 79 | 37 | 72 | 127 |
cond = df < 60
df[cond] = 60
df
|  | Python | Math | En | Java |
|---|---|---|---|---|
| A | 150 | 60 | 60 | 65 |
| B | 78 | 62 | 94 | 60 |
| C | 60 | 147 | 71 | 82 |
| D | 86 | 147 | 60 | 139 |
| E | 60 | 147 | 71 | 67 |
| F | 123 | 138 | 60 | 145 |
| H | 60 | 60 | 140 | 60 |
| I | 68 | 60 | 60 | 141 |
| J | 100 | 70 | 63 | 60 |
| K | 79 | 60 | 72 | 127 |
df.iloc[3::3,[0,2]] += 100
df
|  | Python | Math | En | Java |
|---|---|---|---|---|
| A | 150 | 60 | 60 | 65 |
| B | 78 | 62 | 94 | 60 |
| C | 60 | 147 | 71 | 82 |
| D | 186 | 147 | 160 | 139 |
| E | 60 | 147 | 71 | 67 |
| F | 123 | 138 | 60 | 145 |
| H | 160 | 60 | 240 | 60 |
| I | 68 | 60 | 60 | 141 |
| J | 100 | 70 | 63 | 60 |
| K | 179 | 60 | 172 | 127 |
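The assignments in this section all modify df in place. If you would rather keep the original and get a new frame back, clip() and mask() can do the "raise everything below 60 to 60" step without touching df. A minimal sketch on a three-row frame of my own:

```python
import pandas as pd

df = pd.DataFrame({'Python': [88, 78, 9], 'Math': [52, 62, 14]},
                  index=list('ABC'))

# One .loc call selects row and column together, so the write
# lands on df itself.
df.loc['A', 'Python'] = 150

# clip(lower=60) raises every value below 60 to 60 and returns a
# new frame; df is left unchanged.
floored = df.clip(lower=60)

# mask() replaces values where the condition is True; here it has
# the same effect as the clip above.
floored_mask = df.mask(df < 60, 60)

print(floored)
print(df.loc['C', 'Python'])  # still 9
```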
Part 5: Combining data
5.1 Concatenating data with concat
df1 = pd.DataFrame(np.random.randint(0,151,size = (10,3)),
columns=['Python','Math','En'],
index = list('ABCDEFHIJK'))
df2 = pd.DataFrame(np.random.randint(0,151,size = (10,3)),
columns = ['Python','Math','En'],
index = list('QWRTUYOPLM'))
df3 = pd.DataFrame(np.random.randint(0,151,size = (10,2)),
columns=['Java','Chinese'],index = list('ABCDEFHIJK'))
pd.concat([df1,df2],axis = 0)