Python Data Analysis: Introductory pandas Exercises
Posted by Geek_bao
Python Data Analysis Basics
- Preparation
- Exercise 1 - MPG Cars
- Introduction:
- Step 1. Import the necessary libraries
- Step 2. Import the datasets [cars1](https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/05_Merge/Auto_MPG/cars1.csv) and [cars2](https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/05_Merge/Auto_MPG/cars2.csv).
- Step 3. Assign each to a variable called cars1 and cars2
- Step 4. Oops, it seems our first dataset has some unnamed blank columns; fix cars1
- Step 5. What is the number of observations in each dataset?
- Step 6. Join cars1 and cars2 into a single DataFrame called cars
- Step 7. Oops, there is a column missing, called owners. Create a random number Series from 15,000 to 73,000.
- Step 8. Add the column owners to cars
- Exercise 2 - Fictitious Names
- Introduction:
- Step 1. Import the necessary libraries
- Step 2. Create the 3 DataFrames based on the following raw data
- Step 3. Assign each to a variable called data1, data2, data3
- Step 4. Join the two dataframes along rows and assign to all_data
- Step 5. Join the two dataframes along columns and assign to all_data_col
- Step 6. Print data3
- Step 7. Merge all_data and data3 along the subject_id value
- Step 8. Merge only the data that has the same 'subject_id' on both data1 and data2
- Step 9. Merge all values in data1 and data2, with matching records from both sides where available.
- Exercise 3 - Housing Market
- Introduction:
- Step 1. Import the necessary libraries
- Step 2. Create 3 different Series, each of length 100, as follows:
- Step 3. Let's create a DataFrame by joining the Series by column
- Step 4. Change the name of the columns to bedrs, bathrs, price_sqr_meter
- Step 5. Create a one column DataFrame with the values of the 3 Series and assign it to 'bigcolumn'
- Step 6. Oops, it seems it only goes up to index 99. Is it true?
- Step 7. Reindex the DataFrame so it goes from 0 to 299
- Conclusion
Preparation
If you need the datasets, you can look for them online yourself or message the author directly. They have not been uploaded to CSDN, since downloading from there would require a paid membership. The dataset links below may not always download successfully.
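If the CSV files are not available locally, the two datasets can also be read straight from the raw GitHub URLs of the guipsamora/pandas_exercises repository (the same links as in the outline above). This is only a sketch and assumes the URLs are still reachable:
import pandas as pd

# Raw GitHub URLs from the pandas_exercises repository; they may change or go offline
base = "https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/05_Merge/Auto_MPG/"
cars1 = pd.read_csv(base + "cars1.csv")  # read_csv accepts URLs as well as local paths
cars2 = pd.read_csv(base + "cars2.csv")
print(cars1.shape, cars2.shape)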
Exercise 1 - MPG Cars
Introduction:
The following exercise utilizes data from the UC Irvine Machine Learning Repository.
Step 1. Import the necessary libraries
The code is as follows:
import pandas as pd
import numpy as np
Step 2. Import the datasets cars1 and cars2.
Step 3. Assign each to a variable called cars1 and cars2
The code is as follows:
cars1 = pd.read_csv('cars1.csv')
cars2 = pd.read_csv('cars2.csv')
print(cars1.head())
print(cars2.head())
The output is as follows:
mpg cylinders displacement horsepower weight acceleration model \
0 18.0 8 307 130 3504 12.0 70
1 15.0 8 350 165 3693 11.5 70
2 18.0 8 318 150 3436 11.0 70
3 16.0 8 304 150 3433 12.0 70
4 17.0 8 302 140 3449 10.5 70
origin car Unnamed: 9 Unnamed: 10 Unnamed: 11 \
0 1 chevrolet chevelle malibu NaN NaN NaN
1 1 buick skylark 320 NaN NaN NaN
2 1 plymouth satellite NaN NaN NaN
3 1 amc rebel sst NaN NaN NaN
4 1 ford torino NaN NaN NaN
Unnamed: 12 Unnamed: 13
0 NaN NaN
1 NaN NaN
2 NaN NaN
3 NaN NaN
4 NaN NaN
mpg cylinders displacement horsepower weight acceleration model \
0 33.0 4 91 53 1795 17.4 76
1 20.0 6 225 100 3651 17.7 76
2 18.0 6 250 78 3574 21.0 76
3 18.5 6 250 110 3645 16.2 76
4 17.5 6 258 95 3193 17.8 76
origin car
0 3 honda civic
1 1 dodge aspen se
2 1 ford granada ghia
3 1 pontiac ventura sj
4 1 amc pacer d/l
Step 4. Oops, it seems our first dataset has some unnamed blank columns; fix cars1
The code is as follows:
# cars1.dropna(axis=1)
cars1 = cars1.loc[:, "mpg" : "car"] # keep the columns from mpg through car and assign back to cars1
cars1.head()
The output is as follows:
| | mpg | cylinders | displacement | horsepower | weight | acceleration | model | origin | car |
---|---|---|---|---|---|---|---|---|---|
0 | 18.0 | 8 | 307 | 130 | 3504 | 12.0 | 70 | 1 | chevrolet chevelle malibu |
1 | 15.0 | 8 | 350 | 165 | 3693 | 11.5 | 70 | 1 | buick skylark 320 |
2 | 18.0 | 8 | 318 | 150 | 3436 | 11.0 | 70 | 1 | plymouth satellite |
3 | 16.0 | 8 | 304 | 150 | 3433 | 12.0 | 70 | 1 | amc rebel sst |
4 | 17.0 | 8 | 302 | 140 | 3449 | 10.5 | 70 | 1 | ford torino |
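As a side note, the unnamed columns could also be removed by name rather than by slicing with .loc. This is only an alternative sketch, meant for the freshly loaded cars1, and not part of the original solution:
# Drop every column whose name starts with "Unnamed" (alternative to the .loc slice above)
unnamed_cols = [col for col in cars1.columns if col.startswith("Unnamed")]
cars1 = cars1.drop(columns=unnamed_cols)
# The commented-out cars1.dropna(axis=1) also works here, but dropna(axis=1, how='all')
# is safer because it only drops columns that are entirely NaN.
cars1.head()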
Step 5. What is the number of observations in each dataset?
The code is as follows:
print(cars1.shape)
print(cars2.shape)
The output is as follows:
(198, 9)
(200, 9)
Step 6. Join cars1 and cars2 into a single DataFrame called cars
The code is as follows:
cars = cars1.append(cars2) # append cars2 after cars1 (DataFrame.append was deprecated and later removed in pandas 2.0)
# or: cars = pd.concat([cars1, cars2], axis=0, ignore_index=True)
cars
The output is as follows:
| | mpg | cylinders | displacement | horsepower | weight | acceleration | model | origin | car |
---|---|---|---|---|---|---|---|---|---|
0 | 18.0 | 8 | 307 | 130 | 3504 | 12.0 | 70 | 1 | chevrolet chevelle malibu |
1 | 15.0 | 8 | 350 | 165 | 3693 | 11.5 | 70 | 1 | buick skylark 320 |
2 | 18.0 | 8 | 318 | 150 | 3436 | 11.0 | 70 | 1 | plymouth satellite |
3 | 16.0 | 8 | 304 | 150 | 3433 | 12.0 | 70 | 1 | amc rebel sst |
4 | 17.0 | 8 | 302 | 140 | 3449 | 10.5 | 70 | 1 | ford torino |
5 | 15.0 | 8 | 429 | 198 | 4341 | 10.0 | 70 | 1 | ford galaxie 500 |
6 | 14.0 | 8 | 454 | 220 | 4354 | 9.0 | 70 | 1 | chevrolet impala |
7 | 14.0 | 8 | 440 | 215 | 4312 | 8.5 | 70 | 1 | plymouth fury iii |
8 | 14.0 | 8 | 455 | 225 | 4425 | 10.0 | 70 | 1 | pontiac catalina |
9 | 15.0 | 8 | 390 | 190 | 3850 | 8.5 | 70 | 1 | amc ambassador dpl |
10 | 15.0 | 8 | 383 | 170 | 3563 | 10.0 | 70 | 1 | dodge challenger se |
11 | 14.0 | 8 | 340 | 160 | 3609 | 8.0 | 70 | 1 | plymouth 'cuda 340 |
12 | 15.0 | 8 | 400 | 150 | 3761 | 9.5 | 70 | 1 | chevrolet monte carlo |
13 | 14.0 | 8 | 455 | 225 | 3086 | 10.0 | 70 | 1 | buick estate wagon (sw) |
14 | 24.0 | 4 | 113 | 95 | 2372 | 15.0 | 70 | 3 | toyota corona mark ii |
15 | 22.0 | 6 | 198 | 95 | 2833 | 15.5 | 70 | 1 | plymouth duster |
16 | 18.0 | 6 | 199 | 97 | 2774 | 15.5 | 70 | 1 | amc hornet |
17 | 21.0 | 6 | 200 | 85 | 2587 | 16.0 | 70 | 1 | ford maverick |
18 | 27.0 | 4 | 97 | 88 | 2130 | 14.5 | 70 | 3 | datsun pl510 |
19 | 26.0 | 4 | 97 | 46 | 1835 | 20.5 | 70 | 2 | volkswagen 1131 deluxe sedan |
20 | 25.0 | 4 | 110 | 87 | 2672 | 17.5 | 70 | 2 | peugeot 504 |
21 | 24.0 | 4 | 107 | 90 | 2430 | 14.5 | 70 | 2 | audi 100 ls |
22 | 25.0 | 4 | 104 | 95 | 2375 | 17.5 | 70 | 2 | saab 99e |
23 | 26.0 | 4 | 121 | 113 | 2234 | 12.5 | 70 | 2 | bmw 2002 |
24 | 21.0 | 6 | 199 | 90 | 2648 | 15.0 | 70 | 1 | amc gremlin |
25 | 10.0 | 8 | 360 | 215 | 4615 | 14.0 | 70 | 1 | ford f250 |
26 | 10.0 | 8 | 307 | 200 | 4376 | 15.0 | 70 | 1 | chevy c20 |
27 | 11.0 | 8 | 318 | 210 | 4382 | 13.5 | 70 | 1 | dodge d200 |
28 | 9.0 | 8 | 304 | 193 | 4732 | 18.5 | 70 | 1 | hi 1200d |
29 | 27.0 | 4 | 97 | 88 | 2130 | 14.5 | 71 | 3 | datsun pl510 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
170 | 27.0 | 4 | 112 | 88 | 2640 | 18.6 | 82 | 1 | chevrolet cavalier wagon |
171 | 34.0 | 4 | 112 | 88 | 2395 | 18.0 | 82 | 1 | chevrolet cavalier 2-door |
172 | 31.0 | 4 | 112 | 85 | 2575 | 16.2 | 82 | 1 | pontiac j2000 se hatchback |
173 | 29.0 | 4 | 135 | 84 | 2525 | 16.0 | 82 | 1 | dodge aries se |
174 | 27.0 | 4 | 151 | 90 | 2735 | 18.0 | 82 | 1 | pontiac phoenix |
175 | 24.0 | 4 | 140 | 92 | 2865 | 16.4 | 82 | 1 | ford fairmont futura |
176 | 23.0 | 4 | 151 | ? | 3035 | 20.5 | 82 | 1 | amc concord dl |
177 | 36.0 | 4 | 105 | 74 | 1980 | 15.3 | 82 | 2 | volkswagen rabbit l |
178 | 37.0 | 4 | 91 | 68 | 2025 | 18.2 | 82 | 3 | mazda glc custom l |
179 | 31.0 | 4 | 91 | 68 | 1970 | 17.6 | 82 | 3 | mazda glc custom |
180 | 38.0 | 4 | 105 | 63 | 2125 | 14.7 | 82 | 1 | plymouth horizon miser |
181 | 36.0 | 4 | 98 | 70 | 2125 | 17.3 | 82 | 1 | mercury lynx l |
182 | 36.0 | 4 | 120 | 88 | 2160 | 14.5 | 82 | 3 | nissan stanza xe |
183 | 36.0 | 4 | 107 | 75 | 2205 | 14.5 | 82 | 3 | honda accord |
184 | 34.0 | 4 | 108 | 70 | 2245 | 16.9 | 82 | 3 | toyota corolla |
185 | 38.0 | 4 | 91 | 67 | 1965 | 15.0 | 82 | 3 | honda civic |
186 | 32.0 | 4 | 91 | 67 | 1965 | 15.7 | 82 | 3 | honda civic (auto) |
187 | 38.0 | 4 | 91 | 67 | 1995 | 16.2 | 82 | 3 | datsun 310 gx |
188 | 25.0 | 6 | 181 | 110 | 2945 | 16.4 | 82 | 1 | buick century limited |
189 | 38.0 | 6 | 262 | 85 | 3015 | 17.0 | 82 | 1 | oldsmobile cutlass ciera (diesel) |
190 | 26.0 | 4 | 156 | 92 | 2585 | 14.5 | 82 | 1 | chrysler lebaron medallion |
191 | 22.0 | 6 | 232 | 112 | 2835 | 14.7 | 82 | 1 | ford granada l |
192 | 32.0 | 4 | 144 | 96 | 2665 | 13.9 | 82 | 3 | toyota celica gt |
193 | 36.0 | 4 | 135 | 84 | 2370 | 13.0 | 82 | 1 | dodge charger 2.2 |
194 | 27.0 | 4 | 151 | 90 | 2950 | 17.3 | 82 | 1 | chevrolet camaro |
195 | 27.0 | 4 | 140 | 86 | 2790 | 15.6 | 82 | 1 | ford mustang gl |
196 | 44.0 | 4 | 97 | 52 | 2130 | 24.6 | 82 | 2 | vw pickup |
197 | 32.0 | 4 | 135 | 84 | 2295 | 11.6 | 82 | 1 | dodge rampage |
198 | 28.0 | 4 | 120 | 79 | 2625 | 18.6 | 82 | 1 | ford ranger |
199 | 31.0 | 4 | 119 | 82 | 2720 | 19.4 | 82 | 1 | chevy s-10 |
398 rows × 9 columns
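Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, and the table above keeps the two original indexes (0-197 followed by 0-199). Below is a sketch of the pd.concat variant with a continuous index; cars_flat is an illustrative name, not part of the exercise:
cars_flat = pd.concat([cars1, cars2], ignore_index=True)  # rows renumbered 0..397
print(cars_flat.shape)        # expected: (398, 9)
print(cars_flat.index.max())  # expected: 397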
Step 7. Oops, there is a column missing, called owners. Create a random number Series from 15,000 to 73,000.
The code is as follows:
# create a random Series with values from 15,000 to 73,000
my_owners = np.random.randint(15000, 73000, 398)
my_owners
The output is as follows:
array([30395, 42733, 44554, 34325, 50270, 60139, 24218, 25925, 42502,
45041, 21449, 34472, 42783, 56380, 15707, 25707, 61160, 29297,
42237, 72966, 71738, 56392, 69335, 17479, 30914, 29516, 36953,
51000, 39315, 32876, 18305, 27092, 16590, 46419, 32564, 72843,
46094, 50032, 22524, 16894, 54936, 18294, 44021, 42157, 61278,
55678, 58345, 32391, 17736, 56275, 21903, 47867, 22928, 52829,
67523, 55847, 25127, 40745, 23557, 54718, 18046, 35915, 65050,
49568, 61822, 60210, 17202, 30865, 71921, 23434, 66579, 55818,
56517, 33692, 55612, 32730, 22067, 65470, 50373, 58544, 38244,
21356, 70010, 49500, 56970, 50040, 48606, 65609, 37288, 19547,
32552, 71469, 69222, 36178, 44561, 40260, 44320, 28935, 57835,
24374, 65163, 43465, 22097, 59672, 42933, 47359, 18186, 17173,
66674, 55787, 29976, 40561, 36443, 68754, 48264, 31182, 70643,
15752, 29759, 37604, 21019, 49529, 61506, 25802, 29858, 23015,
27686, 65069, 62086, 33228, 16118, 71662, 70313, 24824, 24145,
65096, 20493, 30484, 68996, 26227, 53008, 53758, 18948, 54496,
64296, 50249, 35804, 44871, 39478, 63729, 19158, 21156, 64732,
16314, 51031, 36171, 16193, 17655, 38842, 61288, 31747, 50995,
63973, 52996, 26386, 57648, 26917, 59280, 30409, 27326, 48687,
20302, 54604, 62031, 62863, 31196, 67807, 30862, 66646, 20763,
65260, 66917, 67245, 26877, 24180, 70477, 46640, 36947, 16129,
55475, 32569, 53886, 19898, 62866, 42115, 18904, 28941, 48321,
28726, 19294, 17524, 30191, 29962, 64426, 60301, 71109, 70145,
60671, 62912, 57491, 48347, 28355, 29315, 39817, 71448, 62550,
59895, 17500, 21399, 52074, 32021, 54743, 67416, 27439, 18368,
21339, 18891, 26910, 66961, 15866, 71688, 24802, 15530, 23647,
44735, 72447, 64943, 67634, 67242, 61201, 36495, 42778, 66391,
25980, 61012, 51792, 45485, 52052, 27935, 66677, 29556, 67718,
63235, 66715, 39916, 54433, 63466, 61667, 21403, 53130, 45514,
55541, 54951, 66835, 37705, 34943, 18583, 26945, 31816, 30104,
52488, 46073, 39184, 26461, 64275, 60612, 27026, 37623, 22297,
33671, 53580, 38553, 29536, 56143, 47368, 16612, 54661, 49403,
70564, 30202, 56649, 26010, 65496, 63384, 17810, 64697, 48685,
56686, 35658, 15539, 49614, 42165, 17433, 51415, 35637, 50719,
47660, 15843, 45879, 41314, 39516, 61481, 68731, 29011, 51430,
20347, 41176, 50809, 55824, 37399, 40692, 18155, 69199, 38232,
32516, 57175, 38183, 21583, 66353, 18430, 16846, 61518, 70780,
71784, 38712, 61313, 55800, 61001, 52706, 18203, 17225, 66550,
34556, 25500, 65731, 15544, 69825, 68116, 34481, 60377, 29735,
47846, 51439, 53054, 45308, 66654, 65698, 18421, 59846, 15493,
53974, 41658, 30768, 23367, 15484, 28173, 18845, 15455, 42450,
18834, 59814, 55643, 38475, 45623, 23382, 50896, 66593, 72178,
29783, 39787, 46350, 42547, 65359, 62119, 53808, 45300, 48233,
34077, 60663, 46497, 48174, 19764, 56893, 52080, 41104, 21126,
56865, 39795])
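np.random.randint returns a plain NumPy array, and its upper bound is exclusive, so 73,000 itself can never appear above. The following is only a reproducible sketch using NumPy's newer Generator API; rng and owners_seeded are illustrative names and the seed value is arbitrary:
rng = np.random.default_rng(42)                             # seeded generator for repeatable results
owners_seeded = rng.integers(15000, 73001, size=len(cars))  # high=73001 so 73,000 is reachable
owners_seeded[:10]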
Step 8. Add the column owners to cars
The code is as follows:
# add an owners column
cars['owners'] = my_owners
cars.tail()
The output is as follows:
| | mpg | cylinders | displacement | horsepower | weight | acceleration | model | origin | car | owners |
---|---|---|---|---|---|---|---|---|---|---|
195 | 27.0 | 4 | 140 | 86 | 2790 | 15.6 | 82 | 1 | ford mustang gl | 52080 |
196 | 44.0 | 4 | 97 | 52 | 2130 | 24.6 | 82 | 2 | vw pickup | 41104 |
197 | 32.0 | 4 | 135 | 84 | 2295 | 11.6 | 82 | 1 | dodge rampage | 21126 |
198 | 28.0 | 4 | 120 | 79 | 2625 | 18.6 | 82 | 1 | ford ranger | 56865 |
199 | 31.0 | 4 | 119 | 82 | 2720 | 19.4 | 82 | 1 | chevy s-10 | 39795 |
Exercise 2 - Fictitious Names
Introduction:
This time you will again create the data yourself.
Special thanks to Chris Albon for sharing the dataset and materials.
All the credit for this exercise belongs to him.
To learn more about it, go here.
Step 1. Import the necessary libraries
The code is as follows:
import pandas as pd
Step 2. Create the 3 DataFrames based on the following raw data
The code is as follows:
raw_data_1 = {
'subject_id': ['1', '2', '3', '4', '5'],
'first_name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
'last_name': ['Anderson', 'Ackerman', 'Ali', 'Aoni', 'Atiches']}
raw_data_2 = {
'subject_id': ['4', '5', '6', '7', '8'],
'first_name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
'last_name': ['Bonder', 'Black', 'Balwner', 'Brice', 'Btisan']}
raw_data_3 = {
'subject_id': ['1', '2', '3', '4', '5', '7', '8', '9', '10', '11'],
'test_id': [51, 15, 15, 61, 16, 14, 15, 1, 61, 16]}
Step 3. Assign each to a variable called data1, data2, data3
The code is as follows:
data1 = pd.DataFrame(raw_data_1, columns = ['subject_id', 'first_name', 'last_name'])
data2 = pd.DataFrame(raw_data_2, columns = ['subject_id', 'first_name', 'last_name'])
data3 = pd.DataFrame(raw_data_3, columns = ['subject_id', 'test_id'])
print(data1)
print(data2)
print(data3)
The output is as follows:
subject_id first_name last_name
0 1 Alex Anderson
1 2 Amy Ackerman
2 3 Allen Ali
3 4 Alice Aoni
4 5 Ayoung Atiches
subject_id first_name last_name
0 4 Billy Bonder
1 5 Brian Black
2 6 Bran Balwner
3 7 Bryce Brice
4 8 Betty Btisan
subject_id test_id
0 1 51
1 2 15
2 3 15
3 4 61
4 5 16
5 7 14
6 8 15
7 9 1
8 10 61
9 11 16
Step 4. Join the two dataframes along rows and assign to all_data
The code is as follows:
# all_data = data1.append(data2)
# all_data = pd.merge(data1, data2, how='outer')
# both of the approaches above also work here (append is deprecated in recent pandas versions)
all_data = pd.concat([data1, data2])
all_data
The output is as follows:
| | subject_id | first_name | last_name |
---|---|---|---|
0 | 1 | Alex | Anderson |
1 | 2 | Amy | Ackerman |
2 | 3 | Allen | Ali |
3 | 4 | Alice | Aoni |
4 | 5 | Ayoung | Atiches |
0 | 4 | Billy | Bonder |
1 | 5 | Brian | Black |
2 | 6 | Bran | Balwner |
3 | 7 | Bryce | Brice |
4 | 8 | Betty | Btisan |
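pd.concat keeps each frame's own index, which is why 0-4 appears twice above. If that is not wanted, ignore_index=True renumbers the rows, and keys= records which frame each row came from. A hedged sketch; all_data_flat and all_data_keyed are illustrative names, not part of the exercise:
all_data_flat = pd.concat([data1, data2], ignore_index=True)         # index 0..9
all_data_keyed = pd.concat([data1, data2], keys=['data1', 'data2'])  # adds an outer index level per source
print(all_data_flat.index.tolist())
print(all_data_keyed.loc['data2'].head())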
Step 5. Join the two dataframes along columns and assign to all_data_col
The code is as follows:
# concatenate along columns and assign to all_data_col
all_data_col = pd.concat([data1, data2], axis=1)
all_data_col
The output is as follows:
| | subject_id | first_name | last_name | subject_id | first_name | last_name |
---|---|---|---|---|---|---|
0 | 1 | Alex | Anderson | 4 | Billy | Bonder |
1 | 2 | Amy | Ackerman | 5 | Brian | Black |
2 | 3 | Allen | Ali | 6 | Bran | Balwner |
3 | 4 | Alice | Aoni | 7 | Bryce | Brice |
4 | 5 | Ayoung | Atiches | 8 | Betty | Btisan |
Step 6. Print data3
The code is as follows:
data3
The output is as follows:
| | subject_id | test_id |
---|---|---|
0 | 1 | 51 |
1 | 2 | 15 |
2 | 3 | 15 |
3 | 4 | 61 |
4 | 5 | 16 |
5 | 7 | 14 |
6 | 8 | 15 |
7 | 9 | 1 |
8 | 10 | 61 |
9 | 11 | 16 |
Step 7. Merge all_data and data3 along the subject_id value
The code is as follows:
all_data_3 = pd.merge(all_data, data3, on='subject_id') # the default is how='inner'
all_data_3
The output is as follows:
| | subject_id | first_name | last_name | test_id |
---|---|---|---|---|
0 | 1 | Alex | Anderson | 51 |
1 | 2 | Amy | Ackerman | 15 |
2 | 3 | Allen | Ali | 15 |
3 | 4 | Alice | Aoni | 61 |
4 | 4 | Billy | Bonder | 61 |
5 | 5 | Ayoung | Atiches | 16 |
6 | 5 | Brian | Black | 16 |
7 | 7 | Bryce | Brice | 14 |
8 | 8 | Betty | Btisan | 15 |
Step 8. Merge only the data that has the same ‘subject_id’ on both data1 and data2
The code is as follows:
data = pd.merge(data1, data2, on='subject_id', how='inner')
data
The output is as follows:
| | subject_id | first_name_x | last_name_x | first_name_y | last_name_y |
---|---|---|---|---|---|
0 | 4 | Alice | Aoni | Billy | Bonder |
1 | 5 | Ayoung | Atiches | Brian | Black |
Step 9. Merge all values in data1 and data2, with matching records from both sides where available.
The code is as follows:
# merge all values in data1 and data2, keeping matching records from both sides where available
pd.merge(data1, data2, on='subject_id', how='outer') # suffixes=['_A', '_B'] can be set so the overlapping column names from the two sides stay distinct
The output is as follows:
| | subject_id | first_name_x | last_name_x | first_name_y | last_name_y |
---|---|---|---|---|---|
0 | 1 | Alex | Anderson | NaN | NaN |
1 | 2 | Amy | Ackerman | NaN | NaN |
2 | 3 | Allen | Ali | NaN | NaN |
3 | 4 | Alice | Aoni | Billy | Bonder |
4 | 5 | Ayoung | Atiches | Brian | Black |
5 | 6 | NaN | NaN | Bran | Balwner |
6 | 7 | NaN | NaN | Bryce | Brice |
7 | 8 | NaN | NaN | Betty | Btisan |
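To illustrate the suffixes option mentioned in the comment above: it replaces the default _x/_y endings on the overlapping column names. A small sketch; merged is an illustrative name only:
merged = pd.merge(data1, data2, on='subject_id', how='outer',
                  suffixes=['_data1', '_data2'])
print(merged.columns.tolist())
# expected: ['subject_id', 'first_name_data1', 'last_name_data1', 'first_name_data2', 'last_name_data2']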
Exercise 3 - Housing Market
Introduction:
This time we will create our own dataset with fictional numbers to describe a housing market. Since we are going to create random data, don't try to make sense of the numbers.
Step 1. Import the necessary libraries
The code is as follows:
import pandas as pd
import numpy as np
Step 2. Create 3 different Series, each of length 100, as follows:
- The first a random number from 1 to 4
- The second a random number from 1 to 3
- The third a random number from 10,000 to 30,000
The code is as follows:
s1 = pd.Series(np.random.randint(1, 4, 100))
s2 = pd.Series(np.random.randint(1, 3, 100))
s3 = pd.Series(np.random.randint(10000, 30000, 100))
print(s1, s2, s3)
The output is as follows:
0 2
1 3
2 1
3 3
4 1
5 1
6 2
7 1
8 1
9 1
10 1
11 3
12 1
13 2
14 3
15 2
16 1
17 1
18 3
19 3
20 1
21 3
22 3
23 1
24 1
25 2
26 1
27 1
28 2
29 1
..
70 1
71 1
72 3
73 2
74 2
75 1
76 2
77 1
78 3
79 2
80 3
81 3
82 3
83 2
84 1
85 3
86 2
87 1
88 3
89 3
90 1
91 3
92 2
93 3
94 1
95 2
96 3
97 2
98 3
99 1
Length: 100, dtype: int32 0 1
1 2
2 1
3 2
4 1
5 2
6 1
7 1
8 1
9 2
10 2
11 1
12 1
13 2
14 2
15 2
16 1
17 2
18 1
19 1
20 2
21 2
22 1
23 1
24 1
25 1
26 1
27 2
28 1
29 1
..
70 2
71 2
72 1
73 1
74 1
75 1
76 2
77 2
78 2
79 2
80 1
81 2
82 1
83 2
84 1
85 1
86 2
87 2
88 1
89 2
90 1
91 2
92 1
93 1
94 1
95 2
96 1
97 1
98 2
99 2
Length: 100, dtype: int32 0 11973
1 10804
2 26866
3 25940
4 23147
5 14552
6 22151
7 19312
8 25373
9 29329
10 17069
11 19629
12 26174
13 20524
14 16489
15 22613
16 25266
17 11566
18 28599
19 27562
20 12922
21 29055
22 12709
23 21727
24 16735
25 20818
26 20199
27 21400
28 21602
29 16792
...
70 10076
71 20091
72 28284
73 12185
74 15879
75 12907
76 24946
77 20168
78 24435
79 12175
80 18286
81 18001
82 10938
83 19116
84 12802
85 11623
86 15048
87 10624
88 18989
89 19797
90 17798
91 21317
92 27047
93 25692
94 27564
95 23411
96 18808
97 16854
98 21737
99 18968
Length: 100, dtype: int32
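Because np.random.randint excludes its upper bound, the output above only ever contains 1-3 for s1 and 1-2 for s2. If the instruction is read as including the upper value, the bound has to be raised by one. A sketch with illustrative names (s1_inclusive and so on), not part of the original answer:
s1_inclusive = pd.Series(np.random.randint(1, 5, 100))          # values 1 through 4
s2_inclusive = pd.Series(np.random.randint(1, 4, 100))          # values 1 through 3
s3_inclusive = pd.Series(np.random.randint(10000, 30001, 100))  # 30,000 itself reachable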
Step 3. Let's create a DataFrame by joining the Series by column
The code is as follows:
housemkt = pd.concat([s1, s2, s3], axis=1)
housemkt.head()
The output is as follows:
| | 0 | 1 | 2 |
---|---|---|---|
0 | 2 | 1 | 11973 |
1 | 3 | 2 | 10804 |
2 | 1 | 1 | 26866 |
3 | 3 | 2 | 25940 |
4 | 1 | 1 | 23147 |
Step 4. Change the name of the columns to bedrs, bathrs, price_sqr_meter
The code is as follows:
'''
The main parameters of rename are:
columns: mapping for the column names
index: mapping for the row (index) labels
axis: which axis the mapping applies to
inplace: whether to modify in place, default False. With inplace=False the modified result is returned and the object itself is left unchanged; with inplace=True None is returned and the object itself is modified.
'''
housemkt.rename(columns={0: 'bedrs', 1: 'bathrs', 2: 'price_sqr_meter'}, inplace = True)
housemkt.head()
The output is as follows:
| | bedrs | bathrs | price_sqr_meter |
---|---|---|---|
0 | 2 | 1 | 11973 |
1 | 3 | 2 | 10804 |
2 | 1 | 1 | 26866 |
3 | 3 | 2 | 25940 |
4 | 1 | 1 | 23147 |
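Renaming afterwards is what the exercise asks for, but the column names could also be set while the DataFrame is built, by passing keys= to pd.concat. A sketch with an illustrative variable name (housemkt_named):
housemkt_named = pd.concat([s1, s2, s3], axis=1,
                           keys=['bedrs', 'bathrs', 'price_sqr_meter'])  # keys become the column names
housemkt_named.head()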
Step 5. Create a one column DataFrame with the values of the 3 Series and assign it to ‘bigcolumn’
The code is as follows:
bigcolumn = pd.concat([s1, s2, s3], axis=0)
bigcolumn = bigcolumn.to_frame() # converts the concatenated Series into a DataFrame
print(type(bigcolumn))
bigcolumn
The output is as follows:
<class 'pandas.core.frame.DataFrame'>
| | 0 |
---|---|
0 | 2 |
1 | 3 |
2 | 1 |
3 | 3 |
4 | 1 |
5 | 1 |
6 | 2 |
7 | 1 |
8 | 1 |
9 | 1 |
10 | 1 |
11 | 3 |
12 | 1 |
13 | 2 |
14 | 3 |
15 | 2 |
16 | 1 |
17 | 1 |
18 | 3 |
19 | 3 |
20 | 1 |
21 | 3 |
22 | 3 |
23 | 1 |
24 | 1 |
25 | 2 |
26 | 1 |
27 | 1 |
28 | 2 |
29 | 1 |
... | ... |
70 | 10076 |
71 | 20091 |
72 | 28284 |
73 | 12185 |
74 | 15879 |
75 | 12907 |
76 | 24946 |
77 | 20168 |
78 | 24435 |
79 | 12175 |
80 | 18286 |
81 | 18001 |
82 | 10938 |
83 | 19116 |
84 | 12802 |
85 | 11623 |
86 | 15048 |
87 | 10624 |
88 | 18989 |
89 | 19797 |
90 | 17798 |
91 | 21317 |
92 | 27047 |
93 | 25692 |
94 | 27564 |
95 | 23411 |
96 | 18808 |
97 | 16854 |
98 | 21737 |
99 | 18968 |
300 rows × 1 columns
Step 6. Oops, it seems it only goes up to index 99. Is it true?
The code is as follows:
len(bigcolumn)
The output is as follows:
300
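len only reports the number of rows; the question is really about the index, which can be inspected directly. A small sketch, assuming bigcolumn is the frame built above:
print(bigcolumn.index.max())               # 99 -- the labels stop at 99
print(bigcolumn.index.duplicated().sum())  # 200 -- each label 0-99 appears three times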
Step 7. Reindex the DataFrame so it goes from 0 to 299
The code is as follows:
# reset_index(): after resetting, the drop parameter defaults to False; set drop=True to discard the old index column. Use inplace=True to modify the data in place; this is essential when the result is not assigned back, otherwise the reset has no effect.
'''
set_index():
Signature: DataFrame.set_index(keys, drop=True, append=False, inplace=False, verify_integrity=False)
Parameters:
keys: column label, or list of column labels/arrays, to set as the new index
drop: default True, delete the columns that are used as the new index
append: default False, whether to append the columns to the existing index
inplace: default False, modify the DataFrame in place (do not create a new object)
verify_integrity: default False, check the new index for duplicates; otherwise defer the check until necessary. Leaving it False improves the performance of this method.
reset_index():
Signature: DataFrame.reset_index(level=None, drop=False, inplace=False, col_level=0, col_fill='')
Parameters:
level: int, str, tuple or list, default None; remove only the given levels from the index, all levels by default. Controls which level of the index is restored.
drop: with drop=False the index column is restored as an ordinary column, otherwise it is lost
inplace: default False, modify the DataFrame in place (do not create a new object)
col_level: int or str, default 0; if the columns have multiple levels, determines which level the labels are inserted into. By default they are inserted into the first level.
col_fill: object, default ''; if the columns have multiple levels, determines how the other levels are named. If None, the index name is repeated.
Note: reset_index is used in two situations: resetting the index of an original DataFrame, and resetting a DataFrame on which set_index() has been used.
'''
bigcolumn.reset_index(drop=True, inplace=True)
bigcolumn
The output is as follows:
| | 0 |
---|---|
0 | 2 |
1 | 3 |
2 | 1 |
3 | 3 |
4 | 1 |
5 | 1 |
6 | 2 |
7 | 1 |
8 | 1 |
9 | 1 |
10 | 1 |
11 | 3 |
12 | 1 |
13 | 2 |
14 | 3 |
15 | 2 |
16 | 1 |
17 | 1 |
18 | 3 |
19 | 3 |
20 | 1 |
21 | 3 |
22 | 3 |
23 | 1 |
24 | 1 |
25 | 2 |
26 | 1 |
27 | 1 |
28 | 2 |
29 | 1 |
... | ... |
270 | 10076 |
271 | 20091 |
272 | 28284 |
273 | 12185 |
274 | 15879 |
275 | 12907 |
276 | 24946 |
277 | 20168 |
278 | 24435 |
279 | 12175 |
280 | 18286 |
281 | 18001 |
282 | 10938 |
283 | 19116 |
284 | 12802 |
285 | 11623 |
286 | 15048 |
287 | 10624 |
288 | 18989 |
289 | 19797 |
290 | 17798 |
291 | 21317 |
292 | 27047 |
293 | 25692 |
294 | 27564 |
295 | 23411 |
296 | 18808 |
297 | 16854 |
298 | 21737 |
299 | 18968 |
300 rows × 1 columns
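To make the set_index/reset_index notes above concrete, here is a tiny round-trip sketch on made-up data (df, indexed, back and flat are illustrative names only):
df = pd.DataFrame({'key': ['a', 'b', 'c'], 'val': [1, 2, 3]})
indexed = df.set_index('key')          # 'key' moves from the columns into the index
back = indexed.reset_index()           # drop=False (default): 'key' returns as an ordinary column
flat = indexed.reset_index(drop=True)  # drop=True: the 'key' index is discarded entirely
print(back.columns.tolist())           # ['key', 'val']
print(flat.columns.tolist())           # ['val']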
Conclusion
Today we mainly practiced merging and concatenation operations along with some other related functions. As a reminder, this series runs pandas inside Anaconda's Jupyter Notebook, which is highly recommended; it is extremely convenient!