Python Data Analysis: pandas Beginner Exercises
Posted Geek_bao
Python Data Analysis Basics
- Preparation
- Exercise 1 - Filtering and Sorting Data
- Step 1. Import the necessary libraries
- Step 2. Import the dataset from this [address](https://raw.githubusercontent.com/justmarkham/DAT8/master/data/chipotle.tsv).
- Step 3. Assign it to a variable called chipo.
- Step 4. How many products cost more than $10.00?
- Step 5. What is the price of each item?
- Step 6. Sort by the name of the item
- Step 7. What was the quantity of the most expensive item ordered?
- Step 8. How many times was a Veggie Salad Bowl ordered?
- Step 9. How many times did people order more than one Canned Soda?
- Exercise 2 - Filtering and Sorting Data
- Step 1. Import the necessary libraries
- Step 2. Import the dataset from this [address](https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/02_Filtering_%26_Sorting/Euro12/Euro_2012_stats_TEAM.csv).
- Step 3. Assign it to a variable called euro12.
- Step 4. Select only the Goals column.
- Step 5. How many teams participated in the Euro2012?
- Step 6. What is the number of columns in the dataset?
- Step 7. View only the columns Team, Yellow Cards and Red Cards and assign them to a dataframe called discipline
- Step 8. Sort the teams by Red Cards, then by Yellow Cards
- Step 9. Calculate the mean Yellow Cards given per Team
- Step 10. Filter teams that scored more than 6 goals
- Step 11. Select the teams that start with G
- Step 12. Select the first 7 columns
- Step 13. Select all columns except the last 3.
- Step 14. Present only the Shooting Accuracy from England, Italy and Russia
- Fictional Army - Filtering and Sorting
- Introduction:
- Step 1. Import the necessary libraries
- Step 2. This is the data given as a dictionary
- Step 3. Create a dataframe and assign it to a variable called army.
- Step 4. Set the 'origin' column as the index of the dataframe
- Step 5. Print only the column veterans
- Step 6. Print the columns 'veterans' and 'deaths'
- Step 7. Print the name of all the columns.
- Step 8. Select the 'deaths', 'size' and 'deserters' columns from Maine and Alaska
- Step 9. Select the rows 3 to 7 and the columns 3 to 6
- Step 10. Select every row after the fourth row
- Step 11. Select every row up to the 4th row
- Step 12. Select the 3rd column up to the 7th column
- Step 13. Select rows where df.deaths is greater than 50
- Step 14. Select rows where df.deaths is greater than 500 or less than 50
- Step 15. Select all the regiments not named "Dragoons"
- Step 16. Select the rows called Texas and Arizona
- Step 17. Select the third cell in the row named Arizona
- Step 18. Select the third cell down in the column named deaths
- Summary
- Closing remarks
Preparation
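The dataset URLs given in the exercises below may not be reachable. The datasets at the following address have been tested and work; if they become unavailable, search for them online: https://github.com/daacheng/PythonBasic/tree/master/dataset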
Exercise 1 - Filtering and Sorting Data
Step 1. Import the necessary libraries
Code:
import pandas as pd
Step 2. Import the dataset from this address.
The dataset at this address may not be accessible; a proxy may be required.
Step 3. Assign it to a variable called chipo.
Code:
chipo = pd.read_csv('chipotle.csv', sep=',')
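The linked source file is tab-separated (`.tsv`), while the line above reads a local comma-separated copy. A minimal sketch of reading tab-separated text, using a tiny hand-written sample rather than the real file (the sample text and the names `sample_tsv` and `sample` are assumptions for illustration only):

```python
import io
import pandas as pd

# Hand-written two-row sample in tab-separated form (not the real dataset).
sample_tsv = (
    "order_id\tquantity\titem_name\titem_price\n"
    "1\t1\tIzze\t$3.39 \n"
    "2\t2\tChicken Bowl\t$16.98 \n"
)

# sep='\t' tells read_csv that fields are separated by tabs instead of commas.
sample = pd.read_csv(io.StringIO(sample_tsv), sep='\t')
print(sample.shape)
```

For a real file on disk, the same `sep` argument applies to `pd.read_csv` with a path in place of the `StringIO` object.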
Step 4. How many products cost more than $10.00?
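Code:
# The task: count the products whose unit price is above $10
# Clean the item_price column and convert it to float
prices = [float(value[1:-1]) for value in chipo.item_price]
# Reassign the column with the cleaned prices
chipo.item_price = prices
# Drop rows that duplicate an (item_name, quantity) pair
'''
drop_duplicates(self, subset=None, keep="first", inplace=False)
subset: column label or sequence of labels used to identify duplicate rows. By default, all columns are used.
keep: one of {'first', 'last', False}, default 'first'. With 'first', all duplicates except the first occurrence are dropped.
With 'last', all duplicates except the last occurrence are dropped. With False, all duplicated rows are dropped.
inplace: if True, modify the source DataFrame and return None. By default the source DataFrame is left unchanged and a new DataFrame is returned.
'''
chipo_filtered = chipo.drop_duplicates(['item_name', 'quantity'])
# Keep only the rows where quantity equals 1
chipo_one_prod = chipo_filtered[chipo_filtered.quantity == 1]
# nunique() returns the number of distinct values in item_name
chipo_one_prod[chipo_one_prod['item_price']>10].item_name.nunique()
Output: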
12
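The slice `value[1:-1]` above relies on every price string having exactly one leading and one trailing character to drop. As a hedged alternative sketch, the same cleaning can be written with pandas string methods; the small frame `demo` below is hand-built for illustration and is not the real dataset:

```python
import pandas as pd

# Hand-built rows for illustration; the real data has many more.
demo = pd.DataFrame({
    'item_name': ['Izze', 'Chicken Bowl', 'Chicken Bowl', 'Steak Burrito'],
    'quantity': [1, 1, 1, 1],
    'item_price': ['$3.39 ', '$10.98 ', '$10.98 ', '$11.75 '],
})

# Remove the dollar sign and surrounding whitespace, then convert to float.
demo['item_price'] = (
    demo['item_price'].str.replace('$', '', regex=False).str.strip().astype(float)
)

# Keep one row per (item_name, quantity) pair, then count distinct items over $10.
unique_items = demo.drop_duplicates(['item_name', 'quantity'])
n_expensive = unique_items[unique_items['item_price'] > 10]['item_name'].nunique()
print(n_expensive)
```

Whether this variant fits depends on how the prices are stored in the local CSV copy, which the article does not show.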
Step 5. What is the price of each item?
print a data frame with only two columns item_name and item_price
Code:
# Print the unit price of each item, showing only item_name and item_price
# delete the duplicates in item_name and quantity
chipo_filtered = chipo.drop_duplicates(['item_name','quantity'])
# chipo[(chipo['item_name'] == 'Chicken Bowl') & (chipo['quantity'] == 1)]
# select only the products with quantity equals to 1
chipo_one_prod = chipo_filtered[chipo_filtered.quantity == 1]
# select only the item_name and item_price columns
price_per_item = chipo_one_prod[['item_name', 'item_price']]
print(price_per_item)
# sort the values from most to least expensive
# price_per_item.sort_values(by = "item_price", ascending = False).head(20)
Output:
item_name item_price
0 Chips and Fresh Tomato Salsa 2.39
1 Izze 3.39
2 Nantucket Nectar 3.39
3 Chips and Tomatillo-Green Chili Salsa 2.39
5 Chicken Bowl 10.98
6 Side of Chips 1.69
7 Steak Burrito 11.75
8 Steak Soft Tacos 9.25
10 Chips and Guacamole 4.45
11 Chicken Crispy Tacos 8.75
12 Chicken Soft Tacos 8.75
16 Chicken Burrito 8.49
21 Barbacoa Burrito 8.99
27 Carnitas Burrito 8.99
28 Canned Soda 1.09
33 Carnitas Bowl 8.99
34 Bottled Water 1.09
38 Chips and Tomatillo Green Chili Salsa 2.95
39 Barbacoa Bowl 11.75
40 Chips 2.15
44 Chicken Salad Bowl 8.75
54 Steak Bowl 8.99
56 Barbacoa Soft Tacos 9.25
57 Veggie Burrito 11.25
62 Veggie Bowl 11.25
92 Steak Crispy Tacos 9.25
111 Chips and Tomatillo Red Chili Salsa 2.95
168 Barbacoa Crispy Tacos 11.75
186 Veggie Salad Bowl 11.25
191 Chips and Roasted Chili-Corn Salsa 2.39
233 Chips and Roasted Chili Corn Salsa 2.95
237 Carnitas Soft Tacos 9.25
250 Chicken Salad 10.98
263 Canned Soft Drink 1.25
298 6 Pack Soft Drink 6.49
300 Chips and Tomatillo-Red Chili Salsa 2.39
510 Burrito 7.40
520 Crispy Tacos 7.40
554 Carnitas Crispy Tacos 9.25
606 Steak Salad Bowl 11.89
664 Steak Salad 8.99
673 Bowl 7.40
674 Chips and Mild Fresh Tomato Salsa 3.00
738 Veggie Soft Tacos 11.25
1132 Carnitas Salad Bowl 11.89
1229 Barbacoa Salad Bowl 11.89
1414 Salad 7.40
1653 Veggie Crispy Tacos 8.49
1694 Veggie Salad 8.49
3750 Carnitas Salad 8.99
Step 6. Sort by the name of the item
Code:
chipo.sort_values(by='item_name')
# chipo.item_name.sort_values()
Output:
(index) | Unnamed: 0 | order_id | quantity | item_name | choice_description | item_price |
---|---|---|---|---|---|---|
3389 | 3389 | 1360 | 2 | 6 Pack Soft Drink | [Diet Coke] | 12.98 |
341 | 341 | 148 | 1 | 6 Pack Soft Drink | [Diet Coke] | 6.49 |
1849 | 1849 | 749 | 1 | 6 Pack Soft Drink | [Coke] | 6.49 |
1860 | 1860 | 754 | 1 | 6 Pack Soft Drink | [Diet Coke] | 6.49 |
2713 | 2713 | 1076 | 1 | 6 Pack Soft Drink | [Coke] | 6.49 |
3422 | 3422 | 1373 | 1 | 6 Pack Soft Drink | [Coke] | 6.49 |
553 | 553 | 230 | 1 | 6 Pack Soft Drink | [Diet Coke] | 6.49 |
1916 | 1916 | 774 | 1 | 6 Pack Soft Drink | [Diet Coke] | 6.49 |
1922 | 1922 | 776 | 1 | 6 Pack Soft Drink | [Coke] | 6.49 |
1937 | 1937 | 784 | 1 | 6 Pack Soft Drink | [Diet Coke] | 6.49 |
3836 | 3836 | 1537 | 1 | 6 Pack Soft Drink | [Coke] | 6.49 |
298 | 298 | 129 | 1 | 6 Pack Soft Drink | [Sprite] | 6.49 |
1976 | 1976 | 798 | 1 | 6 Pack Soft Drink | [Diet Coke] | 6.49 |
1167 | 1167 | 481 | 1 | 6 Pack Soft Drink | [Coke] | 6.49 |
3875 | 3875 | 1554 | 1 | 6 Pack Soft Drink | [Diet Coke] | 6.49 |
1124 | 1124 | 465 | 1 | 6 Pack Soft Drink | [Coke] | 6.49 |
3886 | 3886 | 1558 | 1 | 6 Pack Soft Drink | [Diet Coke] | 6.49 |
2108 | 2108 | 849 | 1 | 6 Pack Soft Drink | [Coke] | 6.49 |
3010 | 3010 | 1196 | 1 | 6 Pack Soft Drink | [Diet Coke] | 6.49 |
4535 | 4535 | 1803 | 1 | 6 Pack Soft Drink | [Lemonade] | 6.49 |
4169 | 4169 | 1664 | 1 | 6 Pack Soft Drink | [Diet Coke] | 6.49 |
4174 | 4174 | 1666 | 1 | 6 Pack Soft Drink | [Coke] | 6.49 |
4527 | 4527 | 1800 | 1 | 6 Pack Soft Drink | [Diet Coke] | 6.49 |
4522 | 4522 | 1798 | 1 | 6 Pack Soft Drink | [Diet Coke] | 6.49 |
3806 | 3806 | 1525 | 1 | 6 Pack Soft Drink | [Sprite] | 6.49 |
2389 | 2389 | 949 | 1 | 6 Pack Soft Drink | [Coke] | 6.49 |
3132 | 3132 | 1248 | 1 | 6 Pack Soft Drink | [Diet Coke] | 6.49 |
3141 | 3141 | 1253 | 1 | 6 Pack Soft Drink | [Lemonade] | 6.49 |
639 | 639 | 264 | 1 | 6 Pack Soft Drink | [Diet Coke] | 6.49 |
1026 | 1026 | 422 | 1 | 6 Pack Soft Drink | [Sprite] | 6.49 |
... | ... | ... | ... | ... | ... | ... |
2996 | 2996 | 1192 | 1 | Veggie Salad | [Roasted Chili Corn Salsa (Medium), [Black Bea... | 8.49 |
3163 | 3163 | 1263 | 1 | Veggie Salad | [[Fresh Tomato Salsa (Mild), Roasted Chili Cor... | 8.49 |
4084 | 4084 | 1635 | 1 | Veggie Salad | [[Fresh Tomato Salsa (Mild), Roasted Chili Cor... | 8.49 |
1694 | 1694 | 686 | 1 | Veggie Salad | [[Fresh Tomato Salsa (Mild), Roasted Chili Cor... | 8.49 |
2756 | 2756 | 1094 | 1 | Veggie Salad | [[Tomatillo-Green Chili Salsa (Medium), Roaste... | 8.49 |
4201 | 4201 | 1677 | 1 | Veggie Salad Bowl | [Fresh Tomato Salsa, [Fajita Vegetables, Black... | 11.25 |
1884 | 1884 | 760 | 1 | Veggie Salad Bowl | [Fresh Tomato Salsa, [Fajita Vegetables, Rice,... | 11.25 |
455 | 455 | 195 | 1 | Veggie Salad Bowl | [Fresh Tomato Salsa, [Fajita Vegetables, Rice,... | 11.25 |
3223 | 3223 | 1289 | 1 | Veggie Salad Bowl | [Tomatillo Red Chili Salsa, [Fajita Vegetables... | 11.25 |
2223 | 2223 | 896 | 1 | Veggie Salad Bowl | [Roasted Chili Corn Salsa, Fajita Vegetables] | 8.75 |
2269 | 2269 | 913 | 1 | Veggie Salad Bowl | [Fresh Tomato Salsa, [Fajita Vegetables, Rice,... | 8.75 |
4541 | 4541 | 1805 | 1 | Veggie Salad Bowl | [Tomatillo Green Chili Salsa, [Fajita Vegetabl... | 8.75 |
3293 | 3293 | 1321 | 1 | Veggie Salad Bowl | [Fresh Tomato Salsa, [Rice, Black Beans, Chees... | 8.75 |
186 | 186 | 83 | 1 | Veggie Salad Bowl | [Fresh Tomato Salsa, [Fajita Vegetables, Rice,... | 11.25 |
960 | 960 | 394 | 1 | Veggie Salad Bowl | [Fresh Tomato Salsa, [Fajita Vegetables, Lettu... | 8.75 |
1316 | 1316 | 536 | 1 | Veggie Salad Bowl | [Fresh Tomato Salsa, [Fajita Vegetables, Rice,... | 8.75 |
2156 | 2156 | 869 | 1 | Veggie Salad Bowl | [Tomatillo Red Chili Salsa, [Fajita Vegetables... | 11.25 |
4261 | 4261 | 1700 | 1 | Veggie Salad Bowl | [Fresh Tomato Salsa, [Fajita Vegetables, Rice,... | 11.25 |
295 | 295 | 128 | 1 | Veggie Salad Bowl | [Fresh Tomato Salsa, [Fajita Vegetables, Lettu... | 11.25 |
4573 | 4573 | 1818 | 1 | Veggie Salad Bowl | [Fresh Tomato Salsa, [Fajita Vegetables, Pinto... | 8.75 |
2683 | 2683 | 1066 | 1 | Veggie Salad Bowl | [Roasted Chili Corn Salsa, [Fajita Vegetables,... | 8.75 |
496 | 496 | 207 | 1 | Veggie Salad Bowl | [Fresh Tomato Salsa, [Rice, Lettuce, Guacamole... | 11.25 |
4109 | 4109 | 1646 | 1 | Veggie Salad Bowl | [Tomatillo Red Chili Salsa, [Fajita Vegetables... | 11.25 |
738 | 738 | 304 | 1 | Veggie Soft Tacos | [Tomatillo Red Chili Salsa, [Fajita Vegetables... | 11.25 |
3889 | 3889 | 1559 | 2 | Veggie Soft Tacos | [Fresh Tomato Salsa (Mild), [Black Beans, Rice... | 16.98 |
2384 | 2384 | 948 | 1 | Veggie Soft Tacos | [Roasted Chili Corn Salsa, [Fajita Vegetables,... | 8.75 |
781 | 781 | 322 | 1 | Veggie Soft Tacos | [Fresh Tomato Salsa, [Black Beans, Cheese, Sou... | 8.75 |
2851 | 2851 | 1132 | 1 | Veggie Soft Tacos | [Roasted Chili Corn Salsa (Medium), [Black Bea... | 8.49 |
1699 | 1699 | 688 | 1 | Veggie Soft Tacos | [Fresh Tomato Salsa, [Fajita Vegetables, Rice,... | 11.25 |
1395 | 1395 | 567 | 1 | Veggie Soft Tacos | [Fresh Tomato Salsa (Mild), [Pinto Beans, Rice... | 8.49 |
4622 rows × 6 columns
Step 7. What was the quantity of the most expensive item ordered?
Code:
chipo.sort_values(by='item_price', ascending=False).head(1)
Output:
(index) | Unnamed: 0 | order_id | quantity | item_name | choice_description | item_price |
---|---|---|---|---|---|---|
3598 | 3598 | 1443 | 15 | Chips and Fresh Tomato Salsa | NaN | 44.25 |
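The step asks only for the quantity, while the code above returns the whole row. A minimal sketch of pulling out just that one value, on a small hand-built frame (`demo`, `top_label`, and `top_quantity` are names chosen here for illustration):

```python
import pandas as pd

# Hand-built sample; the 44.25 / 15 row mirrors the output shown above.
demo = pd.DataFrame({
    'item_name': ['Izze', 'Chips and Fresh Tomato Salsa', 'Steak Burrito'],
    'quantity': [1, 15, 1],
    'item_price': [3.39, 44.25, 11.75],
})

# idxmax() returns the index label of the row with the largest item_price.
top_label = demo['item_price'].idxmax()

# .loc[row_label, column_label] selects a single cell.
top_quantity = demo.loc[top_label, 'quantity']
print(top_quantity)
```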
Step 8. How many times was a Veggie Salad Bowl ordered?
Code:
chipo_salad = chipo[chipo.item_name == 'Veggie Salad Bowl']
len(chipo_salad)
# Alternatively: chipo_salad.shape[0]
Output:
18
Step 9. How many times did people order more than one Canned Soda?
Code:
# chipo[(chipo['item_name'] == 'Chicken Bowl') & (chipo['quantity'] == 1)]
chipo_soda = chipo[(chipo.item_name == 'Canned Soda') & (chipo.quantity>1)]
len(chipo_soda)
# Alternatively: print(chipo_soda.shape[0])
Output:
20
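The filter above combines two boolean conditions with `&`. A small sketch of how the mask works and how it can be counted directly, using hand-built data (the frame `demo` and its values are made up for illustration):

```python
import pandas as pd

# Hand-built sample rows, not taken from the real dataset.
demo = pd.DataFrame({
    'item_name': ['Canned Soda', 'Canned Soda', 'Izze', 'Canned Soda'],
    'quantity': [2, 1, 3, 4],
})

# Each comparison yields a boolean Series; & combines them element-wise.
# The parentheses are required because & binds tighter than == and >.
mask = (demo['item_name'] == 'Canned Soda') & (demo['quantity'] > 1)

# True counts as 1, so summing the mask counts the matching rows.
n_orders = int(mask.sum())
print(n_orders)
```

Summing the mask avoids building the filtered frame when only the count is needed.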
Exercise 2 - Filtering and Sorting Data
This time we are going to pull data directly from the internet.
Step 1. Import the necessary libraries
Code:
import pandas as pd
Step 2. Import the dataset from this address.
Step 3. Assign it to a variable called euro12.
Code:
euro12 = pd.read_csv('Euro_2012_stats_TEAM.csv', sep=',')
euro12
Output:
(index) | Team | Goals | Shots on target | Shots off target | Shooting Accuracy | % Goals-to-shots | Total shots (inc. Blocked) | Hit Woodwork | Penalty goals | Penalties not scored | ... | Saves made | Saves-to-shots ratio | Fouls Won | Fouls Conceded | Offsides | Yellow Cards | Red Cards | Subs on | Subs off | Players Used |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Croatia | 4 | 13 | 12 | 51.9% | 16.0% | 32 | 0 | 0 | 0 | ... | 13 | 81.3% | 41 | 62 | 2 | 9 | 0 | 9 | 9 | 16 |
1 | Czech Republic | 4 | 13 | 18 | 41.9% | 12.9% | 39 | 0 | 0 | 0 | ... | 9 | 60.1% | 53 | 73 | 8 | 7 | 0 | 11 | 11 | 19 |
2 | Denmark | 4 | 10 | 10 | 50.0% | 20.0% | 27 | 1 | 0 | 0 | ... | 10 | 66.7% | 25 | 38 | 8 | 4 | 0 | 7 | 7 | 15 |
3 | England | 5 | 11 | 18 | 50.0% | 17.2% | 40 | 0 | 0 | 0 | ... | 22 | 88.1% | 43 | 45 | 6 | 5 | 0 | 11 | 11 | 16 |
4 | France | 3 | 22 | 24 | 37.9% | 6.5% | 65 | 1 | 0 | 0 | ... | 6 | 54.6% | 36 | 51 | 5 | 6 | 0 | 11 | 11 | 19 |
5 | Germany | 10 | 32 | 32 | 47.8% | 15.6% | 80 | 2 | 1 | 0 | ... | 10 | 62.6% | 63 | 49 | 12 | 4 | 0 | 15 | 15 | 17 |
6 | Greece | 5 | 8 | 18 | 30.7% | 19.2% | 32 | 1 | 1 | 1 | ... | 13 | 65.1% | 67 | 48 | 12 | 9 | 1 | 12 | 12 | 20 |
7 | Italy | 6 | 34 | 45 | 43.0% | 7.5% | 110 | 2 | 0 | 0 | ... | 20 | 74.1% | 101 | 89 | 16 | 16 | 0 | 18 | 18 | 19 |
8 | Netherlands | 2 | 12 | 36 | 25.0% | 4.1% | 60 | 2 | 0 | 0 | ... | 12 | 70.6% | 35 | 30 | 3 | 5 | 0 | 7 | 7 | 15 |
9 | Poland | 2 | 15 | 23 | 39.4% | 5.2% | 48 | 0 | 0 | 0 | ... | 6 | 66.7% | 48 | 56 | 3 | 7 | 1 | 7 | 7 | 17 |
10 | Portugal | 6 | 22 | 42 | 34.3% | 9.3% | 82 | 6 | 0 | 0 | ... | 10 | 71.5% | 73 | 90 | 10 | 12 | 0 | 14 | 14 | 16 |
11 | Republic of Ireland | 1 | 7 | 12 | 36.8% | 5.2% | 28 | 0 | 0 | 0 | ... | 17 | 65.4% | 43 | 51 | 11 | 6 | 1 | 10 | 10 | 17 |
12 | Russia | 5 | 9 | 31 | 22.5% | 12.5% | 59 | 2 | 0 | 0 | ... | 10 | 77.0% | 34 | 43 | 4 | 6 | 0 | 7 | 7 | 16 |
13 | Spain | 12 | 42 | 33 | 55.9% | 16.0% | 100 | 0 | 1 | 0 | ... | 15 | 93.8% | 102 | 83 | 19 | 11 | 0 | 17 | 17 | 18 |
14 | Sweden | 5 | 17 | 19 | 47.2% | 13.8% | 39 | 3 | 0 | 0 | ... | 8 | 61.6% | 35 | 51 | 7 | 7 | 0 | 9 | 9 | 18 |
15 | Ukraine | 2 | 7 | 26 | 21.2% | 6.0% | 38 | 0 | 0 | 0 | ... | 13 | 76.5% | 48 | 31 | 4 | 5 | 0 | 9 | 9 | 18 |
16 rows × 35 columns
Step 4. Select only the Goals column.
Code:
euro12.Goals
# Alternatively: euro12['Goals']
Output:
0 4
1 4
2 4
3 5
4 3
5 10
6 5
7 6
8 2
9 2
10 6
11 1
12 5
13 12
14 5
15 2
Name: Goals, dtype: int64
Step 5. How many teams participated in the Euro2012?
Code:
euro12.shape[0]
# Alternatively: len(euro12.Team)
Output:
16
Step 6. What is the number of columns in the dataset?
Code:
# euro12.columns.shape[0]
euro12.info()
Output:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 16 entries, 0 to 15
Data columns (total 35 columns):
Team 16 non-null object
Goals 16 non-null int64
Shots on target 16 non-null int64
Shots off target 16 non-null int64
Shooting Accuracy 16 non-null object
% Goals-to-shots 16 non-null object
Total shots (inc. Blocked) 16 non-null int64
Hit Woodwork 16 non-null int64
Penalty goals 16 non-null int64
Penalties not scored 16 non-null int64
Headed goals 16 non-null int64
Passes 16 non-null int64
Passes completed 16 non-null int64
Passing Accuracy 16 non-null object
Touches 16 non-null int64
Crosses 16 non-null int64
Dribbles 16 non-null int64
Corners Taken 16 non-null int64
Tackles 16 non-null int64
Clearances 16 non-null int64
Interceptions 16 non-null int64
Clearances off line 15 non-null float64
Clean Sheets 16 non-null int64
Blocks 16 non-null int64
Goals conceded 16 non-null int64
Saves made 16 non-null int64
Saves-to-shots ratio 16 non-null object
Fouls Won 16 non-null int64
Fouls Conceded 16 non-null int64
Offsides 16 non-null int64
Yellow Cards 16 non-null int64
Red Cards 16 non-null int64
Subs on 16 non-null int64
Subs off 16 non-null int64
Players Used 16 non-null int64
dtypes: float64(1), int64(29), object(5)
memory usage: 4.5+ KB
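`info()` prints a full summary, from which the column count has to be read off. A minimal sketch of getting the number directly, on a hand-built three-column frame (`demo` is an illustrative stand-in, not the real data):

```python
import pandas as pd

# Hand-built stand-in with three columns.
demo = pd.DataFrame({
    'Team': ['Croatia', 'Denmark'],
    'Goals': [4, 4],
    'Red Cards': [0, 0],
})

# shape is a (rows, columns) tuple.
n_rows, n_cols = demo.shape
print(n_cols)

# The same count via the length of the column index.
print(len(demo.columns))
```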
Step 7. View only the columns Team, Yellow Cards and Red Cards and assign them to a dataframe called discipline
Code:
discipline = euro12[['Team', 'Yellow Cards', 'Red Cards']]
discipline
Output:
(index) | Team | Yellow Cards | Red Cards |
---|---|---|---|
0 | Croatia | 9 | 0 |
1 | Czech Republic | 7 | 0 |
2 | Denmark | 4 | 0 |
3 | England | 5 | 0 |
4 | France | 6 | 0 |
5 | Germany | 4 | 0 |
6 | Greece | 9 | 1 |
7 | Italy | 16 | 0 |
8 | Netherlands | 5 | 0 |
9 | Poland | 7 | 1 |
10 | Portugal | 12 | 0 |
11 | Republic of Ireland | 6 | 1 |
12 | Russia | 6 | 0 |
13 | Spain | 11 | 0 |
14 | Sweden | 7 | 0 |
15 | Ukraine | 5 | 0 |
Step 8. Sort the teams by Red Cards, then by Yellow Cards
Code:
# Sort the teams by red cards, then by yellow cards (both descending)
discipline.sort_values(['Red Cards', 'Yellow Cards'], ascending=False)
Output:
(index) | Team | Yellow Cards | Red Cards |
---|---|---|---|
6 | Greece | 9 | 1 |
9 | Poland | 7 | 1 |
11 | Republic of Ireland | 6 | 1 |
7 | Italy | 16 | 0 |
10 | Portugal | 12 | 0 |
13 | Spain | 11 | 0 |
0 | Croatia | 9 | 0 |
1 | Czech Republic | 7 | 0 |
14 | Sweden | 7 | 0 |
4 | France | 6 | 0 |
12 | Russia | 6 | 0 |
3 | England | 5 | 0 |
8 | Netherlands | 5 | 0 |
15 | Ukraine | 5 | 0 |
2 | Denmark | 4 | 0 |
5 | Germany | 4 | 0 |
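A single `ascending=False` applies to both sort keys. As a sketch of the per-key form, `ascending` also accepts a list with one entry per column; the four-row frame `demo` below is a hand-picked subset for illustration:

```python
import pandas as pd

# Four teams taken from the table above, as a small illustrative subset.
demo = pd.DataFrame({
    'Team': ['Greece', 'Poland', 'Italy', 'Germany'],
    'Yellow Cards': [9, 7, 16, 4],
    'Red Cards': [1, 1, 0, 0],
})

# Red Cards descending first; ties are broken by Yellow Cards ascending.
ordered = demo.sort_values(['Red Cards', 'Yellow Cards'], ascending=[False, True])
print(ordered['Team'].tolist())
```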
Step 9. Calculate the mean Yellow Cards given per Team
Code:
# Compute the mean number of yellow cards per team
round(discipline['Yellow Cards'].mean())
Output:
7
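`round()` hides the fractional part of the mean. A short sketch that shows both the exact and the rounded value, using the sixteen Yellow Cards values from the discipline table above (the Series name `yellow` is chosen here for illustration):

```python
import pandas as pd

# Yellow Cards values copied from the discipline table, in row order.
yellow = pd.Series(
    [9, 7, 4, 5, 6, 4, 9, 16, 5, 7, 12, 6, 6, 11, 7, 5],
    name='Yellow Cards',
)

exact_mean = yellow.mean()
print(exact_mean)         # unrounded mean
print(round(exact_mean))  # rounded to the nearest integer, as in the step above
```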
Step 10. Filter teams that scored more than 6 goals
Code:
# Filter the teams that scored more than 6 goals
euro12[euro12['Goals']>6]
# euro12[euro12.Goals>6]
Output:
(index) | Team | Goals | Shots on target | Shots off target | Shooting Accuracy | % Goals-to-shots | Total shots (inc. Blocked) | Hit Woodwork | Penalty goals | Penalties not scored | ... | Saves made | Saves-to-shots ratio | Fouls Won | Fouls Conceded | Offsides | Yellow Cards | Red Cards | Subs on | Subs off | Players Used |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
5 | Germany | 10 | 32 | 32 | 47.8% | 15.6% | 80 | 2 | 1 | 0 | ... | 10 | 62.6% | 63 | 49 | 12 | 4 | 0 | 15 | 15 | 17 |
13 | Spain | 12 | 42 | 33 | 55.9% | 16.0% | 100 | 0 | 1 | 0 | ... | 15 | 93.8% | 102 | 83 | 19 | 11 | 0 | 17 | 17 | 18 |
2 rows × 35 columns
Step 11. Select the teams that start with G
Code:
# Select the teams whose name starts with G
euro12[euro12.Team.str.startswith('G')]
Output:
(index) | Team | Goals | Shots on target | Shots off target | Shooting Accuracy | % Goals-to-shots | Total shots (inc. Blocked) | Hit Woodwork | Penalty goals | Penalties not scored | ... | Saves made | Saves-to-shots ratio | Fouls Won | Fouls Conceded | Offsides | Yellow Cards | Red Cards | Subs on | Subs off | Players Used |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
5 | Germany | 10 | 32 | 32 | 47.8% | 15.6% | 80 | 2 | 1 | 0 | ... | 10 | 62.6% | 63 | 49 | 12 | 4 | 0 | 15 | 15 | 17 |
6 | Greece | 5 | 8 | 18 | 30.7% | 19.2% | 32 | 1 | 1 | 1 | ... | 13 | 65.1% | 67 | 48 | 12 | 9 | 1 | 12 | 12 | 20 |
2 rows × 35 columns
Step 12. Select the first 7 columns
Code:
# Select the first seven columns
euro12.iloc[:, 0:7]
Output:
(index) | Team | Goals | Shots on target | Shots off target | Shooting Accuracy | % Goals-to-shots | Total shots (inc. Blocked) |
---|---|---|---|---|---|---|---|
0 | Croatia | 4 | 13 | 12 | 51.9% | 16.0% | 32 |
1 | Czech Republic | 4 | 13 | 18 | 41.9% | 12.9% | 39 |
2 | Denmark | 4 | 10 | 10 | 50.0% | 20.0% | 27 |
3 | England | 5 | 11 | 18 | 50.0% | 17.2% | 40 |
4 | France | 3 | 22 | 24 | 37.9% | 6.5% | 65 |
5 | Germany | 10 | 32 | 32 | 47.8% | 15.6% | 80 |
6 | Greece | 5 | 8 | 18 | 30.7% | 19.2% | 32 |
7 | Italy | 6 | 34 | 45 | 43.0% | 7.5% | 110 |
8 | Netherlands | 2 | 12 | 36 | 25.0% | 4.1% | 60 |
9 | Poland | 2 | 15 | 23 | 39.4% | 5.2% | 48 |
10 | Portugal | 6 | 22 | 42 | 34.3% | 9.3% | 82 |
11 | Republic of Ireland | 1 | 7 | 12 | 36.8% | 5.2% | 28 |
12 | Russia | 5 | 9 | 31 | 22.5% | 12.5% | 59 |
13 | Spain | 12 | 42 | 33 | 55.9% | 16.0% | 100 |
14 | Sweden | 5 | 17 | 19 | 47.2% | 13.8% | 39 |
15 | Ukraine | 2 | 7 | 26 | 21.2% | 6.0% | 38 |
Step 13. Select all columns except the last 3.
Code:
# Select all columns except the last three
euro12.iloc[:, 0:-3]
Output:
(index) | Team | Goals | Shots on target | Shots off target | Shooting Accuracy | % Goals-to-shots | Total shots (inc. Blocked) | Hit Woodwork | Penalty goals | Penalties not scored | ... | Clean Sheets | Blocks | Goals conceded | Saves made | Saves-to-shots ratio | Fouls Won | Fouls Conceded | Offsides | Yellow Cards | Red Cards |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Croatia | 4 | 13 | 12 | 51.9% | 16.0% | 32 | 0 | 0 | 0 | ... | 0 | 10 | 3 | 13 | 81.3% | 41 | 62 | 2 | 9 | 0 |
1 | Czech Republic | 4 | 13 | 18 | 41.9% | 12.9% | 39 | 0 | 0 | 0 | ... | 1 | 10 | 6 | 9 | 60.1% | 53 | 73 | 8 | 7 | 0 |
2 | Denmark | 4 | 10 | 10 | 50.0% | 20.0% | 27 | 1 | 0 | 0 | ... | 1 | 10 | 5 | 10 | 66.7% | 25 | 38 | 8 | 4 | 0 |
3 | England | 5 | 11 | 18 | 50.0% | 17.2% | 40 | 0 | 0 | 0 | ... | 2 | 29 | 3 | 22 | 88.1% | 43 | 45 | 6 | 5 | 0 |
4 | France | 3 | 22 | 24 | 37.9% | 6.5% | 65 | 1 | 0 | 0 | ... | 1 | 7 | 5 | 6 | 54.6% | 36 | 51 | 5 | 6 | 0 |
5 | Germany | 10 | 32 | 32 | 47.8% | 15.6% | 80 | 2 | 1 | 0 | ... | 1 | 11 | 6 | 10 | 62.6% | 63 | 49 | 12 | 4 | 0 |
6 | Greece | 5 | 8 | 18 | 30.7% | 19.2% | 32 | 1 | 1 | 1 | ... | 1 | 23 | 7 | 13 | 65.1% | 67 | 48 | 12 | 9 | 1 |
7 | Italy | 6 | 34 | 45 | 43.0% | 7.5% | 110 | 2 | 0 | 0 | ... | 2 | 18 | 7 | 20 | 74.1% | 101 | 89 | 16 | 16 | 0 |
8 | Netherlands | 2 | 12 | 36 | 25.0% | 4.1% | 60 | 2 | 0 | 0 | ... | 0 | 9 | 5 | 12 | 70.6% | 35 | 30 | 3 | 5 | 0 |
9 | Poland | 2 | 15 | 23 | 39.4% | 5.2% | 48 | 0 | 0 | 0 | ... | 0 | 8 | 3 | 6 | 66.7% | 48 | 56 | 3 | 7 | 1 |
10 | Portugal | 6 | 22 | 42 | 34.3% | 9.3% | 82 | 6 | 0 | 0 | ... | 2 | 11 | 4 | 10 | 71.5% | 73 | 90 | 10 | 12 | 0 |
11 | Republic of Ireland | 1 | 7 | 12 | 36.8% | 5.2% | 28 | 0 | 0 | 0 | ... | 0 | 23 | 9 | 17 | 65.4% | 43 | 51 | 11 | 6 | 1 |
12 | Russia | 5 | 9 | 31 | 22.5% | 12.5% | 59 | 2 | 0 | 0 | ... | 0 | 8 | 3 | 10 | 77.0% | 34 | 43 | 4 | 6 | 0 |
13 | Spain | 12 | 42 | 33 | 55.9% | 16.0% | 100 | 0 | 1 | 0 | ... | 5 | 8 | 1 | 15 | 93.8% | 102 | 83 | 19 | 11 | 0 |
14 | Sweden | 5 | 17 | 19 | 47.2% | 13.8% | 39 | 3 | 0 | 0 | ... | 1 | 12 | 5 | 8 | 61.6% | 35 | 51 | 7 | 7 | 0 |
15 | Ukraine | 2 | 7 | 26 | 21.2% | 6.0% | 38 | 0 | 0 | 0 | ... | 0 | 4 | 4 | 13 | 76.5% | 48 | 31 | 4 | 5 | 0 |
16 rows × 32 columns
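Steps 12 and 13 both rely on positional column slices with `iloc`. A minimal sketch of the two slice forms on a hand-built five-column frame (`demo` and its column names `a` to `e` are made up for illustration):

```python
import pandas as pd

# One-row frame with five columns, just to make the slices visible.
demo = pd.DataFrame([[1, 2, 3, 4, 5]], columns=['a', 'b', 'c', 'd', 'e'])

# [:, :2] keeps all rows and the columns at positions 0 and 1.
first_two = demo.iloc[:, :2]

# A negative stop counts from the end: everything except the final two columns.
all_but_last_two = demo.iloc[:, :-2]

print(list(first_two.columns))
print(list(all_but_last_two.columns))
```

As with Python lists, the stop position of an `iloc` slice is excluded.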
Step 14. Present only the Shooting Accuracy from England, Italy and Russia
Code:
# Show only the Shooting Accuracy of England, Italy and Russia
euro12.loc[euro12.Team.isin(['England', 'Italy', 'Russia']), ['Team', 'Shooting Accuracy']]
Output:
(index) | Team | Shooting Accuracy |
---|---|---|
3 | England | 50.0% |
7 | Italy | 43.0% |
12 | Russia | 22.5% |
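The `info()` output earlier lists Shooting Accuracy with dtype `object`, so the values are strings such as `50.0%`. A hedged sketch of the same `loc` and `isin` selection followed by a conversion to numbers, on a hand-built four-team frame (the column name `accuracy_pct` is an assumption introduced here, not part of the dataset):

```python
import pandas as pd

# Four teams with accuracy strings as they appear in the tables above.
demo = pd.DataFrame({
    'Team': ['England', 'Italy', 'Russia', 'Spain'],
    'Shooting Accuracy': ['50.0%', '43.0%', '22.5%', '55.9%'],
})

# Row mask via isin, column list via loc; copy() so the result can be modified safely.
picked = demo.loc[
    demo['Team'].isin(['England', 'Italy', 'Russia']),
    ['Team', 'Shooting Accuracy'],
].copy()

# Strip the trailing percent sign and convert to float for numeric comparisons.
picked['accuracy_pct'] = picked['Shooting Accuracy'].str.rstrip('%').astype(float)
print(picked)
```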
Fictional Army - Filtering and Sorting
Introduction:
This exercise was inspired by this page
Step 1. Import the necessary libraries
Code:
import pandas as pd
Step 2. This is the data given as a dictionary
Code:
# Create an example dataframe about a fictional army
raw_data = {'regiment': ['Nighthawks', 'Nighthawks', 'Nighthawks', 'Nighthawks', 'Dragoons', 'Dragoons', 'Dragoons', 'Dragoons', 'Scouts', 'Scouts', 'Scouts', 'Scouts'],
'company': ['1st', '1st', '2nd', '2nd', '1st', '1st', '2nd', '2nd','1st', '1st', '2nd', '2nd'],
'deaths': [523, 52, 25, 616, 43, 234, 523, 62, 62, 73, 37, 35],
'battles': [5, 42, 2, 2, 4, 7, 8, 3, 4, 7, 8, 9],
'size': [1045, 957, 1099, 1400, 1592, 1006, 987, 849, 973, 1005, 1099, 1523],
'veterans': [1, 5, 62, 26, 73, 37, 949, 48, 48, 435, 63, 345],
'readiness': [1, 2, 3, 3, 2,