Python Data Analysis: Introductory pandas Exercises

Posted by Geek_bao


Python Data Analysis Basics

Preparation

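The dataset URLs given in the exercises below may not work. The copies at the following address have been tested and are usable; if that link stops working too, search online for the datasets. https://github.com/daacheng/PythonBasic/tree/master/dataset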

Exercise 1 - Filtering and Sorting Data

Step 1. Import the necessary libraries

The code is as follows:

import pandas as pd

Step 2. Import the dataset from this address.

The dataset at this address may not be reachable; you may need a proxy or VPN to access it.

Step 3. Assign it to a variable called chipo.

The code is as follows:

chipo = pd.read_csv('chipotle.csv', sep=',')

Step 4. How many products cost more than $10.00?

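The code is as follows:

# The question asks how many products have a unit price above $10.
# Clean the item_price column and convert it to float:
# drop the first character ('$') and the last character of each value.
prices = [float(value[1:-1]) for value in chipo.item_price]
# Reassign the column with the cleaned prices
chipo.item_price = prices
# Drop rows that repeat an earlier (item_name, quantity) pair
'''
drop_duplicates(self, subset=None, keep="first", inplace=False)
subset: column label or sequence of labels to consider when identifying duplicate rows. By default, all columns are used.
keep: one of {'first', 'last', False}; the default is 'first'. With 'first', every duplicate except the first occurrence is dropped.
With 'last', every duplicate except the last occurrence is dropped. With False, all duplicated rows are dropped.
inplace: if True, modify the source DataFrame and return None. By default the source DataFrame is left unchanged and a new DataFrame is returned.
'''
chipo_filtered = chipo.drop_duplicates(['item_name', 'quantity'])
# Keep only the rows where quantity equals 1
chipo_one_prod = chipo_filtered[chipo_filtered.quantity == 1]
# Series.nunique() returns the number of distinct values in item_name
chipo_one_prod[chipo_one_prod['item_price']>10].item_name.nunique()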

The output is as follows:

12
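The list comprehension in Step 4 depends on every price having exactly one character before and one after the number. As a minimal sketch of a more tolerant alternative, the pandas string methods can strip whitespace and the dollar sign before converting. The three-row `sample` frame below is made up for illustration and only mimics the raw `item_price` format; it is not the real dataset.

```python
import pandas as pd

# Hypothetical sample that mimics the raw item_price strings
# ("$" prefix and a trailing space); not the real chipotle data.
sample = pd.DataFrame({
    "item_name": ["Izze", "Chicken Bowl", "Steak Burrito"],
    "item_price": ["$3.39 ", "$10.98 ", "$11.75 "],
})

# Strip surrounding whitespace, drop the literal "$", then convert to float.
sample["item_price"] = (
    sample["item_price"]
    .str.strip()
    .str.replace("$", "", regex=False)
    .astype(float)
)

# Count the distinct item names priced above $10.
n_expensive = sample.loc[sample["item_price"] > 10, "item_name"].nunique()
print(n_expensive)  # 2
```

Passing `regex=False` matters here because `$` is a regular-expression anchor; with it, the dollar sign is treated as a plain character.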

Step 5. What is the price of each item?

Print a DataFrame with only two columns: item_name and item_price.

The code is as follows:

# Show the unit price of each item, printing only item_name and item_price
# delete the duplicates in item_name and quantity
chipo_filtered = chipo.drop_duplicates(['item_name','quantity'])
# chipo[(chipo['item_name'] == 'Chicken Bowl') & (chipo['quantity'] == 1)]

# select only the products with quantity equals to 1
chipo_one_prod = chipo_filtered[chipo_filtered.quantity == 1]

# select only the item_name and item_price columns
price_per_item = chipo_one_prod[['item_name', 'item_price']]
print(price_per_item)
# sort the values from the most to less expensive
# price_per_item.sort_values(by = "item_price", ascending = False).head(20)

The output is as follows:

                                  item_name  item_price
0              Chips and Fresh Tomato Salsa        2.39
1                                      Izze        3.39
2                          Nantucket Nectar        3.39
3     Chips and Tomatillo-Green Chili Salsa        2.39
5                              Chicken Bowl       10.98
6                             Side of Chips        1.69
7                             Steak Burrito       11.75
8                          Steak Soft Tacos        9.25
10                      Chips and Guacamole        4.45
11                     Chicken Crispy Tacos        8.75
12                       Chicken Soft Tacos        8.75
16                          Chicken Burrito        8.49
21                         Barbacoa Burrito        8.99
27                         Carnitas Burrito        8.99
28                              Canned Soda        1.09
33                            Carnitas Bowl        8.99
34                            Bottled Water        1.09
38    Chips and Tomatillo Green Chili Salsa        2.95
39                            Barbacoa Bowl       11.75
40                                    Chips        2.15
44                       Chicken Salad Bowl        8.75
54                               Steak Bowl        8.99
56                      Barbacoa Soft Tacos        9.25
57                           Veggie Burrito       11.25
62                              Veggie Bowl       11.25
92                       Steak Crispy Tacos        9.25
111     Chips and Tomatillo Red Chili Salsa        2.95
168                   Barbacoa Crispy Tacos       11.75
186                       Veggie Salad Bowl       11.25
191      Chips and Roasted Chili-Corn Salsa        2.39
233      Chips and Roasted Chili Corn Salsa        2.95
237                     Carnitas Soft Tacos        9.25
250                           Chicken Salad       10.98
263                       Canned Soft Drink        1.25
298                       6 Pack Soft Drink        6.49
300     Chips and Tomatillo-Red Chili Salsa        2.39
510                                 Burrito        7.40
520                            Crispy Tacos        7.40
554                   Carnitas Crispy Tacos        9.25
606                        Steak Salad Bowl       11.89
664                             Steak Salad        8.99
673                                    Bowl        7.40
674       Chips and Mild Fresh Tomato Salsa        3.00
738                       Veggie Soft Tacos       11.25
1132                    Carnitas Salad Bowl       11.89
1229                    Barbacoa Salad Bowl       11.89
1414                                  Salad        7.40
1653                    Veggie Crispy Tacos        8.49
1694                           Veggie Salad        8.49
3750                         Carnitas Salad        8.99
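Step 5 relies on `drop_duplicates` keeping the first row for each (item_name, quantity) pair. The sketch below shows how the three `keep` options differ on a tiny made-up frame; the `orders` data and variable names are illustrative only.

```python
import pandas as pd

# Made-up orders: rows 0 and 1 share the same (item_name, quantity) pair.
orders = pd.DataFrame({
    "item_name": ["Izze", "Izze", "Chips", "Izze"],
    "quantity": [1, 1, 1, 2],
    "item_price": [3.39, 3.39, 2.15, 6.78],
})

subset = ["item_name", "quantity"]
first = orders.drop_duplicates(subset, keep="first")  # keep the first of each group
last = orders.drop_duplicates(subset, keep="last")    # keep the last of each group
none = orders.drop_duplicates(subset, keep=False)     # drop every duplicated row

print(list(first.index))  # [0, 2, 3]
print(list(last.index))   # [1, 2, 3]
print(list(none.index))   # [2, 3]
```

Because `keep="first"` retains whichever row happens to come first, the price listed for an item is the price of its first occurrence. If an item was sold at more than one price, the other prices do not appear in the result.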

Step 6. Sort by the name of the item

The code is as follows:

chipo.sort_values(by='item_name')
# chipo.item_name.sort_values()

The output is as follows:

      Unnamed: 0  order_id  quantity          item_name                                 choice_description  item_price
3389        3389      1360         2  6 Pack Soft Drink                                        [Diet Coke]       12.98
341          341       148         1  6 Pack Soft Drink                                        [Diet Coke]        6.49
1849        1849       749         1  6 Pack Soft Drink                                             [Coke]        6.49
1860        1860       754         1  6 Pack Soft Drink                                        [Diet Coke]        6.49
2713        2713      1076         1  6 Pack Soft Drink                                             [Coke]        6.49
3422        3422      1373         1  6 Pack Soft Drink                                             [Coke]        6.49
553          553       230         1  6 Pack Soft Drink                                        [Diet Coke]        6.49
1916        1916       774         1  6 Pack Soft Drink                                        [Diet Coke]        6.49
1922        1922       776         1  6 Pack Soft Drink                                             [Coke]        6.49
1937        1937       784         1  6 Pack Soft Drink                                        [Diet Coke]        6.49
3836        3836      1537         1  6 Pack Soft Drink                                             [Coke]        6.49
298          298       129         1  6 Pack Soft Drink                                           [Sprite]        6.49
1976        1976       798         1  6 Pack Soft Drink                                        [Diet Coke]        6.49
1167        1167       481         1  6 Pack Soft Drink                                             [Coke]        6.49
3875        3875      1554         1  6 Pack Soft Drink                                        [Diet Coke]        6.49
1124        1124       465         1  6 Pack Soft Drink                                             [Coke]        6.49
3886        3886      1558         1  6 Pack Soft Drink                                        [Diet Coke]        6.49
2108        2108       849         1  6 Pack Soft Drink                                             [Coke]        6.49
3010        3010      1196         1  6 Pack Soft Drink                                        [Diet Coke]        6.49
4535        4535      1803         1  6 Pack Soft Drink                                         [Lemonade]        6.49
4169        4169      1664         1  6 Pack Soft Drink                                        [Diet Coke]        6.49
4174        4174      1666         1  6 Pack Soft Drink                                             [Coke]        6.49
4527        4527      1800         1  6 Pack Soft Drink                                        [Diet Coke]        6.49
4522        4522      1798         1  6 Pack Soft Drink                                        [Diet Coke]        6.49
3806        3806      1525         1  6 Pack Soft Drink                                           [Sprite]        6.49
2389        2389       949         1  6 Pack Soft Drink                                             [Coke]        6.49
3132        3132      1248         1  6 Pack Soft Drink                                        [Diet Coke]        6.49
3141        3141      1253         1  6 Pack Soft Drink                                         [Lemonade]        6.49
639          639       264         1  6 Pack Soft Drink                                        [Diet Coke]        6.49
1026        1026       422         1  6 Pack Soft Drink                                           [Sprite]        6.49
...          ...       ...       ...                ...                                                ...         ...
2996        2996      1192         1       Veggie Salad  [Roasted Chili Corn Salsa (Medium), [Black Bea...        8.49
3163        3163      1263         1       Veggie Salad  [[Fresh Tomato Salsa (Mild), Roasted Chili Cor...        8.49
4084        4084      1635         1       Veggie Salad  [[Fresh Tomato Salsa (Mild), Roasted Chili Cor...        8.49
1694        1694       686         1       Veggie Salad  [[Fresh Tomato Salsa (Mild), Roasted Chili Cor...        8.49
2756        2756      1094         1       Veggie Salad  [[Tomatillo-Green Chili Salsa (Medium), Roaste...        8.49
4201        4201      1677         1  Veggie Salad Bowl  [Fresh Tomato Salsa, [Fajita Vegetables, Black...       11.25
1884        1884       760         1  Veggie Salad Bowl  [Fresh Tomato Salsa, [Fajita Vegetables, Rice,...       11.25
455          455       195         1  Veggie Salad Bowl  [Fresh Tomato Salsa, [Fajita Vegetables, Rice,...       11.25
3223        3223      1289         1  Veggie Salad Bowl  [Tomatillo Red Chili Salsa, [Fajita Vegetables...       11.25
2223        2223       896         1  Veggie Salad Bowl      [Roasted Chili Corn Salsa, Fajita Vegetables]        8.75
2269        2269       913         1  Veggie Salad Bowl  [Fresh Tomato Salsa, [Fajita Vegetables, Rice,...        8.75
4541        4541      1805         1  Veggie Salad Bowl  [Tomatillo Green Chili Salsa, [Fajita Vegetabl...        8.75
3293        3293      1321         1  Veggie Salad Bowl  [Fresh Tomato Salsa, [Rice, Black Beans, Chees...        8.75
186          186        83         1  Veggie Salad Bowl  [Fresh Tomato Salsa, [Fajita Vegetables, Rice,...       11.25
960          960       394         1  Veggie Salad Bowl  [Fresh Tomato Salsa, [Fajita Vegetables, Lettu...        8.75
1316        1316       536         1  Veggie Salad Bowl  [Fresh Tomato Salsa, [Fajita Vegetables, Rice,...        8.75
2156        2156       869         1  Veggie Salad Bowl  [Tomatillo Red Chili Salsa, [Fajita Vegetables...       11.25
4261        4261      1700         1  Veggie Salad Bowl  [Fresh Tomato Salsa, [Fajita Vegetables, Rice,...       11.25
295          295       128         1  Veggie Salad Bowl  [Fresh Tomato Salsa, [Fajita Vegetables, Lettu...       11.25
4573        4573      1818         1  Veggie Salad Bowl  [Fresh Tomato Salsa, [Fajita Vegetables, Pinto...        8.75
2683        2683      1066         1  Veggie Salad Bowl  [Roasted Chili Corn Salsa, [Fajita Vegetables,...        8.75
496          496       207         1  Veggie Salad Bowl  [Fresh Tomato Salsa, [Rice, Lettuce, Guacamole...       11.25
4109        4109      1646         1  Veggie Salad Bowl  [Tomatillo Red Chili Salsa, [Fajita Vegetables...       11.25
738          738       304         1  Veggie Soft Tacos  [Tomatillo Red Chili Salsa, [Fajita Vegetables...       11.25
3889        3889      1559         2  Veggie Soft Tacos  [Fresh Tomato Salsa (Mild), [Black Beans, Rice...       16.98
2384        2384       948         1  Veggie Soft Tacos  [Roasted Chili Corn Salsa, [Fajita Vegetables,...        8.75
781          781       322         1  Veggie Soft Tacos  [Fresh Tomato Salsa, [Black Beans, Cheese, Sou...        8.75
2851        2851      1132         1  Veggie Soft Tacos  [Roasted Chili Corn Salsa (Medium), [Black Bea...        8.49
1699        1699       688         1  Veggie Soft Tacos  [Fresh Tomato Salsa, [Fajita Vegetables, Rice,...       11.25
1395        1395       567         1  Veggie Soft Tacos  [Fresh Tomato Salsa (Mild), [Pinto Beans, Rice...        8.49

4622 rows × 6 columns
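`sort_values` also accepts a list of columns with a matching list of sort directions, and it returns a new DataFrame rather than reordering the original. A small sketch on made-up data (the `menu` frame is hypothetical):

```python
import pandas as pd

# Made-up menu rows for illustration.
menu = pd.DataFrame({
    "item_name": ["Veggie Salad", "Chips", "Izze", "Chips"],
    "item_price": [8.49, 2.15, 3.39, 2.95],
})

# Sort by name (A to Z), breaking ties by price from high to low.
by_name = menu.sort_values(by=["item_name", "item_price"], ascending=[True, False])
print(by_name)

# The original frame keeps its order; only the returned copy is sorted.
print(list(menu.index))     # [0, 1, 2, 3]
print(list(by_name.index))  # [3, 1, 2, 0]
```

The index labels travel with their rows, which is why the sorted output in Step 6 shows row labels out of numerical order.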

Step 7. What was the quantity of the most expensive item ordered?

The code is as follows:

chipo.sort_values(by='item_price', ascending=False).head(1)

The output is as follows:

      Unnamed: 0  order_id  quantity                     item_name  choice_description  item_price
3598        3598      1443        15  Chips and Fresh Tomato Salsa                 NaN       44.25
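Sorting the whole frame just to read one row works, but the row with the largest value can also be picked directly. A minimal sketch using `idxmax` on a made-up frame (only the first row reuses values from the output above; the other two rows are invented):

```python
import pandas as pd

# First row mirrors the Step 7 output; the other rows are invented.
orders = pd.DataFrame({
    "item_name": ["Chips and Fresh Tomato Salsa", "Steak Bowl", "Canned Soda"],
    "quantity": [15, 1, 2],
    "item_price": [44.25, 8.99, 2.18],
})

# idxmax returns the index label of the largest item_price;
# loc then pulls that single row out as a Series.
top = orders.loc[orders["item_price"].idxmax()]
print(top["item_name"], top["quantity"])
```

Note that `item_price` here is the line total for the order row, so the most expensive line is a quantity of 15 of an inexpensive item rather than one costly item.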

Step 8. How many times was a Veggie Salad Bowl ordered?

The code is as follows:

chipo_salad = chipo[chipo.item_name == 'Veggie Salad Bowl']
len(chipo_salad)
# or: chipo_salad.shape[0]

The output is as follows:

18
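Counting matching rows does not require building a filtered frame first: a boolean mask can be summed, since `True` counts as 1. The sketch below uses a short made-up Series named `items`.

```python
import pandas as pd

# Made-up item names for illustration.
items = pd.Series(
    ["Veggie Salad Bowl", "Izze", "Veggie Salad Bowl", "Chips"],
    name="item_name",
)

# Each True in the mask counts as 1 when summed.
mask = items == "Veggie Salad Bowl"
n_rows = int(mask.sum())

# value_counts gives the same figure for every item at once.
counts = items.value_counts()
print(n_rows, counts["Veggie Salad Bowl"])  # 2 2
```

Both approaches count order lines, not units: a line with quantity 2 still counts once. Summing the `quantity` column over the mask would be the way to count units instead.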

Step 9. How many times did people order more than one Canned Soda?

The code is as follows:

# chipo[(chipo['item_name'] == 'Chicken Bowl') & (chipo['quantity'] == 1)]
chipo_soda = chipo[(chipo.item_name == 'Canned Soda') & (chipo.quantity>1)]
len(chipo_soda)
# or: print(chipo_soda.shape[0])

The output is as follows:

20
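When combining conditions, each comparison needs its own parentheses because `&` binds more tightly than `==` and `>`. As a sketch, the same filter can also be written with `DataFrame.query`, which some readers find easier to scan. The `orders` frame is made up for illustration.

```python
import pandas as pd

# Made-up orders for illustration.
orders = pd.DataFrame({
    "item_name": ["Canned Soda", "Canned Soda", "Izze", "Canned Soda"],
    "quantity": [1, 2, 3, 4],
})

# Boolean-mask form: parenthesize each condition before combining with &.
mask = (orders["item_name"] == "Canned Soda") & (orders["quantity"] > 1)
n_mask = len(orders[mask])

# Equivalent query-string form.
n_query = len(orders.query("item_name == 'Canned Soda' and quantity > 1"))

print(n_mask, n_query)  # 2 2
```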

Exercise 2 - Filtering and Sorting Data

This time we are going to pull data directly from the internet.

Step 1. Import the necessary libraries

The code is as follows:

import pandas as pd

Step 2. Import the dataset from this address.

Step 3. Assign it to a variable called euro12.

The code is as follows:

euro12 = pd.read_csv('Euro_2012_stats_TEAM.csv', sep=',')
euro12

The output is as follows:

                   Team  Goals  Shots on target  Shots off target  Shooting Accuracy  % Goals-to-shots  Total shots (inc. Blocked)  Hit Woodwork  Penalty goals  Penalties not scored  ...  Saves made  Saves-to-shots ratio  Fouls Won  Fouls Conceded  Offsides  Yellow Cards  Red Cards  Subs on  Subs off  Players Used
0               Croatia      4               13                12              51.9%             16.0%                          32             0              0                     0  ...          13                 81.3%         41              62         2             9          0        9         9            16
1        Czech Republic      4               13                18              41.9%             12.9%                          39             0              0                     0  ...           9                 60.1%         53              73         8             7          0       11        11            19
2               Denmark      4               10                10              50.0%             20.0%                          27             1              0                     0  ...          10                 66.7%         25              38         8             4          0        7         7            15
3               England      5               11                18              50.0%             17.2%                          40             0              0                     0  ...          22                 88.1%         43              45         6             5          0       11        11            16
4                France      3               22                24              37.9%              6.5%                          65             1              0                     0  ...           6                 54.6%         36              51         5             6          0       11        11            19
5               Germany     10               32                32              47.8%             15.6%                          80             2              1                     0  ...          10                 62.6%         63              49        12             4          0       15        15            17
6                Greece      5                8                18              30.7%             19.2%                          32             1              1                     1  ...          13                 65.1%         67              48        12             9          1       12        12            20
7                 Italy      6               34                45              43.0%              7.5%                         110             2              0                     0  ...          20                 74.1%        101              89        16            16          0       18        18            19
8           Netherlands      2               12                36              25.0%              4.1%                          60             2              0                     0  ...          12                 70.6%         35              30         3             5          0        7         7            15
9                Poland      2               15                23              39.4%              5.2%                          48             0              0                     0  ...           6                 66.7%         48              56         3             7          1        7         7            17
10             Portugal      6               22                42              34.3%              9.3%                          82             6              0                     0  ...          10                 71.5%         73              90        10            12          0       14        14            16
11  Republic of Ireland      1                7                12              36.8%              5.2%                          28             0              0                     0  ...          17                 65.4%         43              51        11             6          1       10        10            17
12               Russia      5                9                31              22.5%             12.5%                          59             2              0                     0  ...          10                 77.0%         34              43         4             6          0        7         7            16
13                Spain     12               42                33              55.9%             16.0%                         100             0              1                     0  ...          15                 93.8%        102              83        19            11          0       17        17            18
14               Sweden      5               17                19              47.2%             13.8%                          39             3              0                     0  ...           8                 61.6%         35              51         7             7          0        9         9            18
15              Ukraine      2                7                26              21.2%              6.0%                          38             0              0                     0  ...          13                 76.5%         48              31         4             5          0        9         9            18

16 rows × 35 columns

Step 4. Select only the Goals column.

The code is as follows:

euro12.Goals
# or: euro12['Goals']

The output is as follows:

0      4
1      4
2      4
3      5
4      3
5     10
6      5
7      6
8      2
9      2
10     6
11     1
12     5
13    12
14     5
15     2
Name: Goals, dtype: int64
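Single brackets return a Series, which is why the output above ends with a `Name: Goals, dtype: int64` line. Double brackets return a one-column DataFrame instead. A short sketch on a two-row made-up frame (the `teams` name is illustrative):

```python
import pandas as pd

# Two rows taken from the table above, for illustration only.
teams = pd.DataFrame({"Team": ["Spain", "Italy"], "Goals": [12, 6]})

as_series = teams["Goals"]    # single brackets: a Series
as_frame = teams[["Goals"]]   # double brackets: a one-column DataFrame

print(type(as_series).__name__, as_series.shape)  # Series (2,)
print(type(as_frame).__name__, as_frame.shape)    # DataFrame (2, 1)
```

Attribute access such as `euro12.Goals` only works when the column name is a valid Python identifier, so columns like `Yellow Cards` need the bracket form.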

Step 5. How many teams participated in Euro 2012?

The code is as follows:

euro12.shape[0]
# or: len(euro12.Team)

The output is as follows:

16

Step 6. What is the number of columns in the dataset?

The code is as follows:

# euro12.columns.shape[0]
euro12.info()

The output is as follows:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 16 entries, 0 to 15
Data columns (total 35 columns):
Team                          16 non-null object
Goals                         16 non-null int64
Shots on target               16 non-null int64
Shots off target              16 non-null int64
Shooting Accuracy             16 non-null object
% Goals-to-shots              16 non-null object
Total shots (inc. Blocked)    16 non-null int64
Hit Woodwork                  16 non-null int64
Penalty goals                 16 non-null int64
Penalties not scored          16 non-null int64
Headed goals                  16 non-null int64
Passes                        16 non-null int64
Passes completed              16 non-null int64
Passing Accuracy              16 non-null object
Touches                       16 non-null int64
Crosses                       16 non-null int64
Dribbles                      16 non-null int64
Corners Taken                 16 non-null int64
Tackles                       16 non-null int64
Clearances                    16 non-null int64
Interceptions                 16 non-null int64
Clearances off line           15 non-null float64
Clean Sheets                  16 non-null int64
Blocks                        16 non-null int64
Goals conceded                16 non-null int64
Saves made                    16 non-null int64
Saves-to-shots ratio          16 non-null object
Fouls Won                     16 non-null int64
Fouls Conceded                16 non-null int64
Offsides                      16 non-null int64
Yellow Cards                  16 non-null int64
Red Cards                     16 non-null int64
Subs on                       16 non-null int64
Subs off                      16 non-null int64
Players Used                  16 non-null int64
dtypes: float64(1), int64(29), object(5)
memory usage: 4.5+ KB
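`info()` prints a full summary, and the column count has to be read off the "Data columns" line. When only the numbers are needed, `shape` gives rows and columns directly. A minimal sketch on a made-up three-column frame:

```python
import pandas as pd

# Small made-up frame; the real dataset has 16 rows and 35 columns.
teams = pd.DataFrame({
    "Team": ["Spain", "Italy"],
    "Goals": [12, 6],
    "Red Cards": [0, 0],
})

# shape is a (rows, columns) tuple.
n_rows, n_cols = teams.shape
print(n_rows, n_cols)      # 2 3
print(len(teams.columns))  # 3
```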

Step 7. View only the columns Team, Yellow Cards and Red Cards and assign them to a dataframe called discipline

The code is as follows:

discipline = euro12[['Team', 'Yellow Cards', 'Red Cards']]
discipline

The output is as follows:

                   Team  Yellow Cards  Red Cards
0               Croatia             9          0
1        Czech Republic             7          0
2               Denmark             4          0
3               England             5          0
4                France             6          0
5               Germany             4          0
6                Greece             9          1
7                 Italy            16          0
8           Netherlands             5          0
9                Poland             7          1
10             Portugal            12          0
11  Republic of Ireland             6          1
12               Russia             6          0
13                Spain            11          0
14               Sweden             7          0
15              Ukraine             5          0

Step 8. Sort the teams by Red Cards, then by Yellow Cards

The code is as follows:

# Sort the teams by red cards first, then by yellow cards (both descending)
discipline.sort_values(['Red Cards', 'Yellow Cards'], ascending=False)

The output is as follows:

                   Team  Yellow Cards  Red Cards
6                Greece             9          1
9                Poland             7          1
11  Republic of Ireland             6          1
7                 Italy            16          0
10             Portugal            12          0
13                Spain            11          0
0               Croatia             9          0
1        Czech Republic             7          0
14               Sweden             7          0
4                France             6          0
12               Russia             6          0
3               England             5          0
8           Netherlands             5          0
15              Ukraine             5          0
2               Denmark             4          0
5               Germany             4          0
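A single `ascending=False` applies to both sort keys. If the two keys should run in different directions, `ascending` can be a list with one entry per column. The sketch below uses four teams with their card counts from the table above; the mixed ordering itself is just an illustration, not something the exercise asks for.

```python
import pandas as pd

# Four teams with card counts copied from the discipline table.
discipline = pd.DataFrame({
    "Team": ["Germany", "Greece", "Italy", "Poland"],
    "Yellow Cards": [4, 9, 16, 7],
    "Red Cards": [0, 1, 0, 1],
})

# Most red cards first; within equal red cards, fewest yellow cards first.
ranked = discipline.sort_values(
    ["Red Cards", "Yellow Cards"], ascending=[False, True]
)
print(list(ranked["Team"]))  # ['Poland', 'Greece', 'Germany', 'Italy']
```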

Step 9. Calculate the mean Yellow Cards given per Team

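The code is as follows:

# Compute the mean number of yellow cards per team, rounded to the nearest integer
round(discipline['Yellow Cards'].mean())

The output is as follows: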

7
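Rounding to an integer hides how far the mean is from 7. As a quick check, the sketch below recomputes the mean from the sixteen Yellow Cards values listed in the discipline table and shows the unrounded figure alongside two rounded forms.

```python
import pandas as pd

# Yellow Cards for the 16 teams, in the row order of the discipline table.
yellow = pd.Series(
    [9, 7, 4, 5, 6, 4, 9, 16, 5, 7, 12, 6, 6, 11, 7, 5],
    name="Yellow Cards",
)

# Convert to a plain Python float so round() behaves the same everywhere.
mean_yellow = float(yellow.mean())

print(mean_yellow)            # 7.4375
print(round(mean_yellow))     # 7
print(round(mean_yellow, 2))  # 7.44
```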

Step 10. Filter teams that scored more than 6 goals

The code is as follows:

# Keep only the teams with more than 6 goals
euro12[euro12['Goals']>6]
# euro12[euro12.Goals>6]

The output is as follows:

       Team  Goals  Shots on target  Shots off target  Shooting Accuracy  % Goals-to-shots  Total shots (inc. Blocked)  Hit Woodwork  Penalty goals  Penalties not scored  ...  Saves made  Saves-to-shots ratio  Fouls Won  Fouls Conceded  Offsides  Yellow Cards  Red Cards  Subs on  Subs off  Players Used
5   Germany     10               32                32              47.8%             15.6%                          80             2              1                     0  ...          10                 62.6%         63              49        12             4          0       15        15            17
13    Spain     12               42                33              55.9%             16.0%                         100             0              1                     0  ...          15                 93.8%        102              83        19            11          0       17        17            18

2 rows × 35 columns

Step 11. Select the teams that start with G

The code is as follows:

# Select the teams whose names start with G
euro12[euro12.Team.str.startswith('G')]

The output is as follows:

      Team  Goals  Shots on target  Shots off target  Shooting Accuracy  % Goals-to-shots  Total shots (inc. Blocked)  Hit Woodwork  Penalty goals  Penalties not scored  ...  Saves made  Saves-to-shots ratio  Fouls Won  Fouls Conceded  Offsides  Yellow Cards  Red Cards  Subs on  Subs off  Players Used
5  Germany     10               32                32              47.8%             15.6%                          80             2              1                     0  ...          10                 62.6%         63              49        12             4          0       15        15            17
6   Greece      5                8                18              30.7%             19.2%                          32             1              1                     1  ...          13                 65.1%         67              48        12             9          1       12        12            20

2 rows × 35 columns
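The `.str` accessor applies a string method to every element. One thing to watch for in other datasets is missing values: `str.startswith` returns a missing value for them, which cannot be used directly as a boolean mask. A small sketch, assuming a made-up Series with one missing name, shows the `na=False` argument that treats missing entries as non-matches.

```python
import pandas as pd

# Made-up team names with one missing value.
teams = pd.Series(["Germany", "Greece", "Spain", None], name="Team")

# na=False makes missing names count as "does not start with G".
mask = teams.str.startswith("G", na=False)
print(list(teams[mask]))  # ['Germany', 'Greece']
```

The Euro 2012 Team column has no missing values (the `info()` output shows 16 non-null entries), so the plain call in Step 11 is sufficient there.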

Step 12. Select the first 7 columns

The code is as follows:

# Select the first seven columns
euro12.iloc[:, 0:7]

The output is as follows:

                   Team  Goals  Shots on target  Shots off target  Shooting Accuracy  % Goals-to-shots  Total shots (inc. Blocked)
0               Croatia      4               13                12              51.9%             16.0%                          32
1        Czech Republic      4               13                18              41.9%             12.9%                          39
2               Denmark      4               10                10              50.0%             20.0%                          27
3               England      5               11                18              50.0%             17.2%                          40
4                France      3               22                24              37.9%              6.5%                          65
5               Germany     10               32                32              47.8%             15.6%                          80
6                Greece      5                8                18              30.7%             19.2%                          32
7                 Italy      6               34                45              43.0%              7.5%                         110
8           Netherlands      2               12                36              25.0%              4.1%                          60
9                Poland      2               15                23              39.4%              5.2%                          48
10             Portugal      6               22                42              34.3%              9.3%                          82
11  Republic of Ireland      1                7                12              36.8%              5.2%                          28
12               Russia      5                9                31              22.5%             12.5%                          59
13                Spain     12               42                33              55.9%             16.0%                         100
14               Sweden      5               17                19              47.2%             13.8%                          39
15              Ukraine      2                7                26              21.2%              6.0%                          38

Step 13. Select all columns except the last 3.

The code is as follows:

# Select every column except the last three
euro12.iloc[:, 0:-3]

The output is as follows:

                   Team  Goals  Shots on target  Shots off target  Shooting Accuracy  % Goals-to-shots  Total shots (inc. Blocked)  Hit Woodwork  Penalty goals  Penalties not scored  ...  Clean Sheets  Blocks  Goals conceded  Saves made  Saves-to-shots ratio  Fouls Won  Fouls Conceded  Offsides  Yellow Cards  Red Cards
0               Croatia      4               13                12              51.9%             16.0%                          32             0              0                     0  ...             0      10               3          13                 81.3%         41              62         2             9          0
1        Czech Republic      4               13                18              41.9%             12.9%                          39             0              0                     0  ...             1      10               6           9                 60.1%         53              73         8             7          0
2               Denmark      4               10                10              50.0%             20.0%                          27             1              0                     0  ...             1      10               5          10                 66.7%         25              38         8             4          0
3               England      5               11                18              50.0%             17.2%                          40             0              0                     0  ...             2      29               3          22                 88.1%         43              45         6             5          0
4                France      3               22                24              37.9%              6.5%                          65             1              0                     0  ...             1       7               5           6                 54.6%         36              51         5             6          0
5               Germany     10               32                32              47.8%             15.6%                          80             2              1                     0  ...             1      11               6          10                 62.6%         63              49        12             4          0
6                Greece      5                8                18              30.7%             19.2%                          32             1              1                     1  ...             1      23               7          13                 65.1%         67              48        12             9          1
7                 Italy      6               34                45              43.0%              7.5%                         110             2              0                     0  ...             2      18               7          20                 74.1%        101              89        16            16          0
8           Netherlands      2               12                36              25.0%              4.1%                          60             2              0                     0  ...             0       9               5          12                 70.6%         35              30         3             5          0
9                Poland      2               15                23              39.4%              5.2%                          48             0              0                     0  ...             0       8               3           6                 66.7%         48              56         3             7          1
10             Portugal      6               22                42              34.3%              9.3%                          82             6              0                     0  ...             2      11               4          10                 71.5%         73              90        10            12          0
11  Republic of Ireland      1                7                12              36.8%              5.2%                          28             0              0                     0  ...             0      23               9          17                 65.4%         43              51        11             6          1
12               Russia      5                9                31              22.5%             12.5%                          59             2              0                     0  ...             0       8               3          10                 77.0%         34              43         4             6          0
13                Spain     12               42                33              55.9%             16.0%                         100             0              1                     0  ...             5       8               1          15                 93.8%        102              83        19            11          0
14               Sweden      5               17                19              47.2%             13.8%                          39             3              0                     0  ...             1      12               5           8                 61.6%         35              51         7             7          0
15              Ukraine      2                7                26              21.2%              6.0%                          38             0              0                     0  ...             0       4               4          13                 76.5%         48              31         4             5          0

16 rows × 32 columns
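Steps 12 and 13 both use positional slicing with `iloc`, where the part before the comma selects rows and the part after selects columns. The slices follow ordinary Python rules: the end position is excluded and negative numbers count from the end. A sketch on a made-up five-column frame:

```python
import pandas as pd

# One made-up row with five columns named a to e.
df = pd.DataFrame([[1, 2, 3, 4, 5]], columns=list("abcde"))

first_two = df.iloc[:, :2]          # columns at positions 0 and 1
all_but_last_two = df.iloc[:, :-2]  # everything except the final two columns

print(list(first_two.columns))         # ['a', 'b']
print(list(all_but_last_two.columns))  # ['a', 'b', 'c']
```

This matches the shapes reported above: 35 columns minus the last 3 leaves the 32 columns shown for Step 13.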

Step 14. Present only the Shooting Accuracy from England, Italy and Russia

The code is as follows:

# Show only the Shooting Accuracy of three teams: England, Italy and Russia
euro12.loc[euro12.Team.isin(['England', 'Italy', 'Russia']), ['Team', 'Shooting Accuracy']]

The output is as follows:

       Team  Shooting Accuracy
3   England              50.0%
7     Italy              43.0%
12   Russia              22.5%
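`loc` takes a row selector and a column selector in one call, and `isin` builds the row mask from a list of allowed values. The Shooting Accuracy column is stored as text (the `info()` output lists it as `object`), so a further conversion is needed before it can be compared or averaged. The sketch below uses four rows copied from the tables above; the `stats` and `picked_numeric` names are illustrative.

```python
import pandas as pd

# Four teams with Shooting Accuracy values copied from the tables above.
stats = pd.DataFrame({
    "Team": ["England", "Italy", "Russia", "Spain"],
    "Shooting Accuracy": ["50.0%", "43.0%", "22.5%", "55.9%"],
})

wanted = ["England", "Italy", "Russia"]

# Rows: teams in the wanted list. Columns: the two named columns.
picked = stats.loc[stats["Team"].isin(wanted), ["Team", "Shooting Accuracy"]]

# Drop the trailing "%" and convert the text to floats.
picked_numeric = picked["Shooting Accuracy"].str.rstrip("%").astype(float)
print(picked_numeric.tolist())  # [50.0, 43.0, 22.5]
```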

Fictional Army - Filtering and Sorting

Introduction:

This exercise was inspired by this page

Step 1. Import the necessary libraries

The code is as follows:

import pandas as pd

Step 2. This is the data given as a dictionary

The code is as follows:

# Create an example dataframe about a fictional army
raw_data = {'regiment': ['Nighthawks', 'Nighthawks', 'Nighthawks', 'Nighthawks', 'Dragoons', 'Dragoons', 'Dragoons', 'Dragoons', 'Scouts', 'Scouts', 'Scouts', 'Scouts'],
            'company': ['1st', '1st', '2nd', '2nd', '1st', '1st', '2nd', '2nd','1st', '1st', '2nd', '2nd'],
            'deaths': [523, 52, 25, 616, 43, 234, 523, 62, 62, 73, 37, 35],
            'battles': [5, 42, 2, 2, 4, 7, 8, 3, 4, 7, 8, 9],
            'size': [1045, 957, 1099, 1400, 1592, 1006, 987, 849, 973, 1005, 1099, 1523],
            'veterans': [1, 5, 62, 26, 73, 37, 949, 48, 48, 435, 63, 345],
            'readiness': [1, 2, 3, 3, 2,
