Python Data Analysis and Visualization: Pandas Statistical Analysis (Lab 2)
Posted by ZSYL
Analyzing and visualizing the tips dataset
1. Import modules
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
plt.rcParams['font.sans-serif'] = ['SimHei'] # use a font that can render Chinese characters
plt.rcParams['axes.unicode_minus'] = False # render the minus sign correctly with that font
%matplotlib inline
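SimHei is commonly available on Windows but is often missing on Linux and macOS, in which case matplotlib falls back to a default font and Chinese labels may render as empty boxes. The following is a minimal sketch, assuming you want to check which fonts are installed before plotting; the `candidates` list is an illustrative assumption and not part of the original exercise.

```python
from matplotlib import font_manager
import matplotlib.pyplot as plt

# Names of every font matplotlib has discovered on this machine
installed = {f.name for f in font_manager.fontManager.ttflist}

# Hypothetical fallback list: fonts that commonly include Chinese glyphs
candidates = ['SimHei', 'Microsoft YaHei', 'Noto Sans CJK SC', 'PingFang SC']
available = [name for name in candidates if name in installed]

if available:
    # matplotlib tries the fonts in this list in order
    plt.rcParams['font.sans-serif'] = available
plt.rcParams['axes.unicode_minus'] = False

print(available or 'no font from the candidate list is installed')
```

If the list comes back empty, installing any font with Chinese glyphs and adding its name to `candidates` should be enough.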
2. Load the data
fdata=pd.read_excel('tips.xls')
fdata
 | total_bill | tip | sex | smoker | day | time | size |
---|---|---|---|---|---|---|---|
0 | 16.99 | 1.01 | Female | No | Sun | Dinner | 2 |
1 | 10.34 | 1.66 | Male | No | Sun | Dinner | 3 |
2 | 21.01 | 3.50 | Male | No | Sun | Dinner | 3 |
3 | 23.68 | 3.31 | Male | No | Sun | Dinner | 2 |
4 | 24.59 | 3.61 | Female | No | Sun | Dinner | 4 |
5 | 25.29 | 4.71 | Male | No | Sun | Dinner | 4 |
6 | 8.77 | 2.00 | Male | No | Sun | Dinner | 2 |
7 | 26.88 | 3.12 | Male | No | Sun | Dinner | 4 |
8 | 15.04 | 1.96 | Male | No | Sun | Dinner | 2 |
9 | 14.78 | 3.23 | Male | No | Sun | Dinner | 2 |
10 | 10.27 | 1.71 | Male | No | Sun | Dinner | 2 |
11 | 35.26 | 5.00 | Female | No | Sun | Dinner | 4 |
12 | 15.42 | 1.57 | Male | No | Sun | Dinner | 2 |
13 | 18.43 | 3.00 | Male | No | Sun | Dinner | 4 |
14 | 14.83 | 3.02 | Female | No | Sun | Dinner | 2 |
15 | 21.58 | 3.92 | Male | No | Sun | Dinner | 2 |
16 | 10.33 | 1.67 | Female | No | Sun | Dinner | 3 |
17 | 16.29 | 3.71 | Male | No | Sun | Dinner | 3 |
18 | 16.97 | 3.50 | Female | No | Sun | Dinner | 3 |
19 | 20.65 | 3.35 | Male | No | Sat | Dinner | 3 |
20 | 17.92 | 4.08 | Male | No | Sat | Dinner | 2 |
21 | 20.29 | 2.75 | Female | No | Sat | Dinner | 2 |
22 | 15.77 | 2.23 | Female | No | Sat | Dinner | 2 |
23 | 39.42 | 7.58 | Male | No | Sat | Dinner | 4 |
24 | 19.82 | 3.18 | Male | No | Sat | Dinner | 2 |
25 | 17.81 | 2.34 | Male | No | Sat | Dinner | 4 |
26 | 13.37 | 2.00 | Male | No | Sat | Dinner | 2 |
27 | 12.69 | 2.00 | Male | No | Sat | Dinner | 2 |
28 | 21.70 | 4.30 | Male | No | Sat | Dinner | 2 |
29 | 19.65 | 3.00 | Female | No | Sat | Dinner | 2 |
... | ... | ... | ... | ... | ... | ... | ... |
214 | 28.17 | 6.50 | Female | Yes | Sat | Dinner | 3 |
215 | 12.90 | 1.10 | Female | Yes | Sat | Dinner | 2 |
216 | 28.15 | 3.00 | Male | Yes | Sat | Dinner | 5 |
217 | 11.59 | 1.50 | Male | Yes | Sat | Dinner | 2 |
218 | 7.74 | 1.44 | Male | Yes | Sat | Dinner | 2 |
219 | 30.14 | 3.09 | Female | Yes | Sat | Dinner | 4 |
220 | 12.16 | 2.20 | Male | Yes | Fri | Lunch | 2 |
221 | 13.42 | 3.48 | Female | Yes | Fri | Lunch | 2 |
222 | 8.58 | 1.92 | Male | Yes | Fri | Lunch | 1 |
223 | 15.98 | 3.00 | Female | No | Fri | Lunch | 3 |
224 | 13.42 | 1.58 | Male | Yes | Fri | Lunch | 2 |
225 | 16.27 | 2.50 | Female | Yes | Fri | Lunch | 2 |
226 | 10.09 | 2.00 | Female | Yes | Fri | Lunch | 2 |
227 | 20.45 | 3.00 | Male | No | Sat | Dinner | 4 |
228 | 13.28 | 2.72 | Male | No | Sat | Dinner | 2 |
229 | 22.12 | 2.88 | Female | Yes | Sat | Dinner | 2 |
230 | 24.01 | 2.00 | Male | Yes | Sat | Dinner | 4 |
231 | 15.69 | 3.00 | Male | Yes | Sat | Dinner | 3 |
232 | 11.61 | 3.39 | Male | No | Sat | Dinner | 2 |
233 | 10.77 | 1.47 | Male | No | Sat | Dinner | 2 |
234 | 15.53 | 3.00 | Male | Yes | Sat | Dinner | 2 |
235 | 10.07 | 1.25 | Male | No | Sat | Dinner | 2 |
236 | 12.60 | 1.00 | Male | Yes | Sat | Dinner | 2 |
237 | 32.83 | 1.17 | Male | Yes | Sat | Dinner | 2 |
238 | 35.83 | 4.67 | Female | No | Sat | Dinner | 3 |
239 | 29.03 | 5.92 | Male | No | Sat | Dinner | 3 |
240 | 27.18 | 2.00 | Female | Yes | Sat | Dinner | 2 |
241 | 22.67 | 2.00 | Male | Yes | Sat | Dinner | 2 |
242 | 17.82 | 1.75 | Male | No | Sat | Dinner | 2 |
243 | 18.78 | 3.00 | Female | No | Thur | Dinner | 2 |
244 rows × 7 columns
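Before analyzing, it can help to confirm the shape, the column types, and whether any values are missing. The sketch below uses only the first five rows of the table above, typed in by hand, so it runs without `tips.xls`; the variable name `sample` is introduced here for illustration.

```python
import pandas as pd

# First five rows of the table above, entered by hand for illustration
sample = pd.DataFrame({
    'total_bill': [16.99, 10.34, 21.01, 23.68, 24.59],
    'tip': [1.01, 1.66, 3.50, 3.31, 3.61],
    'sex': ['Female', 'Male', 'Male', 'Male', 'Female'],
    'smoker': ['No'] * 5,
    'day': ['Sun'] * 5,
    'time': ['Dinner'] * 5,
    'size': [2, 3, 3, 2, 4],
})

print(sample.shape)               # (number of rows, number of columns)
print(sample.dtypes)              # which columns are numeric and which are text
missing = sample.isnull().sum()   # count of missing values per column
print(missing)

# The column named 'size' collides with the DataFrame.size attribute,
# so it has to be read with brackets, not with sample.size
party_sizes = sample['size']
print(party_sizes.tolist())
```

The same three calls can be run on `fdata` once the Excel file has been loaded.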
3. Analyze the data
(1) View descriptive statistics of the data
fdata.describe().head()
 | total_bill | tip | size |
---|---|---|---|
count | 244.000000 | 244.000000 | 244.000000 |
mean | 19.785943 | 2.998279 | 2.569672 |
std | 8.902412 | 1.383638 | 0.951100 |
min | 3.070000 | 1.000000 | 1.000000 |
25% | 13.347500 | 2.000000 | 2.000000 |
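Note that `describe()` returns eight summary rows for numeric columns, so chaining `.head()` keeps only the first five and hides the median (`50%`), the `75%` quartile, and `max`. A small sketch on a hand-typed sample of the first five rows (the name `sample` is an assumption for illustration):

```python
import pandas as pd

# Numeric columns of the first five rows shown earlier
sample = pd.DataFrame({
    'total_bill': [16.99, 10.34, 21.01, 23.68, 24.59],
    'tip': [1.01, 1.66, 3.50, 3.31, 3.61],
    'size': [2, 3, 3, 2, 4],
})

summary = sample.describe()
full_rows = summary.index.tolist()            # all eight statistics
shown_rows = summary.head().index.tolist()    # what .head() leaves visible

print(full_rows)
print(shown_rows)
```

Calling `fdata.describe()` without `.head()` shows the full summary.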
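(2) Rename the columns to Chinese and show the first 5 rows
# Rename the columns to Chinese; the original names are total_bill tip sex smoker day time size
# 消费总额 = total bill, 小费 = tip, 性别 = sex, 是否抽烟 = smoker,
# 星期 = day of week, 聚餐时间段 = meal time, 人数 = party size
fdata.rename(columns={'total_bill':'消费总额','tip':'小费','sex':'性别','smoker':'是否抽烟',
'day':'星期','time':'聚餐时间段','size':'人数'},inplace=True)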
fdata.head()
 | 消费总额 | 小费 | 性别 | 是否抽烟 | 星期 | 聚餐时间段 | 人数 |
---|---|---|---|---|---|---|---|
0 | 16.99 | 1.01 | Female | No | Sun | Dinner | 2 |
1 | 10.34 | 1.66 | Male | No | Sun | Dinner | 3 |
2 | 21.01 | 3.50 | Male | No | Sun | Dinner | 3 |
3 | 23.68 | 3.31 | Male | No | Sun | Dinner | 2 |
4 | 24.59 | 3.61 | Female | No | Sun | Dinner | 4 |
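`rename` takes a dictionary that maps old column names to new ones. Without `inplace=True` it returns a new DataFrame and leaves the original untouched, and by default a misspelled key is silently ignored. A minimal sketch on two hand-typed rows; `sample`, `mapping`, and the misspelled key `totalbill` are illustrative assumptions:

```python
import pandas as pd

sample = pd.DataFrame({'total_bill': [16.99, 10.34], 'tip': [1.01, 1.66]})
mapping = {'total_bill': '消费总额', 'tip': '小费'}

# Without inplace=True, rename returns a new DataFrame
renamed = sample.rename(columns=mapping)
print(list(sample.columns))    # original names are unchanged
print(list(renamed.columns))   # new names

# A misspelled key is ignored by default; errors='raise' surfaces it
try:
    sample.rename(columns={'totalbill': '消费总额'}, errors='raise')
    typo_detected = False
except KeyError:
    typo_detected = True
print(typo_detected)
```

Assigning the result back (`fdata = fdata.rename(...)`) is an alternative to `inplace=True`.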
(3) Add a column “人均消费” (per-person spending)
fdata['人均消费']=round(fdata['消费总额']/fdata['人数'],2)
fdata.head()
 | 消费总额 | 小费 | 性别 | 是否抽烟 | 星期 | 聚餐时间段 | 人数 | 人均消费 |
---|---|---|---|---|---|---|---|---|
0 | 16.99 | 1.01 | Female | No | Sun | Dinner | 2 | 8.49 |
1 | 10.34 | 1.66 | Male | No | Sun | Dinner | 3 | 3.45 |
2 | 21.01 | 3.50 | Male | No | Sun | Dinner | 3 | 7.00 |
3 | 23.68 | 3.31 | Male | No | Sun | Dinner | 2 | 11.84 |
4 | 24.59 | 3.61 | Female | No | Sun | Dinner | 4 | 6.15 |
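The division works element by element: each row's total bill is divided by that row's party size. The summary statistics earlier report a minimum party size of 1, so division by zero should not occur here. The sketch below repeats the calculation on the first five rows with the `Series.round` method, which is an equivalent spelling of the built-in `round` used above; `sample` is an illustrative name.

```python
import pandas as pd

# Total bill and party size for the first five rows shown above
sample = pd.DataFrame({
    '消费总额': [16.99, 10.34, 21.01, 23.68, 24.59],
    '人数': [2, 3, 3, 2, 4],
})

# Element-wise division, then rounding to two decimal places
sample['人均消费'] = (sample['消费总额'] / sample['人数']).round(2)
print(sample)
```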
(4) Select male smokers whose per-person spending exceeds 15
# Method 1: bracket notation
fdata[(fdata['是否抽烟']=='Yes') &(fdata['性别']=='Male') & (fdata['人均消费']> 15) ]
# Method 2: attribute access
# fdata[(fdata.是否抽烟=='Yes') &(fdata.性别=='Male') & (fdata.人均消费> 15) ]
# Method 3: query string
# fdata.query( '是否抽烟=="Yes" & 性别=="Male" & 人均消费>15')
 | 消费总额 | 小费 | 性别 | 是否抽烟 | 星期 | 聚餐时间段 | 人数 | 人均消费 |
---|---|---|---|---|---|---|---|---|
83 | 32.68 | 5.00 | Male | Yes | Thur | Lunch | 2 | 16.34 |
170 | 50.81 | 10.00 | Male | Yes | Sat | Dinner | 3 | 16.94 |
173 | 31.85 | 3.18 | Male | Yes | Sun | Dinner | 2 | 15.92 |
175 | 32.90 | 3.11 | Male | Yes | Sun | Dinner | 2 | 16.45 |
179 | 34.63 | 3.55 | Male | Yes | Sun | Dinner | 2 | 17.32 |
182 | 45.35 | 3.50 | Male | Yes | Sun | Dinner | 3 | 15.12 |
184 | 40.55 | 3.00 | Male | Yes | Sun | Dinner | 2 | 20.27 |
237 | 32.83 | 1.17 | Male | Yes | Sat | Dinner | 2 | 16.42 |
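Each condition needs its own parentheses because `&` binds more tightly than `==` and `>` in Python. One way to sidestep that is to give every condition a name first. A sketch using four rows copied from the full table above (indices 216, 237, 239, 240); the mask names are illustrative assumptions:

```python
import pandas as pd

# Rows 216, 237, 239 and 240 from the full table, with renamed columns
sample = pd.DataFrame(
    {
        '消费总额': [28.15, 32.83, 29.03, 27.18],
        '性别': ['Male', 'Male', 'Male', 'Female'],
        '是否抽烟': ['Yes', 'Yes', 'No', 'Yes'],
        '人数': [5, 2, 3, 2],
    },
    index=[216, 237, 239, 240],
)
sample['人均消费'] = (sample['消费总额'] / sample['人数']).round(2)

# One boolean Series per condition
is_smoker = sample['是否抽烟'] == 'Yes'
is_male = sample['性别'] == 'Male'
high_spend = sample['人均消费'] > 15

# Combine with & (element-wise AND); no extra parentheses needed now
result = sample[is_smoker & is_male & high_spend]
print(result)
```

Only row 237 satisfies all three conditions in this sample, which agrees with its presence in the result table above.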
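(5) Relationship between the tip and the total bill
# Relationship between the tip and the total bill, shown as a scatter plot
fdata.plot(kind='scatter',x='消费总额',y='小费')
# The points trend upward: the two are positively correlated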
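The visual impression can be backed by a number: `Series.corr` computes the Pearson correlation coefficient by default. The sketch below uses only the first ten rows of the table, so the coefficient it prints describes that sample and not the full 244-row dataset; `sample` and `r_value` are illustrative names.

```python
import pandas as pd

# Total bill and tip for the first ten rows of the table
sample = pd.DataFrame({
    '消费总额': [16.99, 10.34, 21.01, 23.68, 24.59, 25.29, 8.77, 26.88, 15.04, 14.78],
    '小费': [1.01, 1.66, 3.50, 3.31, 3.61, 4.71, 2.00, 3.12, 1.96, 3.23],
})

# Pearson correlation: +1 is a perfect increasing line, 0 is no linear relation
r_value = sample['消费总额'].corr(sample['小费'])
print(round(r_value, 3))
```

On the full data the same call would be `fdata['消费总额'].corr(fdata['小费'])`.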
(6) Which customers are more generous, men or women? Group by sex and compare the average tip
# Group by sex and compare the mean tip of male and female customers
fdata.groupby('性别')['小费'].mean()
性别
Female 2.833448
Male 3.089618
Name: 小费, dtype: float64
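A mean alone does not show how many bills stand behind it, and the two groups may differ in size. A hedged sketch, assuming it is useful to report the count next to the mean; it uses the first five rows only, so the numbers differ from the full-dataset output above.

```python
import pandas as pd

# Sex and tip for the first five rows of the table
sample = pd.DataFrame({
    '性别': ['Female', 'Male', 'Male', 'Male', 'Female'],
    '小费': [1.01, 1.66, 3.50, 3.31, 3.61],
})

# agg with a list returns one column per statistic
by_sex = sample.groupby('性别')['小费'].agg(['mean', 'count'])
print(by_sex)
```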
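(7) Relationship between the day of the week and the tip
# Relationship between the day of the week and the tip, shown as a bar chart
print(fdata['星期'].unique())
r=fdata.groupby('星期')['小费'].mean()
# r is a Series: its index (星期) becomes the x axis and its values (小费) the bar heights
fig=r.plot(kind='bar',fontsize=12,rot=36)
# fig.axes.title.set_size(16)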
['Sun' 'Sat' 'Thur' 'Fri']
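`groupby` sorts its keys alphabetically, so the bars appear as Fri, Sat, Sun, Thur and not in calendar order. A minimal sketch of reordering with `reindex`, using five rows taken from the table (indices 0, 1, 19, 220, 243); the `calendar_order` list is an assumption about the order you want.

```python
import pandas as pd

# Day and tip for rows 0, 1, 19, 220 and 243 of the table
sample = pd.DataFrame({
    '星期': ['Sun', 'Sun', 'Sat', 'Fri', 'Thur'],
    '小费': [1.01, 1.66, 3.35, 2.20, 3.00],
})

r = sample.groupby('星期')['小费'].mean()
print(r.index.tolist())          # alphabetical order from groupby

# Reorder the result to follow the calendar
calendar_order = ['Thur', 'Fri', 'Sat', 'Sun']
ordered = r.reindex(calendar_order)
print(ordered)
```

Plotting `ordered` in place of `r` would then draw the bars from Thursday to Sunday.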
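(8) Combined effect of sex and smoking status on generosity
# Combined effect of sex and smoking status on generosity
r=fdata.groupby(['性别','是否抽烟'])['小费'].mean()
# r is a Series with a two-level index; each (性别, 是否抽烟) pair becomes one bar
fig=r.plot(kind='bar',fontsize=12,rot=30)
fig.axes.title.set_size(16)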
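Grouping by two columns yields a Series with a two-level index. Pivoting one level into columns with `unstack` gives a small two-way table that is easier to read. A sketch on five rows from the table (indices 0, 1, 214, 215, 216); the names `grouped` and `table` are illustrative.

```python
import pandas as pd

# Sex, smoking status and tip for rows 0, 1, 214, 215 and 216
sample = pd.DataFrame({
    '性别': ['Female', 'Male', 'Female', 'Female', 'Male'],
    '是否抽烟': ['No', 'No', 'Yes', 'Yes', 'Yes'],
    '小费': [1.01, 1.66, 6.50, 1.10, 3.00],
})

grouped = sample.groupby(['性别', '是否抽烟'])['小费'].mean()

# Move the smoking level from the index into the columns
table = grouped.unstack('是否抽烟')
print(table)
```

Calling `table.plot(kind='bar')` on such a table draws one cluster of bars per sex with one bar per smoking status.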
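(9) Relationship between meal time and tip amount
# Relationship between meal time and tip amount
r=fdata.groupby('聚餐时间段')['小费'].mean()
# r is a Series indexed by 聚餐时间段, so no x or y argument is needed
fig=r.plot(kind='bar')
fig.axes.title.set_size(16)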
The chart shows that the average tip at dinner is higher than at lunch.
Keep going!
Thank you!
Keep working hard!