Python Data Analysis and Visualization: Bike Rental Statistics Analysis (Comprehensive Exercise)

Posted ZSYL


Exercise: Bike Rental Statistics Analysis

import numpy as np
import pandas as pd
import datetime
import matplotlib.pyplot as plt
import seaborn as sns
# Notebook magic: render figures inline (Jupyter/IPython only)
%matplotlib inline
plt.figure(figsize = (10,8))
# Load the hourly rental records
bike = pd.read_csv('data/bike.csv')
bike.head()
   datetime             season  holiday  workingday  weather  temp  atemp   humidity  windspeed  casual  registered  count
0  2011-01-01 00:00:00  1       0        0           1        9.84  14.395  81        0.0        3       13          16
1  2011-01-01 01:00:00  1       0        0           1        9.02  13.635  80        0.0        8       32          40
2  2011-01-01 02:00:00  1       0        0           1        9.02  13.635  80        0.0        5       27          32
3  2011-01-01 03:00:00  1       0        0           1        9.84  14.395  75        0.0        3       10          13
4  2011-01-01 04:00:00  1       0        0           1        9.84  14.395  75        0.0        0       1           1

Check for missing values

bike.isnull().sum()
datetime      0
season        0
holiday       0
workingday    0
weather       0
temp          0
atemp         0
humidity      0
windspeed     0
casual        0
registered    0
count         0
dtype: int64

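As a small illustration of what `isnull().sum()` reports, here is a sketch on a made-up three-row frame (the values are invented for the example, not taken from bike.csv). Each entry in the result is the number of missing cells in that column.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with two deliberately missing cells.
toy = pd.DataFrame({
    "temp": [9.84, np.nan, 9.02],
    "humidity": [81, 80, 75],
    "count": [16.0, 40.0, np.nan],
})

# isnull() marks each missing cell as True; sum() counts the True values per column.
missing_per_column = toy.isnull().sum()
print(missing_per_column)

# Total number of missing cells in the whole frame.
total_missing = int(missing_per_column.sum())
print(total_missing)
```

In the bike data every column reports 0, so no imputation or row dropping is needed at this stage.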
Check the data types of the columns to be processed

bike.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10886 entries, 0 to 10885
Data columns (total 12 columns):
datetime      10886 non-null object
season        10886 non-null int64
holiday       10886 non-null int64
workingday    10886 non-null int64
weather       10886 non-null int64
temp          10886 non-null float64
atemp         10886 non-null float64
humidity      10886 non-null int64
windspeed     10886 non-null float64
casual        10886 non-null int64
registered    10886 non-null int64
count         10886 non-null int64
dtypes: float64(3), int64(8), object(1)
memory usage: 1020.6+ KB
bike.datetime = pd.to_datetime(bike.datetime)
bike.dtypes
datetime      datetime64[ns]
season                 int64
holiday                int64
workingday             int64
weather                int64
temp                 float64
atemp                float64
humidity               int64
windspeed            float64
casual                 int64
registered             int64
count                  int64
dtype: object
bike = bike.set_index('datetime')

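The conversion matters because a `DatetimeIndex` exposes calendar fields such as `.year`, `.month`, and `.hour`, which the grouping steps later in this exercise rely on. A minimal sketch with invented timestamps (only the string format is borrowed from bike.csv):

```python
import pandas as pd

# Hypothetical rows; the strings mimic the format used in bike.csv.
toy = pd.DataFrame({
    "datetime": ["2011-01-01 00:00:00", "2011-06-15 08:00:00", "2012-09-30 17:00:00"],
    "count": [16, 250, 480],
})

# Parse the strings, then use the parsed column as the index.
toy["datetime"] = pd.to_datetime(toy["datetime"])
toy = toy.set_index("datetime")

# A DatetimeIndex exposes calendar fields directly.
years = list(toy.index.year)
months = list(toy.index.month)
hours = list(toy.index.hour)
print(years, months, hours)
```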
Start with the numeric columns. The rental count (count) varies widely, so it is worth looking at its density distribution.

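# Reference only: the main distplot parameters (s stands for any numeric Series)
# sns.distplot(s, bins = 10, hist = True, kde = True,
#              norm_hist = False, rug = True, vertical = False,
#              color = 'g', label = 'distplot', axlabel = 'x')
# Note: distplot is deprecated since seaborn 0.11; histplot and displot replace it
sns.distplot(bike["count"])
# plt.plot(s.index,s.values)  # line plot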


bike["count"].describe()

count    10886.000000
mean       191.574132
std        181.144454
min          1.000000
25%         42.000000
50%        145.000000
75%        284.000000
max        977.000000
Name: count, dtype: float64
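The summary above (mean 191.57 against a median of 145) points to a right-skewed distribution. The sketch below shows the same check on synthetic data; the exponential draws are an assumption standing in for the real `count` column, not a model of it.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for bike["count"]: exponential draws are right-skewed.
rng = np.random.default_rng(0)
counts = pd.Series(rng.exponential(scale=190.0, size=5000))

# For a right-skewed distribution the mean sits above the median
# and the sample skewness is positive.
mean_value = counts.mean()
median_value = counts.median()
skewness = counts.skew()
print(mean_value, median_value, skewness)

# Bin the values into 10 equal-width bins to see where the mass sits.
bin_counts, bin_edges = np.histogram(counts, bins=10)
print(bin_counts)
```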
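# Drop rows below the 25% quantile (42) and look at the distribution again
def Count(x):
    if x < 42:
        return np.nan
    else:
        return x
bike1 = bike.copy()  # copy so the original bike frame is not modified
bike1["count"] = bike1["count"].apply(Count)
bike1 = bike1.dropna(axis=0, how='any')
sns.distplot(bike1["count"])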

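# Drop rows below the median (145) and look at the distribution again
def Count(x):
    if x < 145:
        return np.nan
    else:
        return x
bike2 = bike.copy()  # copy so the original bike frame is not modified
bike2["count"] = bike2["count"].apply(Count)
bike2 = bike2.dropna(axis=0, how='any')
sns.distplot(bike2["count"])
# plt.plot(s.index,s.values)  # line plot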

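The two thresholds used above, 42 and 145, match the 25% and 50% quantiles in the `describe()` output. A boolean mask is a common alternative to the `apply` plus `dropna` pattern and leaves the source frame untouched. The sketch uses invented values, and `keep_at_least` is a hypothetical helper name, not something from the original notebook.

```python
import pandas as pd

# Hypothetical counts standing in for bike["count"].
toy = pd.DataFrame({"count": [1, 16, 42, 100, 145, 300, 977]})

def keep_at_least(frame, threshold):
    """Return a new frame holding only rows whose count is >= threshold."""
    return frame[frame["count"] >= threshold].copy()

# Derive the threshold from the data instead of hard-coding it.
median_count = toy["count"].quantile(0.5)
upper_half = keep_at_least(toy, median_count)

print(median_count)
print(upper_half["count"].tolist())
# The original frame still has all of its rows.
print(len(toy))
```

Unlike the `apply` version, this keeps `count` as an integer column, since no NaN values are ever written into it.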
bike = bike2
bike.shape
(5455, 11)
y_bike = bike.groupby(bike.index.year).mean()['count']
y_bike 
datetime
2011    274.526697
2012    366.408629
Name: count, dtype: float64
y_bike.plot(kind='bar',rot = 0)

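In the grouping calls in this exercise, `.mean()` is computed for every column and `count` is selected afterwards. Selecting the column first should give the same numbers with less work, and in recent pandas versions it can also avoid errors if a non-numeric column is ever present. A sketch on invented data:

```python
import pandas as pd

# Hypothetical rows spanning two years.
toy = pd.DataFrame(
    {"temp": [9.8, 25.4, 10.2, 27.1], "count": [150, 300, 200, 400]},
    index=pd.to_datetime(["2011-01-05", "2011-07-05", "2012-01-05", "2012-07-05"]),
)

# Pattern used in the article: aggregate every column, then pick one.
all_then_pick = toy.groupby(toy.index.year).mean()["count"]

# Alternative: pick the column first, then aggregate only that column.
pick_then_mean = toy.groupby(toy.index.year)["count"].mean()

print(pick_then_mean)
```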
mm_bike = bike.resample('M',kind = "period").mean()
mm_bike.head()
          season  holiday   workingday  weather   temp       atemp      humidity   windspeed  casual     registered  count
datetime
2011-01   1.0     0.000000  1.000000    1.160000  8.692000   10.909600  49.320000  11.880440  5.280000   175.520000  180.800000
2011-02   1.0     0.000000  0.791045    1.283582  14.294925  17.243134  44.179104  18.179100  23.835821  168.208955  192.044776
2011-03   1.0     0.000000  0.666667    1.291667  16.553750  19.728021  49.458333  18.187778  48.583333  163.781250  212.364583
2011-04   2.0     0.078014  0.617021    1.453901  19.970780  23.634752  55.177305  16.893741  60.624113  177.539007  238.163121
2011-05   2.0     0.000000  0.758197    1.446721  23.060820  27.214037  64.069672  13.946627  55.745902  224.110656  279.856557
mm_bike.plot()
plt.legend(loc = "best",fontsize = 8)

m_bike = bike.groupby(bike.index.month).mean()['count']
m_bike 
datetime
1     246.528736
2     250.560784
3     301.152738
4     336.355450
5     336.163265
6     355.171329
7     339.714533
8     343.414035
9     363.035849
10    354.840304
11    308.497992
12    300.295045
Name: count, dtype: float64
m_bike.plot()
plt.grid()  # the mean peaks in September

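The two monthly views answer different questions: `resample('M')` produces one row per calendar month of each year, while grouping by `index.month` pools the same month number across years. The sketch below uses invented daily data, and it groups by `index.to_period('M')` as a stand-in for the resample call so that it does not depend on the `kind` argument.

```python
import numpy as np
import pandas as pd

# Hypothetical daily counts covering all of 2011 and 2012.
days = pd.date_range("2011-01-01", "2012-12-31", freq="D")
toy = pd.DataFrame({"count": np.arange(len(days), dtype=float)}, index=days)

# One mean per calendar month of each year: 24 rows for two years.
per_year_month = toy.groupby(toy.index.to_period("M"))["count"].mean()

# One mean per month number, pooled across both years: 12 rows.
per_month_number = toy.groupby(toy.index.month)["count"].mean()

print(len(per_year_month), len(per_month_number))

# The pooled January value is the mean over the January days of both years.
january_days = toy[toy.index.month == 1]["count"]
print(per_month_number.loc[1], january_days.mean())
```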
h_bike = bike.groupby(bike.index.hour).mean()['count']
h_bike 
datetime
0     175.040000
1     159.666667
6     170.777778
7     332.275194
8     464.194611
9     256.773109
10    247.983051
11    273.677966
12    300.908587
13    306.273504
14    294.686217
15    302.322946
16    358.840314
17    493.831382
18    462.374101
19    361.031746
20    280.290419
21    227.334507
22    197.688776
23    180.289855
Name: count, dtype: float64
h_bike.plot(kind="bar", rot = 0)

season_bike = bike.groupby(bike.season).mean()['count']
season_bike 
season
1    272.279639
2    343.308545
3    348.337306
4    322.621935
Name: count, dtype: float64
season_bike.plot(kind = "bar",rot = 0)

temp_bike = bike.groupby([bike.temp]).mean()['count']
temp_bike.sample(10)
temp
27.06    346.248826
34.44    351.315789
36.90    347.000000
16.40    299.202128
4.92     220.857143
24.60    350.036885
11.48    256.326923
22.96    336.873303
1.64     180.000000
26.24    365.153257
Name: count, dtype: float64
temp_bike.plot()

wind_bike = bike.groupby(bike.windspeed).mean()['count']
wind_bike.sort_values(ascending=False).sample(10)
windspeed
43.9989    389.333333
40.9973    229.375000
32.9975    293.097561
39.0007    335.636364
26.0027    356.098485
47.9988    280.000000
22.0028    319.870056
11.0014    336.831826
7.0015     325.181614
15.0013    340.349904
Name: count, dtype: float64
wind_bike_sort = wind_bike.sort_values(ascending=False)
wind_bike_sort.head(20).plot(kind="bar",rot = 60)

weather_bike = bike.groupby(bike.weather).mean()['count']
weather_bike.plot(kind='bar',rot = 0)


