Time Series Data Operations in Python
Posted AI大数据与机器学习
import numpy as np
import pandas as pd
from pandas import Series,DataFrame
from datetime import datetime
t1=datetime(2018,1,1)
t1
datetime.datetime(2018, 1, 1, 0, 0)
# Build a list of datetime objects to use as a time index
date_list=[
datetime(2016,9,1),
datetime(2016,9,10),
datetime(2017,1,10),
datetime(2017,9,1),
datetime(2017,9,10)
]
date_list
[datetime.datetime(2016, 9, 1, 0, 0),
datetime.datetime(2016, 9, 10, 0, 0),
datetime.datetime(2017, 1, 10, 0, 0),
datetime.datetime(2017, 9, 1, 0, 0),
datetime.datetime(2017, 9, 10, 0, 0)]
s1=Series(np.random.randn(5),index=date_list) # build a Series indexed by those dates
s1
2016-09-01 -0.816429
2016-09-10 0.298250
2017-01-10 0.076052
2017-09-01 -0.930593
2017-09-10 1.118954
dtype: float64
s1.values
array([-0.81642881, 0.29825013, 0.07605179, -0.93059269, 1.11895431])
s1.index
DatetimeIndex(['2016-09-01', '2016-09-10', '2017-01-10', '2017-09-01',
'2017-09-10'],
dtype='datetime64[ns]', freq=None)
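# Method 1: select by integer position (.iloc is explicit; plain s1[1] is deprecated in recent pandas when the index is not integer-based)
s1.iloc[1]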
0.29825013475003143
# Method 2: select with a datetime object
s1[datetime(2016,9,10)]
0.29825013475003143
# Method 3: select with a compact date string
s1['20160910']
0.29825013475003143
# Method 4: select with a hyphenated date string
s1['2016-09-10']
0.29825013475003143
# Select all data for September 2016
s1['2016-09']
2016-09-01 -0.816429
2016-09-10 0.298250
dtype: float64
# Select all data for the year 2016
s1['2016']
2016-09-01 -0.816429
2016-09-10 0.298250
dtype: float64
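The selections above can also be written with explicit `.loc` and `.iloc` accessors, and partial date strings work as slice endpoints too. The following is a minimal sketch, assuming the same five dates as `date_list`. The names `s`, `dates`, `by_label`, `by_position`, `year_2017`, and `window` are illustrative and not part of the original notebook, and the values are fixed integers rather than random numbers so the results are reproducible.

```python
from datetime import datetime

import numpy as np
import pandas as pd

# Same five dates as date_list; values are 0.0 .. 4.0 (an assumption for reproducibility).
dates = [
    datetime(2016, 9, 1),
    datetime(2016, 9, 10),
    datetime(2017, 1, 10),
    datetime(2017, 9, 1),
    datetime(2017, 9, 10),
]
s = pd.Series(np.arange(5, dtype=float), index=dates)

# Explicit label-based and position-based access.
by_label = s.loc["2016-09-10"]   # exact date given as a string
by_position = s.iloc[1]          # second row, whatever the index type

# Partial-string selection: every row that falls in 2017.
year_2017 = s.loc["2017"]

# Label slices include both endpoints, and on a sorted index
# the endpoints do not have to be present in the index.
window = s.loc["2016-09-05":"2017-01-31"]

print(by_label, by_position)
print(year_2017)
print(window)
```

Using `.loc` for labels and `.iloc` for positions keeps the intent unambiguous, which matters most when an index could be read either way.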
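Generating a DatetimeIndex over a date range
# freq: step between timestamps, e.g. 'D' = day, 'W' = week, 'H' = hour, '5H' = every 5 hours
# start: first date of the range
# end: last date of the range
# periods: number of timestamps to generate (an alternative to giving both start and end)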
t_range=pd.date_range('2016-09-01','2016-12-30',freq='D')
t_range
DatetimeIndex(['2016-09-01', '2016-09-02', '2016-09-03', '2016-09-04',
'2016-09-05', '2016-09-06', '2016-09-07', '2016-09-08',
'2016-09-09', '2016-09-10',
...
'2016-12-21', '2016-12-22', '2016-12-23', '2016-12-24',
'2016-12-25', '2016-12-26', '2016-12-27', '2016-12-28',
'2016-12-29', '2016-12-30'],
dtype='datetime64[ns]', length=121, freq='D')
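To illustrate how `start`, `end`, `periods`, and `freq` interact, here is a short sketch that builds the same 121-day index in two ways and then a small index with 5-hour steps. The variable names are illustrative. The 5-hour step is passed as a `pd.Timedelta` rather than an alias string, which is a choice made for this sketch because the spelling of the hour alias differs between pandas versions.

```python
import pandas as pd

# start + end + freq: the end date is included when it falls on the grid.
by_end = pd.date_range(start="2016-09-01", end="2016-12-30", freq="D")

# start + periods + freq: request a fixed number of timestamps instead of an end date.
by_periods = pd.date_range(start="2016-09-01", periods=121, freq="D")

# A multiple of a base step, written as a Timedelta (5-hour spacing).
five_hourly = pd.date_range(start="2016-09-01", periods=4, freq=pd.Timedelta(hours=5))

print(len(by_end), by_end.equals(by_periods))
print(five_hourly)
```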
s1=Series(np.random.randn(len(t_range)),index=t_range)
s1
2016-09-01 -0.455936
2016-09-02 -0.139572
2016-09-03 0.026517
2016-09-04 0.869604
2016-09-05 0.118923
2016-09-06 -0.253617
2016-09-07 0.583654
2016-09-08 0.419779
2016-09-09 -0.026094
2016-09-10 -0.214967
2016-09-11 1.282340
2016-09-12 -0.935493
2016-09-13 -0.006994
2016-09-14 1.392700
2016-09-15 -1.981734
2016-09-16 1.470759
2016-09-17 -0.224829
2016-09-18 0.702895
2016-09-19 -1.921391
2016-09-20 0.063877
2016-09-21 0.969874
2016-09-22 -1.830443
2016-09-23 -3.101344
2016-09-24 -0.502835
2016-09-25 -0.577289
2016-09-26 0.212470
2016-09-27 1.288838
2016-09-28 0.792510
2016-09-29 1.755656
2016-09-30 0.741112
...
2016-12-01 0.363644
2016-12-02 -0.234677
2016-12-03 1.516533
2016-12-04 0.266701
2016-12-05 1.518905
2016-12-06 0.989958
2016-12-07 0.436439
2016-12-08 0.919553
2016-12-09 1.710792
2016-12-10 1.180212
2016-12-11 0.770745
2016-12-12 0.551608
2016-12-13 -2.450998
2016-12-14 0.160223
2016-12-15 0.849345
2016-12-16 0.340485
2016-12-17 0.903086
2016-12-18 -0.654804
2016-12-19 1.302723
2016-12-20 1.419518
2016-12-21 -0.112140
2016-12-22 -0.303501
2016-12-23 1.723859
2016-12-24 1.068970
2016-12-25 -0.049457
2016-12-26 0.759129
2016-12-27 -0.070325
2016-12-28 -0.392154
2016-12-29 1.327748
2016-12-30 0.427450
Freq: D, dtype: float64
s1['2016-09'].mean()
0.017298998699809685
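# Monthly resampling: mean of each calendar month, labelled by month end
# ('M' is the older alias spelling; pandas 2.2 and later spell it 'ME')
s1_month=s1.resample('M').mean()
s1_month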
2016-09-30 0.017299
2016-10-31 -0.322476
2016-11-30 -0.335975
2016-12-31 0.541319
Freq: M, dtype: float64
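As a cross-check on what monthly resampling computes, the sketch below compares it with a plain group-by on the calendar month and with the mean of a partial-string selection. It assumes a deterministic series (values 0, 1, 2, ...) in place of the random data, and it passes the rule as a `pd.offsets.MonthEnd()` object instead of an alias string. Both are choices made for this illustration, not something taken from the original notebook.

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2016-09-01", "2016-12-30", freq="D")
s = pd.Series(np.arange(len(idx), dtype=float), index=idx)

# Month-end resampling, with the rule given as an offset object.
monthly = s.resample(pd.offsets.MonthEnd()).mean()

# The same numbers, computed by grouping on the calendar month of each timestamp.
grouped = s.groupby(s.index.to_period("M")).mean()

# The September bucket equals the mean of the partial-string selection.
september = s.loc["2016-09"].mean()

print(monthly)
print(grouped)
print(september)
```

Each resampled bucket is labelled with the month-end date even when the data stop earlier, which is why the last label is 2016-12-31 although the series ends on 2016-12-30.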
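# ffill fills forward: each new hourly slot takes the most recent earlier observation, so every hour of 2016-09-01 repeats that day's 00:00 value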
s1.resample('H').ffill().head()
2016-09-01 00:00:00 -0.455936
2016-09-01 01:00:00 -0.455936
2016-09-01 02:00:00 -0.455936
2016-09-01 03:00:00 -0.455936
2016-09-01 04:00:00 -0.455936
Freq: H, dtype: float64
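# bfill fills backward: each new hourly slot takes the next later observation, so 01:00 through 23:00 on 2016-09-01 take the 2016-09-02 value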
s1.resample('H').bfill()
2016-09-01 00:00:00 -0.455936
2016-09-01 01:00:00 -0.139572
2016-09-01 02:00:00 -0.139572
2016-09-01 03:00:00 -0.139572
2016-09-01 04:00:00 -0.139572
2016-09-01 05:00:00 -0.139572
2016-09-01 06:00:00 -0.139572
2016-09-01 07:00:00 -0.139572
2016-09-01 08:00:00 -0.139572
2016-09-01 09:00:00 -0.139572
2016-09-01 10:00:00 -0.139572
2016-09-01 11:00:00 -0.139572
2016-09-01 12:00:00 -0.139572
2016-09-01 13:00:00 -0.139572
2016-09-01 14:00:00 -0.139572
2016-09-01 15:00:00 -0.139572
2016-09-01 16:00:00 -0.139572
2016-09-01 17:00:00 -0.139572
2016-09-01 18:00:00 -0.139572
2016-09-01 19:00:00 -0.139572
2016-09-01 20:00:00 -0.139572
2016-09-01 21:00:00 -0.139572
2016-09-01 22:00:00 -0.139572
2016-09-01 23:00:00 -0.139572
2016-09-02 00:00:00 -0.139572
2016-09-02 01:00:00 0.026517
2016-09-02 02:00:00 0.026517
2016-09-02 03:00:00 0.026517
2016-09-02 04:00:00 0.026517
2016-09-02 05:00:00 0.026517
...
2016-12-28 19:00:00 1.327748
2016-12-28 20:00:00 1.327748
2016-12-28 21:00:00 1.327748
2016-12-28 22:00:00 1.327748
2016-12-28 23:00:00 1.327748
2016-12-29 00:00:00 1.327748
2016-12-29 01:00:00 0.427450
2016-12-29 02:00:00 0.427450
2016-12-29 03:00:00 0.427450
2016-12-29 04:00:00 0.427450
2016-12-29 05:00:00 0.427450
2016-12-29 06:00:00 0.427450
2016-12-29 07:00:00 0.427450
2016-12-29 08:00:00 0.427450
2016-12-29 09:00:00 0.427450
2016-12-29 10:00:00 0.427450
2016-12-29 11:00:00 0.427450
2016-12-29 12:00:00 0.427450
2016-12-29 13:00:00 0.427450
2016-12-29 14:00:00 0.427450
2016-12-29 15:00:00 0.427450
2016-12-29 16:00:00 0.427450
2016-12-29 17:00:00 0.427450
2016-12-29 18:00:00 0.427450
2016-12-29 19:00:00 0.427450
2016-12-29 20:00:00 0.427450
2016-12-29 21:00:00 0.427450
2016-12-29 22:00:00 0.427450
2016-12-29 23:00:00 0.427450
2016-12-30 00:00:00 0.427450
Freq: H, dtype: float64
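The difference between forward and backward filling is easier to see on a tiny series. The sketch below assumes three daily values (1.0, 2.0, 3.0) and upsamples them to 12-hour steps. The names `daily`, `half_day`, `raw`, `forward`, and `backward` are illustrative, and the step is passed as a `pd.Timedelta` rather than an alias string.

```python
import pandas as pd

daily = pd.Series(
    [1.0, 2.0, 3.0],
    index=pd.date_range("2016-09-01", periods=3, freq="D"),
)

# Upsample to 12-hour steps.
half_day = pd.Timedelta(hours=12)

raw = daily.resample(half_day).asfreq()      # new slots are left as NaN
forward = daily.resample(half_day).ffill()   # new slots copy the previous observation
backward = daily.resample(half_day).bfill()  # new slots copy the next observation

print(pd.DataFrame({"raw": raw, "ffill": forward, "bfill": backward}))
```

In both cases the original midnight observations are unchanged; only the newly created slots between them are filled.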