Time Series
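Base frequencies for time series
Alias | Offset type | Description |
D | Day | Every calendar day |
B | BusinessDay | Every business day |
H | Hour | Every hour |
T or min | Minute | Every minute |
S | Second | Every second |
L or ms | Milli | Every millisecond |
U | Micro | Every microsecond |
M | MonthEnd | Last calendar day of each month |
BM | BusinessMonthEnd | Last business day of each month |
W-MON, W-TUE, ... | Week | Weekly, on the given day of the week (MON, TUE, ...) |
WOM-1MON, WOM-2MON, ... | WeekOfMonth | The given weekday in the given week of each month; WOM-2MON is the second Monday of each month |
Q-JAN, Q-FEB, ... | QuarterEnd | Quarterly, on the last calendar day of the last month of each quarter, for a year ending in the given month |
BQ-JAN, BQ-FEB, ... | BQuarterEnd | Quarterly, on the last business day of the last month of each quarter, for a year ending in the given month |
QS-JAN, QS-FEB, ... | QuarterBegin | Quarterly, on the first calendar day of each quarter, with quarters anchored so that one starts in the given month |
BQS-JAN, BQS-FEB, ... | BQuarterBegin | Quarterly, on the first business day of each quarter, with quarters anchored so that one starts in the given month |
A-JAN, A-FEB, ... | YearEnd | Yearly, on the last calendar day of the given month |
BA-JAN, BA-FEB, ... | BYearEnd | Yearly, on the last business day of the given month |
AS-JAN, AS-FEB, ... | YearBegin | Yearly, on the first calendar day of the given month |
BAS-JAN, BAS-FEB, ... | BYearBegin | Yearly, on the first business day of the given month |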
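To make the table concrete, here is a minimal sketch that resolves a few aliases to their offset objects with `to_offset` and generates dates from one of them. The aliases and dates chosen are illustrative only. Newer pandas releases have renamed some aliases (for example `M` became `ME`), so the sketch sticks to aliases that have stayed stable.

```python
import pandas as pd
from pandas.tseries.frequencies import to_offset
from pandas.tseries.offsets import BusinessDay, Day, Week, WeekOfMonth

# Resolve alias strings to the offset objects they stand for.
day = to_offset("D")
bday = to_offset("B")
weekly = to_offset("W-MON")
wom = to_offset("WOM-2MON")
print(type(day).__name__, type(bday).__name__,
      type(weekly).__name__, type(wom).__name__)

# Second Monday of each month, starting from January 2016.
second_mondays = pd.date_range("2016-01-01", periods=3, freq="WOM-2MON")
print(second_mondays)
```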
Time series basics
Basic imports
In [1]: import numpy as np
In [2]: import pandas as pd
In [3]: import matplotlib.pyplot as plt
In [4]: import datetime as dt
In [5]: from pandas import Series,DataFrame
In [6]: from datetime import datetime
1. Time series basics
In [9]: dates = [datetime(2011,1,2),datetime(2011,1,5),datetime(2011,1,7),
   ...:          datetime(2011,1,8),datetime(2011,1,10),datetime(2011,1,12)]
In [10]: ts = Series(np.random.randn(6),index = dates) # a time series object
In [11]: ts
Out[11]:
2011-01-02   -0.535451
2011-01-05    2.177724
2011-01-07    1.894591
2011-01-08    0.163426
2011-01-10    1.171710
2011-01-12    0.131111
dtype: float64
In [12]: type(ts)
Out[12]: pandas.core.series.Series
In [13]: ts.index
Out[13]:
DatetimeIndex(['2011-01-02', '2011-01-05', '2011-01-07', '2011-01-08',
               '2011-01-10', '2011-01-12'],
              dtype='datetime64[ns]', freq=None)
In [14]: ts + ts[::2] # arithmetic aligns on the dates; unmatched dates give NaN
Out[14]:
2011-01-02   -1.070903
2011-01-05         NaN
2011-01-07    3.789183
2011-01-08         NaN
2011-01-10    2.343421
2011-01-12         NaN
dtype: float64
In [15]: ts.index.dtype
Out[15]: dtype('<M8[ns]')
In [16]: ts.index[0]
Out[16]: Timestamp('2011-01-02 00:00:00')
In [17]: stamp = ts.index[2] # a Timestamp object
In [18]: ts[stamp],'\n'
Out[18]: (1.8945912547163455, '\n')
In [19]: ts['1/10/2011'] # indexing with a date string
Out[19]: 1.1717104597202732
In [24]: ts[datetime(2011,1,7):] # slicing from a date
Out[24]:
2011-01-07    1.894591
2011-01-08    0.163426
2011-01-10    1.171710
2011-01-12    0.131111
dtype: float64
In [25]: ts.truncate(after = '1/9/2011') # drop everything after the given date
Out[25]:
2011-01-02   -0.535451
2011-01-05    2.177724
2011-01-07    1.894591
2011-01-08    0.163426
dtype: float64
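The session above uses random values, so its numbers cannot be reproduced. The following sketch repeats the same operations (alignment, string indexing, slicing, truncation) with fixed values 0.0 to 5.0, which are an assumption made purely so the results are predictable.

```python
from datetime import datetime

import numpy as np
import pandas as pd

dates = [datetime(2011, 1, 2), datetime(2011, 1, 5), datetime(2011, 1, 7),
         datetime(2011, 1, 8), datetime(2011, 1, 10), datetime(2011, 1, 12)]
# Fixed values instead of np.random.randn, so the results are reproducible.
ts = pd.Series(np.arange(6, dtype=float), index=dates)

# Arithmetic aligns on the index: dates missing from ts[::2] become NaN.
aligned_sum = ts + ts[::2]

# Select one value by date string, slice from a date, truncate after a date.
value = ts["1/10/2011"]
tail = ts[datetime(2011, 1, 7):]
head = ts.truncate(after="1/9/2011")

print(aligned_sum)
print(value, len(tail), len(head))
```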
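2. Date ranges
Basic syntax for generating a date range:
dates = pd.date_range('1/1/2000',periods = 10,freq = 'W-WED')
dates = pd.date_range('1/1/2000','2/2/2000',freq = '10h')
dates = pd.date_range(start = '1/1/2000',periods = 10)
Selecting by date, for a Series ts indexed by such dates: ts.loc['2000-01'] (the .ix indexer is no longer available in pandas)
freq sets the frequency, given as an alias string or a date offset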
In [43]: index = pd.date_range('4/1/2015','5/1/2015')
In [44]: index
Out[44]:
DatetimeIndex(['2015-04-01', '2015-04-02', '2015-04-03', '2015-04-04',
               '2015-04-05', '2015-04-06', '2015-04-07', '2015-04-08',
               '2015-04-09', '2015-04-10', '2015-04-11', '2015-04-12',
               '2015-04-13', '2015-04-14', '2015-04-15', '2015-04-16',
               '2015-04-17', '2015-04-18', '2015-04-19', '2015-04-20',
               '2015-04-21', '2015-04-22', '2015-04-23', '2015-04-24',
               '2015-04-25', '2015-04-26', '2015-04-27', '2015-04-28',
               '2015-04-29', '2015-04-30', '2015-05-01'],
              dtype='datetime64[ns]', freq='D')
In [45]: pd.date_range('1/1/2016',periods = 31)
Out[45]:
DatetimeIndex(['2016-01-01', '2016-01-02', '2016-01-03', '2016-01-04',
               '2016-01-05', '2016-01-06', '2016-01-07', '2016-01-08',
               '2016-01-09', '2016-01-10', '2016-01-11', '2016-01-12',
               '2016-01-13', '2016-01-14', '2016-01-15', '2016-01-16',
               '2016-01-17', '2016-01-18', '2016-01-19', '2016-01-20',
               '2016-01-21', '2016-01-22', '2016-01-23', '2016-01-24',
               '2016-01-25', '2016-01-26', '2016-01-27', '2016-01-28',
               '2016-01-29', '2016-01-30', '2016-01-31'],
              dtype='datetime64[ns]', freq='D')
In [46]: pd.date_range('12/18/2015','1/1/2016',freq = 'BM')
Out[46]: DatetimeIndex(['2015-12-31'], dtype='datetime64[ns]', freq='BM')
In [47]: pd.date_range('5/2/2015 12:12:12',periods = 5)
Out[47]:
DatetimeIndex(['2015-05-02 12:12:12', '2015-05-03 12:12:12',
               '2015-05-04 12:12:12', '2015-05-05 12:12:12',
               '2015-05-06 12:12:12'],
              dtype='datetime64[ns]', freq='D')
In [48]: pd.date_range('5/2/2015 12:12:12',periods = 5,normalize = True)
Out[48]:
DatetimeIndex(['2015-05-02', '2015-05-03', '2015-05-04', '2015-05-05',
               '2015-05-06'],
              dtype='datetime64[ns]', freq='D')
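The calls in this session can be condensed into one short script. This is a sketch using the same dates as above; passing `pd.offsets.BMonthEnd()` instead of the `'BM'` string is a substitution made here because the string alias has been renamed in newer pandas releases.

```python
import pandas as pd

# Start and end: one entry per calendar day, both endpoints included.
april = pd.date_range("2015-04-01", "2015-05-01")

# Start plus a number of periods (daily by default).
january = pd.date_range("2016-01-01", periods=31)

# An offset object can be passed as freq: last business day of each month.
bm = pd.date_range("2015-12-18", "2016-01-01", freq=pd.offsets.BMonthEnd())

# normalize=True resets the time of day to midnight.
normalized = pd.date_range("2015-05-02 12:12:12", periods=5, normalize=True)

print(len(april), january[-1], list(bm), normalized[0])
```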
3. Time series with duplicate indices
In [31]: dates = pd.DatetimeIndex(['1/1/2000','1/2/2000','1/2/2000','1/2/2000','1/3/2000'])
In [32]: dup_ts = Series(np.arange(5),index = dates)
In [33]: dup_ts
Out[33]:
2000-01-01    0
2000-01-02    1
2000-01-02    2
2000-01-02    3
2000-01-03    4
dtype: int32
In [34]: dup_ts.index.is_unique
Out[34]: False
In [35]: dup_ts['1/2/2000'] # a duplicated date returns a slice, not a scalar
Out[35]:
2000-01-02    1
2000-01-02    2
2000-01-02    3
dtype: int32
In [36]: grouped = dup_ts.groupby(level = 0) # group on the index itself
In [37]: grouped
Out[37]: <pandas.core.groupby.SeriesGroupBy object at 0x00000000082FA5C0>
In [38]: grouped.count()
Out[38]:
2000-01-01    1
2000-01-02    3
2000-01-03    1
dtype: int64
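Grouping on `level = 0` is also a convenient way to collapse duplicate timestamps into one row each. A minimal sketch with the same index as above; taking the mean is just one possible aggregation, chosen here for illustration.

```python
import numpy as np
import pandas as pd

dates = pd.DatetimeIndex(["2000-01-01", "2000-01-02", "2000-01-02",
                          "2000-01-02", "2000-01-03"])
dup_ts = pd.Series(np.arange(5), index=dates)

# Group on the index (level 0) and aggregate each group of duplicates.
counts = dup_ts.groupby(level=0).count()
means = dup_ts.groupby(level=0).mean()

print(counts)
print(means)
```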
4. Sampling at a fixed frequency
In [39]: dates = [datetime(2011,1,2),datetime(2011,1,5),datetime(2011,1,7),
    ...:          datetime(2011,1,8),datetime(2011,1,10),datetime(2011,1,12)]
In [40]: ts = Series(np.random.randn(6),index = dates)
In [42]: ts.resample('D')
Out[42]: DatetimeIndexResampler [freq=<Day>, axis=0, closed=left, label=left, convention=start, base=0]
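The `ts.resample('D')` call above only returns a resampler object; a method has to be called on it to obtain data. The sketch below, with fixed values assumed for reproducibility, converts the irregular series to a daily one, first leaving the missing days as NaN and then forward-filling them.

```python
from datetime import datetime

import numpy as np
import pandas as pd

dates = [datetime(2011, 1, 2), datetime(2011, 1, 5), datetime(2011, 1, 7),
         datetime(2011, 1, 8), datetime(2011, 1, 10), datetime(2011, 1, 12)]
ts = pd.Series(np.arange(6, dtype=float), index=dates)

# One row per calendar day; days absent from ts are NaN.
daily = ts.resample("D").asfreq()

# Same daily index, with gaps filled from the last known value.
daily_filled = ts.resample("D").ffill()

print(daily)
print(daily_filled)
```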
5. Frequencies and date offsets
In [52]: from pandas.tseries.offsets import Hour,Minute
In [53]: hour = Hour()
In [54]: hour
Out[54]: <Hour>
In [55]: Hour(4)
Out[55]: <4 * Hours>
In [56]: pd.date_range('1/1/2016','1/2/2016',freq = '2h')
Out[56]:
DatetimeIndex(['2016-01-01 00:00:00', '2016-01-01 02:00:00',
               '2016-01-01 04:00:00', '2016-01-01 06:00:00',
               '2016-01-01 08:00:00', '2016-01-01 10:00:00',
               '2016-01-01 12:00:00', '2016-01-01 14:00:00',
               '2016-01-01 16:00:00', '2016-01-01 18:00:00',
               '2016-01-01 20:00:00', '2016-01-01 22:00:00',
               '2016-01-02 00:00:00'],
              dtype='datetime64[ns]', freq='2H')
In [57]: Hour(1) + Minute(30)
Out[57]: <90 * Minutes>
In [58]: pd.date_range('1/1/2016',periods = 5,freq = '1h30min')
Out[58]:
DatetimeIndex(['2016-01-01 00:00:00', '2016-01-01 01:30:00',
               '2016-01-01 03:00:00', '2016-01-01 04:30:00',
               '2016-01-01 06:00:00'],
              dtype='datetime64[ns]', freq='90T')
In [59]: pd.date_range('1/1/2016','3/1/2016',freq = 'WOM-3FRI') # third Friday of each month
Out[59]: DatetimeIndex(['2016-01-15', '2016-02-19'], dtype='datetime64[ns]', freq='WOM-3FRI')
In [60]: pd.date_range('1/1/2016','9/1/2016',freq = 'WOM-3FRI')
Out[60]:
DatetimeIndex(['2016-01-15', '2016-02-19', '2016-03-18', '2016-04-15',
               '2016-05-20', '2016-06-17', '2016-07-15', '2016-08-19'],
              dtype='datetime64[ns]', freq='WOM-3FRI')
In [73]: from pandas.tseries.offsets import Hour,Minute,Day,MonthEnd
# now is not defined in the session as posted; the outputs below are consistent with now = datetime(2011,11,29)
In [74]: now + Day(3)
Out[74]: Timestamp('2011-12-02 00:00:00')
In [75]: now + MonthEnd()
Out[75]: Timestamp('2011-11-30 00:00:00')
In [76]: now + MonthEnd(2)
Out[76]: Timestamp('2011-12-31 00:00:00')
In [77]: offset = MonthEnd()
In [78]: offset.rollforward(now) # roll forward to the next month end
Out[78]: Timestamp('2011-11-30 00:00:00')
In [79]: offset.rollback(now) # roll back to the previous month end
Out[79]: Timestamp('2011-10-31 00:00:00')
In [80]: ts = Series(np.random.randn(5),index = pd.date_range('1/15/2000',periods = 5,freq = '4d'))
In [81]: ts
Out[81]:
2000-01-15   -0.518612
2000-01-19    0.749769
2000-01-23   -1.020916
2000-01-27   -1.164565
2000-01-31    0.695788
Freq: 4D, dtype: float64
In [82]: ts.groupby(offset.rollforward)
Out[82]: <pandas.core.groupby.SeriesGroupBy object at 0x0000000008495D68>
In [83]: ts.groupby(offset.rollforward).mean()
Out[83]:
2000-01-31   -0.251707
dtype: float64
In [84]: ts.resample('M').mean() # same result; how = 'mean' is deprecated in favor of .resample(...).mean()
Out[84]:
2000-01-31   -0.251707
Freq: M, dtype: float64
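The offset arithmetic above can be checked with a self-contained sketch. The value `now = datetime(2011, 11, 29)` is an assumption inferred from the printed outputs, since the session does not show where `now` is defined; the series values are fixed numbers chosen for reproducibility, and one extra period is added so that two months appear in the grouped result.

```python
from datetime import datetime

import numpy as np
import pandas as pd
from pandas.tseries.offsets import Day, MonthEnd

now = datetime(2011, 11, 29)  # assumed value, consistent with the outputs above

plus_three_days = now + Day(3)
next_month_end = now + MonthEnd()      # first month end after now
second_month_end = now + MonthEnd(2)   # the month end after that

offset = MonthEnd()
rolled_forward = offset.rollforward(now)  # next month end (or now if already one)
rolled_back = offset.rollback(now)        # previous month end

# Grouping by rollforward buckets every date under its month end.
ts = pd.Series(np.arange(6, dtype=float),
               index=pd.date_range("2000-01-15", periods=6, freq="4D"))
monthly_mean = ts.groupby(offset.rollforward).mean()

print(plus_three_days, next_month_end, second_month_end)
print(rolled_forward, rolled_back)
print(monthly_mean)
```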
6. Shifting data forward or backward along the time axis
In [61]: ts = Series(np.random.randn(4),index = pd.date_range('1/1/2016',periods = 4,freq = 'M'))
In [62]: ts
Out[62]:
2016-01-31   -2.002437
2016-02-29   -1.000022
2016-03-31    1.442409
2016-04-30   -0.578137
Freq: M, dtype: float64
In [63]: ts.shift(2) # values move forward, the index stays put
Out[63]:
2016-01-31         NaN
2016-02-29         NaN
2016-03-31   -2.002437
2016-04-30   -1.000022
Freq: M, dtype: float64
In [64]: ts.shift(-2)
Out[64]:
2016-01-31    1.442409
2016-02-29   -0.578137
2016-03-31         NaN
2016-04-30         NaN
Freq: M, dtype: float64
In [65]: ts / ts.shift(1)
Out[65]:
2016-01-31         NaN
2016-02-29    0.499402
2016-03-31   -1.442377
2016-04-30   -0.400814
Freq: M, dtype: float64
In [66]: ts / ts.shift(1)-1 # period-over-period change
Out[66]:
2016-01-31         NaN
2016-02-29   -0.500598
2016-03-31   -2.442377
2016-04-30   -1.400814
Freq: M, dtype: float64
In [67]: ts.shift(2,freq = 'M') # with freq, the index moves and the values stay put
Out[67]:
2016-03-31   -2.002437
2016-04-30   -1.000022
2016-05-31    1.442409
2016-06-30   -0.578137
Freq: M, dtype: float64
In [68]: ts.shift(3,freq = 'D')
Out[68]:
2016-02-03   -2.002437
2016-03-03   -1.000022
2016-04-03    1.442409
2016-05-03   -0.578137
dtype: float64
In [69]: ts.shift(1,freq = '3D')
Out[69]:
2016-02-03   -2.002437
2016-03-03   -1.000022
2016-04-03    1.442409
2016-05-03   -0.578137
dtype: float64
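The difference between shifting the values and shifting the index is easier to see with round numbers. In this sketch the series grows by 10% each month; the values and the use of a `MonthEnd()` offset object in place of the `'M'` string are choices made here for illustration.

```python
import numpy as np
import pandas as pd

index = pd.date_range("2016-01-31", periods=4, freq=pd.offsets.MonthEnd())
ts = pd.Series([100.0, 110.0, 121.0, 133.1], index=index)

# Without freq: values move one step later, the index is unchanged.
lagged = ts.shift(1)

# Period-over-period change, as in ts / ts.shift(1) - 1 above.
change = ts / ts.shift(1) - 1

# With freq: the index moves, the values are unchanged.
moved_months = ts.shift(2, freq=pd.offsets.MonthEnd())
moved_days = ts.shift(3, freq="D")

print(change)
print(moved_months)
print(moved_days)
```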
Time zones and periods
1. Localizing and converting times
In [86]: rng = pd.date_range('3/9/2012 9:30',periods = 6,freq = 'D')
In [87]: ts = Series(np.random.randn(len(rng)),index = rng)
In [88]: ts
Out[88]:
2012-03-09 09:30:00    0.611651
2012-03-10 09:30:00   -0.343742
2012-03-11 09:30:00    0.082115
2012-03-12 09:30:00    0.560457
2012-03-13 09:30:00   -2.086978
2012-03-14 09:30:00    0.395750
Freq: D, dtype: float64
In [89]: ts_utc = ts.tz_localize('US/Pacific') # despite the name, this is localized to US/Pacific, not UTC
In [90]: ts_utc
Out[90]:
2012-03-09 09:30:00-08:00    0.611651
2012-03-10 09:30:00-08:00   -0.343742
2012-03-11 09:30:00-07:00    0.082115
2012-03-12 09:30:00-07:00    0.560457
2012-03-13 09:30:00-07:00   -2.086978
2012-03-14 09:30:00-07:00    0.395750
Freq: D, dtype: float64
In [91]: ts_utc.tz_convert('US/Eastern')
Out[91]:
2012-03-09 12:30:00-05:00    0.611651
2012-03-10 12:30:00-05:00   -0.343742
2012-03-11 12:30:00-04:00    0.082115
2012-03-12 12:30:00-04:00    0.560457
2012-03-13 12:30:00-04:00   -2.086978
2012-03-14 12:30:00-04:00    0.395750
Freq: D, dtype: float64
In [92]: ts1 = ts[:7].tz_localize('Europe/London')
In [93]: ts1
Out[93]:
2012-03-09 09:30:00+00:00    0.611651
2012-03-10 09:30:00+00:00   -0.343742
2012-03-11 09:30:00+00:00    0.082115
2012-03-12 09:30:00+00:00    0.560457
2012-03-13 09:30:00+00:00   -2.086978
2012-03-14 09:30:00+00:00    0.395750
Freq: D, dtype: float64
In [95]: ts2 = ts1[2:].tz_convert('Europe/Moscow')
In [96]: ts2
Out[96]:
2012-03-11 13:30:00+04:00    0.082115
2012-03-12 13:30:00+04:00    0.560457
2012-03-13 13:30:00+04:00   -2.086978
2012-03-14 13:30:00+04:00    0.395750
Freq: D, dtype: float64
In [97]: result = ts1 + ts2 # arithmetic across time zones; the result is in UTC
In [98]: result
Out[98]:
2012-03-09 09:30:00+00:00         NaN
2012-03-10 09:30:00+00:00         NaN
2012-03-11 09:30:00+00:00    0.164230
2012-03-12 09:30:00+00:00    1.120913
2012-03-13 09:30:00+00:00   -4.173957
2012-03-14 09:30:00+00:00    0.791499
Freq: D, dtype: float64
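A compact way to see what `tz_localize` and `tz_convert` each do is to inspect the hour before and after. The sketch below uses the same London and Moscow zones as the session, with fixed values assumed for reproducibility; it relies on the time zone database that ships with a normal pandas installation.

```python
import numpy as np
import pandas as pd

rng = pd.date_range("2012-03-09 09:30", periods=3, freq="D")
ts = pd.Series(np.arange(3, dtype=float), index=rng)

# tz_localize attaches a zone to naive timestamps; the wall-clock time is kept.
ts_london = ts.tz_localize("Europe/London")

# tz_convert expresses the same instants in another zone.
ts_moscow = ts_london.tz_convert("Europe/Moscow")

# Arithmetic across zones aligns on the instant; the result index is in UTC.
result = ts_london + ts_moscow

print(ts_london.index[0], ts_moscow.index[0])
print(result)
```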
2. Period arithmetic
In [9]: p = pd.Period('2016',freq = 'A-DEC')
In [10]: p
Out[10]: Period('2016', 'A-DEC')
In [11]: p+5
Out[11]: Period('2021', 'A-DEC')
In [12]: rng = pd.Period('2015',freq = 'A-DEC') - p
In [13]: rng
Out[13]: -1L
In [14]: rng1 = pd.period_range('1/1/2000','6/30/2000',freq = 'M')
In [15]: rng1
Out[15]: PeriodIndex(['2000-01', '2000-02', '2000-03', '2000-04', '2000-05', '2000-06'], dtype='int64', freq='M')
In [16]: type(rng1)
Out[16]: pandas.tseries.period.PeriodIndex
In [18]: Series(np.random.randn(6),index = rng1)
Out[18]:
2000-01   -0.147543
2000-02    1.232261
2000-03    0.703814
2000-04    1.717671
2000-05    0.478153
2000-06   -0.291470
Freq: M, dtype: float64
In [19]: values = ['2001Q3','2002Q2','2003Q1']
In [20]: pd.PeriodIndex(values,freq = 'Q-DEC')
Out[20]: PeriodIndex(['2001Q3', '2002Q2', '2003Q1'], dtype='int64', freq='Q-DEC')
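Here is a short sketch of the same period operations. It uses a monthly period rather than the annual `A-DEC` one from the session, an assumption made only because the monthly and quarterly period aliases behave the same across pandas versions.

```python
import numpy as np
import pandas as pd

# Adding an integer moves the period by that many units of its frequency.
p = pd.Period("2016-03", freq="M")
later = p + 5

# A range of monthly periods used as a Series index.
rng = pd.period_range("2000-01", "2000-06", freq="M")
ts = pd.Series(np.arange(6), index=rng)
march_value = ts[pd.Period("2000-03", freq="M")]

# Quarterly periods parsed from strings.
quarters = pd.PeriodIndex(["2001Q3", "2002Q2", "2003Q1"], freq="Q-DEC")

print(later, march_value)
print(quarters)
```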
3. Period frequency conversion
In [21]: p = pd.Period('2007',freq = 'A-DEC')
In [22]: p
Out[22]: Period('2007', 'A-DEC')
In [23]: p.asfreq('M',how = 'start')
Out[23]: Period('2007-01', 'M')
In [24]: p.asfreq('M',how = 'end')
Out[24]: Period('2007-12', 'M')
In [25]: p = pd.Period('2007-08','M')
In [26]: p.asfreq('A-JUN') # August 2007 falls in the fiscal year that ends in June 2008
Out[26]: Period('2008', 'A-JUN')
In [27]: rng = pd.period_range('2006','2009',freq = 'A-DEC')
In [28]: ts = Series(np.random.randn(len(rng)),index = rng)
In [29]: ts
Out[29]:
2006    0.415646
2007    0.206330
2008   -0.495015
2009   -0.665069
Freq: A-DEC, dtype: float64
In [30]: ts.asfreq('M',how = 'start')
Out[30]:
2006-01    0.415646
2007-01    0.206330
2008-01   -0.495015
2009-01   -0.665069
Freq: M, dtype: float64
In [31]: ts.asfreq('B',how = 'end')
Out[31]:
2006-12-29    0.415646
2007-12-31    0.206330
2008-12-31   -0.495015
2009-12-31   -0.665069
Freq: B, dtype: float64
# Quarterly frequency conversion
In [32]: p = pd.Period('2012Q4',freq = 'Q-JAN')
In [33]: p
Out[33]: Period('2012Q4', 'Q-JAN')
In [34]: p.asfreq('D','start')
Out[34]: Period('2011-11-01', 'D')
In [35]: p.asfreq('D','end')
Out[35]: Period('2012-01-31', 'D')
In [36]: rng = pd.period_range('2011Q3','2012Q4',freq = 'Q-JAN')
In [37]: rng.to_timestamp()
Out[37]:
DatetimeIndex(['2010-08-01', '2010-11-01', '2011-02-01', '2011-05-01',
               '2011-08-01', '2011-11-01'],
              dtype='datetime64[ns]', freq='QS-NOV')
In [38]: new_rng = (rng.asfreq('B','e') - 1).asfreq('T','s') + 16 * 60 # 4 PM on the second-to-last business day of each quarter
In [39]: new_rng
Out[39]:
PeriodIndex(['2010-10-28 16:00', '2011-01-28 16:00', '2011-04-28 16:00',
             '2011-07-28 16:00', '2011-10-28 16:00', '2012-01-30 16:00'],
            dtype='int64', freq='T')
In [40]: new_rng.to_timestamp()
Out[40]:
DatetimeIndex(['2010-10-28 16:00:00', '2011-01-28 16:00:00',
               '2011-04-28 16:00:00', '2011-07-28 16:00:00',
               '2011-10-28 16:00:00', '2012-01-30 16:00:00'],
              dtype='datetime64[ns]', freq=None)
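The quarterly part of the session is the least intuitive, because with `Q-JAN` the fiscal year ends in January, so 2012Q4 runs from November 2011 through January 2012. The sketch below reproduces just those conversions; the dates are the ones printed above.

```python
import pandas as pd

# Fiscal year ending in January: 2012Q4 covers Nov 2011 through Jan 2012.
p = pd.Period("2012Q4", freq="Q-JAN")
first_day = p.asfreq("D", how="start")
last_day = p.asfreq("D", how="end")
first_month = p.asfreq("M", how="start")

# The same conversion applied to a whole range of quarters.
rng = pd.period_range("2011Q3", "2012Q4", freq="Q-JAN")
starts = rng.to_timestamp(how="start")

print(first_day, last_day, first_month)
print(starts)
```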
4. Converting a timestamp index to a period index
In [41]: rng = pd.date_range('1/1/2015',periods = 3,freq = 'M')
In [42]: ts = Series(np.random.randn(3),index = rng)
In [43]: ts
Out[43]:
2015-01-31    0.529904
2015-02-28   -0.349043
2015-03-31    0.046308
Freq: M, dtype: float64
In [44]: ts.to_period()
Out[44]:
2015-01    0.529904
2015-02   -0.349043
2015-03    0.046308
Freq: M, dtype: float64
In [45]: rng = pd.date_range('1/29/2000',periods = 6,freq = 'D')
In [46]: ts2 = Series(np.random.randn(6),index = rng)
In [47]: ts2
Out[47]:
2000-01-29    1.462543
2000-01-30    0.486943
2000-01-31    0.477313
2000-02-01   -1.160804
2000-02-02    0.306688
2000-02-03    0.016622
Freq: D, dtype: float64
In [48]: ts2.to_period('M') # several days map to the same month, so periods repeat
Out[48]:
2000-01    1.462543
2000-01    0.486943
2000-01    0.477313
2000-02   -1.160804
2000-02    0.306688
2000-02    0.016622
Freq: M, dtype: float64
In [49]: pts = ts2.to_period()
In [50]: pts
Out[50]:
2000-01-29    1.462543
2000-01-30    0.486943
2000-01-31    0.477313
2000-02-01   -1.160804
2000-02-02    0.306688
2000-02-03    0.016622
Freq: D, dtype: float64
In [51]: pts.to_timestamp(how = 'end')
Out[51]:
2000-01-29    1.462543
2000-01-30    0.486943
2000-01-31    0.477313
2000-02-01   -1.160804
2000-02-02    0.306688
2000-02-03    0.016622
Freq: D, dtype: float64
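The round trip between timestamps and periods can be sketched as follows, with fixed values assumed for reproducibility. The sketch converts back with the default `how='start'`; the session above uses `how='end'`, whose printed result may differ between pandas versions.

```python
import numpy as np
import pandas as pd

rng = pd.date_range("2000-01-29", periods=6, freq="D")
ts2 = pd.Series(np.arange(6, dtype=float), index=rng)

# Converting to a coarser frequency produces repeated periods.
monthly = ts2.to_period("M")

# With no argument, the frequency is taken from the index (daily here).
daily = ts2.to_period()

# Back to timestamps, using the start of each period.
restored = daily.to_timestamp()

print(monthly)
print(restored)
```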
Resampling
1. Resampling
# 1. OHLC resampling
In [63]: rng = pd.date_range('1/1/2000',periods = 12,freq = 'T')
In [64]: ts = Series(np.random.randn(12),index = rng)
In [65]: ts
Out[65]:
2000-01-01 00:00:00   -0.975897
2000-01-01 00:01:00   -0.817074
2000-01-01 00:02:00   -0.438881
2000-01-01 00:03:00   -1.852057
2000-01-01 00:04:00    0.869463
2000-01-01 00:05:00    0.837448
2000-01-01 00:06:00    1.847643
2000-01-01 00:07:00    0.653615
2000-01-01 00:08:00    0.065392
2000-01-01 00:09:00    0.411093
2000-01-01 00:10:00   -1.184392
2000-01-01 00:11:00    0.523688
Freq: T, dtype: float64
In [66]: ts.resample('5min').ohlc() # how = 'ohlc' is deprecated in favor of .resample(...).ohlc()
Out[66]:
                         open      high       low     close
2000-01-01 00:00:00 -0.975897  0.869463 -1.852057  0.869463
2000-01-01 00:05:00  0.837448  1.847643  0.065392  0.411093
2000-01-01 00:10:00 -1.184392  0.523688 -1.184392  0.523688
# 2. Resampling with groupby
In [67]: rng = pd.date_range('1/1/2000',periods = 100,freq = 'D')
In [68]: ts = Series(np.arange(100),index = rng)
In [69]: ts.groupby(lambda x:x.month).mean()
Out[69]:
1    15
2    45
3    75
4    95
dtype: int32
In [70]: ts.groupby(lambda x:x.weekday).mean()
Out[70]:
0    47.5
1    48.5
2    49.5
3    50.5
4    51.5
5    49.0
6    50.0
dtype: float64
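With sequential values instead of random ones, the open, high, low and close of each five-minute bin can be read off by eye. This sketch assumes the values 0 to 11 for the minute data; the monthly groupby reuses the 0 to 99 series from the session.

```python
import numpy as np
import pandas as pd

# Twelve one-minute observations with values 0..11.
rng = pd.date_range("2000-01-01", periods=12, freq="min")
ts = pd.Series(np.arange(12, dtype=float), index=rng)

# Each 5-minute bin reports its first, max, min and last value.
bars = ts.resample("5min").ohlc()

# Grouping by a function of the timestamp is an alternative to resample.
daily = pd.Series(np.arange(100),
                  index=pd.date_range("2000-01-01", periods=100, freq="D"))
by_month = daily.groupby(lambda x: x.month).mean()

print(bars)
print(by_month)
```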
# Resampling with periods (frame is the monthly, period-indexed DataFrame created in In [77] of the next subsection)
In [79]: annual_frame = frame.resample('A-DEC').mean() # how = 'mean' is deprecated in favor of .resample(...).mean()
In [80]: annual_frame
Out[80]:
      Colorado     Texas  New York      Ohio
2000  0.031121  0.267223 -0.328301  0.592017
2001  0.441272 -0.115328  0.073894  0.094406
In [81]: annual_frame.resample('Q-DEC').ffill() # fill_method = 'ffill' is deprecated in favor of .resample(...).ffill()
Out[81]:
        Colorado     Texas  New York      Ohio
2000Q1  0.031121  0.267223 -0.328301  0.592017
2000Q2  0.031121  0.267223 -0.328301  0.592017
2000Q3  0.031121  0.267223 -0.328301  0.592017
2000Q4  0.031121  0.267223 -0.328301  0.592017
2001Q1  0.441272 -0.115328  0.073894  0.094406
2001Q2  0.441272 -0.115328  0.073894  0.094406
2001Q3  0.441272 -0.115328  0.073894  0.094406
2001Q4  0.441272 -0.115328  0.073894  0.094406
In [82]: annual_frame.resample('Q-DEC',convention = 'start').ffill()
Out[82]:
        Colorado     Texas  New York      Ohio
2000Q1  0.031121  0.267223 -0.328301  0.592017
2000Q2  0.031121  0.267223 -0.328301  0.592017
2000Q3  0.031121  0.267223 -0.328301  0.592017
2000Q4  0.031121  0.267223 -0.328301  0.592017
2001Q1  0.441272 -0.115328  0.073894  0.094406
2001Q2  0.441272 -0.115328  0.073894  0.094406
2001Q3  0.441272 -0.115328  0.073894  0.094406
2001Q4  0.441272 -0.115328  0.073894  0.094406
In [83]: annual_frame.resample('Q-MAR').ffill()
Out[83]:
        Colorado     Texas  New York      Ohio
2000Q4  0.031121  0.267223 -0.328301  0.592017
2001Q1  0.031121  0.267223 -0.328301  0.592017
2001Q2  0.031121  0.267223 -0.328301  0.592017
2001Q3  0.031121  0.267223 -0.328301  0.592017
2001Q4  0.441272 -0.115328  0.073894  0.094406
2002Q1  0.441272 -0.115328  0.073894  0.094406
2002Q2  0.441272 -0.115328  0.073894  0.094406
2002Q3  0.441272 -0.115328  0.073894  0.094406
2. Upsampling and interpolation
In [71]: frame = DataFrame(np.random.randn(2,4),index = pd.date_range('1/1/2000',periods = 2,freq = 'W-WED'),
    ...:                   columns = ['Colorado','Texas','New York','Ohio'])
In [72]: frame
Out[72]:
            Colorado     Texas  New York      Ohio
2000-01-05 -0.391780  0.623187  2.168219 -0.434276
2000-01-12  0.611064  0.618274 -0.206151 -0.926855
In [73]: df_daily = frame.resample('D')
In [74]: df_daily # resample() is deferred: call .mean(), .ffill(), etc. on it to get a result
Out[74]: DatetimeIndexResampler [freq=<Day>, axis=0, closed=left, label=left, convention=start, base=0]
In [75]: frame.resample('D').ffill() # fill_method = 'ffill' is deprecated in favor of .resample(...).ffill()
Out[75]:
            Colorado     Texas  New York      Ohio
2000-01-05 -0.391780  0.623187  2.168219 -0.434276
2000-01-06 -0.391780  0.623187  2.168219 -0.434276
2000-01-07 -0.391780  0.623187  2.168219 -0.434276
2000-01-08 -0.391780  0.623187  2.168219 -0.434276
2000-01-09 -0.391780  0.623187  2.168219 -0.434276
2000-01-10 -0.391780  0.623187  2.168219 -0.434276
2000-01-11 -0.391780  0.623187  2.168219 -0.434276
2000-01-12  0.611064  0.618274 -0.206151 -0.926855
In [76]: frame.resample('W-THU').ffill()
Out[76]:
            Colorado     Texas  New York      Ohio
2000-01-06 -0.391780  0.623187  2.168219 -0.434276
2000-01-13  0.611064  0.618274 -0.206151 -0.926855
In [77]: frame = DataFrame(np.random.randn(24,4),index = pd.period_range('1-2000','12-2001',freq = 'M'),
    ...:                   columns = ['Colorado','Texas','New York','Ohio'])
In [78]: frame
Out[78]:
         Colorado     Texas  New York      Ohio
2000-01 -0.410764  0.493883  0.372263  1.292698
2000-02 -1.062080  0.918195 -0.724518  0.164564
2000-03 -0.043802  2.993207 -1.522635  0.838996
2000-04 -0.363853 -0.212628  0.528066  1.275305
2000-05  1.497088 -1.067684  0.092587  2.278649
2000-06  1.093441  0.807193 -2.299057  0.806335
2000-07  1.233143 -1.279697  1.340937 -0.293675
2000-08  0.361909  0.069654  0.431176 -0.126774
2000-09 -1.529141 -0.124773 -0.807565  1.108400
2000-10  0.259290 -0.493926 -1.511169  0.853348
2000-11 -0.941060  1.205524 -0.754967 -1.066688
2000-12  0.279276 -0.102266  0.915269 -0.026958
2001-01  0.238687 -0.057031  1.632795 -0.859731
2001-02  1.878275  0.344334  1.375966 -0.276001
2001-03 -0.694704 -0.566174  0.066509 -0.189110
2001-04 -0.080118 -0.182078 -0.356520 -0.458191
2001-05 -0.088858 -1.934695 -0.153724  0.347450
2001-06  1.742578  1.659385  0.031750  0.462085
2001-07  0.972973  0.797676 -0.561107 -0.200623
2001-08  0.628312  0.916874 -1.138119  1.766849
2001-09 -0.129747 -1.861520  0.523099 -1.124577
2001-10 -0.138502  0.644714 -1.045726  0.336395
2001-11  1.265029 -0.461168 -0.239620  0.835510
2001-12 -0.298662 -0.684251  0.751426  0.492821
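The session above fills the upsampled rows by forward fill only, although the heading also mentions interpolation. The following sketch compares the two on a small weekly frame; the column values are made up so the interpolated results are easy to verify, and `interpolate()` is used with its default linear method.

```python
import numpy as np
import pandas as pd

# Two weekly observations (Wednesdays), with easy-to-check values.
frame = pd.DataFrame({"Colorado": [0.0, 7.0], "Texas": [10.0, 3.0]},
                     index=pd.date_range("2000-01-01", periods=2, freq="W-WED"))

# Upsample to daily: the new rows are NaN until a fill rule is applied.
daily_gaps = frame.resample("D").asfreq()

# Forward fill, optionally limited to a number of consecutive rows.
daily_ffill = frame.resample("D").ffill()
daily_ffill2 = frame.resample("D").ffill(limit=2)

# Linear interpolation between the known observations.
daily_linear = frame.resample("D").asfreq().interpolate()

print(daily_ffill2)
print(daily_linear)
```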