Python Time Series Data Analysis: A Worked Example
Posted by 机器学习AI算法工程
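This article is based mainly on the following blog post:
https://www.analyticsvidhya.com/blog/2016/02/time-series-forecasting-codes-python/
Readers comfortable with English may prefer to read the original.
Before reading this article, it is recommended to read first:
Time Series Forecasting: The ARIMA Model
http://www.cnblogs.com/bradleon/p/6827109.html
This article is divided into four parts:
Handling time series data with pandas
How to check whether a time series is stationary
How to make a time series stationary
Forecasting a time series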
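1. Importing and handling time series data with pandas
Step 1: Import the commonly used libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import rcParams
# rcParams sets the default figure size (width, height in inches)
rcParams['figure.figsize'] = 15, 6
Step 2: Load the time series data
The data file is available on GitHub.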
data = pd.read_csv('AirPassengers.csv')
print(data.head())
print('\n Data types:')
print(data.dtypes)
Running this shows that the data holds the number of passengers for each month. data is already a DataFrame with two columns, Month and #Passengers; Month has dtype object, and the index is 0, 1, 2, ...
Step 3: Process the time series data
We need to convert Month to a datetime type and use it as the index.
# parse_dates names the column that holds the date-time information
# index_col tells pandas which column to use as the index
# date_format gives the format used to turn each string into a datetime
# (pandas >= 2.0; earlier versions passed a parsing function via date_parser instead)
data = pd.read_csv('AirPassengers.csv', parse_dates=['Month'], index_col='Month', date_format='%Y-%m')
print(data.head())
print(data.index)
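The result shows that the index of data is now Month, with a datetime type.
2. How to check whether a time series is stationary (Stationarity)
Because the ARIMA model requires the data to be stationary, this step is crucial.
1. A series is usually judged to be stationary when a few of its statistics are constant over time:
Constant mean
Constant variance
Autocovariance that does not depend on time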
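As a minimal sketch of the conditions above, the following simulation compares white noise, which is stationary, with a random walk, whose variance grows over time. It is not part of the original article; the names `noise`, `walk`, `variance_at` and the sample sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_paths, n_steps = 2000, 200

# White noise: independent draws with mean 0 and variance 1 at every time step.
noise = rng.normal(size=(n_paths, n_steps))
# Random walk: cumulative sum of the same shocks, so past shocks never die out.
walk = noise.cumsum(axis=1)

def variance_at(paths, t):
    """Variance across the simulated paths at time index t."""
    return paths[:, t].var()

noise_ratio = variance_at(noise, 199) / variance_at(noise, 49)
walk_ratio = variance_at(walk, 199) / variance_at(walk, 49)
print('white noise variance ratio (t=199 vs t=49):', round(noise_ratio, 2))
print('random walk variance ratio (t=199 vs t=49):', round(walk_ratio, 2))
```

The white-noise ratio should stay close to 1, while the random-walk ratio should be well above 1 because its variance grows with elapsed time, which violates the constant-variance condition.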
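These conditions are illustrated with figures:
[Figure: mean]
2. Checking the stationarity of a time series in Python
There are two methods:
1. Rolling statistics -- plot the mean and standard deviation of the data within each moving time window and see whether they change over time.
2. Dickey-Fuller test -- this one is more involved. Roughly, at a given confidence level, the null hypothesis for the series is that it is non-stationary. If the test statistic is less than the critical value, the null hypothesis is rejected and the series is considered stationary; otherwise the null hypothesis cannot be rejected and the series is treated as non-stationary.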
from statsmodels.tsa.stattools import adfuller

def test_stationarity(timeseries):
    # Use a one-year window: the value at each time t is replaced by the mean of the
    # 12 months up to and including t; the standard deviation is computed the same way.
    rolmean = timeseries.rolling(window=12).mean()
    rolstd = timeseries.rolling(window=12).std()

    # Plot rolling statistics:
    fig = plt.figure()
    fig.add_subplot()
    orig = plt.plot(timeseries, color='blue', label='Original')
    mean = plt.plot(rolmean, color='red', label='Rolling mean')
    std = plt.plot(rolstd, color='black', label='Rolling standard deviation')
    plt.legend(loc='best')
    plt.title('Rolling Mean & Standard Deviation')
    plt.show(block=False)

    # Dickey-Fuller test:
    print('Results of Dickey-Fuller Test:')
    # The first items returned by adfuller are, in order: test statistic, p-value,
    # number of lags used, number of observations used, and a dict of critical values.
    dftest = adfuller(timeseries, autolag='AIC')
    dfoutput = pd.Series(dftest[0:4], index=['Test Statistic', 'p-value', '#Lags Used', 'Number of Observations Used'])
    for key, value in dftest[4].items():
        dfoutput['Critical value (%s)' % key] = value
    print(dfoutput)

ts = data['#Passengers']
test_stationarity(ts)
The results are as follows: [Figure: output of test_stationarity(ts)]
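The decision rule described for the Dickey-Fuller test (reject the null hypothesis when the test statistic is below the critical value) can be sketched as a small helper. The function name `reject_unit_root` and the numbers passed to it are hypothetical and for illustration only; they are not results for the AirPassengers data.

```python
def reject_unit_root(test_statistic, critical_values, level='5%'):
    """Return True when the ADF null hypothesis (non-stationary) is rejected.

    `critical_values` is a dict such as {'1%': ..., '5%': ..., '10%': ...},
    the same shape as element 4 of the tuple returned by adfuller.
    """
    return test_statistic < critical_values[level]

# Made-up numbers for illustration only.
example_critical_values = {'1%': -3.5, '5%': -2.9, '10%': -2.6}
print(reject_unit_root(-4.2, example_critical_values))  # statistic below the 5% value
print(reject_unit_root(-1.0, example_critical_values))  # statistic above the 5% value
```

With a real result, the call would look like `reject_unit_root(dftest[0], dftest[4])`, using the `dftest` tuple from `test_stationarity`.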
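3. Ways to make a time series stationary
There are two main reasons a series is non-stationary:
Trend - the data changes over time, for example rising or falling.
Seasonality - the data varies within specific time periods, for example unusual values caused by holidays or promotional events.
Because the original data spans a fairly wide range of values, a common way to compress that range while preserving the rest of the information is a log transform.
ts_log = np.log(ts)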
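One way to see why the log transform helps is that it turns multiplicative growth into additive growth. The sketch below uses a hypothetical series that grows by 10% per period; it is an illustration, not the AirPassengers data.

```python
import numpy as np

# Hypothetical series that grows by 10% per period (multiplicative growth).
values = 100 * 1.1 ** np.arange(8)

raw_steps = np.diff(values)          # the step size keeps growing
log_steps = np.diff(np.log(values))  # the step size is constant: log(1.1)

print(raw_steps.round(2))
print(log_steps.round(4))
```

After the transform, the increasing swings of the raw series become steps of equal size, which makes a trend easier to model and remove.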
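Detecting and removing the trend
There are usually three approaches:
Aggregation: shorten the time axis by using the mean over a period (week/month/year) as the data value, which narrows the gap between values in different periods.
Smoothing: replace each value with the mean over a sliding window, to narrow the gap between values.
Polynomial fitting: fit a regression model to the existing data to make it smoother.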
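The article only demonstrates smoothing, so here is a rough sketch of the other two approaches, aggregation and polynomial fitting. The synthetic monthly series and all variable names are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical monthly series: linear trend plus a 12-month seasonal pattern.
index = pd.date_range('2000-01-01', periods=48, freq='MS')
t = np.arange(48)
series = pd.Series(10 + 0.5 * t + 3 * np.sin(2 * np.pi * t / 12), index=index)

# Aggregation: replace each year's 12 values with their mean.
yearly_mean = series.groupby(series.index.year).mean()

# Polynomial fitting: a degree-1 least-squares fit as the trend estimate.
slope, intercept = np.polyfit(t, series.to_numpy(), deg=1)
fitted_trend = pd.Series(intercept + slope * t, index=index)
detrended = series - fitted_trend

print(yearly_mean)
print('estimated slope per month:', round(slope, 3))
```

The yearly means average the seasonal swing away entirely, while the fitted line only approximates the true slope of 0.5, because the seasonal term is not exactly orthogonal to time over a finite sample.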
This article mainly uses smoothing.
Moving Average
moving_avg = ts_log.rolling(window=12).mean()
plt.plot(ts_log, color='blue')
plt.plot(moving_avg, color='red')
The moving average is clearly much smoother than the original values.
Then take the difference:
ts_log_moving_avg_diff = ts_log - moving_avg
ts_log_moving_avg_diff.dropna(inplace=True)
test_stationarity(ts_log_moving_avg_diff)
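The `dropna` call in the moving-average code is needed because a rolling mean with window w is undefined for the first w - 1 positions. A small sketch with a toy series and a window of 3 (both chosen only for illustration) makes this visible:

```python
import pandas as pd

small = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])

# With window=3 the first two positions have fewer than 3 observations, so they are NaN.
rolling_mean = small.rolling(window=3).mean()
diff = (small - rolling_mean).dropna()

print(rolling_mean.tolist())
print(diff.tolist())
```

With the 12-month window used in the article, the first 11 months are lost in the same way.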
The method above treats all time points equally, whereas in many cases more recent points can be considered more important. This motivates the exponentially weighted moving average (EWMA), which pandas provides through the ewm() method (pd.ewma() in older versions).
# halflife determines the decay factor alpha: alpha = 1 - exp(log(0.5) / halflife)
expweighted_avg = ts_log.ewm(halflife=12).mean()
ts_log_ewma_diff = ts_log - expweighted_avg
test_stationarity(ts_log_ewma_diff)
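To make the relationship between `halflife` and alpha concrete, the sketch below computes alpha from the formula in the comment and checks a hand-written recursion against pandas. It assumes `adjust=False`, which selects the simple recursive form; the article's code relies on the pandas default `adjust=True`, which weights the early observations differently. The input values are arbitrary.

```python
import numpy as np
import pandas as pd

halflife = 12
alpha = 1 - np.exp(np.log(0.5) / halflife)

values = pd.Series([1.0, 2.0, 4.0, 8.0, 16.0])

# Manual recursion: y[0] = x[0], y[t] = (1 - alpha) * y[t-1] + alpha * x[t]
manual = [values.iloc[0]]
for x in values.iloc[1:]:
    manual.append((1 - alpha) * manual[-1] + alpha * x)

# adjust=False selects this recursive form; pandas defaults to adjust=True.
from_pandas = values.ewm(halflife=halflife, adjust=False).mean()

print('alpha:', round(alpha, 4))
print(np.allclose(manual, from_pandas))
```

A larger half-life gives a smaller alpha, so each new observation moves the average less and older observations keep their influence longer.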
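Detecting and removing seasonality
There are two approaches:
1 Differencing: subtract the value at a specific lag from the current value
2 Decomposition: model the trend and the seasonality separately, then remove them
Differencing
ts_log_diff = ts_log - ts_log.shift()
ts_log_diff.dropna(inplace=True)
test_stationarity(ts_log_diff)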
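The article differences at lag 1. As a sketch of what "a specific lag" can mean, the example below also differences at lag 12 on a hypothetical series with a linear trend and a repeating 12-step pattern; the series and its pattern are made up for illustration.

```python
import numpy as np
import pandas as pd

t = np.arange(36)
seasonal = np.tile([0, 1, 3, 2, 0, -1, -3, -2, 0, 1, 0, -1], 3)
series = pd.Series(2.0 * t + seasonal)

first_diff = (series - series.shift()).dropna()        # lag 1, same as series.diff()
seasonal_diff = (series - series.shift(12)).dropna()   # lag 12 removes the 12-step pattern

print(first_diff.equals(series.diff().dropna()))
print(seasonal_diff.unique())
```

Differencing at lag 12 cancels the repeating pattern and leaves only the trend's contribution over 12 steps, whereas lag 1 removes the trend but keeps a seasonal signature in the differences.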
Decomposing
# Decomposition can separate both the trend and the seasonal component out of a time series:
from statsmodels.tsa.seasonal import seasonal_decompose

def decompose(timeseries):
    # Returns three components: trend, seasonal and residual
    decomposition = seasonal_decompose(timeseries)
    trend = decomposition.trend
    seasonal = decomposition.seasonal
    residual = decomposition.resid

    plt.subplot(411)
    plt.plot(timeseries, label='Original')
    plt.legend(loc='best')
    plt.subplot(412)
    plt.plot(trend, label='Trend')
    plt.legend(loc='best')
    plt.subplot(413)
    plt.plot(seasonal, label='Seasonality')
    plt.legend(loc='best')
    plt.subplot(414)
    plt.plot(residual, label='Residuals')
    plt.legend(loc='best')
    plt.tight_layout()
    return trend, seasonal, residual

# With the trend and seasonality removed, only the residual is treated as the series of interest
trend, seasonal, residual = decompose(ts_log)
residual.dropna(inplace=True)
test_stationarity(residual)
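To show what an additive decomposition does, here is a simplified hand-written sketch of the classical approach: a centered moving average for the trend and per-position averages for the seasonal part. It is not the statsmodels implementation, it assumes an even period, and the function name `simple_additive_decompose` and the demo data are hypothetical.

```python
import numpy as np
import pandas as pd

def simple_additive_decompose(series, period=12):
    """Classical additive decomposition sketch: series = trend + seasonal + residual."""
    x = series.to_numpy(dtype=float)
    half = period // 2
    # Centered moving average for an even period: half weight on the two end points.
    weights = np.r_[0.5, np.ones(period - 1), 0.5] / period
    trend = np.full(len(x), np.nan)
    trend[half:len(x) - half] = np.convolve(x, weights, mode='valid')
    detrended = x - trend
    # Average the detrended values at each position in the cycle, then center them.
    position = np.arange(len(x)) % period
    pattern = np.array([np.nanmean(detrended[position == p]) for p in range(period)])
    pattern -= pattern.mean()
    seasonal = pattern[position]
    residual = x - trend - seasonal
    index = series.index
    return (pd.Series(trend, index=index),
            pd.Series(seasonal, index=index),
            pd.Series(residual, index=index))

# Hypothetical data: linear trend plus a repeating 12-step pattern, no noise.
true_pattern = np.array([0, 1, 3, 2, 0, -1, -3, -2, 0, 1, 0, -1], dtype=float)
t = np.arange(48)
demo = pd.Series(5 + 0.3 * t + np.tile(true_pattern, 4))
trend, seasonal, residual = simple_additive_decompose(demo)
print(seasonal.iloc[:12].round(6).tolist())
print(residual.dropna().abs().max())
```

The trend is undefined for the first and last half period, which is why the article calls `dropna` on the residual before testing it.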
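4. Forecasting the time series
Assume that, after the processing above, we have a stationary series. Next, we use an ARIMA model to forecast the data. For an introduction to ARIMA, see another article in this collection.
step1: Estimate the p and q parameters of ARIMA(p, d, q) using the ACF and PACF
From the Differencing section above, the series is already stationary after first-order differencing, so d = 1. We therefore use the first-differenced series ts_log_diff = ts_log - ts_log.shift() as input, which is equivalent to using
y_t = Y_t - Y_{t-1}
as input.
First plot the ACF and PACF; the code is as follows: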
# ACF and PACF plots:
from statsmodels.tsa.stattools import acf, pacf
lag_acf = acf(ts_log_diff, nlags=20)
lag_pacf = pacf(ts_log_diff, nlags=20, method='ols')
# Plot ACF:
plt.subplot(121)
plt.plot(lag_acf)
plt.axhline(y=0, linestyle='--', color='gray')
plt.axhline(y=-1.96/np.sqrt(len(ts_log_diff)), linestyle='--', color='gray')
plt.axhline(y=1.96/np.sqrt(len(ts_log_diff)), linestyle='--', color='gray')
plt.title('Autocorrelation Function')
# Plot PACF:
plt.subplot(122)
plt.plot(lag_pacf)
plt.axhline(y=0, linestyle='--', color='gray')
plt.axhline(y=-1.96/np.sqrt(len(ts_log_diff)), linestyle='--', color='gray')
plt.axhline(y=1.96/np.sqrt(len(ts_log_diff)), linestyle='--', color='gray')
plt.title('Partial Autocorrelation Function')
plt.tight_layout()
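In the plots, the band between the upper and lower gray lines is the confidence interval. p is the lag at which the PACF first crosses the upper confidence bound, and q is the lag at which the ACF first crosses the upper confidence bound. From the plots we get p = 2 and q = 2.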
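The reading of the plots can be sketched numerically. The code below computes a sample autocorrelation by hand and finds the first lag at which a correlation sequence drops below the upper bound 1.96/sqrt(N). Treating "first crossing" as "first lag below the upper bound" is one plausible interpretation, and both function names and the hand-written correlation values are hypothetical.

```python
import numpy as np

def sample_acf(x, nlags):
    """Sample autocorrelation for lags 0..nlags."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    denominator = np.sum(centered ** 2)
    return np.array([np.sum(centered[k:] * centered[:len(x) - k]) / denominator
                     for k in range(nlags + 1)])

def first_lag_below_upper_bound(correlations, n_obs):
    """First lag whose correlation drops below the upper bound 1.96/sqrt(n_obs)."""
    upper = 1.96 / np.sqrt(n_obs)
    for lag, value in enumerate(correlations):
        if value < upper:
            return lag
    return None

print(sample_acf([1, 2, 3, 4, 5], nlags=2))
# Hand-written correlations for illustration only (not from the AirPassengers data).
print(first_lag_below_upper_bound([1.0, 0.8, 0.5, 0.1, -0.1], n_obs=100))
```

Applied to `lag_pacf` and `lag_acf` with `n_obs=len(ts_log_diff)`, the second helper would give candidate values for p and q respectively, which should still be checked against the plots.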
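step2: With the estimates of p, d and q in hand, build the ARIMA(p, d, q) model
To highlight the differences, three models with different parameter choices are compared.
Model 1: AR model (ARIMA(2,1,0))
from statsmodels.tsa.arima.model import ARIMA
# The current statsmodels ARIMA reports fitted values on the scale of its input, so each
# model is fitted with d=0 to the already-differenced series ts_log_diff. This corresponds
# to d=1 on ts_log and keeps the fitted values comparable with ts_log_diff.
model = ARIMA(ts_log_diff, order=(2, 0, 0))
results_AR = model.fit()
plt.plot(ts_log_diff)
plt.plot(results_AR.fittedvalues, color='red')
plt.title('RSS: %.4f' % sum((results_AR.fittedvalues - ts_log_diff)**2))
In the plot, the blue line is the input, the red line is the model's fitted values, and RSS is the residual sum of squares (the accumulated squared error).
Model 2: MA model (ARIMA(0,1,2))
model = ARIMA(ts_log_diff, order=(0, 0, 2))
results_MA = model.fit()
plt.plot(ts_log_diff)
plt.plot(results_MA.fittedvalues, color='red')
plt.title('RSS: %.4f' % sum((results_MA.fittedvalues - ts_log_diff)**2))
Model 3: ARIMA model (ARIMA(2,1,2))
model = ARIMA(ts_log_diff, order=(2, 0, 2))
results_ARIMA = model.fit()
plt.plot(ts_log_diff)
plt.plot(results_ARIMA.fittedvalues, color='red')
plt.title('RSS: %.4f' % sum((results_ARIMA.fittedvalues - ts_log_diff)**2))
Judging by the RSS, model 3 -- ARIMA(2,1,2) -- fits best, so we choose it as the final forecasting model.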
step3: Apply the model to the original data to make predictions
The fitted values above fit the input after it was made stationary, so the transformations must be inverted to bring the fitted values back to the scale of the original data.
# ARIMA actually fits the first difference ts_log_diff: predictions_ARIMA_diff[i] is the
# difference in ts_log between month i and month i-1.
# Because first-order differencing has a lag of one, there is no value for the first month.
predictions_ARIMA_diff = pd.Series(results_ARIMA.fittedvalues, copy=True)
print(predictions_ARIMA_diff.head())
# Cumulatively sum the differences to get each month's offset from the first month (on the log scale).
# That is, predictions_ARIMA_diff_cumsum[i] is the difference in ts_log between month i and month 1.
predictions_ARIMA_diff_cumsum = predictions_ARIMA_diff.cumsum()
# Undo the transformations in order: ts_log_diff => ts_log => ts
# Take the first value of ts_log as the base and copy it to every time step, then add each
# month's cumulative difference (this also handles the missing first-month difference).
# This gives predictions_ARIMA_log => predictions_ARIMA
predictions_ARIMA_log = pd.Series(ts_log.iloc[0], index=ts_log.index)
predictions_ARIMA_log = predictions_ARIMA_log.add(predictions_ARIMA_diff_cumsum, fill_value=0)
predictions_ARIMA = np.exp(predictions_ARIMA_log)
plt.figure()
plt.plot(ts)
plt.plot(predictions_ARIMA)
plt.title('RMSE: %.4f' % np.sqrt(sum((predictions_ARIMA - ts)**2) / len(ts)))
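As a sanity check on the inverse steps, the sketch below applies the same log, difference, cumulative-sum and exp sequence to a small hypothetical series, using the exact differences in place of model output. The series `ts_demo` and its values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical positive series standing in for ts.
index = pd.date_range('2000-01-01', periods=6, freq='MS')
ts_demo = pd.Series([100.0, 110.0, 125.0, 120.0, 140.0, 160.0], index=index)

log_demo = np.log(ts_demo)
diff_demo = (log_demo - log_demo.shift()).dropna()   # the first month is lost here

# Inverse: cumulative sum of the differences, added to the first log value, then exp.
restored_log = pd.Series(log_demo.iloc[0], index=log_demo.index)
restored_log = restored_log.add(diff_demo.cumsum(), fill_value=0)
restored = np.exp(restored_log)

print(np.allclose(restored, ts_demo))
```

Because exact differences are used, the original series is recovered; with fitted differences from a model, the same steps accumulate the fitting errors, so the reconstruction drifts away from the data over time.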
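5. Summary
The previous article summarized the steps of ARIMA modeling:
(1). Obtain the time series data of the observed system;
(2). Plot the data and check whether it is a stationary series; a non-stationary series must first be differenced d times to make it stationary;
(3). After step 2 we have a stationary series. Compute its autocorrelation function (ACF) and partial autocorrelation function (PACF), and analyze the ACF and PACF plots to choose the orders p and q;
(4). Use the d, q and p obtained above to build the ARIMA model, then run diagnostic checks on the fitted model.
A concrete example will be given in another article.
Using one example, this article showed how to do the following in Python:
1. Determine whether a time series is stationary. Corresponds to step (1)
2. Make a time series stationary. Corresponds to step (2)
3. Forecast a time series with an ARIMA model. Corresponds to steps (3, 4)
Readers interested in data science may also want to follow this site, which has a lot of solid material:
https://www.analyticsvidhya.com/blog/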