Python Auto ARIMA 模型无法正常工作
Posted
技术标签:
【中文标题】Python Auto ARIMA 模型无法正常工作【英文标题】:Python Auto ARIMA model not working correctly 【发布时间】:2020-04-09 20:54:22 【问题描述】:我创建了一个带有有效 DatetimeIndex 的 pandas DataFrame
df.index = df.timestamp
df = df.resample("10Min", how="mean")
plt.plot_date(df.index, df['delay'])
fig = plt.gcf()
fig.set_size_inches(18.5, 8.5)
看起来是这样的:
模型拟合的相关属性:
df['delay'].head(5)
timestamp
2016-10-30 04:30:00 32.000000
2016-10-30 04:40:00 12.714286
2016-10-30 04:50:00 36.941176
2016-10-30 05:00:00 37.273381
2016-10-30 05:10:00 38.960526
Name: delay, dtype: float64
然后我将 ARIMA 拟合到数据中:
import pmdarima as pm
import numpy as np
import matplotlib.pyplot as plt
df = df.dropna()
model = pm.auto_arima(df.delay, error_action='ignore', trace=1,
suppress_warnings=True,
seasonal=True, m=12)
model.plot_diagnostics(figsize=(7,5))
plt.show()
有诊断结果:
Fit ARIMA: order=(2, 0, 2) seasonal_order=(1, 0, 1, 12); AIC=15089.595, BIC=15133.343, Fit time=4.145 seconds
Fit ARIMA: order=(0, 0, 0) seasonal_order=(0, 0, 0, 12); AIC=17785.720, BIC=17796.657, Fit time=0.026 seconds
Fit ARIMA: order=(1, 0, 0) seasonal_order=(1, 0, 0, 12); AIC=15136.460, BIC=15158.334, Fit time=1.219 seconds
Fit ARIMA: order=(0, 0, 1) seasonal_order=(0, 0, 1, 12); AIC=16256.966, BIC=16278.840, Fit time=1.508 seconds
Fit ARIMA: order=(0, 0, 0) seasonal_order=(0, 0, 0, 12); AIC=20520.379, BIC=20525.847, Fit time=0.020 seconds
Fit ARIMA: order=(2, 0, 2) seasonal_order=(0, 0, 1, 12); AIC=15087.594, BIC=15125.874, Fit time=3.259 seconds
Fit ARIMA: order=(2, 0, 2) seasonal_order=(0, 0, 0, 12); AIC=15085.811, BIC=15118.622, Fit time=0.757 seconds
Fit ARIMA: order=(2, 0, 2) seasonal_order=(1, 0, 0, 12); AIC=15087.595, BIC=15125.874, Fit time=3.221 seconds
Fit ARIMA: order=(1, 0, 2) seasonal_order=(0, 0, 0, 12); AIC=15083.914, BIC=15111.257, Fit time=0.566 seconds
Fit ARIMA: order=(1, 0, 2) seasonal_order=(1, 0, 0, 12); AIC=15085.685, BIC=15118.496, Fit time=2.917 seconds
Fit ARIMA: order=(1, 0, 2) seasonal_order=(0, 0, 1, 12); AIC=15085.684, BIC=15118.495, Fit time=2.064 seconds
Fit ARIMA: order=(1, 0, 2) seasonal_order=(1, 0, 1, 12); AIC=15087.685, BIC=15125.965, Fit time=3.655 seconds
Fit ARIMA: order=(0, 0, 2) seasonal_order=(0, 0, 0, 12); AIC=15765.080, BIC=15786.954, Fit time=0.538 seconds
Fit ARIMA: order=(1, 0, 1) seasonal_order=(0, 0, 0, 12); AIC=15127.434, BIC=15149.308, Fit time=0.252 seconds
Fit ARIMA: order=(1, 0, 3) seasonal_order=(0, 0, 0, 12); AIC=15085.728, BIC=15118.539, Fit time=0.772 seconds
Fit ARIMA: order=(0, 0, 1) seasonal_order=(0, 0, 0, 12); AIC=16323.047, BIC=16339.452, Fit time=0.275 seconds
Fit ARIMA: order=(0, 0, 3) seasonal_order=(0, 0, 0, 12); AIC=15554.326, BIC=15581.669, Fit time=0.782 seconds
Fit ARIMA: order=(2, 0, 1) seasonal_order=(0, 0, 0, 12); AIC=15108.477, BIC=15135.819, Fit time=0.684 seconds
Fit ARIMA: order=(2, 0, 3) seasonal_order=(0, 0, 0, 12); AIC=15085.457, BIC=15123.737, Fit time=1.764 seconds
Total fit time: 28.444 seconds
然后我对未来 2 天进行预测,但 ARIMA 模型以一种奇怪的方式变平:
# Forecast
n_periods = 288
fc, confint = model.predict(n_periods=n_periods, return_conf_int=True)
index_of_fc = np.arange(len(df.delay), len(df.delay)+n_periods)
idx = pd.date_range('2016-11-13 01:20:00', periods=n_periods, freq='10min')
# make series for plotting purpose
fc_series = pd.Series(fc, index=idx)
lower_series = pd.Series(confint[:, 0], index=idx)
upper_series = pd.Series(confint[:, 1], index=idx)
#type(fc_series)
#idx
#type(df.index)
# Plot
plt.plot(df.delay)
plt.plot(fc_series, color='darkgreen')
plt.fill_between(lower_series.index,
lower_series,
upper_series,
color='k', alpha=.15)
plt.title("Forecast of delays with 2 days future horizon")
fig = plt.gcf()
fig.set_size_inches(18.5, 8.5)
plt.show()
看起来像这样:
预测的系列在 ~76 处持平
fc_series.describe()
count 240.000000
mean 86.422551
std 30.717400
min 76.344097
25% 76.344159
50% 76.353180
75% 77.662985
max 303.833528
dtype: float64
这是预测系列的图形描述:
fc_series.plot()
有谁知道我做错了什么?我已经尝试使用许多 auto_arima
参数来调整模型,但它总是像这样变平。
【问题讨论】:
可能是模式对模型来说过于根深蒂固;您是否尝试过记录该值?它通常有助于自回归模型。 我尝试过使用 auto_arima,但无济于事。最终将 ARIMA 与 statsmodels ARIMA 实施相结合,取得了良好的效果。 尽管我付出了很多努力,但同样的事情发生在我身上。我也在使用预测,但它会在某个日期之后消除所有预测。我很惊讶在过去的两年里,没有人(除了作者)对这个问题有答案。有人知道这是否是 pm.auto_arima 的错误吗? 【参考方案1】:我对 auto_arima 的实现并不满意。我已经通过 statsmodels ARIMA 实现将 ARIMA 安装到我的时间序列中。
MSE:715.12
完整代码:
import warnings
import itertools
import pandas as pd
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')
#data = sm.datasets.co2.load_pandas()
#y = data.data
y = df
y.drop(y.columns.difference(['delay']), 1, inplace=True)
# The 'MS' string groups the data in buckets by start of the month
y = y['delay'].resample("4H", how="mean")
# The term bfill means that we use the value before filling in missing values
y = y.fillna(y.bfill())
print(y)
y.plot(figsize=(15, 6))
plt.show()
# Define the p, d and q parameters to take any value between 0 and 2
p = d = q = range(0, 2)
# Generate all different combinations of p, q and q triplets
pdq = list(itertools.product(p, d, q))
# Generate all different combinations of seasonal p, q and q triplets
seasonal_pdq = [(x[0], x[1], x[2], 12) for x in list(itertools.product(p, d, q))]
print('Examples of parameter combinations for Seasonal ARIMA...')
print('SARIMAX: x '.format(pdq[1], seasonal_pdq[1]))
print('SARIMAX: x '.format(pdq[1], seasonal_pdq[2]))
print('SARIMAX: x '.format(pdq[2], seasonal_pdq[3]))
print('SARIMAX: x '.format(pdq[2], seasonal_pdq[4]))
warnings.filterwarnings("ignore") # specify to ignore warning messages
for param in pdq:
for param_seasonal in seasonal_pdq:
try:
mod = sm.tsa.statespace.SARIMAX(y,
order=param,
seasonal_order=param_seasonal,
enforce_stationarity=False,
enforce_invertibility=False)
results = mod.fit()
print('ARIMAx12 - AIC:'.format(param, param_seasonal, results.aic))
except:
continue
mod = sm.tsa.statespace.SARIMAX(y,
order=(1, 1, 1),
seasonal_order=(1, 1, 1, 12),
enforce_stationarity=False,
enforce_invertibility=False)
results = mod.fit()
print(results.summary().tables[1])
results.plot_diagnostics(figsize=(15, 12))
plt.show()
pred = results.get_prediction(start=pd.to_datetime('2016-11-20 00:00:00'), dynamic=False)
pred_ci = pred.conf_int()
ax = y['2016-10-30 04:00:00':].plot(label='observed')
pred.predicted_mean.plot(ax=ax, label='One-step ahead Forecast', alpha=.7)
ax.set_xlabel('Date')
ax.set_ylabel('Delays')
plt.legend()
plt.show()
y_forecasted = pred.predicted_mean
y_truth = y['2016-11-20 00:00:00':]
# Compute the mean square error
mse = ((y_forecasted - y_truth) ** 2).mean()
print('The Mean Squared Error of our forecasts is '.format(round(mse, 2)))
pred_dynamic = results.get_prediction(start=pd.to_datetime('2016-11-20 00:00:00'), dynamic=True, full_results=True)
pred_dynamic_ci = pred_dynamic.conf_int()
ax = y['2016-10-30 04:00:00':].plot(label='observed', figsize=(20, 15))
pred_dynamic.predicted_mean.plot(label='Dynamic Forecast', ax=ax)
ax.fill_betweenx(ax.get_ylim(), pd.to_datetime('2016-11-20 00:00:00'), y.index[-1],
alpha=.1, zorder=-1)
ax.set_xlabel('Date')
ax.set_ylabel('Delays')
plt.legend()
plt.show()
y_forecasted = pred_dynamic.predicted_mean
y_truth = y['2016-11-20 00:00:00':]
# Compute the mean square error
mse = ((y_forecasted - y_truth) ** 2).mean()
print('The Mean Squared Error of our forecasts is '.format(round(mse, 2)))
# Get forecast 500 steps ahead in future
pred_uc = results.get_forecast(steps=32)
# Get confidence intervals of forecasts
pred_ci = pred_uc.conf_int()
print(pred_ci.iloc[:, 1])
ax = y.plot(label='observed', figsize=(20, 15))
pred_uc.predicted_mean.plot(ax=ax, label='Forecast')
ax.fill_between(pred_ci.index,
pred_ci.iloc[:, 0],
pred_ci.iloc[:, 1], color='k', alpha=.25)
ax.set_xlabel('Date')
ax.set_ylabel('Delays')
plt.legend()
plt.show()
【讨论】:
您能上传数据框df
的数据集吗?
我没有了。但是问题有效负载上有 df 记录示例。以上是关于Python Auto ARIMA 模型无法正常工作的主要内容,如果未能解决你的问题,请参考以下文章
Python/Pandas - 如何调整 auto_arima 模型参数以获得未来预测
即使时间序列是固定的并且在 Python 中没有季节性分量,auto_arima 也会返回最佳模型作为 SARIMAX