A Must-Read: The Simplest Python Time Series Forecasting Models

Posted 2021-05-02 AI全球动态


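Getting the data

The CBOE Volatility Index (VIX) is a widely used measure of the stock market volatility expected by S&P 500 index options. It is calculated and disseminated in real time by the Chicago Board Options Exchange (CBOE).

The data used in this article span February 11, 2011 to February 11, 2019. Our goal is to forecast this S&P 500 volatility time series with an ANN and an LSTM.

First, we import the following libraries: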

import pandas as pd
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import r2_score
from keras.models import Sequential
from keras.layers import Dense
from keras.callbacks import EarlyStopping
from keras.optimizers import Adam
from keras.layers import LSTM

Then we load the data into a pandas DataFrame:

df = pd.read_csv("vix_2011_2019.csv")

We can take a quick look at the first few rows:

print(df.head())

Next, we drop the columns we do not need, convert the "Date" column to the datetime data type, and set "Date" as the index.

df.drop(['Open', 'High', 'Low', 'Close', 'Volume'], axis=1, inplace=True)
df['Date'] = pd.to_datetime(df['Date'])
df = df.set_index(['Date'], drop=True)
df.head(10)
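
To see what these preparation steps do without the real file, here is a minimal sketch on a tiny inline CSV. The two rows of prices are made up for illustration, and the column layout is only assumed to match the one used above.

```python
import io
import pandas as pd

# Made-up sample rows; the real vix_2011_2019.csv is assumed to share these columns.
csv_text = """Date,Open,High,Low,Close,Adj Close,Volume
2011-02-11,16.1,16.5,15.6,15.7,15.7,0
2011-02-14,16.0,16.3,15.8,15.9,15.9,0
"""

demo = pd.read_csv(io.StringIO(csv_text))
# Keep only the adjusted close; errors="ignore" skips columns that are absent.
demo = demo.drop(columns=["Open", "High", "Low", "Close", "Volume"], errors="ignore")
demo["Date"] = pd.to_datetime(demo["Date"])
demo = demo.set_index("Date")

print(demo.dtypes)
print(demo.index)
```

After these steps the frame has a single "Adj Close" column and a DatetimeIndex, which is what the date-based slicing later in the article relies on.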

With the steps above complete, we plot the time series as a line chart.

plt.figure(figsize=(10, 6))
df['Adj Close'].plot();


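As the plot shows, the "Adj Close" series is highly volatile, with no clear upward or downward trend.

Next, we split the data into training and test sets at "2018-01-01": data before this date is training data and data after it is test data. We then visualize the two sets.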

split_date = pd.Timestamp('2018-01-01')
df = df['Adj Close']
train = df.loc[:split_date]
test = df.loc[split_date:]
plt.figure(figsize=(10, 6))
ax = train.plot()
test.plot(ax=ax)
plt.legend(['train', 'test']);
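
One detail worth checking: label-based slicing with .loc includes both endpoints, so a row dated exactly on the split date would land in both sets. January 1, 2018 was a market holiday, so the split above most likely has no overlap, but the sketch below (on a made-up series that deliberately contains the split date) shows the effect and a mask-based alternative.

```python
import pandas as pd

# Made-up daily series that deliberately contains the split date.
idx = pd.date_range("2017-12-29", periods=5, freq="D")
series = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0], index=idx)

split_date = pd.Timestamp("2018-01-01")

# .loc slicing is inclusive on both ends, so the split row appears in both slices.
overlap = series.loc[:split_date].index.intersection(series.loc[split_date:].index)

# Boolean masks give a clean, non-overlapping split.
train_demo = series[series.index < split_date]
test_demo = series[series.index >= split_date]

print(len(overlap), len(train_demo), len(test_demo))
```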


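Then we scale the training and test data to the range [-1, 1]. The scaler is fitted on the training data only and then applied to the test data.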

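scaler = MinMaxScaler(feature_range=(-1, 1))
# MinMaxScaler expects 2D input of shape (n_samples, n_features)
train_sc = scaler.fit_transform(train.values.reshape(-1, 1))
test_sc = scaler.transform(test.values.reshape(-1, 1))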
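
The mapping applied here can be written out by hand, which makes it clear why test values may fall outside [-1, 1]. The following is a hand-rolled sketch of min-max scaling on made-up numbers, not scikit-learn's own implementation; the helper name fit_minmax is hypothetical.

```python
import numpy as np

def fit_minmax(train_values, low=-1.0, high=1.0):
    """Return (scale, offset) learned from the training values only."""
    vmin, vmax = float(np.min(train_values)), float(np.max(train_values))
    scale = (high - low) / (vmax - vmin)
    offset = low - vmin * scale
    return scale, offset

train_demo = np.array([10.0, 20.0, 30.0])  # made-up values
test_demo = np.array([15.0, 40.0])         # 40 lies outside the training range

scale, offset = fit_minmax(train_demo)
train_scaled = train_demo * scale + offset
test_scaled = test_demo * scale + offset        # reuse the training statistics
test_restored = (test_scaled - offset) / scale  # inverse transform

print(train_scaled, test_scaled)
```

Because the minimum and maximum come from the training set, a test value above the training maximum maps to something greater than 1. The inverse step is what you would use to bring scaled predictions back to index points.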

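Finally, we build the inputs and targets for training and testing: each value is used to predict the value that follows it.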

X_train = train_sc[:-1]
y_train = train_sc[1:]
X_test = test_sc[:-1]
y_test = test_sc[1:]
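
The slicing above pairs every observation with its successor. A minimal sketch on made-up scaled values shows the resulting pairs; the helper name make_lag1_pairs is hypothetical.

```python
import numpy as np

def make_lag1_pairs(values):
    """Pair each observation with the one that follows it."""
    values = np.asarray(values, dtype=float).reshape(-1, 1)
    return values[:-1], values[1:]

X_demo, y_demo = make_lag1_pairs([0.1, 0.2, 0.4, 0.3])  # made-up scaled values
for x, y in zip(X_demo.ravel(), y_demo.ravel()):
    print(f"input {x:.1f} -> target {y:.1f}")
```

A series of n points yields n - 1 pairs, since the last point has no successor to predict.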

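Building a simple ANN for time series forecasting

Create a Sequential model.
Add layers with the add() method.
Pass the input_dim argument to the first layer.
Use the rectified linear unit (ReLU) as the activation function.
Configure the learning process with compile().
The loss function is mean_squared_error and the optimizer is adam.
Stop training once the monitored loss stops improving.
patience=2 means training stops after 2 consecutive epochs without improvement.
The ANN is trained for up to 100 epochs with a batch size of 1.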
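
The patience rule can be illustrated without Keras. The sketch below is a simplified pure-Python version of the idea, not the library's code, and it runs on a made-up loss curve; the function name epochs_until_stop is hypothetical.

```python
def epochs_until_stop(losses, patience=2):
    """Return the 1-based epoch at which training would stop.

    Simplified patience rule: stop once the loss has failed to beat
    its best value for `patience` consecutive epochs.
    """
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(losses, start=1):
        if loss < best:
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return len(losses)  # rule never triggered: every epoch ran

demo_losses = [0.50, 0.40, 0.41, 0.42, 0.30]  # made-up loss curve
print(epochs_until_stop(demo_losses, patience=2))
```

In this example training stops at epoch 4 and never reaches the better loss at epoch 5, which is the trade-off of a small patience value.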

nn_model = Sequential()
nn_model.add(Dense(12, input_dim=1, activation='relu'))
nn_model.add(Dense(1))
nn_model.compile(loss='mean_squared_error', optimizer='adam')
early_stop = EarlyStopping(monitor='loss', patience=2, verbose=1)
history = nn_model.fit(X_train, y_train, epochs=100, batch_size=1, verbose=1, callbacks=[early_stop], shuffle=False)


The full output is not shown here, but training stopped at epoch 19.

y_pred_test_nn = nn_model.predict(X_test)
y_train_pred_nn = nn_model.predict(X_train)
print("The R2 score on the Train set is:	{:0.3f}".format(r2_score(y_train, y_train_pred_nn)))
print("The R2 score on the Test set is:	{:0.3f}".format(r2_score(y_test, y_pred_test_nn)))
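
For reference, the R2 score reported above is one minus the ratio of the residual sum of squares to the total sum of squares. A hand-computed sketch on made-up numbers (the helper name r2_by_hand is hypothetical):

```python
import numpy as np

def r2_by_hand(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

y_true_demo = [1.0, 2.0, 3.0, 4.0]  # made-up targets
y_pred_demo = [1.1, 1.9, 3.2, 3.8]  # made-up predictions
print(round(r2_by_hand(y_true_demo, y_pred_demo), 3))
```

A model that always predicts the mean of the targets scores 0, and a model worse than that scores below 0, so R2 is not bounded to [0, 1].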


LSTM

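When building the LSTM, we use the pandas shift function to shift the whole column by 1. In the snippet below, the column is shifted down by 1. We then convert all input variables into a 3D array.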

train_sc_df = pd.DataFrame(train_sc, columns=['Y'], index=train.index)
test_sc_df = pd.DataFrame(test_sc, columns=['Y'], index=test.index)

# Add one lagged copy of the series (X_1) as the input feature
for s in range(1, 2):
    train_sc_df['X_{}'.format(s)] = train_sc_df['Y'].shift(s)
    test_sc_df['X_{}'.format(s)] = test_sc_df['Y'].shift(s)

# The first row has no lagged value, so dropna() removes it
X_train = train_sc_df.dropna().drop('Y', axis=1)
y_train = train_sc_df.dropna().drop('X_1', axis=1)

X_test = test_sc_df.dropna().drop('Y', axis=1)
y_test = test_sc_df.dropna().drop('X_1', axis=1)

# as_matrix() was removed in pandas 1.0; to_numpy() replaces it
X_train = X_train.to_numpy()
y_train = y_train.to_numpy()

X_test = X_test.to_numpy()
y_test = y_test.to_numpy()

# Reshape to the 3D layout (samples, timesteps, features)
X_train_lmse = X_train.reshape(X_train.shape[0], X_train.shape[1], 1)
X_test_lmse = X_test.reshape(X_test.shape[0], X_test.shape[1], 1)

print('Train shape: ', X_train_lmse.shape)
print('Test shape: ', X_test_lmse.shape)
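
Keras recurrent layers expect input shaped as (samples, timesteps, features). A minimal sketch of the reshape on a made-up array:

```python
import numpy as np

X_2d = np.arange(6, dtype=float).reshape(6, 1)  # 6 samples, 1 lagged feature
X_3d = X_2d.reshape(X_2d.shape[0], X_2d.shape[1], 1)
print(X_2d.shape, "->", X_3d.shape)
```

With a single lag, both the timesteps and the features dimension are 1, so the array has shape (n, 1, 1) whichever way the two are read. With more lags the order would matter and should match the input_shape given to the LSTM layer.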


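Building and compiling the LSTM network is similar to the ANN.
The LSTM has a visible layer with 1 input.
A hidden layer with 7 LSTM neurons.
An output layer that makes a single-value prediction.
The LSTM neurons use the relu activation function.
The LSTM is trained for up to 100 epochs with a batch size of 1.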

lstm_model = Sequential()
lstm_model.add(LSTM(7, input_shape=(1, X_train_lmse.shape[1]), activation='relu', kernel_initializer='lecun_uniform', return_sequences=False))
lstm_model.add(Dense(1))
lstm_model.compile(loss='mean_squared_error', optimizer='adam')
early_stop = EarlyStopping(monitor='loss', patience=2, verbose=1)
history_lstm_model = lstm_model.fit(X_train_lmse, y_train, epochs=100, batch_size=1, verbose=1, shuffle=False, callbacks=[early_stop])


As the training log shows, it stopped at epoch 10.

y_pred_test_lstm = lstm_model.predict(X_test_lmse)
y_train_pred_lstm = lstm_model.predict(X_train_lmse)
print("The R2 score on the Train set is:	{:0.3f}".format(r2_score(y_train, y_train_pred_lstm)))
print("The R2 score on the Test set is:	{:0.3f}".format(r2_score(y_test, y_pred_test_lstm)))


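In this run, both the training and test R^2 of the LSTM model are better than those of the ANN model.

Comparing the models

Next, we compare the test MSE of the two models.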

nn_test_mse = nn_model.evaluate(X_test, y_test, batch_size=1)
lstm_test_mse = lstm_model.evaluate(X_test_lmse, y_test, batch_size=1)
print('NN: %f'%nn_test_mse)
print('LSTM: %f'%lstm_test_mse)
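
Both models were compiled with loss='mean_squared_error' and no extra metrics, so evaluate() returns that loss as a single number. The quantity itself is easy to compute by hand, as in this sketch on made-up values (the helper name mse_by_hand is hypothetical):

```python
import numpy as np

def mse_by_hand(y_true, y_pred):
    """Mean of the squared differences between targets and predictions."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    return float(np.mean((y_true - y_pred) ** 2))

print(mse_by_hand([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
```

Note that the values compared here are in scaled units, so the MSE is not expressed in VIX index points.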


Making predictions

nn_y_pred_test = nn_model.predict(X_test)
lstm_y_pred_test = lstm_model.predict(X_test_lmse)
plt.figure(figsize=(10, 6))
plt.plot(y_test, label='True')
plt.plot(y_pred_test_nn, label='NN')
plt.title("NN's Prediction")
plt.xlabel('Observation')
plt.ylabel('Adj Close Scaled')
plt.legend()
plt.show();

plt.figure(figsize=(10, 6))
plt.plot(y_test, label='True')
plt.plot(y_pred_test_lstm, label='LSTM')
plt.title("LSTM's Prediction")
plt.xlabel('Observation')
plt.ylabel('Adj Close scaled')
plt.legend()
plt.show();

This shows how to use Keras deep learning networks to develop ANN and LSTM models for time series forecasting in Python, and how to use them to forecast time series data.
