How to predict actual future values after testing the trained LSTM model?

【Posted】: 2022-01-21 23:19:58 【Question】:

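I trained my stock price prediction model by splitting the dataset into train and test sets. I also tested the predictions by comparing the validation data with the predicted data, and the model works fine. But I want to predict actual future values.

What do I need to change in the code below?

How can I predict up to a specific date in the actual future?


Code (in a Jupyter Notebook):

(To run the code, try it on a similar csv file of your own, or install the nsepy Python library with the command pip install nsepy)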

# imports
import pandas as pd  # data processing
import numpy as np  # linear algebra
import matplotlib.pyplot as plt  # plotting
from datetime import date  # date
from nsepy import get_history  # NSE historical data
from keras.models import Sequential  # neural network
from keras.layers import LSTM, Dropout, Dense  # LSTM layer
from sklearn.preprocessing import MinMaxScaler  # scaling

nseCode = 'TCS'
stockTitle = 'Tata Consultancy Services'

# API call
apiData = get_history(symbol = nseCode, start = date(2017,1,1), end = date(2021,12,19))
data = apiData  # same dataframe object, not a copy (this line is optional)

# remove columns you don't need
del data['Symbol']
del data['Series']
del data['Prev Close']
del data['Volume']
del data['Turnover']
del data['Trades']
del data['Deliverable Volume']
del data['%Deliverble']

# store the data in a csv file
data.to_csv('infy2.csv')

# Read the csv file
data = pd.read_csv('infy2.csv')

# convert the date column to datetime; if you read data from csv, do this. Otherwise, no need if you read data from API
data['Date'] = pd.to_datetime(data['Date'], format = '%Y-%m-%d')
data.index = data['Date']

# plot
plt.xlabel('Date')
plt.ylabel('Close Price (Rs.)')
data['Close'].plot(legend = True, figsize = (10,6), title = stockTitle, grid = True, color = 'blue')

# Sort data into Date and Close columns
data2 = data.sort_index(ascending = True, axis = 0)

newData = pd.DataFrame(index = range(0,len(data2)), columns = ['Date', 'Close'])

for i in range(0, len(data2)):  # only if you read data from csv
    newData.loc[i, 'Date'] = data2['Date'].iloc[i]
    newData.loc[i, 'Close'] = data2['Close'].iloc[i]

# Calculate the row number to split the dataset into train and test
split = len(newData) - 100

# normalize the new dataset
scaler = MinMaxScaler(feature_range = (0, 1))
finalData = newData.values

trainData = finalData[0:split, :]
validData = finalData[split:, :]

newData.index = newData.Date
newData.drop('Date', axis = 1, inplace = True)
scaler = MinMaxScaler(feature_range = (0, 1))
scaledData = scaler.fit_transform(newData)

xTrainData, yTrainData = [], []

for i in range(60, len(trainData)):  # data-flair has used 60 instead of 30
    xTrainData.append(scaledData[i-60:i, 0])
    yTrainData.append(scaledData[i, 0])

xTrainData, yTrainData = np.array(xTrainData), np.array(yTrainData)

xTrainData = np.reshape(xTrainData, (xTrainData.shape[0], xTrainData.shape[1], 1))

# build and train the LSTM model
lstmModel = Sequential()
lstmModel.add(LSTM(units = 50, return_sequences = True, input_shape = (xTrainData.shape[1], 1)))
lstmModel.add(LSTM(units = 50))
lstmModel.add(Dense(units = 1))

inputsData = newData[len(newData) - len(validData) - 60:].values
inputsData = inputsData.reshape(-1,1)
inputsData = scaler.transform(inputsData)

lstmModel.compile(loss = 'mean_squared_error', optimizer = 'adam')
lstmModel.fit(xTrainData, yTrainData, epochs = 1, batch_size = 1, verbose = 2)

# Take a sample of a dataset to make predictions
xTestData = []

for i in range(60, inputsData.shape[0]):
    xTestData.append(inputsData[i-60:i, 0])

xTestData = np.array(xTestData)

xTestData = np.reshape(xTestData, (xTestData.shape[0], xTestData.shape[1], 1))

predictedClosingPrice = lstmModel.predict(xTestData)
predictedClosingPrice = scaler.inverse_transform(predictedClosingPrice)

# visualize the results
trainData = newData[:split]
validData = newData[split:]

validData['Predictions'] = predictedClosingPrice

plt.xlabel('Date')
plt.ylabel('Close Price (Rs.)')

trainData['Close'].plot(legend = True, color = 'blue', label = 'Train Data')
validData['Close'].plot(legend = True, color = 'green', label = 'Valid Data')
validData['Predictions'].plot(legend = True, figsize = (12,7), grid = True, color = 'orange', label = 'Predicted Data', title = stockTitle)

【Answer 1】:

Here is an example of how to implement this approach for your model:

import pandas as pd
import numpy as np
from datetime import date
from nsepy import get_history
from keras.models import Sequential
from keras.layers import LSTM, Dense
from sklearn.preprocessing import MinMaxScaler
pd.options.mode.chained_assignment = None

# load the data
stock_ticker = 'TCS'
stock_name = 'Tata Consultancy Services'
train_start = date(2017, 1, 1)
train_end = date.today()
data = get_history(symbol=stock_ticker, start=train_start, end=train_end)
data.index = pd.DatetimeIndex(data.index)
data = data[['Close']]

# scale the data
scaler = MinMaxScaler(feature_range=(0, 1)).fit(data)
z = scaler.transform(data)

# extract the input sequences and target values
window_size = 60

x, y = [], []

for i in range(window_size, len(z)):
    x.append(z[i - window_size: i])
    y.append(z[i])

x, y = np.array(x), np.array(y)

# build and train the model
model = Sequential()
model.add(LSTM(units=50, return_sequences=True, input_shape=x.shape[1:]))
model.add(LSTM(units=50))
model.add(Dense(units=1))
model.compile(loss='mse', optimizer='adam')
model.fit(x, y, epochs=100, batch_size=128, verbose=1)

# generate the multi-step forecasts
def multi_step_forecasts(n_past, n_future):

    x_past = x[- n_past - 1:, :, :][:1]  # last observed input sequence
    y_past = y[- n_past - 1]             # last observed target value
    y_future = []                        # predicted target values

    for i in range(n_past + n_future):

        # feed the last forecast back to the model as an input
        x_past = np.append(x_past[:, 1:, :], y_past.reshape(1, 1, 1), axis=1)

        # generate the next forecast
        y_past = model.predict(x_past)

        # save the forecast
        y_future.append(y_past.flatten()[0])

    # transform the forecasts back to the original scale
    y_future = scaler.inverse_transform(np.array(y_future).reshape(-1, 1)).flatten()

    # add the forecasts to the data frame
    df_past = data.rename(columns={'Close': 'Actual'}).copy()

    df_future = pd.DataFrame(
        index=pd.bdate_range(start=data.index[- n_past - 1] + pd.Timedelta(days=1), periods=n_past + n_future),
        columns=['Forecast'],
        data=y_future
    )

    return df_past.join(df_future, how='outer')

# forecast the next 30 days
df1 = multi_step_forecasts(n_past=0, n_future=30)
df1.plot(title=stock_name)

# forecast the last 20 days and the next 30 days
df2 = multi_step_forecasts(n_past=20, n_future=30)
df2.plot(title=stock_name)
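
To forecast up to a specific calendar date rather than a fixed number of days, one option is to convert the target date into a value for n_future first. The sketch below is only an illustration: the helper name business_days_until is made up here, and it assumes that counting Monday-to-Friday days with pd.bdate_range (the same calendar the function above uses for its forecast index) is close enough to the exchange's trading calendar.

```python
import pandas as pd

def business_days_until(last_observed, target):
    """Count business days after last_observed, up to and including target."""
    last_observed = pd.Timestamp(last_observed)
    target = pd.Timestamp(target)
    if target <= last_observed:
        raise ValueError('target must be later than the last observed date')
    # start one day after the last observation, as multi_step_forecasts does
    first_forecast_day = last_observed + pd.Timedelta(days=1)
    return len(pd.bdate_range(start=first_forecast_day, end=target))

# Friday 2022-01-21 to Friday 2022-02-04 spans two Monday-to-Friday weeks
n_future = business_days_until('2022-01-21', '2022-02-04')
print(n_future)  # 10
```

With the code above, last_observed would be data.index[-1] and the result would be passed as multi_step_forecasts(n_past=0, n_future=n_future). Note that pd.bdate_range only skips weekends, so exchange holidays are still counted as trading days.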

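【Comments】:

How can I also predict values for a few days in the past (so that the blue and orange lines overlap), to check at the same time whether the model is any good?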
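
A rough sketch of one way to do that check: the frame returned by multi_step_forecasts(n_past=20, n_future=30) has both an Actual and a Forecast value for roughly the last 20 observed days, so those rows can be compared directly. The helper name overlap_errors and the small demo frame are made up for illustration; in practice df2 from the answer would be passed in.

```python
import numpy as np
import pandas as pd

def overlap_errors(df):
    """Compare forecasts with actual values on the dates where both exist."""
    overlap = df.dropna(subset=['Actual', 'Forecast'])
    errors = overlap['Forecast'] - overlap['Actual']
    return {
        'n': len(overlap),                                # number of overlapping days
        'rmse': float(np.sqrt((errors ** 2).mean())),     # root mean squared error
        'mape': float((errors.abs() / overlap['Actual'].abs()).mean() * 100),  # in percent
    }

# hypothetical stand-in for df2: two rows have both an actual and a forecast
idx = pd.bdate_range('2022-01-03', periods=6)
demo = pd.DataFrame(
    {
        'Actual': [100.0, 102.0, 101.0, 103.0, np.nan, np.nan],
        'Forecast': [np.nan, np.nan, 100.0, 105.0, 104.0, 106.0],
    },
    index=idx,
)
result = overlap_errors(demo)
print(result)
```

Because the model in the answer is fitted on the full history, including those last 20 days, the errors measured this way are likely to be optimistic. Refitting on data that stops before the overlap window would give a fairer estimate.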
