测试经过训练的 LSTM 模型后如何预测实际的未来值?
Posted
技术标签:
【中文标题】测试经过训练的 LSTM 模型后如何预测实际的未来值?【英文标题】:How to predict actual future values after testing the trained LSTM model? 【发布时间】:2022-01-21 23:19:58 【问题描述】:我通过将数据集拆分为训练和测试来训练我的股票价格预测模型。 我还通过将有效数据与预测数据进行比较来测试预测,并且模型运行良好。 但我想预测实际未来值。
我需要在下面的代码中进行哪些更改?
我如何才能预测到实际未来的特定日期?
代码(在 Jupyter Notebook 中):
(要运行代码,请在您拥有的类似 csv 文件中尝试,或使用命令pip install nsepy
安装 nsepy python 库)
# imports
import pandas as pd # data processing
import numpy as np # linear algebra
import matplotlib.pyplot as plt # plotting
from datetime import date # date
from nsepy import get_history # NSE historical data
from keras.models import Sequential # neural network
from keras.layers import LSTM, Dropout, Dense # LSTM layer
from sklearn.preprocessing import MinMaxScaler # scaling
nseCode = 'TCS'
stockTitle = 'Tata Consultancy Services'
# API call
apiData = get_history(symbol = nseCode, start = date(2017,1,1), end = date(2021,12,19))
data = apiData # copy the dataframe (not necessary)
# remove columns you don't need
del data['Symbol']
del data['Series']
del data['Prev Close']
del data['Volume']
del data['Turnover']
del data['Trades']
del data['Deliverable Volume']
del data['%Deliverble']
# store the data in a csv file
data.to_csv('infy2.csv')
# Read the csv file
data = pd.read_csv('infy2.csv')
# convert the date column to datetime; if you read data from csv, do this. Otherwise, no need if you read data from API
data['Date'] = pd.to_datetime(data['Date'], format = '%Y-%m-%d')
data.index = data['Date']
# plot
plt.xlabel('Date')
plt.ylabel('Close Price (Rs.)')
data['Close'].plot(legend = True, figsize = (10,6), title = stockTitle, grid = True, color = 'blue')
# Sort data into Date and Close columns
data2 = data.sort_index(ascending = True, axis = 0)
newData = pd.DataFrame(index = range(0,len(data2)), columns = ['Date', 'Close'])
for i in range(0, len(data2)): # only if you read data from csv
newData['Date'][i] = data2['Date'][i]
newData['Close'][i] = data2['Close'][I]
# Calculate the row number to split the dataset into train and test
split = len(newData) - 100
# normalize the new dataset
scaler = MinMaxScaler(feature_range = (0, 1))
finalData = newData.values
trainData = finalData[0:split, :]
validData = finalData[split:, :]
newData.index = newData.Date
newData.drop('Date', axis = 1, inplace = True)
scaler = MinMaxScaler(feature_range = (0, 1))
scaledData = scaler.fit_transform(newData)
xTrainData, yTrainData = [], []
for i in range(60, len(trainData)): # data-flair has used 60 instead of 30
xTrainData.append(scaledData[i-60:i, 0])
yTrainData.append(scaledData[i, 0])
xTrainData, yTrainData = np.array(xTrainData), np.array(yTrainData)
xTrainData = np.reshape(xTrainData, (xTrainData.shape[0], xTrainData.shape[1], 1))
# build and train the LSTM model
lstmModel = Sequential()
lstmModel.add(LSTM(units = 50, return_sequences = True, input_shape = (xTrainData.shape[1], 1)))
lstmModel.add(LSTM(units = 50))
lstmModel.add(Dense(units = 1))
inputsData = newData[len(newData) - len(validData) - 60:].values
inputsData = inputsData.reshape(-1,1)
inputsData = scaler.transform(inputsData)
lstmModel.compile(loss = 'mean_squared_error', optimizer = 'adam')
lstmModel.fit(xTrainData, yTrainData, epochs = 1, batch_size = 1, verbose = 2)
# Take a sample of a dataset to make predictions
xTestData = []
for i in range(60, inputsData.shape[0]):
xTestData.append(inputsData[i-60:i, 0])
xTestData = np.array(xTestData)
xTestData = np.reshape(xTestData, (xTestData.shape[0], xTestData.shape[1], 1))
predictedClosingPrice = lstmModel.predict(xTestData)
predictedClosingPrice = scaler.inverse_transform(predictedClosingPrice)
# visualize the results
trainData = newData[:split]
validData = newData[split:]
validData['Predictions'] = predictedClosingPrice
plt.xlabel('Date')
plt.ylabel('Close Price (Rs.)')
trainData['Close'].plot(legend = True, color = 'blue', label = 'Train Data')
validData['Close'].plot(legend = True, color = 'green', label = 'Valid Data')
validData['Predictions'].plot(legend = True, figsize = (12,7), grid = True, color = 'orange', label = 'Predicted Data', title = stockTitle)
【问题讨论】:
【参考方案1】:以下是如何为您的模型实现this approach 的示例:
import pandas as pd
import numpy as np
from datetime import date
from nsepy import get_history
from keras.models import Sequential
from keras.layers import LSTM, Dense
from sklearn.preprocessing import MinMaxScaler
pd.options.mode.chained_assignment = None
# load the data
stock_ticker = 'TCS'
stock_name = 'Tata Consultancy Services'
train_start = date(2017, 1, 1)
train_end = date.today()
data = get_history(symbol=stock_ticker, start=train_start, end=train_end)
data.index = pd.DatetimeIndex(data.index)
data = data[['Close']]
# scale the data
scaler = MinMaxScaler(feature_range=(0, 1)).fit(data)
z = scaler.transform(data)
# extract the input sequences and target values
window_size = 60
x, y = [], []
for i in range(window_size, len(z)):
x.append(z[i - window_size: i])
y.append(z[i])
x, y = np.array(x), np.array(y)
# build and train the model
model = Sequential()
model.add(LSTM(units=50, return_sequences=True, input_shape=x.shape[1:]))
model.add(LSTM(units=50))
model.add(Dense(units=1))
model.compile(loss='mse', optimizer='adam')
model.fit(x, y, epochs=100, batch_size=128, verbose=1)
# generate the multi-step forecasts
def multi_step_forecasts(n_past, n_future):
x_past = x[- n_past - 1:, :, :][:1] # last observed input sequence
y_past = y[- n_past - 1] # last observed target value
y_future = [] # predicted target values
for i in range(n_past + n_future):
# feed the last forecast back to the model as an input
x_past = np.append(x_past[:, 1:, :], y_past.reshape(1, 1, 1), axis=1)
# generate the next forecast
y_past = model.predict(x_past)
# save the forecast
y_future.append(y_past.flatten()[0])
# transform the forecasts back to the original scale
y_future = scaler.inverse_transform(np.array(y_future).reshape(-1, 1)).flatten()
# add the forecasts to the data frame
df_past = data.rename(columns='Close': 'Actual').copy()
df_future = pd.DataFrame(
index=pd.bdate_range(start=data.index[- n_past - 1] + pd.Timedelta(days=1), periods=n_past + n_future),
columns=['Forecast'],
data=y_future
)
return df_past.join(df_future, how='outer')
# forecast the next 30 days
df1 = multi_step_forecasts(n_past=0, n_future=30)
df1.plot(title=stock_name)
# forecast the last 20 days and the next 30 days
df2 = multi_step_forecasts(n_past=20, n_future=30)
df2.plot(title=stock_name)
【讨论】:
如何预测几天前的值(例如,蓝色和橙色重叠),以便同时检查模型是否良好?以上是关于测试经过训练的 LSTM 模型后如何预测实际的未来值?的主要内容,如果未能解决你的问题,请参考以下文章