无法在 theano 中为 rnn 编写梯度步骤
Posted
技术标签:
【中文标题】无法在 theano 中为 rnn 编写梯度步骤【英文标题】:Unable to write gradient step in theano for rnn 【发布时间】:2017-02-20 13:34:24 【问题描述】:我有以下代码,其中我将单词转换为一个热向量,并在 theano 中使用 rnn 进行梯度下降,以预测给定单词序列的下一个单词(基本上是语言模型)。
# coding: utf-8
# In[68]:
#Importing stuff
import theano
import theano.tensor as T
import numpy as np
# In[69]:
import nltk
import sys
import operator
import csv
import itertools
from utils import *
from datetime import datetime
# In[70]:
#Fixing vocabulary size for one hot vectors and some initialization stuff
v_size = 8000
unknown_token = "UNKNOWN_TOKEN"
start_token = "<s>"
end_token = "</s>"
# In[71]:
#Read data and start preprocessing
with open('reddit-comments-2015-08.csv','rb') as f:
reader = csv.reader(f, skipinitialspace=True)
reader.next()
sentences = list(itertools.chain(*[nltk.sent_tokenize(x[0].decode('utf-8')) for x in reader]))
print len(sentences)
# In[72]:
#Tokenize the sentences and add start and end tokens
tokenized_sentences = [nltk.word_tokenize(s) for s in sentences]
tokenized_sentences = [[start_token] + s + [end_token] for s in tokenized_sentences]
# In[73]:
#Get word frequencies and use only most frequent words in vocabulary
word_freq = nltk.FreqDist(itertools.chain(*tokenized_sentences))
vocab = word_freq.most_common(v_size-1)
# In[74]:
#Do mapping and reverse mapping
index_to_word = [x[0] for x in vocab]
index_to_word.append(unknown_token)
word_to_index = w:i for i,w in enumerate(index_to_word)
#Removing less frequent words
for i, s in enumerate(tokenized_sentences):
tokenized_sentences[i] = [w if w in word_to_index else unknown_token for w in s]
#Got vectors but they are not one hot
X_train = np.asarray([[word_to_index[w] for w in s[:-1]] for s in tokenized_sentences])
Y_train = np.asarray([[word_to_index[w] for w in s[1:]] for s in tokenized_sentences])
#Preprocessing ends here
# In[75]:
#Take only one sentence for now
X_train = X_train[0]
Y_train = Y_train[0]
# In[76]:
#Make input and output as onehot vectors. This can easily be replaced with vectors generated by word2vec.
X_train_onehot = np.eye(v_size)[X_train]
X = theano.shared(np.array(X_train_onehot).astype('float32'), name = 'X')
Y_train_onehot = np.eye(v_size)[Y_train]
Y = theano.shared(np.array(Y_train_onehot).astype('float32'), name = 'Y')
# In[77]:
#Initializing U, V and W
i_dim = v_size
h_dim = 100
o_dim = v_size
U = theano.shared(np.random.randn(i_dim, h_dim).astype('float32'), name = 'U')
W = theano.shared(np.random.randn(h_dim, h_dim).astype('float32'), name = 'W')
V = theano.shared(np.random.randn(h_dim, o_dim).astype('float32'), name = 'V')
# In[78]:
#forward propagation
s = T.vector('s')
results, updates = theano.scan(lambda x, sm1: T.tanh( T.dot(x, U) + T.dot(sm1, W)),
sequences = X_train_onehot,
outputs_info = s
)
y_hat = T.dot(results, V)
forward_propagation = theano.function(inputs=[s], outputs = y_hat)
# In[80]:
#loss
loss = T.sum(T.nnet.categorical_crossentropy(y_hat, Y))
# In[81]:
#Gradients
dw = T.grad(loss, W)
du = T.grad(loss, U)
dv = T.grad(loss, V)
# In[82]:
#BPTT
learning_rate = T.scalar('learning_rate')
gradient_step = theano.function(inputs = [s, learning_rate],
updates = (
(U, U - learning_rate * du),
(V, V - learning_rate * dv),
(W, W - learning_rate * dw)
)
)
# In[ ]:
但它在梯度步骤中不断抛出错误。我发布了完整的代码,因为我不知道哪个步骤会影响错误。以下是jupyter notebook报错截图。
【问题讨论】:
【参考方案1】:我解决了。问题在于类型不匹配。我不得不将du, dv, dw, learning rate
类型转换为float32。默认情况下,它们是 float64。
【讨论】:
以上是关于无法在 theano 中为 rnn 编写梯度步骤的主要内容,如果未能解决你的问题,请参考以下文章
深度学习之六,基于RNN(GRU,LSTM)的语言模型分析与theano代码实现