李宏毅 Machine Learning 12: Recurrent Neural Networks (RNN)
Posted by huzheyu
-
Example: Slot Filling
- Feedforward Network
- Input: word vector
- Output: the probability of the word belonging to each slot
- Problem: a feedforward network cannot use information from earlier in the sentence, which can cause misclassification (e.g. "Taipei" as destination in "arrive Taipei" vs. place of departure in "leave Taipei")
- Solution: introduce memory into the network so it can remember the preceding context; this gives the RNN
- word -> vector
- 1-of-N encoding (see the encoding sketch after this list)
- Beyond 1-of-N encoding
- Add an extra "Other" dimension for words not in the lexicon
- Word hashing
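A minimal sketch of 1-of-N encoding with the extra "Other" dimension described above; the vocabulary here is a made-up example, not the lecture's lexicon.

```python
import numpy as np

vocab = ["apple", "bag", "cat", "dog", "elephant"]  # hypothetical lexicon
index = {w: i for i, w in enumerate(vocab)}

def one_of_n(word):
    """Encode a word as a 1-of-N vector; unknown words fall into 'Other'."""
    vec = np.zeros(len(vocab) + 1)          # +1 dimension for "Other"
    vec[index.get(word, len(vocab))] = 1.0  # unknown word -> last slot
    return vec

print(one_of_n("bag"))     # [0. 1. 0. 0. 0. 0.]
print(one_of_n("Taipei"))  # [0. 0. 0. 0. 0. 1.]  -> "Other"
```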
-
RNN
- The hidden layer's output is stored in memory
- The contents of the memory are fed back as input to the hidden layer
- Sensitive to the order of the input sequence
- Because of the memory, the same word in a sentence produces different outputs when the preceding context differs (see the sketch after this list)
- Can be extended to multiple layers
- Variants
- Elman Network
- The hidden layer's output is stored in memory
- Jordan Network
- The output layer's output is stored in memory
- Bidirectional RNN
- Scans the sentence in both directions; the output layer combines the hidden-layer outputs from the two directions
- LSTM
- Long Short-term Memory
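The memory mechanism can be sketched in a few lines of NumPy. This is a toy Elman-style step with random, untrained weights (the sizes are illustrative assumptions): the same input vector appearing at two positions produces different hidden outputs because the memory differs.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_h = 6, 4                    # illustrative input/hidden sizes
W_x = rng.normal(size=(d_h, d_in))  # input -> hidden weights
W_h = rng.normal(size=(d_h, d_h))   # memory -> hidden weights (recurrence)

def step(x, h_prev):
    # The hidden output depends on the current input AND the stored memory
    return np.tanh(W_x @ x + W_h @ h_prev)

same_word = np.eye(d_in)[0]         # the "same word" appearing twice
sentence = [np.eye(d_in)[2], same_word, np.eye(d_in)[3], same_word]

h = np.zeros(d_h)                   # memory starts at zero
for t, x in enumerate(sentence):
    h = step(x, h)
    print(t, h.round(3))            # t=1 and t=3 get the same x, different h
```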
-
LSTM
- Each neuron has 4 components
- Input Gate
- Memory Cell
- Forget Gate
- Output Gate
- 4 inputs
- Input to the cell (passed through the Input Gate)
- Control signal of the Input Gate
- Control signal of the Forget Gate
- Control signal of the Output Gate
- 1 output
- Output (passed through the Output Gate)
- Processing steps inside each neuron (transcribed into code after this list):
- Cell input: $z \to g(z)$
- Input Gate control signal: $z_i \to f(z_i)$
- Multiply: $g(z)f(z_i)$
- Forget Gate control signal: $z_f \to f(z_f)$
- Old memory: $c$
- Multiply: $c\,f(z_f)$
- New memory: $c' = g(z)f(z_i) + c\,f(z_f)$, then $c' \to h(c')$
- Output Gate control signal: $z_o \to f(z_o)$
- Multiply: $h(c')f(z_o)$
- Output: $a = h(c')f(z_o)$
- $f$ is usually the sigmoid function
- The whole input vector $x$ can be fed to all 4 inputs (through four separate transforms), or the input can be split into 4 vectors, one per input
- Peephole connections
- The memory cell's value is also used as one of the inputs to the gate control signals
- Multi-layer LSTM
- LSTM has practically become synonymous with RNN
- Because each neuron takes 4 inputs, an LSTM needs roughly 4× the parameters of a simple RNN
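The step-by-step update above transcribes directly into code. A minimal sketch: $f$ is the sigmoid as stated in the notes; taking $g$ and $h$ to be tanh is a common choice (the notes leave them unspecified), and the scalar signals are made-up values.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell(z, z_i, z_f, z_o, c):
    """One LSTM step: z is the cell input, z_i/z_f/z_o the gate control
    signals, c the old memory. Returns (output a, new memory c')."""
    g = h = np.tanh                     # assumed activations for g and h
    f = sigmoid                         # gate activation, per the notes
    c_new = g(z) * f(z_i) + c * f(z_f)  # c' = g(z)f(z_i) + c f(z_f)
    a = h(c_new) * f(z_o)               # a = h(c') f(z_o)
    return a, c_new

# One step with made-up signals; a large z_f (open forget gate) keeps c.
a, c = lstm_cell(z=0.5, z_i=2.0, z_f=3.0, z_o=1.0, c=0.2)
print(a, c)
```

In practice $z$, $z_i$, $z_f$, $z_o$ all come from separate transforms of the same input, which is where the 4× parameter count noted above comes from.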
-
Keras support
- LSTM
- GRU
- Gated Recurrent Unit
- Simpler than LSTM, with comparable performance
- SimpleRNN
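A minimal Keras sketch of a many-to-one model (e.g. sentiment analysis over a sequence of word ids). The vocabulary size and layer widths are illustrative assumptions; swapping LSTM for GRU or SimpleRNN is a one-line change.

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(None,)),     # variable-length sequence of word ids
    layers.Embedding(input_dim=10000, output_dim=64),  # word id -> vector
    layers.LSTM(32),                # or layers.GRU(32) / layers.SimpleRNN(32)
    layers.Dense(1, activation="sigmoid"),   # e.g. a sentiment score
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
model.summary()
```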
-
Training method: BPTT
- Backpropagation Through Time (see the sketch below)
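A minimal sketch of what BPTT does, on a toy scalar RNN $h_t = w\,h_{t-1} + x_t$ with loss $L = h_T$ (the sequence and values are made up): run the forward pass storing every hidden state, then walk backwards through time accumulating $\partial L/\partial w$.

```python
w = 0.9
xs = [1.0, 0.0, 0.0, 0.0, 0.0]    # toy input sequence, T = 5

# Forward pass: store every hidden state for the backward pass
hs = [0.0]
for x in xs:
    hs.append(w * hs[-1] + x)

# Backward pass through time for loss L = h_T
grad_w, grad_h = 0.0, 1.0         # dL/dh_T = 1
for t in range(len(xs), 0, -1):
    grad_w += grad_h * hs[t - 1]  # local contribution: dh_t/dw = h_{t-1}
    grad_h *= w                   # chain rule: carry gradient to h_{t-1}

print(grad_w)                     # here h_T = w**4, so dL/dw = 4*w**3 = 2.916
```

The repeated `grad_h *= w` is exactly why the error surface in the next section is so rough: the same weight is multiplied in at every time step.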
-
The error surface is rough
- very flat or very steep
- Intuition (see the numeric check after this list):
- $w=1 \Rightarrow w^{1000}=1$
- $w=1.01 \Rightarrow w^{1000}\approx 20000$
- $w=0.99 \Rightarrow w^{1000}\approx 0$
- $w=0.01 \Rightarrow w^{1000}\approx 0$
- Solutions:
- LSTM
- Can handle vanishing gradients, but not exploding gradients
- As long as the Forget Gate stays open, the memory is carried forward additively, so its influence does not fade
- Clockwork RNN
- SCRN
- Structurally Constrained Recurrent Network
- Vanilla RNN initialized with the identity matrix + ReLU activation
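A quick numeric check of the intuition above: the same recurrent weight composed 1000 times is hypersensitive to small changes in $w$.

```python
# Composing the same weight 1000 times either explodes or vanishes
for w in [1.0, 1.01, 0.99, 0.01]:
    print(f"w = {w:4.2f}  ->  w**1000 = {w**1000:.4g}")
# w = 1.00  ->  w**1000 = 1
# w = 1.01  ->  w**1000 = 2.096e+04   (explodes, ~20000)
# w = 0.99  ->  w**1000 = 4.317e-05   (vanishes, ~0)
# w = 0.01  ->  w**1000 = 0           (underflows to exactly 0)
```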
-
Other applications of RNN
- Many to one
- Input: vector sequence
- Output: one vector
- Sentiment Analysis
- Key Term Extraction
- Many to many (output is shorter)
- Input: vector sequence
- Output: shorter vector sequence
- Speech Recognition
- Trimming
- CTC
- Connectionist Temporal Classification
- Many to many (no length restriction)
- Input: vector sequence
- Output: vector sequence of any length
- Sequence to sequence learning
- Machine Translation
- Syntactic Parsing
- Auto-encoder
- Single layer
- Multiple layers
- Encoder
- Decoder
- Applications
- Chat-bot
- Video Caption Generation
- Image Caption Generation
- Attention-based model
- Reading Comprehension
- Visual Question Answering
- Speech Question Answering
- TOEFL listening comprehension test
-
RNN vs. Structured Learning
-
RNN
- A unidirectional RNN does not consider the whole sequence
- A bidirectional RNN does
- The training cost is not always correlated with the error we care about
- Can be deep
-
Structured
- With the Viterbi algorithm, the whole sequence is considered
- Can explicitly model dependencies between labels
- The cost is an upper bound on the error
-
The two can be combined
- Speech Recognition: CNN/LSTM/DNN+HMM
- Semantic Tagging: Bi-directional LSTM+CRF/Structured SVM
-
Deep and Structured will be the future
- GAN
- Conditional GAN
- Connect Energy-based model with GAN
- Deep learning model for inference
-
Recommended textbook: Deep Learning (known in Chinese as the "flower book")
- Part 2: Deep Learning
- Part 3: Structured Learning