0. 引言
参考文献:
- [arxiv] - .attention search
- [Survey] - Wang F, Tax D M J. Survey on the attention based RNN model and its applications in computer vision[J]. arXiv preprint arXiv:1601.06823, 2016.
- [Bahdanau] - Bahdanau D, Cho K, Bengio Y. Neural machine translation by jointly learning to align and translate[J]. arXiv preprint arXiv:1409.0473, 2014.
- [Weight normalization] - Salimans T, Kingma D P. Weight normalization: A simple reparameterization to accelerate training of deep neural networks[C]//Advances in Neural Information Processing Systems. 2016: 901-909.
- [CV] - Ba J, Mnih V, Kavukcuoglu K. Multiple object recognition with visual attention[J]. arXiv preprint arXiv:1412.7755, 2014.
- [CV] - Mnih V, Heess N, Graves A. Recurrent models of visual attention[C]//Advances in neural information processing systems. 2014: 2204-2212.
- [Speech] - Chorowski J K, Bahdanau D, Serdyuk D, et al. Attention-based models for speech recognition[C]//Advances in Neural Information Processing Systems. 2015: 577-585.
- [CV] - Xu K, Ba J, Kiros R, et al. Show, attend and tell: Neural image caption generation with visual attention[C]//International Conference on Machine Learning. 2015: 2048-2057.
- [Luong ] - Luong M T, Pham H, Manning C D. Effective approaches to attention-based neural machine translation[J]. arXiv preprint arXiv:1508.04025, 2015.
- [Speech] - Bahdanau D, Chorowski J, Serdyuk D, et al. End-to-end attention-based large vocabulary speech recognition[C]//Acoustics, Speech and Signal Processing (ICASSP), 2016 IEEE International Conference on. IEEE, 2016: 4945-4949.
- [QA] - Yang Z, He X, Gao J, et al. Stacked attention networks for image question answering[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016: 21-29.
- [Text] - Nallapati R, Xiang B, Zhou B. Sequence-to-Sequence RNNs for Text Summarization[J]. 2016.
- [Translation] - Wu Y, Schuster M, Chen Z, et al. Google‘s neural machine translation system: Bridging the gap between human and machine translation[J]. arXiv preprint arXiv:1609.08144, 2016.
- [Translation] - Neubig G. Neural Machine Translation and Sequence-to-sequence Models: A Tutorial[J]. arXiv preprint arXiv:1703.01619, 2017.
- [BahdanauMonotonic] - Raffel C, Luong T, Liu P J, et al. Online and linear-time attention by enforcing monotonic alignments[J]. arXiv preprint arXiv:1704.00784, 2017.
- [???] - .Vaswani A, Shazeer N, Parmar N, et al. Attention Is All You Need[J].arXiv preprint arXiv:1706.03762v4, 2017.
- [Blog] - .Attention and Augmented Recurrent Neural Networks
- [Quora] - .How-does-an-attention-mechanism-work-in-deep-learning
- [Quora] - .Can-you-recommend-to-me-an-exhaustive-reading-list-for-attention-models-in-deep-learning
- [Quora] - .What-is-attention-in-the-context-of-deep-learning
- [Quora] - .What-is-an-intuitive-explanation-for-how-attention-works-in-deep-learning
- [Quora] - .What-is-exactly-the-attention-mechanism-introduced-to-RNN-recurrent-neural-network-It-would-be-nice-if-you-could-make-it-easy-to-understand
- [Quora] - .How-is-a-saliency-map-generated-when-training-recurrent-neural-networks-with-soft-attention
- [Quora] - .What-is-the-difference-between-soft-attention-and-hard-attention-in-neural-networks
- [Quora] - .What-is-Attention-Mechanism-in-Neural-Networks
- [Quora] - .How-is-the-attention-component-of-attentional-neural-networks-trained