R-NET: Machine Reading Comprehension with Self-Matching Networks

Posted 2022-12-15 AI蜗牛之家



Contents

1. Model overview
2. Passage and question encoding layer
3. Gated Attention-based RNN
4. Self-Matching Attention
5. Output layer
6. Source code and parameters
References

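Frankly, I can't say much for the writing in this paper. Leaving the dimensions of vectors and matrices unstated is bad enough, although you can work them out yourself. But why reuse the same variable names across different layers? (Will saying this get me snubbed by MSRA and sink every résumé I send them? ORZ) In this post I try to avoid that. That said, the paper itself is quite good ^@^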
Annotated version of the paper: zsweet github
Original papers:
Gated Self-Matching Networks for Reading Comprehension and Question Answering
R-NET: MACHINE READING COMPREHENSION WITH SELF-MATCHING NETWORKS

First, a summary of the key features:

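Uses both character embeddings and word embeddings. Unlike earlier models, a word's character embedding is obtained by feeding its characters into a bidirectional GRU and taking the GRU's final states.
Adds a gate after attention, mainly after Question-Passage Matching and after Passage Self-Matching.
Applies self-attention to the passage itself.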

1. Model overview

To me, the overall architecture is essentially a fusion of Match-LSTM and the Transformer, plus a few small tricks.

The model can be divided into the following modules:

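An embedding layer
RNN encoders that encode the question and the passage separately (this post covers these first two modules in a single section)
A gated attention-based recurrent network that matches the question against the passage to obtain a question-aware passage representation
A self-matching attention network that matches the passage against itself, producing an effective encoding of the whole passage
A pointer network that locates the answer span in the passage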

The model architecture is shown in the figure below:

2. Passage and question encoding layer

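The question $Q=\{w_t^Q\}_{t=1}^m$ and the passage $P=\{w_t^P\}_{t=1}^n$ each receive a word-level encoding and a character-level encoding, giving vectors $e$ and $c$. The character-level encoding mainly addresses out-of-vocabulary (OOV) words: previously an OOV word's vector was simply all zeros, whereas character features still give it a meaningful representation. Two bidirectional RNNs then re-encode the question and the passage separately, whereas most earlier models used CNN convolutions and highway networks here. The authors chose GRU units rather than LSTM because GRUs are computationally cheaper.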
$\mathbf{u}_t^Q = \mathrm{BiGRU}(\mathbf{u}_{t-1}^Q, [\mathbf{e}_t^Q, \mathbf{c}_t^Q])$

$\mathbf{u}_t^P = \mathrm{BiGRU}(\mathbf{u}_{t-1}^P, [\mathbf{e}_t^P, \mathbf{c}_t^P])$

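Here $\mathbf{e}_t$ and $\mathbf{c}_t$ denote the word embedding and the character embedding, respectively. The encoded question is $[\mathbf{u}_1^Q, \mathbf{u}_2^Q, \dots, \mathbf{u}_m^Q]$ and the encoded passage is $[\mathbf{u}_1^P, \mathbf{u}_2^P, \dots, \mathbf{u}_n^P]$.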
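To make the encoding layer concrete, here is a minimal NumPy sketch (not the authors' code) of the two steps described above: a character-level BiGRU whose final forward and backward states form $\mathbf{c}_t$, and a word-level BiGRU over $[\mathbf{e}_t, \mathbf{c}_t]$ that yields $\mathbf{u}_t$. Assumptions: weights and word vectors are random, GRU biases are omitted, the sizes (`D_WORD`, `D_CHAR`, `H_CHAR`, `H_ENC`) and helper names (`GRUCell`, `run_gru`, `bigru`, `char_embedding`, `encode`) are hypothetical, and the character-level GRU is shared between question and passage while each gets its own word-level encoder.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GRUCell:
    """Plain GRU cell without biases; random weights, for shape illustration only."""

    def __init__(self, input_dim, hidden_dim, rng):
        shape = (hidden_dim, input_dim + hidden_dim)
        self.Wz = rng.normal(0.0, 0.1, shape)  # update gate
        self.Wr = rng.normal(0.0, 0.1, shape)  # reset gate
        self.Wh = rng.normal(0.0, 0.1, shape)  # candidate state
        self.hidden_dim = hidden_dim

    def __call__(self, x, h):
        xh = np.concatenate([x, h])
        z = sigmoid(self.Wz @ xh)
        r = sigmoid(self.Wr @ xh)
        h_tilde = np.tanh(self.Wh @ np.concatenate([x, r * h]))
        return (1.0 - z) * h + z * h_tilde

def run_gru(cell, xs):
    """Runs the cell left to right over xs (T, D); returns all states (T, H)."""
    h = np.zeros(cell.hidden_dim)
    states = []
    for x in xs:
        h = cell(x, h)
        states.append(h)
    return np.stack(states)

def bigru(fwd, bwd, xs):
    """Per-position concatenation of forward and backward states: (T, 2H)."""
    hf = run_gru(fwd, xs)
    hb = run_gru(bwd, xs[::-1])[::-1]  # re-reverse so row t lines up with token t
    return np.concatenate([hf, hb], axis=1)

def char_embedding(fwd, bwd, char_vecs):
    """c_t: final forward state and final backward state over a word's characters."""
    hf = run_gru(fwd, char_vecs)[-1]
    hb = run_gru(bwd, char_vecs[::-1])[-1]
    return np.concatenate([hf, hb])

rng = np.random.default_rng(0)
D_WORD, D_CHAR, H_CHAR, H_ENC = 5, 3, 4, 6  # hypothetical sizes
D_IN = D_WORD + 2 * H_CHAR                  # size of [e_t, c_t]
char_fwd, char_bwd = GRUCell(D_CHAR, H_CHAR, rng), GRUCell(D_CHAR, H_CHAR, rng)

def encode(num_words, fwd, bwd):
    """Random e_t and random character vectors stand in for real embedding lookups."""
    inputs = []
    for _ in range(num_words):
        e_t = rng.normal(size=D_WORD)
        chars = rng.normal(size=(rng.integers(2, 8), D_CHAR))
        c_t = char_embedding(char_fwd, char_bwd, chars)
        inputs.append(np.concatenate([e_t, c_t]))  # [e_t, c_t]
    return bigru(fwd, bwd, np.stack(inputs))

q_fwd, q_bwd = GRUCell(D_IN, H_ENC, rng), GRUCell(D_IN, H_ENC, rng)
p_fwd, p_bwd = GRUCell(D_IN, H_ENC, rng), GRUCell(D_IN, H_ENC, rng)
u_Q = encode(4, q_fwd, q_bwd)  # m = 4 question words
u_P = encode(7, p_fwd, p_bwd)  # n = 7 passage words
print(u_Q.shape, u_P.shape)    # (4, 12) (7, 12)
```

Each row of the output concatenates a forward and a backward state, so every $\mathbf{u}_t$ has size `2 * H_ENC`.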

3. Gated Attention-based RNN

First, attend over the question:
$s_j^t = \mathbf{v}^\top \tanh(W_u^Q \mathbf{u}_j^Q + W_u^P \mathbf{u}_t^P + W_v^P \mathbf{v}_{t-1}^P), \quad j = 1, \dots, m$
$\alpha_j^t = \frac{\exp(s_j^t)}{\sum_{k=1}^m \exp(s_k^t)}$

$\mathbf{c}_t = \sum_{i=1}^m \alpha_i^t \mathbf{u}_i^Q$
This attention can be written compactly as $\mathbf{c}_t = \mathrm{attn}(\mathbf{u}^Q, [\mathbf{u}_t^P, \mathbf{v}_{t-1}^P])$.
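The attention step $\mathbf{c}_t = \mathrm{attn}(\mathbf{u}^Q, [\mathbf{u}_t^P, \mathbf{v}_{t-1}^P])$ can be sketched in NumPy as follows. This is an illustrative sketch assuming random weights and hypothetical sizes; the function name `attn` and its argument order are my own choices, not the paper's. Because $W_u^P \mathbf{u}_t^P$ and $W_v^P \mathbf{v}_{t-1}^P$ do not depend on $j$, they are computed once and broadcast over all $m$ question positions.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()

def attn(uQ, uP_t, vP_prev, WuQ, WuP, WvP, v):
    """Additive attention over the question for passage position t.

    uQ: (m, dQ) question encodings u_j^Q; uP_t: (dP,) passage encoding u_t^P;
    vP_prev: (dV,) previous state v_{t-1}^P. Returns c_t (dQ,) and alpha^t (m,).
    """
    # W_u^Q u_j^Q for every j at once is (m, d_att); the other two terms are
    # (d_att,) vectors broadcast across the m question positions.
    s = np.tanh(uQ @ WuQ.T + WuP @ uP_t + WvP @ vP_prev) @ v  # s_j^t, shape (m,)
    alpha = softmax(s)                                         # alpha_j^t
    c_t = alpha @ uQ                                           # sum_j alpha_j^t u_j^Q
    return c_t, alpha

rng = np.random.default_rng(1)
m, d, d_v, d_att = 4, 6, 5, 3  # hypothetical sizes
uQ = rng.normal(size=(m, d))
uP_t = rng.normal(size=d)
vP_prev = rng.normal(size=d_v)
WuQ, WuP = rng.normal(size=(d_att, d)), rng.normal(size=(d_att, d))
WvP, v = rng.normal(size=(d_att, d_v)), rng.normal(size=d_att)

c_t, alpha = attn(uQ, uP_t, vP_prev, WuQ, WuP, WvP, v)
print(alpha.round(3), c_t.shape)
```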

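This attention is the same as in Match-LSTM, but R-NET adds a gate on top of it so the model can emphasize passage words that are relevant to the question and suppress irrelevant ones: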
$g_t = \mathrm{sigmoid}(W_g [\mathbf{u}_t^P, \mathbf{c}_t])$
$[\mathbf{u}_t^P, \mathbf{c}_t]^* = g_t \odot [\mathbf{u}_t^P, \mathbf{c}_t]$
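A minimal sketch of the gate, assuming NumPy, no bias term (matching the formula above), and a square $W_g$ so that $g_t$ has one entry per dimension of $[\mathbf{u}_t^P, \mathbf{c}_t]$. The name `gated_input` is hypothetical. Setting $W_g = 0$ makes every gate equal to $\mathrm{sigmoid}(0) = 0.5$, which gives an easy sanity check.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_input(uP_t, c_t, Wg):
    """[u_t^P, c_t]^* = g_t * [u_t^P, c_t], with g_t = sigmoid(W_g [u_t^P, c_t])."""
    x = np.concatenate([uP_t, c_t])
    g = sigmoid(Wg @ x)  # square W_g: one gate value per input dimension
    return g * x, g      # element-wise product (the ⊙ above)

uP_t = np.array([1.0, -2.0, 0.5])
c_t = np.array([0.3, 4.0])
# With W_g = 0 every gate is 0.5, so each input element is simply halved.
x_star, g = gated_input(uP_t, c_t, np.zeros((5, 5)))
print(x_star)
```

Because $g_t$ is computed from both $\mathbf{u}_t^P$ and its question context $\mathbf{c}_t$, trained gate values near 0 can scale down the input for a passage word before it reaches the RNN.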

Then, as in Match-LSTM, the gated input is fed into an RNN:
$\mathbf{v}_t^P = \mathrm{BiGRU}(\mathbf{v}_{t-1}^P, [\mathbf{u}_t^P, \mathbf{c}_t]^*)$
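Putting attention, gate and recurrence together, one left-to-right pass of the gated attention-based recurrent network might look like the sketch below. It rests on several assumptions: NumPy, random weights, GRU biases omitted, hypothetical sizes and parameter names (`WuQ`, `WuP`, `WvP`, `v`, `Wg`, `Wz`, `Wr`, `Wh`), and only the forward direction. A bidirectional variant would presumably run a second right-to-left pass with its own weights and concatenate the two state sequences.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def gru_step(x, h, Wz, Wr, Wh):
    """One bias-free GRU update."""
    xh = np.concatenate([x, h])
    z = sigmoid(Wz @ xh)
    r = sigmoid(Wr @ xh)
    h_tilde = np.tanh(Wh @ np.concatenate([x, r * h]))
    return (1.0 - z) * h + z * h_tilde

def gated_attention_rnn(uP, uQ, p):
    """Forward pass over the passage producing question-aware states v_t^P.

    uP: (n, d) passage encodings, uQ: (m, d) question encodings, p: weight dict.
    """
    v_state = np.zeros(p["Wz"].shape[0])  # v_0^P
    states = []
    for t in range(uP.shape[0]):
        # 1) attention over the question, conditioned on u_t^P and v_{t-1}^P
        s = np.tanh(uQ @ p["WuQ"].T + p["WuP"] @ uP[t] + p["WvP"] @ v_state) @ p["v"]
        c_t = softmax(s) @ uQ
        # 2) gate the RNN input [u_t^P, c_t]
        x = np.concatenate([uP[t], c_t])
        x = sigmoid(p["Wg"] @ x) * x
        # 3) recurrent update v_t^P = GRU(v_{t-1}^P, [u_t^P, c_t]^*)
        v_state = gru_step(x, v_state, p["Wz"], p["Wr"], p["Wh"])
        states.append(v_state)
    return np.stack(states)

rng = np.random.default_rng(2)
n, m, d, H, d_att = 7, 4, 6, 5, 3  # hypothetical sizes
params = {
    "WuQ": rng.normal(size=(d_att, d)),
    "WuP": rng.normal(size=(d_att, d)),
    "WvP": rng.normal(size=(d_att, H)),
    "v": rng.normal(size=d_att),
    "Wg": rng.normal(size=(2 * d, 2 * d)),
    "Wz": rng.normal(size=(H, 2 * d + H)),
    "Wr": rng.normal(size=(H, 2 * d + H)),
    "Wh": rng.normal(size=(H, 2 * d + H)),
}
vP = gated_attention_rnn(rng.normal(size=(n, d)), rng.normal(size=(m, d)), params)
print(vP.shape)  # (7, 5)
```

The attention at step $t$ depends on $\mathbf{v}_{t-1}^P$, so the loop cannot be vectorized over time the way the encoder's per-position projections can.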