Open-Domain QA - Paper Notes
Open-domain QA
Overview
The whole system consists of a Document Retriever and a Document Reader. The Document Retriever returns the top five Wikipedia articles for any given question, and the Document Reader then processes these articles to extract the answer.
Document Retriever
The Retriever compares TF-IDF weighted bag-of-words vectors between the articles and the question. Taking word order into account with n-gram features improves performance further; in the paper, bigram counts performed best. The bigrams are mapped to \(2^{24}\) bins with the feature hashing of (Weinberger et al., 2009), using an unsigned murmur3 hash to preserve speed and memory efficiency.
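A minimal sketch of this kind of hashed TF-IDF bigram retriever, using scikit-learn's `HashingVectorizer` as a stand-in for the paper's feature hashing (the toy corpus, function names, and the use of cosine similarity are my own illustrative assumptions, not the authors' implementation):

```python
from sklearn.feature_extraction.text import HashingVectorizer, TfidfTransformer
from sklearn.metrics.pairwise import cosine_similarity

docs = ["Paris is the capital of France.",
        "The Eiffel Tower is in Paris.",
        "Berlin is the capital of Germany."]

# Unigram + bigram counts hashed into 2^24 bins (murmurhash under the hood).
hasher = HashingVectorizer(ngram_range=(1, 2), n_features=2**24,
                           alternate_sign=False, norm=None)
tfidf = TfidfTransformer()
doc_vecs = tfidf.fit_transform(hasher.transform(docs))

def retrieve(question, k=5):
    """Return indices of the top-k documents for a question."""
    q_vec = tfidf.transform(hasher.transform([question]))
    scores = cosine_similarity(q_vec, doc_vecs).ravel()
    return scores.argsort()[::-1][:k]

print(retrieve("What is the capital of France?", k=2))
```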
Document Reader
The Document Reader is built from recurrent networks: the paragraph tokens are first turned into feature vectors and then encoded by a multi-layer BiLSTM, while the question is encoded by a separate RNN into a single vector.
Paragraph encoding is comprised of the following parts:
- Word embeddings:
- 300d GloVe; only the 1000 most frequent question words are fine-tuned, because the representations of some key words such as what, how, which, many could be crucial for QA systems.
- Exact match:
- Three simple features, indicating whether \(p_i\) can be exactly matched to one question word in \(q\), either in its original, lowercase, or lemma form. These features are helpful, as shown by the ablation analysis.
- Token features:
- POS, NER, and TF (term frequency).
- Aligned question embedding:
- This embedding is essentially an attention mechanism between the question and the paragraph. It is computed as follows, where \(\alpha(\cdot)\) is a single dense layer with a ReLU nonlinearity:
\[
\begin{aligned}
a_{i,j} &= \frac{\exp\big(\alpha(E(p_i)) \cdot \alpha(E(q_j))\big)}{\sum_{j'} \exp\big(\alpha(E(p_i)) \cdot \alpha(E(q_{j'}))\big)} \\
f_{align}(p_i) &= \sum_j a_{i,j} E(q_j)
\end{aligned}
\]
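A minimal PyTorch sketch of this aligned attention, assuming the word embeddings \(E(p_i)\), \(E(q_j)\) are already available as tensors; the module name and hidden size are illustrative assumptions, not the authors' code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AlignedQuestionEmbedding(nn.Module):
    """f_align(p_i) = sum_j a_ij * E(q_j), with a_ij a softmax over dot
    products of a shared one-layer ReLU projection alpha(.)."""
    def __init__(self, embed_dim, hidden_dim=128):
        super().__init__()
        # alpha(.): a single dense layer with ReLU nonlinearity.
        self.alpha = nn.Sequential(nn.Linear(embed_dim, hidden_dim), nn.ReLU())

    def forward(self, p_emb, q_emb):
        # p_emb: (batch, p_len, embed_dim), q_emb: (batch, q_len, embed_dim)
        p_proj = self.alpha(p_emb)                           # (batch, p_len, hidden)
        q_proj = self.alpha(q_emb)                           # (batch, q_len, hidden)
        scores = torch.bmm(p_proj, q_proj.transpose(1, 2))   # (batch, p_len, q_len)
        a = F.softmax(scores, dim=-1)                        # attention over question words
        return torch.bmm(a, q_emb)                           # (batch, p_len, embed_dim)
```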
Question encoding
A recurrent NN is applied on top of the word embeddings of \(q_i\), and the resulting hidden units are combined into one single vector: \(\{q_1, \cdots, q_l\} \rightarrow q\). The vector \(q\) is computed as follows:
\[
\begin{aligned}
b_j &= \frac{\exp(w \cdot q_j)}{\sum_{j'} \exp(w \cdot q_{j'})} \\
q &= \sum_j b_j q_j
\end{aligned}
\]
where \(b_j\) encodes the importance of each question word and \(w\) is a learned weight vector. I think this computation is very similar to self-attention over the question.
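A minimal PyTorch sketch of this question encoder, assuming the question word embeddings are already looked up; the BiLSTM hidden size and module name are illustrative assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class QuestionEncoder(nn.Module):
    """Encode {q_1..q_l} with a BiLSTM, then collapse the hidden states
    into one vector q using b_j = softmax(w . q_j)."""
    def __init__(self, embed_dim, hidden_dim=128):
        super().__init__()
        self.rnn = nn.LSTM(embed_dim, hidden_dim, batch_first=True,
                           bidirectional=True)
        # Learned weight vector w that scores each hidden state.
        self.w = nn.Linear(2 * hidden_dim, 1, bias=False)

    def forward(self, q_emb):
        # q_emb: (batch, q_len, embed_dim)
        hidden, _ = self.rnn(q_emb)                           # (batch, q_len, 2*hidden)
        b = F.softmax(self.w(hidden).squeeze(-1), dim=-1)     # (batch, q_len)
        return torch.bmm(b.unsqueeze(1), hidden).squeeze(1)   # (batch, 2*hidden)
```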
Prediction
The paragraph vectors \(\{p_1, \cdots, p_m\}\) and the question vector \(q\) are taken as input to train two classifiers that predict the start and end positions of the correct span:
\[
\begin{aligned}
P_{start}(i) &\propto \exp(p_i W_s q) \\
P_{end}(i) &\propto \exp(p_i W_e q)
\end{aligned}
\]
Then the best span is selected as the pair of tokens \(i\) and \(i'\) such that \(i \leq i' \leq i + 15\) and \(P_{start}(i) \times P_{end}(i')\) is maximized.
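A minimal sketch of this span decoding step, assuming the start/end probabilities have already been computed from the bilinear terms above (the function name and tensor layout are my own assumptions):

```python
import torch

def best_span(p_start, p_end, max_len=15):
    """Pick (i, i') with i <= i' <= i + max_len maximizing
    P_start(i) * P_end(i')."""
    # p_start, p_end: (seq_len,) tensors of start/end probabilities.
    seq_len = p_start.size(0)
    scores = p_start.unsqueeze(1) * p_end.unsqueeze(0)       # (seq_len, seq_len)
    i_idx = torch.arange(seq_len).unsqueeze(1)               # candidate start i
    j_idx = torch.arange(seq_len).unsqueeze(0)               # candidate end i'
    # Keep only spans that end at or after the start and are at most max_len long.
    mask = (j_idx >= i_idx) & (j_idx - i_idx <= max_len)
    scores = scores.masked_fill(~mask, 0)
    flat = scores.argmax().item()
    return divmod(flat, seq_len)                             # (start, end) indices
```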
Analysis
The ablation analysis shows that the aligned feature and the exact_match feature are complementary and play similar roles: removing either one on its own barely affects performance, but removing both causes a dramatic drop.