Paper Reading: WWW '23 Zero-shot Clarifying Question Generation for Conversational Search



Motivation

Generate clarifying questions in a zero-shot setting to overcome the cold start problem and data bias.

cold start problem: a lack of training data makes the system hard to deploy, and without deployment no new data gets collected

data bias: collecting supervised data that covers every possible topic is unrealistic, and training on such incomplete data introduces bias

Contributions

  • the first to propose a zero-shot clarifying question generation system, addressing the cold-start challenge of asking clarifying questions in conversational search.
  • the first to cast clarifying question generation as a constrained language generation task and show the advantage of this formulation.
  • an auxiliary evaluation strategy for clarifying question generation, which removes the information-scarce question templates from both generations and references (a minimal sketch follows the list).
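A minimal sketch of that template-removal idea: strip a leading question template before computing overlap metrics, so only the information-bearing question body is compared. The template list here is an illustrative stand-in for the paper's actual templates, and scoring with BLEU afterwards is an assumption.

```python
# Hedged sketch: compare only question bodies by stripping leading
# templates from both generations and references. The template list is
# illustrative; the paper's eight templates are not reproduced here.
TEMPLATES = [
    "would you like to know",
    "are you interested in",
    "do you want to know",
]

def strip_template(question: str) -> str:
    """Remove a leading clarifying-question template, if present."""
    q = question.lower().strip()
    for t in TEMPLATES:
        if q.startswith(t):
            return q[len(t):].strip(" ?")
    return q

# e.g. score BLEU on strip_template(generation) vs. strip_template(reference)
```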

Method

Backbone: a checkpoint of GPT-2

  • its original inference objective is to predict the next token given all preceding text

Directly appending the query $q$ and the facet $f$ as input and letting GPT-2 generate the clarifying question $cq$ raises two challenges:

  • the generation does not necessarily cover the facets.
  • the generated sentences are not necessarily in the tone of a clarifying question.

We divide our system into two parts:

  • facet-constrained question generation (tackles the first challenge)
  • multi-form question prompting and ranking (tackles the second challenge by ranking the clarifying questions generated from different templates)

Facet-constrained Question Generation

Our model uses the facet words not as input but as constraints, employing Neurologic Decoding, a constrained decoding algorithm built on beam search.

  • at decoding step $t$, assume the candidates already in the beam are $C = \{c_{1:k}\}$, where $k$ is the beam size, $c_i = x^i_{1:(t-1)}$ is the $i$-th candidate, and $x^i_{1:(t-1)}$ are the tokens generated from decoding steps $1$ through $(t-1)$

    (Figure: the Neurologic Decoding selection procedure; original image not preserved.)

    • why this method better constrains the decoder to generate facet-related questions (see the sketch after this list):
      • step (2), the top-$\beta$ filtering, is the main reason facet words are promoted in generations. Because of this filtering, Neurologic Decoding tends to discard candidates that satisfy fewer facet-word constraints, regardless of their generation probability.
      • step (3), the grouping, is the key for Neurologic Decoding to explore as many branches as possible, because grouping keeps the largest number of facet-word inclusion cases ($2^{|f|}$), allowing the decoder to cover the most possible orderings of constraints during generation.
        • if we picked the top-$k$ candidates directly, several of them might contain the same facet words, leaving fewer distinct facet-coverage states in the beam. By instead choosing the best candidate within each group and then taking the top $k$ across groups, the retained candidates cover different combinations of facets.
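As a concrete illustration, here is a minimal sketch of one such selection step, assuming each beam candidate carries its log-probability and the set of facet words it already contains; the data layout and variable names are assumptions, not the paper's implementation.

```python
# Hedged sketch of one Neurologic-style beam selection step.
# candidates: list of (logprob, satisfied_facets, tokens) tuples (assumed layout).
def select_beam(candidates, beam_size, beta):
    # (2) keep only the top-beta candidates by number of satisfied facet-word
    # constraints; low-coverage candidates are dropped regardless of probability
    survivors = sorted(candidates, key=lambda c: len(c[1]), reverse=True)[:beta]
    # (3) group candidates by the exact subset of facets they satisfy
    # (up to 2^|f| groups) and keep the most probable candidate per group
    best_per_group = {}
    for logprob, facets, tokens in survivors:
        key = frozenset(facets)
        if key not in best_per_group or logprob > best_per_group[key][0]:
            best_per_group[key] = (logprob, facets, tokens)
    # fill the beam with the top-k group representatives by probability
    return sorted(best_per_group.values(), key=lambda c: c[0], reverse=True)[:beam_size]
```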

Multiform Question Prompting and Ranking

Use clarifying question templates as the starting text of the generation and let the decoder generate the rest of the question body.

  • if the query $q$ is "I am looking for information about South Africa.", we feed the decoder "I am looking for information about South Africa. [SEP] would you like to know" as input and let it generate the rest.
  • we use multiple prompts (templates) both to cover more ways of clarification and to avoid boring users

For each query, we append each of the eight prompts, forming eight inputs and generating eight candidate questions.

  • a ranking method then chooses the best one as the returned question (a hedged sketch follows)
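Below is a hedged sketch of this prompt-and-rank loop using Hugging Face GPT-2. The template list is an illustrative stand-in for the paper's eight templates, and ranking by average token negative log-likelihood is an assumption about the ranking method, not necessarily the paper's ranker.

```python
# Hedged sketch: generate one question per template, then rank candidates
# by GPT-2's own average token NLL (an assumed, LM-probability-based ranker).
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tok = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

TEMPLATES = [  # illustrative stand-ins for the paper's eight templates
    "would you like to know",
    "are you interested in",
    "do you want to learn about",
]

def best_question(query: str) -> str:
    scored = []
    for t in TEMPLATES:
        prompt = f"{query} [SEP] {t}"
        ids = tok(prompt, return_tensors="pt").input_ids
        out = model.generate(ids, max_new_tokens=20, do_sample=False,
                             pad_token_id=tok.eos_token_id)
        body = tok.decode(out[0, ids.shape[1]:], skip_special_tokens=True)
        question = (t + body).strip()
        q_ids = tok(question, return_tensors="pt").input_ids
        with torch.no_grad():
            nll = model(q_ids, labels=q_ids).loss.item()  # mean NLL per token
        scored.append((nll, question))
    return min(scored)[1]  # lowest average NLL wins

# print(best_question("I am looking for information about South Africa."))
```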

Experiments

Zero-shot clarifying question generation with existing baselines

  • Q-GPT-0
    • input: query
  • QF-GPT-0:
    • input: facet + query
  • Prompt-based GPT-0: includes a special instructional prompt as input
    • input: q “Ask a question that contains words in the list [f]”
  • Template-0: a template-guided approach using GPT-2
    • input: the query followed by each of the eight question templates; the decoder generates the rest of the question

Existing facet-driven baselines (finetuned):

  • Template-facet: append the facet word right after the question template

  • QF-GPT: a GPT-2 finetuning version of QF-GPT-0.
    • finetuned on tuples of the form f [SEP] q [BOS] cq [EOS]
  • Prompt-based finetuned GPT: a finetuning version of Prompt-based GPT-0
    • finetune GPT-2 on inputs of the form: q "Ask a question that contains words in the list [f]." cq (both formats are sketched below)
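For concreteness, a small sketch of how these two finetuning inputs could be serialized as plain strings; the special tokens are written literally, following the notes above, and actual tokenizer handling is an assumption.

```python
# Hedged sketch of the two finetuning input formats described above.
def qf_gpt_example(f: str, q: str, cq: str) -> str:
    # QF-GPT: f [SEP] q [BOS] cq [EOS]
    return f"{f} [SEP] {q} [BOS] {cq} [EOS]"

def prompt_gpt_example(f: str, q: str, cq: str) -> str:
    # Prompt-based finetuned GPT: q, the instructional prompt, then cq
    return f"{q} Ask a question that contains words in the list [{f}]. {cq}"
```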

Note: simple facet-input finetuning is highly inefficient at steering the decoder toward facet-related questions; the observed facet coverage rate is only 20%.

Dataset

ClariQ-FKw: consists of (q, f, cq) tuples.

  • q is an open-domain search query, f is a search facet, cq is a human-generated clarifying question
  • The facet in ClariQ is in the form of a faceted search query. ClariQ-FKw extracts the keyword of the faceted query as its facet column, yielding 1756 training examples and 425 evaluation examples

Our proposed system never accesses the training set, while the supervised baselines use it for finetuning.

Results

Auto-metric evaluation

RQ1: How well do existing baselines perform on zero-shot clarifying question generation?

  • all these baselines (the first four rows) struggle to produce reasonable generations, except Template-0 (though its question bodies are still poor)
  • existing zero-shot GPT-2-based approaches therefore cannot solve the clarifying question generation task effectively.

RQ2: the effectiveness of facet information for facet-specific clarifying question generation

  • compare our proposed zero-shot facet-constrained (ZSFC) methods with a facet-free variant of ZSFC named Subject-constrained, which uses the subject of the query as the constraint.
  • the study shows that adequate use of facet information significantly improves clarifying question generation quality

RQ3: can our proposed zero-shot approach match or even outperform existing facet-driven baselines?

  • Both tables show that our zero-shot facet-driven approaches consistently outperform the finetuning baselines

Note: Template-facet rewriting is a simple yet strong baseline; both finetuning-based methods actually perform worse than it.

Human evaluation

(Figure: human evaluation results; original image not preserved.)

Knowledge

Approaches to clarifying query ambiguity can be roughly divided into three categories:

  • Query Reformulation: iteratively refine the query
    • more efficient in context-rich situations
  • Query Suggestion: offer related queries to the user
    • good for steering search topics and discovering user needs
  • Asking Clarifying Questions: proactively engage users to provide additional context
    • especially helpful for clarifying ambiguous queries when no context is available

Paper Interpretation | Structured Information Extraction in the Zero-Shot Setting


This article is shared from the Huawei Cloud community post 《论文解读系列十六:Zero-Shot场景下的信息结构化提取》, by 一笑倾城.

 

Abstract

In structured information extraction, prior work has generally relied on manually annotated templates. This paper proposes a zero-shot solution based on graph convolutional networks, which can handle training and test sets drawn from different verticals.

Figure 1. The training and inference data come from different verticals. (Original image not preserved.)

Problem Definition

Figure 2. An intuitive view of OpenIE and ClosedIE. (Original image not preserved.)

Relation Extraction

  • Closed Relation Extraction (ClosedIE)
    $R$ denotes the set of relation classes, including a no-relation class; the model directly assigns a class to each entity.
  • Open Relation Extraction (OpenIE)
    $R$ denotes the set of relation classes; the model performs binary classification, judging whether one entity is the key of another.

Zero-Shot Extraction

Zero-shot settings can be divided by difficulty as follows:

  • Unseen-Website Zero-shot Extraction
    Different page layouts within the same vertical: e.g., training and test pages both come from movie websites, but the test pages use layouts not seen during training.
  • Unseen-Vertical Zero-shot Extraction
    Different layouts from different verticals: e.g., training pages come from movie websites, while the test pages may come from recruitment websites.

The paper's solution is essentially to mine all key-value pairs from the graph; since mining key-value pairs is itself layout-independent, the approach achieves cross-vertical parsing of layout structure.

Concepts

  • relation: refers to a key
  • object: refers to a value
  • relationship: refers to a key -> value pair

Encoder (Feature Construction)

Node information is organized as a graph $G$, consisting of a set of nodes $N$ (entities) and the edges $E$ between them.

Building edges between entities with hand-designed rules

An edge is created between two nodes in the following cases (key-value pairs usually sit in a vertical or horizontal relationship); a hedged sketch follows the list:

  • Horizontal case: the nodes are horizontal neighbors with no other node in between;
  • Vertical case: the nodes are vertical neighbors with no other node in between;
  • Sibling case: the nodes are at the same level;
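Here is a minimal sketch of such rule-based edge construction, assuming each node carries a rendered bounding box and a DOM nesting level. The pixel tolerance and the "nothing in between" test are assumptions; the paper only names the three cases.

```python
# Hedged sketch of rule-based edge construction over page elements.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Node:
    x0: float; y0: float; x1: float; y1: float  # rendered bounding box
    level: int                                  # DOM nesting depth

def _between(a: Node, b: Node, c: Node) -> bool:
    """True if c's box lies inside the region spanned by a and b."""
    return (min(a.x0, b.x0) <= c.x0 and c.x1 <= max(a.x1, b.x1)
            and min(a.y0, b.y0) <= c.y0 and c.y1 <= max(a.y1, b.y1))

def build_edges(nodes: List[Node]) -> List[Tuple[int, int]]:
    edges = []
    for i, a in enumerate(nodes):
        for j in range(i + 1, len(nodes)):
            b = nodes[j]
            clear = not any(_between(a, b, c)
                            for k, c in enumerate(nodes) if k not in (i, j))
            horiz = abs(a.y0 - b.y0) < 5 and clear  # horizontal neighbors (5px tolerance assumed)
            vert = abs(a.x0 - b.x0) < 5 and clear   # vertical neighbors
            sibling = a.level == b.level            # same DOM level
            if horiz or vert or sibling:
                edges.append((i, j))
    return edges
```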

Modeling relations between entities with a graph network

Node relations are modeled with a Graph Attention Network (GAT); the initial (input) node features are:

  • visual features: the visual description of the node on the web page;
  • text features: for OpenIE, averaged features from a pretrained BERT; for ClosedIE, the frequency of the node's string (which seems friendlier to cross-vertical transfer);

Pretraining mechanism

The paper designs an auxiliary loss $L_{pre}$ that supervises a three-way classification over {key, value, other}. To prevent overfitting, once pretraining is complete the graph-network weights are frozen for the OpenIE task. A combined sketch of the encoder and this pretraining head follows.
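Below is a minimal combined sketch of the GAT encoder and the auxiliary pretraining head, assuming PyTorch Geometric; the feature dimensions, two-layer depth, and head count are all assumptions rather than the paper's actual configuration.

```python
# Hedged sketch: GAT over the page graph with concatenated visual + text
# node features, plus an auxiliary 3-way {key, value, other} head (L_pre).
import torch
import torch.nn as nn
from torch_geometric.nn import GATConv

class PageEncoder(nn.Module):
    def __init__(self, vis_dim=32, txt_dim=768, hidden=128):
        super().__init__()
        self.gat1 = GATConv(vis_dim + txt_dim, hidden, heads=4, concat=False)
        self.gat2 = GATConv(hidden, hidden, heads=4, concat=False)
        self.pre_head = nn.Linear(hidden, 3)  # key / value / other

    def forward(self, vis, txt, edge_index):
        x = torch.cat([vis, txt], dim=-1)  # per-node input features
        h = self.gat1(x, edge_index).relu()
        return self.gat2(h, edge_index)

def pretrain_loss(model, vis, txt, edge_index, labels):
    h = model(vis, txt, edge_index)
    return nn.functional.cross_entropy(model.pre_head(h), labels)  # L_pre

# After pretraining, the graph-network weights are frozen for OpenIE:
# for p in model.parameters(): p.requires_grad = False
```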

Relation Prediction Network

OpenIE

Judge whether, for a pair of nodes, the first node's string content is the key of the second node's string content (a hedged sketch follows the list):

  • use the candidate pair identification algorithm to obtain potential string pairs;
  • take the two nodes' raw input features, their GNN output features, and the pairwise relation features as the classifier input;
  • classify with a fully connected network;
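A minimal sketch of that pair classifier: concatenate the raw and GNN features of both nodes with the pairwise relation features, then apply a small fully connected network. The dimensions and the two-logit output are assumptions.

```python
# Hedged sketch of the OpenIE key-value pair classifier head.
import torch
import torch.nn as nn

class PairClassifier(nn.Module):
    def __init__(self, raw_dim, gnn_dim, pair_dim, hidden=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * (raw_dim + gnn_dim) + pair_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 2),  # is / is not a key-value pair
        )

    def forward(self, raw_a, gnn_a, raw_b, gnn_b, pair_feats):
        # concatenate both nodes' raw + GNN features with pairwise features
        x = torch.cat([raw_a, gnn_a, raw_b, gnn_b, pair_feats], dim=-1)
        return self.mlp(x)
```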

ClosedIE

Multi-class classification with a cross-entropy loss.

Experiments

  • Cross-vertical tasks are indeed more difficult.

  • ClosedIE: the more websites in the training data, the better the results.

  • Ablations confirm the influence of each factor on the model's effectiveness.

