Today's cs.AI listing contains 26 papers in total.
Artificial Intelligence (18 papers)
[1]: SEDRo: A Simulated Environment for Developmental Robotics
Authors: Aishwarya Pothula, Md Ashaduzzaman Rubel Mondol, Sanath Narasimhan, Sm Mazharul Islam, Deokgun Park
Link: https://arxiv.org/abs/2009.01810
Abstract: Even with impressive advances in application-specific models, we still lack knowledge about how to build a model that can learn in a human-like way and do multiple tasks. To learn in a human-like way, we need to provide a diverse experience that is comparable to humans. In this paper, we introduce our ongoing effort to build a simulated environment for developmental robotics (SEDRo). SEDRo provides diverse human experiences ranging from those of a fetus to a 12-month-old. A series of simulated tests based on developmental psychology will be used to evaluate the progress of a learning model. We anticipate SEDRo to lower the cost of entry and facilitate research in the developmental robotics community.
[2]: Action and Perception as Divergence Minimization
Authors: Danijar Hafner, Pedro A. Ortega, Jimmy Ba, Thomas Parr, Karl Friston, Nicolas Heess
Comments: 13 pages, 10 figures
Link: https://arxiv.org/abs/2009.01791
Abstract: We introduce a unified objective for action and perception of intelligent agents. Extending representation learning and control, we minimize the joint divergence between the world and a target distribution. Intuitively, such agents use perception to align their beliefs with the world, and use actions to align the world with their beliefs. Minimizing the joint divergence to an expressive target maximizes the mutual information between the agent's representations and inputs, thus inferring representations that are informative of past inputs and exploring future inputs that are informative of the representations. This lets us derive intrinsic objectives, such as representation learning, information gain, empowerment, and skill discovery from minimal assumptions. Moreover, interpreting the target distribution as a latent variable model suggests expressive world models as a path toward highly adaptive agents that seek large niches in their environments, while rendering task rewards optional. The presented framework provides a common language for comparing a wide range of objectives, facilitates understanding of latent variables for decision making, and offers a recipe for designing novel objectives. We recommend deriving future agent objectives from the joint divergence to facilitate comparison, to point out the agent's target distribution, and to identify the intrinsic objective terms needed to reach that distribution.
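As a rough rendering of the stated objective (our own notation, not necessarily the paper's), the agent's beliefs and actions are chosen to minimize a divergence between the distribution they induce over inputs and representations and a target distribution:

```latex
% Illustrative notation (not necessarily the paper's):
%   q_\phi(x, z): joint distribution over inputs x and representations z
%                 induced by the agent's beliefs (perception) and actions
%   \tau(x, z):   the agent's target distribution
\min_{\phi}\; \mathrm{KL}\!\left( q_{\phi}(x, z) \;\middle\|\; \tau(x, z) \right)
```

Perception moves q toward the target through the representations z, while actions move it through future inputs x, matching the abstract's reading of "aligning beliefs with the world" and "aligning the world with beliefs".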
[3]: Grounded Language Learning Fast and Slow
Authors: Felix Hill, Olivier Tieleman, Tamara von Glehn, Nathaniel Wong, Hamza Merzic, Stephen Clark
Link: https://arxiv.org/abs/2009.01719
Abstract: Recent work has shown that large text-based neural language models, trained with conventional supervised learning objectives, acquire a surprising propensity for few- and one-shot learning. Here, we show that an embodied agent situated in a simulated 3D world, and endowed with a novel dual-coding external memory, can exhibit similar one-shot word learning when trained with conventional reinforcement learning algorithms. After a single introduction to a novel object via continuous visual perception and a language prompt ("This is a dax"), the agent can re-identify the object and manipulate it as instructed ("Put the dax on the bed"). In doing so, it seamlessly integrates short-term, within-episode knowledge of the appropriate referent for the word "dax" with long-term lexical and motor knowledge acquired across episodes (i.e. "bed" and "putting"). We find that, under certain training conditions and with a particular memory writing mechanism, the agent's one-shot word-object binding generalizes to novel exemplars within the same ShapeNet category, and is effective in settings with unfamiliar numbers of objects. We further show how dual-coding memory can be exploited as a signal for intrinsic motivation, stimulating the agent to seek names for objects that may be useful for later executing instructions. Together, the results demonstrate that deep neural networks can exploit meta-learning, episodic memory and an explicitly multi-modal environment to account for 'fast-mapping', a fundamental pillar of human cognitive development and a potentially transformative capacity for agents that interact with human users.
[4]: On Population-Based Algorithms for Distributed Constraint Optimization Problems
Authors: Saaduddin Mahmud, Md. Mosaddek Khan, Nicholas R. Jennings
Comments: 7 figures. arXiv admin note: text overlap with arXiv:1909.06254, arXiv:2002.12001
Link: https://arxiv.org/abs/2009.01625
Abstract: Distributed Constraint Optimization Problems (DCOPs) are a widely studied class of optimization problems in which the interactions between a set of cooperative agents are modeled as a set of constraints. DCOPs are NP-hard and significant effort has been devoted to developing methods for finding incomplete solutions. In this paper, we study an emerging class of such incomplete algorithms that are broadly termed population-based algorithms. The main characteristic of these algorithms is that they maintain a population of candidate solutions of a given problem and use this population to cover a large area of the search space and to avoid local optima. In recent years, this class of algorithms has gained significant attention due to their ability to produce high-quality incomplete solutions. With the primary goal of further improving the quality of solutions compared to the state-of-the-art incomplete DCOP algorithms, we present two new population-based algorithms in this paper. Our first approach, Anytime Evolutionary DCOP or AED, exploits evolutionary optimization meta-heuristics to solve DCOPs. We also present a novel anytime update mechanism that gives AED its anytime property. In our second contribution, we show that population-based approaches can be combined with local search approaches. Specifically, we develop an algorithm called DPSA based on the Simulated Annealing meta-heuristic. We empirically evaluate these two algorithms to illustrate their respective effectiveness in different settings against the state-of-the-art incomplete DCOP algorithms, including all existing population-based algorithms, in a wide variety of benchmarks. Our evaluation shows AED and DPSA markedly outperform the state-of-the-art and produce up to 75% improved solutions.
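DPSA builds on the simulated-annealing meta-heuristic. For readers unfamiliar with it, the standard acceptance rule looks like the sketch below; this is textbook, centralized simulated annealing, not the paper's distributed variant:

```python
import math
import random

def accept(delta_cost: float, temperature: float) -> bool:
    """Generic simulated-annealing acceptance rule (illustrative only):
    always accept an improvement, and accept a worse candidate with
    probability exp(-delta_cost / temperature)."""
    if delta_cost <= 0:  # candidate is at least as good as the current solution
        return True
    return random.random() < math.exp(-delta_cost / temperature)
```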
[5]: Derived metrics for the game of Go -- intrinsic network strength assessment and cheat-detection
Authors: Attila Egri-Nagy, Antti Törmänen
Comments: 16 pages, 12 figures, final version will be published elsewhere
Link: https://arxiv.org/abs/2009.01606
Abstract: The widespread availability of superhuman AI engines is changing how we play the ancient game of Go. The open-source software packages developed after the AlphaGo series shifted focus from producing strong playing entities to providing tools for analyzing games. Here we describe two ways in which the innovations of the second-generation engines (e.g. score estimates, variable komi) can be used to define new metrics that help deepen our understanding of the game. First, we study how much information the search component contributes in addition to the raw neural network policy output. This gives an intrinsic strength measurement for the neural network. Second, we define the effect of a move by the difference in score estimates. This gives a fine-grained, move-by-move performance evaluation of a player. We use this in combating the new challenge of detecting online cheating.
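The second metric is simple enough to sketch directly: the effect of a move is the change in the engine's score estimate across that move. A minimal illustration (function and variable names are ours, not tied to any particular engine):

```python
def move_effects(score_estimates):
    """Given engine score estimates s_0, s_1, ..., s_n (e.g. expected score in
    points from Black's perspective, one per position in the game), return the
    per-move effect: how much each move changed the estimate. A strongly
    negative effect for the player to move indicates a likely mistake."""
    return [after - before
            for before, after in zip(score_estimates, score_estimates[1:])]

# Toy example: estimates for the first four positions of a game.
print(move_effects([0.5, 0.7, -1.8, -1.5]))   # [0.2, -2.5, 0.3]
```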
[6]: Fairness in the Eyes of the Data: Certifying Machine-Learning Models
Authors: Shahar Segal, Yossi Adi, Benny Pinkas, Carsten Baum, Chaya Ganesh, Joseph Keshet
Link: https://arxiv.org/abs/2009.01534
Abstract: We present a framework that allows one to certify the fairness degree of a model based on an interactive and privacy-preserving test. The framework verifies any trained model, regardless of its training process and architecture. Thus, it allows us to evaluate any deep learning model on multiple fairness definitions empirically. We tackle two scenarios, where either the test data is privately available only to the tester or is publicly known in advance, even to the model creator. We investigate the soundness of the proposed approach using theoretical analysis and present statistical guarantees for the interactive test. Finally, we provide a cryptographic technique to automate fairness testing and certified inference with only black-box access to the model at hand while hiding the participants' sensitive data.
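The contribution here is the interactive, privacy-preserving certification protocol; the fairness definitions themselves are standard. As a plain, non-private illustration of one such definition, a demographic-parity gap on a held-out test set can be computed as follows (a sketch with our own naming, not the paper's protocol):

```python
import numpy as np

def demographic_parity_gap(y_pred, sensitive):
    """Absolute difference in positive-prediction rates between the two groups
    defined by a binary sensitive attribute; smaller means fairer under the
    demographic-parity definition."""
    y_pred = np.asarray(y_pred)
    sensitive = np.asarray(sensitive)
    rate_a = y_pred[sensitive == 0].mean()
    rate_b = y_pred[sensitive == 1].mean()
    return abs(rate_a - rate_b)

# Toy example: binary predictions and group membership for six test points.
print(demographic_parity_gap([1, 0, 1, 1, 0, 0], [0, 0, 0, 1, 1, 1]))  # ~0.33
```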
[7]: User Intention Recognition and Requirement Elicitation Method for Conversational AI Services
Authors: Junrui Tian, Zhiying Tu, Zhongjie Wang, Xiaofei Xu, Min Liu
Comments: accepted as a full paper at IEEE ICWS 2020
Link: https://arxiv.org/abs/2009.01509
Abstract: In recent years, the chat-bot has become a new type of intelligent terminal that guides users to consume services. However, a frequent criticism is that the services it provides are not what users expect, or not what they most expect. This defect is mostly due to two problems: the incompleteness and uncertainty of users' requirement expressions caused by information asymmetry, and the diversity of service resources, which makes service selection difficult. A conversational bot is a typical mesh device, so guided multi-round Q&A is the most effective way to elicit user requirements. Obviously, complex Q&A with too many rounds is tedious and leads to a poor user experience. Therefore, we aim to obtain user requirements as accurately as possible in as few rounds as possible. To achieve this, a user intention recognition method based on a Knowledge Graph (KG) is developed for fuzzy requirement inference, and a requirement elicitation method based on Granular Computing is proposed for dialog policy generation. Experimental results show that these two methods can effectively reduce the number of conversation rounds and can quickly and accurately identify the user intention.
[8]: Learning to Infer User Hidden States for Online Sequential Advertising
Authors: Zhaoqing Peng, Junqi Jin, Lan Luo, Yaodong Yang, Rui Luo, Jun Wang, Weinan Zhang, Haiyang Xu, Miao Xu, Chuan Yu, Tiejian Luo, Han Li, Jian Xu, Kun Gai
Comments: to be published in CIKM 2020
Link: https://arxiv.org/abs/2009.01453
Abstract: To drive purchases in online advertising, it is of great interest to the advertiser to optimize the sequential advertising strategy, whose performance and interpretability are both important. The lack of interpretability in existing deep reinforcement learning methods makes it difficult to understand, diagnose and further optimize the strategy. In this paper, we propose our Deep Intents Sequential Advertising (DISA) method to address these issues. The key part of interpretability is to understand a consumer's purchase intent, which is, however, unobservable (called hidden states). In this paper, we model this intention as a latent variable and formulate the problem as a Partially Observable Markov Decision Process (POMDP) where the underlying intents are inferred based on the observable behaviors. Large-scale industrial offline and online experiments demonstrate our method's superior performance over several baselines. The inferred hidden states are analyzed, and the results prove the rationality of our inference.
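For context, the POMDP formulation described above maintains a belief over the hidden purchase intent and updates it from observed behaviors in the standard way (textbook belief update, not anything specific to DISA):

```latex
% Belief update after taking action a and observing behavior o,
% with transition model T and observation model O:
b'(s') \;=\; \frac{O(o \mid s', a)\, \sum_{s} T(s' \mid s, a)\, b(s)}
             {\sum_{s''} O(o \mid s'', a)\, \sum_{s} T(s'' \mid s, a)\, b(s)}
```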
[9]: FairXGBoost: Fairness-aware Classification in XGBoost
Authors: Srinivasan Ravichandran, Drona Khurana, Bharath Venkatesh, Narayanan Unny Edakunni
Link: https://arxiv.org/abs/2009.01442
Abstract: Highly regulated domains such as finance have long favoured the use of machine learning algorithms that are scalable, transparent, robust and yield better performance. One of the most prominent examples of such an algorithm is XGBoost. Meanwhile, there is also a growing interest in building fair and unbiased models in these regulated domains, and numerous bias-mitigation algorithms have been proposed to this end. However, most of these bias-mitigation methods are restricted to specific model families such as logistic regression or support vector machine models, thus leaving modelers with a difficult decision of choosing between fairness from the bias-mitigation algorithms and scalability, transparency, and performance from algorithms such as XGBoost. We aim to leverage the best of both worlds by proposing a fair variant of XGBoost that enjoys all the advantages of XGBoost, while also matching the levels of fairness from the state-of-the-art bias-mitigation algorithms. Furthermore, the proposed solution requires very little in terms of changes to the original XGBoost library, thus making it easy to adopt. We provide an empirical analysis of our proposed method on standard benchmark datasets used in the fairness community.
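The abstract does not spell out the mechanism, but one lightweight way to make a boosted classifier fairness-aware without touching the library internals is a custom training objective whose gradient adds a term that decorrelates the raw score from the sensitive attribute. The sketch below illustrates that general idea with XGBoost's custom-objective hook; it is our illustration, not necessarily the paper's exact formulation:

```python
import numpy as np
import xgboost as xgb

def make_fair_logistic_obj(sensitive, mu=0.1):
    """Return an XGBoost custom objective: logistic loss plus a penalty that
    pushes the raw score to be uncorrelated with the (centered) sensitive
    attribute. `mu` trades accuracy against fairness (illustrative only)."""
    s_centered = sensitive - sensitive.mean()

    def obj(preds, dtrain):
        labels = dtrain.get_label()
        p = 1.0 / (1.0 + np.exp(-preds))        # sigmoid of the raw margin
        grad = (p - labels) + mu * s_centered    # logistic gradient + penalty gradient
        hess = p * (1.0 - p)                     # penalty is linear in preds, so no hessian term
        return grad, hess

    return obj

# Hypothetical usage, given a DMatrix `dtrain` and a numpy array `sensitive`:
# booster = xgb.train({"max_depth": 4}, dtrain, num_boost_round=100,
#                     obj=make_fair_logistic_obj(sensitive, mu=0.1))
```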
[10]: Ramifications of Approximate Posterior Inference for Bayesian Deep Learning in Adversarial and Out-of-Distribution Settings
Authors: John Mitros, Arjun Pakrashi, Brian Mac Namee
Comments: ARRW@ECCV2020
Link: https://arxiv.org/abs/2009.01798
Abstract: Deep neural networks have been successful in diverse discriminative classification tasks, although they are often poorly calibrated, assigning high probability to misclassified predictions. This has potential consequences for the trustworthiness and accountability of the models when they are deployed in real applications, where predictions are evaluated based on their confidence scores. Existing solutions suggest the benefits attained by combining deep neural networks and Bayesian inference to quantify uncertainty over the models' predictions for ambiguous datapoints. In this work we propose to validate and test the efficacy of likelihood-based models in the task of out-of-distribution (OoD) detection. Across different datasets and metrics we show that Bayesian deep learning models on certain occasions marginally outperform conventional neural networks, and that in the event of minimal overlap between in/out-distribution classes, even the best models exhibit a reduction in AUC scores in detecting OoD data. Preliminary investigations indicate the potentially inherent role of bias due to choices of initialisation, architecture or activation functions. We hypothesise that the sensitivity of neural networks to unseen inputs could be a multi-factor phenomenon arising from the different architectural design choices, often amplified by the curse of dimensionality. Furthermore, we perform a study of the effect of adversarial-noise-resistance methods on in- and out-of-distribution performance, and also investigate the adversarial noise robustness of Bayesian deep learners.
[11]: HyperBench: A Benchmark and Tool for Hypergraphs and Empirical Findings
Authors: Wolfgang Fischl, Georg Gottlob, Davide Mario Longo, Reinhard Pichler
Comments: arXiv admin note: substantial text overlap with arXiv:1811.08181
Link: https://arxiv.org/abs/2009.01769
Abstract: To cope with the intractability of answering Conjunctive Queries (CQs) and solving Constraint Satisfaction Problems (CSPs), several notions of hypergraph decompositions have been proposed -- giving rise to different notions of width, notably plain, generalized, and fractional hypertree width (hw, ghw, and fhw). Given the increasing interest in using such decomposition methods in practice, a publicly accessible repository of decomposition software, as well as a large set of benchmarks, and a web-accessible workbench for inserting, analyzing, and retrieving hypergraphs are called for.
We address this need by providing (i) concrete implementations of hypergraph decompositions (including new practical algorithms), (ii) a new, comprehensive benchmark of hypergraphs stemming from disparate CQ and CSP collections, and (iii) HyperBench, our new web interface for accessing the benchmark and the results of our analyses. In addition, we describe a number of actual experiments we carried out with this new infrastructure.
[12]: Max-value Entropy Search for Multi-Objective Bayesian Optimization with Constraints
Authors: Syrine Belakaria, Aryan Deshwal, Janardhan Rao Doppa
Comments: 2 figures, 1 table. arXiv admin note: text overlap with arXiv:2008.07029
Link: https://arxiv.org/abs/2009.01721
Abstract: We consider the problem of constrained multi-objective blackbox optimization using expensive function evaluations, where the goal is to approximate the true Pareto set of solutions satisfying a set of constraints while minimizing the number of function evaluations. For example, in aviation power system design applications, we need to find the designs that trade off total energy and mass while satisfying specific thresholds for motor temperature and voltage of cells. This optimization requires performing expensive computational simulations to evaluate designs. In this paper, we propose a new approach referred to as Max-value Entropy Search for Multi-objective Optimization with Constraints (MESMOC) to solve this problem. MESMOC employs an output-space-entropy based acquisition function to efficiently select the sequence of inputs for evaluation to uncover high-quality Pareto-set solutions while satisfying constraints.
We apply MESMOC to two real-world engineering design applications to demonstrate its effectiveness over state-of-the-art algorithms.
[13]: Quasi-symplectic Langevin Variational Autoencoder
Authors: Zihao Wang, Hervé Delingette
Link: https://arxiv.org/abs/2009.01675
Abstract: The variational autoencoder (VAE) is one of the most thoroughly investigated generative models and is very popular in current neural-learning research. Leveraging VAEs for practical tasks with high dimensionality and huge datasets often faces the problem of constructing low-variance evidence lower bounds. Markov chain Monte Carlo (MCMC) is an effective approach for tightening the evidence lower bound (ELBO) when approximating the posterior distribution. The Hamiltonian Variational Autoencoder (HVAE) is one of the effective MCMC-inspired approaches for constructing an unbiased low-variance ELBO that is also amenable to the reparameterization trick. This significantly improves posterior estimation, yet a main drawback of HVAE is that the leapfrog method needs to access the posterior gradient twice, which hurts inference efficiency and requires fairly large GPU memory. This flaw limits the application of Hamiltonian-based inference frameworks to large-scale network inference. To tackle this problem, we propose a Quasi-symplectic Langevin Variational Autoencoder (Langevin-VAE), which offers a significant improvement in resource-usage efficiency. We qualitatively and quantitatively demonstrate the effectiveness of the Langevin-VAE compared to a state-of-the-art gradient-informed inference framework.
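As background (a standard formula, not a claim about the paper's exact integrator), a single unadjusted Langevin transition needs only one evaluation of the posterior gradient per step, in contrast to the two evaluations per leapfrog step noted above:

```latex
% One (unadjusted) Langevin step with step size \epsilon; a single
% evaluation of the posterior gradient is needed per step:
z_{t+1} \;=\; z_t \;+\; \frac{\epsilon}{2}\,\nabla_{z}\log p(z_t \mid x)
          \;+\; \sqrt{\epsilon}\;\xi_t,
\qquad \xi_t \sim \mathcal{N}(0, I)
```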
[14]: Deep Learning Based Antenna Selection for Channel Extrapolation in FDD Massive MIMO
Authors: Yindi Yang, Shun Zhang, Feifei Gao, Chao Xu, Jianpeng Ma, Octavia A. Dobre
Comments: 6 pages, 5 figures
Link: https://arxiv.org/abs/2009.01653
Abstract: In massive multiple-input multiple-output (MIMO) systems, the large number of antennas brings a great challenge for the acquisition of accurate channel state information, especially in the frequency division duplex mode. To overcome the bottleneck of the limited number of radio links in hybrid beamforming, we utilize neural networks (NNs) to capture the inherent connection between the uplink and downlink channel data sets and extrapolate the downlink channels from a subset of the uplink channel state information. We study the antenna subset selection problem in order to achieve the best channel extrapolation and decrease the data size of the NNs. Probabilistic sampling theory is utilized to approximate the discrete antenna selection as a continuous and differentiable function, which makes back-propagation of the deep learning feasible. Then, we design a proper off-line training strategy to optimize both the antenna selection pattern and the extrapolation NNs. Finally, numerical results are presented to verify the effectiveness of our proposed massive MIMO channel extrapolation algorithm.
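The "probabilistic sampling theory" used to make the discrete antenna selection differentiable is commonly realized with a Gumbel-softmax style relaxation; the numpy sketch below illustrates that general idea (our choice of relaxation, which may differ from the paper's exact construction):

```python
import numpy as np

def soft_antenna_selection(logits, temperature=0.5, rng=None):
    """Relaxed one-hot sample over antennas from learnable selection logits.
    As temperature -> 0 the output approaches a hard one-hot selection, while
    at finite temperature it stays differentiable w.r.t. the logits."""
    rng = np.random.default_rng() if rng is None else rng
    u = rng.uniform(1e-10, 1.0, size=logits.shape)   # avoid log(0)
    gumbel = -np.log(-np.log(u))
    scores = (logits + gumbel) / temperature
    scores -= scores.max()                           # numerical stability
    weights = np.exp(scores)
    return weights / weights.sum()

# Toy example: relaxed selection of one antenna out of eight.
print(soft_antenna_selection(np.zeros(8), temperature=0.5))
```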
[15]: Deep Learning Optimized Sparse Antenna Activation for Reconfigurable Intelligent Surface Assisted Communication
Authors: Shunbo Zhang, Shun Zhang, Feifei Gao, Jianpeng Ma, Octavia A. Dobre
Comments: 30 pages, 14 figures
Link: https://arxiv.org/abs/2009.01607
Abstract: To capture the communications gain of the massive radiating elements with low power cost, the conventional reconfigurable intelligent surface (RIS) usually works in passive mode. However, due to the cascaded channel structure and the lack of signal processing ability, it is difficult for the RIS to obtain the individual channel state information and optimize the beamforming vector. In this paper, we add signal processing units for a few antennas at the RIS to partially acquire the channels. To solve the crucial active antenna selection problem, we construct an active antenna selection network that utilizes probabilistic sampling theory to select the optimal locations of these active antennas. With this active antenna selection network, we further design two deep learning (DL) based schemes, i.e., the channel extrapolation scheme and the beam searching scheme, to enable the RIS communication system. The former utilizes the selection network and a convolutional neural network to extrapolate the full channels from the partial channels received by the active RIS antennas, while the latter adopts a fully-connected neural network to achieve the direct mapping between the partial channels and the optimal beamforming vector with maximal transmission rate. Simulation results are provided to demonstrate the effectiveness of the designed DL-based schemes.
[16]: Penalty and Augmented Lagrangian Methods for Layer-parallel Training of Residual Networks
Authors: Qi Sun, Hexing Dong, Zewei Chen, Weizhen Dian, Jiacheng Sun, Yitong Sun, Zhenguo Li, Bin Dong
Link: https://arxiv.org/abs/2009.01462
Abstract: Algorithms for training residual networks (ResNets) typically require a forward pass of the data, followed by backpropagation of the loss gradient to perform parameter updates, which can take many hours or even days for networks with hundreds of layers. Inspired by penalty and augmented Lagrangian methods, a layer-parallel training algorithm is proposed in this work to overcome the scalability barrier caused by the serial nature of forward-backward propagation in deep residual learning. Moreover, by viewing the supervised classification task as a numerical discretization of a terminal control problem, we bridge the concept of synthetic gradients for decoupling backpropagation with the parareal method for solving differential equations, which not only offers a novel perspective on the design of synthetic loss functions but also performs parameter updates with reduced storage overhead. Experiments on a preliminary example demonstrate that the proposed algorithm achieves comparable or even better testing accuracy than the full serial backpropagation approach, while the enabled layer-parallelism provides a speedup over traditional layer-serial training methods.
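Schematically, the penalty formulation decouples the layers by introducing auxiliary states at block boundaries and penalizing their mismatch with the block dynamics, so each term couples only neighboring blocks and the blocks can be updated in parallel (our schematic rendering of the standard penalty method; the paper's exact splitting may differ):

```latex
% u_k: auxiliary state at the boundary of residual block k,
% F_k: the k-th residual block with parameters \theta_k,
% \rho: penalty coefficient, L: the task loss on the final state u_K.
\min_{\{\theta_k\},\,\{u_k\}}\;
  L\big(u_K,\, y\big)
  \;+\; \frac{\rho}{2}\sum_{k=0}^{K-1}
        \big\| u_{k+1} - F_k(u_k;\, \theta_k) \big\|^2
```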
[17]: Computational prediction of RNA tertiary structures using machine learning methods
Authors: Bin Huang, Yuanyang Du, Shuai Zhang, Wenfei Li, Jun Wang, Jian Zhang
Comments: 20 pages, 2 figures. Chinese Physics B, Aug. 2020
Link: https://arxiv.org/abs/2009.01440
Abstract: RNAs play crucial and versatile roles in biological processes. Computational prediction approaches can help to understand RNA structures and their stabilizing factors, thus providing information on their functions and facilitating the design of new RNAs. Machine learning (ML) techniques have made tremendous progress in many fields in the past few years. Although their usage in protein-related fields has a long history, the use of ML methods in predicting RNA tertiary structures is new and rare. Here, we review the recent advances in using ML methods for RNA structure prediction and discuss the advantages and limitations, difficulties and potential of these approaches when applied in the field.
[18]: Convolutional Speech Recognition with Pitch and Voice Quality Features
Authors: Guillermo Cámbara, Jordi Luque, Mireia Farrús
Comments: 5 pages
Link: https://arxiv.org/abs/2009.01309
Abstract: The effects of adding pitch and voice quality features such as jitter and shimmer to a state-of-the-art CNN model for Automatic Speech Recognition are studied in this work. Pitch features have previously been used for improving classical HMM and DNN baselines, while jitter and shimmer parameters have proven to be useful for tasks like speaker or emotion recognition. To the best of our knowledge, this is the first work combining such pitch and voice quality features with modern convolutional architectures, showing improvements of up to 2% absolute WER on the publicly available Spanish Common Voice dataset. In particular, our work combines these features with mel-frequency spectral coefficients (MFSCs) to train a convolutional architecture with Gated Linear Units (Conv GLUs). Such models have been shown to yield small word error rates while being very suitable for parallel processing in online streaming recognition use cases. We have added pitch and voice quality functionality to Facebook's wav2letter speech recognition framework, and we provide this code and recipes to the community to enable further experiments. Besides, to the best of our knowledge, our Spanish Common Voice recipe is the first public Spanish recipe for wav2letter.
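For readers unfamiliar with the voice-quality features, jitter and shimmer are cycle-to-cycle perturbation measures of the pitch period and of the amplitude, respectively. A minimal sketch of the common "local" definitions (illustrative only; the paper's extraction pipeline may differ):

```python
import numpy as np

def local_jitter(periods):
    """Mean absolute difference between consecutive pitch periods,
    normalized by the mean period (dimensionless)."""
    periods = np.asarray(periods, dtype=float)
    return np.abs(np.diff(periods)).mean() / periods.mean()

def local_shimmer(amplitudes):
    """The same perturbation measure applied to per-cycle peak amplitudes."""
    amplitudes = np.asarray(amplitudes, dtype=float)
    return np.abs(np.diff(amplitudes)).mean() / amplitudes.mean()

# Toy example: pitch periods in seconds and per-cycle amplitudes.
print(local_jitter([0.0100, 0.0102, 0.0099, 0.0101]))
print(local_shimmer([0.80, 0.78, 0.82, 0.79]))
```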
Cross-listed with CV (5 papers)
[1]: Synthetic-to-Real Unsupervised Domain Adaptation for Scene Text Detection in the Wild
Authors: Weijia Wu, Ning Lu, Enze Xie
Link: https://arxiv.org/abs/2009.01766
Abstract: Deep learning-based scene text detection can achieve preferable performance when powered with sufficient labeled training data. However, manual labeling is time consuming and laborious; at the extreme, the corresponding annotated data are unavailable. Exploiting synthetic data is a very promising solution, except for the domain distribution mismatch between synthetic datasets and real datasets. To address the severe domain distribution mismatch, we propose a synthetic-to-real domain adaptation method for scene text detection, which transfers knowledge from synthetic data (source domain) to real data (target domain). In this paper, a text self-training (TST) method and adversarial text instance alignment (ATA) for domain-adaptive scene text detection are introduced. ATA helps the network learn domain-invariant features by training a domain classifier in an adversarial manner. TST diminishes the adverse effects of false positives (FPs) and false negatives (FNs) from inaccurate pseudo-labels. The two components have positive effects on improving the performance of scene text detectors when adapting from synthetic to real scenes. We evaluate the proposed method by transferring from SynthText and VISD to ICDAR2015 and ICDAR2013. The results demonstrate the effectiveness of the proposed method with up to 10% improvement, which has important exploration significance for domain-adaptive scene text detection. Code is available at this https URL
[2]: Multi-Loss Weighting with Coefficient of Variations
Authors: Rick Groenendijk, Sezer Karaoglu, Theo Gevers, Thomas Mensink
Link: https://arxiv.org/abs/2009.01717
Abstract: Many interesting tasks in machine learning and computer vision are learned by optimising an objective function defined as a weighted linear combination of multiple losses. The final performance is sensitive to choosing the correct (relative) weights for these losses. Finding a good set of weights is often done by adopting them into the set of hyper-parameters, which are set using an extensive grid search. This is computationally expensive. In this paper, the weights are defined based on properties observed while training the model, including the specific batch loss, the average loss, and the variance for each of the losses. An additional advantage is that the defined weights evolve during training, instead of using static loss weights. In the literature, loss weighting is mostly used in a multi-task learning setting, where the different tasks obtain different weights. However, there is a plethora of single-task multi-loss problems that can benefit from automatic loss weighting. In this paper, it is shown that these multi-task approaches do not work on single tasks. Instead, a method is proposed that automatically and dynamically tunes loss weights throughout training specifically for single-task multi-loss problems. The method incorporates a measure of uncertainty to balance the losses. The validity of the approach is shown empirically for different tasks on multiple datasets.
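The weighting rule described above can be sketched directly: each loss receives a weight derived from the coefficient of variation (standard deviation divided by mean) of its own recent history, so losses that still fluctuate strongly relative to their magnitude get more weight. A minimal sketch of that idea (the paper's exact normalization may differ):

```python
import numpy as np

def cov_weights(loss_history, eps=1e-8):
    """loss_history: array of shape (steps, num_losses) with recent raw losses.
    Returns one weight per loss, proportional to its coefficient of variation
    and normalized to sum to 1."""
    loss_history = np.asarray(loss_history, dtype=float)
    cov = loss_history.std(axis=0) / (loss_history.mean(axis=0) + eps)
    return cov / cov.sum()

# Toy example: two losses tracked over four steps; the second loss varies much
# more relative to its magnitude, so it receives the larger weight.
history = [[1.00, 0.50],
           [0.98, 0.30],
           [1.02, 0.55],
           [0.99, 0.25]]
print(cov_weights(history))
```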
[3]: TRACE: Transform Aggregate and Compose Visiolinguistic Representations for Image Search with Text Feedback
Authors: Surgan Jandial, Ayush Chopra, Pinkesh Badjatiya, Pranit Chawla, Mausoom Sarkar, Balaji Krishnamurthy
Comments: Surgan Jandial, Ayush Chopra and Pinkesh Badjatiya contributed equally to this work
Link: https://arxiv.org/abs/2009.01485
Abstract: The ability to efficiently search for images over an indexed database is the cornerstone of several user experiences. Incorporating user feedback through multi-modal inputs provides flexible interaction and serves fine-grained specificity in requirements. We specifically focus on text feedback, through descriptive natural language queries. Given a reference image and textual user feedback, our goal is to retrieve images that satisfy constraints specified by both of these input modalities. The task is challenging as it requires understanding the textual semantics of the text feedback and then applying these changes to the visual representation. To address these challenges, we propose a novel architecture TRACE which contains a hierarchical feature aggregation module to learn composite visio-linguistic representations. TRACE achieves SOTA performance on 3 benchmark datasets: FashionIQ, Shoes, and Birds-to-Words, with an average improvement of at least ~5.7%, ~3%, and ~5% respectively in the R@K metric. Our extensive experiments and ablation studies show that TRACE consistently outperforms the existing techniques by significant margins both quantitatively and qualitatively.
[4]: Tasks Integrated Networks: Joint Detection and Retrieval for Image Search
Authors: Lei Zhang, Zhenwei He, Yi Yang, Liang Wang, Xinbo Gao
Comments: To appear in IEEE TPAMI, 18 pages
Link: https://arxiv.org/abs/2009.01438
Abstract: The traditional object retrieval task aims to learn a discriminative feature representation with intra-similarity and inter-dissimilarity, which supposes that the objects in an image are manually or automatically pre-cropped exactly. However, in many real-world searching scenarios (e.g., video surveillance), the objects (e.g., persons, vehicles, etc.) are seldom accurately detected or annotated. Therefore, object-level retrieval becomes intractable without bounding-box annotation, which leads to a new but challenging topic, i.e. image-level search. In this paper, to address the image search issue, we first introduce an end-to-end Integrated Net (I-Net), which has three merits: 1) A Siamese architecture and an on-line pairing strategy for similar and dissimilar objects in the given images are designed. 2) A novel on-line pairing (OLP) loss is introduced with a dynamic feature dictionary, which alleviates the multi-task training stagnation problem by automatically generating a number of negative pairs to restrict the positives. 3) A hard example priority (HEP) based softmax loss is proposed to improve the robustness of the classification task by selecting hard categories. With the philosophy of divide and conquer, we further propose an improved I-Net, called DC-I-Net, which makes two new contributions: 1) Two modules are tailored to handle different tasks separately in the integrated framework, such that the task specification is guaranteed. 2) A class-center guided HEP loss (C2HEP) that exploits the stored class centers is proposed, such that the intra-similarity and inter-dissimilarity can be captured for ultimate retrieval. Extensive experiments on famous image-level search oriented benchmark datasets demonstrate that the proposed DC-I-Net outperforms the state-of-the-art tasks-integrated and tasks-separated image search models.
[5]: Efficiency in Real-time Webcam Gaze Tracking
Authors: Amogh Gudi, Xin Li, Jan van Gemert
Comments: Awarded Best Paper at the European Conference on Computer Vision (ECCV) Workshop on Eye Gaze in AR, VR, and in the Wild (OpenEyes) 2020
Link: https://arxiv.org/abs/2009.01270
Abstract: Efficiency and ease of use are essential for practical applications of camera-based eye/gaze-tracking. Gaze tracking involves estimating where a person is looking on a screen based on face images from a computer-facing camera. In this paper we investigate two complementary forms of efficiency in gaze tracking: 1. the computational efficiency of the system, which is dominated by the inference speed of a CNN predicting gaze-vectors; 2. the usability efficiency, which is determined by the tediousness of the mandatory calibration of the gaze-vector to a computer screen. To do so, we evaluate the computational speed/accuracy trade-off for the CNN and the calibration effort/accuracy trade-off for screen calibration. For the CNN, we evaluate the full face, two-eyes, and single-eye inputs. For screen calibration, we measure the number of calibration points needed and evaluate three types of calibration: 1. pure geometry, 2. pure machine learning, and 3. hybrid geometric regression. Results suggest that a single-eye input and geometric regression calibration achieve the best trade-off.
Cross-listed with NLP (3 papers)
[1]: Sparse Meta Networks for Sequential Adaptation and its Application to Adaptive Language Modelling
Authors: Tsendsuren Munkhdalai
Comments: 9 pages, 4 figures, 2 tables
Link: https://arxiv.org/abs/2009.01803
Abstract: Training a deep neural network requires a large amount of single-task data and involves a long, time-consuming optimization phase. This is not scalable to complex, realistic environments with new, unexpected changes. Humans can perform fast incremental learning on the fly, and memory systems in the brain play a critical role. We introduce Sparse Meta Networks -- a meta-learning approach to learn online sequential adaptation algorithms for deep neural networks, by using deep neural networks. We augment a deep neural network with a layer-specific fast-weight memory. The fast-weights are generated sparsely at each time step and accumulated incrementally through time, providing a useful inductive bias for online continual adaptation. We demonstrate strong performance on a variety of sequential adaptation scenarios, from simple online reinforcement learning to large-scale adaptive language modelling.
[2]: Ref-NMS: Breaking Proposal Bottlenecks in Two-Stage Referring Expression Grounding
Authors: Long Chen, Wenbo Ma, Jun Xiao, Hanwang Zhang, Wei Liu, Shih-Fu Chang
Link: https://arxiv.org/abs/2009.01449
Abstract: The prevailing framework for solving referring expression grounding is based on a two-stage process: 1) detecting proposals with an object detector and 2) grounding the referent to one of the proposals. Existing two-stage solutions mostly focus on the grounding step, which aims to align the expressions with the proposals. In this paper, we argue that these methods overlook an obvious mismatch between the roles of proposals in the two stages: they generate proposals solely based on the detection confidence (i.e., expression-agnostic), hoping that the proposals contain all the right instances in the expression (i.e., expression-aware). Due to this mismatch, current two-stage methods suffer from a severe performance drop between detected and ground-truth proposals. To this end, we propose Ref-NMS, which is the first method to yield expression-aware proposals at the first stage. Ref-NMS regards all nouns in the expression as critical objects, and introduces a lightweight module to predict a score for aligning each box with a critical object. These scores can guide the NMS operation to filter out the boxes irrelevant to the expression, increasing the recall of critical objects and resulting in a significantly improved grounding performance. Since Ref-NMS is agnostic to the grounding step, it can be easily integrated into any state-of-the-art two-stage method. Extensive ablation studies on several backbones, benchmarks, and tasks consistently demonstrate the superiority of Ref-NMS.
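The mechanism described above can be pictured as NMS driven by a fused score: each proposal's detection confidence is combined with its predicted relevance to the expression before the usual greedy suppression. The sketch below is our illustration of that general idea, not the paper's exact scoring rule:

```python
import numpy as np

def box_iou(box, boxes):
    """IoU between one box and an array of boxes, all as [x1, y1, x2, y2]."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (box[2] - box[0]) * (box[3] - box[1])
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_a + area_b - inter)

def expression_aware_nms(boxes, det_scores, rel_scores, iou_thr=0.5):
    """Greedy NMS ranked by a fused score (detection confidence times an
    expression-relevance score) so that boxes irrelevant to the expression are
    ranked low and suppressed; the multiplicative fusion here is illustrative."""
    boxes = np.asarray(boxes, dtype=float)
    fused = np.asarray(det_scores, dtype=float) * np.asarray(rel_scores, dtype=float)
    order = np.argsort(-fused)
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        order = rest[box_iou(boxes[best], boxes[rest]) < iou_thr]
    return keep
```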
[3]: Learning to summarize from human feedback
Authors: Nisan Stiennon, Long Ouyang, Jeff Wu, Daniel M. Ziegler, Ryan Lowe, Chelsea Voss, Alec Radford, Dario Amodei, Paul Christiano
Link: https://arxiv.org/abs/2009.01325
Abstract: As language models become more powerful, training and evaluation are increasingly bottlenecked by the data and metrics used for a particular task. For example, summarization models are often trained to predict human reference summaries and evaluated using ROUGE, but both of these metrics are rough proxies for what we really care about: summary quality. In this work, we show that it is possible to significantly improve summary quality by training a model to optimize for human preferences. We collect a large, high-quality dataset of human comparisons between summaries, train a model to predict the human-preferred summary, and use that model as a reward function to fine-tune a summarization policy using reinforcement learning. We apply our method to a version of the TL;DR dataset of Reddit posts and find that our models significantly outperform both human reference summaries and much larger models fine-tuned with supervised learning alone. Our models also transfer to CNN/DM news articles, producing summaries nearly as good as the human reference without any news-specific fine-tuning. We conduct extensive analyses to understand our human feedback dataset and fine-tuned models. We establish that our reward model generalizes to new datasets, and that optimizing our reward model results in better summaries than optimizing ROUGE according to humans. We hope the evidence from our paper motivates machine learning researchers to pay closer attention to how their training loss affects the model behavior they actually want.
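The reward model in this setup is typically trained on pairwise comparisons with a logistic (Bradley-Terry style) loss; writing r_theta for the reward model and y+ for the human-preferred summary of post x, each comparison contributes (standard formulation; consult the paper for the exact setup):

```latex
% r_\theta: reward model; y^{+} is the human-preferred summary for post x.
\mathcal{L}(\theta) \;=\;
  -\,\mathbb{E}_{(x,\,y^{+},\,y^{-})}
  \Big[\log \sigma\big( r_{\theta}(x, y^{+}) - r_{\theta}(x, y^{-}) \big)\Big]
```

The learned r_theta then serves as the reward for fine-tuning the summarization policy with reinforcement learning, as the abstract describes.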