Machine Learning (cs.LG) 0211

Posted arXiv Daily


Today's cs.LG listing contains 107 papers in total.

Knowledge Graphs (1 paper)


[1]: Information Prediction using Knowledge Graphs for Contextual Malware Threat Intelligence
Authors: Nidhi Rastogi, Sharmishtha Dutta, Ryan Christian, Mohammad Zaki, Alex Gittens, Charu Aggarwal
Comments: 14 pages
Link: https://arxiv.org/abs/2102.05571
Abstract: Large amounts of threat intelligence information about malware attacks are available in disparate, typically unstructured, formats. Knowledge graphs can capture this information and its context using RDF triples represented by entities and relations. Sparse or inaccurate threat information, however, leads to challenges such as incomplete or erroneous triples. Named entity recognition (NER) and relation extraction (RE) models used to populate the knowledge graph cannot fully guarantee accurate information retrieval, further exacerbating this problem. This paper proposes an end-to-end approach to generate a Malware Knowledge Graph called MalKG, the first open-source automated knowledge graph for malware threat intelligence. The MalKG dataset, called MT40K, contains approximately 40,000 triples generated from 27,354 unique entities and 34 relations. We demonstrate the application of MalKG in predicting missing malware threat intelligence information in the knowledge graph. For ground truth, we manually curate a knowledge graph called MT3K, with 3,027 triples generated from 5,741 unique entities and 22 relations. For entity prediction via a state-of-the-art entity prediction model (TuckER), our approach achieves 80.4 for the hits@10 metric (predicts the top 10 options for missing entities in the knowledge graph), and 0.75 for the MRR (mean reciprocal rank). We also propose a framework to automate the extraction of thousands of entities and relations into RDF triples, both manually and automatically, at the sentence level from 1,100 malware threat intelligence reports and from the common vulnerabilities and exposures (CVE) database.
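The two metrics quoted in the abstract above, hits@10 and MRR, can be illustrated with a minimal sketch. It assumes each test triple has already been assigned the 1-based rank of its correct entity among all candidates; the `ranks` list is made-up illustrative data, not results from the paper, and the function names are hypothetical. Note that the abstract reports hits@10 as a percentage, while this sketch returns a fraction.

```python
def hits_at_k(ranks, k=10):
    """Fraction of test triples whose correct entity is ranked within the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all test triples (ranks are 1-based)."""
    return sum(1.0 / r for r in ranks) / len(ranks)


# Hypothetical ranks of the correct entity for five test triples.
ranks = [1, 3, 12, 2, 50]
hits10 = hits_at_k(ranks, k=10)
mrr = mean_reciprocal_rank(ranks)
```

Both metrics depend only on the rank of the correct answer, so they can be computed for any link-prediction model that scores candidate entities.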

Graph (2 papers)


[1]: On Explainability of Graph Neural Networks via Subgraph Explorations
Authors: Hao Yuan, Haiyang Yu, Jie Wang, Kang Li, Shuiwang Ji
Link: https://arxiv.org/abs/2102.05152
Abstract: We consider the problem of explaining the predictions of graph neural networks (GNNs), which otherwise are considered as black boxes. Existing methods invariably focus on explaining the importance of graph nodes or edges but ignore the substructures of graphs, which are more intuitive and human-intelligible. In this work, we propose a novel method, known as SubgraphX, to explain GNNs by identifying important subgraphs. Given a trained GNN model and an input graph, our SubgraphX explains its predictions by efficiently exploring different subgraphs with Monte Carlo tree search. To make the tree search more effective, we propose to use Shapley values as a measure of subgraph importance, which can also capture the interactions among different subgraphs. To expedite computations, we propose efficient approximation schemes to compute Shapley values for graph data. Our work represents the first attempt to explain GNNs via identifying subgraphs explicitly. Experimental results show that our SubgraphX achieves significantly improved explanations, while keeping computations at a reasonable level.
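To make the notion of an approximate Shapley value concrete, here is a minimal sketch of the generic permutation-sampling estimator. It is not the approximation scheme used by SubgraphX, which the abstract does not detail; `shapley_monte_carlo`, `value_fn`, and the toy additive game are all hypothetical names and data chosen for illustration.

```python
import random


def shapley_monte_carlo(players, value_fn, num_samples=200, seed=0):
    """Estimate Shapley values by averaging marginal contributions
    over randomly sampled orderings of the players."""
    rng = random.Random(seed)
    totals = {p: 0.0 for p in players}
    for _ in range(num_samples):
        order = list(players)
        rng.shuffle(order)
        coalition = set()
        previous = value_fn(frozenset(coalition))
        for p in order:
            coalition.add(p)
            current = value_fn(frozenset(coalition))
            totals[p] += current - previous  # marginal contribution of p
            previous = current
    return {p: total / num_samples for p, total in totals.items()}


# Toy additive game: a coalition's value is the sum of its members' weights,
# so each player's Shapley value equals its own weight.
weights = {"a": 1.0, "b": 2.0, "c": 3.0}


def additive_value(coalition):
    return sum(weights[p] for p in coalition)


phi = shapley_monte_carlo(list(weights), additive_value)
```

In an explanation setting, the players might be nodes or subgraphs and `value_fn` the model's output on the graph restricted to a coalition; that mapping is an assumption here, not a description of the paper's implementation.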

[2]: Node-Level Membership Inference Attacks Against Graph Neural Networks
Authors: Xinlei He, Rui Wen, Yixin Wu, Michael Backes, Yun Shen, Yang Zhang
Link: https://arxiv.org/abs/2102.05429
Abstract: Many real-world data comes in the form of graphs, such as social networks and protein structure. To fully utilize the information contained in graph data, a new family of machine learning (ML) models, namely graph neural networks (GNNs), has been introduced. Previous studies have shown that machine learning models are vulnerable to privacy attacks. However, most of the current efforts concentrate on ML models trained on data from the Euclidean space, like images and texts. On the other hand, privacy risks stemming from GNNs remain largely unstudied.
In this paper, we fill the gap by performing the first comprehensive analysis of node-level membership inference attacks against GNNs. We systematically define the threat models and propose three node-level membership inference attacks based on an adversary's background knowledge. Our evaluation on three GNN structures and four benchmark datasets shows that GNNs are vulnerable to node-level membership inference even when the adversary has minimal background knowledge. Besides, we show that graph density and feature similarity have a major impact on the attack's success. We further investigate two defense mechanisms and the empirical results indicate that these defenses can reduce the attack performance but with moderate utility loss.

Federated Learning (2 papers)


[1]: Meta Federated Learning
Authors: Omid Aramoon, Pin-Yu Chen, Gang Qu, Yuan Tian
Comments: 11 pages, 5 figures
Link: https://arxiv.org/abs/2102.05561
Abstract: Due to its distributed methodology alongside its privacy-preserving features, Federated Learning (FL) is vulnerable to training time adversarial attacks. In this study, our focus is on backdoor attacks in which the adversary's goal is to cause targeted misclassifications for inputs embedded with an adversarial trigger while maintaining an acceptable performance on the main learning task at hand. Contemporary defenses against backdoor attacks in federated learning require direct access to each individual client's update which is not feasible in recent FL settings where Secure Aggregation is deployed. In this study, we seek to answer the following question, Is it possible to defend against backdoor attacks when secure aggregation is in place?, a question that has not been addressed by prior arts. To this end, we propose Meta Federated Learning (Meta-FL), a novel variant of federated learning which not only is compatible with secure aggregation protocol but also facilitates defense against backdoor attacks. We perform a systematic evaluation of Meta-FL on two classification datasets: SVHN and GTSRB. The results show that Meta-FL not only achieves better utility than classic FL, but also enhances the performance of contemporary defenses in terms of robustness against adversarial attacks.

[2]: Robust Federated Learning with Attack-Adaptive Aggregation
Authors: Ching Pui Wan, Qifeng Chen
Comments: 22 pages
Link: https://arxiv.org/abs/2102.05257
Abstract: Federated learning is vulnerable to various attacks, such as model poisoning and backdoor attacks, even if some existing defense strategies are used. To address this challenge, we propose an attack-adaptive aggregation strategy to defend against various attacks for robust federated learning. The proposed approach is based on training a neural network with an attention mechanism that learns the vulnerability of federated learning models from a set of plausible attacks. To the best of our knowledge, our aggregation strategy is the first one that can be adapted to defend against various attacks in a data-driven fashion. Our approach has achieved competitive performance in defending model poisoning and backdoor attacks in federated learning tasks on image and text datasets.

Adversarial Examples/GAN (11 papers)


[1]: Adversarial Robustness: What fools you makes you stronger
Authors: Grzegorz Głuch, Rüdiger Urbanke
Comments: 15 pages, 1 figure
Link: https://arxiv.org/abs/2102.05475
Abstract: We prove an exponential separation for the sample complexity between the standard PAC-learning model and a version of the Equivalence-Query-learning model. We then show that this separation has interesting implications for adversarial robustness. We explore a vision of designing an adaptive defense that in the presence of an attacker computes a model that is provably robust. In particular, we show how to realize this vision in a simplified setting.
In order to do so, we introduce a notion of a strong adversary: he is not limited by the type of perturbations he can apply but when presented with a classifier can repetitively generate different adversarial examples. We explain why this notion is interesting to study and use it to prove the following. There exists an efficient adversarial-learning-like scheme such that for every strong adversary $\mathbf{A}$ it outputs a classifier that (a) cannot be strongly attacked by $\mathbf{A}$, or (b) has error at most $\epsilon$. In both cases our scheme uses exponentially (in $\epsilon$) fewer samples than what the PAC bound requires.

[2]: CIFS: Improving Adversarial Robustness of CNNs via Channel-wise Importance-based Feature Selection
Authors: Hanshu Yan, Jingfeng Zhang, Gang Niu, Jiashi Feng, Vincent Y. F. Tan, Masashi Sugiyama
Link: https://arxiv.org/abs/2102.05311
Abstract: We investigate the adversarial robustness of CNNs from the perspective of channel-wise activations. By comparing non-robust (normally trained) and robustified (adversarially trained) models, we observe that adversarial training (AT) robustifies CNNs by aligning the channel-wise activations of adversarial data with those of their natural counterparts. However, the channels that are negatively-relevant (NR) to predictions are still over-activated when processing adversarial data. Besides, we also observe that AT does not result in similar robustness for all classes. For the robust classes, channels with larger activation magnitudes are usually more positively-relevant (PR) to predictions, but this alignment does not hold for the non-robust classes. Given these observations, we hypothesize that suppressing NR channels and aligning PR ones with their relevances further enhances the robustness of CNNs under AT. To examine this hypothesis, we introduce a novel mechanism, i.e., Channel-wise Importance-based Feature Selection (CIFS). The CIFS manipulates channels' activations of certain layers by generating non-negative multipliers to these channels based on their relevances to predictions. Extensive experiments on benchmark datasets including CIFAR10 and SVHN clearly verify the hypothesis and CIFS's effectiveness of robustifying CNNs.

[3]: Bayesian Inference with Certifiable Adversarial Robustness
Authors: Matthew Wicker, Luca Laurenti, Andrea Patane, Zhoutong Chen, Zheng Zhang, Marta Kwiatkowska
Comments: Accepted AISTATS2021
Link: https://arxiv.org/abs/2102.05289
Abstract: We consider adversarial training of deep neural networks through the lens of Bayesian learning, and present a principled framework for adversarial training of Bayesian Neural Networks (BNNs) with certifiable guarantees. We rely on techniques from constraint relaxation of non-convex optimisation problems and modify the standard cross-entropy error model to enforce posterior robustness to worst-case perturbations in $\epsilon$-balls around input points. We illustrate how the resulting framework can be combined with methods commonly employed for approximate inference of BNNs. In an empirical investigation, we demonstrate that the presented approach enables training of certifiably robust models on MNIST, FashionMNIST and CIFAR-10 and can also be beneficial for uncertainty calibration. Our method is the first to directly train certifiable BNNs, thus facilitating their deployment in safety-critical applications.

[4]: Finding the Stochastic Shortest Path with Low Regret: The Adversarial Cost and Unknown Transition Case
Authors: Liyu Chen, Haipeng Luo
Link: https://arxiv.org/abs/2102.05284
Abstract: We make significant progress toward the stochastic shortest path problem with adversarial costs and unknown transition. Specifically, we develop algorithms that achieve $\widetilde{O}(\sqrt{S^2ADT_\star K})$ regret for the full-information setting and $\widetilde{O}(\sqrt{S^3A^2DT_\star K})$ regret for the bandit feedback setting, where $D$ is the diameter, $T_\star$ is the expected hitting time of the optimal policy, $S$ is the number of states, $A$ is the number of actions, and $K$ is the number of episodes. Our work strictly improves (Rosenberg and Mansour, 2020) in the full information setting, extends (Chen et al., 2020) from known transition to unknown transition, and is also the first to consider the most challenging combination: bandit feedback with adversarial costs and unknown transition. To remedy the gap between our upper bounds and the current best lower bounds constructed via a stochastically oblivious adversary, we also propose algorithms with near-optimal regret for this special case.

[5]: Attentive Gaussian processes for probabilistic time-series generation
Authors: Kuilin Chen, Chi-Guhn Lee
Link: https://arxiv.org/abs/2102.05208
Abstract: The transduction of sequence has been mostly done by recurrent networks, which are computationally demanding and often underestimate uncertainty severely. We propose a computationally efficient attention-based network combined with the Gaussian process regression to generate real-valued sequence, which we call the Attentive-GP. The proposed model not only improves the training efficiency by dispensing recurrence and convolutions but also learns the factorized generative distribution with Bayesian representation. However, the presence of the GP precludes the commonly used mini-batch approach to the training of the attention network. Therefore, we develop a block-wise training algorithm to allow mini-batch training of the network while the GP is trained using full-batch, resulting in a scalable training method. The algorithm has been proved to converge and shows comparable, if not better, quality of the found solution. As the algorithm does not assume any specific network architecture, it can be used with a wide range of hybrid models such as neural networks with kernel machine layers in the scarcity of resources for computation and memory.

[6]: Adversarial Perturbations Are Not So Weird: Entanglement of Robust and Non-Robust Features in Neural Network Classifiers
Authors: Jacob M. Springer, Melanie Mitchell, Garrett T. Kenyon
Comments: 20 pages, 14 figures, 6 tables
Link: https://arxiv.org/abs/2102.05110
Abstract: Neural networks trained on visual data are well-known to be vulnerable to often imperceptible adversarial perturbations. The reasons for this vulnerability are still being debated in the literature. Recently Ilyas et al. (2019) showed that this vulnerability arises, in part, because neural network classifiers rely on highly predictive but brittle "non-robust" features. In this paper we extend the work of Ilyas et al. by investigating the nature of the input patterns that give rise to these features. In particular, we hypothesize that in a neural network trained in a standard way, non-robust features respond to small, "non-semantic" patterns that are typically entangled with larger, robust patterns, known to be more human-interpretable, as opposed to solely responding to statistical artifacts in a dataset. Thus, adversarial examples can be formed via minimal perturbations to these small, entangled patterns. In addition, we demonstrate a corollary of our hypothesis: robust classifiers are more effective than standard (non-robust) ones as a source for generating transferable adversarial examples in both the untargeted and targeted settings. The results we present in this paper provide new insight into the nature of the non-robust features responsible for adversarial vulnerability of neural network classifiers.

[7]: Adversarially Robust Classifier with Covariate Shift Adaptation
Authors: Jay Nandy, Sudipan Saha, Wynne Hsu, Mong Li Lee, Xiao Xiang Zhu
Comments: 36 pages with 12 figures and 15 tables
Link: https://arxiv.org/abs/2102.05096
Abstract: Existing adversarially trained models typically perform inference on test examples independently from each other. This mode of testing is unable to handle covariate shift in the test samples. Due to this, the performance of these models often degrades significantly. In this paper, we show that simple adaptive batch normalization (BN) technique that involves re-estimating the batch-normalization parameters during inference, can significantly improve the robustness of these models for any random perturbations, including the Gaussian noise. This simple finding enables us to transform adversarially trained models into randomized smoothing classifiers to produce certified robustness to $\ell_2$ noise. We show that we can achieve $\ell_2$ certified robustness even for adversarially trained models using $\ell_{\infty}$-bounded adversarial examples. We further demonstrate that adaptive BN technique significantly improves robustness against common corruptions, while often enhancing performance against adversarial attacks. This enables us to achieve both adversarial and corruption robustness for the same classifier.

[8]: Voice Cloning: a Multi-Speaker Text-to-Speech Synthesis Approach based on Transfer Learning
Authors: Giuseppe Ruggiero, Enrico Zovato, Luigi Di Caro, Vincent Pollet
Link: https://arxiv.org/abs/2102.05630
Abstract: Deep learning models are becoming predominant in many fields of machine learning. Text-to-Speech (TTS), the process of synthesizing artificial speech from text, is no exception. To this end, a deep neural network is usually trained using a corpus of several hours of recorded speech from a single speaker. Trying to produce the voice of a speaker other than the one learned is expensive and requires large effort since it is necessary to record a new dataset and retrain the model. This is the main reason why the TTS models are usually single speaker. The proposed approach has the goal to overcome these limitations trying to obtain a system which is able to model a multi-speaker acoustic space. This allows the generation of speech audio similar to the voice of different target speakers, even if they were not observed during the training phase.

[9]: On the Existence of Optimal Transport Gradient for Learning Generative Models
Authors: Antoine Houdard, Arthur Leclaire, Nicolas Papadakis, Julien Rabin
Link: https://arxiv.org/abs/2102.05542
Abstract: The use of optimal transport cost for learning generative models has become popular with Wasserstein Generative Adversarial Networks (WGAN). Training of WGAN relies on a theoretical background: the calculation of the gradient of the optimal transport cost with respect to the generative model parameters. We first demonstrate that such gradient may not be defined, which can result in numerical instabilities during gradient-based optimization. We address this issue by stating a valid differentiation theorem in the case of entropic regularized transport and specify conditions under which existence is ensured. By exploiting the discrete nature of empirical data, we formulate the gradient in a semi-discrete setting and propose an algorithm for the optimization of the generative model parameters. Finally, we illustrate numerically the advantage of the proposed framework.

[10]: Dompteur: Taming Audio Adversarial Examples
Authors: Thorsten Eisenhofer, Lea Schönherr, Joel Frank, Lars Speckemeier, Dorothea Kolossa, Thorsten Holz
Link: https://arxiv.org/abs/2102.05431
Abstract: Adversarial examples seem to be inevitable. These specifically crafted inputs allow attackers to arbitrarily manipulate machine learning systems. Even worse, they often seem harmless to human observers. In our digital society, this poses a significant threat. For example, Automatic Speech Recognition (ASR) systems, which serve as hands-free interfaces to many kinds of systems, can be attacked with inputs incomprehensible for human listeners. The research community has unsuccessfully tried several approaches to tackle this problem.
In this paper we propose a different perspective: We accept the presence of adversarial examples against ASR systems, but we require them to be perceivable by human listeners. By applying the principles of psychoacoustics, we can remove semantically irrelevant information from the ASR input and train a model that resembles human perception more closely. We implement our idea in a tool named Dompteur and demonstrate that our augmented system, in contrast to an unmodified baseline, successfully focuses on perceptible ranges of the input signal. This change forces adversarial examples into the audible range, while using minimal computational overhead and preserving benign performance. To evaluate our approach, we construct an adaptive attacker, which actively tries to avoid our augmentations and demonstrate that adversarial examples from this attacker remain clearly perceivable. Finally, we substantiate our claims by performing a hearing test with crowd-sourced human listeners.

[11]: Conditional Versus Adversarial Euler-based Generators For Time Series
Authors: Carl Remlinger, Joseph Mikael, Romuald Elie
Comments: 14 page, 9 Figures
Link: https://arxiv.org/abs/2102.05313
Abstract: We introduce new generative models for time series based on Euler discretization that do not require any pre-stationarization procedure. Specifically, we develop two GAN based methods, relying on the adaptation of Wasserstein GANs (Arjovsky et al., 2017) and DVD GANs (Clark et al., 2019b) to time series. Alternatively, we consider a conditional Euler Generator (CEGEN) minimizing a distance between the induced conditional densities. In the context of Itô processes, we theoretically validate this approach and demonstrate using the Bures metric that reaching a low loss level provides accurate estimations for both the drift and the volatility terms of the underlying process. Tests on simple models show how the Euler discretization and the use of Wasserstein distance allow the proposed GANs and (more considerably) CEGEN to outperform state-of-the-art Time Series GAN generation (Yoon et al., 2019b) on time structure metrics. In higher dimensions we observe that CEGEN manages to get the correct covariance structures. Finally we illustrate how our model can be combined to a Monte Carlo simulator in a low data context by using a transfer learning technique.

Zero/One-Shot, Transfer Learning, Domain Adaptation (4 papers)


[1]: Boosting Template-based SSVEP Decoding by Cross-domain Transfer Learning
Authors: Kuan-Jung Chiang, Chun-Shu Wei, Masaki Nakanishi, Tzyy-Ping Jung
Comments: Mirror version of the manuscript in the Journal of Neural Engineering on IOP Science (this https URL), Journal of Neural Engineering (2020)
Link: https://arxiv.org/abs/2102.05194
Abstract: Objective: This study aims to establish a generalized transfer-learning framework for boosting the performance of steady-state visual evoked potential (SSVEP)-based brain-computer interfaces (BCIs) by leveraging cross-domain data transferring. Approach: We enhanced the state-of-the-art template-based SSVEP decoding through incorporating a least-squares transformation (LST)-based transfer learning to leverage calibration data across multiple domains (sessions, subjects, and EEG montages). Main results: Study results verified the efficacy of LST in obviating the variability of SSVEPs when transferring existing data across domains. Furthermore, the LST-based method achieved significantly higher SSVEP-decoding accuracy than the standard task-related component analysis (TRCA)-based method and the non-LST naive transfer-learning method. Significance: This study demonstrated the capability of the LST-based transfer learning to leverage existing data across subjects and/or devices with an in-depth investigation of its rationale and behavior in various circumstances. The proposed framework significantly improved the SSVEP decoding accuracy over the standard TRCA approach when calibration data are limited. Its performance in calibration reduction could facilitate plug-and-play SSVEP-based BCIs and further practical applications.

[2]: Locally Adaptive Label Smoothing for Predictive Churn
Authors: Dara Bahri, Heinrich Jiang
Link: https://arxiv.org/abs/2102.05140
Abstract: Training modern neural networks is an inherently noisy process that can lead to high prediction churn -- disagreements between re-trainings of the same model due to factors such as randomization in the parameter initialization and mini-batches -- even when the trained models all attain similar accuracies. Such prediction churn can be very undesirable in practice. In this paper, we present several baselines for reducing churn and show that training on soft labels obtained by adaptively smoothing each example's label based on the example's neighboring labels often outperforms the baselines on churn while improving accuracy on a variety of benchmark classification tasks and model architectures.
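Prediction churn, as the abstract above describes it, is the disagreement between re-trainings of the same model. A minimal sketch of one way to measure it, assuming churn is taken as the fraction of examples on which two runs predict different labels; the function name and the two prediction lists are hypothetical, not taken from the paper.

```python
def prediction_churn(preds_a, preds_b):
    """Fraction of examples on which two trained models disagree."""
    if len(preds_a) != len(preds_b):
        raise ValueError("prediction lists must have the same length")
    disagreements = sum(1 for a, b in zip(preds_a, preds_b) if a != b)
    return disagreements / len(preds_a)


# Hypothetical predicted labels from two re-trainings on the same test set.
run_1 = [0, 1, 1, 2]
run_2 = [0, 1, 2, 2]
churn = prediction_churn(run_1, run_2)
```

Two runs can have identical accuracy and still show non-zero churn, because they may be wrong on different examples.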

[3]: Domain Invariant Representation Learning with Domain Density Transformations
Authors: A. Tuan Nguyen, Toan Tran, Yarin Gal, Atilim Gunes Baydin
Link: https://arxiv.org/abs/2102.05082
Abstract: Domain generalization refers to the problem where we aim to train a model on data from a set of source domains so that the model can generalize to unseen target domains. Naively training a model on the aggregate set of data (pooled from all source domains) has been shown to perform suboptimally, since the information learned by that model might be domain-specific and generalize imperfectly to target domains. To tackle this problem, a predominant approach is to find and learn some domain-invariant information in order to use it for the prediction task. In this paper, we propose a theoretically grounded method to learn a domain-invariant representation by enforcing the representation network to be invariant under all transformation functions among domains. We also show how to use generative adversarial networks to learn such domain transformations to implement our method in practice. We demonstrate the effectiveness of our method on several widely used datasets for the domain generalization problem, on all of which we achieve competitive results with state-of-the-art models.

[4]: Emotion Transfer Using Vector-Valued Infinite Task Learning
Authors: Alex Lambert, Sanjeel Parekh, Zoltán Szabó, Florence d'Alché-Buc
Comments: 17 pages, 10 figures
Link: https://arxiv.org/abs/2102.05075
Abstract: Style transfer is a significant problem of machine learning with numerous successful applications. In this work, we present a novel style transfer framework building upon infinite task learning and vector-valued reproducing kernel Hilbert spaces. We instantiate the idea in emotion transfer where the goal is to transform facial images to different target emotions. The proposed approach provides a principled way to gain explicit control over the continuous style space. We demonstrate the efficiency of the technique on popular facial emotion benchmarks, achieving low reconstruction cost and high emotion classification accuracy.

Reinforcement Learning (7 papers)


[1]: Improving Model-Based Reinforcement Learning with Internal State Representations through Self-Supervision
Authors: Julien Scholz, Cornelius Weber, Muhammad Burhan Hafez, Stefan Wermter
Link: https://arxiv.org/abs/2102.05599
Abstract: Using a model of the environment, reinforcement learning agents can plan their future moves and achieve superhuman performance in board games like Chess, Shogi, and Go, while remaining relatively sample-efficient. As demonstrated by the MuZero Algorithm, the environment model can even be learned dynamically, generalizing the agent to many more tasks while at the same time achieving state-of-the-art performance. Notably, MuZero uses internal state representations derived from real environment states for its predictions. In this paper, we bind the model's predicted internal state representation to the environment state via two additional terms: a reconstruction model loss and a simpler consistency loss, both of which work independently and unsupervised, acting as constraints to stabilize the learning process. Our experiments show that this new integration of reconstruction model loss and simpler consistency loss provide a significant performance increase in OpenAI Gym environments. Our modifications also enable self-supervised pretraining for MuZero, so the algorithm can learn about environment dynamics before a goal is made available.

[2]: Non-stationary Reinforcement Learning without Prior Knowledge: An Optimal Black-box Approach
Authors: Chen-Yu Wei, Haipeng Luo
Link: https://arxiv.org/abs/2102.05406
Abstract: We propose a black-box reduction that turns a certain reinforcement learning algorithm with optimal regret in a (near-)stationary environment into another algorithm with optimal dynamic regret in a non-stationary environment, importantly without any prior knowledge on the degree of non-stationarity. By plugging different algorithms into our black-box, we provide a list of examples showing that our approach not only recovers recent results for (contextual) multi-armed bandits achieved by very specialized algorithms, but also significantly improves the state of the art for linear bandits, episodic MDPs, and infinite-horizon MDPs in various ways. Specifically, in most cases our algorithm achieves the optimal dynamic regret $\widetilde{\mathcal{O}}(\min\{\sqrt{LT}, \Delta^{1/3}T^{2/3}\})$ where $T$ is the number of rounds and $L$ and $\Delta$ are the number and amount of changes of the world respectively, while previous works only obtain suboptimal bounds and/or require the knowledge of $L$ and $\Delta$.

[3]: Risk-Averse Offline Reinforcement Learning
Authors: Núria Armengol Urpí, Sebastian Curi, Andreas Krause
Link: https://arxiv.org/abs/2102.05371
Abstract: Training Reinforcement Learning (RL) agents in high-stakes applications might be too prohibitive due to the risk associated to exploration. Thus, the agent can only use data previously collected by safe policies. While previous work considers optimizing the average performance using offline data, we focus on optimizing a risk-averse criteria, namely the CVaR. In particular, we present the Offline Risk-Averse Actor-Critic (O-RAAC), a model-free RL algorithm that is able to learn risk-averse policies in a fully offline setting. We show that O-RAAC learns policies with higher CVaR than risk-neutral approaches in different robot control tasks. Furthermore, considering risk-averse criteria guarantees distributional robustness of the average performance with respect to particular distribution shifts. We demonstrate empirically that in the presence of natural distribution-shifts, O-RAAC learns policies with good average performance.
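The CVaR criterion named in the abstract above can be illustrated with a minimal empirical estimator. The sketch assumes CVaR is taken over episode returns (higher is better) as the mean of the worst `alpha` fraction of samples; the function name, the rounding rule for the tail size, and the sample returns are assumptions for illustration, not the estimator used in O-RAAC.

```python
import math


def empirical_cvar(returns, alpha=0.1):
    """Mean of the worst alpha-fraction of returns (lower tail)."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    ordered = sorted(returns)  # ascending: worst returns first
    tail_size = max(1, math.ceil(alpha * len(ordered)))
    tail = ordered[:tail_size]
    return sum(tail) / tail_size


# Hypothetical episode returns collected from some policy.
episode_returns = [10, -5, 3, 8, -20, 6, 7, 1, 4, 9]
cvar_20 = empirical_cvar(episode_returns, alpha=0.2)
mean_return = empirical_cvar(episode_returns, alpha=1.0)
```

With `alpha=1.0` the estimator reduces to the ordinary mean, which is the risk-neutral objective the abstract contrasts against.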

[4]: Simple Agent, Complex Environment: Efficient Reinforcement Learning with Agent State
Authors: Shi Dong, Benjamin Van Roy, Zhengyuan Zhou
Link: https://arxiv.org/abs/2102.05261
Abstract: We design a simple reinforcement learning agent that, with a specification only of agent state dynamics and a reward function, can operate with some degree of competence in any environment. The agent maintains only visitation counts and value estimates for each agent-state-action pair. The value function is updated incrementally in response to temporal differences and optimistic boosts that encourage exploration. The agent executes actions that are greedy with respect to this value function. We establish a regret bound demonstrating convergence to near-optimal per-period performance, where the time taken to achieve near-optimality is polynomial in the number of agent states and actions, as well as the reward mixing time of the best policy within the reference policy class, which is comprised of those that depend on history only through agent state. Notably, there is no further dependence on the number of environment states or mixing times associated with other policies or statistics of history. Our result sheds light on the potential benefits of (deep) representation learning, which has demonstrated the capability to extract compact and relevant features from high-dimensional interaction histories.

[5]:Scheduling the NASA Deep Space Network with Deep Reinforcement Learning
标题:基于深度强化学习的NASA深空网络调度
作者:Edwin Goh, Hamsa Shwetha Venkataram, Mark Hoffmann, Mark Johnston, Brian Wilson
链接:https://arxiv.org/abs/2102.05167
摘要:With three complexes spread evenly across the Earth, NASA's Deep Space Network (DSN) is the primary means of communications as well as a significant scientific instrument for dozens of active missions around the world. A rapidly rising number of spacecraft and increasingly complex scientific instruments with higher bandwidth requirements have resulted in demand that exceeds the network's capacity across its 12 antennae. The existing DSN scheduling process operates on a rolling weekly basis and is time-consuming; for a given week, generation of the final baseline schedule of spacecraft tracking passes takes roughly 5 months from the initial requirements submission deadline, with several weeks of peer-to-peer negotiations in between. This paper proposes a deep reinforcement learning (RL) approach to generate candidate DSN schedules from mission requests and spacecraft ephemeris data with demonstrated capability to address real-world operational constraints. A deep RL agent is developed that takes mission requests for a given week as input, and interacts with a DSN scheduling environment to allocate tracks such that its reward signal is maximized. A comparison is made between an agent trained using Proximal Policy Optimization and its random, untrained counterpart. The results represent a proof-of-concept that, given a well-shaped reward signal, a deep RL agent can learn the complex heuristics used by experts to schedule the DSN. A trained agent can potentially be used to generate candidate schedules to bootstrap the scheduling process and thus reduce the turnaround cycle for DSN scheduling.

[6]:Reinforcement Learning for Optimized Beam Training in Multi-Hop Terahertz Communications
标题:基于强化学习的多跳太赫兹通信波束优化训练
作者:Arian Ahmadi, Omid Semiari
备注:2021 IEEE International Conference on Communications (ICC): Mobile and Wireless Networks Symposium
链接:https://arxiv.org/abs/2102.05269
摘要:Communication at terahertz (THz) frequency bands is a promising solution for achieving extremely high data rates in next-generation wireless networks. While the THz communication is conventionally envisioned for short-range wireless applications due to the high atmospheric absorption at THz frequencies, multi-hop directional transmissions can be enabled to extend the communication range. However, to realize multi-hop THz communications, conventional beam training schemes, such as exhaustive search or hierarchical methods with a fixed number of training levels, can lead to a very large time overhead. To address this challenge, in this paper, a novel hierarchical beam training scheme with dynamic training levels is proposed to optimize the performance of multi-hop THz links. In fact, an optimization problem is formulated to maximize the overall spectral efficiency of the multi-hop THz link by dynamically and jointly selecting the number of beam training levels across all the constituent single-hop links. To solve this problem in presence of unknown channel state information, noise, and path loss, a new reinforcement learning solution based on the multi-armed bandit (MAB) is developed. Simulation results show the fast convergence of the proposed scheme in presence of random channels and noise. The results also show that the proposed scheme can yield up to 75% performance gain, in terms of spectral efficiency, compared to the conventional hierarchical beam training with a fixed number of training levels.
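The abstract does not say which bandit algorithm is used, so the following is only a generic UCB1 sketch of the multi-armed bandit idea: each arm stands for one candidate number of beam training levels, and the reward stands in for the observed spectral efficiency. The function names, the reward model in `toy_reward`, and its success probabilities are all hypothetical.

```python
import math
import random

def ucb1(reward_fn, n_arms, n_rounds, seed=0):
    """Generic UCB1: returns pull counts and running mean reward per arm."""
    rng = random.Random(seed)
    counts = [0] * n_arms
    means = [0.0] * n_arms
    for t in range(1, n_rounds + 1):
        if t <= n_arms:
            arm = t - 1  # play every arm once before using the index
        else:
            arm = max(
                range(n_arms),
                key=lambda a: means[a] + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        reward = reward_fn(arm, rng)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # incremental mean
    return counts, means

def toy_reward(arm, rng):
    """Hypothetical Bernoulli reward; arm i = i + 1 training levels."""
    success_prob = [0.3, 0.5, 0.8, 0.6][arm]
    return 1.0 if rng.random() < success_prob else 0.0

counts, means = ucb1(toy_reward, n_arms=4, n_rounds=2000)
```

The exploration bonus shrinks as an arm is pulled more often, so pulls concentrate on the arms with the highest estimated reward without requiring the channel statistics in advance.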

[7]:PyTorchRL: Modular and Distributed Reinforcement Learning in PyTorch
标题:PyTorchRL:PyTorch中的模块化分布式强化学习
作者:Albert Bou, Gianni De Fabritiis
备注:8 pages, 5 figures
链接:https://arxiv.org/abs/2007.02622
摘要:Deep reinforcement learning (RL) has proved successful at solving challenging environments but often requires scaling to large sampling and computing resources. Furthermore, advancing RL requires tools that are flexible enough to easily prototype new methods, yet avoid impractically slow experimental turnaround times. To this end, we present PyTorchRL, a PyTorch-based library for RL with a modular design that allows composing agents from a set of reusable and easily extendable modules. Additionally, PyTorchRL permits the definition of distributed training architectures with flexibility and independence of the Agent components. In combination, these two features can accelerate the pace at which ideas are implemented and tested, simplifying research and making it possible to tackle more challenging RL problems. We present several interesting use-cases of PyTorchRL and showcase the library by obtaining the highest to-date test performance on the Obstacle Tower Unity3D challenge environment.

主动学习(2篇)


[1]:Improved Algorithms for Efficient Active Learning Halfspaces with Massart and Tsybakov noise
标题:带Massart和Tsybakov噪声的有效主动学习半空间的改进算法
作者:Chicheng Zhang, Yinan Li
备注:32 pages
链接:https://arxiv.org/abs/2102.05312
摘要:We develop a computationally-efficient PAC active learning algorithm for $d$-dimensional homogeneous halfspaces that can tolerate Massart noise (Massart and Nédélec, 2006) and Tsybakov noise (Tsybakov, 2004). Specialized to the $\eta$-Massart noise setting, our algorithm achieves an information-theoretically optimal label complexity of $\tilde{O}\left( \frac{d}{(1-2\eta)^2} \mathrm{polylog}\left(\frac{1}{\epsilon}\right) \right)$ under a wide range of unlabeled data distributions (specifically, the family of "structured distributions" defined in Diakonikolas et al. (2020)). Under the more challenging Tsybakov noise condition, we identify two subfamilies of noise conditions, under which our algorithm achieves computational efficiency and provides label complexity guarantees strictly lower than passive learning algorithms.

[2]:Bounded Memory Active Learning through Enriched Queries
标题:基于丰富查询的有限记忆主动学习
作者:Max Hopkins, Daniel Kane, Shachar Lovett, Michal Moshkovitz
链接:https://arxiv.org/abs/2102.05047
摘要:The explosive growth of easily-accessible unlabeled data has led to growing interest in active learning, a paradigm in which data-hungry learning algorithms adaptively select informative examples in order to lower prohibitively expensive labeling costs. Unfortunately, in standard worst-case models of learning, the active setting often provides no improvement over non-adaptive algorithms. To combat this, a series of recent works have considered a model in which the learner may ask enriched queries beyond labels. While such models have seen success in drastically lowering label costs, they tend to come at the expense of requiring large amounts of memory. In this work, we study what families of classifiers can be learned in bounded memory. To this end, we introduce a novel streaming-variant of enriched-query active learning along with a natural combinatorial parameter called lossless sample compression that is sufficient for learning not only with bounded memory, but in a query-optimal and computationally efficient manner as well. Finally, we give three fundamental examples of classifier families with small, easy to compute lossless compression schemes when given access to basic enriched queries: axis-aligned rectangles, decision trees, and halfspaces in two dimensions.

Neural Networks(5篇)


[1]:Fast Classification Learning with Neural Networks and Conceptors for Speech Recognition and Car Driving Maneuvers
标题:基于神经网络和概念的快速分类学习在语音识别和汽车驾驶中的应用
作者:Stefanie Krause, Oliver Otto, Frieder Stolzenburg
备注:17 pages, 6 figures, 6 tables
链接:https://arxiv.org/abs/2102.05588
摘要:Recurrent neural networks are a powerful means in diverse applications. We show that, together with so-called conceptors, they also allow fast learning, in contrast to other deep learning methods. In addition, a relatively small number of examples suffices to train neural networks with high accuracy. We demonstrate this with two applications, namely speech recognition and detecting car driving maneuvers. We improve the state of the art by application-specific preparation techniques: For speech recognition, we use mel frequency cepstral coefficients leading to a compact representation of the frequency spectra, and detecting car driving maneuvers can be done without the commonly used polynomial interpolation, as our evaluation suggests.

[2]:MAIN: Multihead-Attention Imputation Networks
标题:MAIN:多头注意力插补网络
作者:Spyridon Mouselinos, Kyriakos Polymenakos, Antonis Nikitakis, Konstantinos Kyriakopoulos
备注:8 pages, 7 figures
链接:https://arxiv.org/abs/2102.05428
摘要:The problem of missing data, usually absent in curated and competition-standard datasets, is an unfortunate reality for most machine learning models used in industry applications. Recent work has focused on understanding the nature and the negative effects of such phenomena, while devising solutions for optimal imputation of the missing data, using both discriminative and generative approaches. We propose a novel mechanism based on multi-head attention which can be applied effortlessly in any model and achieves better downstream performance without the introduction of the full dataset in any part of the modeling pipeline. Our method inductively models patterns of missingness in the input data in order to increase the performance of the downstream task. Finally, after evaluating our method against baselines for a number of datasets, we found performance gains that tend to be larger in scenarios of high missingness.

[3]:Towards Certifying $\ell_\infty$ Robustness using Neural Networks with $\ell_\infty$-dist Neurons
标题:用带有$\ell_\infty$-dist神经元的神经网络证明$\ell_\infty$的鲁棒性
作者:Bohang Zhang, Tianle Cai, Zhou Lu, Di He, Liwei Wang
链接:https://arxiv.org/abs/2102.05363
摘要:It is well-known that standard neural networks, even with a high classification accuracy, are vulnerable to small $\ell_\infty$-norm bounded adversarial perturbations. Although many attempts have been made, most previous works either can only provide empirical verification of the defense to a particular attack method, or can only develop a certified guarantee of the model robustness in limited scenarios. In this paper, we seek for a new approach to develop a theoretically principled neural network that inherently resists $\ell_\infty$ perturbations. In particular, we design a novel neuron that uses $\ell_\infty$-distance as its basic operation (which we call $\ell_\infty$-dist neuron), and show that any neural network constructed with $\ell_\infty$-dist neurons (called $\ell_{\infty}$-dist net) is naturally a 1-Lipschitz function with respect to $\ell_\infty$-norm. This directly provides a rigorous guarantee of the certified robustness based on the margin of prediction outputs. We also prove that such networks have enough expressive power to approximate any 1-Lipschitz function with robust generalization guarantee. Our experimental results show that the proposed network is promising. Using $\ell_{\infty}$-dist nets as the basic building blocks, we consistently achieve state-of-the-art performance on commonly used datasets: 93.09% certified accuracy on MNIST ($\epsilon=0.3$), 79.23% on Fashion MNIST ($\epsilon=0.1$) and 35.10% on CIFAR-10 ($\epsilon=8/255$).
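A minimal NumPy sketch of the $\ell_\infty$-dist neuron described above, assuming each unit computes $\|x - w\|_\infty + b$. The layer sizes, random parameters, and the helper names `linf_dist_layer`, `net`, and `certified` are illustrative assumptions, and no training procedure is shown. The certification check uses the standard argument for 1-Lipschitz outputs: a perturbation of $\ell_\infty$ norm at most $\epsilon$ moves every output by at most $\epsilon$, so the prediction cannot change when the top-two margin exceeds $2\epsilon$.

```python
import numpy as np

def linf_dist_layer(x, W, b):
    """Each unit j outputs ||x - W[j]||_inf + b[j]."""
    return np.max(np.abs(x[None, :] - W), axis=1) + b

def certified(outputs, eps):
    """True if the gap between the two largest outputs exceeds 2 * eps."""
    top_two = np.sort(outputs)[-2:]
    return (top_two[1] - top_two[0]) > 2.0 * eps

# Hypothetical two-layer net with random parameters (illustration only).
rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((5, 4)), rng.standard_normal(5)
W2, b2 = rng.standard_normal((3, 5)), rng.standard_normal(3)

def net(x):
    return linf_dist_layer(linf_dist_layer(x, W1, b1), W2, b2)

x = rng.standard_normal(4)
eps = 0.1
delta = rng.uniform(-eps, eps, size=4)             # ||delta||_inf <= eps
shift = np.max(np.abs(net(x + delta) - net(x)))    # bounded by eps
```

Because each layer is 1-Lipschitz with respect to the $\ell_\infty$ norm, so is their composition, which is why `shift` stays below `eps` for any choice of weights.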

[4]:Backdoor Scanning for Deep Neural Networks through K-Arm Optimization
标题:基于k臂优化的深度神经网络后门扫描
作者:Guangyu Shen, Yingqi Liu, Guanhong Tao, Shengwei An, Qiuling Xu, Siyuan Cheng, Shiqing Ma, Xiangyu Zhang
链接:https://arxiv.org/abs/2102.05123
摘要:Backdoor attacks pose a severe threat to deep learning systems. They inject hidden malicious behaviors into a model such that any input stamped with a special pattern can trigger such behaviors. Detecting backdoors is hence a pressing need. Many existing defense techniques use optimization to generate the smallest input pattern that forces the model to misclassify a set of benign inputs injected with the pattern to a target label. However, the complexity is quadratic in the number of class labels, so they can hardly handle models with many classes. Inspired by Multi-Arm Bandit in Reinforcement Learning, we propose a K-Arm optimization method for backdoor detection. By iteratively and stochastically selecting the most promising labels for optimization with the guidance of an objective function, we substantially reduce the complexity, making it possible to handle models with many classes. Moreover, by iteratively refining the selection of labels to optimize, it substantially mitigates the uncertainty in choosing the right labels, improving detection accuracy. At the time of submission, the evaluation of our method on over 4000 models in the IARPA TrojAI competition from round 1 to the latest round 4 achieves top performance on the leaderboard. Our technique also supersedes three state-of-the-art techniques in terms of accuracy and the scanning time needed.

[5]:Hybrid In-memory Computing Architecture for the Training of Deep Neural Networks
标题:用于深层神经网络训练的混合内存计算结构
作者:Vinay Joshi, Wangxin He, Jae-sun Seo, Bipin Rajendran
备注:Accepted at ISCAS 2021 for publication
链接:https://arxiv.org/abs/2102.05271
摘要:The cost involved in training deep neural networks (DNNs) on von-Neumann architectures has motivated the development of novel solutions for efficient DNN training accelerators. We propose a hybrid in-memory computing (HIC) architecture for the training of DNNs on hardware accelerators that results in memory-efficient inference and outperforms baseline software accuracy in benchmark tasks. We introduce a weight representation technique that exploits both binary and multi-level phase-change memory (PCM) devices, and this leads to a memory-efficient inference accelerator. Unlike previous in-memory computing-based implementations, we use a low precision weight update accumulator that results in more memory savings. We trained the ResNet-32 network to classify CIFAR-10 images using HIC. For a comparable model size, HIC-based training outperforms baseline network, trained in floating-point 32-bit (FP32) precision, by leveraging appropriate network width multiplier. Furthermore, we observe that HIC-based training results in about 50% less inference model size to achieve baseline comparable accuracy. We also show that the temporal drift in PCM devices has a negligible effect on post-training inference accuracy for extended periods (year). Finally, our simulations indicate HIC-based training naturally ensures that the number of write-erase cycles seen by the devices is a small fraction of the endurance limit of PCM, demonstrating the feasibility of this architecture for achieving hardware platforms that can learn in the field.

梯度(1篇)


[1]:Statistical Inference for Polyak-Ruppert Averaged Zeroth-order Stochastic Gradient Algorithm
标题:Polyak-Ruppert平均零阶随机梯度算法的统计推断
作者:Yanhao Jin, Tesi Xiao, Krishnakumar Balasubramanian
链接:https://arxiv.org/abs/2102.05198
摘要:As machine learning models are deployed in critical applications, it becomes important to not just provide point estimators of the model parameters (or subsequent predictions), but also quantify the uncertainty associated with estimating the model parameters via confidence sets. In the last decade, estimating or training in several machine learning models has become synonymous with running stochastic gradient algorithms. However, computing the stochastic gradients in several settings is highly expensive or even impossible at times. An important question which has thus far not been addressed sufficiently in the statistical machine learning literature is that of equipping zeroth-order stochastic gradient algorithms with practical yet rigorous inferential capabilities. Towards this, in this work, we first establish a central limit theorem for Polyak-Ruppert averaged stochastic gradient algorithm in the zeroth-order setting. We then provide online estimators of the asymptotic covariance matrix appearing in the central limit theorem, thereby providing a practical procedure for constructing asymptotically valid confidence sets (or intervals) for parameter estimation (or prediction) in the zeroth-order setting.
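As a rough illustration of the setting, the sketch below runs a zeroth-order stochastic gradient iteration and returns the Polyak-Ruppert average of the iterates. It assumes a two-point Gaussian-smoothing gradient estimator and a step size of the form $\eta_t = c\,t^{-a}$; both choices, the function name, and the quadratic test objective are assumptions for this example and may differ from the paper. The covariance estimators needed for confidence sets are not shown.

```python
import numpy as np

def zo_sgd_polyak_ruppert(f, x0, n_steps, mu=1e-3, step0=0.3, decay=0.6, seed=0):
    """Zeroth-order SGD using only function values; returns the averaged iterate."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    avg = np.zeros_like(x)
    for t in range(1, n_steps + 1):
        u = rng.standard_normal(x.shape)        # random search direction
        g = (f(x + mu * u) - f(x)) / mu * u     # two-point gradient estimate
        x = x - step0 / t**decay * g            # step size eta_t = step0 * t^(-decay)
        avg += (x - avg) / t                    # running Polyak-Ruppert average
    return avg

# Hypothetical smooth objective with a known minimizer.
target = np.array([1.0, -2.0])
objective = lambda x: 0.5 * np.sum((x - target) ** 2)
estimate = zo_sgd_polyak_ruppert(objective, np.zeros(2), n_steps=5000)
```

The averaged iterate `estimate`, rather than the last iterate, is the quantity for which a central limit theorem of the kind described in the abstract is stated.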

聚类(1篇)


[1]:Dynamic $\beta$-VAEs for quantifying biodiversity by clustering optically recorded insect signals
标题:利用动态$\beta$-VAE对光学记录的昆虫信号进行聚类以量化生物多样性
作者:Klas Rydhmer, Raghavendra Selvan
备注:9 pages, 6 figures
链接:https://arxiv.org/abs/2102.05526
摘要:While insects are the largest and most diverse group of animals, constituting ca. 80% of all known species, they are difficult to study due to their small size and similarity between species. Conventional monitoring techniques depend on time consuming trapping methods and tedious microscope-based work by skilled experts in order to identify the caught insect specimen at species, or even family, level. Researchers and policy makers are in urgent need of a scalable monitoring tool in order to conserve biodiversity and secure human food production due to the rapid decline in insect numbers. Recent work has aimed for a broader analysis using unsupervised clustering as a proxy for conventional biodiversity measures, such as species richness and species evenness, without actually identifying the species of the detected target.
In order to improve upon existing insect clustering methods, we propose an adaptive variant of the variational autoencoder (VAE) which is capable of clustering data by phylogenetic groups. The proposed Dynamic $\beta$-VAE dynamically adapts the scaling of the reconstruction and regularization loss terms ($\beta$ value) yielding useful latent representations of the input data. We demonstrate the usefulness of the dynamic $\beta$-VAE on optically recorded insect signals from regions of southern Scandinavia to cluster unlabelled targets into possible species. We also demonstrate improved clustering performance in a semi-supervised setting using a small subset of labelled data. These experimental results, in both unsupervised- and semi-supervised settings, with the dynamic $\beta$-VAE are promising and, in the near future, can be deployed to monitor insects and conserve the rapidly declining insect biodiversity.

其他(52篇)


[1]:Energy-Harvesting Distributed Machine Learning
标题:能量收集分布式机器学习
作者:Basak Guler, Aylin Yener
备注:6 pages, 1 figure
链接:https://arxiv.org/abs/2102.05639
摘要:This paper provides a first study of utilizing energy harvesting for sustainable machine learning in distributed networks. We consider a distributed learning setup in which a machine learning model is trained over a large number of devices that can harvest energy from the ambient environment, and develop a practical learning framework with theoretical convergence guarantees. We demonstrate through numerical experiments that the proposed framework can significantly outperform energy-agnostic benchmarks. Our framework is scalable, requires only local estimation of the energy statistics, and can be applied to a wide range of distributed training settings, including machine learning in wireless networks, edge computing, and mobile internet of things.

[2]:Agnostic Proper Learning of Halfspaces under Gaussian Marginals
标题:高斯边缘下半空间的不可知正确学习
作者:Ilias Diakonikolas, Daniel M. Kane, Vasilis Kontonis, Christos Tzamos, Nikos Zarifis
链接:https://arxiv.org/abs/2102.05629
摘要:We study the problem of agnostically learning halfspaces under the Gaussian distribution. Our main result is the first proper learning algorithm for this problem whose sample complexity and computational complexity qualitatively match those of the best known improper agnostic learner. Building on this result, we also obtain the first proper polynomial-time approximation scheme (PTAS) for agnostically learning homogeneous halfspaces. Our techniques naturally extend to agnostically learning linear models with respect to other non-linear activations, yielding in particular the first proper agnostic algorithm for ReLU regression.

[3]:NAST: Non-Autoregressive Spatial-Temporal Transformer for Time Series Forecasting
标题:NAST:用于时间序列预测的非自回归时空变换器
作者:Kai Chen, Guang Chen, Dan Xu, Lijun Zhang, Yuyao Huang, Alois Knoll
备注:10 pages, 4 figures
链接:https://arxiv.org/abs/2102.05624
摘要:Although Transformer has made breakthrough success in widespread domains especially in Natural Language Processing (NLP), applying it to time series forecasting is still a great challenge. In time series forecasting, the autoregressive decoding of canonical Transformer models could introduce huge accumulative errors inevitably. Besides, utilizing Transformer to deal with spatial-temporal dependencies in the problem still faces tough difficulties. To tackle these limitations, this work is the first attempt to propose a Non-Autoregressive Transformer architecture for time series forecasting, aiming at overcoming the time delay and accumulative error issues in the canonical Transformer. Moreover, we present a novel spatial-temporal attention mechanism, building a bridge by a learned temporal influence map to fill the gaps between the spatial and temporal attention, so that spatial and temporal dependencies can be processed integrally. Empirically, we evaluate our model on diversified ego-centric future localization datasets and demonstrate state-of-the-art performance on both real-time and accuracy.

[4]:Addressing the Topological Defects of Disentanglement via Distributed Operators
标题:用分布算子解决解纠缠的拓扑缺陷
作者:Diane Bouchacourt, Mark Ibrahim, Stéphane Deny
链接:https://arxiv.org/abs/2102.05623
摘要:A core challenge in Machine Learning is to learn to disentangle natural factors of variation in data (e.g. object shape vs. pose). A popular approach to disentanglement consists in learning to map each of these factors to distinct subspaces of a model's latent representation. However, this approach has shown limited empirical success to date. Here, we show that, for a broad family of transformations acting on images--encompassing simple affine transformations such as rotations and translations--this approach to disentanglement introduces topological defects (i.e. discontinuities in the encoder). Motivated by classical results from group representation theory, we study an alternative, more flexible approach to disentanglement which relies on distributed latent operators, potentially acting on the entire latent space. We theoretically and empirically demonstrate the effectiveness of this approach to disentangle affine transformations. Our work lays a theoretical foundation for the recent success of a new generation of models using distributed operators for disentanglement.

[5]:Personalization for Web-based Services using Offline Reinforcement Learning
标题:基于离线强化学习的Web服务个性化
作者:Pavlos Athanasios Apostolopoulos, Zehui Wang, Hanson Wang, Chad Zhou, Kittipat Virochsiri, Norm Zhou, Igor L. Markov
备注:9 pages, 8 figures, 3 tables
链接:https://arxiv.org/abs/2102.05612
摘要:Large-scale Web-based services present opportunities for improving UI policies based on observed user interactions. We address challenges of learning such policies through model-free offline Reinforcement Learning (RL) with off-policy training. Deployed in a production system for user authentication in a major social network, it significantly improves long-term objectives. We articulate practical challenges, compare several ML techniques, provide insights on training and evaluation of RL models, and discuss generalizations.

[6]:Systematic Generalization for Predictive Control in Multivariate Time Series
标题:多变量时间序列预测控制的系统推广
作者:Hritik Bansal, Gantavya Bhatt, Pankaj Malhotra, Prathosh A.P
备注:8 pages, 8 figures, 2 tables
链接:https://arxiv.org/abs/2102.05602
摘要:Prior work has focused on evaluating the ability of neural networks to reason about novel combinations from known components, an intrinsic property of human cognition. In this work, we aim to study systematic generalization in predicting future state trajectories of a dynamical system, conditioned on past states' trajectory (dependent variables), past and future actions (control variables). In our context, systematic generalization implies that a good model should perform well on all new combinations of future actions after being trained on all of them, but only on a limited set of their combinations. For models to generalize out-of-distribution to unseen action combinations, they should reason about the states and their dependency relation with the applied actions. We conduct a rigorous study of useful inductive biases that learn to predict the trajectories up to large horizons well, and capture true dependency relations between the states and the controls through our synthetic setup, and simulated data from electric motors.

[7]:An Optimal Witness Function for Two-Sample Testing
标题:双样本检验的最优见证函数
作者:Jonas M. Kübler, Wittawat Jitkrittum, Bernhard Schölkopf, Krikamol Muandet
备注:Under Review - Code available upon personal request
链接:https://arxiv.org/abs/2102.05573
摘要:We propose data-dependent test statistics based on a one-dimensional witness function, which we call witness two-sample tests (WiTS tests). We first optimize the witness function by maximizing an asymptotic test-power objective and then use as the test statistic the difference in means of the witness evaluated on two held-out test samples. When the witness function belongs to a reproducing kernel Hilbert space, we show that the optimal witness is given via kernel Fisher discriminant analysis, whose solution we compute in closed form. We show that the WiTS test based on a characteristic kernel is consistent against any fixed alternative. Our experiments demonstrate that the WiTS test can achieve higher test power than existing two-sample tests with optimized kernels, suggesting that learning a high- or infinite-dimensional representation of the data may not be necessary for two-sample testing. The proposed procedure works beyond kernel methods, allowing practitioners to apply it within their preferred machine learning framework.
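The two-stage procedure described above (fit a witness on one split, then compare its means on held-out samples) can be sketched as follows. This is a simplified stand-in: it uses a linear Fisher-discriminant witness instead of the kernel version, and a permutation p-value, both of which are assumptions made for this example. The function names and the synthetic Gaussian data are hypothetical.

```python
import numpy as np

def fit_linear_witness(x_train, y_train, reg=1e-3):
    """Linear Fisher-discriminant witness h(z) = z @ w (simplifying assumption)."""
    mean_diff = x_train.mean(axis=0) - y_train.mean(axis=0)
    centered = np.vstack([x_train - x_train.mean(axis=0),
                          y_train - y_train.mean(axis=0)])
    cov = np.atleast_2d(np.cov(centered.T))           # pooled covariance
    w = np.linalg.solve(cov + reg * np.eye(len(mean_diff)), mean_diff)
    return lambda z: z @ w

def witness_test(x_test, y_test, witness, n_perm=500, seed=0):
    """Difference in witness means on held-out data, with a permutation p-value."""
    rng = np.random.default_rng(seed)
    hx, hy = witness(x_test), witness(y_test)
    stat = hx.mean() - hy.mean()
    pooled, n = np.concatenate([hx, hy]), len(hx)
    exceed = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)
        exceed += perm[:n].mean() - perm[n:].mean() >= stat
    return stat, (exceed + 1) / (n_perm + 1)

# Synthetic samples whose means differ in the first coordinate.
rng = np.random.default_rng(1)
x = rng.standard_normal((400, 2))
y = rng.standard_normal((400, 2)) + np.array([1.0, 0.0])
witness = fit_linear_witness(x[:200], y[:200])        # optimize on one split
stat, p_value = witness_test(x[200:], y[200:], witness)  # test on the other
```

Keeping the witness fit and the test on disjoint splits is what keeps the held-out statistic valid, since the witness is fixed before the test samples are examined.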

[8]:Learning Equational Theorem Proving
标题:学习等式定理证明
作者:Jelle Piepenbrock, Tom Heskes, Mikoláš Janota, Josef Urban
备注:17 pages, 4 figures
链接:https://arxiv.org/abs/2102.05547
摘要:We develop Stratified Shortest Solution Imitation Learning (3SIL) to learn equational theorem proving in a deep reinforcement learning (RL) setting. The self-trained models achieve state-of-the-art performance in proving problems generated by one of the top open conjectures in quasigroup theory, the Abelian Inner Mapping (AIM) Conjecture. To develop the methods, we first use two simpler arithmetic rewriting tasks that share tree-structured proof states and sparse rewards with the AIM problems. On these tasks, 3SIL is shown to significantly outperform several established RL and imitation learning methods. The final system is then evaluated in a standalone and cooperative mode on the AIM problems. The standalone 3SIL-trained system proves in 60 seconds more theorems (70.2%) than the complex, hand-engineered Waldmeister system (65.5%). In the cooperative mode, the final system is combined with the Prover9 system, proving in 2 seconds what standalone Prover9 proves in 60 seconds.

[9]:Detecting corruption in single-bidder auctions via positive-unlabelled learning
标题:基于正非标记学习的单投标人拍卖中的腐败检测
作者:Natalya Goryunova, Artem Baklanov, Egor Ianovski
链接:https://arxiv.org/abs/2102.05523
摘要:In research and policy-making guidelines, the single-bidder rate is a commonly used proxy of corruption in public procurement, but ipso facto this is not evidence of a corrupt auction, only of an uncompetitive auction. And while an uncompetitive auction could arise due to a corrupt procurer attempting to conceal the transaction, it could also be a result of geographic isolation, monopolist presence, or other structural factors. In this paper we use positive-unlabelled classification to attempt to separate public procurement auctions in the Russian Federation into auctions that are probably fair, and those that are suspicious.

[10]:On Minibatch Noise: Discrete-Time SGD, Overparametrization, and Bayes
标题:关于小批量噪声:离散时间SGD、过参数化和Bayes
作者:Liu Ziyin, Kangqiao Liu, Takashi Mori, Masahito Ueda
备注:The first two authors contributed equally
链接:https://arxiv.org/abs/2102.05375
摘要:The noise in stochastic gradient descent (SGD), caused by minibatch sampling, remains poorly understood despite its enormous practical importance in offering good training efficiency and generalization ability. In this work, we study the minibatch noise in SGD. Motivated by the observation that minibatch sampling does not always cause a fluctuation, we set out to find the conditions that cause minibatch noise to emerge. We first derive the analytically solvable results for linear regression under various settings, which are compared to the commonly used approximations that are used to understand SGD noise. We show that some degree of mismatch between model and data complexity is needed in order for SGD to "cause" a noise, and that such mismatch may be due to the existence of static noise in the labels, in the input, the use of regularization, or underparametrization. Our results motivate a more accurate general formulation to describe minibatch noise.

[11]:GuiltyWalker: Distance to illicit nodes in the Bitcoin network
标题:GuiltyWalker:比特币网络中非法节点的距离
作者:Catarina Oliveira, João Torres, Maria Inês Silva, David Aparício, João Tiago Ascensão, Pedro Bizarro
备注:6 pages, 3 figures
链接:https://arxiv.org/abs/2102.05373
摘要:Money laundering is a global phenomenon with wide-reaching social and economic consequences. Cryptocurrencies are particularly susceptible due to the lack of control by authorities and their anonymity. Thus, it is important to develop new techniques to detect and prevent illicit cryptocurrency transactions. In our work, we propose new features based on the structure of the graph and past labels to boost the performance of machine learning methods to detect money laundering. Our method, GuiltyWalker, performs random walks on the bitcoin transaction graph and computes features based on the distance to illicit transactions. We combine these new features with features proposed by Weber et al. and observe an improvement of about 5pp regarding illicit classification. Namely, we observe that our proposed features are particularly helpful during a black market shutdown, where the algorithm by Weber et al. was low performing.
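The abstract gives only the outline of GuiltyWalker (random walks on the transaction graph, features from the distance to illicit transactions), so the sketch below is a guess at the general shape of such features, not the authors' implementation. The walk direction, the stopping rule, the chosen summary statistics, and the toy graph are all assumptions.

```python
import random

def walk_distance_features(graph, illicit, start, n_walks=100, max_steps=10, seed=0):
    """Random walks from `start`; record steps until the first illicit node."""
    rng = random.Random(seed)
    hits = []
    for _ in range(n_walks):
        node = start
        for step in range(1, max_steps + 1):
            neighbours = graph.get(node, [])
            if not neighbours:
                break                      # dead end: this walk finds nothing
            node = rng.choice(neighbours)
            if node in illicit:
                hits.append(step)          # distance at which the walk hit
                break
    return {
        "hit_fraction": len(hits) / n_walks,
        "min_distance": min(hits) if hits else None,
        "mean_distance": sum(hits) / len(hits) if hits else None,
    }

# Hypothetical transaction graph as an adjacency list; "c" is labelled illicit.
toy_graph = {"a": ["b"], "b": ["c"], "c": []}
features = walk_distance_features(toy_graph, illicit={"c"}, start="a")
```

Features of this kind can then be appended to each transaction's existing feature vector before training a classifier.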

[12]:Simple and Near-Optimal MAP Inference for Nonsymmetric DPPs
标题:非对称DPP的简单近似最优MAP推断
作者:Nima Anari, Thuy-Duong Vuong
链接:https://arxiv.org/abs/2102.05347
摘要:Determinantal point processes (DPPs) are widely popular probabilistic models used in machine learning to capture diversity in random subsets of items. While traditional DPPs are defined by a symmetric kernel matrix, recent work has shown a significant increase in the modeling power and applicability of models defined by nonsymmetric kernels, where the model can capture interactions that go beyond diversity. We study the problem of maximum a posteriori (MAP) inference for determinantal point processes defined by a nonsymmetric positive semidefinite matrix (NDPPs), where the goal is to find the maximum $k\times k$ principal minor of the kernel matrix $L$. We obtain the first multiplicative approximation guarantee for this problem using local search, a method that has been previously applied to symmetric DPPs. Our approximation factor of $k^{O(k)}$ is nearly tight, and we show theoretically and experimentally that it compares favorably to the state-of-the-art methods for this problem that are based on greedy maximization. The main new insight enabling our improved approximation factor is that we allow local search to update up to two elements of the solution in each iteration, and we show this is necessary to have any multiplicative approximation guarantee.

[13]:On PyTorch Implementation of Density Estimators for von Mises-Fisher and Its Mixture
标题:von-Mises-Fisher及其混合物密度估计的PyTorch实现
作者:Minyoung Kim
链接:https://arxiv.org/abs/2102.05340
摘要:The von Mises-Fisher (vMF) is a well-known density model for directional random variables. The recent surge of the deep embedding methodologies for high-dimensional structured data such as images or texts, aimed at extracting salient directional information, can make the vMF model even more popular. In this article, we will review the vMF model and its mixture, provide detailed recipes of how to train the models, focusing on the maximum likelihood estimators, in Python/PyTorch. In particular, implementation of vMF typically suffers from the notorious numerical issue of the Bessel function evaluation in the density normalizer, especially when the dimensionality is high, and we address the issue using the MPMath library that supports arbitrary precision. For the mixture learning, we provide both minibatch-based large-scale SGD learning, as well as the EM algorithm which is a full batch estimator. For each estimator/methodology, we test our implementation on some synthetic data, while we also demonstrate the use case in a more realistic scenario of image clustering. Our code is publicly available in this https URL.

[14]:The importance of understanding instance-level noisy labels
标题:理解实例级噪声标签的重要性
作者:Yang Liu
链接:https://arxiv.org/abs/2102.05336
摘要:This paper aims to provide understandings for the effect of an over-parameterized model, e.g. a deep neural network, memorizing instance-dependent noisy labels. We first quantify the harms caused by memorizing noisy instances from different spectra of the sample distribution. We then analyze how several popular solutions for learning with noisy labels mitigate this harm at the instance-level. Our analysis reveals new understandings for when these approaches work, and provides theoretical justifications for previously reported empirical observations. A key aspect of the analysis is its focus on each training instance.

[15]:Forecasting Nonnegative Time Series via Sliding Mask Method (SMM) and Latent Clustered Forecast (LCF)
标题:用滑动掩模法(SMM)和潜在聚类预测法(LCF)预测非负时间序列
作者:Yohann de Castro, Luca Mencarelli
链接:https://arxiv.org/abs/2102.05314
摘要:We consider the nonnegative time series forecasting framework. Based on recent advances in Nonnegative Matrix Factorization (NMF) and Archetypal Analysis, we introduce two procedures referred to as Sliding Mask Method (SMM) and Latent Clustered Forecast (LCF). SMM is a simple and powerful method based on time window prediction using Completion of Nonnegative Matrices. This new procedure combines low nonnegative rank decomposition and matrix completion where the hidden values are to be forecasted. LCF is two stage: it leverages archetypal analysis for dimension reduction and clustering of time series, then it uses any black-box supervised forecast solver on the clustered latent representation. Theoretical guarantees on uniqueness and robustness of the solution of NMF Completion-type problems are also provided for the first time. Finally, numerical experiments on real-world and synthetic data sets confirm forecasting accuracy for both methodologies.

[16]:Inductive Granger Causal Modeling for Multivariate Time Series
标题:多元时间序列的归纳Granger因果模型
作者:Yunfei Chu, Xiaowei Wang, Jianxin Ma, Kunyang Jia, Jingren Zhou, Hongxia Yang
备注:6 pages, 6 figures
链接:https://arxiv.org/abs/2102.05298
摘要:Granger causal modeling is an emerging topic that can uncover Granger causal relationship behind multivariate time series data. In many real-world systems, it is common to encounter a large amount of multivariate time series data collected from different individuals with sharing commonalities. However, there are ongoing concerns regarding Granger causality's applicability in such large scale complex scenarios, presenting both challenges and opportunities for Granger causal structure reconstruction. Existing methods usually train a distinct model for each individual, suffering from inefficiency and over-fitting issues. To bridge this gap, we propose an Inductive GRanger cAusal modeling (InGRA) framework for inductive Granger causality learning and common causal structure detection on multivariate time series, which exploits the shared commonalities underlying the different individuals. In particular, we train one global model for individuals with different Granger causal structures through a novel attention mechanism, called prototypical Granger causal attention. The model can detect common causal structures for different individuals and infer Granger causal structures for newly arrived individuals. Extensive experiments, as well as an online A/B test on an E-commercial advertising platform, demonstrate the superior performances of InGRA.

[17]:An Efficient Pessimistic-Optimistic Algorithm for Constrained Linear Bandits
标题:一种高效的约束线性赌博机悲观-乐观算法
作者:Xin Liu, Bin Li, Pengyi Shi, Lei Ying
链接:https://arxiv.org/abs/2102.05295
摘要:This paper considers stochastic linear bandits with general constraints. The objective is to maximize the expected cumulative reward over horizon $T$ subject to a set of constraints in each round $\tau\leq T$. We propose a pessimistic-optimistic algorithm for this problem, which is efficient in two aspects. First, the algorithm yields $\tilde{\cal O}\left(\left(\frac{K^{1.5}}{\delta^2}+d\right)\sqrt{\tau}\right)$ (pseudo) regret in round $\tau\leq T,$ where $K$ is the number of constraints, $d$ is the dimension of the reward feature space, and $\delta$ is a Slater's constant; and zero constraint violation in any round $\tau>\tau',$ where $\tau'$ is independent of horizon $T.$ Second, the algorithm is computationally efficient. Our algorithm is based on the primal-dual approach in optimization, and includes two components. The primal component is similar to unconstrained stochastic linear bandits (our algorithm uses the linear upper confidence bound algorithm (LinUCB)). The computational complexity of the dual component depends on the number of constraints, and is independent of sizes of the contextual space, the action space, and even the feature space. So the overall computational complexity of our algorithm is similar to the linear UCB for unconstrained stochastic linear bandits.

[18]:Clusterability as an Alternative to Anchor Points When Learning with Noisy Labels
标题:使用噪声标签学习时,聚类性作为锚定点的替代方法
作者:Zhaowei Zhu, Yiwen Song, Yang Liu
链接:https://arxiv.org/abs/2102.05291
摘要:The knowledge of the label noise transition matrix, characterizing the probabilities of a training instance being wrongly annotated, is crucial to designing popular solutions to learning with noisy labels, including loss correction and loss reweighting approaches. Existing works heavily rely on the existence of "anchor points" or their approximates, defined as instances that belong to a particular class almost surely. Nonetheless, finding anchor points remains a non-trivial task, and the estimation accuracy is also often throttled by the number of available anchor points. In this paper, we propose an alternative option to the above task. Our main contribution is the discovery of an efficient estimation procedure based on a clusterability condition. We prove that with clusterable representations of features, using up to third-order consensuses of noisy labels among neighbor representations is sufficient to estimate a unique transition matrix. Compared with methods using anchor points, our approach uses substantially more instances and benefits from a much better sample complexity. We demonstrate the estimation accuracy and advantages of our estimates using both synthetic noisy labels (on CIFAR-10/100) and real human-level noisy labels (on Clothing1M and our self-collected human-annotated CIFAR-10).

[19]:Stability of SGD: Tightness Analysis and Improved Bounds
标题:SGD的稳定性:紧性分析和改进的界
作者:Yikai Zhang, Wenjia Zhang, Sammy Bald, Vamsi Pingali, Chao Chen, Mayank Goswami
链接:https://arxiv.org/abs/2102.05274
摘要:Stochastic Gradient Descent (SGD) based methods have been widely used for training large-scale machine learning models that also generalize well in practice. Several explanations have been offered for this generalization performance, a prominent one being algorithmic stability [18]. However, there are no known examples of smooth loss functions for which the analysis can be shown to be tight. Furthermore, apart from the properties of the loss function, data distribution has also been shown to be an important factor in generalization performance. This raises the question: is the stability analysis of [18] tight for smooth functions, and if not, for what kind of loss functions and data distributions can the stability analysis be improved? In this paper we first settle open questions regarding tightness of bounds in the data-independent setting: we show that for general datasets, the existing analysis for convex and strongly-convex loss functions is tight, but it can be improved for non-convex loss functions. Next, we give a novel and improved data-dependent bounds: we show stability upper bounds for a large class of convex regularized loss functions, with negligible regularization parameters, and improve existing data-dependent bounds in the non-convex setting. We hope that our results will initiate further efforts to better understand the data-dependent setting under non-convex loss functions, leading to an improved understanding of the generalization abilities of deep networks.

[20]:Regression Oracles and Exploration Strategies for Short-Horizon Multi-Armed Bandits
标题:短时域多臂赌博机的回归预言机与探索策略
作者:Robert C. Gray, Jichen Zhu, Santiago Ontañón
备注:8 pages
链接:https://arxiv.org/abs/2102.05263
摘要:This paper explores multi-armed bandit (MAB) strategies in very short horizon scenarios, i.e., when the bandit strategy is only allowed very few interactions with the environment. This is an understudied setting in the MAB literature with many applications in the context of games, such as player modeling. Specifically, we pursue three different ideas. First, we explore the use of regression oracles, which replace the simple average used in strategies such as epsilon-greedy with linear regression models. Second, we examine different exploration patterns such as forced exploration phases. Finally, we introduce a new variant of the UCB1 strategy called UCBT that has interesting properties and no tunable parameters. We present experimental results in a domain motivated by exergames, where the goal is to maximize a player's daily steps. Our results show that the combination of epsilon-greedy or epsilon-decreasing with regression oracles outperforms all other tested strategies in the short horizon setting.
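
As a point of reference for the strategies named above, the following is a minimal sketch of the baseline epsilon-greedy strategy with a simple running average and a forced exploration phase that pulls every arm once. It does not implement the paper's regression oracles or UCBT. The function name, its parameters, and the deterministic toy rewards are all assumptions made for illustration.

```python
import random


def epsilon_greedy(pull, n_arms, horizon, epsilon=0.1, seed=0):
    """Run epsilon-greedy for `horizon` rounds.

    `pull(arm)` returns the reward of one interaction with the environment.
    The first `n_arms` rounds are a forced exploration phase.
    """
    rng = random.Random(seed)
    counts = [0] * n_arms
    means = [0.0] * n_arms
    total = 0.0
    for t in range(horizon):
        if t < n_arms:
            arm = t  # forced exploration: try each arm once
        elif rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore uniformly at random
        else:
            arm = max(range(n_arms), key=lambda a: means[a])  # exploit
        reward = pull(arm)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # running average
        total += reward
    return means, counts, total


# Toy environment (assumption): noise-free rewards, 20 interactions only.
true_means = [0.2, 0.5, 0.8]
means, counts, total = epsilon_greedy(lambda a: true_means[a], 3, 20)
best_arm = max(range(3), key=lambda a: means[a])
```

A regression oracle in the sense of the abstract would replace the running-average update with a fitted model; that part is left out here.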

[21]:Memory-Associated Differential Learning
标题:记忆相关差异学习
作者:Yi Luo, Aiguo Chen, Bei Hui, Ke Yan
备注:8 pages, 3 figures, 4 tables
链接:https://arxiv.org/abs/2102.05246
摘要:Conventional Supervised Learning approaches focus on the mapping from input features to output labels. After training, the learnt models alone are adapted onto testing features to predict testing labels in isolation, with training data wasted and their associations ignored. To take full advantage of the vast number of training data and their associations, we propose a novel learning paradigm called Memory-Associated Differential (MAD) Learning. We first introduce an additional component called Memory to memorize all the training data. Then we learn the differences of labels as well as the associations of features in the combination of a differential equation and some sampling methods. Finally, in the evaluating phase, we predict unknown labels by inferencing from the memorized facts plus the learnt differences and associations in a geometrically meaningful manner. We gently build this theory in unary situations and apply it on Image Recognition, then extend it into Link Prediction as a binary situation, in which our method outperforms strong state-of-the-art baselines on three citation networks and ogbl-ddi dataset.

[22]:Patterns, predictions, and actions: A story about machine learning
标题:模式、预测和行动:一个关于机器学习的故事
作者:Moritz Hardt, Benjamin Recht
链接:https://arxiv.org/abs/2102.05242
摘要:This graduate textbook on machine learning tells a story of how patterns in data support predictions and consequential actions. Starting with the foundations of decision making, we cover representation, optimization, and generalization as the constituents of supervised learning. A chapter on datasets as benchmarks examines their histories and scientific bases. Self-contained introductions to causality, the practice of causal inference, sequential decision making, and reinforcement learning equip the reader with concepts and tools to reason about actions and their consequences. Throughout, the text discusses historical context and societal impact. We invite readers from all backgrounds; some experience with probability, calculus, and linear algebra suffices.

[23]:Advanced Ore Mine Optimisation under Uncertainty Using Evolution
标题:不确定性下利用Evolution的高级矿山优化
作者:William Reid, Aneta Neumann, Simon Ratcliffe, Frank Neumann
链接:https://arxiv.org/abs/2102.05235
摘要:In this paper, we investigate the impact of uncertainty in advanced ore mine optimisation. We consider Maptek's software system Evolution which optimizes extraction sequences based on evolutionary computation techniques and quantify the uncertainty of the obtained solutions with respect to the ore deposit based on predictions obtained by ensembles of neural networks. Furthermore, we investigate the impact of staging on the obtained optimized solutions and discuss a wide range of components for this large scale stochastic optimisation problem which allow to mitigate the uncertainty in the ore deposit while maintaining high profitability.

[24]:Early Abandoning and Pruning for Elastic Distances
标题:弹性距离的早期放弃和修剪
作者:Matthieu Herrmann, Geoffrey I. Webb
链接:https://arxiv.org/abs/2102.05221
摘要:Elastic distances are key tools for time series analysis. Straightforward implementations require $O(n^2)$ space and time complexities, preventing many applications from scaling to long series. Much work has been devoted in speeding up these applications, mostly with the development of lower bounds, allowing to avoid costly distance computations when a given threshold is exceeded. This threshold also allows to early abandon the computation of the distance itself. Another approach, developed for DTW, is to prune parts of the computation. All these techniques are orthogonal to each other. In this work, we develop a new generic strategy, "EAPruned", that tightly integrates pruning with early abandoning. We apply it to DTW, CDTW, WDTW, ERP, MSM and TWE, showing substantial speedup in NN1-like scenarios. Pruning also shows substantial speedup for some distances, benefiting applications such as clustering where all pairwise distances are required and hence early abandoning is not applicable. We release our implementation as part of a new C++ library for time series classification, along with easy to use Python/Numpy bindings.
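
The early abandoning idea mentioned above can be sketched for plain DTW as follows. This is not the paper's EAPruned strategy and does no pruning: it is a textbook two-row DTW that stops as soon as every cell of the current row exceeds a cutoff. The function name, the squared-difference cost, and the `cutoff` parameter are assumptions.

```python
import math


def dtw_early_abandon(a, b, cutoff=math.inf):
    """DTW distance between sequences a and b with squared-difference cost.

    Keeps only two rows of the cost matrix (linear space). Returns math.inf
    when the distance is certain to exceed `cutoff`.
    """
    prev = [math.inf] * (len(b) + 1)
    prev[0] = 0.0
    for x in a:
        curr = [math.inf] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            cost = (x - y) ** 2
            curr[j] = cost + min(prev[j], prev[j - 1], curr[j - 1])
        # Every warping path crosses this row and costs are nonnegative,
        # so the final distance is at least the smallest value in the row.
        if min(curr) > cutoff:
            return math.inf
        prev = curr
    return prev[len(b)]


same = dtw_early_abandon([1, 2, 3], [1, 2, 3])
warped = dtw_early_abandon([1, 2, 3], [1, 2, 2, 3])
shifted = dtw_early_abandon([0, 0], [1, 1])
abandoned = dtw_early_abandon([0, 0], [1, 1], cutoff=1.0)
```

In a nearest-neighbor search the cutoff would typically be the best distance found so far, so that hopeless candidates are dropped before their full cost matrix is computed.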

[25]:Task-Optimal Exploration in Linear Dynamical Systems
标题:线性动力系统的任务优化探索
作者:Andrew Wagenmaker, Max Simchowitz, Kevin Jamieson
链接:https://arxiv.org/abs/2102.05214
摘要:Exploration in unknown environments is a fundamental problem in reinforcement learning and control. In this work, we study task-guided exploration and determine what precisely an agent must learn about their environment in order to complete a particular task. Formally, we study a broad class of decision-making problems in the setting of linear dynamical systems, a class that includes the linear quadratic regulator problem. We provide instance- and task-dependent lower bounds which explicitly quantify the difficulty of completing a task of interest. Motivated by our lower bound, we propose a computationally efficient experiment-design based exploration algorithm. We show that it optimally explores the environment, collecting precisely the information needed to complete the task, and provide finite-time bounds guaranteeing that it achieves the instance- and task-optimal sample complexity, up to constant factors. Through several examples of the LQR problem, we show that performing task-guided exploration provably improves on exploration schemes which do not take into account the task of interest. Along the way, we establish that certainty equivalence decision making is instance- and task-optimal, and obtain the first algorithm for the linear quadratic regulator problem which is instance-optimal. We conclude with several experiments illustrating the effectiveness of our approach in practice.

[26]:CaPC Learning: Confidential and Private Collaborative Learning
标题:CaPC学习:保密和私人合作学习
作者:Christopher A. Choquette-Choo, Natalie Dullerud, Adam Dziedzic, Yunxiang Zhang, Somesh Jha, Nicolas Papernot, Xiao Wang
备注:Published as a conference paper at ICLR 2021
链接:https://arxiv.org/abs/2102.05188
摘要:Machine learning benefits from large training datasets, which may not always be possible to collect by any single entity, especially when using privacy-sensitive data. In many contexts, such as healthcare and finance, separate parties may wish to collaborate and learn from each other's data but are prevented from doing so due to privacy regulations. Some regulations prevent explicit sharing of data between parties by joining datasets in a central location (confidentiality). Others also limit implicit sharing of data, e.g., through model predictions (privacy). There is currently no method that enables machine learning in such a setting, where both confidentiality and privacy need to be preserved, to prevent both explicit and implicit sharing of data. Federated learning only provides confidentiality, not privacy, since gradients shared still contain private information. Differentially private learning assumes unreasonably large datasets. Furthermore, both of these learning paradigms produce a central model whose architecture was previously agreed upon by all parties rather than enabling collaborative learning where each party learns and improves their own local model. We introduce Confidential and Private Collaborative (CaPC) learning, the first method provably achieving both confidentiality and privacy in a collaborative setting. We leverage secure multi-party computation (MPC), homomorphic encryption (HE), and other techniques in combination with privately aggregated teacher models. We demonstrate how CaPC allows participants to collaborate without having to explicitly join their training sets or train a central model. Each party is able to improve the accuracy and fairness of their model, even in settings where each party has a model that performs well on their own dataset or when datasets are not IID and model architectures are heterogeneous across parties.

[27]:Benchmarks, Algorithms, and Metrics for Hierarchical Disentanglement
标题:层次分解的基准、算法和度量
作者:Andrew Slavin Ross, Finale Doshi-Velez
链接:https://arxiv.org/abs/2102.05185
摘要:In representation learning, there has been recent interest in developing algorithms to disentangle the ground-truth generative factors behind data, and metrics to quantify how fully this occurs. However, these algorithms and metrics often assume that both representations and ground-truth factors are flat, continuous, and factorized, whereas many real-world generative processes involve rich hierarchical structure, mixtures of discrete and continuous variables with dependence between them, and even varying intrinsic dimensionality. In this work, we develop benchmarks, algorithms, and metrics for learning such hierarchical representations.

[28]:Nonstochastic Bandits with Infinitely Many Experts
标题:具有无穷多专家的非随机赌博机
作者:X. Flora Meng, Tuhin Sarkar, Munther A. Dahleh
备注:11 pages including appendix, 1 figure
链接:https://arxiv.org/abs/2102.05164
摘要:We study the problem of nonstochastic bandits with infinitely many experts: A learner aims to maximize the total reward by taking actions sequentially based on bandit feedback while benchmarking against a countably infinite set of experts. We propose a variant of Exp4.P that, for finitely many experts, enables inference of correct expert rankings while preserving the order of the regret upper bound. We then incorporate the variant into a meta-algorithm that works on infinitely many experts. We prove a high-probability upper bound of $\tilde{\mathcal{O}} \big( i^*K + \sqrt{KT} \big)$ on the regret, up to polylog factors, where $i^*$ is the unknown position of the best expert, $K$ is the number of actions, and $T$ is the time horizon. We also provide an example of structured experts and discuss how to expedite learning in such case. Our meta-learning algorithm achieves the tightest regret upper bound for the setting considered when $i^* = \tilde{\mathcal{O}} \big( \sqrt{T/K} \big)$. If a prior distribution is assumed to exist for $i^*$, the probability of satisfying a tight regret bound increases with $T$, the rate of which can be fast.

[29]:Classifier Calibration: with implications to threat scores in cybersecurity
标题:分类器校准:对网络安全威胁分数的影响
作者:Waleed A. Yousef, Issa Traore, William Briguglio
链接:https://arxiv.org/abs/2102.05143
摘要:This paper explores the calibration of a classifier output score in binary classification problems. A calibrator is a function that maps the arbitrary classifier score, of a testing observation, onto $[0,1]$ to provide an estimate for the posterior probability of belonging to one of the two classes. Calibration is important for two reasons; first, it provides a meaningful score, that is the posterior probability; second, it puts the scores of different classifiers on the same scale for comparable interpretation. The paper presents three main contributions: (1) Introducing multi-score calibration, when more than one classifier provides a score for a single observation. (2) Introducing the idea that the classifier scores to a calibration process are nothing but features to a classifier, hence proposing extending the classifier scores to higher dimensions to boost the calibrator's performance. (3) Conducting a massive simulation study, in the order of 24,000 experiments, that incorporates different configurations, in addition to experimenting on two real datasets from the cybersecurity domain. The results show that there is no overall winner among the different calibrators and different configurations. However, general advices for practitioners include the following: the Platt's calibrator (Platt, 1999), a version of the logistic regression that decreases bias for a small sample size, has a very stable and acceptable performance among all experiments; our suggested multi-score calibration provides better performance than single score calibration in the majority of experiments, including the two real datasets. In addition, extending the scores can help in some experiments.
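
A minimal sketch of a Platt-style calibrator for a single score follows. It fits a sigmoid to the scores by plain gradient descent on the cross-entropy, using Platt's smoothed targets $(N_+ + 1)/(N_+ + 2)$ and $1/(N_- + 2)$ in place of hard 0/1 labels. The function names, the learning rate, the step count, and the toy data are assumptions, and the sign convention $p = \sigma(a s + b)$ is flipped relative to Platt's original parameterization. The paper's multi-score calibration is not shown.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_platt(scores, labels, lr=0.1, steps=5000):
    """Fit p = sigmoid(a * score + b) with smoothed targets."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    t_pos = (n_pos + 1.0) / (n_pos + 2.0)  # smoothed target for class 1
    t_neg = 1.0 / (n_neg + 2.0)            # smoothed target for class 0
    targets = [t_pos if y == 1 else t_neg for y in labels]
    a, b = 0.0, 0.0
    n = len(scores)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for s, t in zip(scores, targets):
            err = sigmoid(a * s + b) - t  # gradient of the cross-entropy
            grad_a += err * s
            grad_b += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


# Toy classifier scores (assumption): negatives score low, positives high.
a, b = fit_platt([-2.0, -1.0, -0.5, 0.5, 1.0, 2.0], [0, 0, 0, 1, 1, 1])
p_high = sigmoid(a * 2.0 + b)
p_low = sigmoid(a * -2.0 + b)
```

The smoothed targets keep the fitted slope finite even when the two classes are perfectly separated by the score, which is the small-sample bias reduction the abstract alludes to.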

[30]:Using Deep LSD to build operators in GANs latent space with meaning in real space
标题:利用深LSD构造具有实空间意义的GAN潜空间算子
作者:J. Quetzalcoatl Toledo-Marin, James A. Glazier
备注:9pp, 8 figs, 1 pseudocode, code available
链接:https://arxiv.org/abs/2102.05132
摘要:Generative models rely on the key idea that data can be represented in terms of latent variables which are uncorrelated by definition. Lack of correlation is important because it suggests that the latent space manifold is simpler to understand and manipulate. Generative models are widely used in deep learning, e.g., variational autoencoders (VAEs) and generative adversarial networks (GANs). Here we propose a method to build a set of linearly independent vectors in the latent space of a GANs, which we call quasi-eigenvectors. These quasi-eigenvectors have two key properties: i) They span all the latent space, ii) A set of these quasi-eigenvectors map to each of the labeled features one-on-one. We show that in the case of the MNIST, while the number of dimensions in latent space is large by construction, 98% of the data in real space map to a sub-domain of latent space of dimensionality equal to the number of labels. We then show how the quasi-eigenvalues can be used for Latent Spectral Decomposition (LSD), which has applications in denoising images and for performing matrix operations in latent space that map to feature transformations in real space. We show how this method provides insight into the latent space topology. The key point is that the set of quasi-eigenvectors form a basis set in latent space and each direction corresponds to a feature in real space.

[31]:Label Smoothed Embedding Hypothesis for Out-of-Distribution Detection
标题:用于分布外检测的标签平滑嵌入假设
作者:Dara Bahri, Heinrich Jiang, Yi Tay, Donald Metzler
链接:https://arxiv.org/abs/2102.05131
摘要:Detecting out-of-distribution (OOD) examples is critical in many applications. We propose an unsupervised method to detect OOD samples using a $k$-NN density estimate with respect to a classification model's intermediate activations on in-distribution samples. We leverage a recent insight about label smoothing, which we call the Label Smoothed Embedding Hypothesis, and show that one of the implications is that the $k$-NN density estimator performs better as an OOD detection method both theoretically and empirically when the model is trained with label smoothing. Finally, we show that our proposal outperforms many OOD baselines and also provide new finite-sample high-probability statistical results for $k$-NN density estimation's ability to detect OOD examples.
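
One simple way to turn a $k$-NN density estimate into an OOD score is to use the distance to the $k$-th nearest in-distribution point, since a larger distance corresponds to a lower estimated density. The sketch below does this on raw vectors. It is an illustration under that assumption, not the paper's exact score, and it skips the step of extracting intermediate activations from a trained classifier. The function name and the toy points are hypothetical.

```python
import math


def knn_ood_score(query, reference, k=3):
    """Distance from `query` to its k-th nearest neighbor in `reference`.

    Higher values suggest the query lies in a low-density region and is
    more likely to be out of distribution.
    """
    if k < 1 or k > len(reference):
        raise ValueError("k must be between 1 and len(reference)")
    distances = sorted(math.dist(query, point) for point in reference)
    return distances[k - 1]


# Toy in-distribution embeddings (assumption): points around the unit square.
reference = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5)]
near_score = knn_ood_score((0.4, 0.6), reference)
far_score = knn_ood_score((5.0, 5.0), reference)
```

A detector would compare the score against a threshold chosen on held-out in-distribution data.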

[32]:An exact solver for the Weston-Watkins SVM subproblem
标题:Weston-Watkins支持向量机子问题的精确求解
作者:Yutong Wang, Clayton D. Scott
链接:https://arxiv.org/abs/2102.05640
摘要:Recent empirical evidence suggests that the Weston-Watkins support vector machine is among the best performing multiclass extensions of the binary SVM. Current state-of-the-art solvers repeatedly solve a particular subproblem approximately using an iterative strategy. In this work, we propose an algorithm that solves the subproblem exactly using a novel reparametrization of the Weston-Watkins dual problem. For linear WW-SVMs, our solver shows significant speed-up over the state-of-the-art solver when the number of classes is large. Our exact subproblem solver also allows us to prove linear convergence of the overall solver.

[33]:On the Regularity of Attention
标题:论注意的规律性
作者:James Vuckovic, Aristide Baratin, Remi Tachet des Combes
备注:Conference version of arXiv:2007.02876
链接:https://arxiv.org/abs/2102.05628
摘要:Attention is a powerful component of modern neural networks across a wide variety of domains. In this paper, we seek to quantify the regularity (i.e. the amount of smoothness) of the attention operation. To accomplish this goal, we propose a new mathematical framework that uses measure theory and integral operators to model attention. We show that this framework is consistent with the usual definition, and that it captures the essential properties of attention. Then we use this framework to prove that, on compact domains, the attention operation is Lipschitz continuous and provide an estimate of its Lipschitz constant. Additionally, by focusing on a specific type of attention, we extend these Lipschitz continuity results to non-compact domains. We also discuss the effects regularity can have on NLP models, and applications to invertible and infinitely-deep networks.

[34]:Deep learning approaches to surrogates for solving the diffusion equation for mechanistic real-world simulations
标题:用深度学习构建代理模型以求解机理性真实世界模拟中的扩散方程
作者:J. Quetzalcóatl Toledo-Marín, Geoffrey Fox, James P. Sluka, James A. Glazier
备注:17 pp, 2 tables, 11 figs, 1sm, 12sm-figs, code available at GitHub
链接:https://arxiv.org/abs/2102.05527
摘要:In many mechanistic medical, biological, physical and engineered spatiotemporal dynamic models the numerical solution of partial differential equations (PDEs) can make simulations impractically slow. Biological models require the simultaneous calculation of the spatial variation of concentration of dozens of diffusing chemical species. Machine learning surrogates, neural networks trained to provide approximate solutions to such complicated numerical problems, can often provide speed-ups of several orders of magnitude compared to direct calculation. PDE surrogates enable use of larger models than are possible with direct calculation and can make including such simulations in real-time or near-real time workflows practical. Creating a surrogate requires running the direct calculation tens of thousands of times to generate training data and then training the neural network, both of which are computationally expensive. We use a Convolutional Neural Network to approximate the stationary solution to the diffusion equation in the case of two equal-diameter, circular, constant-value sources located at random positions in a two-dimensional square domain with absorbing boundary conditions. To improve convergence during training, we apply a training approach that uses roll-back to reject stochastic changes to the network that increase the loss function. The trained neural network approximation is about 1e3 times faster than the direct calculation for individual replicas. Because different applications will have different criteria for acceptable approximation accuracy, we discuss a variety of loss functions and accuracy estimators that can help select the best network for a particular application.

[35]:On Disentanglement in Gaussian Process Variational Autoencoders
标题:高斯过程变分自动编码器中的解纠缠
作者:Simon Bing, Vincent Fortuin, Gunnar Rätsch
链接:https://arxiv.org/abs/2102.05507
摘要:Complex multivariate time series arise in many fields, ranging from computer vision to robotics or medicine. Often we are interested in the independent underlying factors that give rise to the high-dimensional data we are observing. While many models have been introduced to learn such disentangled representations, only few attempt to explicitly exploit the structure of sequential data. We investigate the disentanglement properties of Gaussian process variational autoencoders, a class of models recently introduced that have been successful in different tasks on time series data. Our model exploits the temporal structure of the data by modeling each latent channel with a GP prior and employing a structured variational distribution that can capture dependencies in time. We demonstrate the competitiveness of our approach against state-of-the-art unsupervised and weakly-supervised disentanglement methods on a benchmark task. Moreover, we provide evidence that we can learn meaningful disentangled representations on real-world medical time series data.

[36]:On the Suboptimality of Thompson Sampling in High Dimensions
标题:高维Thompson抽样的次优性
作者:Raymond Zhang, Richard Combes
备注:33 pages
链接:https://arxiv.org/abs/2102.05502
摘要:In this paper we consider Thompson Sampling for combinatorial semi-bandits. We demonstrate that, perhaps surprisingly, Thompson Sampling is sub-optimal for this problem in the sense that its regret scales exponentially in the ambient dimension, and its minimax regret scales almost linearly. This phenomenon occurs under a wide variety of assumptions including both non-linear and linear reward functions. We also show that including a fixed amount of forced exploration to Thompson Sampling does not alleviate the problem. We complement our theoretical results with numerical results and show that in practice Thompson Sampling indeed can perform very poorly in high dimensions.

[37]:Robust estimation of tree structured models
标题:树结构模型的鲁棒估计
作者:Marta Casanellas, Marina Garrote-López, Piotr Zwiernik
链接:https://arxiv.org/abs/2102.05472
摘要:Consider the problem of learning undirected graphical models on trees from corrupted data. Recently Katiyar et al. showed that it is possible to recover trees from noisy binary data up to a small equivalence class of possible trees. Their other paper on the Gaussian case follows a similar pattern. By framing this as a special phylogenetic recovery problem we largely generalize these two settings. Using the framework of linear latent tree models we discuss tree identifiability for binary data under a continuous corruption model. For the Ising and the Gaussian tree model we also provide a characterisation of when the Chow-Liu algorithm consistently learns the underlying tree from the noisy data.

[38]:A Framework of Inertial Alternating Direction Method of Multipliers for Non-Convex Non-Smooth Optimization
标题:非凸非光滑优化的惯性交替方向乘子法框架
作者:Le Thi Khanh Hien, Duy Nhat Phan, Nicolas Gillis
备注:25 pages
链接:https://arxiv.org/abs/2102.05433
摘要:In this paper, we propose an algorithmic framework dubbed inertial alternating direction methods of multipliers (iADMM), for solving a class of nonconvex nonsmooth multiblock composite optimization problems with linear constraints. Our framework employs the general minimization-majorization (MM) principle to update each block of variables so as to not only unify the convergence analysis of previous ADMM that use specific surrogate functions in the MM step, but also lead to new efficient ADMM schemes. To the best of our knowledge, in the nonconvex nonsmooth setting, ADMM used in combination with the MM principle to update each block of variables, and ADMM combined with inertial terms for the primal variables have not been studied in the literature. Under standard assumptions, we prove the subsequential convergence and global convergence for the generated sequence of iterates. We illustrate the effectiveness of iADMM on a class of nonconvex low-rank representation problems.

[39]:Learning Interaction-Aware Trajectory Predictions for Decentralized Multi-Robot Motion Planning in Dynamic Environments
标题:动态环境下分散多机器人运动规划的学习交互感知轨迹预测
作者:Hai Zhu, Francisco Martinez Claramunt, Bruno Brito, Javier Alonso-Mora
备注:8 pages, 5 figures, IEEE Robotics and Automation Letters
链接:https://arxiv.org/abs/2102.05382
摘要:This paper presents a data-driven decentralized trajectory optimization approach for multi-robot motion planning in dynamic environments. When navigating in a shared space, each robot needs accurate motion predictions of neighboring robots to achieve predictive collision avoidance. These motion predictions can be obtained among robots by sharing their future planned trajectories with each other via communication. However, such communication may be neither available nor reliable in practice. In this paper, we introduce a novel trajectory prediction model based on recurrent neural networks (RNN) that can learn multi-robot motion behaviors from demonstrated trajectories generated using a centralized sequential planner. The learned model can run efficiently online for each robot and provide interaction-aware trajectory predictions of its neighbors based on observations of their history states. We then incorporate the trajectory prediction model into a decentralized model predictive control (MPC) framework for multi-robot collision avoidance. Simulation results show that our decentralized approach can achieve a comparable level of performance to a centralized planner while being communication-free and scalable to a large number of robots. We also validate our approach with a team of quadrotors in real-world experiments.

[40]:Explaining Inference Queries with Bayesian Optimization
标题:用贝叶斯优化解释推理查询
作者:Brandon Lockhart, Jinglin Peng, Weiyuan Wu, Jiannan Wang, Eugene Wu
链接:https://arxiv.org/abs/2102.05308
摘要:Obtaining an explanation for an SQL query result can enrich the analysis experience, reveal data errors, and provide deeper insight into the data. Inference query explanation seeks to explain unexpected aggregate query results on inference data; such queries are challenging to explain because an explanation may need to be derived from the source, training, or inference data in an ML pipeline. In this paper, we model an objective function as a black-box function and propose BOExplain, a novel framework for explaining inference queries using Bayesian optimization (BO). An explanation is a predicate defining the input tuples that should be removed so that the query result of interest is significantly affected. BO, a technique for finding the global optimum of a black-box function, is used to find the best predicate. We develop two new techniques (individual contribution encoding and warm start) to handle categorical variables. We perform experiments showing that the predicates found by BOExplain have a higher degree of explanation compared to those found by the state-of-the-art query explanation engines. We also show that BOExplain is effective at deriving explanations for inference queries from source and training data on three real-world datasets.

[41]:Using hardware performance counters to speed up autotuning convergence on GPUs
标题:利用硬件性能计数器加速GPU上的自动调谐收敛
作者:Jiří Filipovič, Jana Hozzová, Amin Nezarat, Jaroslav Oľha, Filip Petrovič
链接:https://arxiv.org/abs/2102.05297
摘要:Nowadays, GPU accelerators are commonly used to speed up general-purpose computing tasks on a variety of hardware. However, due to the diversity of GPU architectures and processed data, optimization of codes for a particular type of hardware and specific data characteristics can be extremely challenging. The autotuning of performance-relevant source-code parameters allows for automatic optimization of applications and keeps their performance portable. Although the autotuning process typically results in code speed-up, searching the tuning space can bring unacceptable overhead if (i) the tuning space is vast and full of poorly-performing implementations, or (ii) the autotuning process has to be repeated frequently because of changes in processed data or migration to different hardware.
In this paper, we introduce a novel method for searching tuning spaces. The method takes advantage of collecting hardware performance counters (also known as profiling counters) during empirical tuning. Those counters are used to navigate the searching process towards faster implementations. The method requires the tuning space to be sampled on any GPU. It builds a problem-specific model, which can be used during autotuning on various, even previously unseen inputs or GPUs. Using a set of five benchmarks, we experimentally demonstrate that our method can speed up autotuning when an application needs to be ported to different hardware or when it needs to process data with different characteristics. We also compare our method to the state of the art and show that our method is superior in terms of the number of searching steps and typically outperforms other searches in terms of convergence time.

[42]:Player Modeling via Multi-Armed Bandits
标题:基于多武装土匪的玩家建模
作者:Robert C. Gray, Jichen Zhu, Dannielle Arigo, Evan Forman, Santiago Ontañón
链接:https://arxiv.org/abs/2102.05264
摘要:This paper focuses on building personalized player models solely from player behavior in the context of adaptive games. We present two main contributions: The first is a novel approach to player modeling based on multi-armed bandits (MABs). This approach addresses, at the same time and in a principled way, both the problem of collecting data to model the characteristics of interest for the current player and the problem of adapting the interactive experience based on this model. Second, we present an approach to evaluating and fine-tuning these algorithms prior to generating data in a user study. This is an important problem, because conducting user studies is an expensive and labor-intensive process; therefore, an ability to evaluate the algorithms beforehand can save a significant amount of resources. We evaluate our approach in the context of modeling players' social comparison orientation (SCO) and present empirical results from both simulations and real players.
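As a rough illustration of how a multi-armed bandit trades off collecting data and adapting, here is a minimal sketch using UCB1, a standard bandit algorithm. The abstract does not say which bandit algorithm the authors use, so UCB1, the Bernoulli reward model, and the arm probabilities below are all assumptions made for illustration; an "arm" could stand for one candidate game adaptation and the reward for a hypothetical engagement signal.

```python
import numpy as np

def ucb1(reward_probs, n_rounds=5000, seed=0):
    """Run UCB1 on Bernoulli arms; returns pull counts and mean-reward estimates.

    reward_probs are hypothetical per-arm success probabilities (illustration only).
    """
    rng = np.random.default_rng(seed)
    n_arms = len(reward_probs)
    counts = np.zeros(n_arms, dtype=int)
    means = np.zeros(n_arms)
    for t in range(1, n_rounds + 1):
        if t <= n_arms:
            arm = t - 1  # pull every arm once before using the confidence bound
        else:
            # Optimism bonus shrinks as an arm is pulled more often.
            bonus = np.sqrt(2.0 * np.log(t) / counts)
            arm = int(np.argmax(means + bonus))
        reward = float(rng.random() < reward_probs[arm])
        counts[arm] += 1
        # Incremental update of the running mean reward for the chosen arm.
        means[arm] += (reward - means[arm]) / counts[arm]
    return counts, means

counts, means = ucb1([0.2, 0.5, 0.8])
```

The pull counts show the exploration/exploitation trade-off: every arm is sampled, but most pulls concentrate on the arm with the highest estimated reward.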

[43]:A Deep Learning Approach for Characterizing Major Galaxy Mergers
标题:描述主要星系合并的深度学习方法
作者:Skanda Koppula, Victor Bapst, Marc Huertas-Company, Sam Blackwell, Agnieszka Grabska-Barwinska, Sander Dieleman, Andrea Huber, Natasha Antropova, Mikolaj Binkowski, Hannah Openshaw, Adria Recasens, Fernando Caro, Avishai Deke, Yohan Dubois, Jesus Vega Ferrero, David C. Koo, Joel R. Primack, Trevor Back
备注:Third Workshop on Machine Learning and the Physical Sciences (NeurIPS 2020), Vancouver, Canada
链接:https://arxiv.org/abs/2102.05182
摘要:Fine-grained estimation of galaxy merger stages from observations is a key problem useful for validation of our current theoretical understanding of galaxy formation. To this end, we demonstrate a CNN-based regression model that is able to predict, for the first time, using a single image, the merger stage relative to the first perigee passage with a median error of 38.3 million years (Myrs) over a period of 400 Myrs. This model uses no specific dynamical modeling and learns only from simulated merger events. We show that our model provides reasonable estimates on real observations, approximately matching prior estimates provided by detailed dynamical modeling. We provide a preliminary interpretability analysis of our models, and demonstrate first steps toward calibrated uncertainty estimation.

[44]:On the Hardness of PAC-learning stabilizer States with Noise
标题:带噪声的PAC学习稳定状态的硬度研究
作者:Aravind Gollakota, Daniel Liang
链接:https://arxiv.org/abs/2102.05174
摘要:We consider the problem of learning stabilizer states with noise in the Probably Approximately Correct (PAC) framework of Aaronson (2007) for learning quantum states. In the noiseless setting, an algorithm for this problem was recently given by Rocchetto (2018), but the noisy case was left open. Motivated by approaches to noise tolerance from classical learning theory, we introduce the Statistical Query (SQ) model for PAC-learning quantum states, and prove that algorithms in this model are indeed resilient to common forms of noise, including classification and depolarizing noise. We prove an exponential lower bound on learning stabilizer states in the SQ model. Even outside the SQ model, we prove that learning stabilizer states with noise is in general as hard as Learning Parity with Noise (LPN) using classical examples. Our results position the problem of learning stabilizer states as a natural quantum analogue of the classical problem of learning parities: easy in the noiseless setting, but seemingly intractable even with simple forms of noise.

[45]:Enhancing Audio Augmentation Methods with Consistency Learning
标题:用一致性学习增强音频增强方法
作者:Turab Iqbal, Karim Helwani, Arvindh Krishnaswamy, Wenwu Wang
备注:Accepted to 46th International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2021)
链接:https://arxiv.org/abs/2102.05151
摘要:Data augmentation is an inexpensive way to increase training data diversity, and is commonly achieved via transformations of existing data. For tasks such as classification, there is a good case for learning representations of the data that are invariant to such transformations, yet this is not explicitly enforced by classification losses such as the cross-entropy loss. This paper investigates the use of training objectives that explicitly impose this consistency constraint, and how it can impact downstream audio classification tasks. In the context of deep convolutional neural networks in the supervised setting, we show empirically that certain measures of consistency are not implicitly captured by the cross-entropy loss, and that incorporating such measures into the loss function can improve the performance of tasks such as audio tagging. Put another way, we demonstrate how existing augmentation methods can further improve learning by enforcing consistency.

[46]:Regularization Strategies for Quantile Regression
标题:分位数回归的正则化策略
作者:Taman Narayan, Serena Wang, Kevin Canini, Maya Gupta
链接:https://arxiv.org/abs/2102.05135
摘要:We investigate different methods for regularizing quantile regression when predicting either a subset of quantiles or the full inverse CDF. We show that minimizing an expected pinball loss over a continuous distribution of quantiles is a good regularizer even when only predicting a specific quantile. For predicting multiple quantiles, we propose achieving the classic goal of non-crossing quantiles by using deep lattice networks that treat the quantile as a monotonic input feature, and we discuss why monotonicity on other features is an apt regularizer for quantile regression. We show that lattice models enable regularizing the predicted distribution to a location-scale family. Lastly, we propose applying rate constraints to improve the calibration of the quantile predictions on specific subsets of interest and improve fairness metrics. We demonstrate our contributions on simulations, benchmark datasets, and real quantile regression problems.
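For readers unfamiliar with the pinball loss mentioned above, the following is a minimal NumPy sketch of the loss and of a Monte Carlo estimate of its expectation over quantile levels. Sampling the levels uniformly on (0, 1) and the `predict_quantile` callable are assumptions for illustration; the abstract only says "a continuous distribution of quantiles" and does not describe the authors' implementation.

```python
import numpy as np

def pinball_loss(y_true, y_pred, tau):
    """Pinball (quantile) loss for a quantile level tau in (0, 1)."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    # Under-prediction (diff > 0) is weighted by tau, over-prediction by (1 - tau).
    return float(np.mean(np.maximum(tau * diff, (tau - 1.0) * diff)))

def expected_pinball_loss(y_true, predict_quantile, n_samples=64, seed=0):
    """Monte Carlo estimate of the pinball loss averaged over tau ~ Uniform(0, 1).

    predict_quantile(tau) is a hypothetical model returning predictions at level tau.
    """
    rng = np.random.default_rng(seed)
    taus = rng.uniform(0.0, 1.0, size=n_samples)
    return float(np.mean([pinball_loss(y_true, predict_quantile(t), t) for t in taus]))

y = np.array([1.0, 2.0, 3.0, 4.0])
loss_median = pinball_loss(y, np.full_like(y, 2.5), tau=0.5)
loss_under = pinball_loss([1.0], [0.0], tau=0.9)   # under-prediction at a high quantile
loss_over = pinball_loss([0.0], [1.0], tau=0.9)    # over-prediction at a high quantile
```

At tau = 0.5 the loss is half the mean absolute error; at tau = 0.9 under-predicting costs nine times as much as over-predicting by the same amount.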

[47]:Local and Global Uniform Convexity Conditions
标题:局部和全局一致凸性条件
作者:Thomas Kerdreux, Alexandre d'Aspremont, Sebastian Pokutta
链接:https://arxiv.org/abs/2102.05134
摘要:We review various characterizations of uniform convexity and smoothness on norm balls in finite-dimensional spaces and connect results stemming from the geometry of Banach spaces with scaling inequalities used in analysing the convergence of optimization methods. In particular, we establish local versions of these conditions to provide sharper insights on a recent body of complexity results in learning theory, online learning, or offline optimization, which rely on the strong convexity of the feasible set. While they have a significant impact on complexity, these strong convexity or uniform convexity properties of feasible sets are not exploited as thoroughly as their functional counterparts, and this work is an effort to correct this imbalance. We conclude with some practical examples in optimization and machine learning where leveraging these conditions and localized assumptions leads to new complexity results.

[48]:Dynamic Mode Decomposition of inertial particle caustics in Taylor-Green flow
标题:Taylor-Green流动中惯性粒子焦散线的动态模式分解
作者:Omstavan Samant, Jaya Kumar Alageshan, Sarveshwar Sharma, Animesh Kuley
备注:9 pages, 6 figures
链接:https://arxiv.org/abs/2102.05120
摘要:Inertial particles advected by a background flow can show complex structures. We consider inertial particles in a 2D Taylor-Green (TG) flow and characterize particle dynamics as a function of the particle's Stokes number using the dynamic mode decomposition (DMD) method from particle image velocimetry (PIV)-like data. We observe the formation of caustic structures and analyze them using DMD to (a) determine the Stokes number of the particles, and (b) estimate the particle Stokes number composition. Our analysis in this idealized flow will provide useful insight to analyze inertial particles in more complex or turbulent flows. We propose that the DMD technique can be used to perform a similar analysis on an experimental system.
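To make the DMD step concrete, here is a minimal sketch of the standard "exact DMD" procedure applied to a toy linear system. It is not the authors' pipeline: the snapshot matrix below is synthetic (a damped 2D rotation) rather than PIV-like particle data, and the function name and rank choice are illustrative assumptions.

```python
import numpy as np

def dmd(snapshots, rank):
    """Exact DMD: eigenvalues and modes of the best-fit linear map x[k+1] ~ A x[k].

    snapshots has shape (n_features, n_times); columns are consecutive states.
    """
    X, Y = snapshots[:, :-1], snapshots[:, 1:]
    U, s, Vh = np.linalg.svd(X, full_matrices=False)
    U, s, Vh = U[:, :rank], s[:rank], Vh[:rank]
    # Reduced operator A_tilde = U^H Y V S^{-1} (division by s scales the columns).
    A_tilde = U.conj().T @ Y @ Vh.conj().T / s
    eigvals, W = np.linalg.eig(A_tilde)
    # Exact DMD modes: Y V S^{-1} W.
    modes = Y @ Vh.conj().T / s @ W
    return eigvals, modes

# Synthetic data: a rotation by theta per step, damped by a factor 0.9.
theta = 0.3
A = 0.9 * np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta), np.cos(theta)]])
cols = [np.array([1.0, 0.0])]
for _ in range(20):
    cols.append(A @ cols[-1])
snapshots = np.stack(cols, axis=1)

eigvals, modes = dmd(snapshots, rank=2)
```

For this toy system the DMD eigenvalues recover the damping (modulus 0.9) and the rotation angle (argument ±0.3) of the underlying map; on real data the eigenvalues and modes are only a low-rank linear approximation of the dynamics.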

[49]:CDPAM: Contrastive learning for perceptual audio similarity
标题:感知音频相似性的对比学习
作者:Pranay Manocha, Zeyu Jin, Richard Zhang, Adam Finkelstein
备注:Dataset, code and sound examples can be found at this https URL
链接:https://arxiv.org/abs/2102.05109
摘要:Many speech processing methods based on deep learning require an automatic and differentiable audio metric for the loss function. The DPAM approach of Manocha et al. learns a full-reference metric trained directly on human judgments, and thus correlates well with human perception. However, it requires a large number of human annotations and does not generalize well outside the range of perturbations on which it was trained. This paper introduces CDPAM, a metric that builds on and advances DPAM. The primary improvement is to combine contrastive learning and multi-dimensional representations to build robust models from limited data. In addition, we collect human judgments on triplet comparisons to improve generalization to a broader range of audio perturbations. CDPAM correlates well with human responses across nine varied datasets. We also show that adding this metric to existing speech synthesis and enhancement methods yields significant improvement, as measured by objective and subjective tests.

[50]:Sub-seasonal forecasting with a large ensemble of deep-learning weather prediction models
标题:大集成深度学习天气预报模型的分季节预报
作者:Jonathan A. Weyn, Dale R. Durran, Rich Caruana, Nathaniel Cresswell-Clay
备注:Submitted to Journal of Advances in Modeling Earth Systems
链接:https://arxiv.org/abs/2102.05107
摘要:We present an ensemble prediction system using a Deep Learning Weather Prediction (DLWP) model that recursively predicts key atmospheric variables with six-hour time resolution. This model uses convolutional neural networks (CNNs) on a cubed sphere grid to produce global forecasts. The approach is computationally efficient, requiring just three minutes on a single GPU to produce a 320-member set of six-week forecasts at 1.4° resolution. Ensemble spread is primarily produced by randomizing the CNN training process to create a set of 32 DLWP models with slightly different learned weights. Although our DLWP model does not forecast precipitation, it does forecast total column water vapor, and it gives a reasonable 4.5-day deterministic forecast of Hurricane Irma. In addition to simulating mid-latitude weather systems, it spontaneously generates tropical cyclones in a one-year free-running simulation. Averaged globally and over a two-year test set, the ensemble mean RMSE retains skill relative to climatology beyond two weeks, with anomaly correlation coefficients remaining above 0.6 through six days. Our primary application is to subseasonal-to-seasonal (S2S) forecasting at lead times from two to six weeks. Current forecast systems have low skill in predicting one- or two-week-average weather patterns at S2S time scales. The continuous ranked probability score (CRPS) and the ranked probability skill score (RPSS) show that the DLWP ensemble is only modestly inferior in performance to the European Centre for Medium Range Weather Forecasts (ECMWF) S2S ensemble over land at lead times of 4 and 5-6 weeks. At shorter lead times, the ECMWF ensemble performs better than DLWP.

[51]:Point Cloud Transformers applied to Collider Physics
标题:点云变换器在对撞机物理中的应用
作者:Vinicius Mikuni, Florencia Canelli
备注:12 pages, 3 figures
链接:https://arxiv.org/abs/2102.05073
摘要:Methods for processing point cloud information have seen great success in collider physics applications. One recent breakthrough in machine learning is the usage of Transformer networks to learn semantic relationships between sequences in language processing. In this work, we apply a modified Transformer network called Point Cloud Transformer as a method to incorporate the advantages of the Transformer architecture to an unordered set of particles resulting from collision events. To compare the performance with other strategies, we study jet-tagging applications for highly-boosted particles.

[52]:Last Query Transformer RNN for knowledge tracing
标题:用于知识跟踪的最后一个查询转换器RNN
作者:SeungKee Jeon
备注:Kaggle competition 'Riiid! Answer Correctness Prediction' 1st place solution
链接:https://arxiv.org/abs/2102.05038
摘要:This paper presents an efficient model to predict a student's answer correctness given their past learning activities. I use both a transformer encoder and an RNN to deal with time-series input. The novel point of the model is that it uses only the last input as the query in the transformer encoder, instead of the whole sequence, which gives the QK matrix multiplication in the transformer encoder O(L) time complexity instead of O(L^2). This allows the model to take longer sequences as input. Using this model, I achieved 1st place in the 'Riiid! Answer Correctness Prediction' competition hosted on Kaggle.
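The complexity claim can be illustrated with a small NumPy sketch of single-head scaled dot-product attention. This is only a toy comparison under simplifying assumptions (one head, no masking, no learned projections, random inputs), not the competition model: using all positions as queries produces an L x L score matrix, while using only the last position produces a 1 x L score vector.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - np.max(x, axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)

def full_attention(Q, K, V):
    # Every position is a query: the score matrix has shape (L, L), i.e. O(L^2) entries.
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    return softmax(scores) @ V

def last_query_attention(Q, K, V):
    # Only the final position is a query: scores have shape (1, L), i.e. O(L) entries.
    q_last = Q[-1:]
    scores = q_last @ K.T / np.sqrt(K.shape[-1])
    return softmax(scores) @ V

rng = np.random.default_rng(0)
L, d = 6, 4
Q, K, V = rng.normal(size=(3, L, d))

out_full = full_attention(Q, K, V)        # shape (L, d)
out_last = last_query_attention(Q, K, V)  # shape (1, d)
```

In this unmasked setting the last-query output equals the last row of the full attention output, so the saving comes from skipping the rows that a prediction at the final position does not need.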

CV方向重复(14篇)


[1]:Hyperbolic Generative Adversarial Network
标题:双曲生成对抗网络
作者:Diego Lazcano, Nicolás Fredes, Werner Creixell
链接:https://arxiv.org/abs/2102.05567
摘要:Recently, Hyperbolic Spaces in the context of Non-Euclidean Deep Learning have gained popularity because of their ability to represent hierarchical data. We propose that it is possible to take advantage of the hierarchical characteristic present in the images by using hyperbolic neural networks in a GAN architecture. In this study, different configurations using fully connected hyperbolic layers in the GAN, CGAN, and WGAN are tested, in what we call the HGAN, HCGAN, and HWGAN, respectively. The results are measured using the Inception Score (IS) and the Fréchet Inception Distance (FID) on the MNIST dataset. Depending on the configuration and space curvature, better results are achieved for each proposed hyperbolic version than for its Euclidean counterpart.

[2]:Robustness in Compressed Neural Networks for Object Detection
标题:压缩神经网络在目标检测中的鲁棒性
作者:Sebastian Cygert, Andrzej Czyzewski
链接:https://arxiv.org/abs/2102.05509
摘要:Model compression techniques can significantly reduce the computational cost associated with data processing by deep neural networks with only a minor decrease in average accuracy. Simultaneously, reducing the model size may have a large effect on noisy cases or objects belonging to less frequent classes. It is a crucial problem from the perspective of the models' safety, especially for object detection in the autonomous driving setting, which is considered in this work. It was shown in the paper that the sensitivity of compressed models to different distortion types is nuanced, and some of the corruptions are heavily impacted by the compression methods (e.g., additive noise), while others (blur effect) are only slightly affected. A common way to improve the robustness of models is to use data augmentation, which was confirmed to positively affect models' robustness, also for highly compressed models. It was further shown that while data imbalance methods brought only a slight increase in accuracy for the baseline model (without compression), the impact was more striking at higher compression rates for the structured pruning. Finally, methods for handling data imbalance brought a significant improvement of the pruned models' worst-detected class accuracy.

[3]:BRECQ: Pushing the Limit of Post-Training Quantization by Block Reconstruction
标题:BRECQ:通过块重建突破训练后量化的极限
作者:Yuhang Li, Ruihao Gong, Xu Tan, Yang Yang, Peng Hu, Qi Zhang, Fengwei Yu, Wei Wang, Shi Gu
链接:https://arxiv.org/abs/2102.05426
摘要:We study the challenging task of neural network quantization without end-to-end retraining, called Post-training Quantization (PTQ). PTQ usually requires a small subset of training data but produces less powerful quantized models than Quantization-Aware Training (QAT). In this work, we propose a novel PTQ framework, dubbed BRECQ, which pushes the limits of bitwidth in PTQ down to INT2 for the first time. BRECQ leverages the basic building blocks in neural networks and reconstructs them one-by-one. In a comprehensive theoretical study of the second-order error, we show that BRECQ achieves a good balance between cross-layer dependency and generalization error. To further employ the power of quantization, the mixed precision technique is incorporated in our framework by approximating the inter-layer and intra-layer sensitivity. Extensive experiments on various handcrafted and searched neural architectures are conducted for both image classification and object detection tasks. And for the first time we prove that, without bells and whistles, PTQ can attain 4-bit ResNet and MobileNetV2 comparable with QAT and enjoy 240 times faster production of quantized models. Codes are available at this https URL.

[4]:Input Similarity from the Neural Network Perspective
标题:神经网络视角下的输入相似性
作者:Guillaume Charpiat, Nicolas Girard, Loris Felardos, Yuliya Tarabalka
备注:Published at NeurIPS 2019
链接:https://arxiv.org/abs/2102.05262
摘要:We first exhibit a multimodal image registration task, for which a neural network trained on a dataset with noisy labels reaches almost perfect accuracy, far beyond noise variance. This surprising auto-denoising phenomenon can be explained as a noise averaging effect over the labels of similar input examples. This effect theoretically grows with the number of similar examples; the question is then to define and estimate the similarity of examples.
We express a proper definition of similarity, from the neural network perspective, i.e. we quantify how undissociable two inputs $A$ and $B$ are, taking a machine learning viewpoint: how much a parameter variation designed to change the output for $A$ would impact the output for $B$ as well?
We study the mathematical properties of this similarity measure, and show how to use it on a trained network to estimate sample density, in low complexity, enabling new types of statistical analysis for neural networks. We analyze data by retrieving samples perceived as similar by the network, and are able to quantify the denoising effect without requiring true labels. We also propose, during training, to enforce that examples known to be similar should also be seen as similar by the network, and notice speed-up training effects for certain datasets.
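One way to make the idea of "how much a parameter variation designed to change the output for A would impact the output for B" concrete is to compare the gradients of the network output with respect to its parameters at the two inputs. The sketch below computes the cosine of those two gradients for a tiny scalar-output network with hand-written gradients. The network f(x) = w2 · tanh(W1 x), its sizes, and the cosine normalisation are assumptions chosen for illustration, and the abstract itself does not spell out the exact definition used in the paper.

```python
import numpy as np

def param_gradient(x, W1, w2):
    """Gradient of f(x) = w2 . tanh(W1 x) with respect to all parameters, flattened."""
    h = np.tanh(W1 @ x)
    grad_w2 = h                                  # df/dw2
    grad_W1 = np.outer(w2 * (1.0 - h**2), x)     # df/dW1 via the chain rule
    return np.concatenate([grad_W1.ravel(), grad_w2])

def gradient_similarity(x_a, x_b, W1, w2):
    """Cosine similarity between the parameter gradients at x_a and x_b."""
    g_a = param_gradient(x_a, W1, w2)
    g_b = param_gradient(x_b, W1, w2)
    return float(g_a @ g_b / (np.linalg.norm(g_a) * np.linalg.norm(g_b)))

rng = np.random.default_rng(0)
W1 = rng.normal(size=(5, 3))   # hypothetical hidden-layer weights
w2 = rng.normal(size=5)        # hypothetical output weights
x_a, x_b = rng.normal(size=3), rng.normal(size=3)

sim_ab = gradient_similarity(x_a, x_b, W1, w2)
sim_ba = gradient_similarity(x_b, x_a, W1, w2)
sim_aa = gradient_similarity(x_a, x_a, W1, w2)
```

A value near 1 means that a small parameter step that changes the output for one input changes the output for the other in almost the same way; a value near 0 means the two inputs can be adjusted almost independently.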

[5]:Policy Augmentation: An Exploration Strategy for Faster Convergence of Deep Reinforcement Learning Algorithms
标题:策略增强:深度强化学习算法快速收敛的探索策略
作者:Arash Mahyari
备注:proceedings of 46th IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021
链接:https://arxiv.org/abs/2102.05249
摘要:Despite advancements in deep reinforcement learning algorithms, developing an effective exploration strategy is still an open problem. Most existing exploration strategies either are based on simple heuristics, or require the model of the environment, or train additional deep neural networks to generate imagination-augmented paths. In this paper, a revolutionary algorithm, called Policy Augmentation, is introduced. Policy Augmentation is based on a newly developed inductive matrix completion method. The proposed algorithm augments the values of unexplored state-action pairs, helping the agent take actions that will result in high-value returns while the agent is in the early episodes. Training deep reinforcement learning algorithms with high-value rollouts leads to the faster convergence of deep reinforcement learning algorithms. Our experiments show the superior performance of Policy Augmentation. The code can be found at this https URL.

[6]:Driver2vec: Driver Identification from Automotive Data
标题:Driver2vec:从汽车数据中识别驾驶员
作者:Jingbo Yang, Ruge Zhao, Meixian Zhu, David Hallac, Jaka Sodnik, Jure Leskovec
备注:7 pages, 3 figures, 6 tables in the main text. First published at the 6th Workshop on Mining and Learning from Time Series (2020)
链接:https://arxiv.org/abs/2102.05234
摘要:With increasing focus on privacy protection, alternative methods to identify the vehicle operator without the use of biometric identifiers have gained traction for automotive data analysis. The wide variety of sensors installed on modern vehicles enable autonomous driving, reduce accidents and improve vehicle handling. On the other hand, the data these sensors collect reflect drivers' habits. Drivers' use of turn indicators, following distance, rate of acceleration, etc. can be transformed to an embedding that is representative of their behavior and identity. In this paper, we develop a deep learning architecture (Driver2vec) to map a short interval of driving data into an embedding space that represents the driver's behavior to assist in driver identification. We develop a custom model that leverages performance gains of temporal convolutional networks, embedding separation power of triplet loss and classification accuracy of gradient boosting decision trees. Trained on a dataset of 51 drivers provided by Nervtech, Driver2vec is able to accurately identify the driver from a short 10-second interval of sensor data, achieving an average pairwise driver identification accuracy of 83.1% from this 10-second interval, which is remarkably higher than performance obtained in previous studies. We then analyze the performance of Driver2vec to show that it is consistent across scenarios and that modeling choices are sound.

[7]:FLOP: Federated Learning on Medical Datasets using Partial Networks
标题:基于部分网络的医学数据集联合学习
作者:Qian Yang, Jianyi Zhang, Weituo Hao, Gregory Spell, Lawrence Carin
链接:https://arxiv.org/abs/2102.05218
摘要:The outbreak of COVID-19 Disease due to the novel coronavirus has caused a shortage of medical resources. To aid and accelerate the diagnosis process, automatic diagnosis of COVID-19 via deep learning models has recently been explored by researchers across the world. While different data-driven deep learning models have been developed to mitigate the diagnosis of COVID-19, the data itself is still scarce due to patient privacy concerns. Federated Learning (FL) is a natural solution because it allows different organizations to cooperatively learn an effective deep learning model without sharing raw data. However, recent studies show that FL still lacks privacy protection and may cause data leakage. We investigate this challenging problem by proposing a simple yet effective algorithm, named Federated Learning on Medical Datasets using Partial Networks (FLOP), that shares only a partial model between the server and clients. Extensive experiments on benchmark data and real-world healthcare tasks show that our approach achieves comparable or better performance while reducing the privacy and security risks. Of particular interest, we conduct experiments on the COVID-19 dataset and find that our FLOP algorithm can allow different hospitals to collaboratively and effectively train a partially shared model without sharing local patients' data.

[8]:Transfer learning based few-shot classification using optimal transport mapping from preprocessed latent space of backbone neural network
标题:基于转移学习的神经网络预处理潜空间最优传输映射少镜头分类
作者:Tomáš Chobola, Daniel Vašata, Pavel Kordík
链接:https://arxiv.org/abs/2102.05176
摘要:MetaDL Challenge 2020 focused on image classification tasks in few-shot settings. This paper describes the second-best submission in the competition. Our meta learning approach modifies the distribution of classes in a latent space produced by a backbone network for each class in order to better follow the Gaussian distribution. After this operation, which we call the Latent Space Transform algorithm, centers of classes are further aligned in an iterative fashion of the Expectation Maximisation algorithm to utilize information in unlabeled data that are often provided on top of few labelled instances. For this task, we utilize optimal transport mapping using the Sinkhorn algorithm. Our experiments show that this approach outperforms previous works as well as other variants of the algorithm, using K-Nearest Neighbour algorithm, Gaussian Mixture Models, etc.
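The Sinkhorn step can be sketched as follows: given a cost matrix between unlabeled embeddings and class centres, alternately rescale rows and columns of a Gibbs kernel until the transport plan has the desired marginals. This is a generic entropy-regularised Sinkhorn iteration on toy 2D points, not the authors' code; the points, centres, uniform marginals, regularisation strength, and iteration count are all assumptions for illustration.

```python
import numpy as np

def sinkhorn(cost, a, b, reg=0.1, n_iters=500):
    """Entropy-regularised optimal transport plan with row marginals a and column marginals b."""
    K = np.exp(-cost / reg)          # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)            # rescale to match the column marginals
        u = a / (K @ v)              # rescale to match the row marginals
    return u[:, None] * K * v[None, :]

# Toy data: four unlabeled embeddings and two class centres in 2D.
points = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
centres = np.array([[0.0, 0.5], [1.0, 0.5]])
cost = ((points[:, None, :] - centres[None, :, :]) ** 2).sum(axis=-1)

a = np.full(4, 0.25)   # each sample carries equal mass
b = np.full(2, 0.5)    # classes assumed balanced
plan = sinkhorn(cost, a, b)
soft_labels = plan / plan.sum(axis=1, keepdims=True)
```

Each row of `soft_labels` is a soft class assignment for one sample; in an EM-style loop these assignments could be used to re-estimate the class centres before recomputing the cost matrix.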

[9]:"What's in the box?!": Deflecting Adversarial Attacks by Randomly Deploying Adversarially-Disjoint Models
标题:“盒子里有什么?!”:通过随机部署敌方不相交模型来转移敌方攻击
作者:Sahar Abdelnabi, Mario Fritz
链接:https://arxiv.org/abs/2102.05104
摘要:Machine learning models are now widely deployed in real-world applications. However, the existence of adversarial examples has been long considered a real threat to such models. While numerous defenses aiming to improve the robustness have been proposed, many have been shown ineffective. As these vulnerabilities are still nowhere near being eliminated, we propose an alternative deployment-based defense paradigm that goes beyond the traditional white-box and black-box threat models. Instead of training a single partially-robust model, one could train a set of same-functionality, yet, adversarially-disjoint models with minimal in-between attack transferability. These models could then be randomly and individually deployed, such that accessing one of them minimally affects the others. Our experiments on CIFAR-10 and a wide range of attacks show that we achieve a significantly lower attack transferability across our disjoint models compared to a baseline of ensemble diversity. In addition, compared to an adversarially trained set, we achieve a higher average robust accuracy while maintaining the accuracy of clean examples.

[10]:Dysplasia grading of colorectal polyps through CNN analysis of WSI
标题:大肠息肉不典型增生分级的CNN分析
作者:Daniele Perlo, Enzo Tartaglione, Luca Bertero, Paola Cassoni, Marco Grangetto
链接:https://arxiv.org/abs/2102.05498
摘要:Colorectal cancer is a leading cause of cancer death for both men and women. For this reason, histopathological characterization of colorectal polyps is the major instrument for the pathologist in order to infer the actual risk for cancer and to guide further follow-up. Colorectal polyp diagnosis includes the evaluation of the polyp type, and more importantly, the grade of dysplasia. This latter evaluation represents a critical step for the clinical follow-up. The proposed deep learning-based classification pipeline is based on a state-of-the-art convolutional neural network, trained using proper countermeasures to tackle the high resolution of WSIs and a very imbalanced dataset. The experimental results show that one can successfully classify adenoma dysplasia grade with 70% accuracy, which is in line with the pathologists' concordance.

[11]:Two Novel Performance Improvements for Evolving CNN Topologies
标题:进化CNN拓扑的两种新性能改进
作者:Yaron Strauch, Jo Grundy
备注:Accepted to AAAI-21 Workshop W17: Learning Network Architecture during Training. 5 pages, 4 figures
链接:https://arxiv.org/abs/2102.05451
摘要:Convolutional Neural Networks (CNNs) are the state-of-the-art algorithms for the processing of images. However, the configuration and training of these networks is a complex task requiring deep domain knowledge, experience and much trial and error. Using genetic algorithms, competitive CNN topologies for image recognition can be produced for any specific purpose; however, in previous work this has come at high computational cost. In this work two novel approaches are presented to the utilisation of these algorithms, effective in reducing complexity and training time by nearly 20%. This is accomplished via regularisation directly on training time, and the use of partial training to enable early ranking of individual architectures. Both approaches are validated on the benchmark CIFAR10 data set, and maintain accuracy.

[12]:Reference-based Texture transfer for Single Image Super-resolution of Magnetic Resonance images
Authors: Madhu Mithra K K, Sriprabha Ramanarayanan, Keerthi Ram, Mohanasankar Sivaprakasam
Comments: Accepted at ISBI 2021
Link: https://arxiv.org/abs/2102.05450
Abstract: Magnetic Resonance Imaging (MRI) is a valuable clinical diagnostic modality for spine pathologies with excellent characterization for infection, tumor, degenerations, fractures and herniations. However, in surgery, image-guided spinal procedures continue to rely on CT and fluoroscopy, as MRI slice resolutions are typically insufficient. Building upon state-of-the-art single image super-resolution, we propose a reference-based, unpaired multi-contrast texture-transfer strategy for deep learning based in-plane and across-plane MRI super-resolution. We use the scattering transform to relate the texture features of image patches to unpaired reference image patches, and additionally a loss term for multi-contrast texture. We apply our scheme in different super-resolution architectures, observing improvement in PSNR and SSIM for 4x super-resolution in most of the cases.
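For readers unfamiliar with the PSNR metric mentioned in the abstract, the following is a minimal sketch of the standard definition, PSNR = 10 log10(MAX^2 / MSE). It is not taken from the paper; the function name `psnr`, the nested-list image representation, and the default `max_value=1.0` are assumptions made for illustration only.

```python
import math


def psnr(reference, estimate, max_value=1.0):
    """Peak signal-to-noise ratio (dB) between two equally sized images.

    Images are nested lists of pixel intensities (a hypothetical format
    chosen to keep the sketch dependency-free).
    """
    flat_ref = [p for row in reference for p in row]
    flat_est = [p for row in estimate for p in row]
    if len(flat_ref) != len(flat_est):
        raise ValueError("images must have the same number of pixels")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(flat_ref, flat_est)) / len(flat_ref)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

A higher PSNR means the super-resolved image is closer to the reference in the mean-squared-error sense; SSIM, the other metric named above, measures structural similarity and is not covered by this sketch.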

[13]:RoBIC: A benchmark suite for assessing classifiers robustness
Authors: Thibault Maho, Benoît Bonnet, Teddy Furon, Erwan Le Merrer
Comments: 4 pages
Link: https://arxiv.org/abs/2102.05368
Abstract: Many defenses have emerged with the development of adversarial attacks. Models must be objectively evaluated accordingly. This paper systematically tackles this concern by proposing a new parameter-free benchmark we coin RoBIC. RoBIC fairly evaluates the robustness of image classifiers using a new half-distortion measure. It gauges the robustness of the network against white and black box attacks, independently of its accuracy. RoBIC is faster than the other available benchmarks. We present the significant differences in the robustness of 16 recent models as assessed by RoBIC.

[14]:Enhancing Real-World Adversarial Patches with 3D Modeling Techniques
Authors: Yael Mathov, Lior Rokach, Yuval Elovici
Link: https://arxiv.org/abs/2102.05334
Abstract: Although many studies have examined adversarial examples in the real world, most of them relied on 2D photos of the attack scene; thus, the attacks proposed cannot address realistic environments with 3D objects or varied conditions. Studies that use 3D objects are limited, and in many cases, the real-world evaluation process is not replicable by other researchers, preventing others from reproducing the results. In this study, we present a framework that crafts an adversarial patch for an existing real-world scene. Our approach uses a 3D digital approximation of the scene as a simulation of the real world. With the ability to add and manipulate any element in the digital scene, our framework enables the attacker to improve the patch's robustness in real-world settings. We use the framework to create a patch for an everyday scene and evaluate its performance using a novel evaluation process that ensures that our results are reproducible in both the digital space and the real world. Our evaluation results show that the framework can generate adversarial patches that are robust to different settings in the real world.

Cross-listed with NLP (5 papers)


[1]:Towards More Fine-grained and Reliable NLP Performance Prediction
Authors: Zihuiwen Ye, Pengfei Liu, Jinlan Fu, Graham Neubig
Comments: Accepted by EACL 2021
Link: https://arxiv.org/abs/2102.05486
Abstract: Performance prediction, the task of estimating a system's performance without performing experiments, allows us to reduce the experimental burden caused by the combinatorial explosion of different datasets, languages, tasks, and models. In this paper, we make two contributions to improving performance prediction for NLP tasks. First, we examine performance predictors not only for holistic measures of accuracy like F1 or BLEU but also fine-grained performance measures such as accuracy over individual classes of examples. Second, we propose methods to understand the reliability of a performance prediction model from two angles: confidence intervals and calibration. We perform an analysis of four types of NLP tasks, and both demonstrate the feasibility of fine-grained performance prediction and the necessity to perform reliability analysis for performance prediction methods in the future. We make our code publicly available.

[2]:Civil Rephrases Of Toxic Texts With Self-Supervised Transformers
Authors: Leo Laugier, John Pavlopoulos, Jeffrey Sorensen, Lucas Dixon
Link: https://arxiv.org/abs/2102.05456
Abstract: Platforms that support online commentary, from social networks to news sites, are increasingly leveraging machine learning to assist their moderation efforts. But this process does not typically provide feedback to the author that would help them contribute according to the community guidelines. This is prohibitively time-consuming for human moderators to do, and computational approaches are still nascent. This work focuses on models that can help suggest rephrasings of toxic comments in a more civil manner. Inspired by recent progress in unpaired sequence-to-sequence tasks, a self-supervised learning model is introduced, called CAE-T5. CAE-T5 employs a pre-trained text-to-text transformer, which is fine-tuned with a denoising and cyclic auto-encoder loss. Experimenting with the largest toxicity detection dataset to date (Civil Comments), our model generates sentences that are more fluent and better at preserving the initial content compared to earlier text style transfer systems, which we compare with using several scoring systems and human evaluation.

[3]:Student sentiment Analysis Using Classification With Feature Extraction Techniques
Authors: Latika Tamrakar, Dr. Padmavati Shrivastava, Dr. S. M. Ghosh
Link: https://arxiv.org/abs/2102.05439
Abstract: Technical growth has empowered numerous revolutions in the educational system by bringing technology into the classroom and by elevating the learning experience. Nowadays, Web-based learning is gaining popularity. This paper describes Web-based learning and its effectiveness for students. One of the prime factors in an education or learning system is feedback; it is beneficial to learning if it is used effectively. In this paper, we worked on how machine learning techniques like Logistic Regression (LR), Support Vector Machine (SVM), Naive Bayes (NB), and Decision Tree (DT) can be applied to Web-based learning, with emphasis on the sentiment present in student feedback. We also work on two types of Feature Extraction Techniques (FETs), namely Count Vector (CVr) or Bag of Words (BoW), and Term Frequency and Inverse Document Frequency (TF-IDF) Vector. In the research study, it is our goal for our proposed LR, SVM, NB, and DT models to classify the Student Feedback Dataset (SFB) with improved accuracy using a cleaned dataset and feature extraction techniques. The SFB is one of the significant concerns in student sentiment analysis.
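The two feature extraction techniques named in the abstract can be sketched as follows. This is an illustrative, dependency-free version rather than the authors' pipeline: the function names, the whitespace tokenizer, and the unsmoothed textbook weighting idf = ln(N / df) are all assumptions (libraries commonly use smoothed variants).

```python
import math
from collections import Counter


def count_vectors(docs):
    """Bag-of-words counts: one row per document, one column per vocabulary word."""
    tokenized = [d.lower().split()  # naive whitespace tokenizer (assumption)
                 for d in docs]
    vocab = sorted({w for tokens in tokenized for w in tokens})
    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)
        rows.append([counts[w] for w in vocab])
    return vocab, rows


def tfidf_vectors(docs):
    """TF-IDF with raw term counts and the unsmoothed weight idf = ln(N / df)."""
    vocab, counts = count_vectors(docs)
    n_docs = len(docs)
    # Document frequency: number of documents containing each word.
    df = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log(n_docs / d) for d in df]
    return vocab, [[c * w for c, w in zip(row, idf)] for row in counts]
```

For example, `tfidf_vectors(["good course", "bad course"])` gives zero weight to "course", which appears in every document, and a positive weight to "good" and "bad"; the resulting rows could then be passed to any of the classifiers listed in the abstract.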

[4]:Argmax Flows and Multinomial Diffusion: Towards Non-Autoregressive Language Models
Authors: Emiel Hoogeboom, Didrik Nielsen, Priyank Jaini, Patrick Forré, Max Welling
Link: https://arxiv.org/abs/2102.05379
Abstract: The field of language modelling has been largely dominated by autoregressive models, for which sampling is inherently difficult to parallelize. This paper introduces two new classes of generative models for categorical data such as language or image segmentation: Argmax Flows and Multinomial Diffusion. Argmax Flows are defined by a composition of a continuous distribution (such as a normalizing flow), and an argmax function. To optimize this model, we learn a probabilistic inverse for the argmax that lifts the categorical data to a continuous space. Multinomial Diffusion gradually adds categorical noise in a diffusion process, for which the generative denoising process is learned. We demonstrate that our models perform competitively on language modelling and modelling of image segmentation maps.
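The two ideas in the abstract can be illustrated with a toy sketch. The abstract does not give the noise schedule or parameterization, so the form used here is an assumption for illustration: one noising step keeps the current category with probability 1 - beta and otherwise resamples uniformly over the K classes. All function names are hypothetical.

```python
import random


def noise_step_probs(category, num_classes, beta):
    """Categorical distribution after one noising step from a known category.

    Assumed form: (1 - beta) * one_hot(category) + beta / num_classes.
    """
    probs = [beta / num_classes] * num_classes
    probs[category] += 1.0 - beta
    return probs


def noise_step(category, num_classes, beta, rng):
    """Sample the noised category from the distribution above."""
    probs = noise_step_probs(category, num_classes, beta)
    return rng.choices(range(num_classes), weights=probs, k=1)[0]


def argmax(values):
    """Map a continuous vector to a category, as in the argmax half of an Argmax Flow."""
    return max(range(len(values)), key=lambda i: values[i])


# Usage: repeatedly noising a token drives it towards the uniform distribution.
rng = random.Random(0)
token = 1
for _ in range(5):
    token = noise_step(token, num_classes=4, beta=0.2, rng=rng)
```

In this reading, the generative model would learn to reverse such noising steps, while an Argmax Flow would instead sample a continuous vector and apply `argmax` to obtain the category.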

[5]:AuGPT: Dialogue with Pre-trained Language Models and Data Augmentation
Authors: Jonáš Kulhánek, Vojtěch Hudeček, Tomáš Nekvinda, Ondřej Dušek
Link: https://arxiv.org/abs/2102.05126
Abstract: Attention-based pre-trained language models such as GPT-2 brought considerable progress to end-to-end dialogue modelling. However, they also present considerable risks for task-oriented dialogue, such as lack of knowledge grounding or diversity. To address these issues, we introduce modified training objectives for language model finetuning, and we employ massive data augmentation via back-translation to increase the diversity of the training data. We further examine the possibilities of combining data from multiple sources to improve performance on the target dataset. We carefully evaluate our contributions with both human and automatic methods. Our model achieves state-of-the-art performance on the MultiWOZ data and shows competitive performance in human evaluation.
Chinese translations are machine-generated and provided for reference only.


