今日 cs.LG方向共计58篇文章。
Graph(1篇)
[1]:A Survey on Embedding Dynamic Graphs
标题:动态图嵌入研究综述
作者:Claudio D. T. Barros, Matheus R. F. Mendonça, Alex B. Vieira, Artur Ziviani
备注:40 pages, 10 figures
链接:https://arxiv.org/abs/2101.01229
摘要:Embedding static graphs in low-dimensional vector spaces plays a key role in network analytics and inference, supporting applications like node classification, link prediction, and graph visualization. However, many real-world networks present dynamic behavior, including topological evolution, feature evolution, and diffusion. Therefore, several methods for embedding dynamic graphs have been proposed to learn network representations over time, facing novel challenges, such as time-domain modeling, temporal features to be captured, and the temporal granularity to be embedded. In this survey, we overview dynamic graph embedding, discussing its fundamentals and the recent advances developed so far. We introduce the formal definition of dynamic graph embedding, focusing on the problem setting and introducing a novel taxonomy for dynamic graph embedding input and output. We further explore different dynamic behaviors that may be encompassed by embeddings, classifying by topological evolution, feature evolution, and processes on networks. Afterward, we describe existing techniques and propose a taxonomy for dynamic graph embedding techniques based on algorithmic approaches, from matrix and tensor factorization to deep learning, random walks, and temporal point processes. We also elucidate main applications, including dynamic link prediction, anomaly detection, and diffusion prediction, and we further state some promising research directions in the area.
联邦学习(1篇)
[1]:Federated Learning-Based Risk-Aware Decision to Mitigate Fake Task Impacts on Crowdsensing Platforms
标题:基于联邦学习的风险感知决策以减轻虚假任务对群智感知平台的影响
作者:Zhiyan Chen, Murat Simsek, Burak Kantarci
备注:This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible
链接:https://arxiv.org/abs/2101.01266
摘要:Mobile crowdsensing (MCS) leverages distributed and non-dedicated sensing concepts by utilizing sensors embedded in a large number of mobile smart devices. However, the openness and distributed nature of MCS leads to various vulnerabilities and consequent challenges to address. A malicious user submitting fake sensing tasks to an MCS platform may be attempting to consume resources from any number of participants' devices, as well as to clog the MCS server. In this paper, a novel approach based on horizontal federated learning is proposed to identify fake tasks; it consists of a number of independent detection devices and an aggregation entity. Detection devices are deployed to operate in parallel, with each device equipped with a machine learning (ML) module and an associated training dataset. Furthermore, the aggregation module collects the prediction results from individual devices and determines the final decision with the objective of minimizing the prediction loss. Loss measurement considers the lost task values with respect to misclassification, and the final decision utilizes a risk-aware approach in which the risk is formulated as a function of the utility loss. Experimental results demonstrate that federated learning-driven illegitimate task detection with a risk-aware aggregation function improves the detection performance of the traditional centralized framework, achieving both higher detection performance and lower utility loss. The scheme can even achieve 100% detection accuracy using small training datasets distributed across devices, while achieving slightly over an 8% improvement in detection over traditional approaches.
对抗样本/GAN(3篇)
[1]:Adversarially trained LSTMs on reduced order models of urban air pollution simulations
标题:城市大气污染模拟降阶模型的对抗训练LSTMs
作者:César Quilodrán-Casas, Rossella Arcucci, Christopher Pain, Yike Guo
备注:6 pages, Third workshop on Machine Learning and the Physical Sciences at NeurIPS 2020
链接:https://arxiv.org/abs/2101.01568
摘要:This paper presents an approach to improve computational fluid dynamics simulations forecasts of air pollution using deep learning. Our method, which integrates Principal Components Analysis (PCA) and adversarial training, is a way to improve the forecast skill of reduced order models obtained from the original model solution. Once the reduced-order model (ROM) is obtained via PCA, a Long Short-Term Memory network (LSTM) is adversarially trained on the ROM to make forecasts. Once trained, the adversarially trained LSTM outperforms a LSTM trained in a classical way. The study area is in London, including velocities and a concentration tracer that replicates a busy traffic junction. This adversarially trained LSTM-based approach is used on the ROM in order to produce faster forecasts of the air pollution tracer.
[2]:Generating Informative CVE Description From ExploitDB Posts by Extractive Summarization
标题:基于抽取式摘要从ExploitDB帖子生成信息丰富的CVE描述
作者:Jiamou Sun, Zhenchang Xing, Hao Guo, Deheng Ye, Xiaohong Li, Xiwei Xu, Liming Zhu
链接:https://arxiv.org/abs/2101.01431
摘要:ExploitDB is one of the important public websites that contribute a large number of vulnerabilities to the official CVE database. Over 60\% of these vulnerabilities have high- or critical-security risks. Unfortunately, over 73\% of exploits appear publicly earlier than the corresponding CVEs, and about 40\% of exploits do not even have CVEs. To assist in documenting CVEs for ExploitDB posts, we propose an open information extraction method to extract 9 key vulnerability aspects (vulnerable product/version/component, vulnerability type, vendor, attacker type, root cause, attack vector and impact) from the verbose and noisy ExploitDB posts. The aspects extracted from an ExploitDB post are then composed into a CVE description according to the suggested CVE description templates, which is the information that must be provided when requesting new CVEs. Through the evaluation on 13,017 manually labeled sentences and the statistical sampling of 3,456 extracted aspects, we confirm the high accuracy of our extraction method. Compared with 27,230 reference CVE descriptions, our composed CVE descriptions achieve a high ROUGE-L (0.38), a longest-common-subsequence-based metric for evaluating text summarization methods.
[3]:Adversarial Combinatorial Bandits with General Non-linear Reward Functions
标题:具有一般非线性奖励函数的对抗组合赌博机
作者:Xi Chen, Yanjun Han, Yining Wang
链接:https://arxiv.org/abs/2101.01301
摘要:In this paper we study the adversarial combinatorial bandit with a known non-linear reward function, extending existing work on the adversarial linear combinatorial bandit. The adversarial combinatorial bandit with general non-linear rewards is an important open problem in the bandit literature, and it is still unclear whether there is a significant gap from the case of linear rewards, stochastic bandits, or semi-bandit feedback. We show that, with $N$ arms and subsets of $K$ arms being chosen at each of $T$ time periods, the minimax optimal regret is $\widetilde\Theta_{d}(\sqrt{N^d T})$ if the reward function is a $d$-degree polynomial with $d< K$, and $\Theta_K(\sqrt{N^K T})$ if the reward function is not a low-degree polynomial. Both bounds are significantly different from the bound $O(\sqrt{\mathrm{poly}(N,K)T})$ for the linear case, which suggests that there is a fundamental gap between the linear and non-linear reward structures. Our result also finds applications to the adversarial assortment optimization problem in online recommendation. We show that in the worst case of the adversarial assortment problem, the optimal algorithm must treat each of the $\binom{N}{K}$ assortments independently.
强化学习(1篇)
[1]:Joint Deep Reinforcement Learning and Unfolding: Beam Selection and Precoding for mmWave Multiuser MIMO with Lens Arrays
标题:联合深度强化学习与展开:基于透镜阵列的毫米波多用户MIMO波束选择与预编码
作者:Qiyu Hu, Yanzhen Liu, Yunlong Cai, Guanding Yu, Zhi Ding
链接:https://arxiv.org/abs/2101.01336
摘要:The millimeter wave (mmWave) multiuser multiple-input multiple-output (MU-MIMO) systems with discrete lens arrays (DLA) have received great attention due to their simple hardware implementation and excellent performance. In this work, we investigate the joint design of beam selection and digital precoding matrices for mmWave MU-MIMO systems with DLA to maximize the sum-rate subject to the transmit power constraint and the constraints of the selection matrix structure. The investigated non-convex problem with discrete variables and coupled constraints is challenging to solve and an efficient framework of joint neural network (NN) design is proposed to tackle it. Specifically, the proposed framework consists of a deep reinforcement learning (DRL)-based NN and a deep-unfolding NN, which are employed to optimize the beam selection and digital precoding matrices, respectively. As for the DRL-based NN, we formulate the beam selection problem as a Markov decision process and a double deep Q-network algorithm is developed to solve it. The base station is considered to be an agent, where the state, action, and reward function are carefully designed. Regarding the design of the digital precoding matrix, we develop an iterative weighted minimum mean-square error algorithm induced deep-unfolding NN, which unfolds this algorithm into a layerwise structure with introduced trainable parameters. Simulation results verify that this jointly trained NN remarkably outperforms the existing iterative algorithms with reduced complexity and stronger robustness.
Neural Networks(1篇)
[1]:Recurrent Neural Networks for Stochastic Control Problems with Delay
标题:时滞随机控制问题的递归神经网络
作者:Jiequn Han, Ruimeng Hu
链接:https://arxiv.org/abs/2101.01385
摘要:Stochastic control problems with delay are challenging due to the path-dependent feature of the system and thus its intrinsic high dimensions. In this paper, we propose and systematically study deep neural networks-based algorithms to solve stochastic control problems with delay features. Specifically, we employ neural networks for sequence modeling (\emph{e.g.}, recurrent neural networks such as long short-term memory) to parameterize the policy and optimize the objective function. The proposed algorithms are tested on three benchmark examples: a linear-quadratic problem, optimal consumption with fixed finite delay, and portfolio optimization with complete memory. Particularly, we notice that the architecture of recurrent neural networks naturally captures the path-dependent feature with much flexibility and yields better performance with more efficient and stable training of the network compared to feedforward networks. The superiority is even evident in the case of portfolio optimization with complete memory, which features infinite delay.
聚类(2篇)
[1]:On the price of explainability for some clustering problems
标题:若干聚类问题的可解释性代价
作者:Eduardo Laber, Lucas Murtinho
备注:19 pages, 1 figure
链接:https://arxiv.org/abs/2101.01576
摘要:Machine learning models and algorithms are used in a number of systems that affect our daily life. Thus, in some settings, methods that are easy to explain or interpret may be highly desirable. The price of explainability can be thought of as the loss in terms of quality that is unavoidable if we restrict these systems to use explainable methods.
We study the price of explainability, under a theoretical perspective, for clustering tasks. We provide upper and lower bounds on this price as well as efficient algorithms to build explainable clustering for the $k$-means, $k$-medians, $k$-center and the maximum-spacing problems in a natural model in which explainability is achieved via decision trees.
[2]:SoS Degree Reduction with Applications to Clustering and Robust Moment Estimation
标题:SoS降阶及其在聚类和鲁棒矩估计中的应用
作者:David Steurer, Stefan Tiegel
备注:32 pages
链接:https://arxiv.org/abs/2101.01509
摘要:We develop a general framework to significantly reduce the degree of sum-of-squares proofs by introducing new variables. To illustrate the power of this framework, we use it to speed up previous algorithms based on sum-of-squares for two important estimation problems, clustering and robust moment estimation. The resulting algorithms offer the same statistical guarantees as the previous best algorithms but have significantly faster running times. Roughly speaking, given a sample of $n$ points in dimension $d$, our algorithms can exploit order-$\ell$ moments in time $d^{O(\ell)}\cdot n^{O(1)}$, whereas a naive implementation requires time $(d\cdot n)^{O(\ell)}$. Since for the aforementioned applications, the typical sample size is $d^{\Theta(\ell)}$, our framework improves running times from $d^{O(\ell^2)}$ to $d^{O(\ell)}$.
时间序列相关(1篇)
[1]:A Trainable Reconciliation Method for Hierarchical Time-Series
标题:一种可训练的层次时间序列协调方法
作者:Davide Burba, Trista Chen
备注:Accepted paper to ITISE 2021 (7th International Conference on Time Series and Forecasting). 12 pages, 3 figures, 3 tables
链接:https://arxiv.org/abs/2101.01329
摘要:In numerous applications, it is required to produce forecasts for multiple time-series at different hierarchy levels. An obvious example is given by the supply chain in which demand forecasting may be needed at a store, city, or country level. The independent forecasts typically do not add up properly because of the hierarchical constraints, so a reconciliation step is needed. In this paper, we propose a new general, flexible, and easy-to-implement reconciliation strategy based on an encoder-decoder neural network. By testing our method on four real-world datasets, we show that it can consistently reach or surpass the performance of existing methods in the reconciliation setting.
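As a point of reference for the reconciliation step described above, the sketch below applies the classical least-squares (OLS) projection that forces independently produced forecasts to respect the aggregation constraints. It is a minimal baseline for contrast, not the encoder-decoder reconciler proposed in the paper, and the tiny two-store hierarchy is hypothetical.
```python
import numpy as np

# Hypothetical 3-series hierarchy: total = store_A + store_B.
# Rows of S map the 2 bottom-level series to all 3 series (total, A, B).
S = np.array([[1.0, 1.0],
              [1.0, 0.0],
              [0.0, 1.0]])

def ols_reconcile(base_forecasts, S):
    """Project independently produced forecasts for every level onto the
    space of forecasts that respect the aggregation constraints
    (classical least-squares / 'OLS' reconciliation)."""
    P = S @ np.linalg.inv(S.T @ S) @ S.T          # projection onto col(S)
    return P @ base_forecasts

y_hat = np.array([103.0, 55.0, 44.0])             # total, store_A, store_B (do not add up)
y_tilde = ols_reconcile(y_hat, S)
print(y_tilde, "coherent:", np.isclose(y_tilde[0], y_tilde[1] + y_tilde[2]))
```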
其他(36篇)
[1]:Analyzing movies to predict their commercial viability for producers
标题:通过分析电影为制片人预测其商业可行性
作者:Devendra Swami, Yash Phogat, Aadiraj Batlaw, Ashwin Goyal
备注:6 pages, 5 figures
链接:https://arxiv.org/abs/2101.01697
摘要:Upon film premiere, a major form of speculation concerns the relative success of the film. This relativity is in particular regards to the film's original budget, as many a time have big-budget blockbusters been met with exceptional success as met with abject failure. So how does one predict the success of an upcoming film? In this paper, we explored a vast array of film data in an attempt to develop a model that could predict the expected return of an upcoming film. The approach to this development is as follows: First, we began with the MovieLens dataset having common movie attributes along with genome tags per each film. Genome tags give insight into what particular characteristics of the film are most salient. We then included additional features regarding film content, cast/crew, audience perception, budget, and earnings from TMDB, IMDB, and Metacritic websites. Next, we performed exploratory data analysis and engineered a wide range of new features capturing historical information for the available features. Thereafter, we used singular value decomposition (SVD) for dimensionality reduction of the high dimensional features (ex. genome tags). Finally, we built a Random Forest Classifier and performed hyper-parameter tuning to optimize for model accuracy. A future application of our model could be seen in the film industry, allowing production companies to better predict the expected return of their projects based on their envisioned outline for their production procedure, thereby allowing them to revise their plan in an attempt to achieve optimal returns.
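A minimal sketch of the modeling stage described above — dimensionality reduction of high-dimensional tag features with truncated SVD followed by a tuned Random Forest classifier — using scikit-learn on synthetic stand-in data. The feature matrix, label construction, and hyper-parameter grid are hypothetical and not taken from the paper.
```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.decomposition import TruncatedSVD
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Hypothetical stand-in for the high-dimensional genome-tag features:
# 500 films x 1000 tag-relevance scores, with a binary "high return" label.
rng = np.random.default_rng(0)
X = rng.random((500, 1000))
y = (X[:, :10].sum(axis=1) > 5).astype(int)       # synthetic target

pipe = Pipeline([
    ("svd", TruncatedSVD(n_components=50, random_state=0)),   # reduce tag features
    ("rf", RandomForestClassifier(random_state=0)),
])

# Small hyper-parameter grid in the spirit of the tuning step in the abstract.
grid = GridSearchCV(pipe, {"rf__n_estimators": [100, 300],
                           "rf__max_depth": [None, 10]}, cv=3)
grid.fit(X, y)
print("best params:", grid.best_params_, "cv accuracy:", round(grid.best_score_, 3))
```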
[2]:Label Augmentation via Time-based Knowledge Distillation for Financial Anomaly Detection
标题:用于金融异常检测的基于时间的知识蒸馏标签增强
作者:Hongda Shen, Eren Kursun
链接:https://arxiv.org/abs/2101.01689
摘要:Detecting anomalies has become increasingly critical to the financial service industry. Anomalous events are often indicative of illegal activities such as fraud, identity theft, network intrusion, account takeover, and money laundering. Financial anomaly detection use cases face serious challenges due to the dynamic nature of the underlying patterns especially in adversarial environments such as constantly changing fraud tactics. While retraining the models with the new patterns is absolutely essential; keeping up with the rapid changes introduces other challenges as it moves the model away from older patterns or continuously grows the size of the training data. The resulting data growth is hard to manage and it reduces the agility of the models' response to the latest attacks. Due to the data size limitations and the need to track the latest patterns, older time periods are often dropped in practice, which in turn, causes vulnerabilities. In this study, we propose a label augmentation approach to utilize the learning from older models to boost the latest. Experimental results show that the proposed approach provides a significant reduction in training time, while providing potential performance improvement.
[3]:Characterizing Intersectional Group Fairness with Worst-Case Comparisons
标题:用最坏情况比较刻画交叉群体公平性
作者:Avijit Ghosh, Lea Genuit, Mary Reagan
链接:https://arxiv.org/abs/2101.01673
摘要:Machine Learning or Artificial Intelligence algorithms have gained considerable scrutiny in recent times owing to their propensity towards imitating and amplifying existing prejudices in society. This has led to a niche but growing body of work that identifies and attempts to fix these biases. A first step towards making these algorithms more fair is designing metrics that measure unfairness. Most existing work in this field deals with either a binary view of fairness (protected vs. unprotected groups) or politically defined categories (race or gender). Such categorization misses the important nuance of intersectionality - biases can often be amplified in subgroups that combine membership from different categories, especially if such a subgroup is particularly underrepresented in historical platforms of opportunity.
In this paper, we discuss why fairness metrics need to be looked at under the lens of intersectionality, identify existing work in intersectional fairness, suggest a simple worst case comparison method to expand the definitions of existing group fairness metrics to incorporate intersectionality, and finally conclude with the social, legal and political framework to handle intersectional fairness in the modern context.
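A small illustration of the worst-case comparison idea, assuming positive-prediction rate as the underlying group metric: the gap between the best- and worst-off intersectional subgroup is reported. The dataframe columns and the minimum-subgroup-size cutoff are hypothetical choices, not the paper's exact definitions.
```python
import pandas as pd

def worst_case_gap(df, group_cols, outcome_col, min_size=10):
    """Extend a group-fairness metric (here: positive-prediction rate) to
    intersectional subgroups by reporting the worst-case spread across all
    combinations of the given protected attributes."""
    rates = (df.groupby(group_cols)[outcome_col]
               .agg(["mean", "size"])
               .query("size >= @min_size"))          # ignore tiny subgroups
    return rates["mean"].max() - rates["mean"].min(), rates

# Hypothetical data: model predictions plus two protected attributes.
df = pd.DataFrame({
    "race":   ["a", "a", "b", "b"] * 25,
    "gender": ["m", "f"] * 50,
    "y_pred": [1, 0, 1, 1] * 25,
})
gap, table = worst_case_gap(df, ["race", "gender"], "y_pred", min_size=5)
print(table)
print("worst-case intersectional gap:", gap)
```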
[4]:Auto-Encoding Molecular Conformations
标题:自动编码分子构象
作者:Robin Winter, Frank Noé, Djork-Arné Clevert
备注:6 pages, 2 figures, presented at Machine Learning for Molecules Workshop at NeurIPS 2020
链接:https://arxiv.org/abs/2101.01618
摘要:In this work we introduce an Autoencoder for molecular conformations. Our proposed model converts the discrete spatial arrangements of atoms in a given molecular graph (conformation) into and from a continuous fixed-sized latent representation. We demonstrate that in this latent representation, similar conformations cluster together while distinct conformations split apart. Moreover, by training a probabilistic model on a large dataset of molecular conformations, we demonstrate how our model can be used to generate diverse sets of energetically favorable conformations for a given molecule. Finally, we show that the continuous representation allows us to utilize optimization methods to find molecules that have conformations with favourable spatial properties.
[5]:Hierarchical Sampler for Probabilistic Programs via Separation of Control and Data
标题:基于控制与数据分离的概率程序分层采样器
作者:Ichiro Hasuo, Yuichiro Oyabu, Clovis Eberhart, Kohei Suenaga, Kenta Cho, Shin-ya Katsumata
备注:11 pages with appendices
链接:https://arxiv.org/abs/2101.01502
摘要:We introduce a novel sampling algorithm for Bayesian inference on imperative probabilistic programs. It features a hierarchical architecture that separates control flows from data: the top-level samples a control flow, and the bottom level samples data values along the control flow picked by the top level. This separation allows us to plug various language-based analysis techniques in probabilistic program sampling; specifically, we use logical backward propagation of observations for sampling efficiency. We implemented our algorithm on top of Anglican. The experimental results demonstrate our algorithm's efficiency, especially for programs with while loops and rare observations.
[6]:Learning Sign-Constrained Support Vector Machines
标题:符号约束支持向量机的学习
作者:Kenya Tajima, Takahiko Henmi, Kohei Tsuchida, Esmeraldo Ronnie R. Zara, Tsuyoshi Kato
链接:https://arxiv.org/abs/2101.01473
摘要:Domain knowledge is useful to improve the generalization performance of learning machines. Sign constraints are a handy representation to combine domain knowledge with learning machine. In this paper, we consider constraining the signs of the weight coefficients in learning the linear support vector machine, and develop two optimization algorithms for minimizing the empirical risk under the sign constraints. One of the two algorithms is based on the projected gradient method, in which each iteration of the projected gradient method takes $O(nd)$ computational cost and the sublinear convergence of the objective error is guaranteed. The second algorithm is based on the Frank-Wolfe method that also converges sublinearly and possesses a clear termination criterion. We show that each iteration of the Frank-Wolfe also requires $O(nd)$ cost. Furthermore, we derive the explicit expression for the minimal iteration number to ensure an $\epsilon$-accurate solution by analyzing the curvature of the objective function. Finally, we empirically demonstrate that the sign constraints are a promising technique when similarities to the training examples compose the feature vector.
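A minimal numpy sketch of the projected-gradient variant: a subgradient step on the regularized hinge loss followed by projection onto the sign constraints (clipping offending coordinates to zero). Step size, iteration count, and the toy data are hypothetical; the paper's convergence analysis and Frank-Wolfe variant are not reproduced.
```python
import numpy as np

def sign_constrained_svm(X, y, signs, lam=0.1, lr=0.01, epochs=200):
    """Projected (sub)gradient descent for a linear SVM whose weight signs are
    constrained: signs[j] = +1 forces w[j] >= 0, -1 forces w[j] <= 0, 0 is free."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        margins = y * (X @ w)
        active = margins < 1                               # hinge-loss subgradient
        grad = lam * w - (y[active, None] * X[active]).sum(axis=0) / n
        w -= lr * grad
        w[(signs > 0) & (w < 0)] = 0.0                     # projection onto the
        w[(signs < 0) & (w > 0)] = 0.0                     # sign-constraint set
    return w

# Toy data: labels follow feature 0 positively and feature 1 negatively.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = np.sign(X[:, 0] - X[:, 1] + 0.1 * rng.normal(size=200))
w = sign_constrained_svm(X, y, signs=np.array([+1, -1, 0]))
print("learned weights:", np.round(w, 3))
```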
[7]:Data Quality Measures and Efficient Evaluation Algorithms for Large-Scale High-Dimensional Data
标题:大规模高维数据的数据质量度量与高效评估算法
作者:Hyeongmin Cho, Sangkyun Lee
链接:https://arxiv.org/abs/2101.01441
摘要:Machine learning has been proven to be effective in various application areas, such as object and speech recognition on mobile systems. Since a critical key to machine learning success is the availability of large training data, many datasets are being disclosed and published online. From a data consumer or manager point of view, measuring data quality is an important first step in the learning process. We need to determine which datasets to use, update, and maintain. However, not many practical ways to measure data quality are available today, especially when it comes to large-scale high-dimensional data, such as images and videos. This paper proposes two data quality measures that can compute class separability and in-class variability, the two important aspects of data quality, for a given dataset. Classical data quality measures tend to focus only on class separability; however, we suggest that in-class variability is another important data quality factor. We provide efficient algorithms to compute our quality measures based on random projections and bootstrapping with statistical benefits on large-scale high-dimensional data. In experiments, we show that our measures are compatible with classical measures on small-scale data and can be computed much more efficiently on large-scale high-dimensional datasets.
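An illustrative sketch of the two quantities described above — class separability and in-class variability — computed on a random projection of the data and averaged over bootstrap resamples. The concrete formulas (centroid distances and mean per-class variance) are simple stand-ins, not the paper's exact measures.
```python
import numpy as np
from sklearn.random_projection import GaussianRandomProjection

def quality_measures(X, y, n_components=32, n_boot=10, seed=0):
    """Illustrative class-separability and in-class-variability scores computed
    on a random projection of the data, averaged over bootstrap resamples."""
    rng = np.random.default_rng(seed)
    Xp = GaussianRandomProjection(n_components=n_components,
                                  random_state=0).fit_transform(X)
    seps, varis = [], []
    for _ in range(n_boot):
        idx = rng.integers(0, len(Xp), size=len(Xp))       # bootstrap sample
        Xb, yb = Xp[idx], y[idx]
        centroids = np.stack([Xb[yb == c].mean(axis=0) for c in np.unique(yb)])
        seps.append(np.mean([np.linalg.norm(a - b)
                             for i, a in enumerate(centroids)
                             for b in centroids[i + 1:]]))  # between-class distance
        varis.append(np.mean([Xb[yb == c].var(axis=0).mean()
                              for c in np.unique(yb)]))     # in-class variability
    return float(np.mean(seps)), float(np.mean(varis))

# Hypothetical high-dimensional data with two classes.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (300, 512)), rng.normal(1, 1, (300, 512))])
y = np.array([0] * 300 + [1] * 300)
print(quality_measures(X, y))
```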
[8]:Het-node2vec: second order random walk sampling for heterogeneous multigraphs embedding
标题:异构多重图嵌入的二阶随机游走抽样
作者:Giorgio Valentini, Elena Casiraghi, Luca Cappelletti, Vida Ravanmehr, Tommaso Fontana, Justin Reese, Peter Robinson
备注:20 pages, 5 figures
链接:https://arxiv.org/abs/2101.01425
摘要:We introduce a set of algorithms (Het-node2vec) that extend the original node2vec node-neighborhood sampling method to heterogeneous multigraphs, i.e. networks characterized by multiple types of nodes and edges. The resulting random walk samples capture both the structural characteristics of the graph and the semantics of the different types of nodes and edges. The proposed algorithms can focus their attention on specific node or edge types, allowing accurate representations also for underrepresented types of nodes/edges that are of interest for the prediction problem under investigation. These rich and well-focused representations can boost unsupervised and supervised learning on heterogeneous graphs.
[9]:Data-Driven Copy-Paste Imputation for Energy Time Series
标题:能量时间序列的数据驱动复制粘贴插补
作者:Moritz Weber, Marian Turowski, Hüseyin K. Çakmak, Ralf Mikut, Uwe Kühnapfel, Veit Hagenmeyer
备注:8 pages, 7 figures, submitted to IEEE Transactions on Smart Grid, the first two authors equally contributed to this work
链接:https://arxiv.org/abs/2101.01423
摘要:Smart meters are a cornerstone of the worldwide transition to smart grids. They typically collect and provide energy time series that are vital for various applications, such as grid simulations, fault detection, load forecasting, load analysis, and load management. Unfortunately, these time series are often characterized by missing values that must be handled before the data can be used. A common approach to handle missing values in time series is imputation. However, existing imputation methods are designed for power time series and do not take into account the total energy of gaps, resulting in jumps or constant shifts when imputing energy time series. In order to overcome these issues, the present paper introduces the new Copy-Paste Imputation (CPI) method for energy time series. The CPI method copies data blocks with similar properties and pastes them into gaps of the time series while preserving the total energy of each gap. The new method is evaluated on a real-world dataset that contains six shares of artificially inserted missing values between 1 and 30%. It outperforms by far the three benchmark imputation methods selected for comparison. The comparison furthermore shows that the CPI method uses matching patterns and preserves the total energy of each gap while requiring only a moderate run-time.
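A toy sketch of the copy-paste idea, assuming the total energy of the gap is known from the surrounding (cumulative) meter readings: a block from one period earlier is copied into the gap and rescaled so that the pasted values sum to that energy. The matching of candidate blocks on calendar features in the actual CPI method is omitted.
```python
import numpy as np

def copy_paste_impute(power, gap_start, gap_end, target_energy, period=96):
    """Toy copy-paste imputation: copy the block one period (e.g. one day of
    15-min readings) before the gap and rescale it so that the pasted values
    sum to the known total energy of the gap."""
    donor = power[gap_start - period: gap_end - period].copy()   # similar block
    donor *= target_energy / donor.sum()                         # preserve energy
    imputed = power.copy()
    imputed[gap_start:gap_end] = donor
    return imputed

# Hypothetical two days of 15-minute load values with a 3-hour gap on day 2.
t = np.arange(2 * 96)
power = 1.0 + 0.5 * np.sin(2 * np.pi * t / 96)
gap_start, gap_end = 96 + 40, 96 + 52
true_energy = power[gap_start:gap_end].sum()      # would come from meter readings
power_with_gap = power.copy()
power_with_gap[gap_start:gap_end] = np.nan
filled = copy_paste_impute(power_with_gap, gap_start, gap_end, true_energy)
print("energy preserved:", np.isclose(filled[gap_start:gap_end].sum(), true_energy))
```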
[10]:To do or not to do: cost-sensitive causal decision-making
标题:做还是不做:成本敏感的因果决策
作者:Diego Olaya, Wouter Verbeke, Jente Van Belle, Marie-Anne Guerry
链接:https://arxiv.org/abs/2101.01407
摘要:Causal classification models are adopted across a variety of operational business processes to predict the effect of a treatment on a categorical business outcome of interest depending on the process instance characteristics. This allows optimizing operational decision-making and selecting the optimal treatment to apply in each specific instance, with the aim of maximizing the positive outcome rate. While various powerful approaches have been presented in the literature for learning causal classification models, no formal framework has been elaborated for optimal decision-making based on the estimated individual treatment effects, given the cost of the various treatments and the benefit of the potential outcomes.
In this article, we therefore extend upon the expected value framework and formally introduce a cost-sensitive decision boundary for double binary causal classification, which is a linear function of the estimated individual treatment effect, the positive outcome probability and the cost and benefit parameters of the problem setting. The boundary allows causally classifying instances in the positive and negative treatment class to maximize the expected causal profit, which is introduced as the objective at hand in cost-sensitive causal classification. We introduce the expected causal profit ranker which ranks instances for maximizing the expected causal profit at each possible threshold for causally classifying instances and differs from the conventional ranking approach based on the individual treatment effect. The proposed ranking approach is experimentally evaluated on synthetic and marketing campaign data sets. The results indicate that the presented ranking method effectively outperforms the cost-insensitive ranking approach and allows boosting profitability.
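A simplified illustration of the decision rule and ranking described above: treat an instance when the expected gain from the estimated uplift outweighs the treatment cost, and rank instances by that expected causal profit. The benefit/cost numbers are hypothetical, and this simple formulation omits the outcome- and treatment-dependent cost structure of the paper's full decision boundary (which is why the control outcome probability cancels here).
```python
import numpy as np

def expected_causal_profit(uplift, p_outcome_control, benefit, cost):
    """Expected profit of treating vs. not treating an instance, given an
    estimated individual treatment effect (uplift), the positive-outcome
    probability without treatment, an outcome benefit and a treatment cost."""
    profit_treat = benefit * (p_outcome_control + uplift) - cost
    profit_no_treat = benefit * p_outcome_control
    return profit_treat - profit_no_treat          # = benefit * uplift - cost

# Hypothetical scored campaign instances.
uplift = np.array([0.10, 0.02, -0.05, 0.30])
p_ctrl = np.array([0.20, 0.50, 0.40, 0.10])
gain = expected_causal_profit(uplift, p_ctrl, benefit=10.0, cost=1.0)
treat = gain > 0                                   # cost-sensitive decision rule
ranking = np.argsort(-gain)                        # expected-causal-profit ranking
print("treat:", treat, "ranking:", ranking)
```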
[11]:A Linearly Convergent Algorithm for Distributed Principal Component Analysis
标题:分布式主成分分析的线性收敛算法
作者:Arpita Gang, Waheed U. Bajwa
备注:30 pages; 15 figures; preprint of a journal paper
链接:https://arxiv.org/abs/2101.01300
摘要:Principal Component Analysis (PCA) is the workhorse tool for dimensionality reduction in this era of big data. While often overlooked, the purpose of PCA is not only to reduce data dimensionality, but also to yield features that are uncorrelated. This paper focuses on this dual objective of PCA, namely, dimensionality reduction and decorrelation of features, which requires estimating the eigenvectors of a data covariance matrix, as opposed to only estimating the subspace spanned by the eigenvectors. The ever-increasing volume of data in the modern world often requires storage of data samples across multiple machines, which precludes the use of centralized PCA algorithms. Although a few distributed solutions to the PCA problem have been proposed recently, convergence guarantees and/or communications overhead of these solutions remain a concern. With an eye towards communications efficiency, this paper introduces a feedforward neural network-based one time-scale distributed PCA algorithm termed Distributed Sanger's Algorithm (DSA) that estimates the eigenvectors of a data covariance matrix when data are distributed across an undirected and arbitrarily connected network of machines. Furthermore, the proposed algorithm is shown to converge linearly to a neighborhood of the true solution. Numerical results are also shown to demonstrate the efficacy of the proposed solution.
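For context, the sketch below implements the classical single-machine Sanger rule (generalized Hebbian algorithm) that DSA distributes: the rows of W converge to the leading eigenvectors of the data covariance, not just the subspace they span. The distributed consensus step of the paper is not shown, and the step size and data are hypothetical.
```python
import numpy as np

def sanger_pca(X, k=2, lr=1e-3, epochs=50, seed=0):
    """Single-machine Sanger's rule (generalized Hebbian algorithm): the rows of
    W converge to the top-k eigenvectors of the data covariance matrix."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(scale=0.1, size=(k, d))
    for _ in range(epochs):
        for x in X:
            y = W @ x
            W += lr * (np.outer(y, x) - np.tril(np.outer(y, y)) @ W)
    return W

# Hypothetical zero-mean data with a dominant direction.
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 5)) @ np.diag([3.0, 2.0, 1.0, 0.5, 0.1])
X -= X.mean(axis=0)
W = sanger_pca(X, k=2)
true = np.linalg.eigh(np.cov(X.T))[1][:, ::-1][:, :2].T    # top-2 eigenvectors
print(np.abs(np.round(W @ true.T, 2)))    # approximately identity (up to sign) if converged
```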
[12]:One vs Previous and Similar Classes Learning -- A Comparative Study
标题:“一类对先前及相似类别”学习——一项比较研究
作者:Daniel Cauchi, Adrian Muscat
备注:10 pages, 6 figures
链接:https://arxiv.org/abs/2101.01294
摘要:When dealing with multi-class classification problems, it is common practice to build a model consisting of a series of binary classifiers using a learning paradigm which dictates how the classifiers are built and combined to discriminate between the individual classes. As new data enters the system and the model needs updating, these models would often need to be retrained from scratch. This work proposes three learning paradigms which allow trained models to be updated without the need of retraining from scratch. A comparative analysis is performed to evaluate them against a baseline. Results show that the proposed paradigms are faster than the baseline at updating, with two of them being faster at training from scratch as well, especially on larger datasets, while retaining a comparable classification performance.
[13]:GeCo: Quality Counterfactual Explanations in Real Time
标题:GeCo:实时的高质量反事实解释
作者:Maximilian Schleich, Zixuan Geng, Yihong Zhang, Dan Suciu
备注:13 pages, 7 figures, 3 tables, 3 algorithms
链接:https://arxiv.org/abs/2101.01292
摘要:Machine learning is increasingly applied in high-stakes decision making that directly affect people's lives, and this leads to an increased demand for systems to explain their decisions. Explanations often take the form of counterfactuals, which consists of conveying to the end user what she/he needs to change in order to improve the outcome. Computing counterfactual explanations is challenging, because of the inherent tension between a rich semantics of the domain, and the need for real time response. In this paper we present GeCo, the first system that can compute plausible and feasible counterfactual explanations in real time. At its core, GeCo relies on a genetic algorithm, which is customized to favor searching counterfactual explanations with the smallest number of changes. To achieve real-time performance, we introduce two novel optimizations: $\Delta$-representation of candidate counterfactuals, and partial evaluation of the classifier. We compare empirically GeCo against four other systems described in the literature, and show that it is the only system that can achieve both high quality explanations and real time answers.
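A toy genetic search in the spirit of the approach described above: candidates are mutated one feature at a time and ranked so that counterfactuals reaching the target class with the fewest changes win. GeCo's $\Delta$-representation and partial evaluation optimizations are not reproduced, and the classifier, feature bounds, and fitness ordering here are hypothetical.
```python
import numpy as np

def genetic_counterfactual(x, predict, bounds, target=1,
                           pop_size=40, generations=50, seed=0):
    """Toy genetic search for a counterfactual: mutate one feature at a time and
    keep candidates that reach the target class with as few changes as possible."""
    rng = np.random.default_rng(seed)
    d = len(x)

    def mutate(c):
        c = c.copy()
        j = rng.integers(d)
        c[j] = rng.uniform(*bounds[j])
        return c

    def fitness(c):
        hit = predict(c.reshape(1, -1))[0] == target
        n_changed = int(np.sum(~np.isclose(c, x)))
        return (hit, -n_changed, -float(np.linalg.norm(c - x)))

    population = [mutate(x) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]
        population = parents + [mutate(p) for p in parents]
    best = max(population, key=fitness)
    return best if fitness(best)[0] else None

# Hypothetical classifier: "loan approved" if income - 2 * debt > 1.
predict = lambda X: (X[:, 0] - 2 * X[:, 1] > 1).astype(int)
x = np.array([1.0, 0.5])                          # currently rejected instance
cf = genetic_counterfactual(x, predict, bounds=[(0, 5), (0, 5)], target=1)
print("counterfactual:", cf)
```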
[14]:Multi-Model Least Squares-Based Recomputation Framework for Large Data Analysis
标题:基于多模型最小二乘法的大数据分析重计算框架
作者:Wandong Zhang, QM Jonathan Wu, Yimin Yang, WG Will Zhao, Hui Zhang
链接:https://arxiv.org/abs/2101.01271
摘要:Most multilayer least squares (LS)-based neural networks are structured with two separate stages: unsupervised feature encoding and supervised pattern classification. Once the unsupervised learning is finished, the latent encoding would be fixed without supervised fine-tuning. However, in complex tasks such as handling the ImageNet dataset, there are often many more clues that can be directly encoded, while the unsupervised learning, by definition cannot know exactly what is useful for a certain task. This serves as the motivation to retrain the latent space representations to learn some clues that unsupervised learning has not yet learned. In particular, the error matrix from the output layer is pulled back to each hidden layer, and the parameters of the hidden layer are recalculated with Moore-Penrose (MP) inverse for more generalized representations. In this paper, a recomputation-based multilayer network using MP inverse (RML-MP) is developed. A sparse RML-MP (SRML-MP) model to boost the performance of RML-MP is then proposed. The experimental results with varying training samples (from 3 K to 1.8 M) show that the proposed models provide better generalization performance than most representation learning algorithms.
[15]:Robust Maximum Entropy Behavior Cloning
标题:鲁棒最大熵行为克隆
作者:Mostafa Hussein, Brendan Crowe, Marek Petrik, Momotaz Begum
备注:NeurIPS 2020 3rd Robot Learning Workshop: Grounding Machine Learning Development in the Real World
链接:https://arxiv.org/abs/2101.01251
摘要:Imitation learning (IL) algorithms use expert demonstrations to learn a specific task. Most of the existing approaches assume that all expert demonstrations are reliable and trustworthy, but what if there exist some adversarial demonstrations among the given dataset? This may result in poor decision-making performance. We propose a novel general framework to directly generate a policy from demonstrations that autonomously detects the adversarial demonstrations and excludes them from the dataset. At the same time, it is sample- and time-efficient and does not require a simulator. To model such adversarial demonstrations, we propose a min-max problem that leverages the entropy of the model to assign weights to each demonstration. This allows us to learn the behavior using only the correct demonstrations or a mixture of correct demonstrations.
[16]:A Priori Generalization Analysis of the Deep Ritz Method for Solving High Dimensional Elliptic Equations
标题:求解高维椭圆型方程的Deep Ritz方法的先验泛化分析
作者:Jianfeng Lu, Yulong Lu, Min Wang
链接:https://arxiv.org/abs/2101.01708
摘要:This paper concerns the a priori generalization analysis of the Deep Ritz Method (DRM) [W. E and B. Yu, 2017], a popular neural-network-based method for solving high dimensional partial differential equations. We derive the generalization error bounds of two-layer neural networks in the framework of the DRM for solving two prototype elliptic PDEs: Poisson equation and static Schrödinger equation on the $d$-dimensional unit hypercube. Specifically, we prove that the convergence rates of generalization errors are independent of the dimension $d$, under the a priori assumption that the exact solutions of the PDEs lie in a suitable low-complexity space called spectral Barron space. Moreover, we give sufficient conditions on the forcing term and the potential function which guarantee that the solutions are spectral Barron functions. We achieve this by developing a new solution theory for the PDEs on the spectral Barron space, which can be viewed as an analog of the classical Sobolev regularity theory for PDEs.
[17]:A Survey of Community Detection Approaches: From Statistical Modeling to Deep Learning
标题:社区检测方法综述:从统计建模到深度学习
作者:Di Jin, Zhizhi Yu, Pengfei Jiao, Shirui Pan, Philip S. Yu, Weixiong Zhang
链接:https://arxiv.org/abs/2101.01669
摘要:Community detection, a fundamental task for network analysis, aims to partition a network into multiple sub-structures to help reveal their latent functions. Community detection has been extensively studied in and broadly applied to many real-world network problems. Classical approaches to community detection typically utilize probabilistic graphical models and adopt a variety of prior knowledge to infer community structures. As the problems that network methods try to solve and the network data to be analyzed become increasingly more sophisticated, new approaches have also been proposed and developed, particularly those that utilize deep learning and convert networked data into low dimensional representation. Despite all the recent advancement, there is still a lack of insightful understanding of the theoretical and methodological underpinning of community detection, which will be critically important for future development of the area of network analysis. In this paper, we develop and present a unified architecture of network community-finding methods to characterize the state-of-the-art of the field of community detection. Specifically, we provide a comprehensive review of the existing community detection methods and introduce a new taxonomy that divides the existing methods into two categories, namely probabilistic graphical model and deep learning. We then discuss in detail the main idea behind each method in the two categories. Furthermore, to promote future development of community detection, we release several benchmark datasets from several problem domains and highlight their applications to various network analysis tasks. We conclude with discussions of the challenges of the field and suggestions of possible directions for future research.
[18]:Radio Frequency Fingerprint Identification for LoRa Using Spectrogram and CNN
标题:基于频谱图和CNN的LoRa射频指纹识别
作者:Guanxiong Shen, Junqing Zhang, Alan Marshall, Linning Peng, Xianbin Wang
备注:Accepted for publication in IEEE INFOCOM 2021
链接:https://arxiv.org/abs/2101.01668
摘要:Radio frequency fingerprint identification (RFFI) is an emerging device authentication technique that relies on intrinsic hardware characteristics of wireless devices. We designed an RFFI scheme for Long Range (LoRa) systems based on spectrogram and convolutional neural network (CNN). Specifically, we used spectrogram to represent the fine-grained time-frequency characteristics of LoRa signals. In addition, we revealed that the instantaneous carrier frequency offset (CFO) is drifting, which will result in misclassification and significantly compromise the system stability; we demonstrated CFO compensation is an effective mitigation. Finally, we designed a hybrid classifier that can adjust CNN outputs with the estimated CFO. The mean value of CFO remains relatively stable, hence it can be used to rule out CNN predictions whose estimated CFO falls out of the range. We performed experiments in real wireless environments using 20 LoRa devices under test (DUTs) and a Universal Software Radio Peripheral (USRP) N210 receiver. By comparing with the IQ-based and FFT-based RFFI schemes, our spectrogram-based scheme can reach the best classification accuracy, i.e., 97.61% for 20 LoRa DUTs.
[19]:Incremental learning with online SVMs on LiDAR sensory data
标题:基于在线支持向量机的LiDAR传感数据增量学习
作者:Le Dinh Van Khoa, Zhiyuan Chen
备注:This paper has been published at the International Conference on Digital Image and Signal Processing (DISP 2019) at Oxford, United Kingdom
链接:https://arxiv.org/abs/2101.01667
摘要:The pipelines transmission system is one of the growing aspects, which has existed for a long time in the energy industry. The cost of in-pipe exploration for maintaining service always draws lots of attention in this industry. Normally exploration methods (e.g. Magnetic flux leakage and eddy current) will establish the sensors stationary for each pipe milestone or carry sensors to travel inside the pipe. It makes the maintenance process very difficult due to the massive amount of sensors. One of the solutions is to implement machine learning techniques for the analysis of sensory data. Although SVMs can resolve this issue with kernel trick, the problem is that computing the kernel depends on the data size too. It is because the process can be exaggerated quickly if the number of support vectors becomes really large. Particularly LiDAR spins with an extremely rapid rate and the flow of input data might eventually lead to massive expansion. In our proposed approach, each sample is learned in an instant way and the supported kernel is computed simultaneously. In this research, incremental learning approach with online support vector machines (SVMs) is presented, which aims to deal with LiDAR sensory data only.
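A minimal sketch of incremental learning on streaming sensor chunks, assuming a linear SVM trained with scikit-learn's `SGDClassifier.partial_fit` as a stand-in for the paper's kernelized online SVM; the 16-dimensional features and the label rule are hypothetical.
```python
import numpy as np
from sklearn.linear_model import SGDClassifier

# Linear SVM (hinge loss) updated incrementally on a stream of LiDAR-like
# feature chunks; a stand-in for the kernelized online SVM in the paper.
rng = np.random.default_rng(0)
classes = np.array([0, 1])
model = SGDClassifier(loss="hinge", alpha=1e-4, random_state=0)

for step in range(20):                     # 20 arriving batches of scans
    X_chunk = rng.normal(size=(100, 16))   # hypothetical 16-dim scan features
    y_chunk = (X_chunk[:, 0] + X_chunk[:, 1] > 0).astype(int)
    model.partial_fit(X_chunk, y_chunk, classes=classes)

X_test = rng.normal(size=(500, 16))
y_test = (X_test[:, 0] + X_test[:, 1] > 0).astype(int)
print("held-out accuracy:", model.score(X_test, y_test))
```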
[20]:Theory-based Habit Modeling for Enhancing Behavior Prediction
标题:基于理论的习惯建模以增强行为预测
作者:Chao Zhang, Joaquin Vanschoren, Arlette van Wissen, Daniel Lakens, Boris de Ruyter, Wijnand A. IJsselsteijn
链接:https://arxiv.org/abs/2101.01637
摘要:Psychological theories of habit posit that when a strong habit is formed through behavioral repetition, it can trigger behavior automatically in the same environment. Given the reciprocal relationship between habit and behavior, changing lifestyle behaviors (e.g., toothbrushing) is largely a task of breaking old habits and creating new and healthy ones. Thus, representing users' habit strengths can be very useful for behavior change support systems (BCSS), for example, to predict behavior or to decide when an intervention reaches its intended effect. However, habit strength is not directly observable and existing self-report measures are taxing for users. In this paper, built on recent computational models of habit formation, we propose a method to enable intelligent systems to compute habit strength based on observable behavior. The hypothesized advantage of using computed habit strength for behavior prediction was tested using data from two intervention studies, where we trained participants to brush their teeth twice a day for three weeks and monitored their behaviors using accelerometers. Through hierarchical cross-validation, we found that for the task of predicting future brushing behavior, computed habit strength clearly outperformed self-reported habit strength (in both studies) and was also superior to models based on past behavior frequency (in the larger second study). Our findings provide initial support for our theory-based approach of modeling user habits and encourages the use of habit computation to deliver personalized and adaptive interventions.
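A sketch of the general idea, assuming a simple saturating habit-strength update (gain on each performed behavior, decay otherwise) whose value is then fed to a classifier as a predictor of the next observation. The paper's computational habit model and evaluation protocol are not reproduced; the parameters and simulated brushing records are hypothetical.
```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def habit_strength(behavior, gain=0.1, decay=0.05):
    """Habit-strength trace for a 0/1 behavior sequence: strength saturates
    towards 1 with each repetition and decays when the behavior is skipped."""
    h, trace = 0.0, []
    for b in behavior:
        h = h + gain * (1 - h) if b else h * (1 - decay)
        trace.append(h)
    return np.array(trace)

# Hypothetical three weeks of twice-daily brushing records (1 = brushed).
rng = np.random.default_rng(0)
behavior = (rng.random(42) < 0.8).astype(int)
h = habit_strength(behavior)

# Use the habit strength after session t to predict behavior at session t+1.
X, y = h[:-1].reshape(-1, 1), behavior[1:]
clf = LogisticRegression().fit(X, y)
print("training accuracy:", round(clf.score(X, y), 3))
```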
[21]:CASS: Towards Building a Social-Support Chatbot for Online Health Community
标题:CASS:面向在线健康社区构建社会支持聊天机器人
作者:Liuping Wang, Dakuo Wang, Feng Tian, Zhenhui Peng, Xiangmin Fan, Zhan Zhang, Shuai Ma, Mo Yu, Xiaojuan Ma, Hongan Wang
链接:https://arxiv.org/abs/2101.01583
摘要:Chatbots systems, despite their popularity in today's HCI and CSCW research, fall short for one of the two reasons: 1) many of the systems use a rule-based dialog flow, thus they can only respond to a limited number of pre-defined inputs with pre-scripted responses; or 2) they are designed with a focus on single-user scenarios, thus it is unclear how these systems may affect other users or the community. In this paper, we develop a generalizable chatbot architecture (CASS) to provide social support for community members in an online health community. The CASS architecture is based on advanced neural network algorithms, thus it can handle new inputs from users and generate a variety of responses to them. CASS is also generalizable as it can be easily migrate to other online communities. With a follow-up field experiment, CASS is proven useful in supporting individual members who seek emotional support. Our work also contributes to fill the research gap on how a chatbot may influence the whole community's engagement.
[22]:Sequential Choice Bandits with Feedback for Personalizing users' experience
标题:用于个性化用户体验的带反馈序贯选择赌博机
作者:Anshuka Rangi, Massimo Franceschetti, Long Tran-Thanh
链接:https://arxiv.org/abs/2101.01572
摘要:In this work, we study sequential choice bandits with feedback. We propose bandit algorithms for a platform that personalizes users' experience to maximize its rewards. For each action directed to a given user, the platform is given a positive reward, which is a non-decreasing function of the action, if this action is below the user's threshold. Users are equipped with a patience budget, and actions that are above the threshold decrease the user's patience. When all patience is lost, the user abandons the platform. The platform attempts to learn the thresholds of the users in order to maximize its rewards, based on two different feedback models describing the information pattern available to the platform at each action. We define a notion of regret by determining the best action to be taken when the platform knows that the user's threshold is in a given interval. We then propose bandit algorithms for the two feedback models and show that upper and lower bounds on the regret are of the order of $\tilde{O}(N^{2/3})$ and $\tilde\Omega(N^{2/3})$, respectively, where $N$ is the total number of users. Finally, we show that the waiting time of any user before receiving a personalized experience is uniform in $N$.
[23]:Exact solution to the random sequential dynamics of a message passing algorithm
标题:消息传递算法随机序列动力学的精确解
作者:Burak Çakmak, Manfred Opper
备注:5 pages
链接:https://arxiv.org/abs/2101.01571
摘要:We analyze the random sequential dynamics of a message passing algorithm for Ising models with random interactions in the large system limit. We derive exact results for the two-time correlation functions and the speed of convergence. The {\em de Almedia-Thouless} stability criterion of the static problem is found to be necessary and sufficient for the global convergence of the random sequential dynamics.
[24]:"Brilliant AI Doctor" in Rural China: Tensions and Challenges in AI-Powered CDSS Deployment
标题:中国农村的“杰出人工智能医生”:人工智能驱动的CDSS部署中的紧张与挑战
作者:Dakuo Wang, Liuping Wang, Zhan Zhang, Ding Wang, Haiyi Zhu, Yvonne Gao, Xiangmin Fan, Feng Tian
链接:https://arxiv.org/abs/2101.01524
摘要:Artificial intelligence (AI) technology has been increasingly used in the implementation of advanced Clinical Decision Support Systems (CDSS). Research demonstrated the potential usefulness of AI-powered CDSS (AI-CDSS) in clinical decision making scenarios. However, post-adoption user perception and experience remain understudied, especially in developing countries. Through observations and interviews with 22 clinicians from 6 rural clinics in China, this paper reports the various tensions between the design of an AI-CDSS system (``Brilliant Doctor'') and the rural clinical context, such as the misalignment with local context and workflow, the technical limitations and usability barriers, as well as issues related to transparency and trustworthiness of AI-CDSS. Despite these tensions, all participants expressed positive attitudes toward the future of AI-CDSS, especially acting as ``a doctor's AI assistant'' to realize a Human-AI Collaboration future in clinical settings. Finally we draw on our findings to discuss implications for designing AI-CDSS interventions for rural clinical contexts in developing countries.
[25]:Handling Hard Affine SDP Shape Constraints in RKHSs
标题:RKHSs中硬仿射SDP形状约束的处理
作者:Pierre-Cyril Aubin-Frankowski, Zoltan Szabo
链接:https://arxiv.org/abs/2101.01519
摘要:Shape constraints, such as non-negativity, monotonicity, convexity or supermodularity, play a key role in various applications of machine learning and statistics. However, incorporating this side information into predictive models in a hard way (for example at all points of an interval) for rich function classes is a notoriously challenging problem. We propose a unified and modular convex optimization framework, relying on second-order cone (SOC) tightening, to encode hard affine SDP constraints on function derivatives, for models belonging to vector-valued reproducing kernel Hilbert spaces (vRKHSs). The modular nature of the proposed approach allows to simultaneously handle multiple shape constraints, and to tighten an infinite number of constraints into finitely many. We prove the consistency of the proposed scheme and that of its adaptive variant, leveraging geometric properties of vRKHSs. The efficiency of the approach is illustrated in the context of shape optimization, safety-critical control and econometrics.
[26]:Structured Machine Learning Tools for Modelling Characteristics of Guided Waves
标题:用于导波特性建模的结构化机器学习工具
作者:Marcus Haywood-Alexander, Nikolaos Dervilis, Keith Worden, Elizabeth J. Cross, Robin S. Mills, Timothy J. Rogers
备注:33 pages, 11 figures
链接:https://arxiv.org/abs/2101.01506
摘要:The use of ultrasonic guided waves to probe the materials/structures for damage continues to increase in popularity for non-destructive evaluation (NDE) and structural health monitoring (SHM). The use of high-frequency waves such as these offers an advantage over low-frequency methods from their ability to detect damage on a smaller scale. However, in order to assess damage in a structure, and implement any NDE or SHM tool, knowledge of the behaviour of a guided wave throughout the material/structure is important (especially when designing sensor placement for SHM systems). Determining this behaviour is extremely difficult in complex materials, such as fibre-matrix composites, where unique phenomena such as continuous mode conversion take place. This paper introduces a novel method for modelling the feature-space of guided waves in a composite material. This technique is based on a data-driven model, where prior physical knowledge can be used to create structured machine learning tools, with constraints applied to provide said structure. The method shown makes use of Gaussian processes, a full Bayesian analysis tool, and in this paper it is shown how physical knowledge of the guided waves can be utilised in modelling using an ML tool. This paper shows that, through careful consideration when applying machine learning techniques, more robust models can be generated which offer advantages such as extrapolation ability and physical interpretation.
[27]:Delayed Projection Techniques for Linearly Constrained Problems: Convergence Rates, Acceleration, and Applications
标题:线性约束问题的延迟投影技术:收敛速度、加速及应用
作者:Xiang Li, Zhihua Zhang
链接:https://arxiv.org/abs/2101.01505
摘要:In this work, we study a novel class of projection-based algorithms for linearly constrained problems (LCPs) which have a lot of applications in statistics, optimization, and machine learning. Conventional primal gradient-based methods for LCPs call a projection after each (stochastic) gradient descent, resulting in that the required number of projections equals that of gradient descents (or total iterations). Motivated by the recent progress in distributed optimization, we propose the delayed projection technique that calls a projection once for a while, lowering the projection frequency and improving the projection efficiency. Accordingly, we devise a series of stochastic methods for LCPs using the technique, including a variance reduced method and an accelerated one. We theoretically show that it is feasible to improve projection efficiency in both strongly convex and generally convex cases. Our analysis is simple and unified and can be easily extended to other methods using delayed projections. When applying our new algorithms to federated optimization, a newfangled and privacy-preserving subfield in distributed optimization, we obtain not only a variance reduced federated algorithm with convergence rates better than previous works, but also the first accelerated method able to handle data heterogeneity inherent in federated optimization.
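A minimal sketch of the delayed projection idea on a toy linearly constrained problem: plain gradient steps are taken and the projection onto the constraint set is called only every `tau` iterations, rather than after every step. The variance-reduced and accelerated variants, and the federated setting, are not shown; the toy objective and constraint are hypothetical.
```python
import numpy as np

def delayed_projection_gd(grad, project, x0, lr=0.1, steps=200, tau=10):
    """Gradient descent for a linearly constrained problem that calls the
    projection only every `tau` iterations instead of after every step."""
    x = project(x0.copy())
    for t in range(1, steps + 1):
        x -= lr * grad(x)
        if t % tau == 0:                 # delayed projection
            x = project(x)
    return project(x)                    # final projection for feasibility

# Toy LCP: minimize ||x - b||^2 subject to A x = 0.
rng = np.random.default_rng(0)
A = rng.normal(size=(3, 10))
b = rng.normal(size=10)
P = np.eye(10) - A.T @ np.linalg.inv(A @ A.T) @ A     # projector onto {x : A x = 0}

x_star = delayed_projection_gd(grad=lambda x: 2 * (x - b),
                               project=lambda x: P @ x,
                               x0=np.zeros(10))
print("constraint residual:", np.linalg.norm(A @ x_star),
      "objective:", round(float(np.sum((x_star - b) ** 2)), 4))
```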
[28]:Weight-of-evidence 2.0 with shrinkage and spline-binning
标题:带收缩与样条分箱的证据权重2.0
作者:Jakob Raymaekers, Wouter Verbeke, Tim Verdonck
链接:https://arxiv.org/abs/2101.01494
摘要:In many practical applications, such as fraud detection, credit risk modeling or medical decision making, classification models for assigning instances to a predefined set of classes are required to be both precise as well as interpretable. Linear modeling methods such as logistic regression are often adopted, since they offer an acceptable balance between precision and interpretability. Linear methods, however, are not well equipped to handle categorical predictors with high-cardinality or to exploit non-linear relations in the data. As a solution, data preprocessing methods such as weight-of-evidence are typically used for transforming the predictors. The binning procedure that underlies the weight-of-evidence approach, however, has been little researched and typically relies on ad-hoc or expert driven procedures. The objective in this paper, therefore, is to propose a formalized, data-driven and powerful method.
To this end, we explore the discretization of continuous variables through the binning of spline functions, which allows for capturing non-linear effects in the predictor variables and yields highly interpretable predictors taking only a small number of discrete values. Moreover, we extend upon the weight-of-evidence approach and propose to estimate the proportions using shrinkage estimators. Together, this offers an improved ability to exploit both non-linear and categorical predictors for achieving increased classification precision, while maintaining interpretability of the resulting model and decreasing the risk of overfitting.
We present the results of a series of experiments in a fraud detection setting, which illustrate the effectiveness of the presented approach. We facilitate reproduction of the presented results and adoption of the proposed approaches by providing both the dataset and the code for implementing the experiments and the presented approach.
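A small sketch of weight-of-evidence encoding with shrinkage, assuming a simple add-k smoothing of the per-bin event/non-event proportions and quantile bins as a stand-in for the paper's spline-binning; the simulated fraud labels and the smoothing constant are hypothetical.
```python
import numpy as np
import pandas as pd

def woe_with_shrinkage(x_binned, y, k=0.5):
    """Weight-of-evidence per bin with add-k smoothing of the event /
    non-event proportions, shrinking the estimates for sparse bins."""
    df = pd.DataFrame({"bin": x_binned, "y": y})
    stats = df.groupby("bin")["y"].agg(events="sum", total="count")
    stats["non_events"] = stats["total"] - stats["events"]
    n_bins = len(stats)
    p_event = (stats["events"] + k) / (stats["events"].sum() + k * n_bins)
    p_non = (stats["non_events"] + k) / (stats["non_events"].sum() + k * n_bins)
    return np.log(p_event / p_non)

# Hypothetical fraud data: a continuous predictor discretized into quantile bins
# (a simple stand-in for the spline-binning step in the paper).
rng = np.random.default_rng(0)
x = rng.normal(size=2000)
y = (rng.random(2000) < 1 / (1 + np.exp(-2 * x))).astype(int)
bins = pd.qcut(x, q=10, labels=False)
woe = woe_with_shrinkage(bins, y)
print(woe.round(2))
```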
[29]:Convergence and finite sample approximations of entropic regularized Wasserstein distances in Gaussian and RKHS settings
标题:高斯和RKHS环境下熵正则化Wasserstein距离的收敛性和有限样本逼近
作者:Minh Ha Quang
链接:https://arxiv.org/abs/2101.01429
摘要:This work studies the convergence and finite sample approximations of entropic regularized Wasserstein distances in the Hilbert space setting. Our first main result is that for Gaussian measures on an infinite-dimensional Hilbert space, convergence in the 2-Sinkhorn divergence is {\it strictly weaker} than convergence in the exact 2-Wasserstein distance. Specifically, a sequence of centered Gaussian measures converges in the 2-Sinkhorn divergence if the corresponding covariance operators converge in the Hilbert-Schmidt norm. This is in contrast to the previous known result that a sequence of centered Gaussian measures converges in the exact 2-Wasserstein distance if and only if the covariance operators converge in the trace class norm. In the reproducing kernel Hilbert space (RKHS) setting, the {\it kernel Gaussian-Sinkhorn divergence}, which is the Sinkhorn divergence between Gaussian measures defined on an RKHS, defines a semi-metric on the set of Borel probability measures on a Polish space, given a characteristic kernel on that space. With the Hilbert-Schmidt norm convergence, we obtain {\it dimension-independent} convergence rates for finite sample approximations of the kernel Gaussian-Sinkhorn divergence, with the same order as the Maximum Mean Discrepancy. These convergence rates apply in particular to Sinkhorn divergence between Gaussian measures on Euclidean and infinite-dimensional Hilbert spaces. The sample complexity for the 2-Wasserstein distance between Gaussian measures on Euclidean space, while dimension-dependent and larger than that of the Sinkhorn divergence, is exponentially faster than the worst case scenario in the literature.
[30]:A Symmetric Loss Perspective of Reliable Machine Learning
标题:可靠机器学习的对称损失视角
作者:Nontawat Charoenphakdee, Jongyeong Lee, Masashi Sugiyama
备注:Preprint of an Invited Review Article
链接:https://arxiv.org/abs/2101.01366
摘要:When minimizing the empirical risk in binary classification, it is a common practice to replace the zero-one loss with a surrogate loss to make the learning objective feasible to optimize. Examples of well-known surrogate losses for binary classification include the logistic loss, hinge loss, and sigmoid loss. It is known that the choice of a surrogate loss can highly influence the performance of the trained classifier and therefore it should be carefully chosen. Recently, surrogate losses that satisfy a certain symmetric condition (aka., symmetric losses) have demonstrated their usefulness in learning from corrupted labels. In this article, we provide an overview of symmetric losses and their applications. First, we review how a symmetric loss can yield robust classification from corrupted labels in balanced error rate (BER) minimization and area under the receiver operating characteristic curve (AUC) maximization. Then, we demonstrate how the robust AUC maximization method can benefit natural language processing in the problem where we want to learn only from relevant keywords and unlabeled documents. Finally, we conclude this article by discussing future directions, including potential applications of symmetric losses for reliable machine learning and the design of non-symmetric losses that can benefit from the symmetric condition.
[31]:Fixed-MAML for Few Shot Classification in Multilingual Speech Emotion Recognition
标题:Fixed-MAML在多语种语音情感识别小样本分类中的应用
作者:Anugunj Naman, Liliana Mancini
备注:Code at this https URL
链接:https://arxiv.org/abs/2101.01356
摘要:In this paper, we analyze the feasibility of applying few-shot learning to the speech emotion recognition (SER) task. Current speech emotion recognition models work exceptionally well but fail when the input is multilingual. Moreover, such models perform well only when the training corpus is vast. This need for a large training corpus is a significant problem when the chosen language is less popular or obscure. We attempt to solve this challenge of multilingualism and lack of available data by turning this problem into a few-shot learning problem. We suggest relaxing the assumption that all N classes in an N-way K-shot problem be new and define an N+F way problem where N and F are the number of emotion classes and predefined fixed classes, respectively. We propose this modification to the Model-Agnostic Meta-Learning (MAML) algorithm to solve the problem and call this new model F-MAML. This modification performs better than the original MAML and outperforms it on the EmoFilm dataset.
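A hedged sketch of the meta-learning loop this abstract builds on, using a first-order MAML-style update for simplicity; the classifier head is assumed to have N + F output units (episode emotion classes plus the predefined fixed classes), and all names below are illustrative rather than the paper's F-MAML implementation.

```python
import copy
import torch
import torch.nn.functional as F

def fomaml_step(model, tasks, meta_opt, inner_lr=0.01, inner_steps=5):
    """One first-order MAML-style meta-update over a batch of (support, query) tasks.
    `model` is any classifier whose final layer has N + F output units."""
    meta_opt.zero_grad()
    for support_x, support_y, query_x, query_y in tasks:
        learner = copy.deepcopy(model)                      # task-specific fast weights
        inner_opt = torch.optim.SGD(learner.parameters(), lr=inner_lr)
        for _ in range(inner_steps):                        # adapt on the support set
            loss = F.cross_entropy(learner(support_x), support_y)
            inner_opt.zero_grad(); loss.backward(); inner_opt.step()
        query_loss = F.cross_entropy(learner(query_x), query_y)
        grads = torch.autograd.grad(query_loss, learner.parameters())
        for p, g in zip(model.parameters(), grads):         # accumulate first-order meta-gradient
            p.grad = g if p.grad is None else p.grad + g
    meta_opt.step()
```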
[32]:Efficient Projection Onto the Nonconvex $\ell_p$-ball
标题:非凸$\ell_p$球上的高效投影
作者:Hao Wang, Xiangyu Yang, Jiashan Wang
备注:This work has been submitted and may be published. Copyright may be transferred without notice, after which this version may no longer be accessible
链接:https://arxiv.org/abs/2101.01350
摘要:This paper primarily focuses on computing the Euclidean projection of a vector onto the $\ell_{p}$-ball with $p\in(0,1)$. Such a problem emerges as the core building block in many signal processing and machine learning applications because of its ability to promote sparsity, yet it is challenging to solve due to its nonconvex and nonsmooth nature. First-order necessary optimality conditions of this problem are derived using Fréchet normal cone. We develop a novel numerical approach for computing the stationary point through solving a sequence of projections onto the reweighted $\ell_{1}$-balls. This method is shown to converge uniquely under mild conditions and has a worst-case $O(1/\sqrt{k})$ convergence rate. Numerical experiments demonstrate the efficiency of our proposed algorithm.
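A rough illustration of the reweighted-l1 idea described above: each outer step projects onto a weighted l1 ball obtained by linearizing the lp constraint at the current iterate, and the inner projection is solved by bisection on a soft-threshold parameter. This is my own simplified sketch, without the paper's convergence safeguards or rate guarantees.

```python
import numpy as np

def project_weighted_l1(x, w, radius, tol=1e-10, max_iter=100):
    """Euclidean projection onto {y : sum_i w_i |y_i| <= radius}, with all w_i > 0.
    Bisect on lam in the soft-threshold y_i = sign(x_i) * max(|x_i| - lam * w_i, 0)."""
    if np.sum(w * np.abs(x)) <= radius:
        return x.copy()
    lo, hi = 0.0, np.max(np.abs(x) / w)
    for _ in range(max_iter):
        lam = 0.5 * (lo + hi)
        y = np.sign(x) * np.maximum(np.abs(x) - lam * w, 0.0)
        if np.sum(w * np.abs(y)) > radius:
            lo = lam
        else:
            hi = lam
        if hi - lo < tol:
            break
    return np.sign(x) * np.maximum(np.abs(x) - hi * w, 0.0)

def project_lp_ball(x, p=0.5, radius=1.0, outer_iters=50, eps=1e-8):
    """Heuristic projection onto {y : sum_i |y_i|^p <= radius} via reweighted l1-ball projections."""
    if np.sum(np.abs(x) ** p) <= radius:
        return x.copy()
    y = x.copy()
    for _ in range(outer_iters):
        w = p * (np.abs(y) + eps) ** (p - 1.0)                           # gradient of |t|^p at the iterate
        r = radius - np.sum(np.abs(y) ** p) + np.sum(w * np.abs(y))      # linearized constraint level
        y = project_weighted_l1(x, w, max(r, 0.0))
    return y

x = np.random.default_rng(0).normal(size=20)
y = project_lp_ball(x, p=0.5, radius=2.0)
print(np.sum(np.abs(y) ** 0.5))   # approximately on the lp-ball boundary
```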
[33]:Practical Blind Membership Inference Attack via Differential Comparisons
标题:基于差分比较的实用盲成员推理攻击
作者:Bo Hui, Yuchen Yang, Haolin Yuan, Philippe Burlina, Neil Zhenqiang Gong, Yinzhi Cao
链接:https://arxiv.org/abs/2101.01341
摘要:Membership inference (MI) attacks affect user privacy by inferring whether given data samples have been used to train a target learning model, e.g., a deep neural network. There are two types of MI attacks in the literature, i.e., those with and without shadow models. The success of the former heavily depends on the quality of the shadow model, i.e., the transferability between the shadow and the target; the latter, given only black-box probing access to the target model, cannot make an effective inference of unknowns, compared with MI attacks using shadow models, due to the insufficient number of qualified samples labeled with ground truth membership information.
In this paper, we propose an MI attack, called BlindMI, which probes the target model and extracts membership semantics via a novel approach, called differential comparison. The high-level idea is that BlindMI first generates a dataset with nonmembers via transforming existing samples into new samples, and then differentially moves samples from a target dataset to the generated, non-member set in an iterative manner. If the differential move of a sample increases the set distance, BlindMI considers the sample as non-member and vice versa.
BlindMI was evaluated by comparing it with state-of-the-art MI attack algorithms. Our evaluation shows that BlindMI improves F1-score by nearly 20\% when compared to state-of-the-art on some datasets, such as Purchase-50 and Birds-200, in the blind setting where the adversary does not know the target model's architecture and the target dataset's ground truth labels. We also show that BlindMI can defeat state-of-the-art defenses.
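A toy sketch of the differential-comparison rule described above, using an RBF-kernel MMD as the set distance over the target model's output probabilities; the kernel choice, function names, and the one-pass (non-iterative) loop are my own simplifications, whereas BlindMI's full pipeline also generates the non-member set and refines decisions iteratively.

```python
import numpy as np

def rbf_mmd2(X, Y, gamma=1.0):
    """Squared MMD between sample sets X and Y with an RBF kernel."""
    def k(A, B):
        d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
        return np.exp(-gamma * d2)
    return k(X, X).mean() + k(Y, Y).mean() - 2 * k(X, Y).mean()

def differential_membership(target_probs, nonmember_probs, gamma=1.0):
    """Apply the abstract's rule: if moving a sample from the target set into the
    non-member set increases the set distance, label it non-member; otherwise member."""
    base = rbf_mmd2(target_probs, nonmember_probs, gamma)
    labels = []
    for i in range(len(target_probs)):
        rest = np.delete(target_probs, i, axis=0)
        moved = np.vstack([nonmember_probs, target_probs[i:i + 1]])
        labels.append(0 if rbf_mmd2(rest, moved, gamma) > base else 1)
    return np.array(labels)   # 1 = predicted member, 0 = predicted non-member
```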
[34]:Stochastic Optimization for Vaccine and Testing Kit Allocation for the COVID-19 Pandemic
标题:COVID-19大流行疫苗和试剂盒分配的随机优化
作者:Lawrence Thul, Warren Powell
链接:https://arxiv.org/abs/2101.01204
摘要:The pandemic caused by the SARS-CoV-2 virus has exposed many flaws in the decision-making strategies used to distribute resources to combat global health crises. In this paper, we leverage reinforcement learning and optimization to improve upon the allocation strategies for various resources. In particular, we consider a problem where a central controller must decide where to send testing kits to learn about the uncertain states of the world (active learning); then, use the new information to construct beliefs about the states and decide where to allocate resources. We propose a general model coupled with a tunable lookahead policy for making vaccine allocation decisions without perfect knowledge about the state of the world. The lookahead policy is compared to a population-based myopic policy which is more likely to be similar to the present strategies in practice. Each vaccine allocation policy works in conjunction with a testing kit allocation policy to perform active learning. Our simulation results demonstrate that an optimization-based lookahead decision making strategy will outperform the presented myopic policy.
[35]:Control of Stochastic Quantum Dynamics with Differentiable Programming
标题:随机量子动力学的可微规划控制
作者:Frank Schäfer, Pavel Sekatski, Martin Koppenhöfer, Christoph Bruder, Michal Kloc
备注:18+16 pages, 5+2 figures
链接:https://arxiv.org/abs/2101.01190
摘要:Controlling stochastic dynamics of a quantum system is an indispensable task in fields such as quantum information processing and metrology. Yet, there is no general ready-made approach to design efficient control strategies. Here, we propose a framework for the automated design of control schemes based on differentiable programming ($\partial \mathrm{P}$). We apply this approach to state preparation and stabilization of a qubit subjected to homodyne detection. To this end, we formulate the control task as an optimization problem where the loss function quantifies the distance from the target state and we employ neural networks (NNs) as controllers. The system's time evolution is governed by a stochastic differential equation (SDE). To implement efficient training, we backpropagate the gradient information from the loss function through the SDE solver using adjoint sensitivity methods. As a first example, we feed the quantum state to the controller and focus on different methods to obtain gradients. As a second example, we directly feed the homodyne detection signal to the controller. The instantaneous value of the homodyne current contains only very limited information on the actual state of the system, covered in unavoidable photon-number fluctuations. Despite the resulting poor signal-to-noise ratio, we can train our controller to prepare and stabilize the qubit to a target state with a mean fidelity around 85%. We also compare the solutions found by the NN to a hand-crafted control strategy.
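A minimal sketch of the differentiable-programming recipe in this abstract, on a toy one-dimensional controlled SDE rather than the qubit/homodyne model: a neural controller is trained by backpropagating the terminal loss through an Euler-Maruyama rollout (plain autograd here, not the adjoint sensitivity methods used in the paper).

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
controller = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))
opt = torch.optim.Adam(controller.parameters(), lr=1e-2)
dt, steps, sigma, target = 0.01, 200, 0.1, 1.0

for epoch in range(300):
    x = torch.zeros(64, 1)                            # batch of trajectories
    for _ in range(steps):                            # differentiable Euler-Maruyama rollout
        u = controller(x)                             # feedback control from the current state
        noise = sigma * torch.randn_like(x) * dt ** 0.5
        x = x + (-x + u) * dt + noise                 # dx = (-x + u) dt + sigma dW
    loss = ((x - target) ** 2).mean()                 # distance from the target state at final time
    opt.zero_grad(); loss.backward(); opt.step()
```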
[36]:CNN-Driven Quasiconformal Model for Large Deformation Image Registration
标题:CNN驱动的准共形大变形图像配准模型
作者:Ho Law, Gary P. T. Choi, Ka Chun Lam, Lok Ming Lui
链接:https://arxiv.org/abs/2011.00731
摘要:Image registration has been widely studied over the past several decades, with numerous applications in science, engineering and medicine. Most of the conventional mathematical models for large deformation image registration rely on prescribed landmarks, which usually require tedious manual labeling and are prone to error. In recent years, there has been a surge of interest in the use of machine learning for image registration. However, most learning-based methods cannot ensure the bijectivity of the registration, which makes it difficult to establish a 1-1 correspondence between the images. In this paper, we develop a novel method for large deformation image registration by a fusion of convolutional neural network (CNN) and quasiconformal theory. More specifically, we propose a new fidelity term for incorporating the CNN features in our quasiconformal energy minimization model, which enables us to obtain meaningful registration results without prescribing any landmarks. Moreover, unlike other learning-based methods, the bijectivity of our method is guaranteed by quasiconformal theory. Experimental results are presented to demonstrate the effectiveness of the proposed method. More broadly, our work sheds light on how rigorous mathematical theories and practical machine learning approaches can be integrated for developing computational methods with improved performance.
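A generic illustration of the quasiconformal bijectivity criterion referred to above: the Beltrami coefficient mu = f_zbar / f_z of a planar deformation, computed by finite differences; a map is folding-free (locally bijective) where |mu| < 1. The grid, example deformation, and function name are my own, not the authors' model.

```python
import numpy as np

def beltrami_coefficient(u, v):
    """Beltrami coefficient of the planar map f(x, y) = u(x, y) + i v(x, y) on a unit grid.
    mu = f_zbar / f_z, with f_z = (f_x - i f_y) / 2 and f_zbar = (f_x + i f_y) / 2."""
    f = u + 1j * v
    f_y, f_x = np.gradient(f)          # derivatives along axis 0 (rows, y) and axis 1 (columns, x)
    f_z = 0.5 * (f_x - 1j * f_y)
    f_zbar = 0.5 * (f_x + 1j * f_y)
    return f_zbar / (f_z + 1e-12)

# Identity plus a small smooth perturbation stays quasiconformal (|mu| < 1 everywhere).
y, x = np.mgrid[0:64, 0:64].astype(float)
u = x + 2.0 * np.sin(x / 10.0)
v = y + 1.0 * np.cos(y / 12.0)
mu = beltrami_coefficient(u, v)
print("max |mu| =", np.abs(mu).max())   # < 1 means the discrete map is folding-free
```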
CV方向重复(10篇)
[1]:Robust R-Peak Detection in Low-Quality Holter ECGs using 1D Convolutional Neural Network
标题:一维卷积神经网络在低质量Holter心电图R峰检测中的应用
作者:Muhammad Uzair Zahid, Serkan Kiranyaz, Turker Ince, Ozer Can Devecioglu, Muhammad E. H. Chowdhury, Amith Khandakar, Anas Tahir, Moncef Gabbouj
链接:https://arxiv.org/abs/2101.01666
摘要:Noise and low quality of ECG signals acquired from Holter or wearable devices deteriorate the accuracy and robustness of R-peak detection algorithms. This paper presents a generic and robust system for R-peak detection in Holter ECG signals. While many proposed algorithms have successfully addressed the problem of ECG R-peak detection, there is still a notable gap in the performance of these detectors on such low-quality ECG records. Therefore, in this study, a novel implementation of the 1D Convolutional Neural Network (CNN) is used, integrated with a verification model to reduce the number of false alarms. This CNN architecture consists of an encoder block and a corresponding decoder block followed by a sample-wise classification layer to construct the 1D segmentation map of R-peaks from the input ECG signal. Once the proposed model has been trained, it can be used on its own to detect R-peaks in a single-channel ECG data stream quickly and accurately, or alternatively, such a solution can be conveniently employed for real-time monitoring on a lightweight portable device. The model is tested on two open-access ECG databases: the China Physiological Signal Challenge (2020) database (CPSC-DB) with more than one million beats, and the commonly used MIT-BIH Arrhythmia Database (MIT-DB). Experimental results demonstrate that the proposed systematic approach achieves a 99.30% F1-score, 99.69% recall, and 98.91% precision on CPSC-DB, which is the best R-peak detection performance ever achieved. Compared to all competing methods, the proposed approach can reduce the false-positives and false-negatives in Holter ECG signals by more than 54% and 82%, respectively. Results also demonstrate similar or better performance than most competing algorithms on MIT-DB with a 99.83% F1-score, 99.85% recall, and 99.82% precision.
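A minimal sketch of the encoder-decoder-plus-sample-wise-classification idea described above; the layer sizes, kernel widths, and depth are placeholders, not the paper's architecture or its verification model.

```python
import torch
import torch.nn as nn

class RPeakSegmenter(nn.Module):
    """Tiny 1D encoder-decoder producing one R-peak probability per ECG sample."""
    def __init__(self, ch=16):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(1, ch, 9, stride=2, padding=4), nn.ReLU(),
            nn.Conv1d(ch, 2 * ch, 9, stride=2, padding=4), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose1d(2 * ch, ch, 9, stride=2, padding=4, output_padding=1), nn.ReLU(),
            nn.ConvTranspose1d(ch, 1, 9, stride=2, padding=4, output_padding=1),
        )

    def forward(self, x):                      # x: (batch, 1, n_samples)
        return torch.sigmoid(self.decoder(self.encoder(x)))

model = RPeakSegmenter()
ecg = torch.randn(2, 1, 1024)                  # two dummy single-channel ECG windows
probs = model(ecg)                             # (2, 1, 1024) sample-wise R-peak probabilities
```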
[2]:Learning the Predictability of the Future
标题:学习未来的可预测性
作者:Dídac Surís, Ruoshi Liu, Carl Vondrick
备注:Website: this https URL
链接:https://arxiv.org/abs/2101.01600
摘要:We introduce a framework for learning from unlabeled video what is predictable in the future. Instead of committing up front to features to predict, our approach learns from data which features are predictable. Based on the observation that hyperbolic geometry naturally and compactly encodes hierarchical structure, we propose a predictive model in hyperbolic space. When the model is most confident, it will predict at a concrete level of the hierarchy, but when the model is not confident, it learns to automatically select a higher level of abstraction. Experiments on two established datasets show the key role of hierarchical representations for action prediction. Although our representation is trained with unlabeled video, visualizations show that action hierarchies emerge in the representation.
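For reference, the Poincare-ball distance commonly used in hyperbolic representation learning, where points near the boundary behave like concrete leaves of a hierarchy and points near the origin like abstract ancestors; this is a generic formula, not the paper's specific model or training objective.

```python
import numpy as np

def poincare_distance(u, v, eps=1e-7):
    """Geodesic distance between points u, v inside the unit Poincare ball."""
    uu = np.clip(np.sum(u * u, axis=-1), 0.0, 1.0 - eps)
    vv = np.clip(np.sum(v * v, axis=-1), 0.0, 1.0 - eps)
    duv = np.sum((u - v) ** 2, axis=-1)
    return np.arccosh(1.0 + 2.0 * duv / ((1.0 - uu) * (1.0 - vv)))

# Euclidean-close points near the boundary are far apart in hyperbolic distance,
# so confident, concrete predictions can live near the boundary and abstract ones near the origin.
a, b = np.array([0.0, 0.0]), np.array([0.01, 0.0])
c, d = np.array([0.98, 0.0]), np.array([0.99, 0.0])
print(poincare_distance(a, b), poincare_distance(c, d))
```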
[3]:Density Compensated Unrolled Networks for Non-Cartesian MRI Reconstruction
标题:密度补偿展开网络在非笛卡尔磁共振重建中的应用
作者:Zaccharie Ramzi, Philippe Ciuciu, Jean-Luc Starck
链接:https://arxiv.org/abs/2101.01570
摘要:Deep neural networks have recently been thoroughly investigated as a powerful tool for MRI reconstruction. There is, however, a lack of research regarding their use for a specific setting of MRI, namely non-Cartesian acquisitions. In this work, we introduce a novel kind of deep neural network to tackle this problem, namely density compensated unrolled neural networks. We assess their efficiency on the publicly available fastMRI dataset, and perform a small ablation study. We also open source our code, in particular a Non-Uniform Fast Fourier transform for TensorFlow.
[4]:On the Control of Attentional Processes in Vision
标题:视觉注意过程的控制
作者:John K. Tsotsos, Omar Abid, Iuliia Kotseruba, Markus D. Solbach
链接:https://arxiv.org/abs/2101.01533
摘要:The study of attentional processing in vision has a long and deep history. Recently, several papers have presented insightful perspectives into how the coordination of multiple attentional functions in the brain might occur. These begin with experimental observations and the authors propose structures, processes, and computations that might explain those observations. Here, we consider a perspective that past works have not, as a complementary approach to the experimentally-grounded ones. We approach the same problem as past authors but from the other end of the computational spectrum, from the problem nature, as Marr's Computational Level would prescribe. What problem must the brain solve when orchestrating attentional processes in order to successfully complete one of the myriad possible visuospatial tasks at which we as humans excel? The hope, of course, is for the approaches to eventually meet and thus form a complete theory, but this is likely not soon. We make the first steps towards this by addressing the necessity of attentional control, examining the breadth and computational difficulty of the visuospatial and attentional tasks seen in human behavior, and suggesting a sketch of how attentional control might arise in the brain. The key conclusions of this paper are that an executive controller is necessary for human attentional function in vision, and that there is a 'first principles' computational approach to its understanding that is complementary to the previous approaches that focus on modelling or learning from experimental observations directly.
[5]:WildDeepfake: A Challenging Real-World Dataset for Deepfake Detection
标题:WildDeepfake:一个具有挑战性的Deepfake检测真实世界数据集
作者:Bojia Zi, Minghao Chang, Jingjing Chen, Xingjun Ma, Yu-Gang Jiang
链接:https://arxiv.org/abs/2101.01456
摘要:In recent years, the abuse of a face swap technique called deepfake has raised enormous public concerns. So far, a large number of deepfake videos (known as "deepfakes") have been crafted and uploaded to the internet, calling for effective countermeasures. One promising countermeasure against deepfakes is deepfake detection. Several deepfake datasets have been released to support the training and testing of deepfake detectors, such as DeepfakeDetection and FaceForensics++. While this has greatly advanced deepfake detection, most of the real videos in these datasets are filmed with a few volunteer actors in limited scenes, and the fake videos are crafted by researchers using a few popular deepfake software tools. Detectors developed on these datasets may become less effective against real-world deepfakes on the internet. To better support detection against real-world deepfakes, in this paper, we introduce a new dataset WildDeepfake, which consists of 7,314 face sequences extracted from 707 deepfake videos collected completely from the internet. WildDeepfake is a small dataset that can be used, in addition to existing datasets, to develop and test the effectiveness of deepfake detectors against real-world deepfakes. We conduct a systematic evaluation of a set of baseline detection networks on both existing and our WildDeepfake datasets, and show that WildDeepfake is indeed a more challenging dataset, where the detection performance can decrease drastically. We also propose two (2D and 3D) Attention-based Deepfake Detection Networks (ADDNets) to leverage the attention masks on real/fake faces for improved detection. We empirically verify the effectiveness of ADDNets on both existing datasets and WildDeepfake. The dataset is available at: this https URL.
[6]:Support Vector Machine and YOLO for a Mobile Food Grading System
标题:基于支持向量机和YOLO的移动食品分级系统
作者:Lili Zhu, Petros Spachos
链接:https://arxiv.org/abs/2101.01418
摘要:Food quality and safety are of great concern to society since they are an essential guarantee not only for human health but also for social development and stability. Ensuring food quality and safety is a complex process. All food processing stages should be considered, from cultivating, harvesting and storage to preparation and consumption. Grading is one of the essential processes to control food quality. This paper proposes a mobile visual-based system to evaluate food grading. Specifically, the proposed system acquires images of bananas when they are on moving conveyors. A two-layer image processing system based on machine learning is used to grade bananas, and these two layers are allocated on edge devices and cloud servers, respectively. A Support Vector Machine (SVM) is the first layer, classifying bananas based on an extracted feature vector composed of color and texture features. Then, a You Only Look Once (YOLO) v3 model further locates the peel's defective area and determines whether the inputs belong to the mid-ripened or well-ripened class. According to experimental results, the first layer achieves an accuracy of 98.5%, the accuracy of the second layer is 85.7%, and the overall accuracy is 96.4%.
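A hedged sketch of the first layer only: an SVM over a handcrafted color/texture feature vector. The feature extractor, labels, and data below are toy placeholders, and the YOLOv3 second layer is omitted.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def color_texture_features(img):
    """Toy feature vector: per-channel color means/stds plus a gradient-energy texture cue."""
    img = img.astype(float)
    means = img.reshape(-1, 3).mean(axis=0)
    stds = img.reshape(-1, 3).std(axis=0)
    gray = img.mean(axis=2)
    gy, gx = np.gradient(gray)
    texture = np.array([np.mean(gx ** 2 + gy ** 2)])
    return np.concatenate([means, stds, texture])

# Hypothetical training data: images labeled by ripeness grade (placeholders, not the paper's dataset).
rng = np.random.default_rng(0)
images = rng.uniform(0, 255, size=(40, 64, 64, 3))
labels = rng.integers(0, 3, size=40)                 # e.g. 0=unripe, 1=mid-ripened, 2=well-ripened
X = np.stack([color_texture_features(im) for im in images])
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf")).fit(X, labels)
print(clf.predict(X[:5]))
```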
[7]:Relaxed Conditional Image Transfer for Semi-supervised Domain Adaptation
标题:半监督域自适应的松弛条件图像传输
作者:Qijun Luo, Zhili Liu, Lanqing Hong, Chongxuan Li, Kuo Yang, Liyuan Wang, Fengwei Zhou, Guilin Li, Zhenguo Li, Jun Zhu
链接:https://arxiv.org/abs/2101.01400
摘要:Semi-supervised domain adaptation (SSDA), which aims to learn models in a partially labeled target domain with the assistance of the fully labeled source domain, attracts increasing attention in recent years. To explicitly leverage the labeled data in both domains, we naturally introduce a conditional GAN framework to transfer images without changing the semantics in SSDA. However, we identify a label-domination problem in such an approach. In fact, the generator tends to overlook the input source image and only memorizes prototypes of each class, which results in unsatisfactory adaptation performance. To this end, we propose a simple yet effective Relaxed conditional GAN (Relaxed cGAN) framework. Specifically, we feed the image without its label to our generator. In this way, the generator has to infer the semantic information of input data. We formally prove that its equilibrium is desirable and empirically validate its practical convergence and effectiveness in image transfer. Additionally, we propose several techniques to make use of unlabeled data in the target domain, enhancing the model in SSDA settings. We validate our method on the well-adopted datasets: Digits, DomainNet, and Office-Home. We achieve state-of-the-art performance on DomainNet, Office-Home and most digit benchmarks in low-resource and high-resource settings.
[8]:Understanding the Ability of Deep Neural Networks to Count Connected Components in Images
标题:理解深度神经网络计数图像中连通分量的能力
作者:Shuyue Guan, Murray Loew
备注:7 pages, 12 figures. Accepted by IEEE AIPR 2020 (Oral)
链接:https://arxiv.org/abs/2101.01386
摘要:Humans can count very fast by subitizing, but slow substantially as the number of objects increases. Previous studies have shown that a trained deep neural network (DNN) detector can count the number of objects in an amount of time that increases slowly with the number of objects. Such a phenomenon suggests the subitizing ability of DNNs, and unlike humans, it works equally well for large numbers. Many existing studies have successfully applied DNNs to object counting, but few have studied the subitizing ability of DNNs and its interpretation. In this paper, we found DNNs do not have the ability to generally count connected components. We provided experiments to support our conclusions and explanations to help understand the results and phenomena of these experiments. We proposed three ML-learnable characteristics to verify learnable problems for ML models, such as DNNs, and explain why DNNs work for specific counting problems but cannot generally count connected components.
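A small sketch of the probing setup such a study needs: synthetic binary images whose ground-truth label is the number of connected components (computed here with scipy), which a counting DNN would then be asked to predict. The image generator and its parameters are my own choices, not the paper's experimental design.

```python
import numpy as np
from scipy import ndimage

def random_components_image(size=64, n_blobs=8, rng=None):
    """Binary image with a few square blobs; the label is its number of connected components."""
    rng = rng or np.random.default_rng()
    img = np.zeros((size, size), dtype=np.uint8)
    for _ in range(n_blobs):
        r, c = rng.integers(0, size - 6, size=2)
        img[r:r + 5, c:c + 5] = 1                  # blobs may merge, so the count is recomputed below
    _, count = ndimage.label(img)                  # 4-connectivity by default
    return img, count

rng = np.random.default_rng(0)
data = [random_components_image(rng=rng) for _ in range(1000)]
print("example component counts:", [c for _, c in data[:10]])
```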
[9]:Semantic Video Segmentation for Intracytoplasmic Sperm Injection Procedures
标题:胞浆内单精子注射过程的语义视频分割
作者:Peter He, Raksha Jain, Jérôme Chambost, Céline Jacques, Cristina Hickman
备注:Accepted at the 'Medical Imaging meets NeurIPS Workshop' at the 34th Conference on Neural Information Processing Systems
链接:https://arxiv.org/abs/2101.01207
摘要:We present the first deep learning model for the analysis of intracytoplasmic sperm injection (ICSI) procedures. Using a dataset of ICSI procedure videos, we train a deep neural network to segment key objects in the videos achieving a mean IoU of 0.962, and to localize the needle tip achieving a mean pixel error of 3.793 pixels at 14 FPS on a single GPU. We further analyze the variation between the dataset's human annotators and find the model's performance to be comparable to human experts.
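For reference, minimal implementations of the two reported metrics, mean IoU over segmentation classes and the Euclidean pixel error of the needle-tip localization; shapes and the example values are illustrative only.

```python
import numpy as np

def mean_iou(pred, target, n_classes):
    """Mean intersection-over-union across classes for integer-labeled segmentation maps."""
    ious = []
    for c in range(n_classes):
        inter = np.logical_and(pred == c, target == c).sum()
        union = np.logical_or(pred == c, target == c).sum()
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious))

def tip_pixel_error(pred_xy, true_xy):
    """Euclidean distance in pixels between predicted and annotated needle-tip positions."""
    return float(np.linalg.norm(np.asarray(pred_xy) - np.asarray(true_xy)))

pred = np.random.default_rng(0).integers(0, 3, size=(128, 128))
print(mean_iou(pred, pred, 3), tip_pixel_error((40.0, 52.0), (42.0, 50.0)))
```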
[10]:Advances in Electron Microscopy with Deep Learning
标题:电子显微镜与深度学习的进展
作者:Jeffrey M. Ede
备注:295 pages, phd thesis, 100 figures + 12 tables, papers are compressed
链接:https://arxiv.org/abs/2101.01178
摘要:This doctoral thesis covers some of my advances in electron microscopy with deep learning. Highlights include a comprehensive review of deep learning in electron microscopy; large new electron microscopy datasets for machine learning, dataset search engines based on variational autoencoders, and automatic data clustering by t-distributed stochastic neighbour embedding; adaptive learning rate clipping to stabilize learning; generative adversarial networks for compressed sensing with spiral, uniformly spaced and other fixed sparse scan paths; recurrent neural networks trained to piecewise adapt sparse scan paths to specimens by reinforcement learning; improving signal-to-noise; and conditional generative adversarial networks for exit wavefunction reconstruction from single transmission electron micrographs. This thesis adds to my publications by presenting their relationships, reflections, and holistic conclusions. This copy of my thesis is typeset for online dissemination to improve readability, whereas the thesis submitted to the University of Warwick in support of my application for the degree of Doctor of Philosophy in Physics will be typeset for physical printing and binding.
NLP方向重复(2篇)
[1]:Local Translation Services for Neglected Languages
标题:面向被忽视语言的本地翻译服务
作者:David Noever, Josh Kalin, Matt Ciolino, Dom Hambrick, Gerry Dozier
链接:https://arxiv.org/abs/2101.01628
摘要:Taking advantage of computationally lightweight but high-quality translators prompts consideration of new applications that address neglected languages. Locally run translators for less popular languages may assist data projects with protected or personal data that may require specific compliance checks before posting to a public translation API, but which could render reasonable, cost-effective solutions if done with an army of local, small-scale pair translators. Like handling a specialist's dialect, this research illustrates translating two historically interesting, but obfuscated languages: 1) hacker-speak ("l33t") and 2) reverse (or "mirror") writing as practiced by Leonardo da Vinci. The work generalizes a deep learning architecture to translatable variants of hacker-speak with lite, medium, and hard vocabularies. The original contribution highlights a fluent translator of hacker-speak in under 50 megabytes and demonstrates a generator for augmenting future datasets with greater than a million bilingual sentence pairs. The long short-term memory, recurrent neural network (LSTM-RNN) extends previous work demonstrating an English-to-foreign translation service built from as little as 10,000 bilingual sentence pairs. This work further solves the equivalent translation problem in twenty-six additional (non-obfuscated) languages and rank orders those models and their proficiency quantitatively, with Italian as the most successful and Mandarin Chinese as the most challenging. For neglected languages, the method prototypes novel services for smaller niche translations such as Kabyle (Algerian dialect), which has between 5-7 million speakers but which, for most enterprise translators, has not yet reached development. One anticipates the extension of this approach to other important dialects, such as translating technical (medical or legal) jargon and processing health records.
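A toy sketch of how bilingual pairs for an obfuscated "language" such as hacker-speak can be mass-generated from a monolingual corpus by a deterministic character mapping, which is the kind of generator the abstract describes; the "lite" table below is my own choice, not the paper's vocabularies.

```python
# A minimal "lite" leetspeak table (my own choice) for generating English <-> l33t training pairs.
LITE_MAP = {"a": "4", "e": "3", "i": "1", "o": "0", "s": "5", "t": "7"}

def to_l33t(text: str) -> str:
    """Map each character through the table, leaving unmapped characters untouched."""
    return "".join(LITE_MAP.get(ch.lower(), ch) for ch in text)

def make_pairs(sentences):
    """Turn a monolingual corpus into (source, target) pairs for a seq2seq translator."""
    return [(s, to_l33t(s)) for s in sentences]

pairs = make_pairs(["the quick brown fox", "attack at dawn"])
print(pairs)   # [('the quick brown fox', '7h3 qu1ck br0wn f0x'), ('attack at dawn', '4774ck 47 d4wn')]
```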
[2]:Integration of Domain Knowledge using Medical Knowledge Graph Deep Learning for Cancer Phenotyping
标题:利用医学知识图谱深度学习整合领域知识进行癌症表型分析
作者:Mohammed Alawad, Shang Gao, Mayanka Chandra Shekar, S.M.Shamimul Hasan, J. Blair Christian, Xiao-Cheng Wu, Eric B. Durbin, Jennifer Doherty, Antoinette Stroup, Linda Coyle, Lynne Penberthy, Georgia Tourassi
链接:https://arxiv.org/abs/2101.01337
摘要:A key component of deep learning (DL) for natural language processing (NLP) is word embeddings. Word embeddings that effectively capture the meaning and context of the word that they represent can significantly improve the performance of downstream DL models for various NLP tasks. Many existing word embedding techniques capture the context of words based on word co-occurrence in documents and text; however, they often cannot capture broader domain-specific relationships between concepts that may be crucial for the NLP task at hand. In this paper, we propose a method to integrate external knowledge from medical terminology ontologies into the context captured by word embeddings. Specifically, we use a medical knowledge graph, such as the unified medical language system (UMLS), to find connections between clinical terms in cancer pathology reports. This approach aims to minimize the distance between connected clinical concepts. We evaluate the proposed approach using a Multitask Convolutional Neural Network (MT-CNN) to extract six cancer characteristics -- site, subsite, laterality, behavior, histology, and grade -- from a dataset of ~900K cancer pathology reports. The results show that the MT-CNN model which uses our domain informed embeddings outperforms the same MT-CNN using standard word2vec embeddings across all tasks, with an improvement in the overall micro- and macro-F1 scores by 4.97\% and 22.5\%, respectively.
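One common way to "minimize the distance between connected clinical concepts" is a retrofitting-style update that pulls each term's vector toward its graph neighbours while anchoring it to the original embedding. The sketch below illustrates that general idea on a hypothetical toy graph; it is not necessarily the exact mechanism used in the paper, and the UMLS edges are invented.

```python
import numpy as np

def retrofit(embeddings, edges, alpha=1.0, beta=1.0, n_iter=10):
    """Pull connected terms' vectors together while staying close to the originals.
    embeddings: dict term -> np.ndarray; edges: dict term -> list of connected terms."""
    new = {t: v.copy() for t, v in embeddings.items()}
    for _ in range(n_iter):
        for term, neighbours in edges.items():
            nbrs = [n for n in neighbours if n in new]
            if not nbrs:
                continue
            nbr_sum = np.sum([new[n] for n in nbrs], axis=0)
            new[term] = (alpha * embeddings[term] + beta * nbr_sum) / (alpha + beta * len(nbrs))
    return new

# Hypothetical toy graph standing in for UMLS connections between clinical terms.
rng = np.random.default_rng(0)
emb = {t: rng.normal(size=50) for t in ["adenocarcinoma", "carcinoma", "lung"]}
edges = {"adenocarcinoma": ["carcinoma"], "carcinoma": ["adenocarcinoma"]}
emb2 = retrofit(emb, edges)
print(np.linalg.norm(emb["adenocarcinoma"] - emb["carcinoma"]),
      np.linalg.norm(emb2["adenocarcinoma"] - emb2["carcinoma"]))   # distance shrinks after retrofitting
```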
中文来自机器翻译,仅供参考。