Multi-class && Multi-label classification

Posted 2020-11-19 ranjiewen

Multi-label classification with Keras

In today’s blog post you learned how to perform multi-label classification with Keras.

Performing multi-label classification with Keras is straightforward and includes two primary steps:

Replace the softmax activation at the end of your network with a sigmoid activation
Swap out categorical cross-entropy for binary cross-entropy for your loss function

From there you can train your network as you normally would.
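
To make the two steps concrete, here is a minimal pure-Python sketch of what the swap changes numerically. The logits and the three label names are made-up values for illustration, and the helper functions are hypothetical stand-ins for what a framework computes internally, not Keras code.

```python
import math

def sigmoid(z):
    """Squash one logit into an independent probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def softmax(logits):
    """Turn logits into probabilities that compete and sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean of the per-label binary cross-entropy terms."""
    losses = []
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        losses.append(-(t * math.log(p) + (1 - t) * math.log(1.0 - p)))
    return sum(losses) / len(losses)

# Hypothetical raw scores for the labels "red", "shirt", "pants".
logits = [2.0, 2.0, -3.0]
target = [1, 1, 0]  # the image is a red shirt

soft = softmax(logits)              # "red" and "shirt" must share the mass
sig = [sigmoid(z) for z in logits]  # each label is scored on its own
loss = binary_cross_entropy(target, sig)
```

With softmax, the two correct labels split the probability mass and neither reaches 0.5; with sigmoid, both can be close to 1 at the same time. In Keras this corresponds to ending the network with `Dense(n_labels, activation="sigmoid")` and compiling with `loss="binary_crossentropy"`.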

The end result of applying the process above is a multi-label classifier.

You can use your Keras multi-label classifier to predict multiple labels with just a single forward pass.
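
Turning the output of that single forward pass into a set of labels is usually a thresholding step. The sketch below assumes a cutoff of 0.5, and the class names and probabilities are hypothetical values, not the output of a real model.

```python
def decode_labels(probs, class_names, threshold=0.5):
    """Keep every label whose sigmoid probability reaches the threshold."""
    return [name for name, p in zip(class_names, probs) if p >= threshold]

# Hypothetical sigmoid outputs from one forward pass over one image.
class_names = ["black", "blue", "red", "dress", "jeans", "shirt"]
probs = [0.03, 0.10, 0.97, 0.02, 0.08, 0.91]

predicted = decode_labels(probs, class_names)
```

The threshold is a design choice: lowering it trades precision for recall, and it can be tuned per label on a validation set.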

However, there is a difficulty you need to consider:

You need training data for each combination of categories you would like to predict.

Just like a neural network cannot predict classes it was never trained on, your neural network cannot predict multiple class labels for combinations it has never seen. This behavior comes from the activations of neurons inside the network.

If your network is trained on examples of both (1) black pants and (2) red shirts and you now want to predict “red pants” (with no “red pants” images in your dataset), the neurons responsible for detecting “red” and “pants” will fire. But because the network has never seen this combination of activations reach the fully-connected layers, your output predictions will very likely be incorrect (i.e., you may get “red” or “pants”, but very unlikely both).

Again, your network cannot correctly make predictions on data it was never trained on (and you shouldn’t expect it to either). Keep this caveat in mind when training your own Keras networks for multi-label classification.

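A classifier learned for multiclass classification assigns exactly one class to a new instance. Two strategies are common: compute a measure for all classes at once, based on posterior probability or distance, and predict the class with the largest value; or decompose the multiclass problem into many binary classification problems and then combine the results of all the binary classifiers.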

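A multilabel classification classifier assigns several classes to a new instance. This model has many practical applications: a document may belong to several categories at once, and a protein may have several functions. The labels may also be subject to dependencies or constraints, as with GO (Gene Ontology), which organizes all protein functions. Such dependencies or constraints are hierarchical and can often be described as a tree or a directed acyclic graph; the machine learning community calls this hierarchical multilabel classification. Because the model output has a hierarchical structure, hierarchical multilabel classification also belongs to another recently very active research area: structured prediction. Both hierarchical multilabel classification and structured prediction are new and challenging research areas.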
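Multiclass and multilabel classification with scikit-learn. Multilabel format: in a multilabel problem, one sample may belong to several classes at the same time, for example a news article that covers several topics. In this case the dependent variable y has to be expressed as a matrix. Multiclass classification, by contrast, means that y can take more than two possible values but each sample belongs to exactly one class. The two problems are strictly different. All scikit-learn classifiers support multiclass classification out of the box. When you need to modify an algorithm yourself, you can still use scikit-learn for the preliminary data preparation. For multiclass or multilabel problems there are two strategies for building the classifier: One-vs-All and One-vs-One.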
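
As an illustration of y expressed as a matrix, the sketch below builds a binary indicator matrix in pure Python, with one row per sample and one column per label. The news topics are made-up data and `to_indicator_matrix` is a hypothetical helper; in scikit-learn, `sklearn.preprocessing.MultiLabelBinarizer` produces the same kind of encoding.

```python
def to_indicator_matrix(label_sets):
    """Encode a list of label sets as a 0/1 matrix (rows: samples, columns: labels)."""
    classes = sorted({label for labels in label_sets for label in labels})
    column = {c: i for i, c in enumerate(classes)}
    matrix = []
    for labels in label_sets:
        row = [0] * len(classes)
        for label in labels:
            row[column[label]] = 1
        matrix.append(row)
    return classes, matrix

# Hypothetical news articles, each tagged with one or more topics.
articles = [{"politics", "economy"}, {"sports"}, {"economy"}]
classes, Y = to_indicator_matrix(articles)
# classes -> ["economy", "politics", "sports"]
# Y       -> [[1, 1, 0], [0, 0, 1], [1, 0, 0]]
```

In the multiclass case every row would contain exactly one 1, which is why a plain vector of class ids is enough there.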

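Multiclass Classification

Each sample belongs to exactly one of several classes; the classes are mutually exclusive.

Typical methods:

One-vs-All or One-vs-Rest:

Split the multiclass problem into N binary classification problems and train N binary classifiers. For class i, all samples belonging to class i are positive samples and all other samples are negative samples, so each binary classifier separates the samples of class i from the rest.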
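
The one-vs-rest decomposition can be sketched as follows. The binary learner here is a deliberately simple centroid-based scorer chosen only to keep the example self-contained; that choice, the helper names, and the toy data are all assumptions for illustration, and any binary classifier that outputs a score could take its place.

```python
def centroid(points):
    """Coordinate-wise mean of a list of points."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def fit_one_vs_rest(X, y):
    """Train one binary model per class: class i (positive) vs. everything else (negative)."""
    models = {}
    for cls in sorted(set(y)):
        pos = [x for x, label in zip(X, y) if label == cls]
        neg = [x for x, label in zip(X, y) if label != cls]
        models[cls] = (centroid(pos), centroid(neg))
    return models

def predict_one_vs_rest(models, x):
    """Score x with every binary model and return the class with the highest score."""
    scores = {
        cls: sq_dist(x, neg_c) - sq_dist(x, pos_c)  # larger means closer to the positives
        for cls, (pos_c, neg_c) in models.items()
    }
    return max(scores, key=scores.get)

# Toy data: three well-separated classes in 2D.
X = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0), (10, 1)]
y = ["a", "a", "b", "b", "c", "c"]

ovr = fit_one_vs_rest(X, y)
```

With N = 3 classes this trains exactly 3 binary models, and a new point is assigned to the class whose model is most confident.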

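One-vs-One or All-vs-All:

Train N(N-1)/2 binary classifiers, one for each pair of classes (i, j); each classifier only distinguishes between the two classes of its pair.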
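
A minimal sketch of the pairwise scheme, under the same illustrative assumptions as before: the nearest-centroid pairwise learner, the majority-vote combination rule, the helper names, and the toy data are all choices made for this example. Since each unordered pair of classes gets one classifier, N classes yield N(N-1)/2 classifiers.

```python
from collections import Counter
from itertools import combinations

def centroid(points):
    """Coordinate-wise mean of a list of points."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def fit_one_vs_one(X, y):
    """Train one binary model per unordered class pair, using only that pair's samples."""
    classes = sorted(set(y))
    models = {}
    for a, b in combinations(classes, 2):
        pts_a = [x for x, label in zip(X, y) if label == a]
        pts_b = [x for x, label in zip(X, y) if label == b]
        models[(a, b)] = (centroid(pts_a), centroid(pts_b))
    return models

def predict_one_vs_one(models, x):
    """Let every pairwise model vote and return the class with the most votes."""
    votes = Counter()
    for (a, b), (cent_a, cent_b) in models.items():
        winner = a if sq_dist(x, cent_a) <= sq_dist(x, cent_b) else b
        votes[winner] += 1
    return votes.most_common(1)[0][0]

# Toy data: three well-separated classes in 2D.
X = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0), (10, 1)]
y = ["a", "a", "b", "b", "c", "c"]

ovo = fit_one_vs_one(X, y)
```

Compared with one-vs-rest, each model sees fewer samples but there are more models to train, which matters as N grows.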

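Multilabel Classification

Also called multi-label learning. Unlike multiclass classification, a sample can belong to several classes (labels) at once, and the classes may be correlated rather than mutually exclusive.

Typical methods

Problem transformation methods

The core idea of problem transformation methods is to "fit the data to existing learning algorithms". These methods process the multilabel training samples so that existing algorithms can handle them; in other words, the multilabel learning problem is converted into an existing, well-studied learning problem.

Representative algorithms include the first-order method Binary Relevance, which converts multilabel learning into binary classification problems; the second-order method Calibrated Label Ranking, which converts it into a label ranking problem; and the high-order method Random k-labelsets, which converts it into multiclass classification problems.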
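
The two simplest transformations can be sketched on the label matrix alone. The first function shows the Binary Relevance split into one binary target vector per label. The second shows the label powerset mapping, where each distinct label combination becomes one class; Random k-labelsets builds on this mapping by applying it to random subsets of k labels, which is not shown here. The function names and the toy matrix are assumptions for illustration.

```python
def binary_relevance_targets(Y):
    """Split an indicator matrix into one binary target vector per label (column)."""
    n_labels = len(Y[0])
    return [[row[j] for row in Y] for j in range(n_labels)]

def label_powerset_targets(Y):
    """Map every distinct label combination to a single multiclass id."""
    combo_to_class = {}
    targets = []
    for row in Y:
        key = tuple(row)
        if key not in combo_to_class:
            combo_to_class[key] = len(combo_to_class)  # next unused class id
        targets.append(combo_to_class[key])
    return targets, combo_to_class

# Toy indicator matrix: 4 samples, 3 labels.
Y = [
    [1, 1, 0],
    [0, 0, 1],
    [1, 1, 0],
    [1, 0, 0],
]

per_label = binary_relevance_targets(Y)
multiclass_y, mapping = label_powerset_targets(Y)
# per_label    -> [[1, 0, 1, 1], [1, 0, 1, 0], [0, 1, 0, 0]]
# multiclass_y -> [0, 1, 0, 2]
```

Binary Relevance ignores correlations between labels, while the label powerset view captures them at the cost of a class count that can grow with the number of observed combinations.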

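Algorithm adaptation methods

The core idea of algorithm adaptation methods is to "fit existing single-label algorithms to multilabel data". These methods modify traditional machine learning algorithms so that they can solve multilabel problems directly.

Representative algorithms include the first-order method ML-kNN, which adapts the lazy learning algorithm k-nearest neighbors to multilabel data; the second-order method Rank-SVM, which adapts the kernel learning algorithm SVM to multilabel data; and the high-order method LEAD, which adapts the Bayesian learning algorithm, Bayesian networks, to multilabel data.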
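
To give a feel for adapting k-nearest neighbors to multilabel data, here is a deliberately simplified sketch: each label is switched on when more than half of the k nearest neighbors carry it. This is not the full ML-kNN algorithm, which estimates prior and posterior probabilities from neighbor label counts and decides by maximum a posteriori; the majority rule, the function names, and the toy data are assumptions for illustration.

```python
def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def knn_multilabel_predict(X, Y, query, k=3):
    """Predict each label by majority vote among the k nearest training samples."""
    order = sorted(range(len(X)), key=lambda i: sq_dist(X[i], query))
    neighbors = order[:k]
    n_labels = len(Y[0])
    # Count, per label, how many neighbors carry that label.
    counts = [sum(Y[i][j] for i in neighbors) for j in range(n_labels)]
    return [1 if 2 * c > k else 0 for c in counts]

# Toy data: two clusters, three labels per sample.
X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
Y = [
    [1, 0, 1],
    [1, 0, 0],
    [1, 0, 1],
    [0, 1, 1],
    [0, 1, 0],
    [0, 1, 1],
]

near_first_cluster = knn_multilabel_predict(X, Y, (0.2, 0.2))
near_second_cluster = knn_multilabel_predict(X, Y, (5.5, 5.2))
```

Because the label decisions are made independently per label, this sketch is a first-order approach in the sense used above.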