RASA Intent Recognition Components: Classifiers
Posted by Hank0317
MitieIntentClassifier
This classifier uses MitieNLP to perform intent classification. The underlying classifier is a multi-class linear SVM with a sparse linear kernel (see the function train_text_categorizer_classifier in the MITIE trainer code). Its input is a single turn of user dialogue text, and its output is the intent of that turn together with a confidence score.
"intent": "name": "greet", "confidence": 0.98343
Note: this classifier does not rely on any featurizer, as it extracts features on its own.
The configuration is as follows:
pipeline:
- name: "MitieIntentClassifier"
LogisticRegressionClassifier
This classifier uses scikit-learn's logistic regression implementation to perform intent classification. It can consume either sparse or dense features. It outputs the recognized intent plus a ranking of intents by confidence. It is less accurate than the DIETClassifier, but it trains much faster.
"intent": "name": "greet", "confidence": 0.780,
"intent_ranking": [
"confidence": 0.780,
"name": "greet"
,
"confidence": 0.140,
"name": "goodbye"
,
"confidence": 0.080,
"name": "restaurant_search"
]
Configuration:
pipeline:
- name: LogisticRegressionClassifier
  max_iter: 100
  solver: lbfgs
  tol: 0.0001
  random_state: 42
  ranking_length: 10
Configuration parameters:
max_iter: The maximum number of iterations allowed for the solver to converge.
solver: The solver to use. For very small datasets you might consider liblinear.
tol: Tolerance for the stopping criterion of the optimizer.
random_state: Used to shuffle the data before training.
ranking_length: The number of top intents to report. Set to 0 to report all intents.
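Because the component wraps scikit-learn's LogisticRegression, its training and ranking behavior can be approximated with the following sketch. The toy data and hand-rolled featurization are stand-ins for the features that Rasa's featurizers would normally supply:

# Sketch: what LogisticRegressionClassifier does conceptually. The real
# component consumes sparse or dense features from earlier featurizers.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

examples = ["hi", "hello", "bye", "goodbye", "find me a restaurant"]
intents = ["greet", "greet", "goodbye", "goodbye", "restaurant_search"]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(examples)

# The same hyperparameters as in the configuration above.
clf = LogisticRegression(max_iter=100, solver="lbfgs", tol=0.0001,
                         random_state=42)
clf.fit(X, intents)

# predict_proba yields the confidence ranking over intents.
probs = clf.predict_proba(vectorizer.transform(["hello there"]))[0]
ranking = sorted(zip(clf.classes_, probs), key=lambda p: -p[1])
print(ranking[:10])  # top `ranking_length` intents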
SklearnIntentClassifier
The Sklearn intent classifier trains a linear SVM that is optimized using a grid search. It also provides a ranking of the labels that did not "win". SklearnIntentClassifier requires a dense featurizer earlier in the pipeline; that featurizer creates the features used for classification.
The classifier is implemented on top of scikit-learn, and its output is the predicted intent together with an intent ranking. During SVM training, a hyperparameter search is run to find the best parameter set; in the configuration you can specify the parameters to try.
"intent": "name": "greet", "confidence": 0.780,
"intent_ranking": [
"confidence": 0.780,
"name": "greet"
,
"confidence": 0.140,
"name": "goodbye"
,
"confidence": 0.080,
"name": "restaurant_search"
]
Configuration:
pipeline:
- name: "SklearnIntentClassifier"
  # Specifies the list of regularization values to
  # cross-validate over for C-SVM.
  # This is used with the ``kernel`` hyperparameter in GridSearchCV.
  C: [1, 2, 5, 10, 20, 100]
  # Specifies the kernel to use with C-SVM.
  # This is used with the ``C`` hyperparameter in GridSearchCV.
  kernels: ["linear"]
  # Gamma parameter of the C-SVM.
  "gamma": [0.1]
  # We try to find a good number of cross folds to use during
  # intent training, this specifies the max number of folds.
  "max_cross_validation_folds": 5
  # Scoring function used for evaluating the hyper parameters.
  # This can be a name or a function.
  "scoring_function": "f1_weighted"
KeywordIntentClassifier
A simple keyword-matching intent classifier, intended for small, short-term projects. This classifier works by searching a message for keywords. By default, matching is case sensitive and only exact matches of the keyword string within the user message are searched for. The keywords for an intent are the training examples of that intent in the NLU training data. This means the entire example is the keyword, not the individual words in the example.
"intent": "name": "greet", "confidence": 1.0
This classifier is intended only for small projects or for getting started.
Configuration:
pipeline:
- name: "KeywordIntentClassifier"
  case_sensitive: True
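
The matching logic is essentially an exact substring search over whole training examples. A simplified sketch (not the actual Rasa source; the keyword table is made up):

# Simplified sketch of keyword matching: every full training example is
# a keyword, and a case-sensitive substring match assigns its intent.
keywords = {
    "hey there": "greet",       # entire NLU examples act as keywords
    "goodbye": "goodbye",
}

def classify(message: str, case_sensitive: bool = True):
    text = message if case_sensitive else message.lower()
    for keyword, intent in keywords.items():
        key = keyword if case_sensitive else keyword.lower()
        if key in text:
            return {"name": intent, "confidence": 1.0}
    return None  # no keyword matched; the intent stays unset

print(classify("well hey there!"))  # {'name': 'greet', 'confidence': 1.0}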
DIETClassifier
DIET (Dual Intent and Entity Transformer) is a multi-task architecture for intent classification and entity recognition. The architecture is based on a transformer shared between the two tasks. A sequence of entity labels is predicted through a Conditional Random Field (CRF) tagging layer on top of the transformer output sequence corresponding to the input sequence of tokens. For intent labels, the transformer output for the complete utterance and the intent labels are embedded into a single semantic vector space. A dot-product loss is used to maximize the similarity with the target label and minimize similarities with negative samples.
DIET does not provide pre-trained word embeddings or pre-trained language models, but it is able to use them if they are added to the pipeline.
The classifier's output consists of entities, the intent, and an intent ranking:
"intent": "name": "greet", "confidence": 0.7800,
"intent_ranking": [
"confidence": 0.7800,
"name": "greet"
,
"confidence": 0.1400,
"name": "goodbye"
,
"confidence": 0.0800,
"name": "restaurant_search"
],
"entities": [
"end": 53,
"entity": "time",
"start": 48,
"value": "2017-04-10T00:00:00.000+02:00",
"confidence": 1.0,
"extractor": "DIETClassifier"
]
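
To make the dot-product loss from the architecture description concrete, here is a conceptual sketch with random vectors. It is illustrative only, not the actual DIET implementation:

# Conceptual sketch of DIET's dot-product loss: embed the utterance and
# the intent labels in one vector space, then maximize similarity to the
# target label while minimizing similarity to negative samples.
import numpy as np

rng = np.random.default_rng(0)
utterance_emb = rng.normal(size=20)        # embedding_dimension = 20
target_emb = rng.normal(size=20)           # embedding of the correct intent
negative_embs = rng.normal(size=(20, 20))  # number_of_negative_examples = 20

sim_pos = utterance_emb @ target_emb     # dot-product similarity to target
sim_neg = negative_embs @ utterance_emb  # similarities to negative labels

# Softmax cross-entropy over [target, negatives]: the loss shrinks as the
# target similarity grows relative to the negatives.
logits = np.concatenate(([sim_pos], sim_neg))
loss = -np.log(np.exp(sim_pos) / np.exp(logits).sum())
print(loss)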
If you want to use the DIETClassifier only for intent classification, set entity_recognition to False. If you only want to do entity recognition, set intent_classification to False. By default the DIETClassifier does both, i.e. entity_recognition and intent_classification are both set to True. There are many hyperparameters you can define to adapt the model. If you want to tune it, start by modifying the following parameters:
epochs: This parameter sets the number of times the algorithm will see the training data (default: 300). One epoch equals one forward pass and one backward pass over all training examples. Sometimes the model needs more epochs to learn properly; at other times more epochs do not influence performance. The fewer the epochs, the faster the model trains.
hidden_layers_sizes: This parameter lets you define the number of feed-forward layers and their output dimensions for user messages and intents (default: text: [], label: []). Every entry in the list corresponds to one feed-forward layer. For example, if you set text: [256, 128], two feed-forward layers are added in front of the transformer. The vectors of the input tokens (coming from the user message) are passed on to those layers. The first layer has an output dimension of 256 and the second layer 128. If an empty list is used (the default behavior), no feed-forward layers are added. Make sure to use only positive integer values; usually, powers of two are used. Also, it is common practice for the values in the list to decrease: the next value is smaller than or equal to the one before.
embedding_dimension: This parameter defines the output dimension of the embedding layers used inside the model (default: 20). The model architecture uses multiple embedding layers; for example, the vectors of the complete utterance and the intent are passed through an embedding layer before being compared and used to calculate the loss.
number_of_transformer_layers: This parameter sets the number of transformer layers to use (default: 2). It corresponds to the number of transformer blocks used in the model.
transformer_size: This parameter sets the number of units in the transformer (default: 256). Vectors coming out of the transformer have the given transformer_size.
connection_density: This parameter defines the fraction of kernel weights that are set to non-zero values for all feed-forward layers in the model (default: 0.2). The value should be between 0 and 1. If you set connection_density to 1, no kernel weights are set to 0 and the layer acts as a standard feed-forward layer. You should not set connection_density to 0, as this would result in all kernel weights being 0 and the model would be unable to learn.
constrain_similarities: Setting this parameter to True applies a sigmoid cross-entropy loss over all similarity terms. This helps keep similarities between the input and negative labels at smaller values, which should help generalize the model better to real-world test sets.
model_confidence: This parameter configures how the confidence is computed during inference. It accepts only one value, softmax, with which confidences lie in the range [0, 1] and the computed similarities are normalized with the softmax activation function (see the sketch after this list).
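For intuition, here is a minimal sketch of that softmax normalization, using made-up similarity values (the variable names are illustrative, not the actual DIET internals):

# Sketch: turning similarity scores into confidences when
# model_confidence is "softmax". Similarity values are made up.
import numpy as np

similarities = np.array([4.1, 2.3, 1.0])  # utterance vs. each intent
confidences = np.exp(similarities) / np.exp(similarities).sum()
print(confidences)  # approx. [0.83, 0.14, 0.04]; the values sum to 1
# Only the top `ranking_length` intents end up in "intent_ranking".

The full list of tunable hyperparameters is given in the table below.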
+---------------------------------+------------------+--------------------------------------------------------------+
| Parameter | Default Value | Description |
+=================================+==================+==============================================================+
| hidden_layers_sizes | text: [] | Hidden layer sizes for layers before the embedding layers |
| | label: [] | for user messages and labels. The number of hidden layers is |
| | | equal to the length of the corresponding list. |
+---------------------------------+------------------+--------------------------------------------------------------+
| share_hidden_layers | False | Whether to share the hidden layer weights between user |
| | | messages and labels. |
+---------------------------------+------------------+--------------------------------------------------------------+
| transformer_size | 256 | Number of units in transformer. |
+---------------------------------+------------------+--------------------------------------------------------------+
| number_of_transformer_layers | 2 | Number of transformer layers. |
+---------------------------------+------------------+--------------------------------------------------------------+
| number_of_attention_heads | 4 | Number of attention heads in transformer. |
+---------------------------------+------------------+--------------------------------------------------------------+
| use_key_relative_attention | False | If 'True' use key relative embeddings in attention. |
+---------------------------------+------------------+--------------------------------------------------------------+
| use_value_relative_attention | False | If 'True' use value relative embeddings in attention. |
+---------------------------------+------------------+--------------------------------------------------------------+
| max_relative_position | None | Maximum position for relative embeddings. |
+---------------------------------+------------------+--------------------------------------------------------------+
| unidirectional_encoder | False | Use a unidirectional or bidirectional encoder. |
+---------------------------------+------------------+--------------------------------------------------------------+
| batch_size | [64, 256] | Initial and final value for batch sizes. |
| | | Batch size will be linearly increased for each epoch. |
| | | If constant `batch_size` is required, pass an int, e.g. `8`. |
+---------------------------------+------------------+--------------------------------------------------------------+
| batch_strategy | "balanced" | Strategy used when creating batches. |
| | | Can be either 'sequence' or 'balanced'. |
+---------------------------------+------------------+--------------------------------------------------------------+
| epochs | 300 | Number of epochs to train. |
+---------------------------------+------------------+--------------------------------------------------------------+
| random_seed | None | Set random seed to any 'int' to get reproducible results. |
+---------------------------------+------------------+--------------------------------------------------------------+
| learning_rate | 0.001 | Initial learning rate for the optimizer. |
+---------------------------------+------------------+--------------------------------------------------------------+
| embedding_dimension | 20 | Dimension size of embedding vectors. |
+---------------------------------+------------------+--------------------------------------------------------------+
| dense_dimension | text: 128 | Dense dimension for sparse features to use. |
| | label: 20 | |
+---------------------------------+------------------+--------------------------------------------------------------+
| concat_dimension | text: 128 | Concat dimension for sequence and sentence features. |
| | label: 20 | |
+---------------------------------+------------------+--------------------------------------------------------------+
| number_of_negative_examples | 20 | The number of incorrect labels. The algorithm will minimize |
| | | their similarity to the user input during training. |
+---------------------------------+------------------+--------------------------------------------------------------+
| similarity_type | "auto" | Type of similarity measure to use, either 'auto' or 'cosine' |
| | | or 'inner'. |
+---------------------------------+------------------+--------------------------------------------------------------+
| loss_type | "cross_entropy" | The type of the loss function, either 'cross_entropy' |
| | | or 'margin'. If type 'margin' is specified, |
| | | "model_confidence=cosine" will be used which is deprecated |
| | | as of 2.3.4. See footnote (1). |
+---------------------------------+------------------+--------------------------------------------------------------+
| ranking_length | 10 | Number of top intents to report. Set to 0 to report all |
| | | intents. |
+---------------------------------+------------------+--------------------------------------------------------------+
| renormalize_confidences | False | Normalize the reported top intents. Applicable only with loss|
| | | type 'cross_entropy' and 'softmax' confidences. |
+---------------------------------+------------------+--------------------------------------------------------------+
| maximum_positive_similarity | 0.8 | Indicates how similar the algorithm should try to make |
| | | embedding vectors for correct labels. |
| | | Should be 0.0 < ... < 1.0 for 'cosine' similarity type. |
+---------------------------------+------------------+--------------------------------------------------------------+
| maximum_negative_similarity | -0.4 | Maximum negative similarity for incorrect labels. |
| | | Should be -1.0 < ... < 1.0 for 'cosine' similarity type. |
+---------------------------------+------------------+--------------------------------------------------------------+
| use_maximum_negative_similarity | True | If 'True' the algorithm only minimizes maximum similarity |
| | | over incorrect intent labels, used only if 'loss_type' is |
| | | set to 'margin'. |
+---------------------------------+------------------+--------------------------------------------------------------+
| scale_loss | False | Scale loss inverse proportionally to confidence of correct |
| | | prediction. |
+---------------------------------+------------------+--------------------------------------------------------------+
| regularization_constant | 0.002 | The scale of regularization. |
+---------------------------------+------------------+--------------------------------------------------------------+
| negative_margin_scale | 0.8 | The scale of how important it is to minimize the maximum |
| | | similarity between embeddings of different labels. |
+---------------------------------+------------------+--------------------------------------------------------------+
| connection_density | 0.2 | Connection density of the weights in dense layers. |
| | | Value should be between 0 and 1. |
+---------------------------------+------------------+--------------------------------------------------------------+
| drop_rate | 0.2 | Dropout rate for encoder. Value should be between 0 and 1. |
| | | The higher the value the higher the regularization effect. |
+---------------------------------+------------------+--------------------------------------------------------------+
| drop_rate_attention | 0.0 | Dropout rate for attention. Value should be between 0 and 1. |
| | | The higher the value the higher the regularization effect. |
+---------------------------------+------------------+--------------------------------------------------------------+
| use_sparse_input_dropout | True | If 'True' apply dropout to sparse input tensors. |
+---------------------------------+------------------+--------------------------------------------------------------+
| use_dense_input_dropout | True | If 'True' apply dropout to dense input tensors. |
+---------------------------------+------------------+--------------------------------------------------------------+
| evaluate_every_number_of_epochs | 20 | How often to calculate validation accuracy. |
| | | Set to '-1' to evaluate just once at the end of training. |
+---------------------------------+------------------+--------------------------------------------------------------+
| evaluate_on_number_of_examples | 0 | How many examples to use for hold out validation set. |
| | | Large values may hurt performance, e.g. model accuracy. |
+---------------------------------+------------------+--------------------------------------------------------------+
| intent_classification | True | If 'True' intent classification is trained and intents are |
| | | predicted. |
+---------------------------------+------------------+--------------------------------------------------------------+
| entity_recognition | True | If 'True' entity recognition is trained and entities are |
| | | extracted. |
+---------------------------------+------------------+--------------------------------------------------------------+
| use_masked_language_model | False | If 'True' random tokens of the input message will be masked |
| | | and the model has to predict those tokens. It acts like a |
| | | regularizer and should help to learn a better contextual |
| | | representation of the input. |
+---------------------------------+------------------+--------------------------------------------------------------+
| tensorboard_log_directory | None | If you want to use tensorboard to visualize training |
| | | metrics, set this option to a valid output directory. You |
| | | can view the training metrics after training in tensorboard |
| | | via 'tensorboard --logdir <path-to-given-directory>'. |
+---------------------------------+------------------+--------------------------------------------------------------+
| tensorboard_log_level | "epoch" | Define when training metrics for tensorboard should be |
| | | logged. Either after every epoch ('epoch') or for every |
| | | training step ('batch'). |
+---------------------------------+------------------+--------------------------------------------------------------+
| featurizers | [] | List of featurizer names (alias names). Only features |
| | | coming from the listed names are used. If list is empty |
| | | all available features are used. |
+---------------------------------+------------------+--------------------------------------------------------------+
| checkpoint_model | False | Save the best performing model during training. Models are |
| | | stored to the location specified by `--out`. Only the one |
| | | best model will be saved. |
| | | Requires `evaluate_on_number_of_examples > 0` and |
| | | `evaluate_every_number_of_epochs > 0` |
+---------------------------------+------------------+--------------------------------------------------------------+
| split_entities_by_comma | True | Splits a list of extracted entities by comma to treat each |
| | | one of them as a single entity. Can either be `True`/`False` |
| | | globally, or set per entity type, such as: |
| | | ``` |
| | | ... |
| | | - name: DIETClassifier |
| | | split_entities_by_comma: |
| | | address: True |
| | | ... |
| | | ... |
| | | ``` |
+---------------------------------+------------------+--------------------------------------------------------------+
| constrain_similarities | False | If `True`, applies sigmoid on all similarity terms and adds |
| | | it to the loss function to ensure that similarity values are |
| | | approximately bounded. Used only if `loss_type=cross_entropy`|
+---------------------------------+------------------+--------------------------------------------------------------+
| model_confidence | "softmax" | Affects how model's confidence for each intent |
| | | is computed. Currently, only one value is supported: |
| | | 1. `softmax` - Similarities between input and intent |
| | | embeddings are post-processed with a softmax function, |
| | | as a result of which confidence for all intents sum up to 1. |
| | | This parameter does not affect the confidence for entity |
| | | prediction. |
+---------------------------------+------------------+--------------------------------------------------------------+
FallbackClassifier
If the NLU intent classification is ambiguous, a message can also be classified with the intent nlu_fallback; its confidence is set to be the same as the fallback threshold. The output of this classifier likewise consists of entities, the intent, and an intent ranking:
"intent": "name": "nlu_fallback", "confidence": 0.7183846840434321,
"intent_ranking": [
"confidence": 0.7183846840434321,
"name": "nlu_fallback"
,
"confidence": 0.28161531595656784,
"name": "restaurant_search"
],
"entities": [
"end": 53,
"entity": "time",
"start": 48,
"value": "2017-04-10T00:00:00.000+02:00",
"confidence": 1.0,
"extractor": "DIETClassifier"
]
The FallbackClassifier classifies a user message with the intent nlu_fallback in case the previous intent classifier was not able to classify the intent properly. It can also predict the fallback intent when the confidence scores of the two top-ranked intents are closer than the ambiguity_threshold. You can use the FallbackClassifier to implement a fallback action that handles messages with uncertain NLU predictions, for example:
rules:
- rule: Ask the user to rephrase in case of low NLU confidence
  steps:
  - intent: nlu_fallback
  - action: utter_please_rephrase
There are hyperparameters you can define to adjust this behavior. The FallbackClassifier classifies a message with nlu_fallback only if no other intent was predicted with a confidence greater than or equal to the threshold:
threshold: This parameter sets the threshold for predicting the nlu_fallback intent. If no intent predicted by a previous intent classifier has a confidence greater than or equal to the threshold, the FallbackClassifier sets the nlu_fallback intent with a confidence of 1.0.
ambiguity_threshold: If you configure an ambiguity_threshold, the FallbackClassifier will also predict the nlu_fallback intent in case the difference between the confidence scores of the two top-ranked intents is smaller than the ambiguity_threshold. A sketch of this decision logic follows below.
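
A simplified sketch of the resulting decision logic (the function name and the threshold values are illustrative, not the actual Rasa source):

# Simplified sketch of the FallbackClassifier decision: fall back if the
# top confidence is below `threshold`, or if the two top-ranked intents
# are closer together than `ambiguity_threshold`.
def should_fallback(intent_ranking, threshold=0.7, ambiguity_threshold=0.1):
    top = intent_ranking[0]["confidence"]
    if top < threshold:
        return True
    if len(intent_ranking) > 1:
        runner_up = intent_ranking[1]["confidence"]
        if top - runner_up < ambiguity_threshold:
            return True
    return False

ranking = [{"name": "greet", "confidence": 0.45},
           {"name": "goodbye", "confidence": 0.40}]
print(should_fallback(ranking))  # True: low confidence and ambiguous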