Using tfa.layers.crf on top of biLSTM

Posted: 2021-09-07 01:10:50

Problem description:

I am trying to implement a CRF-based NER model with the tensorflow-addons library. The model takes each sentence as a sequence of word indices plus a character-level representation of every word, concatenates the two embeddings, and feeds the result into a BiLSTM layer. Here is the implementation:

import tensorflow as tf
from tensorflow.keras import Model, Input
from tensorflow.keras.layers import LSTM, Embedding, Dense, TimeDistributed, Dropout, Conv1D
from tensorflow.keras.layers import Bidirectional, concatenate, SpatialDropout1D, GlobalMaxPooling1D
from tensorflow_addons.layers import CRF

# word-level input: one word index per token in the sentence
word_input = Input(shape=(max_sent_len,))
word_emb = Embedding(input_dim=n_words + 2, output_dim=dim_word_emb,
                     input_length=max_sent_len, mask_zero=True)(word_input)

# character-level input: one character-index sequence per word
char_input = Input(shape=(max_sent_len, max_word_len,))
char_emb = TimeDistributed(Embedding(input_dim=n_chars + 2, output_dim=dim_char_emb,
                           input_length=max_word_len, mask_zero=True))(char_input)

# character-level LSTM: one summary vector per word
char_emb = TimeDistributed(LSTM(units=20, return_sequences=False,
                                recurrent_dropout=0.5))(char_emb)

# main LSTM
main_input = concatenate([word_emb, char_emb])
main_input = SpatialDropout1D(0.3)(main_input)
main_lstm = Bidirectional(LSTM(units=50, return_sequences=True,
                               recurrent_dropout=0.6))(main_input)
kernel = TimeDistributed(Dense(50, activation="relu"))(main_lstm)  # per-timestep features fed to the CRF
crf = CRF(n_tags+1)  # CRF layer
decoded_sequence, potentials, sequence_length, chain_kernel = crf(kernel)  # output

model = Model([word_input, char_input], potentials)
model.add_loss(tf.abs(tf.reduce_mean(kernel)))
model.compile(optimizer="rmsprop", loss='categorical_crossentropy')

When I start fitting the model, I get the following warning:

WARNING:tensorflow:Gradients do not exist for variables ['chain_kernel:0'] when minimizing the loss.
WARNING:tensorflow:Gradients do not exist for variables ['chain_kernel:0'] when minimizing the loss.

The training process looks like this:

Epoch 1/10
438/438 [==============================] - 80s 163ms/step - loss: nan - val_loss: nan
Epoch 2/10
438/438 [==============================] - 71s 163ms/step - loss: nan - val_loss: nan
Epoch 3/10
438/438 [==============================] - 71s 162ms/step - loss: nan - val_loss: nan
Epoch 4/10
438/438 [==============================] - 71s 161ms/step - loss: nan - val_loss: nan
Epoch 5/10
438/438 [==============================] - 71s 162ms/step - loss: nan - val_loss: nan
Epoch 6/10
438/438 [==============================] - 70s 160ms/step - loss: nan - val_loss: nan
Epoch 7/10
438/438 [==============================] - 70s 161ms/step - loss: nan - val_loss: nan
Epoch 8/10
438/438 [==============================] - 70s 160ms/step - loss: nan - val_loss: nan
Epoch 9/10
438/438 [==============================] - 71s 161ms/step - loss: nan - val_loss: nan
Epoch 10/10
438/438 [==============================] - 70s 159ms/step - loss: nan - val_loss: nan

I am almost certain the problem lies in the way I set up the loss function, but I don't know how it should be set up. I have searched for this problem and found no answer. Also, when I test the model, it cannot predict the labels correctly and assigns the same label to every token. Can anyone explain how I should fix this?

Answer 1:

Change the loss function to tensorflow_addons.losses.SigmoidFocalCrossEntropy(). I guess categorical cross-entropy is not a good choice here.
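
For reference, a minimal sketch of that change against the question's model (X_word, X_char, and the one-hot labels y are placeholder names, not from the thread; the add_loss line from the question is dropped, and from_logits=True is an assumption since the CRF potentials are unnormalized scores):

import tensorflow_addons as tfa

model = Model([word_input, char_input], potentials)
# focal loss down-weights easy examples, which can help with the
# heavy class imbalance typical of NER tag distributions
model.compile(optimizer="rmsprop",
              loss=tfa.losses.SigmoidFocalCrossEntropy(from_logits=True))
# y must match the potentials shape: one-hot vectors of length
# n_tags + 1 at each of the max_sent_len time steps
model.fit([X_word, X_char], y, batch_size=32, epochs=10)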

Comments:

It improved the accuracy and my model now predicts better, but the warning is still there.

@Abolfazl can you share your updated code... I am facing a shape mismatch error after adding SigmoidFocalCrossEntropy()
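
On the persisting warning: it is consistent with the transition matrix never entering the compiled loss, since neither categorical cross-entropy nor the focal loss on potentials depends on chain_kernel. A minimal sketch of one common pattern for training the transitions directly via tfa.text.crf_log_likelihood, assuming the layers from the question are reused and that tags is an integer label tensor of shape (batch_size, max_sent_len) (train_step, words, chars, and tags are hypothetical names, not code from the thread):

import tensorflow as tf
import tensorflow_addons as tfa

# reuse the question's graph up to `kernel`, then expose the CRF outputs
crf = CRF(n_tags + 1)
decoded_sequence, potentials, sequence_length, chain_kernel = crf(kernel)
train_model = Model([word_input, char_input], [potentials, sequence_length])
optimizer = tf.keras.optimizers.RMSprop()

@tf.function
def train_step(words, chars, tags):
    with tf.GradientTape() as tape:
        pots, seq_lens = train_model([words, chars], training=True)
        # crf_log_likelihood differentiates through the transition
        # parameters, so crf.chain_kernel now receives a gradient
        log_likelihood, _ = tfa.text.crf_log_likelihood(
            pots, tags, seq_lens, crf.chain_kernel)
        loss = -tf.reduce_mean(log_likelihood)
    grads = tape.gradient(loss, train_model.trainable_variables)
    optimizer.apply_gradients(zip(grads, train_model.trainable_variables))
    return loss

With the negative log-likelihood as the loss, decoded_sequence (the Viterbi decode) rather than potentials would typically be used at prediction time.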
