Problem connecting transformer output to CNN input in Keras

Posted: 2021-09-20 03:42:52

Question:

I need to build a transformer-based architecture in TensorFlow following the encoder-decoder approach, where the encoder is a pre-existing Hugging Face DistilBERT model and the decoder is a CNN.

Input: a text consisting of several consecutive phrases. Output: a code according to a classification standard. My data file has 7,387 text-label pairs in TSV format:

text \t code
This is example text number one. It might contain some other phrases. \t C21
This is example text number two. It might contain some other phrases. \t J45.1
This is example text number three. It might contain some other phrases. \t A27

The rest of the code looks like this:

        text_file = "data/datafile.tsv"
        with open(text_file) as f:
                lines = f.read().split("\n")[:-1]
                text_and_code_pairs = []
                for line in lines:
                        text, code = line.split("\t")
                        text_and_code_pairs.append((text, code))


        random.shuffle(text_and_code_pairs)
        num_val_samples = int(0.10 * len(text_and_code_pairs))
        num_train_samples = len(text_and_code_pairs) - 3 * num_val_samples
        train_pairs = text_and_code_pairs[:num_train_samples]
        val_pairs = text_and_code_pairs[num_train_samples : num_train_samples + num_val_samples]
        test_pairs = text_and_code_pairs[num_train_samples + num_val_samples :]

        train_texts = [fst for (fst,snd) in train_pairs]
        train_labels = [snd for (fst,snd) in train_pairs]
        val_texts = [fst for (fst,snd) in val_pairs]
        val_labels = [snd for (fst,snd) in val_pairs]
        test_texts = [fst for (fst,snd) in test_pairs]
        test_labels = [snd for (fst,snd) in test_pairs]
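One thing worth noting (an observation, not part of the original post): the labels extracted above are raw strings such as "C21" or "J45.1", while the model defined later ends in a 1255-way softmax trained with categorical_crossentropy, so the codes must be mapped to numeric class ids (and one-hot encoded, or trained with sparse_categorical_crossentropy) before `fit` can use them. A minimal pure-Python sketch of that mapping, where `build_label_index` and `encode_labels` are illustrative names, not from the original code:

```python
def build_label_index(labels):
    """Assign a stable integer id to every distinct code, in sorted order."""
    return {code: idx for idx, code in enumerate(sorted(set(labels)))}

def encode_labels(labels, label_index):
    """Replace each string code with its integer class id."""
    return [label_index[code] for code in labels]

index = build_label_index(["C21", "J45.1", "A27", "C21"])
print(encode_labels(["C21", "J45.1", "A27", "C21"], index))  # [1, 2, 0, 1]
```

The resulting integer ids can then be one-hot encoded (e.g. with `tf.one_hot(ids, depth=1255)`) for `categorical_crossentropy`, or used directly with `sparse_categorical_crossentropy`.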

        distilbert_encoder = TFDistilBertModel.from_pretrained("distilbert-base-multilingual-cased")
        tokenizer = DistilBertTokenizerFast.from_pretrained("distilbert-base-multilingual-cased")

        train_encodings = tokenizer(train_texts, truncation=True, padding=True)
        val_encodings = tokenizer(val_texts, truncation=True, padding=True)
        test_encodings = tokenizer(test_texts, truncation=True, padding=True)

        train_dataset = tf.data.Dataset.from_tensor_slices((
                dict(train_encodings),
                train_labels
        ))
        val_dataset = tf.data.Dataset.from_tensor_slices((
                dict(val_encodings),
                val_labels
        ))
        test_dataset = tf.data.Dataset.from_tensor_slices((
                dict(test_encodings),
                test_labels
        ))

        model = build_model(distilbert_encoder)
        model.fit(train_dataset.batch(64), validation_data=val_dataset, epochs=3, batch_size=64)
        model.predict(test_dataset, verbose=1)
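A detail worth double-checking (my observation, not part of the original post): `build_model` declares `Input(shape=(max_len,))` with `max_len=512`, but `tokenizer(..., padding=True)` only pads to the longest sequence in each call, so the three datasets may have three different sequence lengths. The Hugging Face tokenizers accept `padding="max_length"` together with `max_length=512` to pad every sequence to a fixed length; the sketch below shows the equivalent operation on a raw list of token ids (`pad_to_length` is a hypothetical helper, not a tokenizer method):

```python
def pad_to_length(ids, max_len=512, pad_id=0):
    """Truncate or right-pad a list of token ids to exactly max_len."""
    return (ids[:max_len] + [pad_id] * max_len)[:max_len]

print(pad_to_length([101, 7592, 102], max_len=6))  # [101, 7592, 102, 0, 0, 0]
```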

Finally, the build_model function:

def build_model(transformer, max_len=512):
        model = tf.keras.models.Sequential()
        # Encoder
        inputs = layers.Input(shape=(max_len,), dtype=tf.int32)
        distilbert = transformer(inputs)
        # LAYER - something missing here?
        # Decoder
        conv1D = tf.keras.layers.Conv1D(filters=5, kernel_size=10)(distilbert)
        pooling = tf.keras.layers.MaxPooling1D(pool_size=2)(conv1D)
        flat = tf.keras.layers.Flatten()(pooling)
        fc = tf.keras.layers.Dense(1255, activation='relu')(flat)
        softmax = tf.keras.layers.Dense(1255, activation='softmax')(fc)
        model = tf.keras.models.Model(inputs = inputs, outputs = softmax)
        model.compile(tf.keras.optimizers.Adam(learning_rate=5e-5), loss="categorical_crossentropy", metrics=['accuracy'])
        print(model.summary())
        return model

I managed to narrow down where the problem probably lies. After switching from the Sequential to the functional Keras API, I get the following error:

Traceback (most recent call last):
  File "keras_transformer.py", line 99, in <module>
    main()
  File "keras_transformer.py", line 94, in main
    model = build_model(distilbert_encoder)
  File "keras_transformer.py", line 23, in build_model
    conv1D = tf.keras.layers.Conv1D(filters=5, kernel_size=10)(distilbert)
  File "/home/users/user/.local/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py", line 897, in __call__
    self._maybe_build(inputs)
  File "/home/users/user/.local/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py", line 2416, in _maybe_build
    self.build(input_shapes)  # pylint:disable=not-callable
  File "/home/users/user/.local/lib/python3.6/site-packages/tensorflow/python/keras/layers/convolutional.py", line 152, in build
    input_shape = tensor_shape.TensorShape(input_shape)
  File "/home/users/user/.local/lib/python3.6/site-packages/tensorflow/python/framework/tensor_shape.py", line 771, in __init__
    self._dims = [as_dimension(d) for d in dims_iter]
  File "/home/users/user/.local/lib/python3.6/site-packages/tensorflow/python/framework/tensor_shape.py", line 771, in <listcomp>
    self._dims = [as_dimension(d) for d in dims_iter]
  File "/home/users/user/.local/lib/python3.6/site-packages/tensorflow/python/framework/tensor_shape.py", line 716, in as_dimension
    return Dimension(value)
  File "/home/users/user/.local/lib/python3.6/site-packages/tensorflow/python/framework/tensor_shape.py", line 200, in __init__
    None)
  File "<string>", line 3, in raise_from
TypeError: Dimension value must be integer or None or have an __index__ method, got 'last_hidden_state'

The error seems to lie in the connection between the transformer's output and the convolutional layer's input. Should I include another layer between them to adapt the transformer's output? If so, what would be the best option? I am using tensorflow==2.2.0, transformers==4.5.1 and Python 3.6.9.


Answer 1:

I think the problem is calling the TensorFlow layer on the correct tensor after the distilbert instance. `distilbert = transformer(inputs)` returns a model-output object, not a plain tensor the way TensorFlow layers do; for example, in `pooling = tf.keras.layers.MaxPooling1D(pool_size=2)(conv1D)`, `pooling` is the output tensor of the `MaxPooling1D` layer.

I solved your problem by accessing the `last_hidden_state` attribute of the distilbert output (i.e. the output of the DistilBERT model), which is what should be fed into the next `Conv1D` layer.

import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3' # suppress Tensorflow messages

from transformers import TFDistilBertModel
import tensorflow as tf

distilbert_encoder = TFDistilBertModel.from_pretrained("distilbert-base-multilingual-cased")


def build_model(transformer, max_len=512):
        # model = tf.keras.models.Sequential()
        # Encoder
        inputs = tf.keras.layers.Input(shape=(max_len,), dtype=tf.int32)
        distilbert = transformer(inputs)
        # Decoder
        ###### !!!!!! #########
        conv1D = tf.keras.layers.Conv1D(filters=5, kernel_size=10)(distilbert.last_hidden_state) 
        ###### !!!!!! #########        
        pooling = tf.keras.layers.MaxPooling1D(pool_size=2)(conv1D)
        flat = tf.keras.layers.Flatten()(pooling)
        fc = tf.keras.layers.Dense(1255, activation='relu')(flat)
        softmax = tf.keras.layers.Dense(1255, activation='softmax')(fc)
        model = tf.keras.models.Model(inputs = inputs, outputs = softmax)
        model.compile(tf.keras.optimizers.Adam(learning_rate=5e-5), loss="categorical_crossentropy", metrics=['accuracy'])
        print(model.summary())
        return model


model = build_model(distilbert_encoder)

This returns:

Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         [(None, 512)]             0         
_________________________________________________________________
tf_distil_bert_model (TFDist TFBaseModelOutput(last_hi 134734080 
_________________________________________________________________
conv1d (Conv1D)              (None, 503, 5)            38405     
_________________________________________________________________
max_pooling1d (MaxPooling1D) (None, 251, 5)            0         
_________________________________________________________________
flatten (Flatten)            (None, 1255)              0         
_________________________________________________________________
dense (Dense)                (None, 1255)              1576280   
_________________________________________________________________
dense_1 (Dense)              (None, 1255)              1576280   
=================================================================
Total params: 137,925,045
Trainable params: 137,925,045
Non-trainable params: 0

Note: I assume that by `layers.Input` in your build_model function you meant `tf.keras.layers.Input`.
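As a sanity check, the shapes in the summary above can be reproduced by hand, assuming `max_len=512` and DistilBERT's hidden size of 768 (standard for distilbert-base models); `conv1d_out_len` is an illustrative helper, not part of the Keras API:

```python
def conv1d_out_len(length, kernel_size, stride=1):
    """Output length of a 'valid'-padding Conv1D along the sequence axis."""
    return (length - kernel_size) // stride + 1

seq_len = 512                                       # max_len tokens into DistilBERT
conv_len = conv1d_out_len(seq_len, kernel_size=10)  # 512 - 10 + 1 = 503
pool_len = conv_len // 2                            # MaxPooling1D(pool_size=2) -> 251
flat_units = pool_len * 5                           # 251 positions x 5 filters = 1255
print(conv_len, pool_len, flat_units)  # 503 251 1255
```

This also explains why the first Dense layer happens to see exactly 1255 inputs: it is 251 x 5 from the flattened pooling output, not a deliberate match with the 1255 output classes.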


Answer 2:

I think you are right. The problem seems to be the input to the Conv1D layer.

According to the documentation, `outputs.last_hidden_state` has shape (batch_size, sequence_length, hidden_size), whereas this Conv1D expects input of shape (batch_size, sequence_length). Maybe you can solve the problem by changing Conv1D to Conv2D, or by adding a Conv2D layer in between.

Comments:

If I add a Conv2D layer, or swap the existing Conv1D layer for a Conv2D layer, I get the following error on the line defining the Conv2D layer: ValueError: Input 0 of layer conv2d is incompatible with the layer: expected ndim=4, found ndim=3. Full shape received: [None, 512, 768]
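The ndim error in the comment above is expected: Conv2D wants 4-D input (batch, height, width, channels), while `last_hidden_state` is 3-D (batch, 512, 768). If one really wanted a Conv2D here, a trailing channel axis would have to be added first, e.g. with `tf.expand_dims(x, -1)` or a `Reshape` layer. A NumPy stand-in for the shape change (`hidden` is a placeholder array, not the real encoder output):

```python
import numpy as np

hidden = np.zeros((2, 512, 768))        # stand-in for last_hidden_state
with_channel = hidden[..., np.newaxis]  # append channel axis, as tf.expand_dims(x, -1) would
print(with_channel.shape)               # (2, 512, 768, 1) -- now rank 4, Conv2D-compatible
```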
