TypeError when trying to apply custom loss in a multilabel classification problem
Posted: 2021-10-20 10:35:21

[Question description]:
I am trying to solve a multilabel text classification problem using BERT from the huggingface transformers library. The model is defined as follows:
def create_model(encoder, nb_classes=3, lr=1e-5):
    # inputs
    input_ids = tf.keras.Input(shape=(512,), ragged=False,
                               dtype=tf.int32, name='input_ids')
    input_attention_mask = tf.keras.Input(shape=(512,), ragged=False,
                                          dtype=tf.int32, name='attention_mask')
    # transformer
    output = encoder({'input_ids': input_ids,
                      'attention_mask': input_attention_mask})[0]
    Y = tf.keras.layers.BatchNormalization()(output)
    Y = tf.keras.layers.Dense(nb_classes, activation='sigmoid')(Y)
    # compilation
    model = tf.keras.Model(inputs=[input_ids, input_attention_mask],
                           outputs=[Y])
    optimizer = tf.keras.optimizers.Adam(learning_rate=lr)
    # losses
    # loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False)
    # loss = tf.keras.losses.BinaryCrossentropy(from_logits=False)
    model.compile(optimizer=optimizer,
                  loss=multilabel_loss, metrics=['acc'])
    model.summary()
    return model
As you can see, I tried using tf.keras.losses, but that did not work (it throws AttributeError: 'Tensor' object has no attribute 'nested_row_splits'), so I defined a simple cross-entropy manually:
def multilabel_loss(y_true, y_pred):
    y_pred = tf.convert_to_tensor(y_pred)
    y_true = tf.cast(y_true, y_pred.dtype)
    cross_entropy = -tf.reduce_sum(
        (y_true * tf.math.log(y_pred + 1e-8)
         + (1 - y_true) * tf.math.log(1 - y_pred + 1e-8)),
        name='xentropy')
    return cross_entropy
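(For reference, this custom loss is just an unreduced, summed element-wise binary cross-entropy. A minimal sanity check against the Keras built-in on dense tensors might look like the sketch below; the values are hypothetical, and the small numerical gap comes from the different epsilon handling.)

import tensorflow as tf

# hypothetical dense labels and sigmoid outputs for a 3-class multilabel problem
y_true = tf.constant([[0., 0., 1.], [0., 1., 0.]])
y_pred = tf.constant([[0.1, 0.2, 0.9], [0.3, 0.8, 0.1]])

print(multilabel_loss(y_true, y_pred).numpy())  # sum over every element

# Keras averages over the class axis, so rescale by the number of classes to compare
bce = tf.keras.losses.BinaryCrossentropy(reduction=tf.keras.losses.Reduction.SUM)
print((bce(y_true, y_pred) * y_true.shape[-1]).numpy())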
The model is created inside strategy.scope() as shown below, using 'distil-bert-uncased' as the checkpoint:
with strategy.scope():
    encoder = TFAutoModelForSequenceClassification.from_pretrained(checkpoint)
    # encoder = TFRobertaForSequenceClassification.from_pretrained(checkpoint)
    model = create_model(encoder)
The labels are binary arrays:
163350 [0, 0, 1]
118940 [0, 0, 1]
65243 [0, 0, 1]
30011 [0, 0, 1]
189713 [0, 1, 0]
They are combined with the tokenized text into a tf.data.Dataset in the following function:
def tf_text_data_prep(df):
    """
    input: takes pandas dataframe
    output: returns tokenized tf.Dataset
    """
    hugging_ds = Dataset.from_pandas(df)
    tokenized_ds = hugging_ds.map(
        tokenize_function,
        batched=True,
        num_proc=strategy.num_replicas_in_sync,
        remove_columns=["Text", '__index_level_0__'],
        load_from_cache_file=True
    )
    # Convert to tensorflow
    tf_dataset = tokenized_ds.with_format("tensorflow")
    features = {x: tf_dataset[x].to_tensor() for x in tokenizer.model_input_names}
    tf_data = tf.data.Dataset.from_tensor_slices((features, tf_dataset["label"]))
    return tf_data
The problem is that when I launch training, I get this error:
TypeError Traceback (most recent call last)
<ipython-input-62-720b4634d50e> in <module>()
----> 1 get_ipython().run_cell_magic('time', '', 'steps_per_epoch = int(BUFFER_SIZE // BATCH_SIZE)\nprint(\n    f"Model Params:\\nbatch_size: {BATCH_SIZE}\\nEpochs: {EPOCHS}\\n"\n    f"Step p. Epoch: {steps_per_epoch}\\n"\n    f"Initial Learning rate: {INITAL_LEARNING_RATE}"\n)\nhistory = model.fit(\n    train_ds,\n    validation_data=val_ds,\n    batch_size=BATCH_SIZE,\n    epochs=EPOCHS,\n    callbacks=callbacks,\n    verbose=1,\n)')
12 frames
/usr/local/lib/python3.7/dist-packages/IPython/core/interactiveshell.py in run_cell_magic(self, magic_name, line, cell)
2115 magic_arg_s = self.var_expand(line, stack_depth)
2116 with self.builtin_trap:
-> 2117 result = fn(magic_arg_s, cell)
2118 return result
2119
<decorator-gen-53> in time(self, line, cell, local_ns)
/usr/local/lib/python3.7/dist-packages/IPython/core/magic.py in <lambda>(f, *a, **k)
186 # but it's overkill for just that one bit of state.
187 def magic_deco(arg):
--> 188 call = lambda f, *a, **k: f(*a, **k)
189
190 if callable(arg):
/usr/local/lib/python3.7/dist-packages/IPython/core/magics/execution.py in time(self, line, cell, local_ns)
1191 else:
1192 st = clock2()
-> 1193 exec(code, glob, local_ns)
1194 end = clock2()
1195 out = None
<timed exec> in <module>()
/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/training.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, validation_batch_size, validation_freq, max_queue_size, workers, use_multiprocessing)
1176 _r=1):
1177 callbacks.on_train_batch_begin(step)
-> 1178 tmp_logs = self.train_function(iterator)
1179 if data_handler.should_sync:
1180 context.async_wait()
/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/def_function.py in __call__(self, *args, **kwds)
887
888 with OptionalXlaContext(self._jit_compile):
--> 889 result = self._call(*args, **kwds)
890
891 new_tracing_count = self.experimental_get_tracing_count()
/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/def_function.py in _call(self, *args, **kwds)
931 # This is the first call of __call__, so we have to initialize.
932 initializers = []
--> 933 self._initialize(args, kwds, add_initializers_to=initializers)
934 finally:
935 # At this point we know that the initialization is complete (or less
/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/def_function.py in _initialize(self, args, kwds, add_initializers_to)
762 self._concrete_stateful_fn = (
763 self._stateful_fn._get_concrete_function_internal_garbage_collected( # pylint: disable=protected-access
--> 764 *args, **kwds))
765
766 def invalid_creator_scope(*unused_args, **unused_kwds):
/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/function.py in _get_concrete_function_internal_garbage_collected(self, *args, **kwargs)
3048 args, kwargs = None, None
3049 with self._lock:
-> 3050 graph_function, _ = self._maybe_define_function(args, kwargs)
3051 return graph_function
3052
/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/function.py in _maybe_define_function(self, args, kwargs)
3442
3443 self._function_cache.missed.add(call_context_key)
-> 3444 graph_function = self._create_graph_function(args, kwargs)
3445 self._function_cache.primary[cache_key] = graph_function
3446
/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/function.py in _create_graph_function(self, args, kwargs, override_flat_arg_shapes)
3287 arg_names=arg_names,
3288 override_flat_arg_shapes=override_flat_arg_shapes,
-> 3289 capture_by_value=self._capture_by_value),
3290 self._function_attributes,
3291 function_spec=self.function_spec,
/usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/func_graph.py in func_graph_from_py_func(name, python_func, args, kwargs, signature, func_graph, autograph, autograph_options, add_control_dependencies, arg_names, op_return_value, collections, capture_by_value, override_flat_arg_shapes)
997 _, original_func = tf_decorator.unwrap(python_func)
998
--> 999 func_outputs = python_func(*func_args, **func_kwargs)
1000
1001 # invariant: `func_outputs` contains only Tensors, CompositeTensors,
/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/def_function.py in wrapped_fn(*args, **kwds)
670 # the function a weak reference to itself to avoid a reference cycle.
671 with OptionalXlaContext(compile_with_xla):
--> 672 out = weak_wrapped_fn().__wrapped__(*args, **kwds)
673 return out
674
/usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/func_graph.py in wrapper(*args, **kwargs)
984 except Exception as e: # pylint:disable=broad-except
985 if hasattr(e, "ag_error_metadata"):
--> 986 raise e.ag_error_metadata.to_exception(e)
987 else:
988 raise
TypeError: in user code:
/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/training.py:850 train_function *
return step_function(self, iterator)
/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/training.py:840 step_function **
outputs = model.distribute_strategy.run(run_step, args=(data,))
/usr/local/lib/python3.7/dist-packages/tensorflow/python/distribute/distribute_lib.py:1285 run
return self._extended.call_for_each_replica(fn, args=args, kwargs=kwargs)
/usr/local/lib/python3.7/dist-packages/tensorflow/python/distribute/distribute_lib.py:2833 call_for_each_replica
return self._call_for_each_replica(fn, args, kwargs)
/usr/local/lib/python3.7/dist-packages/tensorflow/python/distribute/distribute_lib.py:3608 _call_for_each_replica
return fn(*args, **kwargs)
/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/training.py:833 run_step **
outputs = model.train_step(data)
/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/training.py:795 train_step
self.compiled_metrics.update_state(y, y_pred, sample_weight)
/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/compile_utils.py:460 update_state
metric_obj.update_state(y_t, y_p, sample_weight=mask)
/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/utils/metrics_utils.py:86 decorated
update_op = update_state_fn(*args, **kwargs)
/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/metrics.py:177 update_state_fn
return ag_update_state(*args, **kwargs)
/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/metrics.py:659 update_state **
[y_true, y_pred], sample_weight)
/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/utils/metrics_utils.py:546 ragged_assert_compatible_and_get_flat_values
raise TypeError('One of the inputs does not have acceptable types.')
TypeError: One of the inputs does not have acceptable types.
The same approach works for plain binary classification, but not for multilabel. I would appreciate any help with the error or with the general approach.
[Answer 1]:

The problem is that you are using TFAutoModelForSequenceClassification, i.e. a ForSequenceClassification model. If you look at its summary, you will see that it returns a Dense output, so it is not the encoder you want.
encoder = TFAutoModelForSequenceClassification.from_pretrained('bert-base-uncased')
encoder.summary()
'''
Result:
All model checkpoint layers were used when initializing TFBertForSequenceClassification.
Some layers of TFBertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Model: "tf_bert_for_sequence_classification_2"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
bert (TFBertMainLayer) multiple 109482240
_________________________________________________________________
dropout_187 (Dropout) multiple 0
_________________________________________________________________
classifier (Dense) multiple 1538
=================================================================
Total params: 109,483,778
Trainable params: 109,483,778
Non-trainable params: 0
'''
But you want to use it as an encoder, so you have to do this instead:
from transformers import TFBertModel
encoder = TFBertModel.from_pretrained('bert-base-uncased')
model = create_model(encoder)
encoder.summary()
'''
Model: "tf_bert_model_2"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
bert (TFBertMainLayer) multiple 109482240
=================================================================
Total params: 109,482,240
Trainable params: 109,482,240
Non-trainable params: 0
'''
As you can now see, your encoder gives you Bert's raw output. The following line in create_model now makes sense, but it is also the cause of an error in the create_model function:
output = encoder({'input_ids': input_ids,
                  'attention_mask': input_attention_mask})[0]
This is because the output at index 0 has shape (batch_size, token_length, embedding), but we want the value of the [CLS] token, which should be (batch_size, embedding) and sits at index 1, so we have to update the line to:
output = encoder({'input_ids': input_ids,
                  'attention_mask': input_attention_mask})[1]
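(Side note, a sketch assuming a recent transformers version where the model returns a ModelOutput object: the same two outputs can also be accessed by name, which avoids any ambiguity with integer indices.)

outputs = encoder({'input_ids': input_ids,
                   'attention_mask': input_attention_mask})
sequence_output = outputs.last_hidden_state  # (batch_size, token_length, embedding)
pooled_output = outputs.pooler_output        # (batch_size, embedding), the pooled [CLS] representation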
Also, so far you are fixing the input shape to 512, but we should specify the value as None so that we can have variable-length inputs, as below:
input_ids = tf.keras.Input(shape=(None,), ragged=False, dtype=tf.int32, name='input_ids')
input_attention_mask = tf.keras.Input(shape=(None,), ragged=False, dtype=tf.int32, name='attention_mask')
After making all these changes, here is the result of a sample run.
tokenizer = BertTokenizerFast.from_pretrained('bert-base-uncased')
encoder = TFBertModel.from_pretrained('bert-base-uncased')
model = create_model(encoder)
inputs = tokenizer('hello world', return_tensors='tf')
model.predict((inputs['input_ids'], inputs['attention_mask']))
'''
Results:
array([[0.7867866 , 0.65974414, 0.45628983]], dtype=float32)
'''
[Comments]:
Thanks for your reply, it is useful, since building the model with TFBertModel as the encoder indeed makes more sense. However, it does not solve the original loss problem: TypeError: One of the inputs does not have acceptable types. with the multilabel_loss function, and AttributeError: 'Tensor' object has no attribute 'nested_row_splits' with loss = tf.keras.losses.BinaryCrossentropy.

[Answer 2]:
Never mind, the problem is solved.

For anyone who may run into the same issue while using a data preparation pipeline similar to mine: in the multilabel case, you should convert the label tensor to an EagerTensor by adding to_tensor(). Otherwise, the tf Dataset's label tensor will contain a tensorflow.python.ops.ragged.ragged_tensor.RaggedTensor instead of a tensorflow.python.framework.ops.EagerTensor, which causes the error during the loss computation. So the tf_text_data_prep function should look like this:
def tf_text_data_prep(df):
    """
    input: takes pandas dataframe
    output: returns tokenized tf.Dataset
    """
    hugging_ds = Dataset.from_pandas(df)
    tokenized_ds = hugging_ds.map(
        tokenize_function,
        batched=True,
        num_proc=strategy.num_replicas_in_sync,
        remove_columns=["Text", '__index_level_0__'],
        load_from_cache_file=True
    )
    # Convert to tensorflow
    tf_dataset = tokenized_ds.with_format("tensorflow")
    features = {x: tf_dataset[x].to_tensor() for x in tokenizer.model_input_names}
    # add to_tensor() to avoid ragged tensors in the multilabel case
    tf_data = tf.data.Dataset.from_tensor_slices((features,
                                                  tf_dataset["label"].to_tensor()))
    return tf_data
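A quick way to confirm the fix is to inspect the dataset's element spec (a sketch, assuming df is the dataframe from above): the label component should now be a plain TensorSpec rather than a RaggedTensorSpec.

tf_data = tf_text_data_prep(df)
print(tf_data.element_spec)
# the label component should read like TensorSpec(shape=(3,), dtype=tf.int64, name=None)
# instead of RaggedTensorSpec(...)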
Training then works with keras.losses or a custom loss.