在谷歌 colab 中使用带有 TPU 的 Keras 调谐器搜索方法时如何解决“从空列表中弹出”错误?

Posted

技术标签:

【中文标题】在谷歌 colab 中使用带有 TPU 的 Keras 调谐器搜索方法时如何解决“从空列表中弹出”错误?【英文标题】:How to fix "pop from empty list" error while using Keras tuner search method with TPU in google colab? 【发布时间】:2022-01-11 01:36:03 【问题描述】:

我之前可以在我的模型上使用 Google colab 的 GPU 运行时运行 keras Tuner 的搜索方法。但是当我切换到 TPU 运行时,我得到了以下错误。我还没有得出如何访问谷歌云存储的 TPU 运行时以保存 keras 调谐器保存模型检查点的检查点文件夹的结论。我也不知道该怎么做,我我收到以下错误。请帮我解决这个问题。

我的代码:

def post_se(hp):
    ip = Input(shape=(6, 128))
    x = Masking()(ip)
    x = LSTM(units=hp.Choice('lstm_1', values = [8,16,32,64,128,256,512]),return_sequences=True)(x)
    x = Dropout(hp.Choice(name='Dropout', values = [0.0,0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]))(x)
    x = LSTM(units=hp.Choice('lstm_2', values = [8,16,32,64,128,256,512]))(x)
    x = Dropout(hp.Choice(name='Dropout_2', values = [0.0,0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]))(x)
    y = Permute((2, 1))(ip)
    y = Conv1D(hp.Choice('conv_1_filter', values = [32,64,128,256,512]), hp.Choice(name='conv_1_filter_size', values = [3,5,7,8,9]), padding='same', kernel_initializer='he_uniform')(y)
    y = BatchNormalization()(y)
    y = Activation('relu')(y)
    y = squeeze_excite_block(y)
    y = Conv1D(hp.Choice('conv_2_filter', values = [32,64,128,256,512]), hp.Choice(name='conv_2_filter_size',values = [3,5,7,8,9]), padding='same', kernel_initializer='he_uniform')(y)
    y = BatchNormalization()(y)
    y = Activation('relu')(y)
    y = squeeze_excite_block(y)
    y = Conv1D(hp.Choice('conv_3_filter', values = [32,64,128,256,512,]), hp.Choice(name='conv_3_filter_size',values = [3,5,7,8,9]), padding='same', kernel_initializer='he_uniform')(y)
    y = BatchNormalization()(y)
    y = Activation('relu')(y)
    y = GlobalAveragePooling1D()(y)
    x = concatenate([x,y])
    # batch_size = hp.Choice('batch_size', values=[32, 64, 128, 256, 512, 1024, 2048, 4096])
    out = Dense(num_classes, activation='softmax')(x)
    model = Model(ip, out)
    if gpu:
        opt = keras.optimizers.Adam(learning_rate=0.001)
    if tpu:
        opt = keras.optimizers.Adam(learning_rate=8*0.001)
    model.compile(optimizer=opt, loss='categorical_crossentropy',metrics=['accuracy'])
    # model.summary()

    return model

if gpu:
    tuner = kt.tuners.BayesianOptimization(post_se,
        objective='val_accuracy',
        max_trials=30,
        seed=42,
        project_name='Model_gpu')
    # Will stop training if the "val_loss" hasn't improved in 30 epochs.
    tuner.search(X_train, train_label, epochs=200, validation_split=0.1, shuffle=True, callbacks=[tensorflow.keras.callbacks.EarlyStopping('val_loss', patience=30)])

if tpu:
    print("TPU")
    with strategy.scope():
        tuner = kt.tuners.BayesianOptimization(post_se,
            objective='val_accuracy',
            max_trials=30,
            seed=42,
            project_name='Model_tpu')
        # Will stop training if the "val_loss" hasn't improved in 30 epochs.
        tuner.search(X_train, train_label, epochs=200, validation_split=0.1, shuffle=True, callbacks=[tensorflow.keras.callbacks.EarlyStopping('val_loss', patience=30)])

错误日志

---------------------------------------------------------------------------
UnimplementedError                        Traceback (most recent call last)
/usr/lib/python3.7/contextlib.py in __exit__(self, type, value, traceback)
    129             try:
--> 130                 self.gen.throw(type, value, traceback)
    131             except StopIteration as exc:

10 frames
/usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/ops.py in resource_creator_scope(resource_type, resource_creator)
   2957                                                    resource_creator):
-> 2958     yield
   2959 

<ipython-input-15-24c1e1bb603d> in <module>()
     17         # Will stop training if the "val_loss" hasn't improved in 30 epochs.
---> 18         tuner.search(X_train, train_label, epochs=200, validation_split=0.1, shuffle=True, callbacks=[tensorflow.keras.callbacks.EarlyStopping('val_loss', patience=30)])

/usr/local/lib/python3.7/dist-packages/keras_tuner/engine/base_tuner.py in search(self, *fit_args, **fit_kwargs)
    178             self.on_trial_begin(trial)
--> 179             results = self.run_trial(trial, *fit_args, **fit_kwargs)
    180             # `results` is None indicates user updated oracle in `run_trial()`.

/usr/local/lib/python3.7/dist-packages/keras_tuner/engine/tuner.py in run_trial(self, trial, *args, **kwargs)
    303             copied_kwargs["callbacks"] = callbacks
--> 304             obj_value = self._build_and_fit_model(trial, *args, **copied_kwargs)
    305 

/usr/local/lib/python3.7/dist-packages/keras_tuner/engine/tuner.py in _build_and_fit_model(self, trial, *args, **kwargs)
    233         model = self._try_build(hp)
--> 234         return self.hypermodel.fit(hp, model, *args, **kwargs)
    235 

/usr/local/lib/python3.7/dist-packages/keras_tuner/engine/hypermodel.py in fit(self, hp, model, *args, **kwargs)
    136         """
--> 137         return model.fit(*args, **kwargs)
    138 

/usr/local/lib/python3.7/dist-packages/keras/utils/traceback_utils.py in error_handler(*args, **kwargs)
     66       filtered_tb = _process_traceback_frames(e.__traceback__)
---> 67       raise e.with_traceback(filtered_tb) from None
     68     finally:

/usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/ops.py in _numpy(self)
   1116     except core._NotOkStatusException as e:  # pylint: disable=protected-access
-> 1117       raise core._status_to_exception(e) from None  # pylint: disable=protected-access
   1118 

UnimplementedError: File system scheme '[local]' not implemented (file: './untitled_project/trial_78ed6883514d67dc6222064095c134cb/checkpoints/epoch_0/checkpoint_temp/part-00000-of-00001')
    Encountered when executing an operation using EagerExecutor. This error cancels all future operations and poisons their output tensors.

During handling of the above exception, another exception occurred:

IndexError                                Traceback (most recent call last)
<ipython-input-15-24c1e1bb603d> in <module>()
     16             seed=42)
     17         # Will stop training if the "val_loss" hasn't improved in 30 epochs.
---> 18         tuner.search(X_train, train_label, epochs=200, validation_split=0.1, shuffle=True, callbacks=[tensorflow.keras.callbacks.EarlyStopping('val_loss', patience=30)])

/usr/local/lib/python3.7/dist-packages/tensorflow/python/distribute/distribute_lib.py in __exit__(self, exception_type, exception_value, traceback)
    454                          "tf.distribute.set_strategy() out of `with` scope."),
    455             e)
--> 456     _pop_per_thread_mode()
    457 
    458 

/usr/local/lib/python3.7/dist-packages/tensorflow/python/distribute/distribution_strategy_context.py in _pop_per_thread_mode()
     64 
     65 def _pop_per_thread_mode():
---> 66   ops.get_default_graph()._distribution_strategy_stack.pop(-1)  # pylint: disable=protected-access
     67 
     68 

IndexError: pop from empty list

有关一些额外信息,我将在这篇文章中附上我的代码。

【问题讨论】:

【参考方案1】:

这是你的错误:

UnimplementedError: File system scheme '[local]' not implemented (file: './untitled_project/trial_78ed6883514d67dc6222064095c134cb/checkpoints/epoch_0/checkpoint_temp/part-00000-of-00001')

请参阅https://***.com/a/62881833/14043558 以获得解决方案。

【讨论】:

以上是关于在谷歌 colab 中使用带有 TPU 的 Keras 调谐器搜索方法时如何解决“从空列表中弹出”错误?的主要内容,如果未能解决你的问题,请参考以下文章

谷歌 Colab TPU 版本

matplotlib 绘图在谷歌 colab 中不起作用

如何使用 colab 在谷歌驱动器上保存 np.array?

如何在谷歌 colab 中读取图像数据集以进行深度学习?

如何在谷歌 colab 上恢复到默认的 tensorflow 版本

Tesseract 4.1.1 错误 eng.traineddata 在谷歌 colab 中找不到