Training on GPU much slower than on CPU - why and how to speed it up?

Posted 2023-02-23


Question (posted 2020-10-18 02:30:50):

I am training a convolutional neural network on Google Colab, using both the CPU and the GPU runtime.

This is the architecture of the network:

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d (Conv2D)              (None, 62, 126, 32)       896       
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 31, 63, 32)        0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 29, 61, 32)        9248      
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 14, 30, 32)        0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 12, 28, 64)        18496     
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 6, 14, 64)         0         
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 4, 12, 64)         36928     
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 2, 6, 64)          0         
_________________________________________________________________
flatten (Flatten)            (None, 768)               0         
_________________________________________________________________
dropout (Dropout)            (None, 768)               0         
_________________________________________________________________
lambda (Lambda)              (None, 1, 768)            0         
_________________________________________________________________
dense (Dense)                (None, 1, 256)            196864    
_________________________________________________________________
dense_1 (Dense)              (None, 1, 8)              2056      
_________________________________________________________________
permute (Permute)            (None, 8, 1)              0         
_________________________________________________________________
dense_2 (Dense)              (None, 8, 36)             72        
=================================================================
Total params: 264,560
Trainable params: 264,560
Non-trainable params: 0

So it is a very small network, but it has a particular output shape of (8, 36), because I want to recognize the characters on license plate images.
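As a sanity check on "very small", the shapes and parameter counts in the summary can be reproduced by hand. This is a sketch that assumes 3x3 kernels, 2x2 pooling, "valid" padding and a 3-channel 64x128 input; none of these are stated in the question, they are inferred from the numbers in the summary.

```python
# Reproduce the model summary by hand.
# Assumptions (inferred, not stated): 3x3 kernels, 'valid' padding, 2x2 max pooling,
# and an RGB input of 64x128, since (3*3*3 + 1) * 32 = 896 matches the first layer.

def conv_pool(h, w):
    # a 3x3 'valid' convolution shrinks each side by 2; 2x2 pooling halves it (floor)
    return (h - 2) // 2, (w - 2) // 2

def conv2d_params(kernel, in_ch, filters):
    # one kernel x kernel x in_ch weight tensor per filter, plus one bias per filter
    return (kernel * kernel * in_ch + 1) * filters

def dense_params(in_features, units):
    # weight matrix plus one bias per unit
    return (in_features + 1) * units

h, w = 64, 128
for _ in range(4):  # four Conv2D + MaxPooling2D pairs
    h, w = conv_pool(h, w)

layers = {
    "conv2d": conv2d_params(3, 3, 32),
    "conv2d_1": conv2d_params(3, 32, 32),
    "conv2d_2": conv2d_params(3, 32, 64),
    "conv2d_3": conv2d_params(3, 64, 64),
    "dense": dense_params(h * w * 64, 256),  # Flatten yields h * w * 64 features
    "dense_1": dense_params(256, 8),
    "dense_2": dense_params(1, 36),          # after Permute the last axis has size 1
}

total = sum(layers.values())
print("final feature map:", (h, w, 64))
for name, n in layers.items():
    print(f"{name:10s} {n:>8,d}")
print(f"{'total':10s} {total:>8,d}")
```

With only about 265k parameters, the forward and backward passes on 64x128 images should be cheap on either device.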

I use this code to train the network:

# fit_generator is deprecated in TensorFlow 2.x; model.fit accepts a Sequence directly
model.fit(training_generator,
          validation_data=validation_generator,
          steps_per_epoch=num_train_samples // 128,
          validation_steps=num_val_samples // 128,
          epochs=10)

The generator resizes the images to (64, 128). Here is the generator code:

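import math

import numpy as np
from skimage.io import imread            # assumed: the question does not show its imports
from skimage.transform import resize
from tensorflow.keras.utils import Sequence


class DataGenerator(Sequence):
    """Yields (images, labels) batches; each image is read from disk and resized to 64x128."""

    def __init__(self, x_set, y_set, batch_size):
        self.x, self.y = x_set, y_set  # x_set: image file paths, y_set: labels
        self.batch_size = batch_size

    def __len__(self):
        # number of batches per epoch, including a final partial batch
        return math.ceil(len(self.x) / self.batch_size)

    def __getitem__(self, idx):
        batch_x = self.x[idx * self.batch_size:(idx + 1) * self.batch_size]
        batch_y = self.y[idx * self.batch_size:(idx + 1) * self.batch_size]

        # every image in the batch is read and resized here, one after another, on the CPU
        return np.array([resize(imread(file_name), (64, 128))
                         for file_name in batch_x]), np.array(batch_y)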
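One possible way to reduce the per-batch loading time is to read the images of a batch in parallel instead of one after another. The sketch below is a hypothetical variant of DataGenerator, not the asker's code: `ThreadedBatchLoader`, `load_fn` and `max_workers` are illustrative names. Threads mainly help while waiting on file I/O (for example reads from a mounted Drive); how much they help with the resizing itself is uncertain.

```python
import math
from concurrent.futures import ThreadPoolExecutor

import numpy as np


class ThreadedBatchLoader:
    """Hypothetical variant of DataGenerator that loads one batch's images in parallel.

    load_fn stands in for the question's `resize(imread(file_name), (64, 128))`.
    For real training you would subclass tensorflow.keras.utils.Sequence as in the
    question; it is a plain class here so the sketch runs without TensorFlow.
    """

    def __init__(self, x_set, y_set, batch_size, load_fn, max_workers=8):
        self.x, self.y = x_set, y_set
        self.batch_size = batch_size
        self.load_fn = load_fn
        # a thread pool reused across batches; threads overlap waiting on file reads
        self.pool = ThreadPoolExecutor(max_workers=max_workers)

    def __len__(self):
        return math.ceil(len(self.x) / self.batch_size)

    def __getitem__(self, idx):
        start = idx * self.batch_size
        batch_x = self.x[start:start + self.batch_size]
        batch_y = self.y[start:start + self.batch_size]
        # Executor.map preserves input order, so images stay aligned with their labels
        images = list(self.pool.map(self.load_fn, batch_x))
        return np.array(images), np.array(batch_y)
```

In the question's setup it would be constructed along the lines of `ThreadedBatchLoader(x_paths, y_labels, 128, lambda f: resize(imread(f), (64, 128)))`, where `x_paths` and `y_labels` are placeholders for the existing path and label lists.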

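On the CPU, one epoch takes 70-90 minutes. On the GPU (149 W), it takes five times as long as on the CPU.

Do you know why it takes so long? Is there a problem with the generator? Can I speed up the process?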
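A first diagnostic for "is it the generator?" is to time batch creation on its own, with no training involved. This is a minimal, hypothetical helper (`seconds_per_batch` is not part of Keras); if its result is close to the time per step that Keras reports, the model is mostly waiting for data.

```python
import time


def seconds_per_batch(sequence, n_batches=5):
    """Average wall-clock time to build one batch from a Sequence-like object.

    Hypothetical diagnostic helper: it only calls sequence[i], so it measures the
    input pipeline (reading and resizing images) without any training step.
    """
    n = min(n_batches, len(sequence))
    if n == 0:
        raise ValueError("sequence has no batches")
    start = time.perf_counter()
    for i in range(n):
        sequence[i]  # build the batch and discard it
    return (time.perf_counter() - start) / n
```

For example, `print(seconds_per_batch(training_generator))` in the notebook would show how long the question's generator needs per batch of 128 images.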

Edit: Here is a link to my notebook: https://colab.research.google.com/drive/1ux9E8DhxPxtgaV60WUiYI2ew2s74Xrwh?usp=sharing

My data is stored in my Google Drive. The training dataset contains 105k images and the validation dataset 76k. In total I have 1.8 GB of data.
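The figures above allow a rough estimate of the time per training step. This sketch assumes a batch size of 128 (as in `steps_per_epoch`) and about 105,000 training images. Several seconds per step for a network of this size suggests, without proving it, that most of the time goes into reading and resizing images rather than into the model.

```python
# Back-of-the-envelope: time per training step implied by the numbers in the question.
# Assumptions: batch size 128 and about 105,000 training images.
num_train_samples = 105_000
batch_size = 128
steps_per_epoch = num_train_samples // batch_size  # 820

for minutes in (70, 90):
    cpu = minutes * 60 / steps_per_epoch
    print(f"CPU, {minutes} min/epoch: {cpu:.1f} s/step; "
          f"GPU (5x slower): {5 * cpu:.1f} s/step")
```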

Should I store the data somewhere else?
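If reading from Drive turns out to be the bottleneck, one common approach is to copy the dataset to the Colab VM's local disk once per session and point the generators at the copy. A minimal sketch; `copy_to_local` and both directory paths are hypothetical, and the file paths passed to `DataGenerator` would have to be rewritten to the local location.

```python
import shutil
from pathlib import Path


def copy_to_local(src_dir, dst_dir):
    """Copy a dataset directory (e.g. from a mounted Google Drive) to local disk once.

    Hypothetical helper: skips the copy if the destination already exists, so
    re-running the notebook cell does not copy the data again.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    if not dst.exists():
        shutil.copytree(src, dst)
    return dst
```

On Colab, Drive is typically mounted under `/content/drive`, so a call might look like `copy_to_local("/content/drive/<your folder>/train", "/content/train")`.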

Thanks a lot!

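Comments on the question:

Please share a self-contained notebook that reproduces the problem you are observing. Factors that matter for performance, such as the location and size of the dataset, are not described in the original question. For example, if your data is in Drive, you can probably speed things up by copying it to the local PD SSD boot disk.

To use the GPU, did you install tensorflow-gpu?

No, I did not install tensorflow-gpu. Is that necessary? I thought I only had to change the runtime type to "GPU"?

Answer 1: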

I think you have not enabled the GPU.

Go to Edit -> Notebook Settings and select GPU. Then click SAVE.
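A quick way to confirm that the runtime actually has a GPU attached, independent of TensorFlow, is to ask the NVIDIA driver tool `nvidia-smi` to list the devices. This is a hypothetical helper; it assumes that a missing `nvidia-smi` means no GPU is available, which is usually, but not necessarily, the case on Colab.

```python
import shutil
import subprocess


def gpu_summary():
    """Return the output of `nvidia-smi -L` (one line per visible GPU), or None.

    None means either that nvidia-smi is not on PATH or that it failed, which
    usually indicates the runtime has no usable GPU.
    """
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return None
    result = subprocess.run([exe, "-L"], capture_output=True, text=True)
    return result.stdout.strip() if result.returncode == 0 else None
```

Running `print(gpu_summary())` in a notebook cell should show a line per GPU on a GPU runtime and `None` on a CPU-only runtime.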

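Comments:

Thanks! Is that necessary? In all the tutorials I followed, I only saw the runtime type being changed to GPU; nobody changed the hardware accelerator to GPU. Still, I changed it to GPU as you suggested, but unfortunately it is no faster than before...

@Tobitor Try checking whether TensorFlow is running on the GPU:

from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())

The output must show that a GPU is available.

@Tobitor Actually, there is a question about running Keras models on the GPU: ***.com/questions/45662253/…. Maybe it helps.

I did this, and it runs on the GPU. However, it is no faster than before...

@Tobitor Try adding these lines:

import tensorflow as tf

# tf.ConfigProto and tf.Session are TensorFlow 1.x APIs; in TensorFlow 2.x they live under tf.compat.v1
config = tf.compat.v1.ConfigProto(device_count={'GPU': 1, 'CPU': 2})
sess = tf.compat.v1.Session(config=config)
tf.compat.v1.keras.backend.set_session(sess)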
