Training an object detection model with Turi Create on a GPU in Colab
Posted: 2021-10-27 14:31:48
Question:
I am trying to train an object detection model on Google Colab using Turi Create with a GPU.
According to the Turi Create repository, using the GPU during training requires following these instructions:
https://github.com/apple/turicreate/blob/main/LinuxGPU.md
However, every time I start training, the shell prints the following message before training begins:
"Using CPU to create model."
My Colab notebook is structured as follows:
Set up the CUDA environment
!wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin
!sudo mv cuda-ubuntu1804.pin /etc/apt/preferences.d/cuda-repository-pin-600
!sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
!sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/ /"
!sudo apt-get update
!wget http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb
!sudo apt install ./nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb
!sudo apt-get update
!wget https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/libnvinfer7_7.1.3-1+cuda11.0_amd64.deb
!sudo apt install ./libnvinfer7_7.1.3-1+cuda11.0_amd64.deb
!sudo apt-get update
# Install development and runtime libraries (~4GB)
!sudo apt-get install --no-install-recommends \
cuda-11-0 \
libcudnn8=8.0.4.30-1+cuda11.0 \
libcudnn8-dev=8.0.4.30-1+cuda11.0
# Install TensorRT. Requires that libcudnn8 is installed above.
!sudo apt-get install -y --no-install-recommends libnvinfer7=7.1.3-1+cuda11.0 \
libnvinfer-dev=7.1.3-1+cuda11.0 \
libnvinfer-plugin7=7.1.3-1+cuda11.0
Check the installation with nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.57.02    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 00000000:00:04.0 Off |                    0 |
| N/A   33C    P8    27W / 149W |      0MiB / 11441MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
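The same check can be done from a Python cell, which makes it easier to act on the result in later cells. The following is only a minimal sketch: the helper name `query_gpus` is hypothetical, and it assumes `nvidia-smi` is on the PATH of the Colab runtime. It returns `None` when the tool is missing or fails.

```python
import shutil
import subprocess


def query_gpus():
    """Return a list of (name, driver_version) tuples, or None if nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return None
    gpus = []
    for line in result.stdout.strip().splitlines():
        # Each CSV row looks like "<name>, <driver_version>".
        name, _, driver = line.partition(",")
        gpus.append((name.strip(), driver.strip()))
    return gpus


print(query_gpus())
```

Note that this only confirms that the driver sees the GPU; it says nothing about whether TensorFlow or Turi Create can use it.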
Install dependencies
!pip install turicreate
!pip uninstall -y tensorflow
!pip install tensorflow-gpu
Set the bash environment variable
!echo export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH >> ~/.bashrc
Training
tc.config.set_num_gpus(-1)
model = tc.object_detector.create(train_sf)
scores = model.evaluate(valid_sf)
print(scores['mean_average_precision'])
model.export_coreml('model.mlmodel')
This is the output:
TuriCreate currently only supports using one GPU. Setting 'num_gpus' to 1.
Using 'image' as feature column
Using 'annotations' as annotations column
Using CPU to create model.
Setting 'batch_size' to 32
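I cannot figure out what I am missing.
Comments:
Why not use TensorFlow or Keras?
Answer 1:
I managed to solve this: the problem was caused by the TensorFlow version preinstalled on the Colab machine. Replacing it with tensorflow 2.4.0 fixed it: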
!pip uninstall -y tensorflow
!pip uninstall -y tensorflow-gpu
!pip install turicreate
!pip install tensorflow==2.4.0
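After reinstalling, it may help to confirm which TensorFlow version the runtime actually loads and whether it sees the GPU before starting a long training run. This is a rough sketch rather than part of the original answer: the helper name `tensorflow_gpu_report` is hypothetical, and it assumes the runtime was restarted after the `pip` commands so the new version is the one imported.

```python
import importlib


def tensorflow_gpu_report():
    """Report the installed TensorFlow version and the GPUs it can see."""
    try:
        tf = importlib.import_module("tensorflow")
    except ImportError:
        # TensorFlow is not installed in this environment.
        return {"installed": False, "version": None, "gpus": []}
    # list_physical_devices("GPU") returns an empty list when no GPU is usable.
    gpus = [device.name for device in tf.config.list_physical_devices("GPU")]
    return {"installed": True, "version": tf.__version__, "gpus": gpus}


print(tensorflow_gpu_report())
```

If `gpus` comes back empty even though `nvidia-smi` lists a device, that would be consistent with the "Using CPU to create model." message and suggests a mismatch between the TensorFlow build and the installed CUDA libraries.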