"'devices' properties is not allowed" while creating docker-compose with nvidia gpu

Posted: 2021-05-23 16:31:02

Problem description

Context information (for bug reports)

Output of docker-compose version

docker-compose version 1.17.1, build unknown
docker-py version: 2.5.1
CPython version: 2.7.17
OpenSSL version: OpenSSL 1.1.1  11 Sep 2018

Output of docker version

Client:
 Version:           19.03.6
 API version:       1.40
 Go version:        go1.12.17
 Git commit:        369ce74a3c
 Built:             Fri Dec 18 12:21:44 2020
 OS/Arch:           linux/amd64
 Experimental:      false

Server:
 Engine:
  Version:          19.03.6
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.12.17
  Git commit:       369ce74a3c
  Built:            Thu Dec 10 13:23:49 2020
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.3.3-0ubuntu1~18.04.4
  GitCommit:        
 runc:
  Version:          spec: 1.0.1-dev
  GitCommit:        
 docker-init:
  Version:          0.18.0
  GitCommit:        

Output of docker-compose config (make sure to add the relevant -f and other flags)

ERROR: The Compose file './docker-compose.yml' is invalid because:
services.testserver.deploy.resources.reservations value Additional properties are not allowed ('devices' was unexpected)

Steps to reproduce the issue

1. Create a Dockerfile that pulls a simple nvidia cuda image and runs a command to check for the nvidia GPU
FROM nvidia/cuda:10.2-base
CMD nvidia-smi

2. When we build the image and run it without docker compose, it works like a charm

docker image build testserver/ -t testserverimage
docker run --gpus all -exec -it testserverimage

This shows the nvidia GPU device:

Sat Feb 20 13:10:46 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.80.02    Driver Version: 450.80.02    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 00001918:00:00.0 Off |                    0 |
| N/A   52C    P0    71W / 149W |   7897MiB / 11441MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
3. Now create docker-compose.yml
version: "3.5"

services:
  testserver:
    image: nvidia/cuda:10.2-base
    build: './modelserver'
    deploy:
      resources:
        reservations:
          devices:
            - capabilities: [gpu]
              driver: nvidia

Observed result

ERROR: The Compose file './docker-compose.yml' is invalid because:
services.testserver.deploy.resources.reservations value Additional properties are not allowed ('devices' was unexpected)

Expected result

Sat Feb 20 13:10:46 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.80.02    Driver Version: 450.80.02    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 00001918:00:00.0 Off |                    0 |
| N/A   52C    P0    71W / 149W |   7897MiB / 11441MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

Stacktrace / full error message

ERROR: The Compose file './docker-compose.yml' is invalid because:
services.testserver.deploy.resources.reservations value Additional properties are not allowed ('devices' was unexpected)

Additional information

OS version / distribution, docker-compose install method, etc.

OS information:

NAME="Ubuntu"
VERSION="18.04.5 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.5 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic

Docker compose installation:

sudo apt  install docker-compose


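Answer 1:

From the documentation at https://docs.docker.com/compose/gpu-support/#enabling-gpu-access-to-service-containers:

Docker Compose v1.28.0+ allows defining GPU reservations using the device structure defined in the Compose Specification.

Your docker-compose version is 1.17.1, so the 'devices' key is rejected by its schema. You need to upgrade docker-compose to at least 1.28.0.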
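If it helps, here is a minimal sketch of how that version comparison could be scripted. The helper name version_ge is made up for illustration, and it assumes a `sort` that supports `-V` (GNU coreutils). The installed version is hard-coded to the 1.17.1 reported in the question; on a real machine it could probably be read with `docker-compose version --short` instead.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when version $1 >= version $2.
# Assumes `sort -V` (version sort) is available, as in GNU coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

required="1.28.0"    # minimum version named in the Docker documentation
installed="1.17.1"   # value reported in the question (hard-coded for this sketch)

if version_ge "$installed" "$required"; then
    echo "docker-compose $installed should accept deploy.resources.reservations.devices"
else
    echo "docker-compose $installed is older than $required; an upgrade is needed"
fi
```

With the values from the question, the script takes the second branch, which matches the validation error shown above.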

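Comments:

Thank you very much. I did miss the versioning, because I assumed that installing from "apt" would always give me the latest version. It won't happen next time. Thanks again.

Ubuntu 18.04 is too old to have docker-compose 1.28 in its repositories.

Yes, @zigarn. Thanks again for your help.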
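Since the apt package on Ubuntu 18.04 is too old, one option is to fetch a standalone binary from the docker/compose GitHub releases page. The sketch below only builds and prints the download URL; the helper name compose_url and the choice of release 1.29.2 are assumptions for illustration, and the actual install commands are left as comments because they need root and network access.

```shell
#!/bin/sh
# Hypothetical helper: prints the GitHub release URL of the standalone
# docker-compose v1 binary for this machine's OS and architecture.
compose_url() {
    printf 'https://github.com/docker/compose/releases/download/%s/docker-compose-%s-%s\n' \
        "$1" "$(uname -s)" "$(uname -m)"
}

version="1.29.2"   # assumed target; any release >= 1.28.0 should do
echo "Would download: $(compose_url "$version")"

# To actually install (requires root and network access):
#   sudo curl -L "$(compose_url "$version")" -o /usr/local/bin/docker-compose
#   sudo chmod +x /usr/local/bin/docker-compose
#   docker-compose version
```

If the apt-installed docker-compose is still present, it may shadow the new binary depending on PATH order, so it is worth checking the output of `docker-compose version` afterwards.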
