Nova: Launching a GPU Virtual Machine

Posted by 范桂飓

GPU Passthrough

Environment

  • CentOS 7.9
  • OpenStack Train
  • NVIDIA Tesla K40

Host OS Configuration

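  1. In the BIOS, enable the Intel VT-x and VT-d (Intel VT for Directed I/O) hardware-assisted virtualization features, as well as the onboard VGA (graphics adapter). A non-zero count from the command below means the CPU exposes the virtualization flags.
$ egrep -c '(vmx|svm)' /proc/cpuinfo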
  1. Enable the IOMMU in the Linux kernel so that Intel VT-d takes effect.
$ vi /etc/default/grub
...
GRUB_CMDLINE_LINUX="... intel_iommu=on"

# MBR BIOS
$ grub2-mkconfig -o /boot/grub2/grub.cfg
# UEFI BIOS
$ grub2-mkconfig -o /boot/efi/EFI/centos/grub.cfg

$ reboot

$ dmesg | grep -i iommu
...
[    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-3.10.0-862.6.3.el7.x86_64 root=UUID=4e83b2b5-5ff1-4b1b-af0f-3f6a7f8275ea ro intel_iommu=on crashkernel=auto rhgb quiet
[    0.000000] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-3.10.0-862.6.3.el7.x86_64 root=UUID=4e83b2b5-5ff1-4b1b-af0f-3f6a7f8275ea ro intel_iommu=on crashkernel=auto rhgb quiet
[    0.000000] DMAR: IOMMU enabled
[    0.257808] DMAR-IR: IOAPIC id 3 under DRHD base  0xfbffc000 IOMMU 0
[    0.257810] DMAR-IR: IOAPIC id 1 under DRHD base  0xd7ffc000 IOMMU 1
[    0.257812] DMAR-IR: IOAPIC id 2 under DRHD base  0xd7ffc000 IOMMU 1

$ cat /proc/cmdline | grep iommu
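Another way to confirm that the IOMMU is really active is to count the groups the kernel exposes under /sys/kernel/iommu_groups, which stays empty when the IOMMU is off. The following is a minimal sketch: the function name count_iommu_groups is made up for this example, and the optional directory argument exists only so that the function can be pointed at a test tree.

```shell
# count_iommu_groups [ROOT]
# Hypothetical helper: prints how many IOMMU groups exist under ROOT
# (default: /sys/kernel/iommu_groups). 0 suggests the IOMMU is not enabled.
count_iommu_groups() {
    local root count entry
    root="${1:-/sys/kernel/iommu_groups}"
    count=0
    for entry in "$root"/*; do
        # An unmatched glob stays literal and is not a directory, so it is skipped.
        [ -d "$entry" ] && count=$((count + 1))
    done
    echo "$count"
}
```

Calling count_iommu_groups with no argument on the host should print a number greater than 0 once intel_iommu=on has taken effect.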
  1. Inspect the GPU's PCIe configuration. A single PCIe GPU card actually exposes 4 functions (VGA, Audio, USB, Serial bus).
$ lspci -nn | grep NVID
...
06:00.0 VGA compatible controller [0300]: NVIDIA Corporation Device [10de:1e04] (rev a1)
06:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:10f7] (rev a1)
06:00.2 USB controller [0c03]: NVIDIA Corporation Device [10de:1ad6] (rev a1)
06:00.3 Serial bus controller [0c80]: NVIDIA Corporation Device [10de:1ad7] (rev a1)
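The [vendor:product] pairs at the end of each line are what the VFIO and Nova configuration needs later. As a rough sketch, they can be pulled out of the lspci -nn output and joined with commas; vfio_ids_for_slot is a hypothetical helper name, and it assumes the lspci -nn line format shown above.

```shell
# vfio_ids_for_slot SLOT
# Hypothetical helper: reads `lspci -nn` output on stdin and prints the
# [vendor:product] IDs of every function on SLOT (e.g. 06:00), comma-separated.
vfio_ids_for_slot() {
    local slot
    slot="$1"
    # Keep only lines for this slot, extract the xxxx:xxxx pair (the class
    # code such as [0300] has no colon, so it is not matched), then join.
    grep "^${slot}\." \
        | grep -oE '\[[0-9a-f]{4}:[0-9a-f]{4}\]' \
        | tr -d '[]' \
        | paste -sd, -
}
```

For example, lspci -nn | vfio_ids_for_slot 06:00 prints a list that can be used as the value of ids= in the vfio-pci options.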
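  1. Bind the GPU to the VFIO (Virtual Function I/O) driver so that all functions of the PCIe GPU can be passed through to the VM. Functions that sit on the same PCIe slot are placed in the same IOMMU group, and PCIe GPU passthrough requires every device in that IOMMU group to be passed through to the same VM. Otherwise nova-compute reports the error: Please ensure all devices within the iommu_group are bound to their vfio bus driver.
# Blacklist the GPU's default drivers so that the host does not claim the device.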
$ vi /etc/modprobe.d/blacklist.conf
...
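# for VGA
blacklist nvidia
# for Audio (this also keeps the host from auto-loading the driver for its own HDA audio devices)
blacklist snd_hda_intel
# for USB (this also keeps the host from auto-loading the driver for its own xHCI USB controllers)
blacklist xhci_hcd
blacklist nouveau
blacklist nvidiafb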

# Back up the current initramfs image, then rebuild it so the blacklist takes effect at boot.
$ mv /boot/initramfs-$(uname -r).img /boot/initramfs-$(uname -r).img.bak
$ dracut /boot/initramfs-$(uname -r).img $(uname -r)

# Load the VFIO modules at boot.
$ vi /etc/modules-load.d/openstack-gpu.conf
...
vfio_pci
vfio
vfio_iommu_type1
pci_stub
kvm
kvm_intel

# Bind all 4 functions of the PCIe GPU to the vfio-pci driver.
$ vi /etc/modprobe.d/vfio.conf
options vfio-pci ids=10de:1e04,10de:10f7,10de:1ad6,10de:1ad7

$ reboot

$ dmesg | grep -i vfio
[    6.755346] VFIO - User Level meta-driver version: 0.3
[    6.803197] vfio_pci: add [10de:1b06[ffff:ffff]] class 0x000000/00000000
[    6.803306] vfio_pci: add [10de:10ef[ffff:ffff]] class 0x000000/00000000


$ lspci -vv -s 06:00.0 | grep driver
	Kernel driver in use: vfio-pci
$ lspci -vv -s 06:00.1 | grep driver
	Kernel driver in use: vfio-pci
$ lspci -vv -s 06:00.2 | grep driver
	Kernel driver in use: vfio-pci
$ lspci -vv -s 06:00.3 | grep driver
	Kernel driver in use: vfio-pci
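Because the whole IOMMU group matters, not just the four functions checked above, it can help to list every device in the GPU's group together with its driver. The sketch below reads the sysfs layout (DEVICE/iommu_group/devices/*/driver); the function name iommu_group_drivers is hypothetical, and the PCI domain prefix 0000 in the usage note is an assumption about this host.

```shell
# iommu_group_drivers DEV_DIR
# Hypothetical helper: for the PCI device at DEV_DIR (a sysfs device directory),
# prints "ADDRESS DRIVER" for every device in the same IOMMU group.
# DRIVER is "none" when no driver is bound.
iommu_group_drivers() {
    local dev_dir peer driver
    dev_dir="$1"
    for peer in "$dev_dir"/iommu_group/devices/*; do
        [ -e "$peer" ] || continue
        if [ -L "$peer/driver" ]; then
            # The driver entry is a symlink whose last path component is the driver name.
            driver=$(basename "$(readlink "$peer/driver")")
        else
            driver="none"
        fi
        echo "$(basename "$peer") $driver"
    done
}
```

For example, iommu_group_drivers /sys/bus/pci/devices/0000:06:00.0 should list 06:00.0 through 06:00.3; any line whose second column is not vfio-pci deserves a closer look.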

OpenStack Configuration

  1. Look up the [Vendor ID:Product ID] of each PCIe device (the -nn flag makes lspci print the numeric IDs).
$ lspci -nn -v -s 06:00.0
$ lspci -nn -v -s 06:00.1
$ lspci -nn -v -s 06:00.2
$ lspci -nn -v -s 06:00.3
  1. Configure the nova-scheduler service to enable the PciPassthroughFilter scheduler filter.
[filter_scheduler]
...
enabled_filters = ...,PciPassthroughFilter
available_filters = nova.scheduler.filters.all_filters
  1. Configure the nova-api service to register the GPU device aliases. Set device_type according to the actual GPU model.
[pci]
# 2080
alias = { "name":"nv2080vga", "product_id":"1e04", "vendor_id":"10de", "device_type":"type-PCI" }
alias = { "name":"nv2080aud", "product_id":"10f7", "vendor_id":"10de", "device_type":"type-PCI" }
alias = { "name":"nv2080usb", "product_id":"1ad6", "vendor_id":"10de", "device_type":"type-PCI" }
alias = { "name":"nv2080bus", "product_id":"1ad7", "vendor_id":"10de", "device_type":"type-PCI" }

## T4
#alias = { "name":"nvT43D", "product_id":"1eb8", "vendor_id":"10de", "device_type":"type-PF" }

## 1080
#alias = { "name":"nv1080vga", "product_id":"1b06", "vendor_id":"10de", "device_type":"type-PCI" }
#alias = { "name":"nv1080aud", "product_id":"10ef", "vendor_id":"10de", "device_type":"type-PCI" }
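Each alias value is a JSON object built from the vendor and product IDs reported by lspci -nn, so the lines can be generated instead of typed by hand. A small sketch, assuming IDs are given in vendor:product form; pci_alias is a hypothetical helper, and type-PCI is used as the default only because that is what the 2080 entries above use.

```shell
# pci_alias NAME VENDOR:PRODUCT [DEVICE_TYPE]
# Hypothetical helper: prints one [pci] alias line for nova.conf.
# DEVICE_TYPE defaults to type-PCI (an assumption; adjust per GPU model).
pci_alias() {
    local name id dtype
    name="$1"
    id="$2"
    dtype="${3:-type-PCI}"
    # ${id%%:*} is the part before the colon, ${id##*:} the part after it.
    printf 'alias = { "name": "%s", "vendor_id": "%s", "product_id": "%s", "device_type": "%s" }\n' \
        "$name" "${id%%:*}" "${id##*:}" "$dtype"
}
```

For example, pci_alias nv2080vga 10de:1e04 prints the first alias line, and pci_alias nvT43D 10de:1eb8 type-PF prints the T4 variant.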
  1. Configure the nova-compute service.
[pci]
alias = { "name":"nv2080vga", "product_id":"1e04", "vendor_id":"10de", "device_type":"type-PCI" }
alias = { "name":"nv2080aud", "product_id":"10f7", "vendor_id":"10de", "device_type":"type-PCI" }
alias = { "name":"nv2080usb", "product_id":"1ad6", "vendor_id":"10de", "device_type":"type-PCI" }
alias = { "name":"nv2080bus", "product_id":"1ad7", "vendor_id":"10de", "device_type":"type-PCI" }

passthrough_whitelist = [ { "vendor_id": "10de", "product_id": "1e04" },
                          { "vendor_id": "10de", "product_id": "10f7" },
                          { "vendor_id": "10de", "product_id": "1ad6" },
                          { "vendor_id": "10de", "product_id": "1ad7" } ]
  1. Create a GPU flavor.
$ openstack flavor create --public --ram 2048 --disk 20 --vcpus 2 m1.large

# openstack flavor set FLAVOR-NAME --property pci_passthrough:alias=ALIAS:COUNT
$ openstack flavor set m1.large --property pci_passthrough:alias='nv2080vga:1,nv2080aud:1,nv2080usb:1,nv2080bus:1'
  1. Hide the hypervisor ID from the VM. The NVIDIA GPU driver checks whether it is running inside a VM and fails if it is, so the hypervisor ID must be hidden from the GPU driver.
$ openstack image set [IMG-UUID] --property img_hide_hypervisor_id=true

# Run inside the VM
$ cpuid | grep hypervisor_id
hypervisor_id = "  @  @    "
hypervisor_id = "  @  @    "
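The setting can also be checked from the compute host. This sketch assumes the libvirt driver, which, to my understanding, renders img_hide_hypervisor_id=true as a <hidden state='on'/> element inside <kvm> in the domain XML; the helper name kvm_hidden_enabled is made up for this example.

```shell
# kvm_hidden_enabled
# Hypothetical helper: reads a libvirt domain XML on stdin and succeeds
# (exit status 0) if it contains <hidden state='on'/>, in either quote style.
kvm_hidden_enabled() {
    grep -Eq "<hidden state=['\"]on['\"] */>"
}
```

For example, virsh dumpxml [INSTANCE-NAME] | kvm_hidden_enabled && echo hidden prints "hidden" when the element is present.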