Nova: Launching a GPU Virtual Machine
Posted by 范桂飓
GPU Passthrough
Environment
- CentOS 7.9
- OpenStack Train
- NVIDIA Tesla K40
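Host OS configuration
- In the BIOS, enable the Intel VT-x and VT-d (Intel VT for Directed I/O) hardware-assisted virtualization features, as well as Onboard VGA (the onboard display adapter).
# A non-zero count means the CPU exposes VT-x (vmx) or AMD-V (svm).
$ egrep -c '(vmx|svm)' /proc/cpuinfo
- Enable the IOMMU in the Linux kernel so that Intel VT-d is active.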
$ vi /etc/default/grub
...
GRUB_CMDLINE_LINUX="... intel_iommu=on"
# MBR BIOS
$ grub2-mkconfig -o /boot/grub2/grub.cfg
# UEFI BIOS
$ grub2-mkconfig -o /boot/efi/EFI/centos/grub.cfg
$ reboot
$ dmesg | grep -i iommu
...
[ 0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-3.10.0-862.6.3.el7.x86_64 root=UUID=4e83b2b5-5ff1-4b1b-af0f-3f6a7f8275ea ro intel_iommu=on crashkernel=auto rhgb quiet
[ 0.000000] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-3.10.0-862.6.3.el7.x86_64 root=UUID=4e83b2b5-5ff1-4b1b-af0f-3f6a7f8275ea ro intel_iommu=on crashkernel=auto rhgb quiet
[ 0.000000] DMAR: IOMMU enabled
[ 0.257808] DMAR-IR: IOAPIC id 3 under DRHD base 0xfbffc000 IOMMU 0
[ 0.257810] DMAR-IR: IOAPIC id 1 under DRHD base 0xd7ffc000 IOMMU 1
[ 0.257812] DMAR-IR: IOAPIC id 2 under DRHD base 0xd7ffc000 IOMMU 1
$ cat /proc/cmdline | grep iommu
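The check above can also be scripted. The following is a minimal sketch, assuming the parameter is spelled exactly `intel_iommu=on`; `has_intel_iommu` is a hypothetical helper name, not part of any tool.

```shell
# has_intel_iommu CMDLINE
# Succeeds if the given kernel command line contains intel_iommu=on
# as a separate word.
has_intel_iommu() {
    case " $1 " in
        *" intel_iommu=on "*) return 0 ;;
    esac
    return 1
}

# Usage on a real host:
#   has_intel_iommu "$(cat /proc/cmdline)" && echo "IOMMU requested at boot"
```

This only confirms that the parameter was passed to the kernel. The `DMAR: IOMMU enabled` line in dmesg remains the evidence that the IOMMU was actually initialized.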
- Inspect the GPU's PCIe configuration. Note that a single PCIe GPU actually consists of four functions (VGA, Audio, USB, Serial bus).
$ lspci -nn | grep NVID
...
06:00.0 VGA compatible controller [0300]: NVIDIA Corporation Device [10de:1e04] (rev a1)
06:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:10f7] (rev a1)
06:00.2 USB controller [0c03]: NVIDIA Corporation Device [10de:1ad6] (rev a1)
06:00.3 Serial bus controller [0c80]: NVIDIA Corporation Device [10de:1ad7] (rev a1)
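The [vendor:product] pairs in this output are reused in the VFIO and Nova configuration later on, so it can help to extract them mechanically. A minimal sketch, assuming `lspci -nn` output in the format shown above and NVIDIA's vendor ID 10de; `vfio_ids` is a hypothetical helper name.

```shell
# vfio_ids
# Read `lspci -nn` output on stdin and print the vendor:product IDs of all
# NVIDIA (vendor 10de) functions as one comma-separated line, keeping the
# original order and dropping duplicates.
vfio_ids() {
    sed -n 's/.*\[\(10de:[0-9a-f]\{4\}\)\].*/\1/p' \
        | awk '!seen[$0]++' \
        | paste -sd, -
}

# Usage on a real host:
#   echo "options vfio-pci ids=$(lspci -nn | vfio_ids)"
```

Review the result before using it: on a host with several NVIDIA devices it lists all of them, not only the GPU intended for passthrough.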
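- Bind the GPU to the VFIO (Virtual Function I/O) driver so that every function of the PCIe GPU can be passed through to the VM. The functions behind one PCIe slot are placed in the same IOMMU group, and PCIe GPU passthrough requires every device in that IOMMU group to be passed through to the same VM. Otherwise nova-compute fails with the error: Please ensure all devices within the iommu_group are bound to their vfio bus driver.
# Blacklist the GPU's default drivers so that the host does not claim the device.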
$ vi /etc/modprobe.d/blacklist.conf
...
blacklist nvidia # for VGA
blacklist snd_hda_intel # for Audio
blacklist xhci_hcd # for USB
blacklist nouveau
blacklist nvidiafb
# Back up the current initramfs image and rebuild it.
$ mv /boot/initramfs-$(uname -r).img /boot/initramfs-$(uname -r).img.bak
$ dracut /boot/initramfs-$(uname -r).img $(uname -r)
# Load the VFIO modules at boot.
$ vi /etc/modules-load.d/openstack-gpu.conf
...
vfio_pci
vfio
vfio_iommu_type1
pci_stub
kvm
kvm_intel
# Bind all four functions of the PCIe GPU to the VFIO driver.
$ vi /etc/modprobe.d/vfio.conf
options vfio-pci ids=10de:1e04,10de:10f7,10de:1ad6,10de:1ad7
$ reboot
$ dmesg | grep -i vfio
[ 6.755346] VFIO - User Level meta-driver version: 0.3
[ 6.803197] vfio_pci: add [10de:1b06[ffff:ffff]] class 0x000000/00000000
[ 6.803306] vfio_pci: add [10de:10ef[ffff:ffff]] class 0x000000/00000000
$ lspci -vv -s 06:00.0 | grep driver
Kernel driver in use: vfio-pci
$ lspci -vv -s 06:00.1 | grep driver
Kernel driver in use: vfio-pci
$ lspci -vv -s 06:00.2 | grep driver
Kernel driver in use: vfio-pci
$ lspci -vv -s 06:00.3 | grep driver
Kernel driver in use: vfio-pci
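To check the IOMMU group requirement from this step for every device at once, the groups can be read from sysfs. A minimal sketch, assuming the standard `/sys/kernel/iommu_groups` layout; `list_iommu_groups` is a hypothetical helper name, and the optional root argument exists only so the function can be pointed at a test directory.

```shell
# list_iommu_groups [SYSFS_ROOT]
# Print one line per device: "group <n>: <pci-address> driver=<name|none>".
list_iommu_groups() {
    root=${1:-/sys}
    for dev in "$root"/kernel/iommu_groups/*/devices/*; do
        [ -e "$dev" ] || continue      # no groups at all: IOMMU is not active
        group=${dev%/devices/*}        # strip "/devices/<address>"
        group=${group##*/}             # keep only the group number
        if [ -e "$dev/driver" ]; then
            # The driver entry is a symlink whose last component is the name.
            driver=$(basename "$(readlink "$dev/driver")")
        else
            driver=none
        fi
        printf 'group %s: %s driver=%s\n' "$group" "${dev##*/}" "$driver"
    done
}

# Usage on a real host:
#   list_iommu_groups | grep ' 0000:06:00'
```

All functions of the GPU should appear under the same group number with `driver=vfio-pci`. Any other device in that group that is bound to a different driver is a likely cause of the error quoted above.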
OpenStack configuration
- Look up the [Vendor ID:Product ID] of each PCIe function.
# -nn prints the numeric [vendor:product] IDs next to the device names.
$ lspci -nn -s 06:00.0
$ lspci -nn -s 06:00.1
$ lspci -nn -s 06:00.2
$ lspci -nn -s 06:00.3
- Configure the nova-scheduler service to enable the PciPassthroughFilter scheduler filter.
[filter_scheduler]
...
enabled_filters = ...,PciPassthroughFilter
available_filters = nova.scheduler.filters.all_filters
- Configure the nova-api service to register the GPU devices. Set device_type according to the actual GPU model.
[pci]
# 2080
alias = { "name":"nv2080vga","product_id":"1e04","vendor_id":"10de","device_type":"type-PCI" }
alias = { "name":"nv2080aud","product_id":"10f7","vendor_id":"10de","device_type":"type-PCI" }
alias = { "name":"nv2080usb","product_id":"1ad6","vendor_id":"10de","device_type":"type-PCI" }
alias = { "name":"nv2080bus","product_id":"1ad7","vendor_id":"10de","device_type":"type-PCI" }
## T4
#alias = { "name":"nvT43D","product_id":"1eb8","vendor_id":"10de","device_type":"type-PF" }
## 1080
#alias = { "name":"nv1080vga","product_id":"1b06","vendor_id":"10de","device_type":"type-PCI" }
#alias = { "name":"nv1080aud","product_id":"10ef","vendor_id":"10de","device_type":"type-PCI" }
- Configure the nova-compute service.
[pci]
alias = { "name":"nv2080vga","product_id":"1e04","vendor_id":"10de","device_type":"type-PCI" }
alias = { "name":"nv2080aud","product_id":"10f7","vendor_id":"10de","device_type":"type-PCI" }
alias = { "name":"nv2080usb","product_id":"1ad6","vendor_id":"10de","device_type":"type-PCI" }
alias = { "name":"nv2080bus","product_id":"1ad7","vendor_id":"10de","device_type":"type-PCI" }
passthrough_whitelist = [{ "vendor_id": "10de", "product_id": "1e04" },
                         { "vendor_id": "10de", "product_id": "10f7" },
                         { "vendor_id": "10de", "product_id": "1ad6" },
                         { "vendor_id": "10de", "product_id": "1ad7" }]
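The passthrough_whitelist value is a JSON list with one object per vendor:product pair, which is easy to mistype by hand. A minimal sketch that generates it from a list of IDs; `whitelist_json` is a hypothetical helper name, and the output is printed on a single line.

```shell
# whitelist_json VENDOR:PRODUCT...
# Print a JSON list suitable for passthrough_whitelist, one object per ID.
whitelist_json() {
    out=""
    for id in "$@"; do
        vendor=${id%%:*}     # text before the colon
        product=${id##*:}    # text after the colon
        entry="{\"vendor_id\": \"$vendor\", \"product_id\": \"$product\"}"
        out="${out:+$out, }$entry"   # add ", " only between entries
    done
    printf '[%s]\n' "$out"
}

# Usage:
#   echo "passthrough_whitelist = $(whitelist_json 10de:1e04 10de:10f7 10de:1ad6 10de:1ad7)"
```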
- Create a GPU flavor.
$ openstack flavor create --public --ram 2048 --disk 20 --vcpus 2 m1.large
# openstack flavor set FLAVOR-NAME --property pci_passthrough:alias=ALIAS:COUNT
$ openstack flavor set m1.large --property pci_passthrough:alias='nv2080vga:1,nv2080aud:1,nv2080usb:1,nv2080bus:1'
- Hide the hypervisor ID from the VM. The NVIDIA GPU driver checks whether it is running inside a VM and fails if it is, so the hypervisor ID must be hidden from the driver.
$ openstack image set [IMG-UUID] --property img_hide_hypervisor_id=true
# Run inside the VM
$ cpuid | grep hypervisor_id
hypervisor_id = " @ @ "
hypervisor_id = " @ @ "
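The same check can be turned into a pass/fail test. A minimal sketch, assuming that `cpuid` prints the signature string on the hypervisor_id line and relying on KVM's default CPUID signature "KVMKVMKVM"; `hypervisor_hidden` is a hypothetical helper name.

```shell
# hypervisor_hidden
# Read `cpuid` output on stdin. Succeeds if no hypervisor_id line carries
# the default KVM signature, i.e. the hypervisor ID appears to be hidden.
hypervisor_hidden() {
    ! grep -q 'hypervisor_id.*KVMKVMKVM'
}

# Usage inside the VM:
#   cpuid | hypervisor_hidden && echo "hypervisor ID hidden"
```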