nvidia-smi: Failed to initialize NVML: Driver/library version mismatch
Posted by 刘润森
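On my work machine I keep running into Failed to initialize NVML: Driver/library version mismatch.
What it actually means: the NVIDIA kernel module that is currently loaded and the user-space driver libraries (the NVML library that nvidia-smi talks to) are at different versions, typically because a package upgrade replaced the libraries while the old module is still loaded.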
(base) ng@ng-Z390:/home/lrs/KAIR-master$ nvidia-smi
Failed to initialize NVML: Driver/library version mismatch
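Some people say to uninstall the driver, which is pretty silly: if the driver is already there, removing it just wastes time.
First check nvcc to confirm the CUDA toolkit is still installed (nvcc ships with the toolkit, not with the driver itself).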
(base) ng@ng-Z390:/home/lrs/KAIR-master$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Thu_Jun_11_22:26:38_PDT_2020
Cuda compilation tools, release 11.0, V11.0.194
Build cuda_11.0_bu.TC445_37.28540450_0
Then check the version of the NVIDIA kernel module that is loaded:
(base) ng@ng-Z390:/home/lrs/KAIR-master$ cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module 460.73.01 Thu Apr 1 21:40:36 UTC 2021
GCC version: gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)
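The mismatch is between the version printed above (the kernel module that is currently loaded) and the version of the user-space NVML library on disk. Here is a minimal shell sketch for putting the two side by side. The function names are my own, and the default library directory /usr/lib/x86_64-linux-gnu is an assumption about a typical Ubuntu x86_64 install, so adjust it if your libraries live elsewhere.

```shell
# Print the driver version from the NVRM line of a file shaped like
# /proc/driver/nvidia/version (hypothetical helper, not an NVIDIA tool).
loaded_nvidia_version() {
    awk '/^NVRM version:/ {
        for (i = 1; i <= NF; i++)
            if ($i ~ /^[0-9]+\.[0-9]+(\.[0-9]+)?$/) { print $i; exit }
    }' "${1:-/proc/driver/nvidia/version}"
}

# Print the version suffix of every libnvidia-ml.so.<version> in a directory.
# The default directory is an assumption for Ubuntu x86_64.
library_nvidia_versions() {
    dir="${1:-/usr/lib/x86_64-linux-gnu}"
    for f in "$dir"/libnvidia-ml.so.*.*; do
        [ -e "$f" ] || continue        # glob matched nothing
        printf '%s\n' "${f##*libnvidia-ml.so.}"
    done
}

# Usage on the affected machine:
#   loaded_nvidia_version      # version of the loaded kernel module
#   library_nvidia_versions    # version(s) of the NVML library on disk
```

If the two commands print different versions, that is the mismatch nvidia-smi is complaining about.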
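The right fix for Failed to initialize NVML: Driver/library version mismatch is to build and install the kernel module for that version with DKMS, where 460.73.01 is the version reported above:
sudo dkms install -m nvidia -v 460.73.01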
If the install fails, look at the log file the error message points to.
unset ARCH; [ ! -h /usr/bin/cc ] && export CC=/usr/bin/gcc; env NV_VERBOSE=1 'make' -j16 NV_EXCLUDE_BUILD_MODULES='' KERNEL_UNAME=5.4.0-73-generic IGNORE_XEN_PRESENCE=1 IGNORE_CC_MISMATCH=1 SYSSRC=/lib/modules/5.4.0-73-generic/build LD=/usr/bin/ld.bfd modules....(bad exit status: 2)
Error! Bad return status for module build on kernel: 5.4.0-73-generic (x86_64)
Consult /var/lib/dkms/nvidia/460.73.01/build/make.log for more information.
My log was /var/lib/dkms/nvidia/460.73.01/build/make.log.
These are the lines in it that show why the build failed:
cc: error: unrecognized command line option ‘-fstack-protector-strong’
make[2]: *** [/var/lib/dkms/nvidia/460.73.01/build/nvidia/nv-acpi.o] Error 1
Makefile:1760: recipe for target '/var/lib/dkms/nvidia/460.73.01/build' failed
make[1]: *** [/var/lib/dkms/nvidia/460.73.01/build] Error 2
make[1]: Leaving directory '/usr/src/linux-headers-5.4.0-73-generic'
Makefile:80: recipe for target 'modules' failed
make: *** [modules] Error 2
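That cc: error: unrecognized command line option ‘-fstack-protector-strong’ is a compiler problem, not a driver problem: the gcc doing the build is too old to know the flag, so switch to a different gcc version.
I was on 4.7 before. After switching to 4.8 or 7, the build went through without problems.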
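When make.log is long, the line that matters is usually the first compiler error, like the cc: error line above. A small sketch for pulling it out: first_build_error is a name I made up, and it only assumes the log marks compiler errors with the text error: as in the excerpt above.

```shell
# Print the first line of a build log that contains "error:",
# prefixed with its line number (hypothetical helper).
first_build_error() {
    # $1: path to the DKMS make.log
    grep -n 'error:' "$1" | head -n 1
}

# Usage:
#   first_build_error /var/lib/dkms/nvidia/460.73.01/build/make.log
```

The make lines ending in "Error 1" or "Error 2" are only consequences, which is why the pattern is the lowercase error: that the compiler prints.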
Installing gcc on Ubuntu:
sudo add-apt-repository ppa:ubuntu-toolchain-r/test
sudo apt-get update
sudo apt-get install gcc-7
sudo apt-get install g++-7
(base) ng@ng-Z390:~/miniconda3$ sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-4.7 99
(base) ng@ng-Z390:~/miniconda3$ sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-7 100
Setting up the gcc symlink this way may fail with an error. For how to repair the symlink, see this blog post: https://blog.csdn.net/recher_He1107/article/details/106739850
If that goes through, set the default gcc version and then run the DKMS install again:
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-7 100
(base) ng@ng-Z390:~/miniconda3$ sudo dkms install -m nvidia -v 460.73.01
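Once the install succeeds, things are basically back to normal. If it complains that some file already exists, that file is left over from the earlier failed install; delete it and run the install again.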
(base) ng@ng-Z390:~$ nvidia-smi
Mon Jun 28 14:03:35 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.73.01    Driver Version: 460.73.01    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce RTX 208...  Off  | 00000000:02:00.0 Off |                  N/A |
| 25%   64C    P0    50W / 250W |      0MiB / 11016MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
A few days later the problem came back. This time the NVIDIA version had suddenly changed to 460.80, so I followed the same steps and reinstalled for 460.80:
sudo dkms install -m nvidia -v 460.80
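So I turned off automatic upgrades on Ubuntu 18.04 to stop the driver from changing underneath me.
Edit the config file /etc/apt/apt.conf.d/10periodic:
# 0 means off, 1 means on; set every value to 0
Before the change:
(base) ng@ng-Z390:/etc/apt/apt.conf.d$ cat 10periodic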
APT::Periodic::Update-Package-Lists "0";
APT::Periodic::Download-Upgradeable-Packages "0";
APT::Periodic::AutocleanInterval "0";
APT::Periodic::Unattended-Upgrade "1";
After the change:
(base) ng@ng-Z390:/etc/apt/apt.conf.d$ cat 10periodic
APT::Periodic::Update-Package-Lists "0";
APT::Periodic::Download-Upgradeable-Packages "0";
APT::Periodic::AutocleanInterval "0";
APT::Periodic::Unattended-Upgrade "0";
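Instead of editing each line by hand, the same change can be sketched with sed. This version only prints the rewritten config so it can be reviewed first. periodic_all_off is a hypothetical helper name, and it assumes every setting has the APT::Periodic::Name "value"; shape shown above. Also keep in mind that apt reads the files in /etc/apt/apt.conf.d in name order, so if a later file (20auto-upgrades on some installs) sets the same keys, its values may win and need the same change.

```shell
# Print an apt periodic config with every APT::Periodic value set to "0"
# (hypothetical helper; it writes to stdout and does not modify the file).
periodic_all_off() {
    # $1: path to the config file, e.g. /etc/apt/apt.conf.d/10periodic
    sed 's/^\(APT::Periodic::[A-Za-z-]*\) "[^"]*";/\1 "0";/' "$1"
}

# Usage (review the output, then copy it into place with sudo):
#   periodic_all_off /etc/apt/apt.conf.d/10periodic
```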
(base) ng@ng-Z390:/etc/apt/apt.conf.d$ sudo apt-mark hold linux-image-generic linux-headers-generic
linux-image-generic set on hold.
linux-headers-generic set on hold.