Huawei NPU card driver installation on Ubuntu (no network connection): notes

Posted by Data+Science+Insight


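Preface: This article was compiled by the editors of 小常识网 (cha138.com) and covers installing the Huawei NPU card driver on Ubuntu without a network connection. We hope it is of some use to you.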


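Prelude: Almost all of the problems described here arise because the server has no network access, so if your machine has full network access this post may be of limited use. The fixes and the troubleshooting approach may still be worth borrowing, though.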


root@ubuntu:/home/ubuntu# bash A300-3010-npu-driver_20.2.0_ubuntu18.04-x86_64.run 
Verifying archive integrity...  100%   SHA256 checksums are OK. All good.
Uncompressing Atlas300_software_version_20.2.0  100%  
[Driver] [2021-05-31 10:50:35] [ERROR]ERR_NO:0x0004;ERR_DES: Unrecognized parameters. Try './xxx.run --help' for more information.
root@ubuntu:/home/ubuntu# ./A300-3010-npu-driver_20.2.0_ubuntu18.04-x86_64.run 
Verifying archive integrity...  100%   SHA256 checksums are OK. All good.
Uncompressing Atlas300_software_version_20.2.0  100%  
[Driver] [2021-05-31 10:51:36] [ERROR]ERR_NO:0x0004;ERR_DES: Unrecognized parameters. Try './xxx.run --help' for more information.

chmod +x

root@ubuntu:/home/ubuntu# 
root@ubuntu:/home/ubuntu# chmod +x A300-3010-npu-driver_20.2.0_ubuntu18.04-x86_64.run

gcc

root@ubuntu:/home/ubuntu# ./A300-3010-npu-driver_20.2.0_ubuntu18.04-x86_64.run --full
Verifying archive integrity...  100%   SHA256 checksums are OK. All good.
Uncompressing Atlas300_software_version_20.2.0  100%  
[Driver] [2021-05-31 11:19:50] [INFO]Start time: 2021-05-31 11:19:50
[Driver] [2021-05-31 11:19:50] [INFO]LogFile: /var/log/ascend_seclog/ascend_install.log
[Driver] [2021-05-31 11:19:50] [INFO]OperationLogFile: /var/log/ascend_seclog/operation.log
[Driver] [2021-05-31 11:19:51] [INFO]base version is none.
[Driver] [2021-05-31 11:19:51] [INFO]set username and usergroup, HwHiAiUser:HwHiAiUser
[ERROR]gcc: command not found
[Driver] [2021-05-31 11:19:52] [ERROR]Drv_dkms_env_check failed, details in : /var/log/ascend_seclog/ascend_install.log 
[Driver] [2021-05-31 11:19:52] [INFO]Install driver failed, please retry after uninstall and reboot!
[Driver] [2021-05-31 11:19:52] [INFO]End time: 2021-05-31 11:19:52
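Drv_dkms_env_check appears to stop at the first missing tool: gcc is reported here, and make only shows up after gcc has been installed. A small pre-flight check can list every missing tool in one go before you run the .run file. This is a hypothetical helper, not part of the Huawei package, and the tool list (gcc, make) is only what the errors in this post report; the installer's real check may require more.

```shell
# Hypothetical pre-flight helper (not part of the Huawei installer).
# Prints the commands from the argument list that are not on PATH and
# returns 1 if any are missing, 0 if all are present.
check_build_tools() {
  local missing="" tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      # Append with a space separator, without a leading space
      missing="${missing:+$missing }$tool"
    fi
  done
  if [ -n "$missing" ]; then
    echo "missing: $missing"
    return 1
  fi
  return 0
}
```

On a server where neither tool is installed, check_build_tools gcc make prints missing: gcc make and returns 1, so both gaps show up before the first installer run instead of one per run.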

update


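# If the server has network access:
Run apt-get update as the prompt suggests,
then install gcc again once the update finishes.

Run gcc --version to check the gcc version; if it prints one, you're done!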
root@ubuntu:/home/ubuntu# apt install gcc
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following additional packages will be installed:
  binutils binutils-common binutils-x86-64-linux-gnu cpp cpp-7 gcc-7 gcc-7-base libasan4 libatomic1 libbinutils libc-dev-bin libc6-dev libcc1-0 libcilkrts5
  libgcc-7-dev libgomp1 libisl19 libitm1 liblsan0 libmpc3 libmpx2 libquadmath0 libtsan0 libubsan0 linux-libc-dev manpages-dev
Suggested packages:
  binutils-doc cpp-doc gcc-7-locales gcc-multilib make autoconf automake libtool flex bison gdb gcc-doc gcc-7-multilib gcc-7-doc libgcc1-dbg libgomp1-dbg
  libitm1-dbg libatomic1-dbg libasan4-dbg liblsan0-dbg libtsan0-dbg libubsan0-dbg libcilkrts5-dbg libmpx2-dbg libquadmath0-dbg glibc-doc
The following NEW packages will be installed:
  binutils binutils-common binutils-x86-64-linux-gnu cpp cpp-7 gcc gcc-7 gcc-7-base libasan4 libatomic1 libbinutils libc-dev-bin libc6-dev libcc1-0 libcilkrts5
  libgcc-7-dev libgomp1 libisl19 libitm1 liblsan0 libmpc3 libmpx2 libquadmath0 libtsan0 libubsan0 linux-libc-dev manpages-dev
0 upgraded, 27 newly installed, 0 to remove and 0 not upgraded.
Need to get 30.6 MB of archives.
After this operation, 118 MB of additional disk space will be used.
Do you want to continue? [Y/n] y
Ign:1 http://archive.ubuntu.com/ubuntu bionic-updates/main amd64 binutils-common amd64 2.30-21ubuntu1~18.04.4
Ign:2 http://archive.ubuntu.com/ubuntu bionic-updates/main amd64 libbinutils amd64 2.30-21ubuntu1~18.04.4
Ign:3 http://archive.ubuntu.com/ubuntu bionic-updates/main amd64 binutils-x86-64-linux-gnu amd64 2.30-21ubuntu1~18.04.4
Ign:4 http://archive.ubuntu.com/ubuntu bionic-updates/main amd64 binutils amd64 2.30-21ubuntu1~18.04.4
Ign:5 http://archive.ubuntu.com/ubuntu bionic-updates/main amd64 gcc-7-base amd64 7.5.0-3ubuntu1~18.04
Err:6 http://archive.ubuntu.com/ubuntu bionic/main amd64 libisl19 amd64 0.19-1
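The Ign/Err lines show apt trying to reach archive.ubuntu.com and failing, which is why the offline route was needed here. On a machine that does have network access, the update-then-install path fits in a small function. A minimal sketch, assuming it runs as root; the function name and the DRY_RUN switch (print the commands instead of executing them) are hypothetical.

```shell
# Sketch of the online path: refresh the package index, then install.
# Hypothetical helper; run as root. With DRY_RUN=1 it only prints the
# commands it would run.
install_build_tools_online() {
  # Default to gcc, the package the installer asked for first
  if [ "$#" -eq 0 ]; then
    set -- gcc
  fi
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "apt-get update"
    echo "apt-get install -y $*"
    return 0
  fi
  apt-get update && apt-get install -y "$@"
}
```

For example, install_build_tools_online gcc make installs both tools the driver installer ends up asking for; gcc --version afterwards confirms the result.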

Installing from .deb packages



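# Server without network access:
Download all the required .deb packages (gcc and its dependencies) on a machine with network access and copy them to the server.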
root@ubuntu:/home/ubuntu/ubuntu_packages# dpkg -i gcc-7-base_7.5.0-3ubuntu1~18.04_amd64.deb

After changing into that directory:
sudo dpkg -i ./* 
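Passing every .deb to a single dpkg -i call, as above, generally lets dpkg sort out dependencies among the packages regardless of file order, whereas installing them one at a time (like the single gcc-7-base package earlier) can stop on a dependency that is not installed yet. The sketch below wraps that call with a guard for an empty directory; the function name and the DRY_RUN switch are hypothetical, and the real command needs root.

```shell
# Hypothetical helper: install every .deb in a directory with one dpkg call.
# Run as root. With DRY_RUN=1 it prints the dpkg command instead.
install_debs_offline() {
  local dir="${1:-.}"
  # Replace the positional parameters with the matching files (glob order)
  set -- "$dir"/*.deb
  if [ ! -e "$1" ]; then
    # The glob did not match, so "$1" is still the literal pattern
    echo "no .deb files in $dir" >&2
    return 1
  fi
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "dpkg -i $*"
    return 0
  fi
  dpkg -i "$@"
}
```

For example, install_debs_offline /home/ubuntu/ubuntu_packages, followed by gcc --version to confirm that gcc is now available.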

root@ubuntu:/home/ubuntu# ./A300-3010-npu-driver_20.2.0_ubuntu18.04-x86_64.run --full
Verifying archive integrity...  100%   SHA256 checksums are OK. All good.
Uncompressing Atlas300_software_version_20.2.0  100%  
[Driver] [2021-05-31 13:39:54] [INFO]Start time: 2021-05-31 13:39:54
[Driver] [2021-05-31 13:39:54] [INFO]LogFile: /var/log/ascend_seclog/ascend_install.log
[Driver] [2021-05-31 13:39:54] [INFO]OperationLogFile: /var/log/ascend_seclog/operation.log
[Driver] [2021-05-31 13:39:54] [INFO]base version is none.
[Driver] [2021-05-31 13:39:54] [INFO]set username and usergroup, HwHiAiUser:HwHiAiUser
[ERROR]make: command not found
[Driver] [2021-05-31 13:39:55] [ERROR]Drv_dkms_env_check failed, details in : /var/log/ascend_seclog/ascend_install.log 
[Driver] [2021-05-31 13:39:55] [INFO]Install driver failed, please retry after uninstall and reboot!
[Driver] [2021-05-31 13:39:55] [INFO]End time: 2021-05-31 13:39:55

make


root@ubuntu:/home/ubuntu# make
Command 'make' not found, but can be installed with:
apt install make      
apt install make-guile

Download the make package:
make_install.tar
Extract it:
tar -xvf make_install.tar
Then install:
sudo dpkg -i ./*
Re-running the driver installer still fails:
[ERROR]make: command not found
[Driver] [2021-05-31 13:55:41] [ERROR]Drv_dkms_env_check failed, details in : /var/log/ascend_seclog/ascend_install.log

Check the log:
/var/log/ascend_seclog/ascend_install.log

root@ubuntu:/var/log/ascend_seclog# vim ascend_install.log 

[Driver] [2021-05-12 16:40:06] [INFO]runPackagePath =/home/ubuntu
[Driver] [2021-05-12 16:40:06] [INFO]Start time: 2021-05-12 16:40:06
[Driver] [2021-05-12 16:40:06] [INFO]UserCommand: A300-3010-npu-driver_20.2.0_ubuntu18.04-x86_64.run --full
[Driver] [2021-05-12 16:40:06] [INFO]base version was destroyed or not exist.
[Driver] [2021-05-12 16:40:06] [INFO]sklogd is down
[Driver] [2021-05-12 16:40:06] [ERROR]ERR_NO:0x0091;ERR_DES:HwHiAiUser not exists! Please add HwHiAiUser
[Driver] [2021-05-12 16:40:06] [INFO]End time: 2021-05-12 16:40:06
[Driver] [2021-05-12 16:40:25] [INFO]runPackagePath =/home/ubuntu
[Driver] [2021-05-12 16:40:25] [INFO]Start time: 2021-05-12 16:40:25
[Driver] [2021-05-12 16:40:25] [INFO]UserCommand: A300-3010-npu-driver_20.2.0_ubuntu18.04-x86_64.run --full
[Driver] [2021-05-12 16:40:25] [INFO]base version was destroyed or not exist.
[Driver] [2021-05-12 16:40:25] [INFO]sklogd is down
[Driver] [2021-05-12 16:40:25] [ERROR]ERR_NO:0x0091;ERR_DES:HwHiAiUser not exists! Please add HwHiAiUser
[Driver] [2021-05-12 16:40:25] [INFO]End time: 2021-05-12 16:40:25
[Driver] [2021-05-12 16:42:19] [INFO]runPackagePath =/home/ubuntu
[Driver] [2021-05-12 16:42:19] [INFO]Start time: 2021-05-12 16:42:19
[Driver] [2021-05-12 16:42:19] [INFO]UserCommand: A300-3010-npu-driver_20.2.0_ubuntu18.04-x86_64.run --uninstall
[Driver] [2021-05-12 16:42:19] [INFO]base version was destroyed or not exist.
[Driver] [2021-05-12 16:42:19] [INFO]sklogd is down
[Driver] [2021-05-12 16:42:19] [INFO]FEATURE_HOT_RESET is : FEATURE_HOT_RESET=n

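Alternatively, follow the log in one terminal while running the installer in another:
tail -f /var/log/ascend_seclog/ascend_install.log
(Use head -n 50 /var/log/ascend_seclog/ascend_install.log to see the first lines instead; head has no -f option.)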
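Because the log keeps every attempt (the excerpt above spans several runs), filtering it down to the most recent error lines is often quicker than scrolling through it in vim. A sketch using grep and tail; the function name is hypothetical, and the default path is the LogFile the installer prints (reading it may require root).

```shell
# Hypothetical helper: show the last N [ERROR] lines of the Ascend install log.
# Usage: show_install_errors [LOGFILE] [N]
show_install_errors() {
  local log="${1:-/var/log/ascend_seclog/ascend_install.log}" n="${2:-20}"
  if [ ! -r "$log" ]; then
    echo "cannot read $log" >&2
    return 1
  fi
  # -F: treat [ERROR] as a fixed string, not a bracket expression
  grep -F '[ERROR]' "$log" | tail -n "$n"
}
```

Run against the log above, this would surface the ERR_NO:0x0091 lines (HwHiAiUser not exists) without the surrounding INFO noise.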

./A300-3010-npu-driver_20.2.0_ubuntu18.04-x86_64.run --full

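Specify the install user

The log above reports ERR_NO:0x0091 (HwHiAiUser does not exist). Instead of creating HwHiAiUser, you can tell the installer to use an existing user and group: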


# Specify a user
./A300-3010-npu-driver_20.2.0_ubuntu18.04-x86_64.run --full --install-username=ubuntu --install-usergroup=ubuntu
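The two options (keep the default HwHiAiUser, which must already exist, or name an existing account) can be combined into a helper that checks the account before printing the command line. The --install-username and --install-usergroup flags come from the commands in this post; the helper itself is hypothetical, and it only checks that the user exists, not the group.

```shell
# Hypothetical helper: print the driver install command for a given user.
# Defaults to HwHiAiUser and fails early if the user does not exist,
# which is the condition ERR_NO:0x0091 in the log reports.
# Usage: build_install_cmd RUNFILE [USER] [GROUP]   (GROUP defaults to USER)
build_install_cmd() {
  local run="$1"
  local user="${2:-HwHiAiUser}"
  local group="${3:-$user}"
  if ! id -u "$user" >/dev/null 2>&1; then
    echo "user $user does not exist; create it or pass an existing user" >&2
    return 1
  fi
  printf '%s --full --install-username=%s --install-usergroup=%s\n' \
    "$run" "$user" "$group"
}
```

The helper only prints the command so you can review it; build_install_cmd ./A300-3010-npu-driver_20.2.0_ubuntu18.04-x86_64.run ubuntu reproduces the command above.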

root@ubuntu:/home/ubuntu/ubuntu_packages# make -h

Command 'make' not found, but can be installed with:

apt install make      
apt install make-guile

root@ubuntu:/home/ubuntu/ubuntu_packages# 

make is still not working.
Keep digging.
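When dpkg seems to succeed but the command is still missing, it helps to separate two questions: is the binary on PATH, and what state does dpkg record for the package? A status such as "install ok unpacked" typically means configuration was blocked (often by a missing dependency), while "not known to dpkg" suggests the package was never installed, for example because the archive did not contain it. A sketch of that check; the function name is hypothetical, and dpkg-query -W -f is the standard dpkg query interface.

```shell
# Hypothetical diagnostic: is the command on PATH, and what state does
# dpkg record for the package that should provide it?
# Usage: diagnose_cmd COMMAND [PACKAGE]   (PACKAGE defaults to COMMAND)
diagnose_cmd() {
  local cmd="$1" pkg="${2:-$1}" found
  if found=$(command -v "$cmd" 2>/dev/null); then
    echo "$cmd: found at $found"
  else
    echo "$cmd: not on PATH ($PATH)"
  fi
  if command -v dpkg-query >/dev/null 2>&1; then
    # Prints e.g. "install ok installed" or "install ok unpacked"
    dpkg-query -W -f='${Status}\n' "$pkg" 2>/dev/null \
      || echo "$pkg: not known to dpkg"
  else
    echo "dpkg-query not available"
  fi
}
```

For this case, diagnose_cmd make would show whether the make package reached the installed state.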

Try uninstalling

./A300-3010-npu-driver_20.2.0_ubuntu18.04-x86_64.run --install-username=ubuntu --install-usergroup=ubuntu --full
The support engineer suggested uninstalling first...
./A300-3010-npu-driver_20.2.0_ubuntu18.04-x86_64.run  --uninstall

Uninstalling produced a new error:
root@ubuntu:/home/ubuntu# ./A300-3010-npu-driver_20.2.0_ubuntu18.04-x86_64.run  --uninstall
Verifying archive integrity...  100%   SHA256 checksums are OK. All good.
Uncompressing Atlas300_software_version_20.2.0  100%  
[Driver] [2021-05-31 16:50:16] [INFO]Start time: 2021-05-31 16:50:16
[Driver] [2021-05-31 16:50:16] [INFO]LogFile: /var/log/ascend_seclog/ascend_install.log
[Driver] [2021-05-31 16:50:16] [INFO]OperationLogFile: /var/log/ascend_seclog/operation.log
[Driver] [2021-05-31 16:50:16] [INFO]base version is none.
[Driver] [2021-05-31 16:50:16] [ERROR]ERR_NO:0x0090;ERR_DES:uninstall driver failed;Detail message in /var/log/ascend_seclog/ascend_install.log
[Driver] [2021-05-31 16:50:16] [INFO]End time: 2021-05-31 16:50:16
# Try the good old reboot:
shutdown -r now   # reboot immediately

Trying a new VM on another server

After rebuilding a similar VM, the installation worked.
First install the driver:
./A300-3010-npu-driver_20.2.0_ubuntu18.04-x86_64.run --install-username=ubuntu --install-usergroup=ubuntu --full
Then install the firmware:
./A300-3000-3010-npu-firmware_1.76.22.3.220.run --install-username=ubuntu --install-usergroup=ubuntu --full
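The order above (driver first, then firmware) can be wrapped so that the firmware step is skipped when the driver step fails. A sketch; the function name and the DRY_RUN switch are hypothetical, and both .run files are passed explicitly as paths (e.g. with a leading ./).

```shell
# Hypothetical wrapper: install the driver, then the firmware, with the
# same user/group options used in this post. With DRY_RUN=1 it only
# prints the two commands.
install_driver_then_firmware() {
  if [ "$#" -ne 4 ]; then
    echo "usage: install_driver_then_firmware DRIVER_RUN FIRMWARE_RUN USER GROUP" >&2
    return 2
  fi
  local driver="$1" firmware="$2" user="$3" group="$4" f
  for f in "$driver" "$firmware"; do
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "$f --install-username=$user --install-usergroup=$group --full"
    else
      # Stop before the firmware if the driver install fails
      "$f" --install-username="$user" --install-usergroup="$group" --full || return 1
    fi
  done
}
```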

Creating a new VM on this server

 

# At about 90% of the installation, the system shut down unexpectedly.

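Finally, after disabling all PCI devices in the VM and rebooting, the installation succeeded,

but after re-attaching the PCI devices, the VM would no longer boot.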

# In the end, I removed the VM and installed Ubuntu directly on bare metal.

Use npu-smi info to check the NPU status.
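npu-smi info is the final check used here. A sketch that fails with a hint instead of a bare "command not found" when npu-smi is not available yet; the function name is hypothetical.

```shell
# Hypothetical post-install check: report the NPU status via npu-smi,
# or fail with a hint if npu-smi is not on PATH.
check_npu_status() {
  if ! command -v npu-smi >/dev/null 2>&1; then
    echo "npu-smi not found: is the driver installed and on PATH?" >&2
    return 1
  fi
  npu-smi info
}
```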

 

Summary:

tar xvf A300.tar
tar xvf 'install_gcc&make.tar'   # quote the name: an unquoted & would send the command to the background
sudo dpkg -i *.deb
./A300-3010-npu-driver_20.2.0_ubuntu18.04-x86_64.run --install-username=ubuntu --install-usergroup=ubuntu --full

Notes:

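Install SSH (the OpenSSH server) while installing Ubuntu, because it is not installed by default. If you skip it, you will have to install it later at the server's own console, since you cannot connect to or operate the machine remotely until then.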

FinalShell is also a good remote connection tool.

Opening the ESXi host's IP address gives you the VMware management interface directly; VMware Workstation also reaches the server's virtualization service through ESXi.

 

