A Brief Look at RDMA Technology

Posted by yuanyun_elber

Environment

After so much armchair theory, let's actually run an RDMA test. The company happens to have Mellanox NICs on hand. The card is:

[root@localhost ~]# lspci -vvv |grep Eth
01:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
01:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]

Linux version:

[root@localhost ~]# cat /etc/redhat-release
CentOS Linux release 7.9.2009 (Core)
[root@localhost ~]# uname -r
3.10.0-1160.el7.x86_64

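Firmware version (the flint output below is from the firmware update attempted later in this post; the version on flash at this point is 14.23.1020):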

[root@localhost bak]# flint -d /dev/mst/mt4117_pciconf0 -i fw-ConnectX4Lx-rel-14_31_1014-MCX4121A-ACA_Ax-UEFI-14.24.13-FlexBoot-3.6.403.bin  burn
Current FW version on flash:  14.23.1020
New FW version:               14.31.1014

FSMST_INITIALIZE -   OK
Writing Boot image component -   OK
-I- To load new FW run mlxfwreset or reboot machine.

Installing OFED

The Mellanox OFED download page is:

https://network.nvidia.com/products/infiniband-drivers/linux/mlnx_ofed/

Download the version that matches your operating system, then unpack and install it:

tar xvf MLNX_OFED_SRC-5.5-1.0.3.2.tgz
cd MLNX_OFED_SRC-5.5-1.0.3.2/
./install.pl

After installation, the self test shows the GUIDs and a number of PASS results:

[root@localhost MLNX_OFED_SRC-5.5-1.0.3.2]# hca_self_test.ofed
---- Performing Adapter Device Self Test ----
Number of CAs Detected ................. 2
PCI Device Check ....................... PASS
Kernel Arch ............................ x86_64
Host Driver Version .................... OFED-internal-5.5-1.0.3: 3.10.0-1160.el7.x86_64
Host Driver RPM Check .................. PASS
Firmware on CA #0 NIC .................. v14.23.1020
Firmware on CA #1 NIC .................. v14.23.1020
Host Driver Initialization ............. PASS
Number of CA Ports Active .............. 0
Port State of Port #1 on CA #0 (NIC)..... DOWN (Ethernet)
Port State of Port #1 on CA #1 (NIC)..... DOWN (Ethernet)
Error Counter Check on CA #0 (NIC)...... PASS
Error Counter Check on CA #1 (NIC)...... PASS
Kernel Syslog Check .................... PASS
Node GUID on CA #0 (NIC) ............... 98:03:9b:03:00:48:bd:c8
Node GUID on CA #1 (NIC) ............... 98:03:9b:03:00:48:bd:c9
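A row of PASS results does not mean the link is usable: in the output above every check passes, yet the active port count is 0 and both ports are DOWN. As a minimal sketch, the self-test output can be condensed with a small shell function. The name self_test_summary is made up for illustration, and the function assumes the line labels look like the ones shown above (checks contain "Check" or "Initialization" and end in PASS when healthy).

```shell
# Condense hca_self_test.ofed output read from stdin.
# self_test_summary is a hypothetical helper, not part of OFED.
self_test_summary() {
    awk '
        # Remember the last field of the active-port line (0 in the run above).
        /Number of CA Ports Active/ { active = $NF }
        # Any check line whose last field is not PASS is reported verbatim.
        /(Check|Initialization)/ && $NF != "PASS" { bad++; print "NOT PASS: " $0 }
        END {
            printf "active ports: %s\n", (active == "" ? "unknown" : active)
            printf "checks not passing: %d\n", bad
        }
    '
}

# Usage (needs the OFED tools installed):
#   hca_self_test.ofed | self_test_summary
```

Fed the output shown above, this prints "active ports: 0" and "checks not passing: 0", which makes it obvious that the remaining problem is the link state rather than the driver installation.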

A few commands can be used to check the IB state:

[root@localhost MLNX_OFED_SRC-5.5-1.0.3.2]# ibdev2netdev        //shows the mapping between Ethernet devices and IB devices/ports
mlx5_0 port 1 ==> eth1 (Down)
mlx5_1 port 1 ==> eth2 (Down)
[root@localhost MLNX_OFED_SRC-5.5-1.0.3.2]# ibv_devinfo
hca_id: mlx5_0
transport:                      InfiniBand (0)                    //IB verbs transport; the link layer is reported separately below
fw_ver:                         14.23.1020
node_guid:                      9803:9b03:0048:bdc8
sys_image_guid:                 9803:9b03:0048:bdc8
vendor_id:                      0x02c9
vendor_part_id:                 4117
hw_ver:                         0x0
board_id:                       MT_2420110034
phys_port_cnt:                  1
port:   1
state:                  PORT_DOWN (1)
max_mtu:                4096 (5)
active_mtu:             1024 (3)
sm_lid:                 0
port_lid:               0
port_lmc:               0x00
link_layer:             Ethernet
hca_id: mlx5_1
transport:                      InfiniBand (0)
fw_ver:                         14.23.1020
node_guid:                      9803:9b03:0048:bdc9
sys_image_guid:                 9803:9b03:0048:bdc8
vendor_id:                      0x02c9
vendor_part_id:                 4117
hw_ver:                         0x0
board_id:                       MT_2420110034
phys_port_cnt:                  1
port:   1
state:                  PORT_DOWN (1)
max_mtu:                4096 (5)
active_mtu:             1024 (3)
sm_lid:                 0
port_lid:               0
port_lmc:               0x00
link_layer:             Ethernet
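The fields that matter in this long listing are transport, state and link_layer. The following is a minimal sketch that reduces ibv_devinfo output to one line per port. The helper name devinfo_summary and the "mode" label are made up for illustration; the label rests on the assumption that an InfiniBand verbs transport on an Ethernet link layer means the device runs RDMA over Ethernet (RoCE).

```shell
# Summarise ibv_devinfo output read from stdin: one line per port.
# devinfo_summary is a hypothetical helper name, not part of OFED.
devinfo_summary() {
    awk '
        $1 == "hca_id:"    { dev = $2 }
        $1 == "transport:" { transport = $2 }
        $1 == "state:"     { state = $2 }
        $1 == "link_layer:" {
            # Assumption: verbs transport InfiniBand on an Ethernet link means RoCE.
            mode = "other"
            if (transport == "InfiniBand" && $2 == "Ethernet")   mode = "RoCE"
            if (transport == "InfiniBand" && $2 == "InfiniBand") mode = "IB"
            printf "%s link_layer=%s state=%s mode=%s\n", dev, $2, state, mode
        }
    '
}

# Usage (needs rdma-core / OFED installed):
#   ibv_devinfo | devinfo_summary
```

For the listing above this yields "mlx5_0 link_layer=Ethernet state=PORT_DOWN mode=RoCE" and the same for mlx5_1, so the two facts to chase are the DOWN port state and the Ethernet link layer.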

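From the output above, the state is still PORT_DOWN and link_layer is Ethernet rather than IB. According to posts online, LINK_TYPE_P1 has to be set to 1 (1 is IB mode, 2 is Ethernet mode).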

[root@localhost ~]# mlxconfig -d /dev/mst/mt4117_pciconf0 query |grep LINK

But there is no LINK_TYPE_P1 option in the output.
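A minimal sketch of that check as a shell function, assuming mlxconfig query prints one parameter per line so that a plain grep is enough. The function name link_type_support is made up for illustration, and "configurable" / "fixed" are just labels chosen here, not mlxconfig output.

```shell
# Read `mlxconfig ... query` output on stdin and report whether a
# LINK_TYPE_P<n> parameter is present at all.
# link_type_support is a hypothetical helper, not part of MFT.
link_type_support() {
    if grep -q 'LINK_TYPE_P[0-9]'; then
        # The port type can be switched, e.g. with
        #   mlxconfig -d <device> set LINK_TYPE_P1=1
        echo "configurable"
    else
        # No such parameter: the port type cannot be changed on this card.
        echo "fixed"
    fi
}

# Usage (needs MFT installed and mst started):
#   mlxconfig -d /dev/mst/mt4117_pciconf0 query | link_type_support
```

On this card the grep above matched nothing, so the function would print "fixed".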

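I suspected the firmware version might be the problem.

So let's try updating the firmware.

After some searching, it turns out the MFT (firmware tools) package, which provides mst, is needed: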

https://network.nvidia.com/products/adapter-software/firmware-tools/

tar xvf mft-4.18.0-106-x86_64-rpm.tgz
cd mft-4.18.0-106-x86_64-rpm/
./install.sh
mst start
service mst status

Download the latest firmware:

https://network.nvidia.com/support/firmware/connectx4lxen/

[root@localhost bak]# flint -d /dev/mst/mt4117_pciconf0 -i fw-ConnectX4Lx-rel-14_31_1014-MCX4121A-ACA_Ax-UEFI-14.24.13-FlexBoot-3.6.403.bin  burn
Current FW version on flash:  14.23.1020
New FW version:               14.31.1014

FSMST_INITIALIZE -   OK
Writing Boot image component -   OK
-I- To load new FW run mlxfwreset or reboot machine.
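Note the last line of the flint output: the new image only takes effect after mlxfwreset or a reboot, so until then tools keep reporting the old version. As a minimal sketch, the two versions can be pulled out of the flint output and compared. The helper names fw_versions and fw_is_newer are made up for illustration, and the comparison assumes GNU sort with the -V (version sort) option, which CentOS 7 ships.

```shell
# Extract "current new" firmware versions from flint burn output on stdin.
# fw_versions and fw_is_newer are hypothetical helpers, not part of MFT.
fw_versions() {
    awk -F': *' '
        { sub(/[ \t\r]+$/, "") }                  # drop trailing whitespace
        /Current FW version on flash/ { cur = $2 }
        /New FW version/              { new = $2 }
        END { print cur, new }
    '
}

# Succeeds when $2 is a strictly newer version than $1.
fw_is_newer() {
    [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$2" ]
}

# Usage:
#   fw_is_newer 14.23.1020 14.31.1014 && echo "burn would be an upgrade"
```

After the reset or reboot, fw_ver in ibv_devinfo should show the new version; if it still shows 14.23.1020, the new firmware has not been loaded yet.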

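It made no difference.

I also downloaded an older driver, 5.1, to replace 5.5, and that did not help either.

Later I found the following on this page: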

https://access.redhat.com/articles/3082811

Note that the card in the example output is an Ethernet-only card, so there is no port type setting.

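This says that the ConnectX-4 Lx is an Ethernet-only card that does not support IB. So why does ibv_devinfo still report the transport as InfiniBand? That looked very odd:

transport: InfiniBand (0)

The transport field describes the verbs transport type (InfiniBand as opposed to iWARP), not the link layer, so an Ethernet-only card that does RDMA over Ethernet (RoCE) reports InfiniBand there as well.

So it looks like the IB-mode test cannot be done on this card.

Besides, ConnectX-4 Lx and ConnectX-4 both belong to the mlx5 family, so one would expect native IB support. Why make a board that cannot run in IB mode?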

https://mymellanox.force.com/mellanoxcommunity/s/question/0D51T00008dGyJMSA0/how-to-use-mellanox-connectx4-lx

This thread reports the same thing:

Unfortunately, I'm starting to think that I have the wrong card (and that this only works for Ethernet), because I am unable to change the link type of this card to infiniband. I have followed all the instructions, but it says that the option (LINK_TYPE) isn't found when I try via the command line.
