ovn-central raft HA (by quqi99)

Posted quqi99


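Author: Zhang Hua (张华)  Published: 2022-10-12
Copyright notice: This article may be freely reproduced, provided that the reproduction includes a hyperlink to the original source, the author information, and this copyright notice.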

What’s raft

Raft (https://raft.github.io/) is a consensus algorithm. Each node is in one of three roles: leader, follower, or candidate.
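
Raft elects its leader by majority vote, which is why this walkthrough uses three nodes. The following is a minimal sketch of that arithmetic. The function names raft_quorum and raft_tolerated_failures are hypothetical helpers written for this illustration; they are not part of OVN or OVS.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only (not part of OVN/OVS).

# raft_quorum N: print the number of votes that form a majority of N servers.
raft_quorum() {
  echo $(( $1 / 2 + 1 ))
}

# raft_tolerated_failures N: print how many of N servers may be down
# while the remaining servers can still form a majority.
raft_tolerated_failures() {
  echo $(( ($1 - 1) / 2 ))
}

for n in 1 3 5; do
  echo "servers=$n quorum=$(raft_quorum "$n") tolerated_failures=$(raft_tolerated_failures "$n")"
done
```

With three servers the quorum is two, so the cluster keeps working when one container is stopped.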

Set up ovn-central raft HA env

Quickly set up an ovn-central raft HA environment on three LXD containers.

cd ~ && lxc launch faster:ubuntu/focal v1
lxc launch faster:ubuntu/focal v2
lxc launch faster:ubuntu/focal v3
#the subnet is 192.168.121.0/24
lxc config device override v1 eth0 ipv4.address=192.168.121.2
lxc config device override v2 eth0 ipv4.address=192.168.121.3
lxc config device override v3 eth0 ipv4.address=192.168.121.4
lxc stop v1 && lxc start v1 && lxc stop v2 && lxc start v2 && lxc stop v3 && lxc start v3

#on v1
lxc exec `lxc list |grep v1 |awk -F '|' '{print $2}'` bash
sudo apt install ovn-central -y
cat << EOF |tee /etc/default/ovn-central
OVN_CTL_OPTS= \\
  --db-nb-addr=192.168.121.2 \\
  --db-sb-addr=192.168.121.2 \\
  --db-nb-cluster-local-addr=192.168.121.2 \\
  --db-sb-cluster-local-addr=192.168.121.2 \\
  --db-nb-create-insecure-remote=yes \\
  --db-sb-create-insecure-remote=yes \\
  --ovn-northd-nb-db=tcp:192.168.121.2:6641,tcp:192.168.121.3:6641,tcp:192.168.121.4:6641 \\
  --ovn-northd-sb-db=tcp:192.168.121.2:6642,tcp:192.168.121.3:6642,tcp:192.168.121.4:6642
EOF
rm -rf /var/lib/ovn/* && rm -rf /var/lib/ovn/.ovn*
systemctl restart ovn-central
root@v1:~# ovs-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/status OVN_Northbound
6943
Name: OVN_Northbound
Cluster ID: 51b9 (51b9f953-989f-4f90-9add-73dbabe3fe06)
Server ID: 6943 (69432f05-2d37-44fd-8869-2ec365bb0b4c)
Address: tcp:192.168.121.2:6643
Status: cluster member
Role: leader
Term: 2
Leader: self
Vote: self
Election timer: 1000
Log: [2, 5]
Entries not yet committed: 0
Entries not yet applied: 0
Connections: <-0000 <-0000
Servers:
    6943 (6943 at tcp:192.168.121.2:6643) (self) next_index=4 match_index=4

#on v2
lxc exec `lxc list |grep v2 |awk -F '|' '{print $2}'` bash
sudo apt install ovn-central -y
cat << EOF |tee /etc/default/ovn-central
OVN_CTL_OPTS= \\
  --db-nb-addr=192.168.121.3 \\
  --db-sb-addr=192.168.121.3 \\
  --db-nb-cluster-local-addr=192.168.121.3 \\
  --db-sb-cluster-local-addr=192.168.121.3 \\
  --db-nb-create-insecure-remote=yes \\
  --db-sb-create-insecure-remote=yes \\
  --ovn-northd-nb-db=tcp:192.168.121.2:6641,tcp:192.168.121.3:6641,tcp:192.168.121.4:6641 \\
  --ovn-northd-sb-db=tcp:192.168.121.2:6642,tcp:192.168.121.3:6642,tcp:192.168.121.4:6642 \\
  --db-nb-cluster-remote-addr=192.168.121.2 \\
  --db-sb-cluster-remote-addr=192.168.121.2
EOF
rm -rf /var/lib/ovn/* && rm -rf /var/lib/ovn/.ovn*
systemctl restart ovn-central
root@v2:~# ovs-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/status OVN_Northbound
158b
Name: OVN_Northbound
Cluster ID: 51b9 (51b9f953-989f-4f90-9add-73dbabe3fe06)
Server ID: 158b (158b0aea-ba5d-42e0-b69b-2fc05204f622)
Address: tcp:192.168.121.3:6643
Status: cluster member
Role: follower
Term: 2
Leader: 6943
Vote: unknown
Election timer: 1000
Log: [2, 7]
Entries not yet committed: 0
Entries not yet applied: 0
Connections: ->0000 <-6943
Servers:
    6943 (6943 at tcp:192.168.121.2:6643)
    158b (158b at tcp:192.168.121.3:6643) (self)


#on v3
lxc exec `lxc list |grep v3 |awk -F '|' '{print $2}'` bash
sudo apt install ovn-central -y
cat << EOF |tee /etc/default/ovn-central
OVN_CTL_OPTS= \\
  --db-nb-addr=192.168.121.4 \\
  --db-sb-addr=192.168.121.4 \\
  --db-nb-cluster-local-addr=192.168.121.4 \\
  --db-sb-cluster-local-addr=192.168.121.4 \\
  --db-nb-create-insecure-remote=yes \\
  --db-sb-create-insecure-remote=yes \\
  --ovn-northd-nb-db=tcp:192.168.121.2:6641,tcp:192.168.121.3:6641,tcp:192.168.121.4:6641 \\
  --ovn-northd-sb-db=tcp:192.168.121.2:6642,tcp:192.168.121.3:6642,tcp:192.168.121.4:6642 \\
  --db-nb-cluster-remote-addr=192.168.121.2 \\
  --db-sb-cluster-remote-addr=192.168.121.2
EOF
rm -rf /var/lib/ovn/* && rm -rf /var/lib/ovn/.ovn*
systemctl restart ovn-central
root@v3:~# ovs-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/status OVN_Northbound
298d
Name: OVN_Northbound
Cluster ID: 51b9 (51b9f953-989f-4f90-9add-73dbabe3fe06)
Server ID: 298d (298de33b-1b92-47c4-95aa-ebcf8e80f567)
Address: tcp:192.168.121.4:6643
Status: cluster member
Role: follower
Term: 2
Leader: 6943
Vote: unknown
Election timer: 1000
Log: [2, 8]
Entries not yet committed: 0
Entries not yet applied: 0
Connections: ->0000 ->158b <-6943 <-158b
Servers:
    6943 (6943 at tcp:192.168.121.2:6643)
    158b (158b at tcp:192.168.121.3:6643)
    298d (298d at tcp:192.168.121.4:6643) (self)
OVN_NB_DB=tcp:192.168.121.2:6641,tcp:192.168.121.3:6641,tcp:192.168.121.4:6641 ovn-nbctl show
OVN_SB_DB=tcp:192.168.121.2:6642,tcp:192.168.121.3:6642,tcp:192.168.121.4:6642 ovn-sbctl show
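
The three /etc/default/ovn-central files above differ only in the local address and in whether the cluster-remote-addr options are present. As a sketch under that reading, the content can be generated by one function. The name gen_ovn_ctl_opts and its argument order are assumptions made for this example; the option names are the ones used above.

```shell
#!/bin/sh
# gen_ovn_ctl_opts LOCAL_IP FIRST_IP IP...
# Hypothetical helper (not part of OVN): prints the content of
# /etc/default/ovn-central for the node LOCAL_IP. FIRST_IP is the node that
# bootstraps the cluster (v1 above); IP... lists every cluster member.
gen_ovn_ctl_opts() {
  local_ip=$1; first_ip=$2; shift 2
  nb=; sb=
  # Build the comma-separated NB (6641) and SB (6642) connection lists.
  for ip in "$@"; do
    nb="${nb:+$nb,}tcp:$ip:6641"
    sb="${sb:+$sb,}tcp:$ip:6642"
  done
  printf 'OVN_CTL_OPTS= \\\n'
  printf '  --db-nb-addr=%s \\\n' "$local_ip"
  printf '  --db-sb-addr=%s \\\n' "$local_ip"
  printf '  --db-nb-cluster-local-addr=%s \\\n' "$local_ip"
  printf '  --db-sb-cluster-local-addr=%s \\\n' "$local_ip"
  printf '  --db-nb-create-insecure-remote=yes \\\n'
  printf '  --db-sb-create-insecure-remote=yes \\\n'
  printf '  --ovn-northd-nb-db=%s \\\n' "$nb"
  printf '  --ovn-northd-sb-db=%s' "$sb"
  # Only joining nodes point at the bootstrap node.
  if [ "$local_ip" != "$first_ip" ]; then
    printf ' \\\n  --db-nb-cluster-remote-addr=%s \\\n' "$first_ip"
    printf '  --db-sb-cluster-remote-addr=%s' "$first_ip"
  fi
  printf '\n'
}

# Example: print the file content for v2 (redirect to the file on the node).
gen_ovn_ctl_opts 192.168.121.3 192.168.121.2 \
  192.168.121.2 192.168.121.3 192.168.121.4
```

On a node, the output would be redirected to /etc/default/ovn-central before restarting ovn-central.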

Cluster Failover Testing

To test failover, stop container v1 (lxc stop v1). The following logs then appear on v2 and v3: v2 has become the leader, and the Term has gone from 2 to 3.

root@v2:~# tail -f /var/log/ovn/ovsdb-server-nb.log
2022-10-12T03:38:18.092Z|00085|raft|INFO|received leadership transfer from 6943 in term 2
2022-10-12T03:38:18.092Z|00086|raft|INFO|term 3: starting election
2022-10-12T03:38:18.095Z|00088|raft|INFO|term 3: elected leader by 2+ of 3 servers

root@v3:~# tail -f /var/log/ovn/ovsdb-server-nb.log
2022-10-12T03:38:18.095Z|00021|raft|INFO|server 158b is leader for term 3

root@v2:~# ovs-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/status OVN_Northbound
158b
Name: OVN_Northbound
Cluster ID: 51b9 (51b9f953-989f-4f90-9add-73dbabe3fe06)
Server ID: 158b (158b0aea-ba5d-42e0-b69b-2fc05204f622)
Address: tcp:192.168.121.3:6643
Status: cluster member
Role: leader
Term: 3
Leader: self
Vote: self
Election timer: 1000
Log: [2, 9]
Entries not yet committed: 0
Entries not yet applied: 0
Connections: (->0000) <-298d ->298d
Servers:
    6943 (6943 at tcp:192.168.121.2:6643) next_index=9 match_index=0
    158b (158b at tcp:192.168.121.3:6643) (self) next_index=8 match_index=8
    298d (298d at tcp:192.168.121.4:6643) next_index=9 match_index=8
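
When repeating this test, it can be handy to pull only the Role and Term out of the cluster/status output. The sketch below assumes the "Key: value" layout shown above; cluster_field is a hypothetical helper name, and the sample text is a shortened copy of the v2 output.

```shell
#!/bin/sh
# cluster_field KEY: hypothetical helper (not part of OVS). Reads
# cluster/status text on stdin and prints the value of the first
# "KEY: value" line.
cluster_field() {
  awk -F': ' -v key="$1" '$1 == key { print $2; exit }'
}

# Shortened sample taken from the v2 output after failover.
sample='Name: OVN_Northbound
Status: cluster member
Role: leader
Term: 3
Leader: self'

role=$(printf '%s\n' "$sample" | cluster_field Role)
term=$(printf '%s\n' "$sample" | cluster_field Term)
echo "role=$role term=$term"

# On a real node the input would come from ovs-appctl instead, e.g.:
#   ovs-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/status OVN_Northbound | cluster_field Role
```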

set inactivity-probe for raft port (6644)

ovn-sbctl list connection
#ovn-sbctl --no-leader-only list connection
ovn-sbctl --inactivity-probe=30000 set-connection pssl:6642 pssl:6644 pssl:16642
ovn-sbctl --inactivity-probe=30000 set-connection pssl:6642 pssl:6644 pssl:16642 punix:/var/run/ovn/ovnsb_db.sock
ovn-sbctl --inactivity-probe=30000 set-connection read-write role="ovn-controller" pssl:6642 read-write role="ovn-controller" pssl:6644 pssl:16642

Alternatively, use the commands below (equivalent to: ovn-sbctl --inactivity-probe=57 set-connection pssl:6642 pssl:6644)

#https://mail.openvswitch.org/pipermail/ovs-discuss/2020-February/049743.html
#https://opendev.org/x/charm-ovn-central/commit/9dcd53bb75805ff733c8f10b99724ea16a2b5f25
ovn-sbctl -- --id=@connection create Connection target="pssl\\:6644" inactivity_probe=55 -- set SB_Global . connections=@connection
ovn-sbctl set connection . inactivity_probe=56

#the 'set SB_Global' above deletes all existing connections and then creates one new connection; 'add SB_Global' here only adds one new connection
ovn-sbctl -- --id=@connection create Connection target="pssl\\:6648" -- add SB_Global . connections @connection
ovn-sbctl --inactivity-probe=30000 set-connection pssl:6648

use ovsdb-tool to set up cluster

https://mail.openvswitch.org/pipermail/ovs-discuss/2020-February/049743.html
Creating the cluster with ovsdb-tool using the commands below did not succeed at first: on v2, 'ovs-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/status OVN_Northbound' showed 'Remotes for joining: tcp:192.168.121.4:6643 tcp:192.168.121.3:6643' and the node could not join the cluster. The cause is that when 'join-cluster' is run on v2, v2's own address (192.168.121.3:6644) must be written first. For v2 the list should therefore be "tcp:192.168.121.3:6644 tcp:192.168.121.2:6644 tcp:192.168.121.4:6644", not "tcp:192.168.121.2:6644 tcp:192.168.121.3:6644 tcp:192.168.121.4:6644".

#reset env in all nodes(v1, v2, v3)
systemctl stop ovn-central
rm -rf /var/lib/ovn/* && rm -rf /var/lib/ovn/.ovn*
rm -rf /etc/default/ovn-central

# on v1
rm -rf /var/lib/openvswitch/ovn*b_db.db
ovsdb-tool create-cluster /var/lib/openvswitch/ovnsb_db.db /usr/share/ovn/ovn-sb.ovsschema tcp:192.168.121.2:6644
ovsdb-tool create-cluster /var/lib/openvswitch/ovnnb_db.db /usr/share/ovn/ovn-nb.ovsschema tcp:192.168.121.2:6643

# on v2
rm -rf /var/lib/openvswitch/ovn*b_db.db
ovsdb-tool join-cluster /var/lib/openvswitch/ovnsb_db.db OVN_Southbound tcp:192.168.121.3:6644 tcp:192.168.121.2:6644 tcp:192.168.121.4:6644
ovsdb-tool join-cluster /var/lib/openvswitch/ovnnb_db.db OVN_Northbound tcp:192.168.121.3:6643 tcp:192.168.121.2:6643 tcp:192.168.121.4:6643

# on v3
rm -rf /var/lib/openvswitch/ovn*b_db.db
ovsdb-tool join-cluster /var/lib/openvswitch/ovnsb_db.db OVN_Southbound tcp:192.168.121.4:6644 tcp:192.168.121.2:6644 tcp:192.168.121.3:6644
ovsdb-tool join-cluster /var/lib/openvswitch/ovnnb_db.db OVN_Northbound tcp:192.168.121.4:6643 tcp:192.168.121.2:6643 tcp:192.168.121.3:6643

# then append the following to /etc/default/ovn-central, and finally restart ovn-central
--db-nb-file=/var/lib/openvswitch/ovnnb_db.db --db-sb-file=/var/lib/openvswitch/ovnsb_db.db
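
Since the join-cluster address list must start with the local node's own address, the ordering can be generated instead of typed by hand. This is a minimal sketch; join_remotes is a hypothetical helper name, and the script only prints the ovsdb-tool command rather than running it.

```shell
#!/bin/sh
# join_remotes LOCAL_IP PORT IP...
# Hypothetical helper (not part of OVS): prints the address list for
# 'ovsdb-tool join-cluster' with the local address first, followed by
# the other members in the order given.
join_remotes() {
  local_ip=$1; port=$2; shift 2
  out="tcp:$local_ip:$port"
  for ip in "$@"; do
    if [ "$ip" != "$local_ip" ]; then
      out="$out tcp:$ip:$port"
    fi
  done
  echo "$out"
}

members="192.168.121.2 192.168.121.3 192.168.121.4"

# Print (not run) the southbound join command for v2.
# $members is intentionally unquoted so it splits into separate arguments.
echo "ovsdb-tool join-cluster /var/lib/openvswitch/ovnsb_db.db OVN_Southbound $(join_remotes 192.168.121.3 6644 $members)"
```

The printed command matches the one used on v2 above; passing 192.168.121.4 as the first argument gives the v3 variant.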
