set up ovn development env (by quqi99)

Posted quqi99


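Author: Zhang Hua (张华)  Published: 2022-07-08
Copyright notice: this article may be freely reproduced, provided that the original source, the author information, and this copyright notice are indicated with a hyperlink.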

Build OVS and start ovs-vswitchd

#https://docs.ovn.org/en/latest/intro/install/general.html
sudo apt-get -y install build-essential fakeroot graphviz autoconf automake \
   bzip2 debhelper dh-autoreconf libssl-dev libtool openssl procps sparse \
   python3 wget gdb git libnetfilter-conntrack-dev libmnl-dev libelf-dev
sudo apt purge openvswitch-switch openvswitch-common python3-openvswitch -y
cd /bak/linux
git clone https://github.com/openvswitch/ovs.git
cd ovs
./boot.sh
#/lib/modules/5.15.0-40-generic/build is version 5.15.35, but version newer than 5.8.x is not supported
#./configure --prefix=/usr --localstatedir=/var  --sysconfdir=/etc --enable-ssl --with-linux=/lib/modules/`uname -r`/build
./configure --help |grep debug
CFLAGS="-Wall -O2 -g" ./configure --prefix=/usr --localstatedir=/var  --sysconfdir=/etc --enable-ssl --with-debug
CFLAGS="-Wall -O2 -g" make -j8
sudo make install
sudo make modules_install #but version newer than 5.8.x is not supported
sudo ovs-dpctl show
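#If the old openvswitch.ko is not removed first, 'Unknown symbol in module' may be thrown
#But removing it may fail with 'Module openvswitch is in use', because lsmod |grep openvswitch shows a non-zero reference count
#After removal, besides the manual installation below, 'sudo make modules_install' can also install it; that command did not work here because --with-linux=/lib/modules/`uname -r`/build was not passed to configure above
#However, the error "version 5.15.35, but version newer than 5.8.x is not supported" means the ko cannot be built, so keep using the ko module installed earlier from the deb package
#sudo /usr/share/openvswitch/scripts/ovs-ctl stop && sudo ovs-dpctl del-dp ovs-system && sudo rmmod -f openvswitch.ko
# If 'Module openvswitch is in use' is reported, install the new module and its dependencies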
#sudo modinfo ./datapath/linux/openvswitch.ko |grep depends
#sudo modprobe -a nf_conntrack nf_nat nf_defrag_ipv6 libcrc32c nf_nat_ipv6 gre nf_nat_ipv4
#sudo insmod ./datapath/linux/openvswitch.ko 
#sudo modinfo ./datapath/linux/openvswitch.ko |grep depends
#config_file="/etc/depmod.d/openvswitch.conf"
#for module in ./datapath/linux/*.ko; do
#  modname="$(basename "$module")"
#  echo "override ${modname%.ko} * extra" |sudo tee -a "$config_file"
#  echo "override ${modname%.ko} * weak-updates" |sudo tee -a "$config_file"
#done
#sudo depmod -a
#sudo modprobe openvswitch
export PATH=$PATH:/usr/share/ovn/scripts:/usr/share/openvswitch/scripts
#start ovs-vswitchd
sudo /usr/share/openvswitch/scripts/ovs-ctl start --system-id=$(hostname)
#sudo /usr/bin/ovs-appctl -t ovsdb-server ovsdb-server/add-remote ptcp:6640:192.168.122.1 #enable remote db
#sudo ovs-vsctl show
sudo ovs-vsctl get Open_vSwitch . external-ids
#sudo cp debian/openvswitch-switch.init /etc/init.d/openvswitch-switch
#debug it
sudo cgdb -p `pidof ovs-vswitchd`
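
The comments above note that the out-of-tree kernel module cannot be built on a 5.15 kernel because this OVS tree supports nothing newer than 5.8.x. As a minimal sketch, the check below decides up front whether passing --with-linux is worth attempting. The function name ovs_kmod_plan is hypothetical, and the 5.8 default is simply taken from the error message quoted above; adjust it for the tree you build.

```shell
# Hypothetical helper (not part of OVS): print "build" if the kernel release is
# no newer than the last version the out-of-tree module supports, else "packaged".
ovs_kmod_plan() {
  local release="$1"          # e.g. 5.15.0-40-generic, as printed by `uname -r`
  local max="${2:-5.8}"       # newest supported major.minor (assumption, from the error above)
  local ver="${release%%-*}"  # drop the "-40-generic" suffix -> 5.15.0
  local major="${ver%%.*}"
  local rest="${ver#*.}"
  local minor="${rest%%.*}"
  local max_major="${max%%.*}"
  local max_minor="${max#*.}"
  if [ "$major" -lt "$max_major" ] ||
     { [ "$major" -eq "$max_major" ] && [ "$minor" -le "$max_minor" ]; }; then
    echo build
  else
    echo packaged
  fi
}

# Report the plan for the running kernel.
ovs_kmod_plan "$(uname -r)"
```

When the result is "packaged", the plain ./configure line used above (without --with-linux) is the one to run, and the kernel module already on the system stays in use.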

Build OVN

cd /bak/linux/
git clone https://github.com/ovn-org/ovn.git
cd ovn
./boot.sh
#configure ovn to include -g to build debug symbols and -O2 to enable optimizations,
#perf should use '--call-graph dwarf' rather than '--call-graph fp' due to 'no-omit-frame-pointer'
#the error 'utilities/ovn-nbctl.c too few arguments to function ‘inet_parse_active’' will be hit with '--enable-sparse'
#'-fsanitize=address -fno-omit-frame-pointer -fno-common' will cause: LeakSanitizer does not work under ptrace (strace, gdb, etc)
#'--enable-coverage' will cause: libgconv ... overwriting an existing profile data with a different timestamp
CFLAGS="-g -O2" ./configure  \
  --with-ovs-source=../ovs  --with-ovs-build=../ovs \
  --prefix=/usr --localstatedir=/var  --sysconfdir=/etc \
  --enable-Werror --with-debug
CFLAGS="-g -O2" make -j8
sudo make install
#unit test
make check

Start ovn-central (ovn-northd, ovnnb, ovnsb)

export PATH=$PATH:/usr/share/ovn/scripts:/usr/share/openvswitch/scripts
sudo /usr/share/ovn/scripts/ovn-ctl stop_northd
sudo /usr/share/ovn/scripts/ovn-ctl start_northd

This is equivalent to:

sudo mkdir -p /etc/ovn
sudo mkdir -p /var/run/ovn
#sudo killall ovsdb-server && sudo rm -rf /etc/ovn/*
sudo /usr/bin/ovsdb-tool create /etc/ovn/ovnnb_db.db ovn-nb.ovsschema
sudo /usr/bin/ovsdb-tool create /etc/ovn/ovnsb_db.db ovn-sb.ovsschema
sudo /usr/sbin/ovsdb-server /etc/ovn/ovnnb_db.db --remote=punix:/var/run/ovn/ovnnb_db.sock \
     --remote=db:OVN_Northbound,NB_Global,connections \
     --private-key=db:OVN_Northbound,SSL,private_key \
     --certificate=db:OVN_Northbound,SSL,certificate \
     --bootstrap-ca-cert=db:OVN_Northbound,SSL,ca_cert \
     --pidfile=/var/run/ovn/ovnnb-server.pid --detach --log-file=/var/log/ovn/ovnnb-server.log
sudo /usr/sbin/ovsdb-server /etc/ovn/ovnsb_db.db --remote=punix:/var/run/ovn/ovnsb_db.sock \
     --remote=db:OVN_Southbound,SB_Global,connections \
     --private-key=db:OVN_Southbound,SSL,private_key \
     --certificate=db:OVN_Southbound,SSL,certificate \
     --bootstrap-ca-cert=db:OVN_Southbound,SSL,ca_cert \
     --pidfile=/var/run/ovn/ovnsb-server.pid --detach --log-file=/var/log/ovn/ovnsb-server.log
sudo /usr/bin/ovn-nbctl --no-wait init
sudo /usr/bin/ovn-sbctl init
#Start ovn-northd; by default it connects to the NB and SB db servers through the Unix domain sockets created above
#sudo /usr/bin/ovn-northd --pidfile --detach --log-file=/var/log/ovn/ovn-northd.log
sudo /usr/bin/ovn-northd
sudo /usr/bin/ovn-nbctl set-connection ptcp:6641:0.0.0.0 -- set connection . inactivity_probe=60000
sudo /usr/bin/ovn-sbctl set-connection ptcp:6642:0.0.0.0 -- set connection . inactivity_probe=60000
sudo netstat -lntp |grep 664
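
The netstat line above is a visual check that the NB (6641) and SB (6642) listeners are up. As a rough sketch, the same check can be scripted. The helper name listening_ports is hypothetical; it only assumes that the local address is the fourth column, which holds for both `netstat -lnt` and `ss -lnt`.

```shell
# Hypothetical helper: read `netstat -lnt` or `ss -lnt` output on stdin and
# report, for each port given as an argument, whether a listener exists.
listening_ports() {
  awk -v want="$*" '
    BEGIN { n = split(want, p, " "); for (i = 1; i <= n; i++) wanted[p[i]] = 1 }
    /LISTEN/ {
      port = $4                 # local address column, e.g. 0.0.0.0:6641
      sub(/.*:/, "", port)      # keep only the text after the last colon
      if (port in wanted) found[port] = 1
    }
    END {
      for (i = 1; i <= n; i++)
        print p[i], ((p[i] in found) ? "listening" : "missing")
    }
  '
}

# Typical use on the central node:
#   sudo netstat -lnt | listening_ports 6641 6642
```

A "missing" result for either port usually means the corresponding set-connection command has not been run yet or ovsdb-server is not running.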

Start ovn-host

#sudo /usr/bin/ovn-controller unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --no-chdir --log-file=/var/log/ovn/ovn-controller.log --pidfile=/var/run/ovn/ovn-controller.pid --detach --monitor
sudo /usr/share/ovn/scripts/ovn-ctl stop_controller
sudo /usr/share/ovn/scripts/ovn-ctl start_controller

Then configure it:

#on all compute chassis (we have only one chassis, so it is also a compute chassis), make ovn-controller connect to the southbound db.
# lxdbr0=192.168.121.1, IP=$(ip addr show eth0 | awk '$1 == "inet" {print $2}' | cut -f1 -d/)
sudo /usr/bin/ovs-vsctl set open_vswitch .  \
  external_ids:ovn-remote=tcp:192.168.121.1:6642 \
  external_ids:ovn-encap-ip=192.168.121.1 \
  external_ids:ovn-encap-type=geneve \
  external_ids:system-id=ovn1
sudo ovs-vsctl set Open_vSwitch . external-ids:ovn-cms-options=\"enable-chassis-as-gw\"
$ sudo ovs-vsctl get Open_vSwitch . external-ids
{hostname=t440p.lan, ovn-cms-options=enable-chassis-as-gw, ovn-encap-ip="192.168.121.1", ovn-encap-type=geneve, ovn-remote="tcp:192.168.121.1:6642", rundir="/var/run/openvswitch", system-id=ovn1}
sudo /usr/bin/ovs-vsctl add-br br-int
sudo /usr/bin/ovs-vsctl set bridge br-int protocols=OpenFlow10,OpenFlow11,OpenFlow12,OpenFlow13,OpenFlow14,OpenFlow15
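
Only three values in the chassis registration differ from node to node: the southbound DB address, the local encapsulation IP, and the system-id. The sketch below makes that explicit by printing (not running) the command for a given node. The function name chassis_cmd is hypothetical; the port 6642 and the geneve encapsulation are the ones used above.

```shell
# Hypothetical helper: print the ovs-vsctl command that registers one chassis.
# It only prints the command, so it is safe to run on a machine without OVS.
chassis_cmd() {
  local sb_ip="$1" encap_ip="$2" system_id="$3"
  printf 'ovs-vsctl set open_vswitch . external_ids:ovn-remote=tcp:%s:6642 external_ids:ovn-encap-ip=%s external_ids:ovn-encap-type=geneve external_ids:system-id=%s\n' \
    "$sb_ip" "$encap_ip" "$system_id"
}

# The values used for the first node in this article.
chassis_cmd 192.168.121.1 192.168.121.1 ovn1
```

Reviewing the printed line before piping it to `sudo sh` is a cheap way to catch a wrong IP before ovn-controller starts retrying against it.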

Source-level debugging

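This section covers source-level debugging only. The build above already includes the symbol table (CFLAGS="-g -O2"). If OVN was installed from deb packages instead, install the debug-symbol packages with the dbg suffix.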

sudo /usr/share/ovn/scripts/ovn-ctl stop_controller
sudo killall ovn-controller
sudo cgdb --args /usr/bin/ovn-controller unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --no-chdir --log-file=/var/log/ovn/ovn-controller.log --pidfile=/var/run/ovn/ovn-controller.pid --detach --monitor
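#follow-fork-mode only selects which process (parent or child) gdb keeps debugging after a fork; the other one is detached. To debug the parent and still be able to switch to a child at any time, use the detach-on-fork option
#(gdb) set detach-on-fork off #http://c.biancheng.net/view/8274.html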
(gdb) set follow-fork-mode child
(gdb) b main
(gdb) b update_sb_monitors
(gdb) clear update_sb_monitors
(gdb) info b
#eg: debug 'OVNSB commit failed, force recompute next time', ovsdb_idl_loop_commit_and_wait
(gdb) b ovsdb_idl_loop_commit_and_wait
(gdb) b ovsdb_idl_wait
(gdb) r

[Optional] Start an LXD container as a second node

Install and configure LXD.

#https://blog.csdn.net/quqi99/article/details/125004749
sudo snap install lxd --classic
sudo usermod -aG lxd $USER
# MUST NOT use sudo, so must cd to home dir to run it
cd ~ && lxd init --auto
sudo chown -R $USER ~/.config/
export EDITOR=vim
#lxc storage create default dir && lxc profile device add default root disk path=/ pool=default
lxc storage show default
lxc network show lxdbr0
lxc network set lxdbr0 ipv4.address=192.168.121.1/24
lxc network set lxdbr0 ipv6.address none
ip addr show lxdbr0
sudo iptables-save |grep 192.168.121
ps -ef |grep 192.168.121
cat << EOF | tee ./lxd-profile.yaml
config:
  boot.autostart: "true"
  linux.kernel_modules: openvswitch,nbd,ip_tables,ip6_tables
  security.nesting: "true"
  security.privileged: "true"
description: ""
devices:
  eth0:
    mtu: "9000"
    name: eth0
    nictype: bridged
    parent: lxdbr0
    type: nic
  eth1:
    mtu: "9000"
    name: eth1
    nictype: bridged
    parent: lxdbr0
    type: nic
  kvm:
    path: /dev/kvm
    type: unix-char
  mem:
    path: /dev/mem
    type: unix-char
  root:
    path: /
    pool: default
    type: disk
  tun:
    path: /dev/net/tun
    type: unix-char
name: juju-default
used_by: []
EOF
lxc profile create juju-default 2>/dev/null || echo "juju-default profile already exists"
cat ./lxd-profile.yaml |lxc profile edit juju-default
lxc profile show juju-default

Launch an LXD container:

# juju-default uses eth0 so must use eth0 here as well
cat << EOF | tee network.yml
version: 1
config:
  - type: physical
    name: eth0
    subnets:
      - type: static
        ipv4: true
        address: 192.168.121.2
        netmask: 255.255.255.0
        gateway: 192.168.121.1
        control: auto
  - type: nameserver
    address: 192.168.99.1
EOF
lxc launch ubuntu:focal ovn2 -p juju-default --config=user.network-config="$(cat network.yml)"
lxc exec `lxc list |grep ovn2 |awk -F '|' '{print $2}'` bash

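To avoid version mismatches, we do not install openvswitch-switch and ovn-host via apt. Instead, repeat the steps above to build and install OVS and OVN from source (scp -r /bak/linux/ov* root@192.168.121.2:/root/), then follow the steps below to start only ovs-vswitchd and ovn-controller. For convenience, you can also try 'apt install openvswitch-switch ovn-host'.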

export PATH=$PATH:/usr/share/ovn/scripts:/usr/share/openvswitch/scripts
#start ovs-vswitchd
sudo /usr/share/openvswitch/scripts/ovs-ctl start --system-id=$(hostname)
#configure and start ovn-controller
ovs-vsctl add-br br-int
ovs-vsctl set bridge br-int protocols=OpenFlow10,OpenFlow11,OpenFlow12,OpenFlow13,OpenFlow14,OpenFlow15
ovs-vsctl set open_vswitch .  \
  external_ids:ovn-remote=tcp:192.168.122.1:6642 \
  external_ids:ovn-encap-ip=$(ip addr show eth0 | awk '$1 == "inet" {print $2}' | cut -f1 -d/) \
  external_ids:ovn-encap-type=geneve \
  external_ids:system-id=$(hostname)
ovs-vsctl set Open_vSwitch . external-ids:ovn-cms-options=\"enable-chassis-as-gw\"
/usr/share/ovn/scripts/ovn-ctl start_controller
root@ovn2:~/ovn# ovs-vsctl get Open_vSwitch . external-ids
{hostname=ovn2, ovn-cms-options=enable-chassis-as-gw, ovn-encap-ip="192.168.121.2", ovn-encap-type=geneve, ovn-remote="tcp:192.168.122.1:6642", rundir="/var/run/openvswitch", system-id=ovn2}
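
The ovn-encap-ip value above is derived from the container's eth0 address with an awk and cut pipeline. A minimal sketch of that extraction as a reusable function follows; the name first_ipv4 is hypothetical, and it reads `ip addr show` output from stdin so that it can be tried on sample text without a real interface.

```shell
# Hypothetical helper: print the first IPv4 address (without the prefix length)
# found in `ip addr show` output read from stdin.
first_ipv4() {
  awk '$1 == "inet" { print $2; exit }' | cut -f1 -d/
}

# Typical use inside the container:
#   ip addr show eth0 | first_ipv4
```

The `exit` after the first match matters when an interface carries several addresses: without it every address would be printed, and the resulting multi-word value would break the ovs-vsctl command line.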

Build a small OVN environment for testing

#https://blog.russellbryant.net/2016/11/11/ovn-logical-flows-and-ovn-trace/
# Create the first logical switch and its two ports.
sudo ovn-nbctl ls-add sw0
sudo ovn-nbctl lsp-add sw0 sw0-port1
sudo ovn-nbctl lsp-set-addresses sw0-port1 "00:00:00:00:00:01 10.0.0.51"
sudo ovn-nbctl lsp-set-port-security sw0-port1 "00:00:00:00:00:01 10.0.0.51"
sudo ovn-nbctl lsp-add sw0 sw0-port2
sudo ovn-nbctl lsp-set-addresses sw0-port2 "00:00:00:00:00:02 10.0.0.52"
sudo ovn-nbctl lsp-set-port-security sw0-port2 "00:00:00:00:00:02 10.0.0.52"
# Create the second logical switch and its two ports.
sudo ovn-nbctl ls-add sw1
sudo ovn-nbctl lsp-add sw1 sw1-port1
sudo ovn-nbctl lsp-set-addresses sw1-port1 "00:00:00:00:00:03 192.168.1.51"
sudo ovn-nbctl lsp-set-port-security sw1-port1 "00:00:00:00:00:03 192.168.1.51"
sudo ovn-nbctl lsp-add sw1 sw1-port2
sudo ovn-nbctl lsp-set-addresses sw1-port2 "00:00:00:00:00:04 192.168.1.52"
sudo ovn-nbctl lsp-set-port-security sw1-port2 "00:00:00:00:00:04 192.168.1.52"
# Create a logical router between sw0 and sw1.
sudo ovn-nbctl create Logical_Router name=lr0
sudo ovn-nbctl lrp-add lr0 lrp0 00:00:00:00:ff:01 10.0.0.1/24
sudo ovn-nbctl lsp-add sw0 sw0-lrp0 \
    -- set Logical_Switch_Port sw0-lrp0 type=router \
    options:router-port=lrp0 addresses='"00:00:00:00:ff:01"'
sudo ovn-nbctl lrp-add lr0 lrp1 00:00:00:00:ff:02 192.168.1.1/24
sudo ovn-nbctl lsp-add sw1 sw1-lrp1 \
    -- set Logical_Switch_Port sw1-lrp1 type=router \
    options:router-port=lrp1 addresses='"00:00:00:00:ff:02"'
sudo ovn-sbctl lflow-list

However, it reported the following error. Why?

$ sudo ovn-nbctl lsp-set-addresses sw0-port1 "00:00:00:00:00:01 10.0.0.51"
ovn-nbctl: 10.0.0.51: Invalid address format. See ovn-nb(5). Hint: An Ethernet address must be listed before an IP address, together as a single argument.
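
The article does not record the cause. The message itself complains about 10.0.0.51 on its own, which suggests that the MAC and the IP reached ovn-nbctl as two separate arguments, so the quoting was lost somewhere on the way. That is only a guess, not a confirmed diagnosis. The sketch below shows how the shell turns one address string into one or two arguments; count_args is a hypothetical helper used purely for illustration.

```shell
# Print how many arguments the shell actually passed to the command.
count_args() { echo "$#"; }

addr="00:00:00:00:00:01 10.0.0.51"

count_args "$addr"   # quoted: MAC and IP stay together, prints 1
count_args $addr     # unquoted: split at the space, prints 2
```

If the command is run through a wrapper script or alias, checking that the wrapper forwards "$@" rather than $* is a reasonable first step.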

OVS DB CLI

sudo ovsdb-client list-dbs
sudo ovsdb-client list-tables Open_vSwitch
sudo ovsdb-client list-columns Open_vSwitch Port
#sudo ovsdb-client dump Open_vSwitch Port
sudo ovs-vsctl list Port
sudo ovs-vsctl --columns=_uuid,external_ids,name list Port
sudo ovs-vsctl --if-exists set Port b8e62998-5560-4eff-972a-01eed8d20bf6  external_ids:quqi='test'
sudo ovs-vsctl --if-exists get Port b8e62998-5560-4eff-972a-01eed8d20bf6  external_ids:quqi
sudo ovs-vsctl --if-exists remove Port b8e62998-5560-4eff-972a-01eed8d20bf6  external_ids quqi='test'
sudo ovs-vsctl --if-exists add Port b8e62998-5560-4eff-972a-01eed8d20bf6  external_ids quqi2='test2'
sudo ovs-vsctl --if-exists remove Port b8e62998-5560-4eff-972a-01eed8d20bf6  external_ids quqi2='test2'

OVS/OVN Trace

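A Linux bridge forwards purely by MAC address, so most in-bridge forwarding problems can be solved by checking that the fdb entries are correct and that ebtables or iptables is not blocking the traffic. OVS forwards packets using OpenFlow flow tables, which is far more complex, especially with multiple datapath bridges and multi-level flow tables, and reading the flow tables by eye is laborious. The OVS/OVN trace tools make such problems easier to locate.

  • ovn-trace: OVN only. Run it on the central node; it traces OVN-managed datapaths, and its output corresponds to the OVN logical configuration, so it is easy to read.
  • ovs-appctl ofproto/trace: can be run on non-central nodes and can also trace datapaths not managed by OVN, but its output is more abstract and harder to read.
  • Both work out the path by querying flow tables rather than by constructing a real packet. The output of ofproto/trace can be mapped back to OVN terms (ovs-appctl ofproto/trace "xxxx" > tmp && ovn-detrace < tmp).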

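In addition, tracing may not work when the pipeline depends on external modules. For example, if the pipeline uses ct, the trace cannot actually enter the conntrack module to modify the packet (NAT rewriting, for instance), so the trace stops being reliable from that point on. This limits how far these tools can be used.
See my earlier blog posts about OVN, or: https://www.jianshu.com/p/c308ecc3b191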
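
As a concrete illustration of ovn-trace, the sketch below builds a microflow expression for a packet sent from sw0-port1 to sw0-port2 in the test topology created earlier. The helper name microflow is hypothetical; the ports and MAC addresses are the ones configured above, and running the trace itself assumes ovn-central is up.

```shell
# Hypothetical helper: build an ovn-trace microflow for an L2 packet
# entering a logical switch port with the given source and destination MAC.
microflow() {
  local inport="$1" src="$2" dst="$3"
  printf 'inport == "%s" && eth.src == %s && eth.dst == %s\n' "$inport" "$src" "$dst"
}

flow="$(microflow sw0-port1 00:00:00:00:00:01 00:00:00:00:00:02)"
echo "$flow"

# With ovn-central running, the trace could then be requested with:
#   sudo ovn-trace --minimal sw0 "$flow"
```

The first positional argument to ovn-trace is the logical datapath (here the switch sw0), and --minimal reduces the output to the actions that affect the packet.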
