After the base-cloud servers reboot, keepalived in the OVS router namespaces fails to start and VXLAN virtual machines cannot communicate

Posted lsw-blogs


Symptoms

 

keepalived did not start properly in the namespace (ns) of any OVS router.

The L3 agent log shows the following:

    Stderr:  execute /usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py:134
    2019-04-25 17:32:13.457 2948 DEBUG neutron.agent.linux.utils [-] Running command (rootwrap daemon): [ip, netns, exec, qrouter-02b81fb5-974e-45f2-af80-70532c032737, ip, link, set, lo, up] execute_rootwrap_daemon /usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py:100
    2019-04-25 17:32:13.505 2948 DEBUG neutron.agent.linux.utils [-]
    Command: [ip, netns, exec, qrouter-02b81fb5-974e-45f2-af80-70532c032737, ip, link, set, lo, up]
    Exit code: 0
    Stdin:
    Stdout:
    Stderr:  execute /usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py:134
    2019-04-25 17:32:13.506 2948 DEBUG neutron.agent.linux.utils [-] Running command (rootwrap daemon): [ip, netns, exec, qrouter-02b81fb5-974e-45f2-af80-70532c032737, sysctl, -w, net.ipv4.ip_forward=1] execute_rootwrap_daemon /usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py:100
    2019-04-25 17:32:13.572 2948 DEBUG neutron.agent.linux.utils [-]
    Command: [ip, netns, exec, qrouter-02b81fb5-974e-45f2-af80-70532c032737, sysctl, -w, net.ipv4.ip_forward=1]
    Exit code: 0
    Stdin:
    Stdout: net.ipv4.ip_forward = 1
    Stderr:  execute /usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py:134
    2019-04-25 17:32:13.577 2948 DEBUG neutron.agent.linux.utils [-] Running command (rootwrap daemon): [ip, netns, exec, qrouter-02b81fb5-974e-45f2-af80-70532c032737, ip, -o, link, show, ha-63d60b5e-eb] execute_rootwrap_daemon /usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py:100
    2019-04-25 17:32:13.635 2948 DEBUG neutron.agent.linux.utils [-]
    Command: [ip, netns, exec, qrouter-02b81fb5-974e-45f2-af80-70532c032737, ip, -o, link, show, ha-63d60b5e-eb]
    Exit code: 1
    Stdin:
    Stdout:
    Stderr: Device "ha-63d60b5e-eb" does not exist.
     execute /usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py:134
    2019-04-25 17:32:13.636 2948 DEBUG neutron.agent.linux.utils [-] Running command: [ip, -o, link, show, br-int] create_process /usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py:84
    2019-04-25 17:32:13.645 2948 DEBUG neutron.agent.linux.utils [-]
    Command: [ip, -o, link, show, br-int]
    Exit code: 1
    Stdin:
    Stdout:
    Stderr: Device "br-int" does not exist.
     execute /usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py:134
    2019-04-25 17:32:13.646 2948 ERROR neutron.agent.l3.agent [-] Failed to process compatible router 02b81fb5-974e-45f2-af80-70532c032737
    2019-04-25 17:32:13.646 2948 TRACE neutron.agent.l3.agent Traceback (most recent call last):
    2019-04-25 17:32:13.646 2948 TRACE neutron.agent.l3.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py", line 465, in _process_router_update
    2019-04-25 17:32:13.646 2948 TRACE neutron.agent.l3.agent     self._process_router_if_compatible(router)
    2019-04-25 17:32:13.646 2948 TRACE neutron.agent.l3.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py", line 410, in _process_router_if_compatible
    2019-04-25 17:32:13.646 2948 TRACE neutron.agent.l3.agent     self._process_added_router(router)
    2019-04-25 17:32:13.646 2948 TRACE neutron.agent.l3.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py", line 415, in _process_added_router
    2019-04-25 17:32:13.646 2948 TRACE neutron.agent.l3.agent     self._router_added(router['id'], router)
    2019-04-25 17:32:13.646 2948 TRACE neutron.agent.l3.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py", line 316, in _router_added
    2019-04-25 17:32:13.646 2948 TRACE neutron.agent.l3.agent     ri.initialize(self.process_monitor)
    2019-04-25 17:32:13.646 2948 TRACE neutron.agent.l3.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/l3/ha_router.py", line 87, in initialize
    2019-04-25 17:32:13.646 2948 TRACE neutron.agent.l3.agent     self.ha_network_added()
    2019-04-25 17:32:13.646 2948 TRACE neutron.agent.l3.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/l3/ha_router.py", line 147, in ha_network_added
    2019-04-25 17:32:13.646 2948 TRACE neutron.agent.l3.agent     prefix=HA_DEV_PREFIX)
    2019-04-25 17:32:13.646 2948 TRACE neutron.agent.l3.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/linux/interface.py", line 235, in plug
    2019-04-25 17:32:13.646 2948 TRACE neutron.agent.l3.agent     self.check_bridge_exists(bridge)
    2019-04-25 17:32:13.646 2948 TRACE neutron.agent.l3.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/linux/interface.py", line 169, in check_bridge_exists
    2019-04-25 17:32:13.646 2948 TRACE neutron.agent.l3.agent     raise exceptions.BridgeDoesNotExist(bridge=bridge)
    2019-04-25 17:32:13.646 2948 TRACE neutron.agent.l3.agent BridgeDoesNotExist: Bridge br-int does not exist.
    2019-04-25 17:32:13.646 2948 TRACE neutron.agent.l3.agent

The system messages log shows:

 

    Apr 25 17:35:35 TX-JIAKE-NETWORK-02 Keepalived[5202]: Starting VRRP child process, pid=5203
    Apr 25 17:35:35 TX-JIAKE-NETWORK-02 Keepalived_vrrp[5203]: Registering Kernel netlink reflector
    Apr 25 17:35:35 TX-JIAKE-NETWORK-02 Keepalived_vrrp[5203]: Registering Kernel netlink command channel
    Apr 25 17:35:35 TX-JIAKE-NETWORK-02 Keepalived_vrrp[5203]: Registering gratuitous ARP shared channel
    Apr 25 17:35:35 TX-JIAKE-NETWORK-02 Keepalived_vrrp[5203]: Unable to load ipset library
    Apr 25 17:35:35 TX-JIAKE-NETWORK-02 Keepalived_vrrp[5203]: Unable to initialise ipsets
    Apr 25 17:35:35 TX-JIAKE-NETWORK-02 Keepalived_vrrp[5203]: Opening file /var/lib/neutron/ha_confs/02b81fb5-974e-45f2-af80-70532c032737/keepalived.conf.
    Apr 25 17:35:35 TX-JIAKE-NETWORK-02 Keepalived_vrrp[5203]: Cant find interface ha-63d60b5e-eb for vrrp_instance VR_1 !!!
    Apr 25 17:35:35 TX-JIAKE-NETWORK-02 Keepalived_vrrp[5203]: ha-63d60b5e-eb no match, ignoring...
    Apr 25 17:35:35 TX-JIAKE-NETWORK-02 Keepalived_vrrp[5203]: VRRP is trying to assign ip address 169.254.0.1/24 to unknown ha-63d60b5e-eb interface !!! go out and fix your conf !!!
    Apr 25 17:35:35 TX-JIAKE-NETWORK-02 Keepalived_vrrp[5203]: Cannot specify scope for IPv6 addresses (fe80::f816:3eff:fe96:5ba7/64) - ignoring scope
    Apr 25 17:35:35 TX-JIAKE-NETWORK-02 Keepalived_vrrp[5203]: Cannot specify scope for IPv6 addresses (fe80::f816:3eff:fec1:3176/64) - ignoring scope
    Apr 25 17:35:35 TX-JIAKE-NETWORK-02 Keepalived_vrrp[5203]: VRRP_Instance(VR_1) Unknown interface !
    Apr 25 17:35:35 TX-JIAKE-NETWORK-02 ovs-vsctl: ovs|00001|vsctl|INFO|Called as /bin/ovs-vsctl --timeout=10 --oneline --format=json -- set Port qg-5c50092e-c7 tag=2
    Apr 25 17:35:35 TX-JIAKE-NETWORK-02 ovs-vsctl: ovs|00001|vsctl|INFO|Called as /bin/ovs-vsctl --timeout=10 --oneline --format=json -- set Port qr-7abbee5d-b0 tag=3
    Apr 25 17:35:36 TX-JIAKE-NETWORK-02 Keepalived_vrrp[5203]: Stopped
    Apr 25 17:35:36 TX-JIAKE-NETWORK-02 Keepalived[5202]: pid 5203 exited with permanent error CONFIG. Terminating
    Apr 25 17:35:36 TX-JIAKE-NETWORK-02 Keepalived[5202]: Stopping
    Apr 25 17:35:38 TX-JIAKE-NETWORK-02 systemd: Reloading.

 

 

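The logs show that when the L3 agent tried to plug the router's HA interface into the namespace, the device could not be found. The agent then checked for the br-int bridge, and that did not exist either. br-int is provided by openvswitch, and comparing the service start times shows that the openvswitch service started later than the neutron-l3-agent service. The agent therefore failed to create the HA interface, and keepalived exited with a CONFIG error because the interface named in keepalived.conf was missing.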
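The start-time comparison described above can be sketched as a pair of shell functions. This is a minimal sketch, assuming a systemd host where `systemctl show -p ActiveEnterTimestampMonotonic` reports microseconds since boot and reports 0 for a unit that has not become active. The function names `unit_start_usec` and `started_before` are hypothetical helpers, not part of any tool.

```shell
# Print the monotonic time (microseconds since boot) at which
# unit $1 last became active.
unit_start_usec() {
    # Output looks like "ActiveEnterTimestampMonotonic=12345678";
    # keep only the value after "=".
    systemctl show -p ActiveEnterTimestampMonotonic "$1" | cut -d= -f2
}

# Succeed (exit status 0) only if both units have been active
# and unit $1 became active before unit $2.
started_before() {
    first_usec=$(unit_start_usec "$1")
    second_usec=$(unit_start_usec "$2")
    # Assumption: a value of 0 means the unit has not been active since boot.
    [ "$first_usec" -gt 0 ] && [ "$second_usec" -gt 0 ] \
        && [ "$first_usec" -lt "$second_usec" ]
}
```

On an affected network node, `started_before neutron-l3-agent.service openvswitch.service` would be expected to succeed, which matches the ordering problem described here.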

Solution

 

Scope of the change: network nodes (usually the same machines as the controller nodes)

 

Edit /usr/lib/systemd/system/neutron-l3-agent.service so that the agent is ordered after openvswitch and requires it.

Before:

    [Unit]
    Description=OpenStack Neutron Layer 3 Agent
    After=syslog.target network.target

After:

    [Unit]
    Description=OpenStack Neutron Layer 3 Agent
    After=syslog.target network.target network.service openvswitch.service
    Requires=openvswitch.service
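Files under /usr/lib/systemd/system belong to the package and may be overwritten on upgrade, so the same ordering could instead be expressed as a systemd drop-in. The following is only a sketch of that alternative, not the method used in this post. The function name `write_ovs_ordering_dropin` and the file name `10-after-openvswitch.conf` are made up for illustration, and the directory is passed in as a parameter; on a real node it would be /etc/systemd/system/neutron-l3-agent.service.d, written as root.

```shell
# Write a drop-in that orders neutron-l3-agent after openvswitch.
# $1 is the drop-in directory for the unit.
write_ovs_ordering_dropin() {
    dropin_dir=$1
    mkdir -p "$dropin_dir" || return 1
    # The file name is arbitrary; systemd only requires the .conf suffix.
    # After= and Requires= in a drop-in add to the values in the main unit file.
    cat > "$dropin_dir/10-after-openvswitch.conf" <<'EOF'
[Unit]
After=openvswitch.service
Requires=openvswitch.service
EOF
}
```

A drop-in written this way still needs the `systemctl daemon-reload` step that follows.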

 

Then run:

 

    systemctl daemon-reload
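After the reload, it can be worth confirming that systemd actually picked up the new ordering. A minimal sketch, assuming `systemctl show -p After` prints a single space-separated `After=` line; the function name `unit_is_ordered_after` is a hypothetical helper.

```shell
# Succeed (exit status 0) if unit $1 lists unit $2 in its After= ordering.
unit_is_ordered_after() {
    # Output looks like "After=basic.target openvswitch.service ...".
    # Split the value into one unit per line and look for an exact match.
    systemctl show -p After "$1" | cut -d= -f2 | tr ' ' '\n' | grep -qxF "$2"
}
```

For example, `unit_is_ordered_after neutron-l3-agent.service openvswitch.service && echo ok` should print `ok` once the change is loaded. Note that `daemon-reload` only reloads unit definitions; the new ordering applies the next time the services are started, for example at the next reboot.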
