keepalived in OVS routers fails to start after the base cloud servers reboot, leaving VXLAN VMs unable to communicate
Posted lsw-blogs
Symptoms
keepalived did not start properly in the namespaces of any of the OVS routers.
The L3 agent log shows:
Stderr: execute /usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py:134
2019-04-25 17:32:13.457 2948 DEBUG neutron.agent.linux.utils [-] Running command (rootwrap daemon): ['ip', 'netns', 'exec', 'qrouter-02b81fb5-974e-45f2-af80-70532c032737', 'ip', 'link', 'set', 'lo', 'up'] execute_rootwrap_daemon /usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py:100
2019-04-25 17:32:13.505 2948 DEBUG neutron.agent.linux.utils [-] Command: ['ip', 'netns', 'exec', u'qrouter-02b81fb5-974e-45f2-af80-70532c032737', 'ip', 'link', 'set', 'lo', 'up'] Exit code: 0 Stdin: Stdout: Stderr: execute /usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py:134
2019-04-25 17:32:13.506 2948 DEBUG neutron.agent.linux.utils [-] Running command (rootwrap daemon): ['ip', 'netns', 'exec', 'qrouter-02b81fb5-974e-45f2-af80-70532c032737', 'sysctl', '-w', 'net.ipv4.ip_forward=1'] execute_rootwrap_daemon /usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py:100
2019-04-25 17:32:13.572 2948 DEBUG neutron.agent.linux.utils [-] Command: ['ip', 'netns', 'exec', u'qrouter-02b81fb5-974e-45f2-af80-70532c032737', 'sysctl', '-w', 'net.ipv4.ip_forward=1'] Exit code: 0 Stdin: Stdout: net.ipv4.ip_forward = 1 Stderr: execute /usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py:134
2019-04-25 17:32:13.577 2948 DEBUG neutron.agent.linux.utils [-] Running command (rootwrap daemon): ['ip', 'netns', 'exec', 'qrouter-02b81fb5-974e-45f2-af80-70532c032737', 'ip', '-o', 'link', 'show', 'ha-63d60b5e-eb'] execute_rootwrap_daemon /usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py:100
2019-04-25 17:32:13.635 2948 DEBUG neutron.agent.linux.utils [-] Command: ['ip', 'netns', 'exec', u'qrouter-02b81fb5-974e-45f2-af80-70532c032737', 'ip', '-o', 'link', 'show', u'ha-63d60b5e-eb'] Exit code: 1 Stdin: Stdout: Stderr: Device "ha-63d60b5e-eb" does not exist. execute /usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py:134
2019-04-25 17:32:13.636 2948 DEBUG neutron.agent.linux.utils [-] Running command: ['ip', '-o', 'link', 'show', 'br-int'] create_process /usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py:84
2019-04-25 17:32:13.645 2948 DEBUG neutron.agent.linux.utils [-] Command: ['ip', '-o', 'link', 'show', 'br-int'] Exit code: 1 Stdin: Stdout: Stderr: Device "br-int" does not exist. execute /usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py:134
2019-04-25 17:32:13.646 2948 ERROR neutron.agent.l3.agent [-] Failed to process compatible router '02b81fb5-974e-45f2-af80-70532c032737'
2019-04-25 17:32:13.646 2948 TRACE neutron.agent.l3.agent Traceback (most recent call last):
2019-04-25 17:32:13.646 2948 TRACE neutron.agent.l3.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py", line 465, in _process_router_update
2019-04-25 17:32:13.646 2948 TRACE neutron.agent.l3.agent     self._process_router_if_compatible(router)
2019-04-25 17:32:13.646 2948 TRACE neutron.agent.l3.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py", line 410, in _process_router_if_compatible
2019-04-25 17:32:13.646 2948 TRACE neutron.agent.l3.agent     self._process_added_router(router)
2019-04-25 17:32:13.646 2948 TRACE neutron.agent.l3.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py", line 415, in _process_added_router
2019-04-25 17:32:13.646 2948 TRACE neutron.agent.l3.agent     self._router_added(router['id'], router)
2019-04-25 17:32:13.646 2948 TRACE neutron.agent.l3.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py", line 316, in _router_added
2019-04-25 17:32:13.646 2948 TRACE neutron.agent.l3.agent     ri.initialize(self.process_monitor)
2019-04-25 17:32:13.646 2948 TRACE neutron.agent.l3.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/l3/ha_router.py", line 87, in initialize
2019-04-25 17:32:13.646 2948 TRACE neutron.agent.l3.agent     self.ha_network_added()
2019-04-25 17:32:13.646 2948 TRACE neutron.agent.l3.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/l3/ha_router.py", line 147, in ha_network_added
2019-04-25 17:32:13.646 2948 TRACE neutron.agent.l3.agent     prefix=HA_DEV_PREFIX)
2019-04-25 17:32:13.646 2948 TRACE neutron.agent.l3.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/linux/interface.py", line 235, in plug
2019-04-25 17:32:13.646 2948 TRACE neutron.agent.l3.agent     self.check_bridge_exists(bridge)
2019-04-25 17:32:13.646 2948 TRACE neutron.agent.l3.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/linux/interface.py", line 169, in check_bridge_exists
2019-04-25 17:32:13.646 2948 TRACE neutron.agent.l3.agent     raise exceptions.BridgeDoesNotExist(bridge=bridge)
2019-04-25 17:32:13.646 2948 TRACE neutron.agent.l3.agent BridgeDoesNotExist: Bridge br-int does not exist.
2019-04-25 17:32:13.646 2948 TRACE neutron.agent.l3.agent
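The failing step in this log is the agent's check for ha-63d60b5e-eb inside the qrouter namespace, followed by the br-int check. The same check can be repeated by hand for every router on a network node. The sketch below is a minimal example, assuming Python 3.5 or later, the iproute2 ip command, and root privileges; qrouter_namespaces, ha_devices, run and main are hypothetical helper names, not Neutron APIs. Only HA routers are expected to have an ha- device.

```python
import subprocess


def qrouter_namespaces(netns_output):
    """Return qrouter-* namespace names from `ip netns list` output.

    Newer iproute2 versions append " (id: N)", so only the first field is kept.
    """
    names = []
    for line in netns_output.splitlines():
        fields = line.split()
        if fields and fields[0].startswith("qrouter-"):
            names.append(fields[0])
    return names


def ha_devices(link_output):
    """Return ha-* device names from `ip -o link show` output."""
    devices = []
    for line in link_output.splitlines():
        # Each line looks like "12: ha-63d60b5e-eb: <FLAGS> mtu ..."
        parts = line.split(": ", 2)
        if len(parts) < 2:
            continue
        # veth peers are printed as "name@ifN"; keep only the name
        name = parts[1].split("@", 1)[0]
        if name.startswith("ha-"):
            devices.append(name)
    return devices


def run(cmd):
    """Run a command and return its stdout as text (raises on failure)."""
    return subprocess.run(
        cmd, stdout=subprocess.PIPE, universal_newlines=True, check=True
    ).stdout


def main():
    """Print qrouter namespaces without an ha- device; return 1 if any."""
    missing = []
    for ns in qrouter_namespaces(run(["ip", "netns", "list"])):
        links = run(["ip", "netns", "exec", ns, "ip", "-o", "link", "show"])
        if not ha_devices(links):
            missing.append(ns)
    for ns in missing:
        print(ns + ": no ha- interface")
    return 1 if missing else 0
```

Calling main() as root prints every qrouter namespace that has no ha- device and returns 1 if there are any, 0 otherwise.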
System log (/var/log/messages):
Apr 25 17:35:35 TX-JIAKE-NETWORK-02 Keepalived[5202]: Starting VRRP child process, pid=5203
Apr 25 17:35:35 TX-JIAKE-NETWORK-02 Keepalived_vrrp[5203]: Registering Kernel netlink reflector
Apr 25 17:35:35 TX-JIAKE-NETWORK-02 Keepalived_vrrp[5203]: Registering Kernel netlink command channel
Apr 25 17:35:35 TX-JIAKE-NETWORK-02 Keepalived_vrrp[5203]: Registering gratuitous ARP shared channel
Apr 25 17:35:35 TX-JIAKE-NETWORK-02 Keepalived_vrrp[5203]: Unable to load ipset library
Apr 25 17:35:35 TX-JIAKE-NETWORK-02 Keepalived_vrrp[5203]: Unable to initialise ipsets
Apr 25 17:35:35 TX-JIAKE-NETWORK-02 Keepalived_vrrp[5203]: Opening file '/var/lib/neutron/ha_confs/02b81fb5-974e-45f2-af80-70532c032737/keepalived.conf'.
Apr 25 17:35:35 TX-JIAKE-NETWORK-02 Keepalived_vrrp[5203]: Cant find interface ha-63d60b5e-eb for vrrp_instance VR_1 !!!
Apr 25 17:35:35 TX-JIAKE-NETWORK-02 Keepalived_vrrp[5203]: ha-63d60b5e-eb no match, ignoring...
Apr 25 17:35:35 TX-JIAKE-NETWORK-02 Keepalived_vrrp[5203]: VRRP is trying to assign ip address 169.254.0.1/24 to unknown ha-63d60b5e-eb interface !!! go out and fix your conf !!!
Apr 25 17:35:35 TX-JIAKE-NETWORK-02 Keepalived_vrrp[5203]: Cannot specify scope for IPv6 addresses (fe80::f816:3eff:fe96:5ba7/64) - ignoring scope
Apr 25 17:35:35 TX-JIAKE-NETWORK-02 Keepalived_vrrp[5203]: Cannot specify scope for IPv6 addresses (fe80::f816:3eff:fec1:3176/64) - ignoring scope
Apr 25 17:35:35 TX-JIAKE-NETWORK-02 Keepalived_vrrp[5203]: VRRP_Instance(VR_1) Unknown interface !
Apr 25 17:35:35 TX-JIAKE-NETWORK-02 ovs-vsctl: ovs|00001|vsctl|INFO|Called as /bin/ovs-vsctl --timeout=10 --oneline --format=json -- set Port qg-5c50092e-c7 tag=2
Apr 25 17:35:35 TX-JIAKE-NETWORK-02 ovs-vsctl: ovs|00001|vsctl|INFO|Called as /bin/ovs-vsctl --timeout=10 --oneline --format=json -- set Port qr-7abbee5d-b0 tag=3
Apr 25 17:35:36 TX-JIAKE-NETWORK-02 Keepalived_vrrp[5203]: Stopped
Apr 25 17:35:36 TX-JIAKE-NETWORK-02 Keepalived[5202]: pid 5203 exited with permanent error CONFIG. Terminating
Apr 25 17:35:36 TX-JIAKE-NETWORK-02 Keepalived[5202]: Stopping
Apr 25 17:35:38 TX-JIAKE-NETWORK-02 systemd: Reloading.
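The keepalived side of the failure can be picked out of the system log in the same way. A minimal sketch, assuming the messages are in /var/log/messages in the format shown above; keepalived_failures and scan are hypothetical helpers, and accepting both "Cant" and "Can't" is an assumption about how other keepalived versions might spell the message.

```python
import re

# Matches e.g. "Keepalived_vrrp[5203]: Cant find interface ha-63d60b5e-eb for vrrp_instance VR_1"
# (the optional apostrophe is an assumption, so a "Can't" spelling also matches)
_MISSING_IF = re.compile(
    r"Keepalived_vrrp\[\d+\]: Can'?t find interface (\S+) for vrrp_instance (\S+)"
)
# Matches the parent process reporting that the VRRP child gave up on its config
_CONFIG_EXIT = re.compile(r"Keepalived\[\d+\]: pid \d+ exited with permanent error CONFIG")


def keepalived_failures(lines):
    """Return ([(interface, vrrp_instance), ...], number_of_CONFIG_exits)."""
    missing = []
    config_exits = 0
    for line in lines:
        m = _MISSING_IF.search(line)
        if m:
            missing.append((m.group(1), m.group(2)))
        if _CONFIG_EXIT.search(line):
            config_exits += 1
    return missing, config_exits


def scan(path="/var/log/messages"):
    """Apply keepalived_failures to a log file (path is an assumed default)."""
    with open(path, errors="replace") as f:
        return keepalived_failures(f)
```

Each (interface, instance) pair names an HA port that keepalived expected in its keepalived.conf but could not find in the namespace.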
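The logs above show that when the L3 agent tried to add the router's HA interface (ha-63d60b5e-eb) to the qrouter namespace, the device could not be found. The agent then checked br-int, the OVS integration bridge the HA port is plugged into, and found that it did not exist either ("BridgeDoesNotExist: Bridge br-int does not exist."). Because the HA interface was never created, keepalived could not find the interface named in its keepalived.conf and exited with a permanent CONFIG error. br-int is provided by Open vSwitch, and comparing the service start times showed that the openvswitch service started later than the neutron-l3-agent service.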
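The start-order comparison can be scripted. A minimal sketch, assuming Python 3.5 or later on a systemd host where the units are named openvswitch.service and neutron-l3-agent.service; it reads systemd's ActiveEnterTimestampMonotonic property (microseconds since boot at which a unit became active, 0 if it never did) instead of parsing human-readable timestamps. parse_monotonic, active_since_boot_usec and start_order are hypothetical helpers.

```python
import subprocess


def parse_monotonic(show_output):
    """Parse 'ActiveEnterTimestampMonotonic=<usec>' from `systemctl show` output."""
    for line in show_output.splitlines():
        key, _, value = line.partition("=")
        if key == "ActiveEnterTimestampMonotonic":
            return int(value)
    raise ValueError("ActiveEnterTimestampMonotonic not found")


def active_since_boot_usec(unit):
    """Ask systemd when `unit` became active, in microseconds since boot."""
    out = subprocess.run(
        ["systemctl", "show", "-p", "ActiveEnterTimestampMonotonic", unit],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    ).stdout
    return parse_monotonic(out)


def start_order(units, lookup=active_since_boot_usec):
    """Return (unit, usec) pairs sorted by activation time; 0 means never active."""
    return sorted(((u, lookup(u)) for u in units), key=lambda pair: pair[1])
```

On a node showing the problem described here, start_order(["openvswitch.service", "neutron-l3-agent.service"]) would be expected to list neutron-l3-agent.service first.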
Solution
Scope: the network nodes (usually the controller nodes).
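Edit /usr/lib/systemd/system/neutron-l3-agent.service and change the [Unit] section from:

[Unit]
Description=OpenStack Neutron Layer 3 Agent
After=syslog.target network.target

to:

[Unit]
Description=OpenStack Neutron Layer 3 Agent
After=syslog.target network.target network.service openvswitch.service
Requires=openvswitch.service

After= makes neutron-l3-agent start only once network.service and openvswitch.service have been started, and Requires= pulls in openvswitch.service as a hard dependency of the L3 agent.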
Then reload systemd so that it picks up the modified unit file:
systemctl daemon-reload
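After the reload, the new dependencies should be visible in systemd's view of the unit. A minimal sketch that checks this, assuming Python 3.5 or later and the unit names used above; parse_show, missing_ordering and check_unit are hypothetical helpers that read the After and Requires properties through systemctl show.

```python
import subprocess


def parse_show(show_output):
    """Turn `systemctl show -p ...` output into {property: [values]}."""
    props = {}
    for line in show_output.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            props[key] = value.split()
    return props


def missing_ordering(props, dependency="openvswitch.service"):
    """Return the properties (After, Requires) that do not yet list `dependency`."""
    return [p for p in ("After", "Requires") if dependency not in props.get(p, [])]


def check_unit(unit="neutron-l3-agent.service"):
    """Query systemd for the unit's After/Requires and report what is missing."""
    out = subprocess.run(
        ["systemctl", "show", "-p", "After", "-p", "Requires", unit],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    ).stdout
    return missing_ordering(parse_show(out))
```

check_unit() returns an empty list once both After= and Requires= include openvswitch.service; systemd may also list implicit dependencies such as basic.target, which the check ignores.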