ovn metadata (by quqi99)

Author: Zhang Hua    Posted: 2022-08-25
Copyright: This article may be reproduced freely, but please mark the original source, the author information and this copyright notice with a hyperlink when doing so.

Problem

The customer reported that the metadata service for their VMs occasionally stops working, and they suspected https://bugs.launchpad.net/networking-ovn/+bug/1953591
They also said that, once the problem appears, restarting the neutron-ovn-metadata-agent service on the host where the affected VM runs restores the service. Restarting this service re-runs:

/usr/bin/neutron-rootwrap /etc/neutron/rootwrap.conf ip netns exec ovnmeta-63e2c276-60dd-40e3-baa1-c16342eacce2 haproxy -f /var/lib/neutron/ovn-metadata-proxy/63e2c276-60dd-40e3-baa1-c16342eacce2.conf

How OVN metadata works

I am familiar with neutron metadata; I have dealt with OVN metadata much less, but the underlying principles should be similar.

  • For an SR-IOV port there is a corresponding external port that provides the OVN metadata service, so OVN metadata is handled on the GW chassis, not on the same node as the compute chassis. See the workflow in: 20220607 - sriov ovn metadata - https://blog.csdn.net/quqi99/article/details/116893909
  • For a normal VM, metadata is handled by the ovnmeta-xxx namespace on the compute chassis where the VM runs.

1, Inside the VM (192.168.21.75), DHCP (192.168.21.2) can push a host route so that 169.254.169.254 traffic is forwarded directly to the metadata proxy neutron-ovn-metadata-agent (192.168.21.2); if DHCP is not enabled, the traffic is forwarded to neutron-ovn-metadata-agent via the gateway.

$ ssh -i ~/testkey.priv ubuntu@10.5.150.219 -- ip route list |grep 169
169.254.169.254 via 192.168.21.2 dev ens2 proto dhcp src 192.168.21.75 metric 100

$ openstack port list
+--------------------------------------+------+-------------------+------------------------------------------------------------------------------+--------+
| ID                                   | Name | MAC Address       | Fixed IP Addresses                                                           | Status |
+--------------------------------------+------+-------------------+------------------------------------------------------------------------------+--------+
| 48244606-524d-4c24-bc54-0143869d4267 |      | fa:16:3e:43:81:23 | ip_address='192.168.21.2', subnet_id='7a3260aa-e2a6-4721-8f0a-5b06aec71b53'  | DOWN   |
| 5043ab8b-184d-4d83-abf8-fa044aec9a6d |      | fa:16:3e:e2:8c:a0 | ip_address='10.5.152.84', subnet_id='a0f40836-715c-4dcc-9df1-fb4093aa0f79'   | DOWN   |
| 544e4423-d3d8-418e-8b6c-ea294ff0d1b3 |      | fa:16:3e:a2:83:76 |                                                                              | DOWN   |
| 70b96954-32a2-488b-b67b-4bf039189b02 |      | fa:16:3e:42:29:d4 | ip_address='192.168.21.1', subnet_id='7a3260aa-e2a6-4721-8f0a-5b06aec71b53'  | ACTIVE |
| c99885ae-1778-46cb-af07-29cf32113ef7 |      | fa:16:3e:4d:03:e6 | ip_address='10.5.150.219', subnet_id='a0f40836-715c-4dcc-9df1-fb4093aa0f79'  | N/A    |
| dd6bd055-3460-459e-b9b2-8da7211dacdc |      | fa:16:3e:53:b7:77 | ip_address='192.168.21.75', subnet_id='7a3260aa-e2a6-4721-8f0a-5b06aec71b53' | ACTIVE |
+--------------------------------------+------+-------------------+------------------------------------------------------------------------------+--------+
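To verify the whole chain end to end, the metadata service can be queried from inside the guest. A minimal check, assuming the host route shown above is in place (the paths are the standard EC2-style and OpenStack metadata endpoints):

$ curl -s http://169.254.169.254/latest/meta-data/instance-id
$ curl -s http://169.254.169.254/openstack/latest/meta_data.json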

2, Each compute node runs one neutron-ovn-metadata-agent, which creates an ovnmeta-xxx namespace per network and starts a haproxy instance in it.
  If the ovnmeta-xxx namespace does not exist yet it will be created, and a veth pair connects the namespace side (e.g. tap8ea7d54e-51) with the host side (e.g. tap8ea7d54e-50, which is then added to br-int).
The haproxy instance listens on 169.254.169.254:80, adds the X-OVN-Network-ID header, and forwards the request to the backend (unix socket: /var/lib/neutron/metadata_proxy).

# nova-compute/0 has 3 metadata-related processes
/usr/bin/python3 /usr/bin/neutron-ovn-metadata-agent --config-file=/etc/neutron/neutron.conf --config-file=/etc/neutron/neutron_ovn_metadata_agent.ini --log-file=/var/log/neutron/neutron-ovn-metadata-agent.log
/usr/bin/python3 /usr/bin/nova-api-metadata --config-file=/etc/nova/nova.conf --log-file=/var/log/nova/nova-api-metadata.log
haproxy -f /var/lib/neutron/ovn-metadata-proxy/8ea7d54e-5fe0-408d-9de8-5ccbed8ede84.conf

$ sudo ip netns exec ovnmeta-8ea7d54e-5fe0-408d-9de8-5ccbed8ede84 ip addr show tap8ea7d54e-51 |grep -E 'link|inet'
    link/ether fa:16:3e:43:81:23 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 169.254.169.254/32 brd 169.254.169.254 scope global tap8ea7d54e-51
    inet 192.168.21.2/24 brd 192.168.21.255 scope global tap8ea7d54e-51
$ ip addr show tap8ea7d54e-50 |grep link
    link/ether ca:cb:5c:af:dd:22 brd ff:ff:ff:ff:ff:ff link-netns ovnmeta-8ea7d54e-5fe0-408d-9de8-5ccbed8ede84
$ sudo ovs-vsctl show |grep 8ea7d54e
        Port tap8ea7d54e-50
            Interface tap8ea7d54e-50

$ sudo ip netns exec ovnmeta-8ea7d54e-5fe0-408d-9de8-5ccbed8ede84 route -n
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
192.168.21.0    0.0.0.0         255.255.255.0   U     0      0        0 tap8ea7d54e-51

$ cat /var/lib/neutron/ovn-metadata-proxy/8ea7d54e-5fe0-408d-9de8-5ccbed8ede84.conf
global
    log         /dev/log local0 info
    user        root
    group       root
    maxconn     1024
    pidfile     /var/lib/neutron/external/pids/8ea7d54e-5fe0-408d-9de8-5ccbed8ede84.pid.haproxy
    daemon
defaults
    log global
    mode http
    option httplog
    option dontlognull
    option http-server-close
    option forwardfor
    retries                 3
    timeout http-request    30s
    timeout connect         30s
    timeout client          32s
    timeout server          32s
    timeout http-keep-alive 30s
listen listener
    bind 169.254.169.254:80
    server metadata /var/lib/neutron/metadata_proxy
    http-request add-header X-OVN-Network-ID 8ea7d54e-5fe0-408d-9de8-5ccbed8ede84
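To check that the haproxy listener and its unix-socket backend are alive, a request can be issued from inside the namespace. This is only a reachability sketch: the source address will be the metadata port's own IP (192.168.21.2), so the agent will most likely answer with an error ("No port found"), but getting any HTTP status back at all proves that haproxy and the socket path work:

$ sudo ip netns exec ovnmeta-8ea7d54e-5fe0-408d-9de8-5ccbed8ede84 \
      curl -s -o /dev/null -w '%{http_code}\n' http://169.254.169.254/openstack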

3, neutron-ovn-metadata-agent listens on the unix socket (/var/lib/neutron/metadata_proxy) and forwards the request to the local nova-api-metadata (in DVR mode each compute node runs its own nova-api-metadata).
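The agent can also be exercised directly over the unix socket, bypassing haproxy, by supplying the two headers it reads (see _get_instance_and_project_id below). A diagnostic sketch; the IP and network UUID are the ones from my test environment, substitute your own:

$ sudo curl -s --unix-socket /var/lib/neutron/metadata_proxy \
      -H 'X-Forwarded-For: 192.168.21.75' \
      -H 'X-OVN-Network-ID: 8ea7d54e-5fe0-408d-9de8-5ccbed8ede84' \
      http://169.254.169.254/openstack/latest/meta_data.json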

Analysis of the available data

Unfortunately, the customer hit the problem and restarted neutron-ovn-metadata-agent at 17:15 on 07-21; the sosreport they collected was taken after the workaround had already been run, so its value is limited.
They mentioned the namespace ovnmeta-63e2c276-60dd-40e3-baa1-c16342eacce2, so I searched for the following information, but because the sosreport was captured after the workaround nothing obviously wrong can be seen.

$ grep -r 'tap63e2c276' sos_commands/networking/namespaces/ovnmeta-63e2c276-60dd-40e3-baa1-c16342eacce2/ip_netns_exec_ovnmeta-63e2c276-60dd-40e3-baa1-c16342eacce2_ip_-d_address_show
2: tap63e2c276-61@if20: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    inet 100.94.98.4/23 brd 100.94.99.255 scope global tap63e2c276-61
    inet 169.254.169.254/32 brd 169.254.169.254 scope global tap63e2c276-61

However, searching by this IP (100.94.98.4) turns up nothing (find var/log/neutron/* |xargs zgrep -i -E '100.94.98.4,'), while there is a log entry for 100.94.98.40:

var/log/neutron/neutron-ovn-metadata-agent.log.1:2022-07-20 21:15:56.758 3872 ERROR neutron.agent.ovn.metadata.server [-] No port found in network 63e2c276-60dd-40e3-baa1-c16342eacce2 with IP address 100.94.98.40

This log comes from ./neutron/agent/ovn/metadata/server.py#_get_instance_and_project_id: the Port_Binding query against the SB DB returned 0 rows.

    def _get_instance_and_project_id(self, req):
        remote_address = req.headers.get('X-Forwarded-For')
        network_id = req.headers.get('X-OVN-Network-ID')

        ports = self.sb_idl.get_network_port_bindings_by_ip(network_id,
                                                            remote_address)
        num_ports = len(ports)
        if num_ports == 1:
            external_ids = ports[0].external_ids
            return (external_ids[ovn_const.OVN_DEVID_EXT_ID_KEY],
                    external_ids[ovn_const.OVN_PROJID_EXT_ID_KEY])
        elif num_ports == 0:
            LOG.error("No port found in network %s with IP address %s",
                      network_id, remote_address)

    def get_network_port_bindings_by_ip(self, network, ip_address):
        rows = self.db_list_rows('Port_Binding').execute(check_error=True)
        # TODO(twilson) It would be useful to have a db_find that takes a
        # comparison function

        def check_net_and_ip(port):
            # If the port is not bound to any chassis it is not relevant
            if not port.chassis:
                return False

            # TODO(dalvarez): Remove the comparison to port.datapath.uuid in Y
            # cycle when we are sure that all namespaces will be created with
            # the Neutron network UUID and not anymore with the OVN datapath
            # UUID.
            is_in_network = lambda port: (
                str(port.datapath.uuid) == network or
                utils.get_network_name_from_datapath(port.datapath) == network)

            return port.mac and is_in_network(port) and (
                    ip_address in port.mac[0].split(' '))

        return [r for r in rows if check_net_and_ip(r)]

However, searching for get_network_port_bindings_by_ip itself also finds nothing (find var/log/neutron/* |xargs zgrep -i -E 'get_network_port_bindings_by_ip'), so this does not look like a failure to connect to the SB DB (if the SB DB were unreachable, get_network_port_bindings_by_ip would raise and the traceback would contain that name). Most likely the Port_Binding result really was 0. In my test environment 'ovn-sbctl list Port_Binding' shows the record below (the metadata port in the test environment is 192.168.21.2):

_uuid               : a011fdcd-15ba-48c3-aec0-2ef590aee864
chassis             : []
datapath            : 0f509be3-9b1f-4c66-9c59-ac4211cf5ef4
encap               : []
external_ids        : {"neutron:cidrs"="192.168.21.2/24", "neutron:device_id"=ovnmeta-8ea7d54e-5fe0-408d-9de8-5ccbed8ede84, "neutron:device_owner"="network:distributed", "neutron:network_name"=neutron-8ea7d54e-5fe0-408d-9de8-5ccbed8ede84, "neutron:port_name"="", "neutron:project_id"="728f2ae8aa8443b5a255574a5e236746", "neutron:revision_number"="2", "neutron:security_group_ids"=""}
gateway_chassis     : []
ha_chassis_group    : []
logical_port        : "48244606-524d-4c24-bc54-0143869d4267"
mac                 : ["fa:16:3e:43:81:23 192.168.21.2"]
nat_addresses       : []
options             : {requested-chassis=""}
parent_port         : []
tag                 : []
tunnel_key          : 1
type                : localport
up                  : false
virtual_parent      : []
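To narrow the check to a single VM, the SB DB output can be filtered by the instance's address. A rough grep sketch (the mac column carries the "MAC IP" pair that check_net_and_ip() matches against, and the chassis column must be non-empty for the lookup to succeed):

$ sudo ovn-sbctl list Port_Binding | grep -B8 -A8 '192.168.21.75'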

Who creates the Port_Binding records?
Also, nova-compute contains many log entries like the following, which look as if it could not reach neutron-api (9696):

var/log/nova/nova-api-metadata.log:2022-07-21 13:41:24.086 1790270 ERROR nova.api.metadata.handler [req-42f38810-e2c3-4df0-8de6-754d1f6d5472 - - - - -] Failed to get metadata for instance id: 80f1bb26-f5db-4919-abb2-6b61d13f9267

var/log/nova/nova-api-metadata.log.20.gz:2022-07-01 20:21:46.371 3516 ERROR oslo.messaging._drivers.impl_rabbit [-] [e96b9581-fa77-4f0f-a580-05a13b18bfad] AMQP server on 192.168.0.124:5672 is unreachable: (0, 0): (320) CONNECTION_FORCED - broker forced connection closure with reason 'shutdown'. Trying again in 1 seconds.: amqp.exceptions.ConnectionForced: (0, 0): (320) CONNECTION_FORCED - broker forced connection closure with reason 'shutdown'
...
var/log/nova/nova-api-metadata.log.20.gz:2022-07-01 20:59:04.928 3514 ERROR nova.api.metadata.handler [req-c4d5a1b4-1c36-4864-b729-d68de13ef59d - - - - -] Failed to get metadata for instance id: cc1c7466-b5d4-47d0-8e23-e91c785fafb3: neutronclient.common.exceptions.NeutronClientException: <!DOCTYPE html PUBLIC "-//IETF//DTD HTML 2.0//EN">

var/log/nova/nova-api-metadata.log.20.gz:2022-07-01 20:59:04.849 3514 ERROR nova.api.metadata.handler neutronclient.common.exceptions.NeutronClientException: <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
var/log/nova/nova-api-metadata.log.20.gz:2022-07-01 20:59:04.849 3514 ERROR nova.api.metadata.handler <html><head>
var/log/nova/nova-api-metadata.log.20.gz:2022-07-01 20:59:04.849 3514 ERROR nova.api.metadata.handler <title>502 Proxy Error</title>
var/log/nova/nova-api-metadata.log.20.gz:2022-07-01 20:59:04.849 3514 ERROR nova.api.metadata.handler </head><body>
var/log/nova/nova-api-metadata.log.20.gz:2022-07-01 20:59:04.849 3514 ERROR nova.api.metadata.handler <h1>Proxy Error</h1>
var/log/nova/nova-api-metadata.log.20.gz:2022-07-01 20:59:04.849 3514 ERROR nova.api.metadata.handler <p>The proxy server received an invalid^M
var/log/nova/nova-api-metadata.log.20.gz:2022-07-01 20:59:04.849 3514 ERROR nova.api.metadata.handler response from an upstream server.<br />^M
var/log/nova/nova-api-metadata.log.20.gz:2022-07-01 20:59:04.849 3514 ERROR nova.api.metadata.handler The proxy server could not handle the request<p>Reason: <strong>Error reading from remote server</strong></p></p>
var/log/nova/nova-api-metadata.log.20.gz:2022-07-01 20:59:04.849 3514 ERROR nova.api.metadata.handler <hr>
var/log/nova/nova-api-metadata.log.20.gz:2022-07-01 20:59:04.849 3514 ERROR nova.api.metadata.handler <address>Apache/2.4.41 (Ubuntu) Server at neutron.internal.api.xxx.infra.xxx.net Port 9696</address>

There are also many entries like this one:

2022-07-20 23:59:18.828 1790159 INFO nova.compute.resource_tracker [req-25cb1333-d268-4870-877d-8465aea2b451 - - - - -] Instance 4d35b4ff-177c-4659-b6a4-d4d1d5ddecab has allocations against this compute host but is not found in the database.

This was also found:

2022-07-11 16:37:25.769 2361 ERROR ovsdbapp.backend.ovs_idl.command Traceback (most recent call last):
2022-07-11 16:37:25.769 2361 ERROR ovsdbapp.backend.ovs_idl.command   File "/usr/lib/python3/dist-packages/ovsdbapp/backend/ovs_idl/command.py", line 42, in execute
2022-07-11 16:37:25.769 2361 ERROR ovsdbapp.backend.ovs_idl.command     t.add(self)
2022-07-11 16:37:25.769 2361 ERROR ovsdbapp.backend.ovs_idl.command   File "/usr/lib/python3.8/contextlib.py", line 120, in __exit__
2022-07-11 16:37:25.769 2361 ERROR ovsdbapp.backend.ovs_idl.command     next(self.gen)
2022-07-11 16:37:25.769 2361 ERROR ovsdbapp.backend.ovs_idl.command   File "/usr/lib/python3/dist-packages/ovsdbapp/api.py", line 110, in transaction
2022-07-11 16:37:25.769 2361 ERROR ovsdbapp.backend.ovs_idl.command     del self._nested_txns_map[cur_thread_id]
2022-07-11 16:37:25.769 2361 ERROR ovsdbapp.backend.ovs_idl.command   File "/usr/lib/python3/dist-packages/ovsdbapp/api.py", line 61, in __exit__
2022-07-11 16:37:25.769 2361 ERROR ovsdbapp.backend.ovs_idl.command     self.result = self.commit()
2022-07-11 16:37:25.769 2361 ERROR ovsdbapp.backend.ovs_idl.command   File "/usr/lib/python3/dist-packages/ovsdbapp/backend/ovs_idl/transaction.py", line 57, in commit
2022-07-11 16:37:25.769 2361 ERROR ovsdbapp.backend.ovs_idl.command     raise exceptions.TimeoutException(
2022-07-11 16:37:25.769 2361 ERROR ovsdbapp.backend.ovs_idl.command ovsdbapp.exceptions.TimeoutException: Commands [DbSetCommand(table=Chassis_Private, record=srv1ibm003d.maas, col_values=(('external_ids', {'neutron:ovn-metadata-sb-cfg': '31205'}),))] exceeded timeout 180 seconds, cause: Result queue is empty
2022-07-11 16:37:25.769 2361 ERROR ovsdbapp.backend.ovs_idl.command
2022-07-11 16:37:25.780 2361 INFO ovsdbapp.backend.ovs_idl.vlog [-] ssl:100.94.0.158:6642: connected
2022-07-11 16:37:25.817 2361 INFO ovsdbapp.backend.ovs_idl.vlog [-] ssl:100.94.0.158:6642: clustered database server is not cluster leader; trying another server
2022-07-11 16:37:25.818 2361 INFO ovsdbapp.backend.ovs_idl.vlog [-] ssl:100.94.0.158:6642: connection closed by client
2022-07-11 16:37:33.831 2361 INFO ovsdbapp.backend.ovs_idl.vlog [-] ssl:100.94.0.204:6642: connected
2022-07-11 16:37:34.566 2361 INFO neutron.agent.ovn.metadata.agent [-] Connection to OVSDB established, doing a full sync
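When the logs show "not cluster leader" and reconnections like the above, it is worth confirming the health of the SB raft cluster on the ovn-central units. A quick check (the control socket path below is the usual one for the Ubuntu packaging; adjust if your deployment differs):

$ sudo ovn-appctl -t /var/run/ovn/ovnsb_db.ctl cluster/status OVN_Southbound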

So the metadata-agent timed out while connected to the SB DB, which is why the Port_Binding query returned 0. I found an lp bug, https://bugs.launchpad.net/neutron/+bug/1951841, which says an error is raised when the system-id is not a UUID, and we also see the following log:

2022-07-11 18:57:45.926 798838 WARNING neutron.agent.ovn.metadata.agent [-] Can't read ovn-bridge external-id from OVSDB. Using br-int instead.
2022-07-11 18:57:45.926 798838 INFO neutron.agent.ovn.metadata.agent [-] Loaded chassis name xxx.maas (UUID: 4fe633c2-0c52-5c86-92b2-0ee1f5e71a90) and ovn bridge br-int.
2022-07-11 18:57:45.927 798838 INFO oslo_service.service [-] Starting 2 workers

    def _get_ovn_bridge(self):
        """Return the external_ids:ovn-bridge value of the Open_vSwitch table.

        This is the OVS bridge used to plug the metadata ports to.
        If the key doesn't exist, this method will return 'br-int' as default.
        """
        ext_ids = self.ovs_idl.db_get(
            'Open_vSwitch', '.', 'external_ids').execute()
        try:
            return ext_ids['ovn-bridge']
        except KeyError:
            LOG.warning("Can't read ovn-bridge external-id from OVSDB. Using "
                        "br-int instead.")
            return 'br-int'
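The key that _get_ovn_bridge looks up can be inspected (and, if desired, set) on the compute node with ovs-vsctl:

$ sudo ovs-vsctl get Open_vSwitch . external_ids
$ sudo ovs-vsctl get Open_vSwitch . external_ids:ovn-bridge    # errors if the key is not set
$ sudo ovs-vsctl set Open_vSwitch . external_ids:ovn-bridge=br-int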

But is the failure to read external_ids really caused by the Port_Binding not existing?

Ruling out lp bug 1951841

https://bugs.launchpad.net/neutron/+bug/1951841
Besides adding a Chassis record for each compute node (ovn-sbctl list Chassis), OVN also needs a Chassis_Private record for the metadata-agent (the agent-id of the Chassis_Private record is derived from the Chassis system-id). The code below assumes self.chassis_id is a UUID, but a charm deployment does not necessarily use a UUID, and this produces the situation where the neutron-ovn-metadata-agent daemon stays running but disconnected, and newly launched VMs won't have access to the metadata service.

    def register_metadata_agent(self):
        # NOTE(lucasagomes): db_add() will not overwrite the UUID if
        # it's already set.
        table = ('Chassis_Private' if self.has_chassis_private else 'Chassis')
        # Generate unique, but consistent metadata id for chassis name
        agent_id = uuid.uuid5(self.chassis_id, 'metadata_agent')
        ext_ids = {ovn_const.OVN_AGENT_METADATA_ID_KEY: str(agent_id)}
        self.sb_idl.db_add(table, self.chassis, 'external_ids',
                           ext_ids).execute(check_error=True)

# ovn-sbctl list Chassis_Private
_uuid               : 45524f21-0ca2-4736-a851-e5db69dad805
chassis             : 4e2fabf4-3834-4137-8454-3525e541d652
external_ids        : {"neutron:ovn-metadata-id"="26e2474c-75fc-5977-a311-828b9ad92180", "neutron:ovn-metadata-sb-cfg"="5"}
name                : juju-3df241-ovn-8.cloud.sts
nb_cfg              : 5
nb_cfg_timestamp    : 1661337217145
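The system-id that the agent-id derivation above is based on comes from the local Open_vSwitch table and can be checked with:

$ sudo ovs-vsctl get Open_vSwitch . external_ids:system-id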

I ran the following experiment but did not reproduce the problem; perhaps a single attempt is not enough to trigger it reliably?

$ juju ssh nova-compute/0 sudo ip netns | grep ovnmeta
ovnmeta-0211506b-233e-4773-a034-3950dfefe23d
ovnmeta-8ea7d54e-5fe0-408d-9de8-5ccbed8ede84 (id: 0)

$ openstack server delete focal-103100
$ juju ssh nova-compute/0 sudo ip netns | grep ovnmeta
<empty>

$ juju status ovn-central | grep leader
ovn-central  21.09.0  active      3  ovn-central  21.09/stable   24  no       Unit is ready (leader: ovnnb_db)
ovn-central/0   active    idle   9        10.5.1.22       6641/tcp,6642/tcp  Unit is ready (leader: ovnnb_db)
ovn-central/2*  active    idle   11       10.5.3.101      6641/tcp,6642/tcp  Unit is ready (leader: ovnsb_db)
$ juju ssh $(juju status ovn-central | grep leader | tail -n 1 | awk '{print $1}' | tr -d '*') sudo reboot
#wait a few minutes
$ juju status ovn-central | grep leader
ovn-central  21.09.0  active      3  ovn-central  21.09/stable   24  no       Unit is ready (leader: ovnnb_db)
ovn-central/0   active    idle   9        10.5.1.22       6641/tcp,6642/tcp  Unit is ready (leader: ovnnb_db)
ovn-central/1   active    idle   10       10.5.0.186      6641/tcp,6642/tcp  Unit is ready (leader: ovnsb_db northd: active)

$ ./tools/instance_launch.sh 1 focal
$ ./tools/float_all.sh
$ juju ssh nova-compute/0 sudo ip netns | grep ovnmeta
ovnmeta-8ea7d54e-5fe0-408d-9de8-5ccbed8ede84 (id: 0)

After the Port_Binding table in OVN is updated, the PortBindingUpdateVirtualPortsEvent event is triggered in neutron, and neutron's update_virtual_port_host then creates the PortBinding record on the neutron side.

#https://bugs.launchpad.net/neutron/+bug/1951841/comments/28
Previously neutron-ovn-metadata-agent used a random agent-id on every start to create the Chassis_Private record; this patch (https://github.com/openstack/neutron/commit/da3ce7319866e8dc874d405e91f9af160e2c3d31) made it deterministic, using chassis_id = uuid.UUID(self._get_own_chassis_name()) and agent_id = uuid.uuid5(chassis_id, 'metadata_agent'), which requires _get_own_chassis_name to return a UUID; otherwise an exception is thrown.
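A short plain-Python sketch (chassis names taken from the outputs above) shows why this derivation throws for a non-UUID system-id and works for a UUID-formatted one:

import uuid

chassis_name = 'juju-3df241-ovn-8.cloud.sts'        # non-UUID system-id, as in the test env above
try:
    chassis_id = uuid.UUID(chassis_name)            # what the patched code does first
except ValueError:
    print('non-UUID chassis name -> uuid.UUID() raises ValueError')

# With a UUID-formatted system-id the derivation succeeds:
chassis_id = uuid.UUID('4fe633c2-0c52-5c86-92b2-0ee1f5e71a90')
print(uuid.uuid5(chassis_id, 'metadata_agent'))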
Why, after the exception is thrown, does the metadata-agent not reconnect to the SB DB and register the Chassis_Private record again? Because on the first invocation of ChassisCreateEventBase#run, first_time is True and self.agent.register_metadata_agent is not called:

    def run(self, event, row, old):
        if self.first_time:
            self.first_time = False
        else:
            # NOTE(lucasagomes): Re-register the ovn metadata agent
            # with the local chassis in case its entry was re-created
            # (happens when restarting the ovn-controller)
            self.agent.register_metadata_agent()
            LOG.info("Connection to OVSDB established, doing a full sync")
            self.agent.sync()

The fix is to allow a non-UUID system-id: https://opendev.org/openstack/neutron/commit/6da4432fed255f3bcf3831f5d0520ab389ce36e5

But that commit is already in 19.1.0, and the customer is running 19.3.0, so this is clearly not that bug.
