06 基于 docker 网关服务 启动耗时过长问题排查

Posted 蓝风9

tags:

篇首语:本文由小常识网(cha138.com)小编为大家整理,主要介绍了06 基于 docker 网关服务 启动耗时过长问题排查相关的知识,希望对你有一定的参考价值。

前言

今天 重启网关服务的时候, 发现了这样的一个问题 一个单纯的配置了一些路由策略的网关重新启动需要 启动 288s ?!! 

呵呵 这个显然是存在问题 

这里大致 inspect 一下这个流程 

耗时的服务

耗时的服务主要是来自于调用 InetAddress.getLocalHost

调用 InetAddress.getLocalHost 的相关服务大概有如下列表 

jmx
	ConnectorBootstrap.startLocalConnectorServer()
spring-boot
	ApplicationPid.getPid()
mongo
	new ObjectId()
mybatis-plus
	Sequence.getDatacenterId()
	Sequence.getMaxWorkerId()
netty
	DefaultChannelId.newInstance()
nacos 
	NetUtils.localIP()
// 其他, 这里仅仅是抽样的一部分

我们这个场景里面, 主要耗时的原因在于 获取到的 hostname 做了 dns 查询之后查询不到结果, 然后 有一些超时处理 

根据 hostname 获取地址信息在 java language 层面 对应的是 impl.lookupAllHostAddr(host) 

在 c/c++ 层面调用的库函数是 getaddrinfo 

#0  0x00007fafca44a84d in poll () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007fafa89d1155 in ?? () from /lib/x86_64-linux-gnu/libresolv.so.2
#2  0x00007fafa89ced62 in __libc_res_nquery () from /lib/x86_64-linux-gnu/libresolv.so.2
#3  0x00007fafa89cfc29 in __libc_res_nsearch () from /lib/x86_64-linux-gnu/libresolv.so.2
#4  0x00007fafa8be3b89 in _nss_dns_gethostbyname4_r () from /lib/x86_64-linux-gnu/libnss_dns.so.2
#5  0x00007fafca43ad58 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#6  0x00007fafca43de5e in getaddrinfo () from /lib/x86_64-linux-gnu/libc.so.6
#7  0x00007fafabdef6de in Java_java_net_Inet6AddressImpl_lookupAllHostAddr () from /usr/local/ProgramFiles/jdk1.8.0_291/jre/lib/amd64/libnet.so
#8  0x00007fafb47cdb08 in ?? ()
#9  0x00007fafc58f7800 in ?? ()
#10 0x0000000200000000 in ?? ()
#11 0x00007faf618d93c0 in ?? ()
#12 0x0000000000000000 in ?? ()

测试 InetAddress.getLocalHost 的超时用例

/**
 * Test27INetAddressGetLocalHost
 *
 * @author Jerry.X.He
 * @version 1.0
 * @date 2022/3/23 10:15
 */
public class Test27INetAddressGetLocalHost 

    // Test27INetAddressGetLocalHost
    public static void main(String[] args) throws Exception 

//        InetAddress addr = InetAddress.getLocalHost();
//        System.out.println(addr.getHostName());

        long start = System.currentTimeMillis();
        try 
            InetAddress[] addressList = InetAddress.getAllByName("xxx");
         catch (Exception e) 

        
        long spent = System.currentTimeMillis() - start;
        System.out.println(" query name service timeout : " + spent + " ms");

    


在不同的环境超时有一些差异 

-- Linux ubuntu 4.4.0-186-generic #216-Ubuntu SMP Wed Jul 1 05:34:05 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
root@ubuntu:~/HelloWorld# java Test27INetAddressGetLocalHost
 query name service timeout : 20031 ms
root@ubuntu:~/HelloWorld# java Test27INetAddressGetLocalHost
 query name service timeout : 20027 ms
root@ubuntu:~/HelloWorld# java Test27INetAddressGetLocalHost
 query name service timeout : 20027 ms


-- openjdk:8-jdk-alpine 的基础镜像
/usr/local # java Test27INetAddressGetLocalHost
 query name service timeout : 5014 ms
/usr/local # java Test27INetAddressGetLocalHost
 query name service timeout : 5013 ms
/usr/local # java Test27INetAddressGetLocalHost
 query name service timeout : 5019 ms


-- windows10 机器
Jerry.X.He@DESKTOP-SphpNQ2 MINGW64 /d/IdeaWorkStations/HelloWorld02/target/classes (master)
$ java com.hx.test01.Test27INetAddressGetLocalHost
 query name service timeout : 6778 ms
Jerry.X.He@DESKTOP-SPHPNQ2 MINGW64 /d/IdeaWorkStations/HelloWorld02/target/classes (master)
$ java com.hx.test01.Test27INetAddressGetLocalHost
 query name service timeout : 6783 ms
Jerry.X.He@DESKTOP-SPHPNQ2 MINGW64 /d/IdeaWorkStations/HelloWorld02/target/classes (master)
$ java com.hx.test01.Test27INetAddressGetLocalHost
 query name service timeout : 6776 ms

结合实际场景的网关服务

复现问题的时间开销 

Started Gateway in 428.506 seconds (JVM running for 429.258)
Started Gateway in 428.469 seconds (JVM running for 429.224)
Started Gateway in 429.492 seconds (JVM running for 432.747)
Started Gateway in 435.327 seconds (JVM running for 438.343)

调整之后的时间开销 

Started Gateway in 13.036 seconds (JVM running for 16.186)

大致剖析一下时间开销的差异 

我这里的机器是 Linux ubuntu 4.4.0-186-generic #216-Ubuntu SMP Wed Jul 1 05:34:05 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux 

超时时间为 20s, 下面 INetAddress - getLocalhost 合计是出现了 23 次, 合计 460s 

但是实际 服务启动时间大致是 430s 左右, 可能是存在一部分 异步的请求导致存在一些相交的时间段[仅仅是猜测] 

但是 大体的服务启动的过程中核心耗时问题已经清晰了 

INetAddress - getLocalhost
2022-03-14 18:18:31.452  INFO 33752 --- [           main] trationDelegate$BeanPostProcessorChecker : Bean 'org.springframework.cloud.autoconfigure.ConfigurationPropertiesRebinderAutoConfiguration' of type [org.springframework.cloud.autoconfigure.ConfigurationPropertiesRebinderAutoConfiguration$$EnhancerBySpringCGLIB$$4f404519] is not eligible for getting processed by all BeanPostProcessors (for example: not eligible for auto-proxying)
INetAddress - getLocalhost
INetAddress - getLocalhost
INetAddress - getLocalhost
-2022-03-14 18:19:35.013  WARN 33752 --- [           main] [TID: N/A] o.s.boot.actuate.endpoint.EndpointId     : Endpoint ID 'nacos-config' contains invalid characters, please migrate to a valid format.
-2022-03-14 18:19:35.016  WARN 33752 --- [           main] [TID: N/A] o.s.boot.actuate.endpoint.EndpointId     : Endpoint ID 'nacos-discovery' contains invalid characters, please migrate to a valid format.
-2022-03-14 18:19:35.072  WARN 33752 --- [           main] [TID: N/A] o.s.boot.actuate.endpoint.EndpointId     : Endpoint ID 'service-registry' contains invalid characters, please migrate to a valid format.
-2022-03-14 18:19:35.428  INFO 33752 --- [           main] [TID: N/A] o.s.cloud.context.scope.GenericScope     : BeanFactory id=d922db8a-512c-321a-b5f2-3a9cff497728
-2022-03-14 18:19:35.520  INFO 33752 --- [           main] [TID: N/A] trationDelegate$BeanPostProcessorChecker : Bean 'org.springframework.cloud.autoconfigure.ConfigurationPropertiesRebinderAutoConfiguration' of type [org.springframework.cloud.autoconfigure.ConfigurationPropertiesRebinderAutoConfiguration$$EnhancerBySpringCGLIB$$4f404519] is not eligible for getting processed by all BeanPostProcessors (for example: not eligible for auto-proxying)
-2022-03-14 18:19:37.898  INFO 33752 --- [           main] [TID: N/A] o.s.cloud.commons.util.InetUtils         : Cannot determine local hostname
-2022-03-14 18:19:37.987  INFO 33752 --- [           main] [TID: N/A] o.s.c.g.r.RouteDefinitionRouteLocator    : Loaded RoutePredicateFactory [After]
-2022-03-14 18:19:37.988  INFO 33752 --- [           main] [TID: N/A] o.s.c.g.r.RouteDefinitionRouteLocator    : Loaded RoutePredicateFactory [Before]
-2022-03-14 18:19:37.989  INFO 33752 --- [           main] [TID: N/A] o.s.c.g.r.RouteDefinitionRouteLocator    : Loaded RoutePredicateFactory [Between]
-2022-03-14 18:19:37.991  INFO 33752 --- [           main] [TID: N/A] o.s.c.g.r.RouteDefinitionRouteLocator    : Loaded RoutePredicateFactory [Cookie]
-2022-03-14 18:19:37.992  INFO 33752 --- [           main] [TID: N/A] o.s.c.g.r.RouteDefinitionRouteLocator    : Loaded RoutePredicateFactory [Header]
-2022-03-14 18:19:37.992  INFO 33752 --- [           main] [TID: N/A] o.s.c.g.r.RouteDefinitionRouteLocator    : Loaded RoutePredicateFactory [Host]
-2022-03-14 18:19:37.994  INFO 33752 --- [           main] [TID: N/A] o.s.c.g.r.RouteDefinitionRouteLocator    : Loaded RoutePredicateFactory [Method]
-2022-03-14 18:19:37.994  INFO 33752 --- [           main] [TID: N/A] o.s.c.g.r.RouteDefinitionRouteLocator    : Loaded RoutePredicateFactory [Path]
-2022-03-14 18:19:37.994  INFO 33752 --- [           main] [TID: N/A] o.s.c.g.r.RouteDefinitionRouteLocator    : Loaded RoutePredicateFactory [Query]
-2022-03-14 18:19:37.994  INFO 33752 --- [           main] [TID: N/A] o.s.c.g.r.RouteDefinitionRouteLocator    : Loaded RoutePredicateFactory [ReadBodyPredicateFactory]
-2022-03-14 18:19:37.995  INFO 33752 --- [           main] [TID: N/A] o.s.c.g.r.RouteDefinitionRouteLocator    : Loaded RoutePredicateFactory [RemoteAddr]
-2022-03-14 18:19:37.995  INFO 33752 --- [           main] [TID: N/A] o.s.c.g.r.RouteDefinitionRouteLocator    : Loaded RoutePredicateFactory [Weight]
-2022-03-14 18:19:37.995  INFO 33752 --- [           main] [TID: N/A] o.s.c.g.r.RouteDefinitionRouteLocator    : Loaded RoutePredicateFactory [CloudFoundryRouteService]
INetAddress - getLocalhost
INetAddress - getLocalhost
INetAddress - getLocalhost
INetAddress - getLocalhost
INetAddress - getLocalhost
INetAddress - getLocalhost
INetAddress - getLocalhost
INetAddress - getLocalhost
INetAddress - getLocalhost
INetAddress - getLocalhost
INetAddress - getLocalhost
INetAddress - getLocalhost
INetAddress - getLocalhost
INetAddress - getLocalhost
INetAddress - getLocalhost
-2022-03-14 18:23:58.745  WARN 33752 --- [           main] [TID: N/A] c.n.c.sources.URLConfigurationSource     : No URLs will be polled as dynamic configuration sources.
-2022-03-14 18:23:58.745  INFO 33752 --- [           main] [TID: N/A] c.n.c.sources.URLConfigurationSource     : To enable URLs as dynamic configuration sources, define System property archaius.configurationSource.additionalUrls or make config.properties available on classpath.
-2022-03-14 18:23:58.754  WARN 33752 --- [           main] [TID: N/A] c.n.c.sources.URLConfigurationSource     : No URLs will be polled as dynamic configuration sources.
-2022-03-14 18:23:58.755  INFO 33752 --- [           main] [TID: N/A] c.n.c.sources.URLConfigurationSource     : To enable URLs as dynamic configuration sources, define System property archaius.configurationSource.additionalUrls or make config.properties available on classpath.
-2022-03-14 18:23:59.922  INFO 33752 --- [           main] [TID: N/A] o.s.b.a.e.web.EndpointLinksResolver      : Exposing 22 endpoint(s) beneath base path '/actuator'
INetAddress - getLocalhost
INetAddress - getLocalhost
INetAddress - getLocalhost
INetAddress - getLocalhost
-2022-03-14 18:25:18.817  INFO 33752 --- [           main] [TID: N/A] o.s.b.web.embedded.netty.NettyWebServer  : Netty started on port(s): 7790

处理问题的一些方式 

  1. 去掉 docker-compose 中的 hostname 的配置, 继承宿主机的 hostname, 对应于 hosts 中有映射配置 
  2. hosts 中增加配置的 hostname 的映射 
  3. dns 服务器中增加配置的 hostname 的映射 

以上是关于06 基于 docker 网关服务 启动耗时过长问题排查的主要内容,如果未能解决你的问题,请参考以下文章

jboss 服务器启动耗时过长

一类 SVM 算法耗时过长

QTcpSocket.connectToHost() 耗时过长

SQL 查询耗时过长

BAT等大厂必问技术面试题

gc 过长问题排查