06 基于 docker 网关服务 启动耗时过长问题排查
Posted 蓝风9
tags:
篇首语:本文由小常识网(cha138.com)小编为大家整理,主要介绍了06 基于 docker 网关服务 启动耗时过长问题排查相关的知识,希望对你有一定的参考价值。
前言
今天 重启网关服务的时候, 发现了这样的一个问题 一个单纯的配置了一些路由策略的网关重新启动需要 启动 288s ?!!
呵呵 这个显然是存在问题
这里大致 inspect 一下这个流程
耗时的服务
耗时的服务主要是来自于调用 InetAddress.getLocalHost
调用 InetAddress.getLocalHost 的相关服务大概有如下列表
jmx
ConnectorBootstrap.startLocalConnectorServer()
spring-boot
ApplicationPid.getPid()
mongo
new ObjectId()
mybatis-plus
Sequence.getDatacenterId()
Sequence.getMaxWorkerId()
netty
DefaultChannelId.newInstance()
nacos
NetUtils.localIP()
// 其他, 这里仅仅是抽样的一部分
我们这个场景里面, 主要耗时的原因在于 获取到的 hostname 做了 dns 查询之后查询不到结果, 然后 有一些超时处理
根据 hostname 获取地址信息在 java language 层面 对应的是 impl.lookupAllHostAddr(host)
在 c/c++ 层面调用的库函数是 getaddrinfo
#0 0x00007fafca44a84d in poll () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x00007fafa89d1155 in ?? () from /lib/x86_64-linux-gnu/libresolv.so.2
#2 0x00007fafa89ced62 in __libc_res_nquery () from /lib/x86_64-linux-gnu/libresolv.so.2
#3 0x00007fafa89cfc29 in __libc_res_nsearch () from /lib/x86_64-linux-gnu/libresolv.so.2
#4 0x00007fafa8be3b89 in _nss_dns_gethostbyname4_r () from /lib/x86_64-linux-gnu/libnss_dns.so.2
#5 0x00007fafca43ad58 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#6 0x00007fafca43de5e in getaddrinfo () from /lib/x86_64-linux-gnu/libc.so.6
#7 0x00007fafabdef6de in Java_java_net_Inet6AddressImpl_lookupAllHostAddr () from /usr/local/ProgramFiles/jdk1.8.0_291/jre/lib/amd64/libnet.so
#8 0x00007fafb47cdb08 in ?? ()
#9 0x00007fafc58f7800 in ?? ()
#10 0x0000000200000000 in ?? ()
#11 0x00007faf618d93c0 in ?? ()
#12 0x0000000000000000 in ?? ()
测试 InetAddress.getLocalHost 的超时用例
/**
* Test27INetAddressGetLocalHost
*
* @author Jerry.X.He
* @version 1.0
* @date 2022/3/23 10:15
*/
public class Test27INetAddressGetLocalHost
// Test27INetAddressGetLocalHost
public static void main(String[] args) throws Exception
// InetAddress addr = InetAddress.getLocalHost();
// System.out.println(addr.getHostName());
long start = System.currentTimeMillis();
try
InetAddress[] addressList = InetAddress.getAllByName("xxx");
catch (Exception e)
long spent = System.currentTimeMillis() - start;
System.out.println(" query name service timeout : " + spent + " ms");
在不同的环境超时有一些差异
-- Linux ubuntu 4.4.0-186-generic #216-Ubuntu SMP Wed Jul 1 05:34:05 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
root@ubuntu:~/HelloWorld# java Test27INetAddressGetLocalHost
query name service timeout : 20031 ms
root@ubuntu:~/HelloWorld# java Test27INetAddressGetLocalHost
query name service timeout : 20027 ms
root@ubuntu:~/HelloWorld# java Test27INetAddressGetLocalHost
query name service timeout : 20027 ms
-- openjdk:8-jdk-alpine 的基础镜像
/usr/local # java Test27INetAddressGetLocalHost
query name service timeout : 5014 ms
/usr/local # java Test27INetAddressGetLocalHost
query name service timeout : 5013 ms
/usr/local # java Test27INetAddressGetLocalHost
query name service timeout : 5019 ms
-- windows10 机器
Jerry.X.He@DESKTOP-SphpNQ2 MINGW64 /d/IdeaWorkStations/HelloWorld02/target/classes (master)
$ java com.hx.test01.Test27INetAddressGetLocalHost
query name service timeout : 6778 ms
Jerry.X.He@DESKTOP-SPHPNQ2 MINGW64 /d/IdeaWorkStations/HelloWorld02/target/classes (master)
$ java com.hx.test01.Test27INetAddressGetLocalHost
query name service timeout : 6783 ms
Jerry.X.He@DESKTOP-SPHPNQ2 MINGW64 /d/IdeaWorkStations/HelloWorld02/target/classes (master)
$ java com.hx.test01.Test27INetAddressGetLocalHost
query name service timeout : 6776 ms
结合实际场景的网关服务
复现问题的时间开销
Started Gateway in 428.506 seconds (JVM running for 429.258)
Started Gateway in 428.469 seconds (JVM running for 429.224)
Started Gateway in 429.492 seconds (JVM running for 432.747)
Started Gateway in 435.327 seconds (JVM running for 438.343)
调整之后的时间开销
Started Gateway in 13.036 seconds (JVM running for 16.186)
大致剖析一下时间开销的差异
我这里的机器是 Linux ubuntu 4.4.0-186-generic #216-Ubuntu SMP Wed Jul 1 05:34:05 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
超时时间为 20s, 下面 INetAddress - getLocalhost 合计是出现了 23 次, 合计 460s
但是实际 服务启动时间大致是 430s 左右, 可能是存在一部分 异步的请求导致存在一些相交的时间段[仅仅是猜测]
但是 大体的服务启动的过程中核心耗时问题已经清晰了
INetAddress - getLocalhost
2022-03-14 18:18:31.452 INFO 33752 --- [ main] trationDelegate$BeanPostProcessorChecker : Bean 'org.springframework.cloud.autoconfigure.ConfigurationPropertiesRebinderAutoConfiguration' of type [org.springframework.cloud.autoconfigure.ConfigurationPropertiesRebinderAutoConfiguration$$EnhancerBySpringCGLIB$$4f404519] is not eligible for getting processed by all BeanPostProcessors (for example: not eligible for auto-proxying)
INetAddress - getLocalhost
INetAddress - getLocalhost
INetAddress - getLocalhost
-2022-03-14 18:19:35.013 WARN 33752 --- [ main] [TID: N/A] o.s.boot.actuate.endpoint.EndpointId : Endpoint ID 'nacos-config' contains invalid characters, please migrate to a valid format.
-2022-03-14 18:19:35.016 WARN 33752 --- [ main] [TID: N/A] o.s.boot.actuate.endpoint.EndpointId : Endpoint ID 'nacos-discovery' contains invalid characters, please migrate to a valid format.
-2022-03-14 18:19:35.072 WARN 33752 --- [ main] [TID: N/A] o.s.boot.actuate.endpoint.EndpointId : Endpoint ID 'service-registry' contains invalid characters, please migrate to a valid format.
-2022-03-14 18:19:35.428 INFO 33752 --- [ main] [TID: N/A] o.s.cloud.context.scope.GenericScope : BeanFactory id=d922db8a-512c-321a-b5f2-3a9cff497728
-2022-03-14 18:19:35.520 INFO 33752 --- [ main] [TID: N/A] trationDelegate$BeanPostProcessorChecker : Bean 'org.springframework.cloud.autoconfigure.ConfigurationPropertiesRebinderAutoConfiguration' of type [org.springframework.cloud.autoconfigure.ConfigurationPropertiesRebinderAutoConfiguration$$EnhancerBySpringCGLIB$$4f404519] is not eligible for getting processed by all BeanPostProcessors (for example: not eligible for auto-proxying)
-2022-03-14 18:19:37.898 INFO 33752 --- [ main] [TID: N/A] o.s.cloud.commons.util.InetUtils : Cannot determine local hostname
-2022-03-14 18:19:37.987 INFO 33752 --- [ main] [TID: N/A] o.s.c.g.r.RouteDefinitionRouteLocator : Loaded RoutePredicateFactory [After]
-2022-03-14 18:19:37.988 INFO 33752 --- [ main] [TID: N/A] o.s.c.g.r.RouteDefinitionRouteLocator : Loaded RoutePredicateFactory [Before]
-2022-03-14 18:19:37.989 INFO 33752 --- [ main] [TID: N/A] o.s.c.g.r.RouteDefinitionRouteLocator : Loaded RoutePredicateFactory [Between]
-2022-03-14 18:19:37.991 INFO 33752 --- [ main] [TID: N/A] o.s.c.g.r.RouteDefinitionRouteLocator : Loaded RoutePredicateFactory [Cookie]
-2022-03-14 18:19:37.992 INFO 33752 --- [ main] [TID: N/A] o.s.c.g.r.RouteDefinitionRouteLocator : Loaded RoutePredicateFactory [Header]
-2022-03-14 18:19:37.992 INFO 33752 --- [ main] [TID: N/A] o.s.c.g.r.RouteDefinitionRouteLocator : Loaded RoutePredicateFactory [Host]
-2022-03-14 18:19:37.994 INFO 33752 --- [ main] [TID: N/A] o.s.c.g.r.RouteDefinitionRouteLocator : Loaded RoutePredicateFactory [Method]
-2022-03-14 18:19:37.994 INFO 33752 --- [ main] [TID: N/A] o.s.c.g.r.RouteDefinitionRouteLocator : Loaded RoutePredicateFactory [Path]
-2022-03-14 18:19:37.994 INFO 33752 --- [ main] [TID: N/A] o.s.c.g.r.RouteDefinitionRouteLocator : Loaded RoutePredicateFactory [Query]
-2022-03-14 18:19:37.994 INFO 33752 --- [ main] [TID: N/A] o.s.c.g.r.RouteDefinitionRouteLocator : Loaded RoutePredicateFactory [ReadBodyPredicateFactory]
-2022-03-14 18:19:37.995 INFO 33752 --- [ main] [TID: N/A] o.s.c.g.r.RouteDefinitionRouteLocator : Loaded RoutePredicateFactory [RemoteAddr]
-2022-03-14 18:19:37.995 INFO 33752 --- [ main] [TID: N/A] o.s.c.g.r.RouteDefinitionRouteLocator : Loaded RoutePredicateFactory [Weight]
-2022-03-14 18:19:37.995 INFO 33752 --- [ main] [TID: N/A] o.s.c.g.r.RouteDefinitionRouteLocator : Loaded RoutePredicateFactory [CloudFoundryRouteService]
INetAddress - getLocalhost
INetAddress - getLocalhost
INetAddress - getLocalhost
INetAddress - getLocalhost
INetAddress - getLocalhost
INetAddress - getLocalhost
INetAddress - getLocalhost
INetAddress - getLocalhost
INetAddress - getLocalhost
INetAddress - getLocalhost
INetAddress - getLocalhost
INetAddress - getLocalhost
INetAddress - getLocalhost
INetAddress - getLocalhost
INetAddress - getLocalhost
-2022-03-14 18:23:58.745 WARN 33752 --- [ main] [TID: N/A] c.n.c.sources.URLConfigurationSource : No URLs will be polled as dynamic configuration sources.
-2022-03-14 18:23:58.745 INFO 33752 --- [ main] [TID: N/A] c.n.c.sources.URLConfigurationSource : To enable URLs as dynamic configuration sources, define System property archaius.configurationSource.additionalUrls or make config.properties available on classpath.
-2022-03-14 18:23:58.754 WARN 33752 --- [ main] [TID: N/A] c.n.c.sources.URLConfigurationSource : No URLs will be polled as dynamic configuration sources.
-2022-03-14 18:23:58.755 INFO 33752 --- [ main] [TID: N/A] c.n.c.sources.URLConfigurationSource : To enable URLs as dynamic configuration sources, define System property archaius.configurationSource.additionalUrls or make config.properties available on classpath.
-2022-03-14 18:23:59.922 INFO 33752 --- [ main] [TID: N/A] o.s.b.a.e.web.EndpointLinksResolver : Exposing 22 endpoint(s) beneath base path '/actuator'
INetAddress - getLocalhost
INetAddress - getLocalhost
INetAddress - getLocalhost
INetAddress - getLocalhost
-2022-03-14 18:25:18.817 INFO 33752 --- [ main] [TID: N/A] o.s.b.web.embedded.netty.NettyWebServer : Netty started on port(s): 7790
处理问题的一些方式
- 去掉 docker-compose 中的 hostname 的配置, 继承宿主机的 hostname, 对应于 hosts 中有映射配置
- hosts 中增加配置的 hostname 的映射
- dns 服务器中增加配置的 hostname 的映射
完
以上是关于06 基于 docker 网关服务 启动耗时过长问题排查的主要内容,如果未能解决你的问题,请参考以下文章