ZuulProxy fails with "RibbonCommand timed-out and no fallback available" when it should do failover

Posted: 2015-11-13 19:48:50

Question:

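Short description: I am trying to use ZuulProxy to handle instance failover, but instead of responding with the result from the working instance it throws a ZuulException: Forwarding error.

Detailed description: My setup is a standalone Eureka Server, a ConfigServer, a ZuulProxy (@EnableZuulProxy) and two service instances, all of them registered in Eureka.

Everything runs with spring-cloud-starter-parent Angel.SR3.

My service discovery: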

@SpringBootApplication
@EnableEurekaServer
public class EurekaServer {
    public static void main(String[] args) {
        SpringApplication.run(EurekaServer.class, args);
    }
}

My config server:

@SpringBootApplication
@EnableAutoConfiguration
@EnableConfigServer
@ComponentScan
@EnableDiscoveryClient
public class ConfigserverApplication {

  public static void main(String[] args) {
    SpringApplication.run(ConfigserverApplication.class, args);
  }
}

My ZuulProxy:

@SpringBootApplication
@EnableAutoConfiguration
@ComponentScan
@EnableDiscoveryClient
@EnableZuulProxy
public class ZuulProxy {
  public static void main(String[] args) {
    SpringApplication.run(ZuulProxy.class, args);
  }
}

Routing rules in Zuul:

zuul.ignoredServices=*
zuul.routes.examplems=/example/**

My service instance:

@SpringBootApplication
@Configuration
@EnableAutoConfiguration
@EnableDiscoveryClient
@ComponentScan(basePackages = "se.example.microservices")
@EnableSwagger
public class Application {

  public static void main(String[] args) throws Exception {
    SpringApplication.run(Application.class, args);
  }
}

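My service instances register themselves with spring.application.name=examplems

When I start both service instances and make requests through the ZuulProxy, everything works fine and the requests are round-robined across my two service instances. But when I stop one of the instances, Zuul still tries several times to forward the request to the stopped instance, and then it fails with: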

com.netflix.zuul.exception.ZuulException: Forwarding error
Caused by: com.netflix.hystrix.exception.HystrixRuntimeException: examplemsRibbonCommand timed-out and no fallback available.

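I would expect the request to the stopped instance to time out and fail over transparently to the running instance. The really strange thing is that Zuul (according to the logs) seems to first try the stopped instance several times (which of course fails), then forwards the request to the working instance and gets a good answer, but instead of forwarding the OK answer to the client it throws an exception, so the response fails with status 500!?

Please see the logs (my working instance is on host PMD11286 and my stopped instance is on PMD11933):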

2015-08-20 08:45:46.343  INFO 7700 --- [nio-9050-exec-9] o.s.c.n.zuul.filters.ProxyRouteLocator   : Finding route for path: /example/ping/delay
2015-08-20 08:45:46.343 DEBUG 7700 --- [nio-9050-exec-9] o.a.h.impl.conn.DefaultClientConnection  : Connection 0.0.0.0:50251<->172.20.120.39:9060 closed
2015-08-20 08:45:46.343 DEBUG 7700 --- [nio-9050-exec-9] .a.h.i.c.DefaultClientConnectionOperator : Connecting to PMD11933:9060
2015-08-20 08:45:47.372 DEBUG 7700 --- [nio-9050-exec-9] .a.h.i.c.DefaultClientConnectionOperator : Connect to PMD11933:9060 timed out. Connection will be retried using another IP address
2015-08-20 08:45:47.372 DEBUG 7700 --- [nio-9050-exec-9] .a.h.i.c.DefaultClientConnectionOperator : Connecting to PMD11933:9060
2015-08-20 08:45:48.386 DEBUG 7700 --- [nio-9050-exec-9] .a.h.i.c.DefaultClientConnectionOperator : Connect to PMD11933:9060 timed out. Connection will be retried using another IP address
2015-08-20 08:45:48.386 DEBUG 7700 --- [nio-9050-exec-9] .a.h.i.c.DefaultClientConnectionOperator : Connecting to PMD11933:9060
2015-08-20 08:45:49.416 DEBUG 7700 --- [nio-9050-exec-9] .a.h.i.c.DefaultClientConnectionOperator : Connect to PMD11933:9060 timed out. Connection will be retried using another IP address
2015-08-20 08:45:49.416 DEBUG 7700 --- [nio-9050-exec-9] .a.h.i.c.DefaultClientConnectionOperator : Connecting to PMD11933:9060
2015-08-20 08:45:50.430 DEBUG 7700 --- [N_MANAGER_TIMER] o.a.h.i.c.t.ThreadSafeClientConnManager  : Closing expired connections
2015-08-20 08:45:50.430 DEBUG 7700 --- [N_MANAGER_TIMER] o.a.h.impl.conn.tsccm.ConnPoolByRoute    : Closing expired connections
2015-08-20 08:45:50.446 DEBUG 7700 --- [nio-9050-exec-9] .a.h.i.c.DefaultClientConnectionOperator : Connect to PMD11933:9060 timed out. Connection will be retried using another IP address
2015-08-20 08:45:50.446 DEBUG 7700 --- [nio-9050-exec-9] .a.h.i.c.DefaultClientConnectionOperator : Connecting to PMD11933:9060
2015-08-20 08:45:51.475 DEBUG 7700 --- [nio-9050-exec-9] .a.h.i.c.DefaultClientConnectionOperator : Connect to PMD11933:9060 timed out. Connection will be retried using another IP address
2015-08-20 08:45:51.475 DEBUG 7700 --- [nio-9050-exec-9] .a.h.i.c.DefaultClientConnectionOperator : Connecting to PMD11933:9060
2015-08-20 08:45:52.505 DEBUG 7700 --- [nio-9050-exec-9] o.a.h.impl.conn.DefaultClientConnection  : Connection 0.0.0.0:50251<->172.20.120.39:9060 closed
2015-08-20 08:45:52.505 DEBUG 7700 --- [nio-9050-exec-9] o.a.h.impl.conn.DefaultClientConnection  : Connection 0.0.0.0:50251<->172.20.120.39:9060 shut down
2015-08-20 08:45:52.505 DEBUG 7700 --- [nio-9050-exec-9] o.a.h.impl.conn.DefaultClientConnection  : Connection 0.0.0.0:50251<->172.20.120.39:9060 closed
2015-08-20 08:45:52.520 DEBUG 7700 --- [nio-9050-exec-9] o.a.h.client.protocol.RequestAddCookies  : CookieSpec selected: ignoreCookies
2015-08-20 08:45:52.520 DEBUG 7700 --- [nio-9050-exec-9] o.a.h.client.protocol.RequestAuthCache   : Auth cache not set in the context
2015-08-20 08:45:52.520 DEBUG 7700 --- [nio-9050-exec-9] o.a.h.c.p.RequestTargetAuthentication    : Target auth state: UNCHALLENGED
2015-08-20 08:45:52.520 DEBUG 7700 --- [nio-9050-exec-9] o.a.h.c.p.RequestProxyAuthentication     : Proxy auth state: UNCHALLENGED
2015-08-20 08:45:52.520 DEBUG 7700 --- [nio-9050-exec-9] o.a.h.impl.conn.DefaultClientConnection  : Sending request: GET /ping/delay HTTP/1.1
2015-08-20 08:45:52.520 DEBUG 7700 --- [nio-9050-exec-9] org.apache.http.wire                     :  >> "GET /ping/delay HTTP/1.1[\r][\n]"
2015-08-20 08:45:52.520 DEBUG 7700 --- [nio-9050-exec-9] org.apache.http.wire                     :  >> "x-forwarded-host: 127.0.0.1:9050[\r][\n]"
2015-08-20 08:45:52.520 DEBUG 7700 --- [nio-9050-exec-9] org.apache.http.wire                     :  >> "x-forwarded-prefix: /example[\r][\n]"
2015-08-20 08:45:52.520 DEBUG 7700 --- [nio-9050-exec-9] org.apache.http.wire                     :  >> "accept-encoding: deflate, gzip[\r][\n]"
2015-08-20 08:45:52.520 DEBUG 7700 --- [nio-9050-exec-9] org.apache.http.wire                     :  >> "user-agent: curl/7.42.1[\r][\n]"
2015-08-20 08:45:52.520 DEBUG 7700 --- [nio-9050-exec-9] org.apache.http.wire                     :  >> "accept: */*[\r][\n]"
2015-08-20 08:45:52.520 DEBUG 7700 --- [nio-9050-exec-9] org.apache.http.wire                     :  >> "Netflix.NFHttpClient.Version: 1.0[\r][\n]"
2015-08-20 08:45:52.520 DEBUG 7700 --- [nio-9050-exec-9] org.apache.http.wire                     :  >> "X-netflix-httpclientname: examplems[\r][\n]"
2015-08-20 08:45:52.520 DEBUG 7700 --- [nio-9050-exec-9] org.apache.http.wire                     :  >> "Host: PMD11286:9060[\r][\n]"
2015-08-20 08:45:52.520 DEBUG 7700 --- [nio-9050-exec-9] org.apache.http.wire                     :  >> "Connection: Keep-Alive[\r][\n]"
2015-08-20 08:45:52.520 DEBUG 7700 --- [nio-9050-exec-9] org.apache.http.wire                     :  >> "[\r][\n]"
2015-08-20 08:45:52.520 DEBUG 7700 --- [nio-9050-exec-9] org.apache.http.headers                  : >> GET /ping/delay HTTP/1.1
2015-08-20 08:45:52.520 DEBUG 7700 --- [nio-9050-exec-9] org.apache.http.headers                  : >> x-forwarded-host: 127.0.0.1:9050
2015-08-20 08:45:52.520 DEBUG 7700 --- [nio-9050-exec-9] org.apache.http.headers                  : >> x-forwarded-prefix: /example
2015-08-20 08:45:52.520 DEBUG 7700 --- [nio-9050-exec-9] org.apache.http.headers                  : >> accept-encoding: deflate, gzip
2015-08-20 08:45:52.520 DEBUG 7700 --- [nio-9050-exec-9] org.apache.http.headers                  : >> user-agent: curl/7.42.1
2015-08-20 08:45:52.520 DEBUG 7700 --- [nio-9050-exec-9] org.apache.http.headers                  : >> accept: */*
2015-08-20 08:45:52.520 DEBUG 7700 --- [nio-9050-exec-9] org.apache.http.headers                  : >> Netflix.NFHttpClient.Version: 1.0
2015-08-20 08:45:52.520 DEBUG 7700 --- [nio-9050-exec-9] org.apache.http.headers                  : >> X-netflix-httpclientname: examplems
2015-08-20 08:45:52.520 DEBUG 7700 --- [nio-9050-exec-9] org.apache.http.headers                  : >> Host: PMD11286:9060
2015-08-20 08:45:52.520 DEBUG 7700 --- [nio-9050-exec-9] org.apache.http.headers                  : >> Connection: Keep-Alive
2015-08-20 08:45:52.630 DEBUG 7700 --- [nio-9050-exec-9] org.apache.http.wire                     :  << "HTTP/1.1 200 OK[\r][\n]"
2015-08-20 08:45:52.630 DEBUG 7700 --- [nio-9050-exec-9] org.apache.http.wire                     :  << "Server: Apache-Coyote/1.1[\r][\n]"
2015-08-20 08:45:52.630 DEBUG 7700 --- [nio-9050-exec-9] org.apache.http.wire                     :  << "X-Application-Context: examplems:9060[\r][\n]"
2015-08-20 08:45:52.630 DEBUG 7700 --- [nio-9050-exec-9] org.apache.http.wire                     :  << "Content-Type: text/plain;charset=UTF-8[\r][\n]"
2015-08-20 08:45:52.630 DEBUG 7700 --- [nio-9050-exec-9] org.apache.http.wire                     :  << "Content-Length: 76[\r][\n]"
2015-08-20 08:45:52.630 DEBUG 7700 --- [nio-9050-exec-9] org.apache.http.wire                     :  << "Date: Thu, 20 Aug 2015 06:45:52 GMT[\r][\n]"
2015-08-20 08:45:52.630 DEBUG 7700 --- [nio-9050-exec-9] org.apache.http.wire                     :  << "[\r][\n]"
2015-08-20 08:45:52.630 DEBUG 7700 --- [nio-9050-exec-9] o.a.h.impl.conn.DefaultClientConnection  : Receiving response: HTTP/1.1 200 OK
2015-08-20 08:45:52.630 DEBUG 7700 --- [nio-9050-exec-9] org.apache.http.headers                  : << HTTP/1.1 200 OK
2015-08-20 08:45:52.630 DEBUG 7700 --- [nio-9050-exec-9] org.apache.http.headers                  : << Server: Apache-Coyote/1.1
2015-08-20 08:45:52.630 DEBUG 7700 --- [nio-9050-exec-9] org.apache.http.headers                  : << X-Application-Context: examplems:9060
2015-08-20 08:45:52.630 DEBUG 7700 --- [nio-9050-exec-9] org.apache.http.headers                  : << Content-Type: text/plain;charset=UTF-8
2015-08-20 08:45:52.630 DEBUG 7700 --- [nio-9050-exec-9] org.apache.http.headers                  : << Content-Length: 76
2015-08-20 08:45:52.630 DEBUG 7700 --- [nio-9050-exec-9] org.apache.http.headers                  : << Date: Thu, 20 Aug 2015 06:45:52 GMT
2015-08-20 08:45:52.630 DEBUG 7700 --- [nio-9050-exec-9] org.apache.http.wire                     :  << "Svar efter 100 ms v[0xc3][0xa4]ntan. Kan [0xc3][0xa4]ndras med: ?time=200  15-08-20 08:45:52,618"
2015-08-20 08:45:52.630  WARN 7700 --- [nio-9050-exec-9] o.s.c.n.z.filters.post.SendErrorFilter   : Error during filtering

com.netflix.zuul.exception.ZuulException: Forwarding error
        at org.springframework.cloud.netflix.zuul.filters.route.RibbonRoutingFilter.forward(RibbonRoutingFilter.java:142)
        at org.springframework.cloud.netflix.zuul.filters.route.RibbonRoutingFilter.run(RibbonRoutingFilter.java:107)
        at com.netflix.zuul.ZuulFilter.runFilter(ZuulFilter.java:112)
        at com.netflix.zuul.FilterProcessor.processZuulFilter(FilterProcessor.java:197)
        at com.netflix.zuul.FilterProcessor.runFilters(FilterProcessor.java:161)
        at com.netflix.zuul.FilterProcessor.route(FilterProcessor.java:120)
        at com.netflix.zuul.ZuulRunner.route(ZuulRunner.java:84)
        at com.netflix.zuul.http.ZuulServlet.route(ZuulServlet.java:111)
        at com.netflix.zuul.http.ZuulServlet.service(ZuulServlet.java:77)
        at org.springframework.web.servlet.mvc.ServletWrappingController.handleRequestInternal(ServletWrappingController.java:158)
        at org.springframework.cloud.netflix.zuul.web.ZuulController.handleRequestInternal(ZuulController.java:43)
        at org.springframework.web.servlet.mvc.AbstractController.handleRequest(AbstractController.java:146)
        at org.springframework.web.servlet.mvc.SimpleControllerHandlerAdapter.handle(SimpleControllerHandlerAdapter.java:50)
        at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:959)
        at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:893)
        at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:966)
        at org.springframework.web.servlet.FrameworkServlet.doGet(FrameworkServlet.java:857)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:618)
        at org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:842)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:725)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:291)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:239)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at org.springframework.boot.actuate.autoconfigure.EndpointWebMvcAutoConfiguration$ApplicationContextHeaderFilter.doFilterInternal(EndpointWebMvcAutoConfiguration.java:291)
        at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:239)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at org.springframework.web.filter.HiddenHttpMethodFilter.doFilterInternal(HiddenHttpMethodFilter.java:77)
        at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:239)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at org.springframework.boot.actuate.trace.WebRequestTraceFilter.doFilterInternal(WebRequestTraceFilter.java:102)
        at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:239)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at org.springframework.web.filter.CharacterEncodingFilter.doFilterInternal(CharacterEncodingFilter.java:85)
        at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:239)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at org.springframework.boot.actuate.autoconfigure.MetricFilterAutoConfiguration$MetricsFilter.doFilterInternal(MetricFilterAutoConfiguration.java:90)
        at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:239)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:219)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:106)
        at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:501)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:142)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:79)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:88)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:516)
        at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1086)
        at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:659)
        at org.apache.coyote.http11.Http11NioProtocol$Http11ConnectionHandler.process(Http11NioProtocol.java:223)
        at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1558)
        at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:1515)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
        at java.lang.Thread.run(Thread.java:745)
Caused by: com.netflix.hystrix.exception.HystrixRuntimeException: examplemsRibbonCommand timed-out and no fallback available.
        at com.netflix.hystrix.AbstractCommand$16.call(AbstractCommand.java:782)
        at com.netflix.hystrix.AbstractCommand$16.call(AbstractCommand.java:769)
        at rx.internal.operators.OperatorOnErrorResumeNextViaFunction$1.onError(OperatorOnErrorResumeNextViaFunction.java:77)
        at rx.internal.operators.OperatorDoOnEach$1.onError(OperatorDoOnEach.java:70)
        at rx.internal.operators.OperatorDoOnEach$1.onError(OperatorDoOnEach.java:70)
        at rx.internal.operators.OperatorDoOnEach$1.onError(OperatorDoOnEach.java:70)
        at com.netflix.hystrix.AbstractCommand$DeprecatedOnFallbackHookApplication$1.onError(AbstractCommand.java:1444)
        at com.netflix.hystrix.AbstractCommand$FallbackHookApplication$1.onError(AbstractCommand.java:1334)
        at com.netflix.hystrix.HystrixCommand$2.call(HystrixCommand.java:314)
        at com.netflix.hystrix.HystrixCommand$2.call(HystrixCommand.java:306)
        at rx.Observable$1.call(Observable.java:144)
        at rx.Observable$1.call(Observable.java:136)
        at rx.Observable$1.call(Observable.java:144)
        at rx.Observable$1.call(Observable.java:136)
        at rx.Observable$1.call(Observable.java:144)
        at rx.Observable$1.call(Observable.java:136)
        at rx.Observable$1.call(Observable.java:144)
        at rx.Observable$1.call(Observable.java:136)
        at rx.Observable$1.call(Observable.java:144)
        at rx.Observable$1.call(Observable.java:136)
        at rx.Observable$1.call(Observable.java:144)
        at rx.Observable$1.call(Observable.java:136)
        at rx.Observable$1.call(Observable.java:144)
        at rx.Observable$1.call(Observable.java:136)
        at rx.Observable$1.call(Observable.java:144)
        at rx.Observable$1.call(Observable.java:136)
        at rx.Observable$1.call(Observable.java:144)
        at rx.Observable$1.call(Observable.java:136)
        at rx.Observable$1.call(Observable.java:144)
        at rx.Observable$1.call(Observable.java:136)
        at rx.Observable.unsafeSubscribe(Observable.java:7466)
        at rx.internal.operators.OperatorOnErrorResumeNextViaFunction$1.onError(OperatorOnErrorResumeNextViaFunction.java:78)
        at rx.internal.operators.OperatorDoOnEach$1.onError(OperatorDoOnEach.java:70)
        at rx.internal.operators.OperatorDoOnEach$1.onError(OperatorDoOnEach.java:70)
        at com.netflix.hystrix.AbstractCommand$HystrixObservableTimeoutOperator$1.run(AbstractCommand.java:923)
        at com.netflix.hystrix.strategy.concurrency.HystrixContextRunnable$1.call(HystrixContextRunnable.java:41)
        at com.netflix.hystrix.strategy.concurrency.HystrixContextRunnable$1.call(HystrixContextRunnable.java:37)
        at com.netflix.hystrix.strategy.concurrency.HystrixContextRunnable.run(HystrixContextRunnable.java:57)
        at com.netflix.hystrix.AbstractCommand$HystrixObservableTimeoutOperator$2.tick(AbstractCommand.java:943)
        at com.netflix.hystrix.util.HystrixTimer$1.run(HystrixTimer.java:98)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        ... 1 common frames omitted
Caused by: java.util.concurrent.TimeoutException: null
        at com.netflix.hystrix.AbstractCommand$9.call(AbstractCommand.java:589)
        at com.netflix.hystrix.AbstractCommand$9.call(AbstractCommand.java:570)
        at rx.internal.operators.OperatorOnErrorResumeNextViaFunction$1.onError(OperatorOnErrorResumeNextViaFunction.java:77)
        ... 15 common frames omitted

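If I wait a minute or two, the service is removed from Eureka and this eventually propagates to the ZuulProxy, which means no more traffic goes to the stopped service. But my assumption was that Ribbon/Hystrix would handle a non-responsive service more gracefully and faster.

Any hints/suggestions? Thanks, Magnus

Comments:

How long did you wait before retrying? Ribbon keeps a cache of servers, and it takes some time (30 seconds or more) to remove a failed server.

Actually you are correct @spencergibb, if I wait a minute or two the service is removed from Eureka and eventually this propagates to the ZuulProxy, which means no more traffic goes to the stopped service. But my assumption was that Ribbon/Hystrix would handle a non-responsive service more gracefully and faster. I will update my post with this information.

Answer 1: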

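1/ Timeout

Zuul requests are monitored by Hystrix, whose purpose (in this application) is to apply a timeout to long-running requests.

Hystrix provides two different ways to execute commands and enforce timeouts: SEMAPHORE and THREAD execution isolation.

When THREAD isolation is used, the Hystrix command is executed on a different thread taken from a thread pool. Hystrix then "pauses" the thread holding the incoming request until either the response is received from the downstream server or the timeout occurs.

When SEMAPHORE isolation is used, the Hystrix command is executed on the request thread. The timeout is detected only after the response has been received from the downstream server. So if you configure Zuul/Hystrix with a timeout of 5 seconds and your service takes 30 seconds to complete, your client is notified of the timeout only after 30 seconds, even if the service responded successfully (!)

Netflix recommends THREAD execution by default, except in rare cases. Unfortunately, for reasons I don't know, the Spring Cloud Zuul integration changed that to SEMAPHORE. See "Why is ZUUL forcing a SEMAPHORE isolation to execute its Hystrix commands?" for more information.

This explains why you receive a 500 error although the remaining live server was contacted successfully.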
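The difference between the two isolation modes can be illustrated with a small stand-alone simulation. This is only a simplified model of the behaviour described above, not Hystrix code: the class `IsolationTimeoutSketch`, its methods and the 500 ms / 100 ms figures are all made up for the illustration.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class IsolationTimeoutSketch {

    /** What the caller observes: whether a timeout was reported, and after how long. */
    public static final class Outcome {
        public final boolean timedOut;
        public final long elapsedMillis;

        Outcome(boolean timedOut, long elapsedMillis) {
            this.timedOut = timedOut;
            this.elapsedMillis = elapsedMillis;
        }

        @Override
        public String toString() {
            return "timedOut=" + timedOut + " after ~" + elapsedMillis + " ms";
        }
    }

    /** Stand-in for the downstream call: blocks for callMillis, then "succeeds". */
    static String slowCall(long callMillis) {
        try {
            Thread.sleep(callMillis);
            return "200 OK";
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("call abandoned", e);
        }
    }

    /** THREAD style: the call runs on a pool thread, so the caller can stop waiting at the timeout. */
    public static Outcome threadStyle(long callMillis, long timeoutMillis) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        long start = System.nanoTime();
        boolean timedOut = false;
        try {
            Future<String> result = pool.submit(() -> slowCall(callMillis));
            result.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            timedOut = true; // the caller is released here; the call itself is abandoned
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow(); // interrupts the abandoned call
        }
        return new Outcome(timedOut, elapsedMillis(start));
    }

    /** SEMAPHORE style: the call runs on the caller's thread, so the timeout is only noticed afterwards. */
    public static Outcome semaphoreStyle(long callMillis, long timeoutMillis) {
        long start = System.nanoTime();
        slowCall(callMillis); // cannot be abandoned: the caller is the one executing it
        long elapsed = elapsedMillis(start);
        return new Outcome(elapsed > timeoutMillis, elapsed);
    }

    private static long elapsedMillis(long startNanos) {
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startNanos);
    }

    public static void main(String[] args) {
        System.out.println("THREAD:    " + threadStyle(500, 100));
        System.out.println("SEMAPHORE: " + semaphoreStyle(500, 100));
    }
}
```

Both variants report a timeout, but the THREAD-style caller gets it after roughly the timeout, while the SEMAPHORE-style caller only gets it once the slow call has finished, which matches the 500 that arrives right after the 200 OK in the log.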

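2/ Retries

Ribbon is used to make the actual call to the remote service. It uses the information provided by Eureka to determine the available services and their addresses. Eureka uses a local cache that is refreshed every 30 seconds. So, as @spencergibb said, it may keep stale information (dead servers) for a while, but this is expected.

Ribbon automatically retries when it fails to connect to or contact a service. It can be configured to retry the same server a few times before trying another server. I don't remember the default values or the exact configuration properties, but I personally have been using the following settings: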

# Max number of retries on the same server (excluding the first try)
ribbon.MaxAutoRetries=1

# Max number of next servers to retry (excluding the first server)
ribbon.MaxAutoRetriesNextServer=2
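To make the meaning of these two settings concrete, here is a rough sketch of the order in which servers would be tried. It is a simplified model rather than Ribbon's actual code: the class `RetryOrderSketch` and the plain round-robin server choice are assumptions, and the real load balancer rule may pick servers differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class RetryOrderSketch {

    /**
     * Lists the servers tried, in order, stopping at the first one that is up.
     * Server choice is plain round-robin here, which is an assumption.
     */
    public static List<String> attempts(List<String> servers, Set<String> down,
                                        int maxAutoRetries, int maxAutoRetriesNextServer) {
        List<String> tried = new ArrayList<>();
        // The first server plus up to maxAutoRetriesNextServer further servers.
        for (int s = 0; s <= maxAutoRetriesNextServer; s++) {
            String server = servers.get(s % servers.size());
            // The first try plus up to maxAutoRetries retries on the same server.
            for (int r = 0; r <= maxAutoRetries; r++) {
                tried.add(server);
                if (!down.contains(server)) {
                    return tried; // this attempt succeeded
                }
            }
        }
        return tried; // every attempt failed
    }

    public static void main(String[] args) {
        List<String> servers = Arrays.asList("PMD11933:9060", "PMD11286:9060");
        Set<String> down = new HashSet<>(Collections.singletonList("PMD11933:9060"));
        // prints [PMD11933:9060, PMD11933:9060, PMD11286:9060]
        System.out.println(attempts(servers, down, 1, 2));
    }
}
```

With the settings above, a dead first server costs two failed attempts before the live one is tried, and at most (1 + 1) x (2 + 1) = 6 attempts are made when nothing answers.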

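3/ Connect timeout

From your logs it looks like a failed attempt to connect to the remote service takes about 1 second. That is very long for a stopped service. An attempt to connect to a TCP port with no service listening should fail immediately (at least if the host/IP is reachable and the connection attempt is not silently dropped)...

The connect timeout is controlled by the following properties; make sure it is set to a decent value: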

# Connect timeout used by Apache HttpClient
ribbon.ConnectTimeout=3000

# Read timeout used by Apache HttpClient
ribbon.ReadTimeout=5000
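These timeouts interact with the retry settings and with the Hystrix timeout, so it can help to do the arithmetic. The sketch below is only a back-of-the-envelope model: the class `TimeoutBudgetSketch` is hypothetical, it assumes every failed attempt runs into the full connect timeout, and the 5000 ms Hystrix timeout is an illustrative value, not a known default of this setup. The 110 ms response time is the one visible in the log (request sent at 08:45:52.520, response at 08:45:52.630).

```java
public class TimeoutBudgetSketch {

    /** Upper bound on time spent connecting when every attempt hits the connect timeout. */
    public static long worstCaseConnectMillis(int maxAutoRetries, int maxAutoRetriesNextServer,
                                              long connectTimeoutMillis) {
        long attempts = (long) (maxAutoRetries + 1) * (maxAutoRetriesNextServer + 1);
        return attempts * connectTimeoutMillis;
    }

    /** Time until a live server answers after failedAttempts connects have timed out. */
    public static long failoverMillis(int failedAttempts, long connectTimeoutMillis,
                                      long responseMillis) {
        return failedAttempts * connectTimeoutMillis + responseMillis;
    }

    public static void main(String[] args) {
        // Retry and connect-timeout values quoted in this answer.
        System.out.println(worstCaseConnectMillis(1, 2, 3000)); // 18000

        // One dead server tried twice, then a live server answering in 110 ms.
        long failover = failoverMillis(2, 3000, 110);
        System.out.println(failover); // 6110

        long hystrixTimeoutMillis = 5000; // hypothetical value, for illustration only
        // true: the failover succeeds, but too late for the surrounding Hystrix command
        System.out.println(failover > hystrixTimeoutMillis);
    }
}
```

In other words, the Hystrix timeout needs to be longer than the time a failover can realistically take, otherwise the command is reported as timed out even though a live server eventually answered.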

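Hope this information helps you solve your problem ;-)

Comments:

Thanks @bertrand! You certainly gave me some useful information to dig into.

If it helped you, don't hesitate to mark the answer as valid. It is valuable information for others browsing the forum. Good luck with your problem :)

@BertrandRenuart Just wanted to say thank you for the excellent answer.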
