Zuul/Ribbon/Hystrix not retrying on different instance

Posted: 2016-09-13 08:51:27

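Background

I am using Spring Cloud Brixton.RC2, with Zuul and Eureka.

I have a gateway service annotated with @EnableZuulProxy and a book-service with a status method. Through configuration, I can simulate work in the status method by sleeping for a defined amount of time.

The Zuul route is simple: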

zuul.routes.foos.path=/foos/**
zuul.routes.foos.serviceId=reservation-service

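I run two instances of book-service. When I set the sleep time below the Hystrix timeout threshold (1000 ms), I can see requests going to both instances of the book-service. This works fine.

Problem

I understand that if the Hystrix command fails, Ribbon should be able to retry the command on a different server. This should make the failure transparent to the client.

I read the Ribbon configuration and added the following configuration to Zuul: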

# not sure which of these two to use
zuul.routes.reservation-service.retryable=true
zuul.routes.foos.retryable=true

# I don't want to retry on the same host; I also tried 1, which doesn't work either
ribbon.MaxAutoRetries=0
ribbon.MaxAutoRetriesNextServer=2
ribbon.OkToRetryOnAllOperations=true

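Now I update the configuration so that only one service sleeps for more than 1 s, which means I have one healthy service and one bad one.

When I call the gateway, calls are sent to both instances, and half of the calls return a 500. In the gateway, I see the Hystrix timeout: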

com.netflix.zuul.exception.ZuulException: Forwarding error
    [...]
Caused by: com.netflix.hystrix.exception.HystrixRuntimeException: reservation-service timed-out and no fallback available.
    [...]
Caused by: java.util.concurrent.TimeoutException: null

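Why doesn't Ribbon retry the call on the other instance?

Am I missing something here?


References

Related to this question (unresolved)
Ribbon configuration
According to this commit, Zuul should support Ribbon retries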


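Answer 1:

By default Zuul uses the SEMAPHORE isolation strategy, which does not allow setting a timeout. I couldn't get load balancing to work with this strategy. What worked for me (following your example):

1) Change Zuul's isolation to THREAD: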

hystrix:
  command:
    reservation-service:
      execution:
        isolation:
          strategy: THREAD
          thread:
            timeoutInMilliseconds: 100000

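IMPORTANT: timeoutInMilliseconds = 100000 is like saying there is no Hystrix timeout. Why? Because if Hystrix times out, there won't be any load balancing (I just tested this with timeoutInMilliseconds).

Then, configure Ribbon's ReadTimeout to the desired value: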

reservation-service:
  ribbon:
    ReadTimeout: 800
    ConnectTimeout: 250
    OkToRetryOnAllOperations: true
    MaxAutoRetriesNextServer: 2
    MaxAutoRetries: 0

In this case, after the 1-second service times out in Ribbon, it will retry with the 500 ms service.

Below is the log I got from the zuul instance:
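To see why the Hystrix timeout has to be so generous, it can help to estimate how long Ribbon might spend on a single request when every attempt runs to its timeout. The sketch below is only a rule of thumb and an assumption on my part, not something the answer states: it multiplies the per-attempt cost (ReadTimeout + ConnectTimeout) by the number of attempts implied by MaxAutoRetries and MaxAutoRetriesNextServer. The class and method names are hypothetical.

```java
// Rough estimate of the worst-case time Ribbon may need for one request.
// Assumption: every attempt costs at most ReadTimeout + ConnectTimeout, and
// attempts = (MaxAutoRetries + 1) * (MaxAutoRetriesNextServer + 1).
final class RetryBudget {
    private RetryBudget() {}

    /** Worst-case milliseconds spent across all attempts. */
    static long worstCaseRibbonMillis(long readTimeout, long connectTimeout,
                                      int maxAutoRetries, int maxAutoRetriesNextServer) {
        long perAttempt = readTimeout + connectTimeout;
        long attempts = (long) (maxAutoRetries + 1) * (maxAutoRetriesNextServer + 1);
        return perAttempt * attempts;
    }

    /** True if the Hystrix timeout is longer than the whole Ribbon retry budget. */
    static boolean hystrixLeavesRoomForRetries(long hystrixTimeout, long ribbonBudget) {
        return hystrixTimeout > ribbonBudget;
    }

    public static void main(String[] args) {
        // Values taken from the configuration in this answer.
        long budget = worstCaseRibbonMillis(800, 250, 0, 2);
        System.out.println("Ribbon worst case: " + budget + " ms");
        System.out.println("Hystrix 100000 ms leaves room: "
                + hystrixLeavesRoomForRetries(100_000, budget));
        System.out.println("Hystrix 1000 ms leaves room: "
                + hystrixLeavesRoomForRetries(1_000, budget));
    }
}
```

With the values above, the estimate is (800 + 250) * 1 * 3 = 3150 ms, so a 1000 ms Hystrix timeout would fire before Ribbon has had a chance to go through its retries.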

o.s.web.servlet.DispatcherServlet        : DispatcherServlet with name 'dispatcherServlet' processing GET request for [/api/stories]
o.s.web.servlet.DispatcherServlet        : Last-Modified value for [/api/stories] is: -1
c.n.zuul.http.HttpServletRequestWrapper  : Path = null
c.n.zuul.http.HttpServletRequestWrapper  : Transfer-Encoding = null
c.n.zuul.http.HttpServletRequestWrapper  : Content-Encoding = null
c.n.zuul.http.HttpServletRequestWrapper  : Content-Length header = -1
c.n.loadbalancer.ZoneAwareLoadBalancer   : Zone aware logic disabled or there is only one zone
c.n.loadbalancer.LoadBalancerContext     : storyteller-api using LB returned Server: localhost:7799 for request /api/stories

---> ATTEMPTING THE SLOW SERVICE

com.netflix.niws.client.http.RestClient  : RestClient sending new Request(GET: ) http://localhost:7799/api/stories
c.n.http4.MonitoredConnectionManager     : Get connection: ->http://localhost:7799, timeout = 250
com.netflix.http4.NamedConnectionPool    : [->http://localhost:7799] total kept alive: 1, total issued: 0, total allocated: 1 out of 200
com.netflix.http4.NamedConnectionPool    : No free connections [->http://localhost:7799][null]
com.netflix.http4.NamedConnectionPool    : Available capacity: 50 out of 50 [->http://localhost:7799][null]
com.netflix.http4.NamedConnectionPool    : Creating new connection [->http://localhost:7799]
com.netflix.http4.NFHttpClient           : Attempt 1 to execute request
com.netflix.http4.NFHttpClient           : Closing the connection.
c.n.http4.MonitoredConnectionManager     : Released connection is not reusable.
com.netflix.http4.NamedConnectionPool    : Releasing connection [->http://localhost:7799][null]
com.netflix.http4.NamedConnectionPool    : Notifying no-one, there are no waiting threads

--- HERE'S RIBBON'S TIMEOUT

c.n.l.reactive.LoadBalancerCommand       : Got error com.sun.jersey.api.client.ClientHandlerException: java.net.SocketTimeoutException: Read timed out when executed on server localhost:7799
c.n.loadbalancer.ZoneAwareLoadBalancer   : Zone aware logic disabled or there is only one zone
c.n.loadbalancer.LoadBalancerContext     : storyteller-api using LB returned Server: localhost:9977 for request /api/stories

---> HERE IT RETRIES

com.netflix.niws.client.http.RestClient  : RestClient sending new Request(GET: ) http://localhost:9977/api/stories
c.n.http4.MonitoredConnectionManager     : Get connection: ->http://localhost:9977, timeout = 250
com.netflix.http4.NamedConnectionPool    : [->http://localhost:9977] total kept alive: 1, total issued: 0, total allocated: 1 out of 200
com.netflix.http4.NamedConnectionPool    : Getting free connection [->http://localhost:9977][null]
com.netflix.http4.NFHttpClient           : Stale connection check
com.netflix.http4.NFHttpClient           : Attempt 1 to execute request
com.netflix.http4.NFHttpClient           : Connection can be kept alive indefinitely
c.n.http4.MonitoredConnectionManager     : Released connection is reusable.
com.netflix.http4.NamedConnectionPool    : Releasing connection [->http://localhost:9977][null]
com.netflix.http4.NamedConnectionPool    : Pooling connection [->http://localhost:9977][null]; keep alive indefinitely
com.netflix.http4.NamedConnectionPool    : Notifying no-one, there are no waiting threads
o.s.web.servlet.DispatcherServlet        : Null ModelAndView returned to DispatcherServlet with name 'dispatcherServlet': assuming HandlerAdapter completed request handling
o.s.web.servlet.DispatcherServlet        : Successfully completed request
o.s.web.servlet.DispatcherServlet        : DispatcherServlet with name 'dispatcherServlet' processing GET request for [/favicon.ico]
o.s.w.s.handler.SimpleUrlHandlerMapping  : Matching patterns for request [/favicon.ico] are [/**/favicon.ico]
o.s.w.s.handler.SimpleUrlHandlerMapping  : URI Template variables for request [/favicon.ico] are 
o.s.w.s.handler.SimpleUrlHandlerMapping  : Mapping [/favicon.ico] to HandlerExecutionChain with handler [ResourceHttpRequestHandler [locations=[ServletContext resource [/], class path resource [META-INF/resources/], class path resource [resources/], class path resource [static/], class path resource [public/], class path resource []], resolvers=[org.springframework.web.servlet.resource.PathResourceResolver@a0d875d]]] and 1 interceptor
o.s.web.servlet.DispatcherServlet        : Last-Modified value for [/favicon.ico] is: -1
o.s.web.servlet.DispatcherServlet        : Null ModelAndView returned to DispatcherServlet with name 'dispatcherServlet': assuming HandlerAdapter completed request handling
o.s.web.servlet.DispatcherServlet        : Successfully completed request
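The sequence in the log (read timeout on the first server, then a successful attempt on the second) can be modeled with a small simulation. This is a simplified sketch, not Zuul's or Ribbon's actual code: it uses arithmetic on latencies rather than real threads, ignores ConnectTimeout, and treats all Ribbon attempts as running inside one Hystrix command whose timeout covers their total elapsed time. The class name, the Outcome values, and the 5000 ms read timeout used for the question's scenario are hypothetical.

```java
// Simplified model: one Hystrix command wraps every Ribbon attempt, so the
// Hystrix timeout applies to the total elapsed time, not to each attempt.
final class RetryInsideCommand {
    private RetryInsideCommand() {}

    enum Outcome { SUCCESS, RIBBON_RETRIES_EXHAUSTED, HYSTRIX_TIMEOUT }

    /**
     * @param hystrixTimeoutMs   timeout of the single enclosing command
     * @param readTimeoutMs      per-attempt Ribbon read timeout
     * @param serverLatenciesMs  latency of each server, in the order tried
     */
    static Outcome call(long hystrixTimeoutMs, long readTimeoutMs, long... serverLatenciesMs) {
        long elapsed = 0;
        for (long latency : serverLatenciesMs) {
            // An attempt ends when the server answers or the read timeout fires.
            long attemptCost = Math.min(latency, readTimeoutMs);
            if (elapsed + attemptCost >= hystrixTimeoutMs) {
                return Outcome.HYSTRIX_TIMEOUT; // the command is cut off; no further retry
            }
            elapsed += attemptCost;
            if (latency < readTimeoutMs) {
                return Outcome.SUCCESS;
            }
            // Read timed out: fall through and try the next server.
        }
        return Outcome.RIBBON_RETRIES_EXHAUSTED;
    }

    public static void main(String[] args) {
        // Question's situation: 1000 ms Hystrix timeout, long (assumed) read timeout.
        System.out.println(call(1_000, 5_000, 1_500, 500));
        // Answer's configuration: 100000 ms Hystrix timeout, 800 ms read timeout.
        System.out.println(call(100_000, 800, 1_000, 500));
    }
}
```

In this model the first call ends with HYSTRIX_TIMEOUT because the enclosing command expires while the slow server is still being waited on, and the second ends with SUCCESS because the read timeout fires first and the next server answers in time.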

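Comments:

I have added reservation-service.ribbon.ConnectTimeout=250 reservation-service.ribbon.OkToRetryOnAllOperations=true reservation-service.ribbon.MaxAutoRetriesNextServer=2 reservation-service.ribbon.MaxAutoRetries=0 to my configuration, but I'm afraid it doesn't solve the problem.

Works great! I didn't realize the Ribbon retries happen within the same Hystrix command. I thought each retry would have its own Hystrix command. That makes more sense.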
