4. SpringCloud -- Service Degradation and Circuit Breaking: Hystrix, Sentinel

Posted 2021-09-26 爱是与世界平行


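I. Introducing Service Degradation and Circuit Breaking
II. Service Degradation and Circuit Breaking -- Hystrix
III. Service Degradation and Circuit Breaking -- Sentinel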

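I. Introducing Service Degradation and Circuit Breaking

1.1 Problem and Solution

【Problem：】
    The previous posts covered basic project setup, the service registry, service invocation, and load balancing (that is, the modules can already communicate and serve requests together).

    A complex distributed system may contain dozens of modules that call one another, sometimes in nested chains.
    This raises a problem:
        If a core module suddenly goes down (or otherwise cannot serve requests), every module that calls it starts to fail.
        Like a spreading infection, one failing module gradually drags down the others until the whole system collapses (a service avalanche).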

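【Service avalanche：】
    A service avalanche is the process in which an unavailable service provider makes its consumers unavailable, and the failure is amplified level by level.
    For example:
        Several microservices form a call chain: A and B call C, C calls D, D calls other services, and so on.
        If D cannot serve requests for some reason (a crash, network latency, etc.), calls from C start to fail; once C fails, A and B may fail as well. The problem is amplified at each level and may eventually bring down the system.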

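【Solution：】
    Service degradation and circuit breaking are common ways to prevent a service avalanche.
Related technologies:
    Hystrix (in maintenance mode, not recommended)
    Sentinel (recommended)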

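1.2 Service Degradation and Circuit Breaking

(1) Service degradation

【Service degradation：】
    Service degradation means that when server load surges, some services (usually non-core ones) are deliberately degraded according to current business and traffic conditions so that core business keeps running.
    In other words, the server resources held by non-core services are released for core tasks.
Note:
    Think of it as giving up part of the business capability to keep the system as a whole running, which prevents a service avalanche.
    Resources are limited. Under high concurrency, a system without degradation may spend most of its resources on non-core work, slowing core business and hurting overall performance.
    Degraded here means either not serving or serving later (while the service is unhealthy, return a default result; resume normal service after a while).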
    
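【Types of service degradation：】
Manual degradation:
    Change a setting in the config center to trigger predefined degradation logic.

Automatic degradation:
    Timeout degradation: set a timeout and a retry count; degrade when requests time out, and use an asynchronous check to restore the service.
    Failure-count degradation: degrade once failures reach a threshold, again using an asynchronous check to restore the service.
    Fault degradation: degrade when the service is down.
    Rate-limit degradation: degrade when request volume is too high.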
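
To make the manual, fault, and rate-limit variants above more concrete, here is a minimal plain-Java sketch. DegradeGuard and its methods are hypothetical names of my own, not part of Hystrix or Sentinel, and the manual switch is just an in-memory flag standing in for a config-center setting.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Supplier;

/** Hypothetical helper: degrades a non-core call by manual switch, by concurrency limit, or on error. */
class DegradeGuard {
    // Manual switch; assumed to be driven by a config center in a real system.
    private final AtomicBoolean manuallyDegraded = new AtomicBoolean(false);
    // Rate-limit degradation: at most maxConcurrent calls may run at the same time.
    private final Semaphore permits;

    DegradeGuard(int maxConcurrent) {
        this.permits = new Semaphore(maxConcurrent);
    }

    void setManuallyDegraded(boolean degraded) {
        manuallyDegraded.set(degraded);
    }

    <T> T call(Supplier<T> business, Supplier<T> fallback) {
        if (manuallyDegraded.get()) {
            return fallback.get();          // manual degradation
        }
        if (!permits.tryAcquire()) {
            return fallback.get();          // rate-limit degradation
        }
        try {
            return business.get();
        } catch (RuntimeException e) {
            return fallback.get();          // fault degradation
        } finally {
            permits.release();
        }
    }
}
```

A caller would wrap a non-core operation as guard.call(realCall, defaultResult), so the default result is returned whenever the switch is on, the limit is reached, or the real call throws.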

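(2) Circuit breaking

【Circuit breaking：】
    Circuit breaking means that when the target service is unavailable or responds too slowly, the caller stops calling it in order to keep the overall system available,
    and returns a default result directly (freeing system resources). Once some detection algorithm finds the target service available again, calls are resumed.
Note:
    If the failure ratio of calls within a time window reaches a threshold, the service is considered unavailable.
    Circuit breaking can be seen as a special form of degradation (service unavailable --> degrade --> resume calls).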

【Martin Fowler's article on the pattern：】
    https://martinfowler.com/bliki/CircuitBreaker.html
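
The "unavailable --> degrade --> resume calls" cycle can be sketched as a small state machine. This is a simplified, single-threaded illustration of my own and not how Hystrix implements it: it counts consecutive failures instead of a failure ratio over a time window, and the class name, the thresholds, and the injected clock are all assumptions made for the example.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Minimal circuit breaker sketch: CLOSED -> OPEN -> HALF_OPEN -> CLOSED. Not thread-safe. */
class SimpleCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;  // consecutive failures before the breaker opens
    private final long openMillis;       // how long to stay open before allowing a trial call
    private final LongSupplier clock;    // current time in ms; injectable to make testing easy
    private State state = State.CLOSED;
    private int consecutiveFailures = 0;
    private long openedAt = 0L;

    SimpleCircuitBreaker(int failureThreshold, long openMillis, LongSupplier clock) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
        this.clock = clock;
    }

    State state() {
        return state;
    }

    <T> T call(Supplier<T> target, Supplier<T> fallback) {
        if (state == State.OPEN) {
            if (clock.getAsLong() - openedAt < openMillis) {
                return fallback.get();       // short-circuit: the target is not called at all
            }
            state = State.HALF_OPEN;         // the open period has elapsed: allow one trial call
        }
        try {
            T result = target.get();
            consecutiveFailures = 0;
            state = State.CLOSED;            // success closes the breaker again
            return result;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
                state = State.OPEN;
                openedAt = clock.getAsLong();
            }
            return fallback.get();
        }
    }
}
```

With new SimpleCircuitBreaker(2, 1000L, System::currentTimeMillis), two failed calls in a row open the breaker, calls during the next second return the fallback without touching the target, and the first call after that decides whether the breaker closes or reopens.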

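(3) Differences between degradation and circuit breaking

【Similarities：】
    Same goal: both are motivated by reliability and availability and aim to prevent a system crash (service avalanche).
    Same effect: a feature becomes temporarily unavailable.

【Differences：】
    Degradation usually takes a system-wide view: non-core business can be switched off manually so that core business keeps running.
    Circuit breaking usually concerns one unavailable service: calls to it are cut off automatically and retried after a period of time.
Note (personal understanding, for reference only):
    Degradation can serve as prevention (manual degradation): nothing has failed yet, but to improve efficiency I deliberately give up some non-core business so that resources remain for core business.
    Circuit breaking is the response to a failure (automatic degradation): the handling that happens after a service goes wrong.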

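II. Service Degradation and Circuit Breaking -- Hystrix

2.1 What is Hystrix?

【Hystrix：】
    Hystrix is an open-source library for handling latency and fault tolerance in distributed systems.
    It aims to isolate points of access to remote systems, services, and third-party libraries, stop cascading failures, and enable resilience in complex distributed systems where failure is inevitable.
Note:
    Blocking, timeouts, and exceptions are unavoidable in a distributed system. Hystrix keeps a failure in one service from affecting the whole system (avoiding a service avalanche), which improves availability.
    Hystrix is in maintenance mode and no longer actively developed, but its ideas and basic usage are still worth learning.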

【Common features：】
    Service degradation
    Circuit breaking
    Service monitoring

【Project page：】
    https://github.com/Netflix/Hystrix

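2.2 Simulating timeout failures with JMeter

(1) What is JMeter?

【JMeter】
    A Java-based load-testing tool from Apache.
Note:
    It is not covered in detail here; explore it on your own if interested.

【Official download page：】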
    http://jmeter.apache.org/download_jmeter.cgi
    
【Basic JMeter usage：】
    https://www.cnblogs.com/stulzq/p/8971531.html

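(2) Setup

【Setup：】
    This is only a simple demo, so no cluster is needed (a standalone Eureka is enough).
    eureka_server_7000 acts as the service registry.
    eureka_client_producer_8001 acts as the service provider.
    eureka_client_consumer_9001 acts as the service consumer.
Note:
    For standalone Eureka, see: https://www.cnblogs.com/l-y-h/p/14193443.html#_label2_1
    Requests are sent with RestTemplate here; OpenFeign, covered in the previous post, works as well.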

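【Demo description：】
    Add a new endpoint testTimeout() to eureka_client_producer_8001 that sleeps for 2 seconds to simulate business processing time.
    Normally, the getUser() endpoint of eureka_client_producer_8001 responds immediately.

    But if a flood of requests to testTimeout() exhausts system resources (threads),
    a request to getUser() has to wait for earlier requests to finish before it is handled.
    That wait may exceed a timeout and cause a series of problems.

That is:
    Low concurrency:
        Call /consumer/user/testTimeout and then /consumer/user/get/{id}; the latter returns instantly.

    High concurrency:
        If many requests hit /consumer/user/testTimeout and temporarily exhaust system resources (threads),
        a call to /consumer/user/get/{id} has to wait a while before it returns.
        In severe cases the request times out and causes system errors.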
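
The thread-exhaustion effect described above can be reproduced without any web server. The sketch below is my own plain-Java analogy: a small fixed thread pool stands in for the server's request threads, and PoolExhaustionDemo and its method names are hypothetical. It measures how long a fast task waits when slow tasks have been queued ahead of it.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

class PoolExhaustionDemo {
    private static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Returns how many ms a fast request takes when slowRequests slow ones were submitted first. */
    static long fastRequestLatencyMillis(int poolSize, int slowRequests, long slowMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            for (int i = 0; i < slowRequests; i++) {
                pool.submit(() -> sleepQuietly(slowMillis));   // plays the role of testTimeout()
            }
            long start = System.nanoTime();
            Future<String> fast = pool.submit(() -> "user");   // plays the role of getUser()
            fast.get();                                        // waits until a pool thread is free
            return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("idle pool:      " + fastRequestLatencyMillis(2, 0, 200) + " ms");
        System.out.println("saturated pool: " + fastRequestLatencyMillis(2, 4, 200) + " ms");
    }
}
```

With two threads and four slow tasks of 200 ms each, the fast task cannot start until both batches of slow tasks have finished, which mirrors getUser() stalling behind a pile of testTimeout() requests.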

(3) Define the endpoints

Add a new endpoint testTimeout() to eureka_client_producer_8001.
Add a new endpoint to eureka_client_consumer_9001 that calls testTimeout().

【eureka_client_producer_8001：】
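// Simulates 2 seconds of business processing.
@GetMapping("/testTimeout")
public Result testTimeout() {
    try {
        Thread.sleep(2000);
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // restore the interrupt flag instead of swallowing it
    }
    return Result.ok();
}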

【eureka_client_consumer_9001：】
@GetMapping("/testTimeout")
public Result testTimeout() {
    return restTemplate.getForObject(PRODUCER_URL + "/producer/user/testTimeout", Result.class);
}

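(4) Start the services and test with JMeter

Low concurrency:
    Call /consumer/user/testTimeout and then /consumer/user/get/{id}; the latter returns instantly.
Note:
    Watch the browser's refresh button (loading indicator).

High concurrency:
    Use JMeter to simulate 200 threads, looping 100 times, against /consumer/user/testTimeout.
    A call to /consumer/user/get/{id} now no longer returns instantly (it waits for a while).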

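(5) Timeout failures

The previous step showed that requests may have to wait under high concurrency. If the business logic runs too long, or the service call has a timeout configured, a blocked request can turn into a failure.

【Set timeouts when declaring the RestTemplate】
@Bean
@LoadBalanced // @LoadBalanced gives the RestTemplate client-side load balancing
public RestTemplate getRestTemplate() {
    SimpleClientHttpRequestFactory httpRequestFactory = new SimpleClientHttpRequestFactory();
    httpRequestFactory.setConnectTimeout(2000); // max time to establish the connection, in ms
    httpRequestFactory.setReadTimeout(2000);    // max time to wait for response data, in ms
    return new RestTemplate(httpRequestFactory);
}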

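2.3 Service degradation with Hystrix

(1) When to use service degradation

The goal of service degradation is to prevent a service avalanche; in essence it defines what to do when a service call goes wrong.

【When to use service degradation：】
    Server resources are exhausted and responses are slow, so requests time out.
    The server is down or the program throws, so requests fail.
That is:
    The provider responds too slowly: the consumer cannot wait forever, so the provider should degrade to guarantee the request is handled within a bounded time.
    The provider is down: the consumer cannot wait forever, so the consumer should degrade to guarantee the request is handled within a bounded time.
    The provider is healthy but the consumer itself has a problem: the consumer should degrade on its own.

Note:
    Degradation is usually handled in the consumer, but the provider can handle it too.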

(2) Degradation on the provider (automatic degradation on timeout)

The following builds on the eureka_client_producer_8001 code.
Step1:
    Add the hystrix dependency.

【Dependency：】
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-netflix-hystrix</artifactId>
</dependency>

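Step2:
    Define the degradation policy with the @HystrixCommand annotation.

【Brief explanation：】
    @HystrixCommand specifies the degradation or circuit-breaking policy.
    fallbackMethod names the method to run when the call fails (request timeout or exception); its parameters must match the original method's.
    commandProperties holds the configuration parameters.
    @HystrixProperty sets one specific parameter.
Note:
    For the full list of parameters, see the HystrixCommandProperties class:
    com.netflix.hystrix.HystrixCommandProperties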


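【Define the degradation policy：】
// Fallback: runs when testTimeout() times out or throws.
public Result testTimeoutReserveCase() {
    return Result.ok().message("The server is busy, please try again later!");
}

// Define the degradation policy
@HystrixCommand(
        // On request timeout or exception, the method named by fallbackMethod is called (parameters must match)
        fallbackMethod = "testTimeoutReserveCase",
        commandProperties = {
                // Time out the command after 1500 ms
                @HystrixProperty(name="execution.isolation.thread.timeoutInMilliseconds", value="1500")
        }
)
@GetMapping("/testTimeout")
public Result testTimeout() {
    try {
        Thread.sleep(500); // simulates 0.5 s of business processing
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // restore the interrupt flag instead of swallowing it
    }
    return Result.ok();
}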
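
To show the idea behind fallbackMethod plus a timeout without pulling in Hystrix, here is a rough plain-Java sketch. It is only an analogy I wrote for illustration: TimeoutFallback and callWithFallback are hypothetical names, and the real library does considerably more (thread-pool isolation, metrics, circuit state). The command runs on another thread, and the fallback is used if it does not finish in time or throws.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

class TimeoutFallback {
    // Daemon threads so that a still-sleeping command never keeps the JVM alive.
    private static final ExecutorService POOL = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r);
        t.setDaemon(true);
        return t;
    });

    /** Runs command with a time limit; returns fallback.get() on timeout or failure. */
    static <T> T callWithFallback(Callable<T> command, long timeoutMillis, Supplier<T> fallback) {
        Future<T> future = POOL.submit(command);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true);        // interrupt the command that is still running
            return fallback.get();      // timeout: degrade
        } catch (ExecutionException e) {
            return fallback.get();      // the command threw: degrade
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return fallback.get();
        }
    }
}
```

For example, callWithFallback(() -> slowCall(), 1500, () -> "busy") returns "busy" if the hypothetical slowCall() needs more than 1.5 seconds, which is the behaviour the annotation configuration above asks Hystrix to provide.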

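Step3:
    Add the @EnableCircuitBreaker annotation to the application's main class to enable degradation and circuit breaking.

Step4:
    Run the test (this demonstrates automatic degradation on timeout).
    The endpoint's timeout is 1.5 seconds and the simulated business time is 0.5 seconds. Load-testing the endpoint with JMeter produces request timeouts as in the earlier demo; once a request times out, the fallbackMethod is triggered and returns immediately instead of waiting indefinitely.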

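(3) Configuring a default fallback method

The demo above achieves degradation, but binding a separate fallbackMethod to every endpoint makes the code very repetitive.
Use the @DefaultProperties annotation to declare a defaultFallback method that is called when an endpoint fails, and give a dedicated fallback only to the endpoints that need special handling.

As shown below, a new endpoint throws a runtime exception; calling it invokes globalFallBackMethod().
The specially configured testTimeout endpoint still calls testTimeoutReserveCase() when it times out.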

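@DefaultProperties(defaultFallback = "globalFallBackMethod")
public class UserController {
    // Class-wide default fallback
    public Result globalFallBackMethod() {
        return Result.ok().message("System error, please try again later!");
    }

    @GetMapping("/testRuntimeError")
    @HystrixCommand // no fallbackMethod given, so the class-level defaultFallback applies
    public Result testRuntimeError() {
        int temp = 10 / 0; // throws ArithmeticException
        return Result.ok();
    }
}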
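
The rule "use the method's own fallback if it has one, otherwise the class-level default" can be sketched with plain reflection. Everything below is hypothetical and written only to illustrate the lookup order: the annotations @ClassDefaultFallback and @MethodFallback and the FallbackInvoker class are my own stand-ins, not Hystrix APIs, and the sketch only handles methods without parameters.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface ClassDefaultFallback { String value(); }       // stand-in for @DefaultProperties

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface MethodFallback { String value() default ""; }  // stand-in for @HystrixCommand

class FallbackInvoker {
    /** Invokes a no-arg method; if it throws, runs its own fallback or the class default. */
    static Object invoke(Object target, String methodName) {
        Class<?> type = target.getClass();
        try {
            Method method = type.getDeclaredMethod(methodName);
            method.setAccessible(true);
            try {
                return method.invoke(target);
            } catch (InvocationTargetException failure) {
                MethodFallback own = method.getAnnotation(MethodFallback.class);
                if (own == null) {
                    throw new IllegalStateException(failure.getCause()); // not protected at all
                }
                // A method-level fallback wins; otherwise use the class-level default.
                String name = own.value().isEmpty()
                        ? type.getAnnotation(ClassDefaultFallback.class).value()
                        : own.value();
                Method fallback = type.getDeclaredMethod(name);
                fallback.setAccessible(true);
                return fallback.invoke(target);
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}

@ClassDefaultFallback("globalFallback")
class DemoController {
    String globalFallback() { return "global"; }
    String specificFallback() { return "specific"; }

    @MethodFallback                       // no name given: the class default applies
    String runtimeError() {
        int zero = 0;
        return String.valueOf(10 / zero); // throws ArithmeticException
    }

    @MethodFallback("specificFallback")   // dedicated fallback for this method only
    String timeout() {
        throw new IllegalStateException("simulated timeout");
    }
}
```

Calling FallbackInvoker.invoke(new DemoController(), "runtimeError") ends up in globalFallback(), while "timeout" ends up in specificFallback(), matching the behaviour described for the two endpoints above.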

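2.4 Service degradation with OpenFeign

(1) Overview

【Overview：】
    The previous section used Hystrix to demonstrate degradation on the service provider.
    This section uses OpenFeign to demonstrate degradation on the service consumer.
Note:
    A new module, eureka_openfeign_client_consumer_9007, is created as the service consumer for this demo.
    For OpenFeign usage, see the previous post: https://www.cnblogs.com/l-y-h/p/14238203.html#_label3_2
    The service provider is still eureka_client_producer_8001.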

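(2) Setting up the basic OpenFeign code

Step1:
    Create the module eureka_openfeign_client_consumer_9007.
    Update the pom.xml files of the parent project and of this module.
    Update the configuration.
    Add the @EnableFeignClients annotation to the main class.

【Dependencies：】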
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-openfeign</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-netflix-eureka-client</artifactId>
</dependency>

【application.yml】
server:
  port: 9007
spring:
  application:
    name: eureka-openfeign-client-consumer

eureka:
  instance:
    appname: eureka-openfeign-client-consumer-9007 # takes precedence over spring.application.name
    instance-id: ${eureka.instance.appname} # ID of this instance
  client:
    register-with-eureka: true # default true: register with the registry
    fetch-registry: true # default true: fetch registry information from the registry
    service-url:
      # Address of the registry, i.e. eureka_server_7000
      defaultZone: http://localhost:7000/eureka

# OpenFeign timeouts (OpenFeign uses Ribbon by default)
ribbon:
  # Timeout for establishing the connection, in ms
  ConnectTimeout: 2000
  # Timeout for reading the response once connected (i.e. the request-processing timeout), in ms
  ReadTimeout: 2000

Step2:
    Write the service calls with @FeignClient.

【ProducerFeignService：】
package com.lyh.springcloud.eureka_openfeign_client_consumer_9007.service;

import com.lyh.springcloud.common.tools.Result;
import org.springframework.cloud.openfeign.FeignClient;
import org.springframework.stereotype.Component;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;

@FeignClient(value = "EUREKA-CLIENT-PRODUCER-8001")
@Component
public interface ProducerFeignService {
    @GetMapping("/producer/user/get/{id}")
    Result getUser(@PathVariable Integer id);

    @GetMapping("/producer/user/testTimeout")
    Result testFeignTimeout();

    @GetMapping("/producer/user/testRuntimeError")
    Result testRuntimeError();
}

Step3:
    Write the controller and check that OpenFeign can reach the service.

【ConsumerController】
package com.lyh.springcloud.eureka_openfeign_client_consumer_9007.controller;

import com.lyh.springcloud.common.tools.Result;
import com.lyh.springcloud.eureka_openfeign_client_consumer_9007.service.ProducerFeignService;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("/consumer/user")
public class ConsumerController {
    @Autowired
    private ProducerFeignService producerFeignService;

    @GetMapping("/get/{id}")
    public Result getUser(@PathVariable Integer id) {
        return producerFeignService.getUser(id);
    }

    @GetMapping("/testTimeout")
    public Result testFeignTimeout() {
        return producerFeignService.testFeignTimeout();
    }

    @GetMapping("/testRuntimeError")
    public Result testFeignRuntimeError() {
        return producerFeignService.testRuntimeError();
    }
}

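(3) Service degradation with OpenFeign

【Steps：】
Step1: In the configuration file, set feign.hystrix.enabled=true to enable degradation.
Step2: Define a class that implements the service-call interface and overrides each method with the failure-handling logic.
Step3: In the @FeignClient annotation, point the fallback parameter at that class.

Step1:
    Enable degradation in the configuration file.

【application.yml】
# Enable degradation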
feign:
  hystrix:
    enabled: true

Step2:
    Define a class that implements the service-call interface.
    Do not forget the @Component annotation, otherwise startup may fail.

【ProducerFeignServiceImpl：】
package com.lyh.springcloud.eureka_openfeign_client_consumer_9007.service.impl;

import com.lyh.springcloud.common.tools.Result;
import com.lyh.springcloud.eureka_openfeign_client_consumer_9007.service.ProducerFeignService;
import org.springframework.stereotype.Component;

@Component
public class ProducerFeignServiceImpl implements ProducerFeignService {
    @Override
    public Result getUser(Integer id) {
        return Result.ok().message("System error, please try again later -- 11111111111");
    }

    @Override
    public Result testFeignTimeout() {
        return Result.ok().message("System error, please try again later -- 222222222222");
    }

    @Override
    public Result testRuntimeError() {
        return Result.ok().message("System error, please try again later -- 333333333333");
    }
}

Note:
    Without the @Component annotation, startup fails with the error below.

【Error message：】
org.springframework.beans.factory.UnsatisfiedDependencyException: Error creating bean with name 'consumerController': 
Unsatisfied dependency expressed through field 'producerFeignService'; nested exception is org.springframework.beans.factory.BeanCreationException: 
Error creating bean with name 'com.lyh.springcloud.eureka_openfeign_client_consumer_9007.service.ProducerFeignService': 
FactoryBean threw exception on object creation; nested exception is java.lang.IllegalStateException: 
No fallback instance of type class com.lyh.springcloud.eureka_openfeign_client_consumer_9007.service.impl.ProducerFeignServiceImpl found for feign client EUREKA-CLIENT-PRODUCER-8001

Step3:
    In the @FeignClient annotation, point the fallback parameter at the class defined above.

package com.lyh.springcloud.eureka_openfeign_client_consumer_9007.service;

import com.lyh.springcloud.common.tools.Result;
import com.lyh.springcloud.eureka_openfeign_client_consumer_9007.service.impl.ProducerFeignServiceImpl;
import org.springframework.cloud.openfeign.FeignClient;
import org.springframework.stereotype.Component;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;

@FeignClient(value = "EUREKA-CLIENT-PRODUCER-8001", fallback = ProducerFeignServiceImpl.class)
@Component
public interface ProducerFeignService {
    @GetMapping("/producer/user/get/{id}")
    Result getUser(@PathVariable Integer id);

    @GetMapping("/producer/user/testTimeout")
    Result testFeignTimeout();

    @GetMapping("/producer/user/testRuntimeError")
    Result testRuntimeError();
}
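
Conceptually, the fallback class works like a second implementation of the same interface that is consulted when the remote call fails. The sketch below imitates that with a JDK dynamic proxy. It is my own simplified illustration and not how OpenFeign wires things internally; UserClient and FallbackProxy are hypothetical names standing in for ProducerFeignService and the framework.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;

/** Hypothetical stand-in for a Feign client interface such as ProducerFeignService. */
interface UserClient {
    String getUser(int id);
}

class FallbackProxy {
    /** Returns a proxy that calls remote first and, if that throws, the same method on fallback. */
    @SuppressWarnings("unchecked")
    static <T> T wrap(Class<T> api, T remote, T fallback) {
        return (T) Proxy.newProxyInstance(
                api.getClassLoader(),
                new Class<?>[] {api},
                (proxy, method, args) -> {
                    try {
                        return method.invoke(remote, args);
                    } catch (InvocationTargetException remoteFailure) {
                        // The remote call threw: answer with the fallback implementation instead.
                        return method.invoke(fallback, args);
                    }
                });
    }

    public static void main(String[] args) {
        UserClient remote = id -> { throw new IllegalStateException("producer is down"); };
        UserClient fallback = id -> "fallback-user-" + id;
        UserClient client = wrap(UserClient.class, remote, fallback);
        System.out.println(client.getUser(7));
    }
}
```

This is also why the fallback class has to implement every method of the interface: each failed call is answered by the method with the same signature on the fallback object.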