redis死链 org.springframework.dao.QueryTimeoutException: Redis command timed out; nested exception

Posted wangjun5159

tags:

篇首语:本文由小常识网(cha138.com)小编为大家整理,主要介绍了redis死链 org.springframework.dao.QueryTimeoutException: Redis command timed out; nested exception相关的知识,希望对你有一定的参考价值。

问题

org.springframework.dao.QueryTimeoutException: Redis command timed out; nested exception is io.lettuce.core.RedisCommandTimeoutException: Command timed out after 1 minute(s)
	at org.springframework.data.redis.connection.lettuce.LettuceExceptionConverter.convert(LettuceExceptionConverter.java:70)
	at org.springframework.data.redis.connection.lettuce.LettuceExceptionConverter.convert(LettuceExceptionConverter.java:41)
	at org.springframework.data.redis.PassThroughExceptionTranslationStrategy.translate(PassThroughExceptionTranslationStrategy.java:44)
	at org.springframework.data.redis.FallbackExceptionTranslationStrategy.translate(FallbackExceptionTranslationStrategy.java:42)
	at org.springframework.data.redis.connection.lettuce.LettuceConnection.convertLettuceAccessException(LettuceConnection.java:268)
	at org.springframework.data.redis.connection.lettuce.LettuceKeyCommands.convertLettuceAccessException(LettuceKeyCommands.java:817)
	at org.springframework.data.redis.connection.lettuce.LettuceKeyCommands.exists(LettuceKeyCommands.java:80)
	at org.springframework.data.redis.connection.DefaultedRedisConnection.exists(DefaultedRedisConnection.java:55)
	at org.springframework.data.redis.core.RedisTemplate.lambda$hasKey$6(RedisTemplate.java:770)
	at org.springframework.data.redis.core.RedisTemplate.execute(RedisTemplate.java:225)
	at org.springframework.data.redis.core.RedisTemplate.execute(RedisTemplate.java:185)
	at org.springframework.data.redis.core.RedisTemplate.hasKey(RedisTemplate.java:770)

系统间歇性出现上述bug,异常信息很直白,redis 命令执行超时,出现此异常时,分别在客户端和服务器端查看网络连接(netstat -anp|grep 6379),发现了了两种情况,
第一种,客户端链接还在,服务器端链接不存在。
第二种,客户端和服务器端链接都在。
查阅资料得知,redis服务器有保活机制,默认5分钟发送一次心跳,以防止客户端意外中断,在/data/redis-$version/redis.conf

# Close the connection after a client is idle for N seconds (0 to disable)
timeout 0
# TCP keepalive.
#
# If non-zero, use SO_KEEPALIVE to send TCP ACKs to clients in absence
# of communication. This is useful for two reasons:
#
# 1) Detect dead peers.
# 2) Take the connection alive from the point of view of network
#    equipment in the middle.
#
# On Linux, the specified value (in seconds) is the period used to send ACKs.
# Note that to close the connection the double of the time is needed.
# On other kernels the period depends on the kernel configuration.
#
# A reasonable value for this option is 300 seconds, which is the new
# Redis default starting with Redis 3.2.1.
tcp-keepalive 300

通过wireshark抓包发现,服务器端确实是每隔5分钟发送一次ACK,与配置文件是对应起来的。百度得知,很多人遇到过此问题,说是将lettuce换位jedis就能解决,至于为什么却没说!查看maven pom.xml文件

<dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-data-redis</artifactId>
 </dependency>

而spring-boot-starter-data-redis确实是依赖了lettuce-core

<dependency>
      <groupId>org.springframework.data</groupId>
      <artifactId>spring-data-redis</artifactId>
      <version>2.1.15.RELEASE</version>
      <scope>compile</scope>
      <exclusions>
        <exclusion>
          <artifactId>jcl-over-slf4j</artifactId>
          <groupId>org.slf4j</groupId>
        </exclusion>
      </exclusions>
    </dependency>
    <dependency>
      <groupId>io.lettuce</groupId>
      <artifactId>lettuce-core</artifactId>
      <version>5.1.8.RELEASE</version>
      <scope>compile</scope>
    </dependency>

解决

更改为jedis依赖

 <dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-data-redis</artifactId>
    <exclusions>
        <exclusion>
            <artifactId>lettuce-core</artifactId>
            <groupId>io.lettuce</groupId>
        </exclusion>
    </exclusions>
</dependency>
<dependency>
    <groupId>redis.clients</groupId>
    <artifactId>jedis</artifactId>
</dependency>

将lettuce更改为jedis后,抓包发现,确实有心跳机制,redis客户端定时向服务器发送心跳包。

原理

通过看源码找到jedis时的配置,

redis.clients.jedis.JedisPoolConfig
public class JedisPoolConfig extends GenericObjectPoolConfig 
  public JedisPoolConfig() 
    // defaults to make your life with connection pool easier :)
    //在父类GenericObjectPoolConfig中默认值是false,lettuce时没有设置此字段使用了默认值false,所以lettuce没有心跳
    setTestWhileIdle(true);
    setMinEvictableIdleTimeMillis(60000);
    setTimeBetweenEvictionRunsMillis(30000);
    setNumTestsPerEvictionRun(-1);
  

上一步设置了 setTestWhileIdle(true),GenericObjectPool会定时校验链接合法性

org.apache.commons.pool2.impl.GenericObjectPool#evict
//因为上一步,这里的testWhileIdle是true
 if (testWhileIdle) 
    boolean active = false;
    try 
        factory.activateObject(underTest);
        active = true;
     catch (final Exception e) 
        destroy(underTest);
        destroyedByEvictorCount.incrementAndGet();
    
    if (active) 
    //发送心跳,调用factory.validateObject方法
        if (!factory.validateObject(underTest)) 
            destroy(underTest);
            destroyedByEvictorCount.incrementAndGet();
         else 
            try 
                factory.passivateObject(underTest);
             catch (final Exception e) 
                destroy(underTest);
                destroyedByEvictorCount.incrementAndGet();
            
        
    

jedis的factory就是redis.clients.jedis.JedisFactory

redis.clients.jedis.JedisFactory#validateObject
public boolean validateObject(PooledObject<Jedis> pooledJedis) 
    final BinaryJedis jedis = pooledJedis.getObject();
    try 
      HostAndPort hostAndPort = this.hostAndPort.get();

      String connectionHost = jedis.getClient().getHost();
      int connectionPort = jedis.getClient().getPort();

      return hostAndPort.getHost().equals(connectionHost)
          && hostAndPort.getPort() == connectionPort && jedis.isConnected()
          //发送了PING并回复了PONG,则认为链接可用,wireshark抓包得知,客户端的确发送了PING,服务端恢复了PONG
          && jedis.ping().equals("PONG");
     catch (final Exception e) 
      return false;
    
  

总结

至此真像大白,当网络出现波动,通信中间节点将链接桥段,因为lettuce没有心跳,链接变为死链了,但没有校验机制所以仍在连接池上,当使用到死链时,就会执行超时;
jedis设置了testWhileIdle是true,会对空闲链接进行校验,如果发现死链则会从连接池中销毁,所以jedis不会出现问题。
至于开头提到的遇到死链的两种情况,

  • 客户端有链接,服务器端没链接;
    这种情况很有可能是因为网络波动,通信的中间节点掐断了链接,但客户端不知晓所以客户端的链接仍然在,而服务器端因为会向客户端发送心跳,所以服务器端把链接删掉了;
  • 客户端和服务器端都有链接;
    这种情况跟上边一种情况原因是相同的,很有可能是因为通信的中间节点掐断了链接,但客户端没有心跳所以客户端的链接仍然在,而服务器端每隔5分钟发送一次心跳,可能还未到发送心跳时刻,所以服务器端也认为链接可用。当到了5分钟,服务器端发送心跳发现链接不可用时,关闭服务器端链接,就会变成第一种情况。

以上是关于redis死链 org.springframework.dao.QueryTimeoutException: Redis command timed out; nested exception的主要内容,如果未能解决你的问题,请参考以下文章

Caused by java lang NoClassDefFoundError org/springframewo

SpringBoot入门二十三,整合Redis

百度站长死链规则提交是啥意思

什么是网站死链?如何查询网站死链?网站死链怎么解决?

什么是网站死链?如何查询网站死链?网站死链怎么解决?

什么是死链接,为什么要检查死链?