Redis Sentinel High-Availability Architecture

Posted by 陆炫志


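     There are more and more high-availability architectures for Redis, which shows how quickly Redis has developed. Many companies now run Redis, so its availability matters a great deal. Current options include keepalived+redis, redis cluster, twemproxy, and codis; this article focuses on the Redis Sentinel high-availability architecture.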

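Redis Sentinel provides the following main functions:

  • Monitoring: it continually checks whether the Redis instances are running as expected;

  • Notification: when a monitored Redis node runs into a problem, it can notify another process (for example, its clients);

  • Automatic failover: when a master becomes unavailable, it elects one of the master's slaves (if there is more than one) as the new master, and the remaining slaves are reconfigured to replicate from the address of the newly promoted master.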

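    Sentinel is a monitor: it decides what action to take based on the role and state of the instances it watches. How does a Sentinel discover other Sentinels? Over its command connection, each Sentinel sends a HELLO message to the monitored master and slave servers (the message is published on the __sentinel__:hello channel). The message contains the Sentinel's IP, port, ID, and so on, and announces its existence to the other Sentinels. At the same time, each Sentinel receives the HELLO messages of other Sentinels over its subscription connection, and in this way discovers the other Sentinels that monitor the same master.

Sentinels also create command connections to one another for communication. Because the master and slave servers already act as intermediaries for sending and receiving HELLO messages, Sentinels do not create subscription connections to each other.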
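
The discovery step can be sketched in Python. This is only an illustration, under the assumption that the HELLO payload is the comma-separated string Sentinel publishes (Sentinel IP, port, run ID, current epoch, then the master's name, IP, port, and config epoch). HelloMessage, parse_hello, and discover_peers are hypothetical names made up for this article, the epoch values in the samples are placeholders, and no network connection is made.

```python
from typing import Dict, Iterable, NamedTuple, Tuple


class HelloMessage(NamedTuple):
    """Fields of one HELLO payload (illustrative type, not a Redis API)."""
    sentinel_ip: str
    sentinel_port: int
    run_id: str
    current_epoch: int
    master_name: str
    master_ip: str
    master_port: int
    master_config_epoch: int


def parse_hello(payload: str) -> HelloMessage:
    """Split the 8 comma-separated tokens of a HELLO payload."""
    parts = payload.split(",")
    if len(parts) != 8:
        raise ValueError("expected 8 fields, got %d" % len(parts))
    ip, port, run_id, epoch, m_name, m_ip, m_port, m_epoch = parts
    return HelloMessage(ip, int(port), run_id, int(epoch),
                        m_name, m_ip, int(m_port), int(m_epoch))


def discover_peers(own_run_id: str,
                   payloads: Iterable[str]) -> Dict[str, Tuple[str, int]]:
    """Map run ID -> (ip, port) for every other Sentinel seen in the payloads."""
    peers = {}
    for payload in payloads:
        msg = parse_hello(payload)
        if msg.run_id != own_run_id:  # ignore our own announcements
            peers[msg.run_id] = (msg.sentinel_ip, msg.sentinel_port)
    return peers


# Sample payloads built from the addresses and run IDs used in this article;
# the epoch values (0) are placeholders.
OWN_ID = "21e629e6d2b26682e660258787d5fb995010e6c8"
PEER_ID = "f391228f430177d881464e908c683bfc73d61c24"
SAMPLE = [
    "192.168.10.128,26379," + OWN_ID + ",0,master-6379,192.168.10.131,6379,0",
    "192.168.10.129,26379," + PEER_ID + ",0,master-6379,192.168.10.131,6379,0",
]

peers = discover_peers(OWN_ID, SAMPLE)
print(peers)
```

A real Sentinel also uses these messages to propagate configuration epochs; the sketch only covers the peer-discovery part.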

 

 [Figure: Redis Sentinel architecture diagram]

 An odd number of Sentinel nodes is recommended. For the reasons, see the following references:

http://segmentfault.com/a/1190000002680804

http://segmentfault.com/a/1190000002685515
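
As a rough illustration of why an odd count is preferred (this is the usual majority argument, not something taken from the linked articles): a failover has to be authorized by a majority of the Sentinels, so growing a group from three to four nodes raises the majority without tolerating any additional failure. The function names below are made up for this sketch.

```python
def majority(n: int) -> int:
    """Smallest number of Sentinels that forms a strict majority of n."""
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """How many Sentinels can be lost while a majority is still reachable."""
    return n - majority(n)


# Print: number of Sentinels, majority needed, failures tolerated.
for n in range(1, 7):
    print(n, majority(n), tolerated_failures(n))
```

With three Sentinels, as in this article, one Sentinel can fail and the remaining two can still agree on a failover.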

 

Next we deploy and test Redis Sentinel. This experiment uses redis-3.0.7. The environment is as follows:

 192.168.10.128  Sentinel_1
 192.168.10.129  Sentinel_2
 192.168.10.130  Sentinel_3
 192.168.10.131  Redis_Master
 192.168.10.132  Redis_Slave

I. Install redis-3.0.7 on each of the five servers, using Sentinel_1 as the example:

[root@Sentinel_1 ~]# wget http://download.redis.io/releases/redis-3.0.7.tar.gz
[root@Sentinel_1 ~]# tar xf redis-3.0.7.tar.gz 
[root@Sentinel_1 ~]# cd redis-3.0.7/src/
[root@Sentinel_1 src]# make PREFIX=/data/service/redis install

After the installation finishes, a bin directory is created under /data/service/redis:

[root@Sentinel_1 ~]# ll /data/service/redis/
total 12
drwxr-xr-x. 2 root root 4096 Mar  7 19:19 bin
[root@Sentinel_1 ~]# 

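Optionally, add the Redis bin directory to the PATH environment variable on all five servers to make the commands easier to use. Edit /etc/profile.d/redis.sh (for example with vim) and add the following: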

export PATH=/data/service/redis/bin:$PATH

Run source /etc/profile.d/redis.sh so that the environment variable takes effect:

[root@Sentinel_1 ~]# source /etc/profile.d/redis.sh

 

II. Configure Redis master-slave replication. The setup is simple and is not demonstrated here. Redis_Master: 192.168.10.131, Redis_Slave: 192.168.10.132

Startup log on Redis_Master:

[root@Redis_Master redis]# tail -f logs/redis_6379.log 
1974:M 07 Mar 22:03:05.381 * DB loaded from disk: 0.001 seconds
1974:M 07 Mar 22:03:05.381 * The server is now ready to accept connections on port 6379
1974:M 07 Mar 22:03:44.592 * Slave 192.168.10.132:6379 asks for synchronization
1974:M 07 Mar 22:03:44.593 * Full resync requested by slave 192.168.10.132:6379
1974:M 07 Mar 22:03:44.593 * Starting BGSAVE for SYNC with target: disk
1974:M 07 Mar 22:03:44.594 * Background saving started by pid 1977
1977:C 07 Mar 22:03:44.632 * DB saved on disk
1977:C 07 Mar 22:03:44.632 * RDB: 4 MB of memory used by copy-on-write
1974:M 07 Mar 22:03:44.649 * Background saving terminated with success
1974:M 07 Mar 22:03:44.650 * Synchronization with slave 192.168.10.132:6379 succeeded

Startup log on Redis_Slave:

[root@Redis_Slave redis]# tail -f logs/redis_6379.log 
2437:S 07 Mar 22:03:44.246 * Connecting to MASTER 192.168.10.131:6379
2437:S 07 Mar 22:03:44.247 * MASTER <-> SLAVE sync started
2437:S 07 Mar 22:03:44.262 * Non blocking connect for SYNC fired the event.
2437:S 07 Mar 22:03:44.268 * Master replied to PING, replication can continue...
2437:S 07 Mar 22:03:44.269 * Partial resynchronization not possible (no cached master)
2437:S 07 Mar 22:03:44.270 * Full resync from master: 5d1fbf46ddd1eb0a7728abbbad61e78908dd7963:1
2437:S 07 Mar 22:03:44.326 * MASTER <-> SLAVE sync: receiving 34 bytes from master
2437:S 07 Mar 22:03:44.326 * MASTER <-> SLAVE sync: Flushing old data
2437:S 07 Mar 22:03:44.328 * MASTER <-> SLAVE sync: Loading DB in memory
2437:S 07 Mar 22:03:44.329 * MASTER <-> SLAVE sync: Finished with success

The logs show that master-slave replication is working normally.

 

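III. Sentinel configuration and an explanation of the configuration file.

Create a conf directory and a log directory on the three Sentinel servers to hold the configuration file and the log: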

[root@Sentinel_1 ~]# mkdir -p /data/service/redis/sentinel/conf

[root@Sentinel_1 ~]# mkdir -p /data/service/redis/sentinel/log

 Go to the conf directory and edit the file 26379.conf. The configuration is identical on all three Sentinel servers:

[root@Sentinel_1 conf]# pwd
/data/service/redis/sentinel/conf
[root@Sentinel_1 conf]# cat 26379.conf 
port 26379
dir "/data/service/redis/sentinel"
daemonize yes
logfile "/data/service/redis/sentinel/log/sentinel.log"

# 6379
sentinel monitor master-6379 192.168.10.131 6379 2
sentinel down-after-milliseconds master-6379 15000
sentinel parallel-syncs master-6379 1
sentinel failover-timeout master-6379 180000
sentinel auth-pass master-6379 123456
sentinel client-reconfig-script master-6379 /data/script/python/notify.py
[root@Sentinel_1 conf]# 

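Explanation of the 26379.conf configuration file:

1. The first four lines define basic Sentinel settings, much like those of Redis itself, and need no further explanation.

2. sentinel monitor master-6379 192.168.10.131 6379 2 (this line says that the master monitored by Sentinel is named master-6379 and its address is 192.168.10.131:6379. The 2 is the quorum: only when 2 Sentinels in the cluster consider the master dead is it really regarded as unavailable.)

3. down-after-milliseconds (Sentinel sends heartbeat PINGs to the master to confirm that it is alive. If the master does not reply with PONG within a "certain time range", or replies with an error, this Sentinel subjectively (unilaterally) considers the master unavailable. down-after-milliseconds specifies that time range, in milliseconds.)

4. parallel-syncs (during a failover, this option sets the maximum number of slaves that may synchronize with the new master at the same time. The smaller the number, the longer the failover takes to complete; the larger the number, the more slaves are unavailable at once because of replication. Setting it to 1 guarantees that only one slave at a time is unable to serve command requests.)

5. failover-timeout (Sentinels follow a rule: if Sentinel A votes for Sentinel B to perform the failover, A waits for a period of time before trying to fail over the same master itself. That waiting period is derived from failover-timeout; it is twice the configured value. It follows from this rule that the Sentinels in a cluster never fail over the same master concurrently: if the first Sentinel to attempt the failover does not succeed, another one retries after the waiting period, and so on.)

6. auth-pass (this option applies when password authentication is enabled for the Redis master/slave setup. If no password was set when configuring replication, the option is not needed; if there is a password, specify it here so that Sentinel can connect.)

7. client-reconfig-script (this option defines the failover script, which is run after the master has been failed over, for example to send an SMS or to switch an IP address. Sentinel calls it with the arguments <master-name> <role> <state> <from-ip> <from-port> <to-ip> <to-port>.)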
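
To make the relationship between these options concrete, here is a small Python sketch that reads the sentinel lines of a configuration such as 26379.conf into a dictionary and checks the quorum against the number of Sentinels. parse_sentinel_conf and check_quorum are hypothetical helpers written for this article, not part of Redis, and the parser only handles the simple "sentinel <option> <master-name> <value...>" form shown above.

```python
import shlex

# A shortened copy of the configuration from this article.
CONF_TEXT = """\
port 26379
dir "/data/service/redis/sentinel"

# 6379
sentinel monitor master-6379 192.168.10.131 6379 2
sentinel down-after-milliseconds master-6379 15000
sentinel parallel-syncs master-6379 1
sentinel failover-timeout master-6379 180000
"""

NUMERIC_OPTIONS = ("down-after-milliseconds", "parallel-syncs", "failover-timeout")


def parse_sentinel_conf(text):
    """Return {master_name: {option: value}} for the 'sentinel ...' lines."""
    masters = {}
    for line in text.splitlines():
        tokens = shlex.split(line, comments=True)  # also drops '# ...' comments
        if len(tokens) < 4 or tokens[0] != "sentinel":
            continue
        option, name, values = tokens[1], tokens[2], tokens[3:]
        settings = masters.setdefault(name, {})
        if option == "monitor":
            if len(values) != 3:
                raise ValueError("monitor needs <ip> <port> <quorum>")
            ip, port, quorum = values
            settings["address"] = (ip, int(port))
            settings["quorum"] = int(quorum)
        elif option in NUMERIC_OPTIONS:
            settings[option] = int(values[0])
        else:
            settings[option] = " ".join(values)
    return masters


def check_quorum(quorum, sentinel_count):
    """A quorum can only be reached if 1 <= quorum <= number of Sentinels."""
    return 1 <= quorum <= sentinel_count


conf = parse_sentinel_conf(CONF_TEXT)
print(conf["master-6379"])
print(check_quorum(conf["master-6379"]["quorum"], 3))
```

With three Sentinels and a quorum of 2, a single Sentinel that loses its network connection cannot declare the master down on its own.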

 

The notify.py script, which sends an email after a failover, is based on this blog post: http://www.cnblogs.com/gomysql/p/5040847.html

#!/usr/bin/python
# coding: utf8
# Called by Sentinel (client-reconfig-script) after a failover:
# logs the event and sends an alert email.

import sys
import time
import smtplib
import logging
from email.mime.text import MIMEText
from email.header import Header


alarm_mail = ['1111111111@qq.com']

def main():

    failover_time = time.strftime("%Y-%m-%d %H:%M:%S")

    # Log everything to a file ...
    logging.basicConfig(level=logging.DEBUG,
                format='%(asctime)s %(filename)s[line:%(lineno)d] %(levelname)s %(message)s',
                datefmt='%Y-%m-%d %H:%M:%S',
                filename='/data/service/redis/failover.log',
                filemode='a')

    # ... and echo INFO and above to the console.
    console = logging.StreamHandler()
    console.setLevel(logging.INFO)
    formatter = logging.Formatter('%(name)-12s: %(levelname)-8s %(message)s')
    console.setFormatter(formatter)
    logging.getLogger('').addHandler(console)

    # SMTP settings; fill in the account details before use.
    mail_host = 'smtp.163.com'
    mail_port = 25
    mail_user = ''
    mail_pass = ''
    mail_send_from = ''

    def send_mail(to_list, sub, content):
        """Send an HTML mail to to_list; return True on success."""
        me = mail_send_from
        msg = MIMEText(content, _subtype='html', _charset='utf-8')
        msg['Subject'] = Header(sub, 'utf-8')
        msg['From'] = Header(me, 'utf-8')
        msg['To'] = ", ".join(to_list)
        try:
            smtp = smtplib.SMTP()
            smtp.connect(mail_host, mail_port)
            smtp.login(mail_user, mail_pass)
            smtp.sendmail(me, to_list, msg.as_string())
            smtp.quit()
            return True
        except Exception as error:
            logging.error("Failed to send mail: %s" % (error))
            return False

    # Sentinel passes: <master-name> <role> <state> <from-ip> <from-port> <to-ip> <to-port>
    try:
        master_name = sys.argv[1]
        role = sys.argv[2]
        # sys.argv[3] is the state, which is not used here
        from_ip = sys.argv[4]
        from_port = sys.argv[5]
        to_ip = sys.argv[6]
        to_port = sys.argv[7]
    except IndexError as error:
        logging.error('Failed to get arguments from Sentinel: %s' % (error))
        sys.exit(1)

    sub = 'redis %s failover' % (master_name)
    notify_message = "%s %s failover finished. sentinel found redis master %s:%s down, failed over to slave %s:%s" % (failover_time, master_name, from_ip, from_port, to_ip, to_port)

    # Only the Sentinel that led the failover sends the alert, to avoid duplicates.
    if role == 'leader':
        logging.info(notify_message)
        send_mail(alarm_mail, sub, notify_message)

if __name__ == "__main__":
    main()

 

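IV. Start the Sentinel service. There are two ways to start it:

Method 1:

redis-sentinel /path/to/sentinel.conf

Method 2:

redis-server /path/to/sentinel.conf --sentinel

I usually use the first method. Start it on each of the three Sentinel servers:

Startup log on the first server, Sentinel_1: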

[root@Sentinel_1 sentinel]# redis-sentinel /data/service/redis/sentinel/conf/26379.conf 
[root@Sentinel_1 sentinel]# tail -f log/sentinel.log 
 |    `-._`-._        _.-'_.-'    |
  `-._    `-._`-.__.-'_.-'    _.-'
      `-._    `-.__.-'    _.-'
          `-._        _.-'
              `-.__.-'

5153:X 07 Mar 22:37:16.290 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
5153:X 07 Mar 22:37:16.290 # Sentinel runid is 21e629e6d2b26682e660258787d5fb995010e6c8
5153:X 07 Mar 22:37:16.290 # +monitor master master-6379 192.168.10.131 6379 quorum 2
5153:X 07 Mar 22:37:17.330 * +slave slave 192.168.10.132:6379 192.168.10.132 6379 @ master-6379 192.168.10.131 6379
5153:X 07 Mar 22:38:29.406 * +sentinel sentinel 192.168.10.129:26379 192.168.10.129 26379 @ master-6379 192.168.10.131 6379
5153:X 07 Mar 22:38:45.024 * +sentinel sentinel 192.168.10.130:26379 192.168.10.130 26379 @ master-6379 192.168.10.131 6379

Startup log on the second server, Sentinel_2:

[root@Sentinel_2 sentinel]# redis-sentinel /data/service/redis/sentinel/conf/26379.conf
[root@Sentinel_2 sentinel]# tail -f log/sentinel.log 
  `-._    `-._`-.__.-'_.-'    _.-'
      `-._    `-.__.-'    _.-'
          `-._        _.-'
              `-.__.-'

4647:X 07 Mar 22:38:27.570 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
4647:X 07 Mar 22:38:27.570 # Sentinel runid is f391228f430177d881464e908c683bfc73d61c24
4647:X 07 Mar 22:38:27.571 # +monitor master master-6379 192.168.10.131 6379 quorum 2
4647:X 07 Mar 22:38:28.582 * +slave slave 192.168.10.132:6379 192.168.10.132 6379 @ master-6379 192.168.10.131 6379
4647:X 07 Mar 22:38:29.218 * +sentinel sentinel 192.168.10.128:26379 192.168.10.128 26379 @ master-6379 192.168.10.131 6379
4647:X 07 Mar 22:38:45.200 * +sentinel sentinel 192.168.10.130:26379 192.168.10.130 26379 @ master-6379 192.168.10.131 6379

Startup log on the third server, Sentinel_3:

[root@Sentinel_3 sentinel]# redis-sentinel /data/service/redis/sentinel/conf/26379.conf
[root@Sentinel_3 sentinel]# tail -f log/sentinel.log 
      `-._    `-.__.-'    _.-'
          `-._        _.-'
              `-.__.-'

2115:X 07 Mar 22:38:43.161 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
2115:X 07 Mar 22:38:43.161 # Sentinel runid is 7fbee9138d4e5c1e2def7bbc4f888cef04d95677
2115:X 07 Mar 22:38:43.161 # +monitor master master-6379 192.168.10.131 6379 quorum 2
2115:X 07 Mar 22:38:44.167 * +slave slave 192.168.10.132:6379 192.168.10.132 6379 @ master-6379 192.168.10.131 6379
2115:X 07 Mar 22:38:44.818 * +sentinel sentinel 192.168.10.129:26379 192.168.10.129 26379 @ master-6379 192.168.10.131 6379
2115:X 07 Mar 22:38:44.851 * +sentinel sentinel 192.168.10.128:26379 192.168.10.128 26379 @ master-6379 192.168.10.131 6379

The whole Sentinel cluster is now up and working. We can log in to any Sentinel to check the current monitoring state:

[root@Sentinel_1 sentinel]# redis-cli -p 26379
127.0.0.1:26379> INFO sentinel
# Sentinel
sentinel_masters:1
sentinel_tilt:0
sentinel_running_scripts:0
sentinel_scripts_queue_length:0
master0:name=master-6379,status=ok,address=192.168.10.131:6379,slaves=1,sentinels=3
127.0.0.1:26379> 

The output shows status=ok, and slaves=1 means there is one slave node.
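
If you want to check this state from a script rather than by eye, the master0 line can be split into its fields. The following is a minimal sketch that works on the text of the line only; parse_master_line is a hypothetical helper, and fetching the INFO output from a running Sentinel is left out.

```python
def parse_master_line(line):
    """Parse 'master0:name=...,status=...,address=ip:port,...' into a dict."""
    # Everything after the first ':' is a comma-separated list of key=value pairs.
    _, _, fields = line.partition(":")
    info = {}
    for pair in fields.split(","):
        key, _, value = pair.partition("=")
        info[key] = value
    # Convert the address and the counters to more convenient types.
    host, _, port = info["address"].rpartition(":")
    info["address"] = (host, int(port))
    info["slaves"] = int(info["slaves"])
    info["sentinels"] = int(info["sentinels"])
    return info


# The line reported by INFO sentinel earlier in this article.
LINE = ("master0:name=master-6379,status=ok,"
        "address=192.168.10.131:6379,slaves=1,sentinels=3")

master = parse_master_line(LINE)
print(master)
```

A monitoring job could, for instance, raise an alert when status is not ok or when sentinels drops below the expected count of 3.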

 

V. Redis failure test

Test 1: stop Redis_Master and see whether Sentinel promotes the surviving slave to master

[root@Redis_Master redis]# sh redis stop
Stopping ...
Waiting for Redis to shutdown ...
Redis stopped
[root@Redis_Master redis]# 

1. Check the log on any Sentinel with tail -f log/sentinel.log:

5153:X 07 Mar 22:48:20.986 # +sdown master master-6379 192.168.10.131 6379
5153:X 07 Mar 22:48:21.047 # +odown master master-6379 192.168.10.131 6379 #quorum 2/2
5153:X 07 Mar 22:48:21.049 # +new-epoch 1
5153:X 07 Mar 22:48:21.050 # +try-failover master master-6379 192.168.10.131 6379
5153:X 07 Mar 22:48:21.053 # +vote-for-leader 21e629e6d2b26682e660258787d5fb995010e6c8 1
5153:X 07 Mar 22:48:21.057 # 192.168.10.130:26379 voted for 7fbee9138d4e5c1e2def7bbc4f888cef04d95677 1
5153:X 07 Mar 22:48:21.062 # 192.168.10.129:26379 voted for 7fbee9138d4e5c1e2def7bbc4f888cef04d95677 1
5153:X 07 Mar 22:48:22.441 # +config-update-from sentinel 192.168.10.130:26379 192.168.10.130 26379 @ master-6379 192.168.10.131 6379
5153:X 07 Mar 22:48:22.442 # +switch-master master-6379 192.168.10.131 6379 192.168.10.132 6379
5153:X 07 Mar 22:48:22.443 * +slave slave 192.168.10.131:6379 192.168.10.131 6379 @ master-6379 192.168.10.132 6379
5153:X 07 Mar 22:48:37.496 # +sdown slave 192.168.10.131:6379 192.168.10.131 6379 @ master-6379 192.168.10.132 6379
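
The log shows the sequence subjectively down (+sdown), objectively down (+odown, once the quorum of 2 is reached), and then a leader election in epoch 1. As a simplified sketch of that last step, assuming the winner needs votes from a majority of the Sentinels and from at least quorum of them, the votes recorded above can be tallied as follows. elect_leader is an illustrative helper, not Redis code.

```python
from collections import Counter

S1 = "21e629e6d2b26682e660258787d5fb995010e6c8"  # run ID of Sentinel_1
S3 = "7fbee9138d4e5c1e2def7bbc4f888cef04d95677"  # run ID of Sentinel_3


def elect_leader(votes, total_sentinels, quorum):
    """Return the winning run ID, or None if nobody has enough votes.

    votes maps each voting Sentinel's address to the run ID it voted for.
    """
    if not votes:
        return None
    candidate, count = Counter(votes.values()).most_common(1)[0]
    needed = max(total_sentinels // 2 + 1, quorum)
    return candidate if count >= needed else None


# Votes as they appear in the Sentinel_1 log for epoch 1.
votes = {
    "192.168.10.128:26379": S1,  # Sentinel_1 voted for itself
    "192.168.10.130:26379": S3,
    "192.168.10.129:26379": S3,
}

leader = elect_leader(votes, total_sentinels=3, quorum=2)
print(leader)
```

This matches the log: Sentinel_3 (192.168.10.130) collected two of the three votes, and Sentinel_1 then received the new configuration from it (+config-update-from).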

2. Then check the log on Redis_Slave:

2437:S 07 Mar 22:48:18.023 * Connecting to MASTER 192.168.10.131:6379
2437:S 07 Mar 22:48:18.026 * MASTER <-> SLAVE sync started
2437:S 07 Mar 22:48:18.029 # Error condition on socket for SYNC: Connection refused
2437:S 07 Mar 22:48:19.050 * Connecting to MASTER 192.168.10.131:6379
2437:S 07 Mar 22:48:19.053 * MASTER <-> SLAVE sync started
2437:S 07 Mar 22:48:19.055 # Error condition on socket for SYNC: Connection refused
2437:S 07 Mar 22:48:20.074 * Connecting to MASTER 192.168.10.131:6379
2437:S 07 Mar 22:48:20.077 * MASTER <-> SLAVE sync started
2437:S 07 Mar 22:48:20.079 # Error condition on socket for SYNC: Connection refused
2437:M 07 Mar 22:48:20.724 * Discarding previously cached master state.
2437:M 07 Mar 22:48:20.725 * MASTER MODE enabled (user request from 'id=7 addr=192.168.10.130:60991 fd=11 name=sentinel-7fbee913-cmd age=577 idle=0 flags=x db=0 sub=0 psub=0 multi=3 qbuf=0 qbuf-free=32768 obl=36 oll=0 omem=0 events=rw cmd=exec')
2437:M 07 Mar 22:48:20.745 # CONFIG REWRITE executed with success.
2437:M 07 Mar 22:48:20.796 * 1 changes in 900 seconds. Saving...
2437:M 07 Mar 22:48:20.870 * Background saving started by pid 2442
2442:C 07 Mar 22:48:20.915 * DB saved on disk
2442:C 07 Mar 22:48:20.915 * RDB: 4 MB of memory used by copy-on-write
2437:M 07 Mar 22:48:20.974 * Background saving terminated with success

3. Log in to Sentinel again to see which node is now the master:

[root@Sentinel_1 sentinel]# redis-cli -p 26379       
127.0.0.1:26379> INFO sentinel
# Sentinel
sentinel_masters:1
sentinel_tilt:0
sentinel_running_scripts:0
sentinel_scripts_queue_length:0
master0:name=master-6379,status=ok,address=192.168.10.132:6379,slaves=1,sentinels=3
127.0.0.1:26379> 

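As shown, the new master is now 192.168.10.132. A notification email was also sent after the switchover.

4. After the Redis instance that went down is started again, it is automatically added back as a slave: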
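
Applications generally should not hard-code the master address. One way to follow a switch is to react to the +switch-master event, whose payload has the form <master-name> <old-ip> <old-port> <new-ip> <new-port> (the same fields that appear in the Sentinel log above). The parser below is a minimal sketch: SwitchMaster and parse_switch_master are hypothetical names, and how the event reaches the application (for example through a Pub/Sub subscription to a Sentinel) is left out.

```python
from typing import NamedTuple, Tuple


class SwitchMaster(NamedTuple):
    """Old and new master address carried by a +switch-master event."""
    master_name: str
    old_address: Tuple[str, int]
    new_address: Tuple[str, int]


def parse_switch_master(payload: str) -> SwitchMaster:
    """Parse '<master-name> <old-ip> <old-port> <new-ip> <new-port>'."""
    parts = payload.split()
    if len(parts) != 5:
        raise ValueError("expected 5 fields, got %d" % len(parts))
    name, old_ip, old_port, new_ip, new_port = parts
    return SwitchMaster(name, (old_ip, int(old_port)), (new_ip, int(new_port)))


# Payload taken from the +switch-master line in the Sentinel_1 log.
event = parse_switch_master(
    "master-6379 192.168.10.131 6379 192.168.10.132 6379")
print(event.new_address)
```

On receiving such an event, a client would drop its connections to the old address and reconnect to the new one.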

[root@Redis_Master redis]# sh redis start
Starting Redis server...
[root@Redis_Master redis]# tail -f logs/redis_6379.log 
  `-._    `-._`-.__.-'_.-'    _.-'
      `-._    `-.__.-'    _.-'
          `-._        _.-'
              `-.__.-'

2050:M 07 Mar 22:55:21.357 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
2050:M 07 Mar 
