Redis Sentinel High-Availability Architecture
Posted by 陆炫志
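There are more and more high-availability architectures for Redis, which shows how quickly Redis has grown. Many companies now run Redis in production, so high availability matters a great deal. Existing options include keepalived+redis, redis cluster, twemproxy, and codis. This article focuses on the Redis Sentinel high-availability architecture.
Redis Sentinel provides the following main functions:
- Monitoring: it periodically checks whether the redis instances are running as expected.
- Notification: if a redis node runs into trouble, it can notify another process (for example, its clients).
- Automatic failover: when a master node becomes unavailable, it elects one of the master's slaves (if there is more than one) as the new master, and the other slave nodes change the master address they follow to the address of the promoted slave.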
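Sentinel is a monitor that decides which action to take based on the identity and state of the instances it watches. How does a Sentinel discover other Sentinels? Over its command connection, each Sentinel sends HELLO messages to the monitored master and slave servers. The message contains the Sentinel's IP, port, ID, and other fields, and announces the Sentinel's existence to the others. At the same time, each Sentinel receives the HELLO messages of other Sentinels over its subscription connection, and in this way discovers the other Sentinels that monitor the same master.
Sentinels also create command connections to each other for communication. Because the master and slave servers already act as intermediaries for sending and receiving HELLO messages, Sentinels do not create subscription connections to each other.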
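To make the discovery mechanism concrete, here is a minimal sketch that parses one HELLO message. It assumes the comma-separated payload that Sentinel publishes on the `__sentinel__:hello` channel (Sentinel IP, port, run ID, current epoch, master name, master IP, master port, master config epoch). The names `HelloMessage` and `parse_hello` are hypothetical, and the epoch values in the sample are made up for illustration.

```python
from typing import NamedTuple


class HelloMessage(NamedTuple):
    sentinel_ip: str
    sentinel_port: int
    sentinel_runid: str
    current_epoch: int
    master_name: str
    master_ip: str
    master_port: int
    master_config_epoch: int


def parse_hello(payload: str) -> HelloMessage:
    """Split a HELLO payload into its eight comma-separated fields."""
    fields = payload.split(",")
    if len(fields) != 8:
        raise ValueError("expected 8 fields, got %d" % len(fields))
    ip, port, runid, epoch, name, m_ip, m_port, m_epoch = fields
    return HelloMessage(ip, int(port), runid, int(epoch),
                        name, m_ip, int(m_port), int(m_epoch))


# Sample payload built from the addresses used in this article;
# the epoch values (0) are illustrative assumptions.
sample = ("192.168.10.128,26379,21e629e6d2b26682e660258787d5fb995010e6c8,0,"
          "master-6379,192.168.10.131,6379,0")
hello = parse_hello(sample)
print(hello.sentinel_ip, hello.sentinel_port, hello.master_name)
```

A Sentinel that receives such a message learns both who the sender is and which master the sender is watching, which is enough to open a command connection to it.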
[Figure: Redis Sentinel architecture diagram]
The number of Sentinel nodes should preferably be odd. For the reasons, see the following references:
http://segmentfault.com/a/1190000002680804
http://segmentfault.com/a/1190000002685515
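One common argument for an odd count is simple majority arithmetic: a failover has to be authorized by a majority of the Sentinel processes, so going from an odd count to the next even count adds a node without tolerating any more failures. The sketch below only illustrates that arithmetic; the function names are hypothetical.

```python
def majority(total_sentinels: int) -> int:
    """Smallest number of Sentinels that forms a strict majority."""
    return total_sentinels // 2 + 1


def tolerated_failures(total_sentinels: int) -> int:
    """How many Sentinels can fail while a majority is still reachable."""
    return total_sentinels - majority(total_sentinels)


# Compare small deployments: 3 and 4 Sentinels tolerate the same number of failures.
for n in range(2, 6):
    print("sentinels=%d majority=%d tolerated_failures=%d"
          % (n, majority(n), tolerated_failures(n)))
```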
Next we deploy and test Redis Sentinel. This experiment uses redis-3.0.7. Environment:
192.168.10.128 Sentinel_1
192.168.10.129 Sentinel_2
192.168.10.130 Sentinel_3
192.168.10.131 Redis_Master
192.168.10.132 Redis_Slave
I. Install redis-3.0.7 on each of the five servers, taking Sentinel_1 as an example:
[root@Sentinel_1 ~]# wget http://download.redis.io/releases/redis-3.0.7.tar.gz
[root@Sentinel_1 ~]# tar xf redis-3.0.7.tar.gz
[root@Sentinel_1 ~]# cd redis-3.0.7/src/
[root@Sentinel_1 ~]# make PREFIX=/data/service/redis install
After installation, a bin directory is created under /data/service/redis:
[root@Sentinel_1 ~]# ll /data/service/redis/
total 12
drwxr-xr-x. 2 root root 4096 Mar 7 19:19 bin
[root@Sentinel_1 ~]#
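On each of the five servers, optionally add the redis bin directory to PATH so the commands are easier to use. Edit /etc/profile.d/redis.sh (vim /etc/profile.d/redis.sh) and add the following: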
export PATH=/data/service/redis/bin:$PATH
Run source /etc/profile.d/redis.sh to apply the environment variable:
[root@Sentinel_1 ~]# source /etc/profile.d/redis.sh
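II. Configure the Redis master-slave environment. Setting up replication is straightforward, so the steps are not shown here.
Redis_Master: 192.168.10.131
Redis_Slave: 192.168.10.132
Startup log on Redis_Master: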
[root@Redis_Master redis]# tail -f logs/redis_6379.log
1974:M 07 Mar 22:03:05.381 * DB loaded from disk: 0.001 seconds
1974:M 07 Mar 22:03:05.381 * The server is now ready to accept connections on port 6379
1974:M 07 Mar 22:03:44.592 * Slave 192.168.10.132:6379 asks for synchronization
1974:M 07 Mar 22:03:44.593 * Full resync requested by slave 192.168.10.132:6379
1974:M 07 Mar 22:03:44.593 * Starting BGSAVE for SYNC with target: disk
1974:M 07 Mar 22:03:44.594 * Background saving started by pid 1977
1977:C 07 Mar 22:03:44.632 * DB saved on disk
1977:C 07 Mar 22:03:44.632 * RDB: 4 MB of memory used by copy-on-write
1974:M 07 Mar 22:03:44.649 * Background saving terminated with success
1974:M 07 Mar 22:03:44.650 * Synchronization with slave 192.168.10.132:6379 succeeded
Startup log on Redis_Slave:
[root@Redis_Slave redis]# tail -f logs/redis_6379.log
2437:S 07 Mar 22:03:44.246 * Connecting to MASTER 192.168.10.131:6379
2437:S 07 Mar 22:03:44.247 * MASTER <-> SLAVE sync started
2437:S 07 Mar 22:03:44.262 * Non blocking connect for SYNC fired the event.
2437:S 07 Mar 22:03:44.268 * Master replied to PING, replication can continue...
2437:S 07 Mar 22:03:44.269 * Partial resynchronization not possible (no cached master)
2437:S 07 Mar 22:03:44.270 * Full resync from master: 5d1fbf46ddd1eb0a7728abbbad61e78908dd7963:1
2437:S 07 Mar 22:03:44.326 * MASTER <-> SLAVE sync: receiving 34 bytes from master
2437:S 07 Mar 22:03:44.326 * MASTER <-> SLAVE sync: Flushing old data
2437:S 07 Mar 22:03:44.328 * MASTER <-> SLAVE sync: Loading DB in memory
2437:S 07 Mar 22:03:44.329 * MASTER <-> SLAVE sync: Finished with success
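The logs show that the master-slave environment is working correctly.
III. Configure Sentinel, with an explanation of the configuration file.
On the three Sentinel servers, create conf and log directories to hold the configuration file and the log: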
[root@Sentinel_1 ~]# mkdir -p /data/service/redis/sentinel/conf
[root@Sentinel_1 ~]# mkdir -p /data/service/redis/sentinel/log
Go to the conf directory and edit the file 26379.conf. The configuration is identical on all three Sentinel servers:
[root@Sentinel_1 conf]# pwd
/data/service/redis/sentinel/conf
[root@Sentinel_1 conf]# cat 26379.conf
port 26379
dir "/data/service/redis/sentinel"
daemonize yes
logfile "/data/service/redis/sentinel/log/sentinel.log"
# 6379
sentinel monitor master-6379 192.168.10.131 6379 2
sentinel down-after-milliseconds master-6379 15000
sentinel parallel-syncs master-6379 1
sentinel failover-timeout master-6379 180000
sentinel auth-pass master-6379 123456
sentinel client-reconfig-script master-6379 /data/script/python/notify.py
[root@Sentinel_1 conf]#
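Explanation of the 26379.conf configuration file:
1. The first four lines define basic Sentinel settings, much like those of redis, so they are not explained further.
2. sentinel monitor master-6379 192.168.10.131 6379 2 (this line tells Sentinel to monitor a master named master-6379 at 192.168.10.131:6379. The 2 means the master is considered truly unavailable only when at least 2 Sentinels in the cluster agree that it is down.)
3. down-after-milliseconds (Sentinel sends heartbeat PINGs to the master to check that it is alive. If the master does not reply with PONG within a "certain time range", or replies with an error, this Sentinel subjectively, that is on its own, considers the master unavailable. down-after-milliseconds specifies that "certain time range", in milliseconds.)
4. parallel-syncs (during a failover, this option sets the maximum number of slaves that may resynchronize with the new master at the same time. The smaller the number, the longer the failover takes to complete. The larger the number, the more slaves are unavailable at once because of replication. Setting the value to 1 ensures that only one slave at a time is unable to serve command requests.)
5. failover-timeout (Sentinels in the cluster all follow one rule: if Sentinel A has voted for Sentinel B to perform the failover, A waits for a period of time before trying on its own to fail over the same master, and that waiting period is derived from the failover-timeout option. It follows from this rule that Sentinels in the cluster never fail over the same master concurrently. If the first Sentinel to attempt the failover does not succeed, another one retries after the waiting period, and so on.)
6. auth-pass (this option is for master/slave setups that use password authentication. If no password was set when configuring replication, the option is not needed. If there is a password, specify the connection password here.)
7. client-reconfig-script (this parameter defines the failover script, which runs after a master failover to, for example, send an SMS or switch an IP.)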
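The interplay of down-after-milliseconds (item 3) and the quorum (item 2) can be modelled in a few lines. This is a deliberately simplified sketch rather than Sentinel's actual implementation: the function names and the per-Sentinel timestamps are hypothetical, and only the values 15000 and 2 come from the configuration above.

```python
def is_subjectively_down(last_valid_reply_ms: int, now_ms: int,
                         down_after_ms: int) -> bool:
    """One Sentinel's private view: no valid reply for longer than the limit."""
    return now_ms - last_valid_reply_ms > down_after_ms


def is_objectively_down(sdown_votes: int, quorum: int) -> bool:
    """Cluster view: enough Sentinels agree that the master is down."""
    return sdown_votes >= quorum


DOWN_AFTER_MS = 15000  # sentinel down-after-milliseconds master-6379 15000
QUORUM = 2             # last argument of the sentinel monitor line

# Hypothetical time (ms) at which each Sentinel last got a valid reply.
last_replies = {"Sentinel_1": 0, "Sentinel_2": 1000, "Sentinel_3": 14000}
now_ms = 16500

votes = sum(is_subjectively_down(t, now_ms, DOWN_AFTER_MS)
            for t in last_replies.values())
print("sdown votes:", votes)
print("objectively down:", is_objectively_down(votes, QUORUM))
```

In this sample, two of the three Sentinels have waited longer than 15 seconds, so the quorum of 2 is reached even though Sentinel_3 still considers the master healthy.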
The notify.py script that sends an e-mail after failover is based on this blog post: http://www.cnblogs.com/gomysql/p/5040847.html
#!/usr/bin/python
#coding:utf8
import sys
import time
import smtplib
import logging
from email.mime.text import MIMEText
from email.header import Header

# Recipients of the failover alert
alarm_mail = ['1111111111@qq.com']

def main():
    failover_time = time.strftime("%Y-%m-%d %H:%M:%S")
    # Log everything to a file and echo INFO and above to the console
    logging.basicConfig(level=logging.DEBUG,
                        format='%(asctime)s %(filename)s[line:%(lineno)d] %(levelname)s %(message)s',
                        datefmt='%Y-%m-%d %H:%M:%S',
                        filename='/data/service/redis/failover.log',
                        filemode='a')
    console = logging.StreamHandler()
    console.setLevel(logging.INFO)
    formatter = logging.Formatter('%(name)-12s: %(levelname)-8s %(message)s')
    console.setFormatter(formatter)
    logging.getLogger('').addHandler(console)

    # SMTP settings; fill in the account details before use
    mail_host = 'smtp.163.com'
    mail_port = 25
    mail_user = ''
    mail_pass = ''
    mail_send_from = ''

    def send_mail(to_list, sub, content):
        me = mail_send_from
        msg = MIMEText(content, _subtype='html', _charset='utf-8')
        msg['Subject'] = Header(sub, 'utf-8')
        msg['From'] = Header(me, 'utf-8')
        msg['To'] = ";".join(to_list)
        try:
            smtp = smtplib.SMTP()
            smtp.connect(mail_host, mail_port)
            smtp.login(mail_user, mail_pass)
            smtp.sendmail(me, to_list, msg.as_string())
            smtp.close()
            return True
        except Exception as error:
            logging.error("Failed to send mail: %s" % (error))
            return False

    # Sentinel passes: master-name role state from-ip from-port to-ip to-port
    try:
        master_name = sys.argv[1]
        role = sys.argv[2]
        from_ip = sys.argv[4]
        from_port = sys.argv[5]
        to_ip = sys.argv[6]
        to_port = sys.argv[7]
    except Exception as error:
        logging.error('Failed to read arguments from Sentinel: %s ' % (error))
        sys.exit(1)

    sub = 'redis %s failover' % (master_name)
    notify_message = "%s %s is failover end. sentinel find redis master %s:%s is down. failover to slave %s:%s" % (failover_time, master_name, from_ip, from_port, to_ip, to_port)
    # Only the Sentinel that led the failover sends the alert
    if role == 'leader':
        logging.info(notify_message)
        send_mail(alarm_mail, sub, notify_message)

if __name__ == "__main__":
    main()
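The script above reads its input by position (argv[1], argv[2], and argv[4] through argv[7]), skipping argv[3], which holds the state argument. A minimal sketch of the same parsing as a testable function follows. `ReconfigEvent` and `parse_reconfig_args` are hypothetical names, and the sample argument values, including the state string, are illustrative assumptions rather than captured output.

```python
from typing import List, NamedTuple


class ReconfigEvent(NamedTuple):
    master_name: str
    role: str
    state: str
    from_ip: str
    from_port: int
    to_ip: str
    to_port: int


def parse_reconfig_args(argv: List[str]) -> ReconfigEvent:
    """Map the positional arguments of a client-reconfig-script call to fields."""
    if len(argv) != 8:
        raise ValueError("expected 7 arguments after the script name, got %d"
                         % (len(argv) - 1))
    name, role, state, from_ip, from_port, to_ip, to_port = argv[1:8]
    return ReconfigEvent(name, role, state, from_ip, int(from_port),
                         to_ip, int(to_port))


# Illustrative argument vector, shaped like what Sentinel would pass.
sample_argv = ["notify.py", "master-6379", "leader", "failover",
               "192.168.10.131", "6379", "192.168.10.132", "6379"]
event = parse_reconfig_args(sample_argv)
print(event.role, event.from_ip, "->", event.to_ip)
```

Validating the argument count up front gives a clearer error than an IndexError raised halfway through the assignments.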
IV. Start the Sentinel service. There are two ways to start it:
Method 1:
redis-sentinel /path/to/sentinel.conf
Method 2:
redis-server /path/to/sentinel.conf --sentinel
I usually use the first method. Start Sentinel on each of the three Sentinel servers.
Startup log of the first server, Sentinel_1:
[root@Sentinel_1 sentinel]# redis-sentinel /data/service/redis/sentinel/conf/26379.conf
[root@Sentinel_1 sentinel]# tail -f log/sentinel.log
...
5153:X 07 Mar 22:37:16.290 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
5153:X 07 Mar 22:37:16.290 # Sentinel runid is 21e629e6d2b26682e660258787d5fb995010e6c8
5153:X 07 Mar 22:37:16.290 # +monitor master master-6379 192.168.10.131 6379 quorum 2
5153:X 07 Mar 22:37:17.330 * +slave slave 192.168.10.132:6379 192.168.10.132 6379 @ master-6379 192.168.10.131 6379
5153:X 07 Mar 22:38:29.406 * +sentinel sentinel 192.168.10.129:26379 192.168.10.129 26379 @ master-6379 192.168.10.131 6379
5153:X 07 Mar 22:38:45.024 * +sentinel sentinel 192.168.10.130:26379 192.168.10.130 26379 @ master-6379 192.168.10.131 6379
Startup log of the second server, Sentinel_2:
[root@Sentinel_2 sentinel]# redis-sentinel /data/service/redis/sentinel/conf/26379.conf
[root@Sentinel_2 sentinel]# tail -f log/sentinel.log
...
4647:X 07 Mar 22:38:27.570 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
4647:X 07 Mar 22:38:27.570 # Sentinel runid is f391228f430177d881464e908c683bfc73d61c24
4647:X 07 Mar 22:38:27.571 # +monitor master master-6379 192.168.10.131 6379 quorum 2
4647:X 07 Mar 22:38:28.582 * +slave slave 192.168.10.132:6379 192.168.10.132 6379 @ master-6379 192.168.10.131 6379
4647:X 07 Mar 22:38:29.218 * +sentinel sentinel 192.168.10.128:26379 192.168.10.128 26379 @ master-6379 192.168.10.131 6379
4647:X 07 Mar 22:38:45.200 * +sentinel sentinel 192.168.10.130:26379 192.168.10.130 26379 @ master-6379 192.168.10.131 6379
Startup log of the third server, Sentinel_3:
[root@Sentinel_3 sentinel]# redis-sentinel /data/service/redis/sentinel/conf/26379.conf
[root@Sentinel_3 sentinel]# tail -f log/sentinel.log
...
2115:X 07 Mar 22:38:43.161 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
2115:X 07 Mar 22:38:43.161 # Sentinel runid is 7fbee9138d4e5c1e2def7bbc4f888cef04d95677
2115:X 07 Mar 22:38:43.161 # +monitor master master-6379 192.168.10.131 6379 quorum 2
2115:X 07 Mar 22:38:44.167 * +slave slave 192.168.10.132:6379 192.168.10.132 6379 @ master-6379 192.168.10.131 6379
2115:X 07 Mar 22:38:44.818 * +sentinel sentinel 192.168.10.129:26379 192.168.10.129 26379 @ master-6379 192.168.10.131 6379
2115:X 07 Mar 22:38:44.851 * +sentinel sentinel 192.168.10.128:26379 192.168.10.128 26379 @ master-6379 192.168.10.131 6379
The whole Sentinel cluster is now up and running. We can log in to any Sentinel to check the current monitoring state:
[root@Sentinel_1 sentinel]# redis-cli -p 26379
127.0.0.1:26379> INFO sentinel
# Sentinel
sentinel_masters:1
sentinel_tilt:0
sentinel_running_scripts:0
sentinel_scripts_queue_length:0
master0:name=master-6379,status=ok,address=192.168.10.131:6379,slaves=1,sentinels=3
127.0.0.1:26379>
The status is status=ok, and slaves=1 shows that there is one slave node.
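For scripted health checks, the master0 line can be split into fields. A minimal sketch, assuming the key=value layout shown in the INFO sentinel output above; `parse_master_line` is a hypothetical helper name.

```python
from typing import Dict


def parse_master_line(line: str) -> Dict[str, str]:
    """Parse 'master0:name=...,status=ok,...' into a dict of its fields."""
    # Everything after the first colon is a comma-separated key=value list.
    _, _, body = line.partition(":")
    fields = {}
    for pair in body.split(","):
        key, _, value = pair.partition("=")
        fields[key] = value
    return fields


line = ("master0:name=master-6379,status=ok,"
        "address=192.168.10.131:6379,slaves=1,sentinels=3")
info = parse_master_line(line)
# The address itself contains a colon, so split host and port from the right.
host, _, port = info["address"].rpartition(":")
print(info["name"], info["status"], host, port, info["slaves"], info["sentinels"])
```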
V. Redis failure test
Test 1: Stop Redis_Master and see whether Sentinel promotes the surviving slave node to master.
[root@Redis_Master redis]# sh redis stop
Stopping ...
Waiting for Redis to shutdown ...
Redis stopped
[root@Redis_Master redis]#
1. Check the log of any Sentinel with tail -f log/sentinel.log:
5153:X 07 Mar 22:48:20.986 # +sdown master master-6379 192.168.10.131 6379
5153:X 07 Mar 22:48:21.047 # +odown master master-6379 192.168.10.131 6379 #quorum 2/2
5153:X 07 Mar 22:48:21.049 # +new-epoch 1
5153:X 07 Mar 22:48:21.050 # +try-failover master master-6379 192.168.10.131 6379
5153:X 07 Mar 22:48:21.053 # +vote-for-leader 21e629e6d2b26682e660258787d5fb995010e6c8 1
5153:X 07 Mar 22:48:21.057 # 192.168.10.130:26379 voted for 7fbee9138d4e5c1e2def7bbc4f888cef04d95677 1
5153:X 07 Mar 22:48:21.062 # 192.168.10.129:26379 voted for 7fbee9138d4e5c1e2def7bbc4f888cef04d95677 1
5153:X 07 Mar 22:48:22.441 # +config-update-from sentinel 192.168.10.130:26379 192.168.10.130 26379 @ master-6379 192.168.10.131 6379
5153:X 07 Mar 22:48:22.442 # +switch-master master-6379 192.168.10.131 6379 192.168.10.132 6379
5153:X 07 Mar 22:48:22.443 * +slave slave 192.168.10.131:6379 192.168.10.131 6379 @ master-6379 192.168.10.132 6379
5153:X 07 Mar 22:48:37.496 # +sdown slave 192.168.10.131:6379 192.168.10.131 6379 @ master-6379 192.168.10.132 6379
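The line that records the actual promotion in this log is the +switch-master event, which lists the master name followed by the old and new addresses. A minimal sketch for pulling that event out of a log line, assuming the line layout shown above; `find_switch` and the regular expression are hypothetical helpers, not part of Redis.

```python
import re
from typing import Optional, Tuple

# +switch-master <master-name> <old-ip> <old-port> <new-ip> <new-port>
SWITCH_RE = re.compile(
    r"\+switch-master (?P<name>\S+) (?P<old_ip>\S+) (?P<old_port>\d+) "
    r"(?P<new_ip>\S+) (?P<new_port>\d+)"
)


def find_switch(line: str) -> Optional[Tuple[str, str, str]]:
    """Return (master name, old address, new address), or None for other lines."""
    match = SWITCH_RE.search(line)
    if match is None:
        return None
    old_addr = "%s:%s" % (match.group("old_ip"), match.group("old_port"))
    new_addr = "%s:%s" % (match.group("new_ip"), match.group("new_port"))
    return match.group("name"), old_addr, new_addr


# Two lines copied from the Sentinel log above.
log_lines = [
    "5153:X 07 Mar 22:48:21.050 # +try-failover master master-6379 192.168.10.131 6379",
    "5153:X 07 Mar 22:48:22.442 # +switch-master master-6379 192.168.10.131 6379 192.168.10.132 6379",
]
for log_line in log_lines:
    print(find_switch(log_line))
```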
2. Then check the log on Redis_Slave:
2437:S 07 Mar 22:48:18.023 * Connecting to MASTER 192.168.10.131:6379
2437:S 07 Mar 22:48:18.026 * MASTER <-> SLAVE sync started
2437:S 07 Mar 22:48:18.029 # Error condition on socket for SYNC: Connection refused
2437:S 07 Mar 22:48:19.050 * Connecting to MASTER 192.168.10.131:6379
2437:S 07 Mar 22:48:19.053 * MASTER <-> SLAVE sync started
2437:S 07 Mar 22:48:19.055 # Error condition on socket for SYNC: Connection refused
2437:S 07 Mar 22:48:20.074 * Connecting to MASTER 192.168.10.131:6379
2437:S 07 Mar 22:48:20.077 * MASTER <-> SLAVE sync started
2437:S 07 Mar 22:48:20.079 # Error condition on socket for SYNC: Connection refused
2437:M 07 Mar 22:48:20.724 * Discarding previously cached master state.
2437:M 07 Mar 22:48:20.725 * MASTER MODE enabled (user request from 'id=7 addr=192.168.10.130:60991 fd=11 name=sentinel-7fbee913-cmd age=577 idle=0 flags=x db=0 sub=0 psub=0 multi=3 qbuf=0 qbuf-free=32768 obl=36 oll=0 omem=0 events=rw cmd=exec')
2437:M 07 Mar 22:48:20.745 # CONFIG REWRITE executed with success.
2437:M 07 Mar 22:48:20.796 * 1 changes in 900 seconds. Saving...
2437:M 07 Mar 22:48:20.870 * Background saving started by pid 2442
2442:C 07 Mar 22:48:20.915 * DB saved on disk
2442:C 07 Mar 22:48:20.915 * RDB: 4 MB of memory used by copy-on-write
2437:M 07 Mar 22:48:20.974 * Background saving terminated with success
3. Log in to Sentinel again to see which node is now the master:
[root@Sentinel_1 sentinel]# redis-cli -p 26379
127.0.0.1:26379> INFO sentinel
# Sentinel
sentinel_masters:1
sentinel_tilt:0
sentinel_running_scripts:0
sentinel_scripts_queue_length:0
master0:name=master-6379,status=ok,address=192.168.10.132:6379,slaves=1,sentinels=3
127.0.0.1:26379>
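As shown, the new master is now 192.168.10.132. An e-mail notification was also sent after the switch.
[Figure: e-mail notification received after the failover]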
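A client does not have to read INFO output to find the new master: Sentinel also answers the command SENTINEL get-master-addr-by-name with the current address. The sketch below shows only the wire format (RESP) of that request and of a two-element reply, without opening a socket. The helper names are hypothetical, and the sample reply is hand-written to match the state after the failover, not captured from a server.

```python
from typing import Tuple


def encode_command(*args: str) -> bytes:
    """Encode a command as a RESP array of bulk strings."""
    chunks = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg.encode("utf-8")
        chunks.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(chunks)


def decode_master_addr(reply: bytes) -> Tuple[str, int]:
    """Decode a RESP array of two bulk strings: [ip, port]."""
    parts = reply.split(b"\r\n")
    # Expected pieces: *2, $<len>, ip, $<len>, port, then a trailing empty piece.
    if len(parts) < 5 or parts[0] != b"*2":
        raise ValueError("unexpected reply: %r" % reply)
    return parts[2].decode("utf-8"), int(parts[4].decode("utf-8"))


request = encode_command("SENTINEL", "get-master-addr-by-name", "master-6379")
# Hand-written sample reply (an assumption, not real server output).
sample_reply = b"*2\r\n$14\r\n192.168.10.132\r\n$4\r\n6379\r\n"
print(request)
print(decode_master_addr(sample_reply))
```

In practice a client library with Sentinel support would normally handle this exchange; the sketch is only meant to show what goes over the wire.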
4. After the failed redis instance is started again, it is automatically added back with the slave role:
[root@Redis_Master redis]# sh redis start
Starting Redis server...
[root@Redis_Master redis]# tail -f logs/redis_6379.log
...
2050:M 07 Mar 22:55:21.357 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
2050:M 07 Mar