4 keepalived: highly available nginx load balancing


keepalived:

HTTP_GET        //keepalived checks back-end real server health with an HTTP GET request

SSL_GET (https)  //the same check over HTTPS, meaning the back end speaks HTTPS

TCP_CHECK       

The demonstration below uses TCP_CHECK for health checking.

# man keepalived.conf    //see the TCP_CHECK configuration section

 

# TCP healthchecker
TCP_CHECK
{
    # ======== generic connection options
    # Optional IP address to connect to.
    # The default is the realserver IP     //defaults to the real server's IP
    connect_ip <IP ADDRESS>     //may be omitted
    # Optional port to connect to
    # The default is the realserver port
    connect_port <PORT>         //may be omitted
    # Optional interface to use to
    # originate the connection
    bindto <IP ADDRESS>
    # Optional source port to
    # originate the connection from
    bind_port <PORT>
    # Optional connection timeout in seconds.
    # The default is 5 seconds
    connect_timeout <INTEGER>
    # Optional fwmark to mark all outgoing
    # checker packets with
    fwmark <INTEGER>

    # Optional random delay to start the initial check
    # for maximum N seconds.
    # Useful to scatter multiple simultaneous
    # checks to the same RS. Enabled by default, with
    # the maximum at delay_loop. Specify 0 to disable
    warmup <INT>
    # Retry count to make additional checks if check
    # of an alive server fails. Default: 1
    retry <INT>
    # Delay in seconds before retrying. Default: 1
    delay_before_retry <INT>
} #TCP_CHECK

 

# cd /etc/keepalived

# vim keepalived.conf   //configure this on both keepalived nodes

 

virtual_server 192.168.184.150 80 {    //multiple virtual server definitions can be merged here
    delay_loop 6
    lb_algo wrr
    lb_kind DR
    nat_mask 255.255.0.0
    protocol TCP
    sorry_server 127.0.0.1 80

    real_server 192.168.184.143 80 {
        weight 1
        TCP_CHECK {
            connect_timeout 3
        }
    }

    real_server 192.168.184.144 80 {
        weight 2
        TCP_CHECK {
            connect_timeout 3
        }
    }
}
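For context, the virtual_server block above only defines the IPVS service; the VIP itself is announced by a vrrp_instance section earlier in keepalived.conf. A minimal sketch of what that section typically looks like on the MASTER node, where the instance name and interface follow the systemctl status output below and the priority, router id and password are illustrative assumptions (the BACKUP node would use state BACKUP and a lower priority):

vrrp_instance VI_1 {
    state MASTER                  # BACKUP on the second keepalived node
    interface eth0                # interface that will carry the VIP
    virtual_router_id 51          # must be identical on both nodes
    priority 100                  # e.g. 96 on the backup node
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass s3cr3t          # same shared secret on both nodes
    }
    virtual_ipaddress {
        192.168.184.150/16 dev eth0
    }
}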

# systemctl restart keepalived

# systemctl status keepalived

● keepalived.service - LVS and VRRP High Availability Monitor
   Loaded: loaded (/usr/lib/systemd/system/keepalived.service; disabled; vendor preset: disabled)
   Active: active (running) since Thu 2018-12-13 23:11:06 CST; 1min 32s ago
  Process: 6233 ExecStart=/usr/sbin/keepalived $KEEPALIVED_OPTIONS (code=exited, status=0/SUCCESS)
 Main PID: 6234 (keepalived)
   CGroup: /system.slice/keepalived.service
           ├─6234 /usr/sbin/keepalived -D
           ├─6235 /usr/sbin/keepalived -D
           └─6236 /usr/sbin/keepalived -D

Dec 13 23:11:11 node1 Keepalived_healthcheckers[6235]: Check on service [192.168.184.144]:80 failed after 1 retry.
Dec 13 23:11:11 node1 Keepalived_healthcheckers[6235]: Removing service [192.168.184.144]:80 from VS [192.168.184.150]:80
Dec 13 23:11:11 node1 Keepalived_healthcheckers[6235]: Remote SMTP server [127.0.0.1]:25 connected.
Dec 13 23:11:11 node1 Keepalived_healthcheckers[6235]: SMTP alert successfully sent.
Dec 13 23:11:14 node1 Keepalived_vrrp[6236]: Sending gratuitous ARP on eth0 for 192.168.184.150
Dec 13 23:11:14 node1 Keepalived_vrrp[6236]: VRRP_Instance(VI_1) Sending/queueing gratuitous ARPs on eth0 for 192.168.184.150
Dec 13 23:11:14 node1 Keepalived_vrrp[6236]: Sending gratuitous ARP on eth0 for 192.168.184.150
Dec 13 23:11:14 node1 Keepalived_vrrp[6236]: Sending gratuitous ARP on eth0 for 192.168.184.150
Dec 13 23:11:14 node1 Keepalived_vrrp[6236]: Sending gratuitous ARP on eth0 for 192.168.184.150
Dec 13 23:11:14 node1 Keepalived_vrrp[6236]: Sending gratuitous ARP on eth0 for 192.168.184.150    //gratuitous ARP broadcasts announcing that the VIP has been added
You have new mail in /var/spool/mail/root


Example:

HTTP_GET {
    url {
        path /
        status_code 200
    }
    connect_timeout 3
    nb_get_retry 3
    delay_before_retry 3
}

 

TCP_CHECK {
    connect_timeout 3
}

 

HA Services:

nginx

 

 

100: -25

96: -20 79 --> 99 --> 79
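The figures above read like VRRP priority-adjustment notes (a node's base priority minus the weight of a failed check). A hedged sketch of the common pattern for keeping nginx highly available with keepalived: a vrrp_script that watches the nginx process and lowers this node's priority when the check fails (the script, interval and weight are illustrative assumptions):

vrrp_script chk_nginx {
    script "killall -0 nginx"     # exits 0 as long as an nginx process exists
    interval 2                    # run the check every 2 seconds
    weight -25                    # subtract 25 from the node's priority on failure
}

# then reference it inside the existing vrrp_instance block:
#     track_script {
#         chk_nginx
#     }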

 

Blog assignment:

keepalived providing high availability for ipvs

nginx

 

active/active

 

Linux HA Cluster

 

LB, HA, HP, hadoop

LB:

Transport layer: lvs

Application layer: nginx, haproxy, httpd, perlbal, ats, varnish

HA:

vrrp: keepalived

AIS: heartbeat, OpenAIS, corosync/pacemaker, cman/rgmanager(conga) RHCS

 

HA:

Failure scenarios:

Hardware failures:

design flaws

natural wear-out from prolonged use

human-caused damage

…… ……

Software failures:

design flaws

bugs

human error (misoperation)

……

 

A=MTBF/(MTBF+MTTR)

MTBF: Mean Time Between Failure

MTTR: Mean Time To Repair

 

0 < A < 1, usually expressed as a percentage

90%, 95%, 99%

99.9%, 99.99%, 99.999%
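Worked example (figures are illustrative): with MTBF = 999 hours and MTTR = 1 hour, A = 999 / (999 + 1) = 99.9%, which still allows roughly 8.8 hours of downtime per year; 99.999% ("five nines") allows only about 5 minutes per year.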

 

Provide redundancy:

 

network partition: vote system

Isolation (fencing):

STONITH: Shoot The Other Node In The Head, node-level fencing

Fence: resource-level fencing

 

failover domain:

fda: node1, node5

fdb: node2, node5

fdc: node3, node5

fdd: node4, node5

 

Resource constraints:

Location constraints: a resource's preference for particular nodes;

Colocation constraints: whether resources tend to run on the same node as one another;

Order constraints: start-order dependencies among multiple resources;

 

vote system:

The majority rules: quorum

> total/2

with quorum: holds more than half of the votes

without quorum: does not hold more than half of the votes

 

Two nodes (or any even number of nodes):

Ping node

qdisk
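For example, in a five-node cluster a partition needs at least 3 votes (> 5/2) to keep quorum; in a two-node cluster a 1:1 split leaves neither side with more than total/2 votes, which is why an external tie-breaker such as a ping node or a qdisk is used.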

 

 

failover

failback

 

Messaging Layer:

heartbeat

v1

v2

v3

corosync

cman

 

Cluster Resource Manager (CRM):

heartbeat v1: haresources (configuration interface: the haresources config file)

heartbeat v2: crm (runs a crmd daemon (5560/tcp) on every node; command-line interface: crmsh; GUI: hb_gui)

heartbeat v3, pacemaker (configuration interfaces: crmsh, pcs; GUI: hawk (SUSE), LCMC, pacemaker-gui)

rgmanager (configuration interfaces: cluster.conf, system-config-cluster, conga (web GUI), cman_tool, clustat)

 

Common combinations:

heartbeat v1 (haresources)

heartbeat v2 (crm)

heartbeat v3 + pacemaker

corosync + pacemaker

corosync v1 + pacemaker (plugin)

corosync v2 + pacemaker (standalone service)

 

cman + rgmanager

corosync v1 + cman + pacemaker

 

RHCS: Red Hat Cluster Suite

RHEL5: cman + rgmanager + conga (ricci/luci)

RHEL6: cman + rgmanager + conga (ricci/luci)

corosync + pacemaker

corosync + cman + pacemaker

RHEL7: corosync + pacemaker

 

Resource Agent:

service: scripts under the /etc/ha.d/haresources.d/ directory;

LSB: scripts under the /etc/rc.d/init.d/ directory;

OCF: Open Cluster Framework

provider:

STONITH:

Systemd:

 

Resource types:

primitive: a primary (primitive) resource; only one instance runs in the cluster;

clone: a cloned resource; multiple instances can run in the cluster;

anonymous clones, globally unique clones, stateful clones (active/passive)

multi-state (master/slave): a special implementation of clone resources; multi-state resources;

group: a resource group;

started or stopped as a unit;

resource monitoring

dependencies:

 

Resource attributes:

priority: the resource's priority;

target-role: started, stopped, master;

is-managed: whether the cluster is allowed to manage this resource;

resource-stickiness: how strongly the resource prefers to stay on its current node;

allow-migrate: whether migration is allowed;

 

Constraints: expressed as a score

Location constraints: a resource's preference for particular nodes;

(-oo, +oo):

any score + infinity = infinity

any score + negative infinity = negative infinity

infinity + negative infinity = negative infinity

Colocation constraints: whether resources tend to run on the same node as one another;

(-oo, +oo)

Order constraints: start-order dependencies among multiple resources;

(-oo, +oo)

Mandatory
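A hedged crmsh illustration of the three constraint types, using the web resource names that appear later in these notes (constraint ids and scores are illustrative assumptions):

crm(live)configure# location webip_on_node1 webip 100: node1
crm(live)configure# colocation webserver_with_webip inf: webserver webip
crm(live)configure# order webip_before_webserver Mandatory: webip webserver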

 

Installation and configuration:

CentOS 7: corosync v2 + pacemaker

corosync v2: vote system

pacemaker: runs as a standalone service

 

Full-lifecycle cluster management tools:

pcs: agent(pcsd)

crmsh: agentless (pssh)

 

Prerequisites for configuring a cluster:

(1) time synchronization;

(2) nodes can reach one another by the hostnames they are currently using;

(3) decide whether a quorum (arbitration) device will be used;
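A hedged sketch of the usual preparation on each node (the NTP source, hostnames and addresses are illustrative assumptions):

# ntpdate cn.pool.ntp.org          # one-shot time sync; or keep chronyd/ntpd running
# cat >> /etc/hosts << EOF
172.16.100.67 node1
172.16.100.68 node2
EOF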

 

web service:

vip: 172.16.100.91

httpd

 

Review: AIS HA

Messaging Layer:

heartbeat v1, v2, v3

corosync v1, v2(votequorum)

OpenAIS

CRM:

pacemaker

configuration interfaces: crmsh (agentless), pssh

pcs (agent), pcsd

conga(ricci/luci)

 

group, constraint

 

rgmanager(cman)

resource group:

failover domain

 

Configuration:

Global properties: property, stonith-enabled, and so on;

Highly available services: resources, managed through RAs

 

RA:

LSB: /etc/rc.d/init.d/

systemd: units under /etc/systemd/system/multi-user.target.wants/

services that are in the enabled state;

OCF: [provider]

heartbeat

pacemaker

linbit

service

stonith

 

Available HA cluster solutions:

heartbeat v1

heartbeat v2

heartbeat v3 + pacemaker X

corosync + pacemaker

cman + rgmanager

corosync + cman + pacemaker

 

corosync + pacemaker

keepalived

 

HA Cluster(2)

 

Heartbeat message delivery:

Unicast, udpu

Multicast, udp

Broadcast

 

Multicast addresses: identify an IP multicast group; IANA reserves the class D range for multicast: 224.0.0.0-239.255.255.255

Permanent (well-known) multicast addresses: 224.0.0.0-224.0.0.255

Transient multicast addresses: 224.0.1.0-238.255.255.255

Administratively scoped (local) multicast addresses: 239.0.0.0-239.255.255.255

 

Example configuration file:

 

totem {
    version: 2

    crypto_cipher: aes128
    crypto_hash: sha1
    secauth: on

    interface {
        ringnumber: 0
        bindnetaddr: 172.16.0.0
        mcastaddr: 239.185.1.31
        mcastport: 5405
        ttl: 1
    }
}

nodelist {
    node {
        ring0_addr: 172.16.100.67
        nodeid: 1
    }
    node {
        ring0_addr: 172.16.100.68
        nodeid: 2
    }
    node {
        ring0_addr: 172.16.100.69
        nodeid: 3
    }
}

logging {
    fileline: off
    to_stderr: no
    to_logfile: yes
    logfile: /var/log/cluster/corosync.log
    to_syslog: no
    debug: off
    timestamp: on
    logger_subsys {
        subsys: QUORUM
        debug: off
    }
}

quorum {
    provider: corosync_votequorum
}
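Once this file is in place on every node (secauth is on, so a matching /etc/corosync/authkey generated with corosync-keygen is also required), a hedged sketch of starting the stack and verifying membership:

# systemctl start corosync.service pacemaker.service
# corosync-cfgtool -s                      # ring status of the local node
# corosync-cmapctl | grep members          # runtime membership list
# crm_mon -1                               # one-shot view of the pacemaker cluster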

 

HA Web Service:

vip: 172.16.100.92, ocf:heartbeat:IPaddr

httpd: systemd

nfs shared storage: ocf:heartbeat:Filesystem
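A hedged crmsh sketch of these three resources grouped so they always run together (the NFS server address, export path and mount point are illustrative assumptions):

# crm configure
crm(live)configure# primitive webip ocf:heartbeat:IPaddr params ip=172.16.100.92
crm(live)configure# primitive webstore ocf:heartbeat:Filesystem params device="172.16.100.10:/web/htdocs" directory="/var/www/html" fstype="nfs"
crm(live)configure# primitive webserver systemd:httpd
crm(live)configure# group webservice webip webstore webserver
crm(live)configure# verify
crm(live)configure# commit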

 

HA cluster working models:

A/P: two-node cluster; active/passive;

no-quorum-policy={stop|ignore|suicide|freeze}

A/A: dual-active (active/active) model

N-M: N nodes, M services, N > M;

N-N: N nodes, N services;

 

network partition:

split-brain: extremely dangerous when block-level shared storage is involved;

vote quorum:

with quorum > total/2

without quorum <= total/2

stop

ignore

suicide

freeze

 

CAP:

C: consistency

A: availability

P: partition tolerance

 

webip, webstore, webserver

node1: 100 + 0 + 0

node2: 0 + 0 + 0

node3: 0 + 0 + 0

 

node2: 50+50+50

 

start order: A --> B --> C

stop order: C --> B --> A

 

pcs:
    cluster
        auth
        setup
    resource
        describe
        list
        create
        delete
    constraint
        colocation
        order
        location
    property
        list
        set
    status
    config
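Putting the subcommands above together, a hedged pcs walk-through for a web service of the same shape (node names, cluster name, and disabling stonith in a lab without fencing hardware are illustrative assumptions):

# pcs cluster auth node1 node2 -u hacluster
# pcs cluster setup --name webcluster node1 node2
# pcs cluster start --all
# pcs property set stonith-enabled=false            # lab only: no fencing devices available
# pcs resource create webip ocf:heartbeat:IPaddr ip=172.16.100.92
# pcs resource create webserver systemd:httpd
# pcs constraint colocation add webserver with webip INFINITY
# pcs constraint order webip then webserver
# pcs status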

 

Blog assignment:

(1) Manual configuration, multicast: corosync + pacemaker + crmsh; build a highly available MySQL cluster whose datadir points to an NFS-exported path;

(2) pcs/pcsd, unicast: corosync + pacemaker; build a highly available web cluster;

 

Unicast configuration example:

Some environments may not support multicast. In that case Corosync should be configured to use unicast; below is part of a Corosync configuration file that uses unicast:

 

totem {
    #...
    interface {
        ringnumber: 0
        bindnetaddr: 192.168.42.0
        broadcast: yes
        mcastport: 5405
    }
    interface {
        ringnumber: 1
        bindnetaddr: 10.0.42.0
        broadcast: yes
        mcastport: 5405
    }
    transport: udpu
}

nodelist {
    node {
        ring0_addr: 192.168.42.1
        ring1_addr: 10.0.42.1
        nodeid: 1
    }
    node {
        ring0_addr: 192.168.42.2
        ring1_addr: 10.0.42.2
        nodeid: 2
    }
}

 

If broadcast is set to yes, the cluster heartbeat is carried over broadcast. When this parameter is set, mcastaddr must not be set.

 

The transport directive determines how the cluster communicates. To disable multicast entirely, configure the unicast transport udpu. This requires listing every node in nodelist, which means the cluster membership must be decided before the HA cluster is deployed. The default is udp; the supported transport types also include udpu and iba.

 

Under nodelist you can define settings that apply only to a specific node. These options may only appear inside a node block, i.e. they can only be set for servers that belong to the cluster, and should only include parameters that differ from the defaults. Every server must have ring0_addr configured.
