High-Availability Clusters with Corosync + Pacemaker: Building an HA Cluster with the crm Command and an NFS Server

Posted 2020-08-19


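Red Hat Enterprise Linux 5 used OpenAIS as the underlying cluster messaging API, with CMAN on top of it as the messaging layer and rgmanager as the CRM (cluster resource manager) handling resources.

Corosync's messaging mechanism is better designed than Heartbeat's.

Red Hat Enterprise Linux 6 uses Corosync directly as the cluster's messaging layer.

Different vendors' APIs differ in the libraries they call, their function types and how they return results, so a standard is needed to keep those APIs as compatible as possible.

It is the same reason a mouse from another company works fine with an ASUS motherboard.

The Application Interface Specification (AIS) is a collection of open specifications that define application programming interfaces (APIs). Acting as middleware, these interfaces give application services an open, highly portable programming interface; using the AIS APIs reduces application complexity and development time.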

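OpenAIS components: CLM, CKPT, EVT, LCK, MSG, ...

OpenAIS releases: Picacho, Whitetank and Wilson, of which Wilson is the newest.

Corosync is the open cluster engine project that was split out of OpenAIS when it reached the Wilson release.

From version 0.9 onward, OpenAIS was divided into Wilson and Corosync.

Corosync itself is only a cluster engine that handles the cluster's messaging, i.e. it serves as the messaging layer; it has no resource-management capability of its own, so the CRM role has to be filled by Pacemaker. Pacemaker is a project that was split off from Heartbeat (around the Heartbeat v3 reorganisation), and since the split its development has focused on Corosync rather than Heartbeat v3.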

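A Corosync-based cluster can be configured entirely from the command line, although a number of graphical tools exist as well.

Corosync is the low-level messaging layer of the HA cluster. It talks to the layer above it and carries both the heartbeat and whatever transaction messages the upper layer needs to send. To avoid the problems that follow a split brain, there is also the concept of quorum. Here we install version 1.4, which tallies the cluster's votes with one vote per node; from 2.x on there is a voting feature (votequorum) that lets you set how many votes a node holds. The vote count is then handed to the CRM layer, which decides whether this partition of the cluster should keep running. Look up the remaining concepts yourself; I do not know this area well either.
Pacemaker is the CRM (Cluster Resource Manager) layer of the HA cluster. It is a service and can be started on its own, but with the corosync-1.4 release used here Corosync can be configured to start Pacemaker.
Pacemaker can be configured from any node with crmsh or pcs, or with one of several GUI tools. crmsh no longer seems to ship with Red Hat after 6.4; the official tool is pcs. crmsh appears to be developed by openSUSE.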
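
The majority rule behind quorum can be written down in a few lines of bash. This is only a sketch of the arithmetic, not of how Corosync or Pacemaker implement it; the function name `has_quorum` is made up for illustration, and it assumes "quorum" means strictly more than half of the expected votes.

```shell
#!/usr/bin/env bash
# has_quorum TOTAL_VOTES PRESENT_VOTES   (hypothetical helper, illustration only)
# Succeeds (exit status 0) only when the present votes are strictly more
# than half of the total expected votes.
has_quorum() {
    local total=$1 present=$2
    (( present * 2 > total ))
}

# Two-node cluster, one vote each: losing either node loses quorum.
if has_quorum 2 1; then echo "2 votes expected, 1 present: quorum"
else echo "2 votes expected, 1 present: no quorum"; fi

# Three-node cluster: one node can fail and the other two still hold a majority.
if has_quorum 3 2; then echo "3 votes expected, 2 present: quorum"
else echo "3 votes expected, 2 present: no quorum"; fi
```

This is why a two-node cluster such as the one built below reports "partition WITHOUT quorum" as soon as one node's cluster service is stopped.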

Corosync website: www.corosync.org

OpenAIS website: www.openais.org

Pacemaker website: www.clusterlabs.org

The possible combinations of messaging layer and CRM are therefore:

1 haresources + heartbeat v1/v2

2 crm + heartbeat v2

3 pacemaker + corosync

4 pacemaker + heartbeat v3

5 cman + rgmanager

Today we use Pacemaker + Corosync to define and manage a cluster service.

They can be installed from rpm packages, built from source, or installed directly with yum.

________________________________________________________________________________________________________

192.168.139.2

[[email protected] ~]# ntpdate cn.ntp.org.cn //sync the clock over NTP; I picked a public NTP server for the China region

[[email protected] .ssh]# ssh-keygen -t rsa -P '' //set up mutual SSH trust between the two hosts

[[email protected] .ssh]# ssh-copy-id -i ./id_rsa.pub [email protected]

[[email protected] html]# uname -n //name of this node

www.rs1.com

[[email protected] mysql]# yum install corosync pacemaker //install directly with yum

________________________________________________________________________________________________________

192.168.139.4

[[email protected] ~]# ntpdate cn.ntp.org.cn

[[email protected] .ssh]# ssh-keygen -t rsa -P ''

[[email protected] .ssh]# ssh-copy-id -i ./id_rsa.pub [email protected]

[[email protected] html]# uname -n

www.rs2.com

[[email protected] mysql]# yum install corosync pacemaker

Installed:

corosync.x86_64 0:1.4.7-5.el6 pacemaker.x86_64 0:1.1.14-8.el6_8.1

Dependency Installed:

clusterlib.x86_64 0:3.0.12.1-78.el6 corosynclib.x86_64 0:1.4.7-5.el6 libibverbs.x86_64 0:1.1.8-4.el6 libqb.x86_64 0:0.17.1-2.el6 librdmacm.x86_64 0:1.0.21-0.el6 lm_sensors-libs.x86_64 0:3.1.1-17.el6

net-snmp-libs.x86_64 1:5.5-57.el6_8.1 pacemaker-cli.x86_64 0:1.1.14-8.el6_8.1

pacemaker-cluster-libs.x86_64 0:1.1.14-8.el6_8.1

pacemaker-libs.x86_64 0:1.1.14-8.el6_8.1 pciutils.x86_64 0:3.1.10-4.el6 rdma.noarch 0:6.8_4.1-1.el6

[[email protected] mysql]# rpm -ql corosync

/etc/corosync //this directory holds Corosync's configuration files

/etc/corosync/corosync.conf.example //sample Corosync configuration file

/usr/sbin/corosync-keygen //command for generating the authentication key

[[email protected] mysql]# cd /etc/corosync

[[email protected] corosync]# ll

total 16

-rw-r--r--. 1 root root 2663 May 11 2016 corosync.conf.example

[[email protected] corosync]# cp corosync.conf.example corosync.conf

[[email protected] corosync]# vim corosync.conf

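# Please read the corosync.conf.5 manual page

compatibility: whitetank

totem {
    # configuration file format version
    version: 2
    # authentication of cluster traffic; it is off here. When it is on, it costs
    # noticeable CPU (especially with aisexec) and every node needs the same authkey
    secauth: off
    # number of threads, chosen from the CPU and core count; meaningless while secauth is off
    threads: 0
    interface {
        # ring number for redundant rings; with one NIC per node leave it at the default 0
        ringnumber: 0
        # the NIC's network address, not the host's IP address
        bindnetaddr: 192.168.139.0
        # multicast address that carries the heartbeat traffic
        mcastaddr: 239.255.1.1
        # port used for multicast
        mcastport: 5405
        ttl: 1
    }
}

logging {
    # whether to print the source file and line in log messages
    fileline: off
    # whether to send errors to standard error; better left off
    to_stderr: no
    # whether to write to a log file
    to_logfile: yes
    # location of the dedicated log file; create this directory yourself
    logfile: /var/log/cluster/corosync.log
    # whether to log to syslog; enabling one of to_syslog and to_logfile is enough
    to_syslog: no
    # whether to enable debug output
    debug: off
    # print timestamps; helps locate errors, but each record costs a system call and some CPU
    timestamp: on
    logger_subsys {
        # logging for the AMF subsystem; not needed unless OpenAIS is in use
        subsys: AMF
        debug: off
    }
}

amf {
    # programming-related; can be left unset
    mode: disabled
}

service {
    ver: 0
    # have Corosync start Pacemaker
    name: pacemaker
}

# this section is optional
aisexec {
    user: root
    group: root
}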
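
Since `bindnetaddr` wants the network address rather than the node's own IP, here is a small bash sketch that derives one from the other. The function name `network_address` is hypothetical, and the sketch assumes plain dotted-quad IPv4 input without leading zeros and a CIDR prefix length.

```shell
#!/usr/bin/env bash
# network_address IP PREFIX   (hypothetical helper, illustration only)
# Prints the IPv4 network address for the given host address and prefix length.
network_address() {
    local ip=$1 prefix=$2 a b c d
    IFS=. read -r a b c d <<< "$ip"
    # Pack the four octets into one 32-bit number.
    local ipnum=$(( (a << 24) | (b << 16) | (c << 8) | d ))
    # Build the netmask from the prefix length and keep it to 32 bits.
    local mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
    local net=$(( ipnum & mask ))
    printf '%d.%d.%d.%d\n' \
        $(( (net >> 24) & 255 )) $(( (net >> 16) & 255 )) \
        $(( (net >> 8) & 255 )) $(( net & 255 ))
}

# The node 192.168.139.4 with a /24 mask sits on network 192.168.139.0,
# which is the value used for bindnetaddr above.
network_address 192.168.139.4 24
```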

___________________________________________________________________________________________

[[email protected] ~]# corosync-keygen //generate the communication key, saved as /etc/corosync/authkey

Writing corosync key to /etc/corosync/authkey

[[email protected] cluster]# corosync-keygen //

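The key is generated from /dev/random, so on a freshly installed system that has seen little activity there may not be enough entropy, and a prompt like the one below can appear. You have to type on the local keyboard; typing over an SSH session does not seem to help.
Gathering 1024 bits for key from /dev/random.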
Press keys on your keyboard to generate entropy.
Press keys on your keyboard to generate entropy (bits = 240).

[[email protected] ~]# cd /etc/corosync/

[[email protected] cluster]# scp /etc/corosync/corosync.conf 192.168.139.2:/etc/corosync/ //copy the file to the other node

[[email protected] ~]# service corosync start //start corosync on this node

[[email protected] ~]# ssh 192.168.139.2 service corosync start //start corosync on the other node

__________________________________________________________________________________________

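//Check the log for errors during startup. The message below says the Pacemaker plugin for Corosync is not supported in this environment and points to using Pacemaker with CMAN instead. I searched online without finding more, but the whole lab still completed, so it does not seem to be a serious error.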

[[email protected] cluster]# grep ERROR: /var/log/cluster/corosync.log

Nov 11 15:05:10 www corosync[3470]: [pcmk ] ERROR: process_ais_conf: You have configured a cluster using the Pacemaker plugin for Corosync. The plugin is not supported in this environment and will be removed very soon.

Nov 11 15:05:10 www corosync[3470]: [pcmk ] ERROR: process_ais_conf: Please see Chapter 8 of 'Clusters from Scratch' (http://www.clusterlabs.org/doc) for details on using Pacemaker with CMAN

__________________________________________________________________________________________

[[email protected] ~]# grep -e "Corosync Cluster Engine" -e "configuration file" /var/log/cluster/corosync.log //check that the corosync engine started normally

Nov 11 16:34:19 corosync [MAIN ] Corosync Cluster Engine ('1.4.7'): started and ready to provide service.

Nov 11 16:34:19 corosync [MAIN ] Successfully read main configuration file '/etc/corosync/corosync.conf'.

Nov 11 16:34:19 [1908] www.rs2.com cib: info: retrieveCib:Reading cluster configuration file /var/lib/pacemaker/cib/cib.xml (digest: /var/lib/pacemaker/cib/cib.xml.sig)

Nov 11 16:34:19 [1908] www.rs2.com cib: info: cib_file_write_with_digest:Reading cluster configuration file /var/lib/pacemaker/cib/cib.DU5D4x (digest: /var/lib/pacemaker/cib/cib.zBJmL2)

__________________________________________________________________________________________

[[email protected] ~]# grep TOTEM /var/log/cluster/corosync.log //check that the initial membership notifications look normal

Nov 11 16:34:07 corosync [TOTEM ] Initializing transport (UDP/IP Multicast).

Nov 11 16:34:07 corosync [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).

Nov 11 16:34:08 corosync [TOTEM ] The network interface [192.168.139.4] is now up.

Nov 11 16:34:08 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.

__________________________________________________________________________________________

[[email protected] ~]# grep error /var/log/cluster/corosync.log //check for errors during startup. These come from having no STONITH device configured and can be ignored; later they are silenced with the crm command property stonith-enabled=false

Nov 11 16:34:32 [2174] www.rs2.com pengine: error: unpack_resources:Resource start-up disabled since no STONITH resources have been defined

Nov 11 16:34:32 [2174] www.rs2.com pengine: error: unpack_resources:Either configure some or disable STONITH with the stonith-enabled option

Nov 11 16:34:32 [2174] www.rs2.com pengine: error: unpack_resources:NOTE: Clusters with shared data need STONITH to ensure data integrity

___________________________________________________________________________________________

[[email protected] ~]# grep pcmk_startup /var/log/cluster/corosync.log //check that pacemaker started normally

Nov 11 16:34:08 corosync [pcmk ] info: pcmk_startup: CRM: Initialized

Nov 11 16:34:08 corosync [pcmk ] Logging: Initialized pcmk_startup

Nov 11 16:34:08 corosync [pcmk ] info: pcmk_startup: Maximum core file size is: 18446744073709551615

Nov 11 16:34:08 corosync [pcmk ] info: pcmk_startup: Service: 9

Nov 11 16:34:08 corosync [pcmk ] info: pcmk_startup: Local hostname:www.rs2.com
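
The three grep checks above can be bundled into one helper. This is a rough sketch: the function name `check_corosync_log` is invented, and it only tests that the same strings grepped for above appear somewhere in the log, not that startup was free of errors.

```shell
#!/usr/bin/env bash
# check_corosync_log LOGFILE   (hypothetical helper, illustration only)
# Returns 0 when the engine start line, TOTEM lines and pcmk_startup lines
# are all present in the log; prints what is missing otherwise.
check_corosync_log() {
    local log=$1 missing=0
    grep -q "Corosync Cluster Engine" "$log" || { echo "engine start line missing"; missing=1; }
    grep -q "TOTEM" "$log"                   || { echo "no TOTEM lines found"; missing=1; }
    grep -q "pcmk_startup" "$log"            || { echo "no pcmk_startup lines found"; missing=1; }
    return "$missing"
}

# Example call on a cluster node (path taken from the logging section of corosync.conf):
#   check_corosync_log /var/log/cluster/corosync.log && echo "startup looks normal"
```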

___________________________________________________________________________________________

[[email protected] ~]# crm_mon //monitors the current state of the cluster

Last updated: Fri Nov 11 16:19:10 2016 Last change: Fri Nov 11 16:10:18 2016 by hacluster via crmd on www.rs2.com

Stack: classic openais (with plugin)

Current DC: www.rs2.com (version 1.1.14-8.el6_8.1-70404b0) - partition WITHOUT quorum

2 nodes and 0 resources configured, 2 expected votes

//Two nodes, 0 resources, but for some reason rs1 shows as UNCLEAN (offline)

Node www.rs1.com: UNCLEAN (offline)

Online: [ www.rs2.com ]

//After stopping everything, regenerating the corosync configuration file and starting again, it was fine

[[email protected] .ssh]# crm_mon

Last updated: Fri Oct 28 21:29:51 2016 Last change: Fri Nov 11 22:33:32 2016 by hacluster via crmd on www.rs1.com

Stack: classic openais (with plugin)

Current DC: www.rs1.com (version 1.1.14-8.el6_8.1-70404b0) - partition with quorum

2 nodes and 0 resources configured, 2 expected votes

Online: [ www.rs1.com www.rs2.com ] //both nodes are healthy

__________________________________________________________________________________________

Configuring cluster resources with the crm command

[[email protected] ~]# crm

-bash: crm: command not found

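[[email protected] ~]# rpm -qa pacemaker //pacemaker is version 1.1.14

pacemaker-1.1.14-8.el6_8.1.x86_64

Starting with pacemaker 1.1.8, crm became a separate project called crmsh. In other words, installing pacemaker does not give us the crm command; to manage cluster resources we have to install crmsh separately. The crmsh rpm can be downloaded from:

http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/CentOS_CentOS-6/x86_64/

crmsh depends on several packages such as pssh, so pssh.rpm has to be downloaded from the same address. corosync and pacemaker are available there too, but I installed those directly with yum.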


https://build.opensuse.org/package/binary/network:ha-clustering:Stable/crmsh?arch=x86_64&filename=crmsh-2.3.2-1.1.noarch.rpm&repository=RedHat_RHEL-6

Alternatively, download openSUSE's HA clustering yum repo file and install from it:

[[email protected] tool]# wget http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/CentOS_CentOS-6/network:ha-clustering:Stable.repo

It defines a single yum repository:

[network_ha-clustering_Stable]
name=Stable High Availability/Clustering packages (CentOS_CentOS-6)
type=rpm-md
baseurl=http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/CentOS_CentOS-6/
gpgcheck=1
gpgkey=http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/CentOS_CentOS-6//repodata/repomd.xml.key
enabled=1

[[email protected] tool]# mv network\:ha-clustering\:Stable.repo /etc/yum.repos.d/

[[email protected] yum.repos.d]# ll //all the yum repos on my host

total 52

-rw-r--r--. 1 root root CentOS-Base.repo

-rw-r--r--. 1 root root CentOS-Debuginfo.repo

-rw-r--r--. 1 root root 2015 CentOS-fasttrack.repo

-rw-r--r--. 1 root root 2015 CentOS-Media.repo

-rw-r--r--. 1 root root 2015 CentOS-Vault.repo

-rw-r--r--. 1 root root 2014 elrepo.repo

-rw-r--r--. 1 root root 2012 epel.repo

-rw-r--r--. 1 root root 2012 epel-testing.repo

-rw-r--r--. 1 root root network:ha-clustering:Stable.repo

-rw-r--r--. 1 root root openSUSE-13.2-NonFree-Update.repo.back

-rw-r--r--. 1 root root openSUSE-Leap-42.1-Update.repo.bak

-rw-r--r--. 1 root root zxl.repo

[[email protected] tool]# yum install crmsh //install directly with yum

http://www.111cn.net/sys/linux/73074.htm is a very detailed article on using the crm command that I found online

[[email protected] tool]# crm

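crm(live)# help //get help

cib //CIB management module

resource //resource management module

configure //CRM configuration: resource stickiness, resource types, constraints and so on

node //cluster node management subcommands

options //user preferences

history //crm command history

site //geo-cluster support

ra //resource agent management

status //show the cluster status

help, ? //show help

end, cd, up //go back up a level

quit, bye, exit //leave crm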

crm(live)# cd resource

crm(live)resource# help

.........................

crm(live)resource# cd

crm(live)# configure //enter configuration mode

crm(live)configure# show //show the cluster's current configuration

node www.rs1.com

node www.rs2.com

property cib-bootstrap-options: \

dc-version=1.1.14-8.el6_8.1-70404b0 \

cluster-infrastructure="classic openais (with plugin)" \

expected-quorum-votes=2

crm(live)configure# verify //check the configuration; it reports errors because no STONITH device is defined

ERROR: error: unpack_resources:Resource start-up disabled since no STONITH resources have been defined

error: unpack_resources:Either configure some or disable STONITH with the stonith-enabled option

error: unpack_resources:NOTE: Clusters with shared data need STONITH to ensure data integrity

Errors found during check: config not valid

crm(live)configure# property stonith-enabled=false //disable STONITH

crm(live)configure# show

node www.rs1.com

node www.rs2.com

property cib-bootstrap-options: \

dc-version=1.1.14-8.el6_8.1-70404b0 \

cluster-infrastructure="classic openais (with plugin)" \

expected-quorum-votes=2 \

stonith-enabled=false

crm(live)configure# verify //check again; no more errors

crm(live)configure# commit //commit so the configuration takes effect

crm(live)configure# cd

crm(live)# ra

crm(live)ra# help

Resource Agents (RA) lists and documentation

Commands:

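classes //list RA classes and providers

info //show detailed information about an RA

list //list all RAs of a class (and provider)

providers //show the providers of a given RA

validate //

meta //show an RA's meta-information

cd //go back up

help

quit

up //go back up a level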

How do you get detailed information about a command?

crm(live)ra# help list //detailed usage of the list command

List RA for a class (and provider)

List available resource agents for the given class. If the class

is ocf, supply a provider to get agents which are available

only from that provider.

Usage:

list <class> [<provider>]

Example:

list ocf pacemaker

crm(live)ra# classes //list the RA classes

lsb //the lsb class

ocf / heartbeat pacemaker //the ocf class has two providers, heartbeat and pacemaker

service

stonith //the stonith class

crm(live)ra# list ocf pacemaker //list every RA of class ocf supplied by pacemaker

ClusterMon Dummy HealthCPU HealthSMART Stateful SysInfo SystemHealth controld ping pingd remote

crm(live)ra# list lsb //list every RA of class lsb

auditd blk-availability corosync corosync-notifyd crond halt heartbeat htcacheclean

crm(live)ra# help meta //meta shows an RA's meta-information

Usage:

info [<class>:[<provider>:]]<type> //class : provider : resource agent (RA)

info <type> <class> [<provider>] (obsolete)

For example:

info apache

info ocf:pacemaker:Dummy //class ocf, provider pacemaker, resource agent Dummy

info stonith:ipmilan

info pengine

crm(live)ra# meta ocf:heartbeat:IPaddr //show the meta-information of the IPaddr RA (class ocf, provider heartbeat)

Parameters (*: required, []: default): //* marks required parameters, [ ] shows defaults

ip* (string): IPv4 or IPv6 address //ip is required

The IPv4 (dotted quad notation) or IPv6 address (colon hexadecimal notation)

example IPv4 "192.168.1.1".

example IPv6 "2001:db8:DC28:0:0:FC57:D4C8:1FFF".

nic (string): Network interface

......................

........................

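Operations' defaults (advisory minimum) //advisory minimum values for the resource's operations

start timeout=20s //wait at most 20 seconds for the resource to start

stop timeout=20s //wait at most 20 seconds for the resource to stop

status timeout=20s interval=10s

monitor timeout=20s interval=10s //check every 10 seconds; if a check gets no answer within 20 seconds it counts as failed and the resource is moved

How do you find out which provider supplies an RA?

Use the providers command in the ra sub-mode, for example:

crm(live)ra# providers IPaddr //show the provider of the IPaddr RA: it comes from heartbeat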

heartbeat

___________________________________________________________________________________________

Configuring resources

crm(live)ra# cd

crm(live)# configure

crm(live)configure# primitive webip ocf:heartbeat:IPaddr params ip=192.168.139.10 nic=eth0 cidr_netmask=24

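primitive defines a primitive resource; webip is the resource name; ocf is the resource class, heartbeat the provider and IPaddr the RA

params introduces the parameters: ip=192.168.139.10 (required), nic=eth0 (eth0 is the default anyway), cidr_netmask=24 (a 24-bit netmask)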

crm(live)configure# show

node www.rs1.com

node www.rs2.com

primitive webip IPaddr \

params ip=192.168.139.10 nic=eth0 cidr_netmask=24

property cib-bootstrap-options: \

dc-version=1.1.14-8.el6_8.1-70404b0 \

cluster-infrastructure="classic openais (with plugin)" \

expected-quorum-votes=2 \

stonith-enabled=false

crm(live)configure# verify //check for errors

crm(live)configure# commit //commit once there are none

crm(live)configure# show xml //the configuration can also be viewed as XML, which is more detailed

<?xml version="1.0" ?>

<crm_config>

<cluster_property_set id="cib-bootstrap-options">

</cluster_property_set>

</crm_config>

<nodes>

</nodes>

<instance_attributes id="webip-instance_attributes">

</instance_attributes>

</primitive>

</resources>

</configuration>

</cib>

crm(live)configure# cd

crm(live)#

crm(live)# status //the resource is in fact already running; check where

Online: [ www.rs1.com www.rs2.com ]

Full list of resources:

webip(ocf::heartbeat:IPaddr):Started www.rs1.com //rs1 was elected DC and the webip resource is running on www.rs1.com

___________________________________________________________________________________________

192.168.139.2

[[email protected] corosync]# ip addr show //the VIP 192.168.139.10 shows up as a secondary address on eth0

2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000

inet 192.168.139.2/24 brd 192.168.139.255 scope global eth0

inet 192.168.139.10/24 brd 192.168.139.255 scope global secondary eth0

[[email protected] .ssh]# crm

crm(live)# resource

crm(live)resource# stop webip //stop the webip resource

crm(live)resource# list

webip(ocf::heartbeat:IPaddr):(target-role:Stopped) Stopped

crm(live)resource# start webip

crm(live)resource# list

webip(ocf::heartbeat:IPaddr):Started

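crm(live)resource# migrate webip //risky experiment: migrating the resource failed with the error below, and after forcing it webip would not start, so I had to restart corosync

ERROR: resource.move: No target node: Move requires either a target node or 'force'

Running status then showed the following error:

* webip_start_0 on www.rs2.com 'not configured' (6): call=12, status=complete, exitreason='none',

last-rc-change='Sat Oct 29 08:55:24 2016', queued=1ms, exec=250ms

It turned out that my rs2 host is a clone and has no eth0 NIC, only eth1, while webip is defined on eth0 (^_^). I renamed eth1 to eth0 and rebooted, which fixed it. Here is an article on renaming a NIC: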

http://www.linuxidc.com/Linux/2015-06/118969.htm

Now define an httpd resource

_____________________________________________________________

192.168.139.4

[[email protected] corosync]# rpm -qa httpd //httpd is not installed on this host

[[email protected] corosync]# yum install httpd //install it with yum

[[email protected] html]# vim index.html

[[email protected] html]# service httpd stop

Stopping httpd: [ OK ]

[[email protected] html]# chkconfig httpd off //a cluster-managed service must never start at boot

___________________________________________________________________________________________

192.168.139.2

[[email protected] corosync]# rpm -qa httpd //httpd is not installed on this host

[[email protected] corosync]# yum install httpd //install it with yum

[[email protected] html]# vim index.html //edit the httpd home page so the two hosts can be told apart

[[email protected] html]# service httpd stop

Stopping httpd: [ OK ]

[[email protected] html]# chkconfig httpd off //a cluster-managed service must never start at boot

___________________________________________________________________________________________


192.168.139.2

[[email protected] ~]# crm

crm(live)# cd resource

crm(live)resource# list

webip(ocf::heartbeat:IPaddr):Started

crm(live)resource# cd ..

crm(live)# cd ra

crm(live)ra# providers httpd //httpd has no provider

crm(live)ra# list lsb //the httpd RA belongs to the lsb class

auditd blk-availability corosync corosync-notifyd crond halt htcacheclean httpd

crm(live)ra# meta lsb:httpd //meta shows that it takes no parameters, only some operations

start and stop Apache HTTP Server (lsb:httpd)

server implementing the current HTTP standards.

Operations' defaults (advisory minimum):

start timeout=15

stop timeout=15

status timeout=15

restart timeout=15

force-reload timeout=15

monitor timeout=15 interval=15

crm(live)ra# cd

crm(live)# configure

crm(live)configure# primitive httpd lsb:httpd op start timeout=20 //define the httpd primitive resource

crm(live)configure# show

node www.rs1.com

node www.rs2.com

primitive httpd lsb:httpd \

op start timeout=20 interval=0

primitive webip IPaddr \

params ip=192.168.139.10 nic=eth0 cidr_netmask=24 \

meta target-role=Started

property cib-bootstrap-options: \

dc-version=1.1.14-8.el6_8.1-70404b0 \

cluster-infrastructure="classic openais (with plugin)" \

expected-quorum-votes=2 \

stonith-enabled=false

crm(live)configure# verify

crm(live)configure# commit

crm(live)configure# cd

crm(live)# status

Last updated: Sat Oct 29 10:39:04 2016Last change: Sat Oct 29 08:33:08 2016 by root via cibadmin on www.rs1.com

Stack: classic openais (with plugin)

Current DC: www.rs2.com (version 1.1.14-8.el6_8.1-70404b0) - partition with quorum

2 nodes and 2 resources configured, 2 expected votes

Online: [ www.rs1.com www.rs2.com ]

Full list of resources: //webip is running on rs1 while httpd is running on rs2

webip(ocf::heartbeat:IPaddr):Started www.rs1.com

httpd(lsb:httpd):Started www.rs2.com

___________________________________________________________________________________________

192.168.139.4

[[email protected] ~]# netstat -tnlp |grep httpd

tcp 0 0 :::80 LISTEN 1718/httpd

Open 192.168.139.4 in a browser

___________________________________________________________________________________________

192.168.139.2

Put the two resources into a group so that they run together on the same node

crm(live)configure# help group //when in doubt, use help

Define a group

Usage:

group <name> <rsc> [<rsc>...]

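//group <name> <resource1> <resource2>; a group can also carry a description, params and meta attributes. Check the official documentation for the params a group accepts

[description=<description>] //description

[meta attr_list] //meta attributes

[params attr_list] //the group's params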

attr_list :: [$id=<id>] <attr>=<val> [<attr>=<val>...] | $id-ref=<id>

Example:

group internal_www disk0 fs0 internal_ip apache \

meta target_role=stopped

group vm-and-services vm vm-sshd meta container="vm" //vm-and-services is the group name, vm and vm-sshd are the resources, container="vm" is a meta attribute

crm(live)configure# group webserver webip httpd //webserver is the group name; webip and httpd are the two resources in it

crm(live)configure# verify

crm(live)configure# commit

crm(live)configure# show

node www.rs1.com

node www.rs2.com

primitive httpd lsb:httpd \

op start timeout=20 interval=0

primitive webip IPaddr \

params ip=192.168.139.10 nic=eth0 cidr_netmask=24 \

meta target-role=Started

group webserver webip httpd

property cib-bootstrap-options: \

dc-version=1.1.14-8.el6_8.1-70404b0 \

cluster-infrastructure="classic openais (with plugin)" \

expected-quorum-votes=2 \

crm(live)configure# cd

crm(live)# status

cibadmin on www.rs1.com

Stack: classic openais (with plugin)

Current DC: www.rs2.com (version 1.1.14-8.el6_8.1-70404b0) - partition with quorum

2 nodes and 2 resources configured, 2 expected votes

Online: [ www.rs1.com www.rs2.com ]

Full list of resources:

Resource Group: webserver //once the webserver group is defined, both resources run on the same node

webip(ocf::heartbeat:IPaddr):Started www.rs1.com

httpd(lsb:httpd):Started www.rs1.com

Test 192.168.139.10 in a browser

crm(live)# node

crm(live)node# standby //put rs1 into standby; the resources move to rs2

crm(live)node# cd

crm(live)# status //the resources moved from rs1 to rs2 successfully

Last updated: Sat Oct 29 11:32:08 2016Last change: Sat Oct 29 11:31:51 2016 by root via crm_attribute on www.rs1.com

Stack: classic openais (with plugin)

Current DC: www.rs1.com (version 1.1.14-8.el6_8.1-70404b0) - partition with quorum

{Why is this not WITHOUT quorum? Can a node still vote after going standby?}

2 nodes and 2 resources configured, 2 expected votes

Node www.rs1.com: standby

Online: [ www.rs2.com ]

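Full list of resources: //the resources keep running normally after rs1 goes standby. I expected rs2 alone to hold only one vote,

Resource Group: webserver //which is not more than half and would stop the resources, yet the partition still has quorum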

webip(ocf::heartbeat:IPaddr):Started www.rs2.com

httpd(lsb:httpd):Started www.rs2.com

crm(live)# node

crm(live)node# online //bring the node back online

crm(live)node# cd

crm(live)# status

Current DC: www.rs1.com (version 1.1.14-8.el6_8.1-70404b0) - partition with quorum

2 nodes and 2 resources configured, 2 expected votes

Online: [ www.rs1.com www.rs2.com ]

Full list of resources:

Resource Group: webserver //rs1 is back online; the resources are still running on rs2

webip(ocf::heartbeat:IPaddr):Started www.rs2.com

httpd(lsb:httpd):Started www.rs2.com

This time, stop rs2 outright

192.168.139.4

[[email protected] ~]# service corosync stop

Signaling Corosync Cluster Engine (corosync) to terminate: [ OK ]

Waiting for corosync services to unload:. [ OK ]

192.168.139.2

crm(live)# status

Last updated: Sat Oct 29 11:53:25 2016Last change: Sat Oct 29 11:52:39 2016 by root via crm_attribute on www.rs1.com

Stack: classic openais (with plugin)

Current DC: www.rs1.com (version 1.1.14-8.el6_8.1-70404b0) - partition WITHOUT quorum

{This time it is WITHOUT quorum: the votes fall short. So a node only loses its vote when its cluster service is stopped; a standby node still counts}

2 nodes and 2 resources configured, 2 expected votes

Online: [ www.rs1.com ]

OFFLINE: [ www.rs2.com ]

Full list of resources:

Resource Group: webserver //without quorum, resources are stopped by default

webip(ocf::heartbeat:IPaddr):Stopped

httpd(lsb:httpd):Stopped

192.168.139.4

[[email protected] ~]# service corosync start

192.168.139.2

crm(live)# status

Last updated: Sat Oct 29 11:59:36 2016Last change: Sat Oct 29 11:52:39 2016 by root via crm_attribute on www.rs1.com

Stack: classic openais (with plugin)

Current DC: www.rs1.com (version 1.1.14-8.el6_8.1-70404b0) - partition with quorum

2 nodes and 2 resources configured, 2 expected votes

Online: [ www.rs1.com www.rs2.com ]

Full list of resources: //once rs2 is started again, the resources start again

Resource Group: webserver

webip(ocf::heartbeat:IPaddr):Started www.rs1.com

httpd(lsb:httpd):Started www.rs1.com

Change the default action taken when quorum is lost to ignore

crm(live)# configure

crm(live)configure# property no-quorum-policy=ignore

crm(live)configure# show

node www.rs1.com \

attributes standby=off

node www.rs2.com \

attributes standby=off

primitive httpd lsb:httpd \

op start timeout=20 interval=0

primitive webip IPaddr \

params ip=192.168.139.10 nic=eth0 cidr_netmask=24 \

meta target-role=Started

group webserver webip httpd

property cib-bootstrap-options: \

dc-version=1.1.14-8.el6_8.1-70404b0 \

cluster-infrastructure="classic openais (with plugin)" \

expected-quorum-votes=2 \

stonith-enabled=false \

no-quorum-policy=ignore

crm(live)configure# verify

crm(live)configure# commit

192.168.139.4

[[email protected] ~]# service corosync stop

192.168.139.2

crm(live)# status

Last updated: Sat Oct 29 12:03:53 2016Last change: Sat Oct 29 12:03:25 2016 by root via cibadmin on www.rs1.com

Stack: classic openais (with plugin)

Current DC: www.rs1.com (version 1.1.14-8.el6_8.1-70404b0) - partition WITHOUT quorum

{WITHOUT quorum: not enough votes}

2 nodes and 2 resources configured, 2 expected votes

Online: [ www.rs1.com ]

OFFLINE: [ www.rs2.com ]

Full list of resources: //but the services keep running because of ignore

Resource Group: webserver

webip(ocf::heartbeat:IPaddr):Started www.rs1.com

httpd(lsb:httpd):Started www.rs1.com
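
The effect of `no-quorum-policy` seen in the two experiments above can be summarised as a small decision function. This is a simplified model for illustration, not Pacemaker's code; the function name `resource_action` and the one-word results are made up, while `stop`, `ignore`, `freeze` and `suicide` are the policy values Pacemaker 1.1 accepts.

```shell
#!/usr/bin/env bash
# resource_action QUORATE POLICY   (hypothetical helper, illustration only)
# QUORATE is "yes" or "no"; POLICY is a no-quorum-policy value.
# Prints what happens to resources in this partition.
resource_action() {
    local quorate=$1 policy=$2
    if [ "$quorate" = yes ]; then
        echo run            # with quorum the policy never applies
        return 0
    fi
    case "$policy" in
        stop)    echo stop   ;;  # the default: stop all resources
        ignore)  echo run    ;;  # carry on as if quorum were present
        freeze)  echo freeze ;;  # keep what is running, start nothing new
        suicide) echo fence  ;;  # fence the nodes in this partition
        *) echo "unknown policy: $policy" >&2; return 1 ;;
    esac
}

resource_action no stop     # rs2 stopped, default policy
resource_action no ignore   # rs2 stopped, after property no-quorum-policy=ignore
```

On a two-node cluster a single surviving node can never hold a majority, which is why `ignore` is set here; note that this lab also runs without STONITH, so nothing protects against both nodes running the resources after a split brain.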

192.168.139.4

[[email protected] ~]# service corosync start

[[email protected] ~]# crm

crm(live)# node

crm(live)node# standby

crm(live)node# cd

crm(live)# status

Last updated: Sat Oct 29 10:03:51 2016Last change: Sat Oct 29 10:03:46 2016 by root via crm_attribute on www.rs1.com

Stack: classic openais (with plugin)

Current DC: www.rs1.com (version 1.1.14-8.el6_8.1-70404b0) - partition with quorum

{Still quorate here, which confirms that a standby node still votes}

2 nodes and 2 resources configured, 2 expected votes

Node www.rs2.com: standby

Online: [ www.rs1.com ]

Full list of resources:

Resource Group: webserver //with ignore set, the resources run whether or not there is quorum

webip(ocf::heartbeat:IPaddr):Started www.rs1.com

httpd(lsb:httpd):Started www.rs1.com

crm(live)# node

crm(live)node# online

Instead of defining a group, use constraints directly to make the resources run together

crm(live)# resource

crm(live)resource# stop webserver

crm(live)resource# cleanup webserver

crm(live)resource# cd

crm(live)# configure

crm(live)configure# delete webserver

crm(live)configure# show

node www.rs1.com \

attributes standby=off

node www.rs2.com \

attributes standby=off

primitive httpd lsb:httpd \

op start timeout=20 interval=0

primitive webip IPaddr \

params ip=192.168.139.10 nic=eth0 cidr_netmask=24 \

meta target-role=Started

property cib-bootstrap-options: \

dc-version=1.1.14-8.el6_8.1-70404b0 \

cluster-infrastructure="classic openais (with plugin)" \

expected-quorum-votes=2 \

stonith-enabled=false \

no-quorum-policy=ignore \

last-lrm-refresh=1477714758

crm(live)configure# verify

crm(live)configure# commit

crm(live)# status

Online: [ www.rs1.com www.rs2.com ]

Full list of resources: //the two resources are running on different nodes again

webip(ocf::heartbeat:IPaddr):Started www.rs1.com

httpd(lsb:httpd):Started www.rs2.com

Define a colocation constraint (whether resources may run on the same node; inf means infinity)

crm(live)# configure

crm(live)configure# colocation webip_with_httpd inf: webip httpd //define a colocation constraint between the two resources

crm(live)configure# show

.........

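colocation webip_with_httpd inf: webip httpd //this looks backwards: it says webip goes wherever httpd is, but it should say httpd goes wherever webip is. The resource listed last is the one in charge

crm(live)configure# edit //fix it directly with edit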

colocation webip_with_httpd inf: webip httpd

change it to

colocation webip_with_httpd inf: httpd webip

crm(live)configure# show xml

<rsc_colocation id="webip_with_httpd" score="INFINITY" rsc="httpd" with-rsc="webip"/>

crm(live)configure# commit

crm(live)configure# cd

crm(live)# status

.........

Online: [ www.rs1.com www.rs2.com ]

Full list of resources: //the two resources are on the same node again

webip(ocf::heartbeat:IPaddr):Started www.rs1.com

httpd(lsb:httpd):Started www.rs1.com

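The colocation constraint now ties the two resources together. Resources also have to start in a particular sequence, so next define an order constraint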

crm(live)# configure

crm(live)configure# help order

Usage:

order <id> [{kind|<score>}:] first then [symmetrical=<bool>]

order <id> [{kind|<score>}:] resource_sets [symmetrical=<bool>]

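kind :: Mandatory | Optional | Serialize //mandatory | optional | serialized

first :: <rsc>[:<action>] //an action may follow the resource name, i.e. which action on the first resource must finish before the next one is handled; the actions are those found under resource, such as start, stop, promote ...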

then :: <rsc>[:<action>]

resource_sets :: resource_set [resource_set ...]

crm(live)configure# order webip_before_httpd mandatory: webip httpd //webip_before_httpd is the id; mandatory is the kind (a score is also allowed); webip starts first, httpd second

crm(live)configure# commit

crm(live)configure# show xml

<rsc_colocation id="webip_with_httpd" score="INFINITY" rsc="httpd" with-rsc="webip"/>

<rsc_order id="webip_before_httpd" kind="Mandatory" first="webip" then="httpd"/>

first webip,then httpd

crm(live)configure# cd

crm(live)# status

Online: [ www.rs1.com www.rs2.com ]

Full list of resources: //currently running on rs1

webip(ocf::heartbeat:IPaddr):Started www.rs1.com

httpd(lsb:httpd):Started www.rs1.com

crm(live)# node

crm(live)node# standby //put rs1 into standby

crm(live)node# cd

crm(live)# status

Node www.rs1.com: standby

Online: [ www.rs2.com ]

Full list of resources: //the switch was too fast to see which one started first (^_^), but the resources did move

webip(ocf::heartbeat:IPaddr):Started www.rs2.com

httpd(lsb:httpd):Started www.rs2.com

crm(live)# node

crm(live)node# online //bring rs1 back online

crm(live)node# cd

crm(live)# status

Online: [ www.rs1.com www.rs2.com ]

Full list of resources: //but the resources did not come back

webip(ocf::heartbeat:IPaddr):Started www.rs2.com

httpd(lsb:httpd):Started www.rs2.com

What if you want the resources to move back once the node is online again?

Define a location constraint (which node a resource prefers to run on)

crm(live)# configure

crm(live)configure# help location

Usage:

location <id> <rsc> [<attributes>] {<node_pref>|<rules>}

........

node_pref :: <score>: <node>

rules :: //rules can be defined with expressions

rule [id_spec] [$role=<role>] <score>: <expression>

[rule [id_spec] [$role=<role>] <score>: <expression> ...]

location conn_1 internal_www \ //conn_1 is the id (name), internal_www is the resource name

rule 50: #uname eq node1 \ //the rule gives a score of 50 when #uname equals node1

crm(live)configure# location wibip_on_rs1 webip rule 100: #uname eq www.rs1.com

//when #uname equals www.rs1.com the location score is 100

crm(live)configure# verify

crm(live)configure# commit

crm(live)configure# show xml

<rsc_location id="wibip_on_rs1" rsc="webip">

crm(live)configure# cd

crm(live)# status

Online: [ www.rs1.com www.rs2.com ]

Full list of resources: //the location constraint has taken effect, so the resources moved to rs1 automatically

webip(ocf::heartbeat:IPaddr):Started www.rs1.com

httpd(lsb:httpd):Started www.rs1.com

crm(live)# node

crm(live)node# standby //put rs1 into standby

crm(live)node# cd

crm(live)# status

Node www.rs1.com: standby

Online: [ www.rs2.com ]

Full list of resources: //the resources moved to rs2

webip(ocf::heartbeat:IPaddr):Started www.rs2.com

httpd(lsb:httpd):Started www.rs2.com

crm(live)# node

crm(live)node# online

crm(live)node# cd

crm(live)# status

Online: [ www.rs1.com www.rs2.com ]

Full list of resources: //once rs1 is online again the resources move back from rs2

webip(ocf::heartbeat:IPaddr):Started www.rs1.com

httpd(lsb:httpd):Started www.rs1.com

为资源定义粘性（资源是否倾向运行在当前节点）

crm(live)# configure

crm(live)configure# rsc_defaults resource-stickiness=200 \\ set the default resource stickiness to 200

crm(live)configure# verify

crm(live)configure# commit

crm(live)configure# show xml

<meta_attributes id="rsc-options">

</meta_attributes>

crm(live)configure# cd

crm(live)# node standby

crm(live)# status

Node www.rs1.com: standby

Online: [ www.rs2.com ]

Full list of resources: \\ the resources moved to rs2

webip(ocf::heartbeat:IPaddr):Started www.rs2.com

httpd(lsb:httpd):Started www.rs2.com

crm(live)# node online \\ bring the node back online

crm(live)# status

Online: [ www.rs1.com www.rs2.com ]

Full list of resources: \\ stickiness (200) outweighs the location preference (100), so the resources do not move back to rs1

webip(ocf::heartbeat:IPaddr):Started www.rs2.com

httpd(lsb:httpd):Started www.rs2.com
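The outcome above comes down to score arithmetic: stickiness is added to the score of the node a resource is already on, and the node with the higher total wins. The sketch below is a deliberately simplified two-node model of that comparison, not Pacemaker's actual placement algorithm (which also folds in the scores of colocated resources); `choose_node` and its argument order are assumptions made for illustration.

```shell
#!/bin/sh
# choose_node CURRENT STICKINESS NODE_A SCORE_A NODE_B SCORE_B
# Simplified model: add the stickiness to the location score of the node the
# resource currently runs on, then print the node with the higher total.
# On a tie the resource stays where it is.
choose_node() {
    current=$1
    stickiness=$2
    node_a=$3
    score_a=$4
    node_b=$5
    score_b=$6

    if [ "$current" = "$node_a" ]; then
        score_a=$((score_a + stickiness))
    fi
    if [ "$current" = "$node_b" ]; then
        score_b=$((score_b + stickiness))
    fi

    if [ "$score_a" -gt "$score_b" ]; then
        echo "$node_a"
    elif [ "$score_b" -gt "$score_a" ]; then
        echo "$node_b"
    else
        echo "$current"
    fi
}

# Stickiness 200, location score 100 for rs1, resource currently on rs2:
choose_node www.rs2.com 200 www.rs1.com 100 www.rs2.com 0
# Same situation without stickiness:
choose_node www.rs2.com 0 www.rs1.com 100 www.rs2.com 0
```

The first call models the state after `resource-stickiness=200` was committed (the resource stays on rs2); the second models the earlier state with only the location constraint (the resource returns to rs1).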

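Now add a Filesystem resource backed by the NFS server at 192.168.139.8. It shares a single home page, so the browser sees the same page no matter which node is running the resources.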

_____________________________________________________________

192.168.139.8

[[email protected] ~]# vim /etc/exports

/web/htdocs 192.168.139.0/24(ro) \\ no space before (ro); with a space, ro would apply to all hosts rather than to this subnet

[[email protected] local]# cd /web/htdocs/

[[email protected] htdocs]# vim index.html

[[email protected] ~]# service iptables stop

[[email protected] ~]# service nfs start
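In /etc/exports, whitespace between a client and its parenthesized options changes the meaning: the client is then exported with default options, and the options apply to a separate entry matching every host. A minimal sketch of a check for that pattern follows; `check_exports` is a hypothetical helper, and a flagged line may occasionally be intentional (a deliberate export to all hosts).

```shell
#!/bin/sh
# check_exports FILE
# Prints non-comment lines in which whitespace separates a client from its
# "(options)" list. Returns 1 if any such line is found, 0 otherwise.
check_exports() {
    awk '
        !/^[ \t]*#/ && /[^ \t][ \t]+\(/ {
            printf "%s:%d: %s\n", FILENAME, NR, $0
            bad = 1
        }
        END { exit (bad ? 1 : 0) }
    ' "$1"
}
```

Running `check_exports /etc/exports` before `service nfs start` would report the offending line with its line number.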

___________________________________________________________________________________________
192.168.139.4

[[email protected] ~]# mount 192.168.139.8:/web/htdocs /mnt

[[email protected] ~]# cd /mnt

[[email protected] mnt]# ll

total 4

-rw-r--r--. 1 nobody nobody 21 Nov 12 2016 index.html

[[email protected] mnt]# cd

[[email protected] ~]# umount /mnt/

[[email protected] ~]# crm

crm(live)# ra

crm(live)ra# list ocf \\ Filesystem belongs to the ocf class

Filesystem HealthCPU HealthSMART IPaddr

crm(live)ra# providers Filesystem \\ Filesystem is provided by heartbeat

heartbeat

crm(live)ra# meta ocf:heartbeat:Filesystem

device* (string): block device \\ device is required

The name of block device for the filesystem, or -U, -L options for mount, or NFS mount specification.

directory* (string): mount point \\ the mount point is required

The mount point for the filesystem.

fstype* (string): filesystem type \\ the filesystem type is required

The type of filesystem to be mounted.

options (string): \\ extra mount options, passed via -o

Any extra options to be given as -o options to mount.

For bind mounts, add "bind" here and set fstype to "none".

We will do the right thing for options such as "bind,ro".

crm(live)ra# cd

crm(live)# configure

crm(live)configure# primitive nfs ocf:heartbeat:Filesystem params device=192.168.139.8:/web/htdocs/ directory=/var/www/html/ fstype=nfs op monitor timeout=60s

crm(live)configure# verify

crm(live)configure# commit

crm(live)configure# show

primitive nfs Filesystem \

params device="192.168.139.8:/web/htdocs/" directory="/var/www/html/" fstype=nfs \

primitive webip IPaddr \

params ip=192.168.139.10 nic=eth0 cidr_netmask=24 \

order webip_before_httpd Mandatory: webip httpd

colocation webip_with_httpd inf: httpd webip

location wibip_on_rs1 webip \

rule 100: #uname eq www.rs1.com \

expected-quorum-votes=2 \

stonith-enabled=false \

no-quorum-policy=ignore \

last-lrm-refresh=1477714758

rsc_defaults rsc-options: \

resource-stickiness=200

crm(live)configure# cd

crm(live)# status

Online: [ www.rs1.com www.rs2.com ]

Full list of resources: \\ all three resources are started; webip and httpd run together on rs2, while nfs runs on rs1

webip(ocf::heartbeat:IPaddr):Started www.rs2.com

httpd(lsb:httpd):Started www.rs2.com

nfs(ocf::heartbeat:Filesystem):Started www.rs1.com

___________________________________________________________________________________________

192.168.139.2

[[email protected] ~]# cd /var/www/html/

[[email protected] html]# ll

total 4

-rw-r--r--. 1 nobody nobody 21 Nov 12 2016 index.html

[[email protected] html]# vim index.html

<h1>www.NFS.com</h1> \\ the page shared over NFS is mounted

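How do we make all three resources run on the same node?

Define a colocation and an order constraint for the Filesystem resource

crm(live)configure# colocation nfs_with_webip inf: nfs webip \\ nfs follows webip: wherever webip runs, nfs runs

crm(live)configure# order webip_before_nfs mandatory: webip nfs \\ start webip first, then nfs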

crm(live)configure# verify

crm(live)configure# commit

crm(live)configure# show

primitive nfs Filesystem \

params device="192.168.139.8:/web/htdocs/" directory="/var/www/html/" fstype=nfs \

op monitor timeout=60s interval=0

primitive webip IPaddr \

params ip=192.168.139.10 nic=eth0 cidr_netmask=24 \

colocation nfs_with_webip inf: nfs webip

order webip_before_httpd Mandatory: webip httpd

colocation webip_with_httpd inf: httpd webip

location wibip_on_rs1 webip \

rule 100: #uname eq www.rs1.com

expected-quorum-votes=2 \

stonith-enabled=false \

no-quorum-policy=ignore \

resource-stickiness=200
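The nfs primitive was created with `op monitor timeout=60s` and no interval, and the configuration shows it as `interval=0`. In Pacemaker an interval of 0 does not define a recurring check, so the mount would not be monitored periodically. The sketch below scans configuration text for such monitor operations; `zero_interval_monitors` is a hypothetical helper and only looks at lines containing `op monitor`.

```shell
#!/bin/sh
# zero_interval_monitors
# Reads `crm configure show` text on stdin and prints every "op monitor" line
# that has no interval at all or an interval of 0.
zero_interval_monitors() {
    awk '
        /op[ \t]+monitor/ && ($0 !~ /interval=/ || $0 ~ /interval=0([^0-9]|$)/) {
            print
        }
    '
}
```

On a cluster node, `crm configure show | zero_interval_monitors` would list the operations worth revisiting. A value such as `interval=20s` on the monitor operation is one possible choice; the right number depends on the environment.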

crm(live)configure# show xml

<rsc_order id="webip_before_httpd" kind="Mandatory" first="webip" then="httpd"/>

<rsc_order id="webip_before_nfs" kind="Mandatory" first="webip" then="nfs"/>

<rsc_colocation id="webip_with_httpd" score="INFINITY" rsc="httpd" with-rsc="webip"/>

<rsc_colocation id="nfs_with_webip" score="INFINITY" rsc="nfs" with-rsc="webip"/>
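The constraints above start webip before both httpd and nfs, but nothing orders nfs relative to httpd, so httpd could in principle start before /var/www/html is mounted. As a sketch under that assumption, the function below writes the article's four constraints plus one extra order constraint to a file. The `nfs_before_httpd` constraint and the `write_constraints` helper are not in the original configuration; they are hypothetical additions for illustration.

```shell
#!/bin/sh
# write_constraints FILE
# Writes crmsh constraint statements to FILE. The first four lines mirror the
# configuration shown in this article; the last one (nfs_before_httpd) is an
# assumed addition that mounts the share before httpd starts.
write_constraints() {
    cat > "$1" <<'EOF'
colocation nfs_with_webip inf: nfs webip
colocation webip_with_httpd inf: httpd webip
order webip_before_nfs Mandatory: webip nfs
order webip_before_httpd Mandatory: webip httpd
order nfs_before_httpd Mandatory: nfs httpd
EOF
}
```

The resulting file could then be applied with something like `crm configure load update FILE`; check `help load` in your crmsh version first, since the exact behavior may differ between releases.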

crm(live)# status

2 nodes and 3 resources configured, 2 expected votes \\ three resources, two nodes, two expected votes

Online: [ www.rs1.com www.rs2.com ]

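Full list of resources: \\ all resources are now on rs2: stickiness is 200 while webip's location score for rs1 is only 100, and webip and httpd were already running on rs2 before the Filesystem resource was configured, so all three end up on rs2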

webip(ocf::heartbeat:IPaddr):Started www.rs2.com

httpd(lsb:httpd):Started www.rs2.com

nfs(ocf::heartbeat:Filesystem):Started www.rs2.com

crm(live)# q

bye

[[email protected] html]# mount \\ on rs2, the NFS share is now mounted

192.168.139.8:/web/htdocs/ on /var/www/html type nfs (rw,vers=4,addr=192.168.139.8,clientaddr=192.168.139.4)

[[email protected] html]# cd /var/www/html/

[[email protected] html]# ll

total 4

-rw-r--r--. 1 nobody nobody 21 Nov 12 2016 index.html

[[email protected] html]# vim index.html \\ the page shared by the NFS server is visible

Browser test

[[email protected] html]# crm

crm(live)# node

crm(live)node# standby \\ put rs2 into standby

crm(live)# status

Online: [ www.rs1.com www.rs2.com ]

Full list of resources: \\ all resources moved to rs1

webip (ocf::heartbeat:IPaddr): Started www.rs1.com

httpd (lsb:httpd): Started www.rs1.com

nfs (ocf::heartbeat:Filesystem): Started www.rs1.com

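In the browser the page is still www.NFS.com: whichever node serves the request, the web page is the same.

This article comes from the “11097124” blog; please retain this attribution: http://11107124.blog.51cto.com/11097124/1872079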
