corosync + pacemaker lab notes

Posted 2020-08-26


OS: RHEL 6.5 64bit
corosync: 1.4.7 (installed via yum)
pacemaker: 1.1.2 (installed automatically as a corosync dependency)

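Pacemaker was split out of Heartbeat as an independent project when Heartbeat reached 3.0. On the RHEL 6 series, installing corosync with yum also installs pacemaker by default as the CRM (cluster resource manager).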

Common pacemaker configuration tools: crmsh, pcs
crmsh has to be installed separately from its own rpm package.

Main configuration files:

/etc/corosync/corosync.conf
/etc/crm/crm.conf

Lab hosts: A, B

Resource    RA (look it up with crm->ra->classes; crm->ra->info XXX)
webip    ocf:heartbeat:IPaddr2
webfs    ocf:heartbeat:Filesystem
webdb    lsb:mysqld
apache    ocf:heartbeat:apache (takes quite a few params)

Defining resources:
crm->configure->group/primitive

primitive apache ocf:heartbeat:apache params configfile="/usr/local/apache2/conf/httpd.conf" httpd="/usr/local/apache2/bin/httpd" port="80" statusurl="http://127.0.0.1/server-status" op monitor timeout=20s interval=10s op start timeout=40s op stop timeout=60s op status timeout=30s

Defining constraints:
crm->configure->colocation/order/others

order start_order Mandatory: webfs:start webdb:start apache:start
keyword    ID    kind or score: [resource ID:action]..

Showing the configuration:
crm->configure->show/show xml

Editing the configuration:
crm->configure->edit

Note: after changing the configuration, run verify first, then commit.
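
The verify-then-commit order can also be written down as a batch. This is only a sketch: it assumes crmsh accepts configure subcommands on standard input, the helper name crm_batch is made up for illustration, and the property line is simply the one used later in these notes.

```shell
#!/bin/sh
# crm_batch: hypothetical helper that prints a batch of `crm configure`
# subcommands, always ending with verify and then commit.
crm_batch() {
    # Each argument is one configure line, printed unchanged.
    for line in "$@"; do
        printf '%s\n' "$line"
    done
    printf 'verify\ncommit\n'
}

# Print the batch for review; nothing is sent to the cluster here.
crm_batch 'property no-quorum-policy=ignore'
```

To apply it, the output could be piped into crmsh, for example `crm_batch '...' | crm configure`. How a failed verify is handled in batch mode may differ between crmsh versions, so check the exit status before trusting the commit.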

standby/online:
crm->node->standby [node name; defaults to the local node] / online
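
A manual failover drill with standby/online might look like the sketch below. The helper names run and failover_drill are made up for illustration, and the node name ha-test1 is taken from the status output later in these notes. With DRY_RUN set, the commands are only printed.

```shell
#!/bin/sh
# run: print the command when DRY_RUN is set, otherwise execute it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# failover_drill NODE: put NODE in standby, show the cluster status once,
# then bring NODE back online.
failover_drill() {
    node=$1
    run crm node standby "$node"
    run crm_mon -1
    run crm node online "$node"
}

# Dry run only: prints the three commands without touching the cluster.
DRY_RUN=1
failover_drill ha-test1
```

On a real cluster, the one-shot status taken right after standby may still show resources in the middle of moving, so it is worth looking at the status again before bringing the node back online.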

Monitoring:
crm->status
crm_mon

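Important points:
1. Start and stop the corosync service on the local node and on the other nodes (over SSH with mutual trust) from a single host! Otherwise split-brain may occur. (In this experiment, after starting, stopping, and then starting corosync again on node A itself, node A ended up competing with node B for resources. When node A's service was started from node B over SSH, no contention occurred. The cause is unknown.)
2. In a two-node cluster, configure the policy the cluster applies when it does not have enough votes, for example:
crm->configure->property no-quorum-policy=ignore
Otherwise resources will not be able to fail over.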
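
The arithmetic behind point 2 can be sketched as follows, assuming one vote per node and a simple-majority rule (the function names are made up for illustration). With two nodes, the survivor holds only one of the two votes, which is not a majority, so the quorum policy has to allow it to keep running resources.

```shell
#!/bin/sh
# votes_needed N: smallest strict majority of N votes, floor(N/2) + 1.
votes_needed() {
    echo $(( $1 / 2 + 1 ))
}

# has_quorum TOTAL ALIVE: succeeds when ALIVE nodes still form a majority.
has_quorum() {
    [ "$2" -ge "$(votes_needed "$1")" ]
}

for total in 2 3; do
    alive=$(( total - 1 ))   # one node has failed
    if has_quorum "$total" "$alive"; then
        echo "$total nodes, $alive alive: quorum kept"
    else
        echo "$total nodes, $alive alive: quorum lost"
    fi
done
```

A three-node cluster survives the loss of one node with quorum intact, which is why the ignore setting is mainly a two-node workaround.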

Something interesting:
In this experiment I used the directory www under the shared filesystem /share as the DocumentRoot of the web server's virtual host:

<VirtualHost *:80>
    ServerAdmin [email protected]
    DocumentRoot "/share/www/"
    ServerName www.test.kc
    ServerAlias test.kc
    ErrorLog "logs/test.kc-error_log"
    CustomLog "logs/test.kc-access_log" common
</VirtualHost>

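While repeatedly starting and stopping the hosts during the experiment, the apache resource several times failed to start properly and was marked unmanaged, while the other resources failed over successfully: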

Online: [ ha-test1 ha-test2 ]
webip   (ocf::heartbeat:IPaddr2):       Started ha-test2
webfs   (ocf::heartbeat:Filesystem):    Started ha-test2
webdb   (lsb:mysqld):   Started ha-test2
apache  (lsb:apache2):  FAILED ha-test1 (unmanaged)
Failed actions:
    apache_stop_0 on ha-test1 'unknown error' (1): call=611, status=complete, last-rc-change='Sun Jan  1 17:51:51 2017', queued=0ms, exec=51ms
    apache_stop_0 on ha-test1 'unknown error' (1): call=611, status=complete, last-rc-change='Sun Jan  1 17:51:51 2017', queued=0ms, exec=51ms
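
Status output like the above can be scanned for failed resources with a small filter. This is a sketch only: failed_resources is a made-up helper, and it assumes the column layout shown here, which may differ between pacemaker versions.

```shell
#!/bin/sh
# failed_resources: read crm_mon-style status text on stdin and print
# "resource node" for every resource line whose state column is FAILED.
failed_resources() {
    awk '$3 == "FAILED" { print $1, $4 }'
}

# Sample input copied from the status output in these notes.
failed_resources <<'EOF'
Online: [ ha-test1 ha-test2 ]
webip   (ocf::heartbeat:IPaddr2):       Started ha-test2
webfs   (ocf::heartbeat:Filesystem):    Started ha-test2
webdb   (lsb:mysqld):   Started ha-test2
apache  (lsb:apache2):  FAILED ha-test1 (unmanaged)
EOF
```

On a live cluster the input would come from `crm_mon -1 | failed_resources`. Once the underlying cause is fixed, `crm resource cleanup apache` is the usual way to clear the recorded failure so the cluster manages the resource again.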

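What next?
First I ran service corosync stop, which kept printing "unload" indefinitely. My guess at the cause: the shared filesystem was mounted on node A while the apache service was running on node B, so the stop operation for apache on node B failed because the shared storage path set in the configuration file could not be found, and the stop never completed: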

Syntax error on line 66 of /usr/local/apache2/conf/httpd.conf:
DocumentRoot must be a directory

So the only thing I could think of was to force corosync down by killing its processes:

ps -ef | grep corosync | grep -v grep | awk '{print $2}' | xargs kill -9

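Later it occurred to me: since the missing path is what keeps apache from stopping, I could simply mkdir a local directory for it to find. On both hosts A and B I ran: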

mkdir /share/www

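After starting corosync on both nodes again and repeating the start/stop cycle many times, the apache resource failed over successfully every time, and the situation above did not recur.
I'm not sure whether this counts as solving the problem; corrections are welcome.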
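
The mkdir workaround could be wrapped in a small guard that is safe to run repeatedly on either node. A minimal sketch, with ensure_docroot as a made-up helper name; the demonstration uses a scratch directory rather than the real /share/www.

```shell
#!/bin/sh
# ensure_docroot DIR: create DIR if it is missing, so that httpd's
# DocumentRoot check can still pass on a node where the shared
# filesystem is not mounted.
ensure_docroot() {
    dir=$1
    if [ -d "$dir" ]; then
        echo "exists: $dir"
    else
        mkdir -p "$dir" && echo "created: $dir"
    fi
}

# Demonstrate on a scratch directory: first call creates, second is a no-op.
scratch=$(mktemp -d)
ensure_docroot "$scratch/share/www"
ensure_docroot "$scratch/share/www"
rm -rf "$scratch"
```

A directory created this way lives on the local disk underneath the mount point. When the shared filesystem is mounted on /share it hides the local copy, so the placeholder should not affect what the active node serves.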
