postgresql repmgr (MHA)

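Election principle

When automatic failover is triggered, each standby first retries its connection to the primary (the number of attempts and the interval between them are configurable in repmgr.conf). Once those attempts are exhausted, repmgrd elects one candidate among the standbys (see the rules below) and promotes it to be the new primary; the other standbys then follow the new primary, forming a new cluster.

repmgr compares candidate standbys in this order: LSN > Priority > Node_ID

  • The standby with the largest LSN is preferred as the candidate;
  • If the LSNs are equal, Priority is compared (it is set in the configuration file; a Priority of 0 means the node is never promoted to primary);
  • If the priorities are also equal, the Node IDs are compared and the smaller one wins.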
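The three-level ordering can be sketched in a few lines of POSIX shell. This is only an illustration of the comparison order, not repmgrd's actual implementation: the input format ("node_id lsn priority" per line) and the function names are made up for the example, and it assumes that a larger Priority value is preferred.

```shell
#!/bin/sh
# Sketch of the ordering LSN > Priority > Node_ID.
# Input on stdin: one standby per line as "node_id lsn priority",
# e.g. "2 0/A000970 100" (hypothetical format, for illustration only).

# Convert an LSN such as 0/A000970 to a byte position (high * 2^32 + low).
lsn_to_bytes() {
    hi=${1%%/*}
    lo=${1##*/}
    echo $(( 0x$hi * 4294967296 + 0x$lo ))
}

# Print the node_id of the preferred promotion candidate.
pick_candidate() {
    while read -r node_id lsn priority; do
        # Priority 0 means the node must never be promoted.
        [ "$priority" -eq 0 ] && continue
        echo "$(lsn_to_bytes "$lsn") $priority $node_id"
    done |
        # Largest LSN first, then highest priority, then smallest node ID.
        sort -k1,1nr -k2,2nr -k3,3n |
        head -n 1 |
        cut -d' ' -f3
}

# Nodes 1 and 2 tie on LSN and priority; node 4 has priority 0 and is skipped.
printf '%s\n' '1 0/A000970 100' '2 0/A000970 100' '4 0/B000000 0' | pick_candidate
```

In the sample call the tie between nodes 1 and 2 is broken by the smaller Node ID, so node 1 is printed.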

In PostgreSQL, the LSN (log sequence number) is the unique, global identifier of a position in the WAL.


61: primary: repmgr + master
62: standby 1: repmgr + standby
63: standby 2: repmgr + standby
64: witness node: repmgr + witness


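Environment preparation

All machines: install the operating system, create the user and directories, install PostgreSQL

Primary: initialize the database on the primary only, and enable WAL archiving on it

01. Initialize the environment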

yum install -y cmake make gcc zlib gcc-c++ perl readline readline-devel zlib zlib-devel perl python36 tcl openssl ncurses-devel openldap pam
yum -y groupinstall "Development Tools"
yum -y install yum-utils openjade docbook-dtds docbook-style-dsssl docbook-style-xsl

02. Host trust (passwordless SSH between the hosts)

03. Install repmgr

Install on the primary and on the standbys (su - pgsql)

cd /postgresql/soft
tar zxvf repmgr*
cd /postgresql/soft/repmgr-5.0.0
./configure
make
make install

04. Create the repmgr user and database on the primary

su - pgsql
pg_ctl start

createuser -s repmgr -h 127.0.0.1
createdb repmgr -O repmgr -h 127.0.0.1
psql -h 127.0.0.1 -c "alter user repmgr with password 'repmgr'"
psql -h 127.0.0.1 -c "alter user repmgr set search_path to repmgr, \"\$user\", public"


05. Edit pg_hba.conf on the primary

/postgresql/pgdata/pg_hba.conf

local repmgr repmgr md5
host repmgr repmgr 127.0.0.1/32 md5
host repmgr repmgr 10.100.2.0/24 md5

local replication repmgr md5
host replication repmgr 127.0.0.1/32 md5
host replication repmgr 10.100.2.0/24 md5

pg_ctl reload


06. Edit repmgr.conf

61:
vi /postgresql/pg12/repmgr.conf
node_id=1
node_name='pg1'
conninfo='host=10.100.2.31 user=repmgr password=repmgr dbname=repmgr connect_timeout=2'
data_directory='/postgresql/pgdata'
pg_bindir='/postgresql/pg12/bin'

62:
vi /postgresql/pg12/repmgr.conf
node_id=2
node_name='pg2'
conninfo='host=10.100.2.57 user=repmgr password=repmgr dbname=repmgr connect_timeout=2'
data_directory='/postgresql/pgdata'
pg_bindir='/postgresql/pg12/bin'

63:
vi /postgresql/pg12/repmgr.conf
node_id=3
node_name='pg3'
conninfo='host=10.100.2.250 user=repmgr password=repmgr dbname=repmgr connect_timeout=2'
data_directory='/postgresql/pgdata'
pg_bindir='/postgresql/pg12/bin'
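
The three files differ only in node_id, node_name and the host in conninfo, so they can be generated from one template. The following is a minimal sketch; render_repmgr_conf is a hypothetical helper (not part of repmgr), and it single-quotes string values on the assumption that this matches the style of the repmgr sample configuration.

```shell
#!/bin/sh
# render_repmgr_conf NODE_ID NODE_NAME HOST
# Hypothetical helper: prints a repmgr.conf body using the paths and
# credentials from this walkthrough.
render_repmgr_conf() {
    cat <<EOF
node_id=$1
node_name='$2'
conninfo='host=$3 user=repmgr password=repmgr dbname=repmgr connect_timeout=2'
data_directory='/postgresql/pgdata'
pg_bindir='/postgresql/pg12/bin'
EOF
}

# Print the configuration for node 2; redirect to
# /postgresql/pg12/repmgr.conf on that host to use it.
render_repmgr_conf 2 pg2 10.100.2.57
```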


07. Register the primary

repmgr -f /postgresql/pg12/repmgr.conf primary register
repmgr -f /postgresql/pg12/repmgr.conf cluster show
psql -U repmgr -d repmgr -h 10.100.2.250

select * from repmgr.nodes;
repmgr=# select * from repmgr.nodes;
node_id | upstream_node_id | active | node_name | type | location | priority | conninfo | repluser | slot_name | config_file
---------+------------------+--------+-----------+---------+----------+----------+-------------------------------------------------------------------------------+----------+-----------+------------------------------
3 | | t | pg3 | primary | default | 100 | host=10.100.2.250 user=repmgr password=repmgr dbname=repmgr connect_timeout=2 | repmgr | | /postgresql/pg12/repmgr.conf
(1 row)


08. Configure the .pgpass password file on the standbys

su - pgsql
echo "#ip:port:db:user:pwd" >> ~/.pgpass
echo "10.100.2.31:5432:repmgr:repmgr:repmgr" >> ~/.pgpass
echo "10.100.2.57:5432:repmgr:repmgr:repmgr" >> ~/.pgpass
echo "10.100.2.250:5432:repmgr:repmgr:repmgr" >> ~/.pgpass
echo "10.100.2.130:5432:repmgr:repmgr:repmgr" >> ~/.pgpass
chmod 0600 ~/.pgpass
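
Because the commands above use >>, running them twice leaves duplicate entries in ~/.pgpass. A minimal sketch of an idempotent variant follows; add_pgpass_entry is a hypothetical helper, and its defaults (port 5432, database, user and password all "repmgr") simply mirror the values used in this walkthrough. The demo writes to a temporary file rather than to the real ~/.pgpass.

```shell
#!/bin/sh
# add_pgpass_entry FILE HOST [PORT DB USER PASSWORD]
# Appends "host:port:db:user:password" to FILE unless the exact line exists.
add_pgpass_entry() {
    file=$1
    entry="$2:${3:-5432}:${4:-repmgr}:${5:-repmgr}:${6:-repmgr}"
    touch "$file"
    chmod 0600 "$file"   # libpq ignores a password file with looser permissions
    # -q quiet, -x match the whole line, -F treat the entry as a fixed string
    grep -qxF "$entry" "$file" || echo "$entry" >> "$file"
}

# Demo against a temporary file; pass "$HOME/.pgpass" for real use.
demo=$(mktemp)
for host in 10.100.2.31 10.100.2.57 10.100.2.250 10.100.2.130; do
    add_pgpass_entry "$demo" "$host"
done
add_pgpass_entry "$demo" 10.100.2.31   # second call for the same host adds nothing
cat "$demo"
rm -f "$demo"
```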

Standby 1:

repmgr -h 10.100.2.250 -U repmgr -d repmgr -f /postgresql/pg12/repmgr.conf standby clone --dry-run   # dry run
repmgr -h 10.100.2.250 -U repmgr -d repmgr -f /postgresql/pg12/repmgr.conf standby clone
pg_ctl -D /postgresql/pgdata start
psql -h 10.100.2.31 -U repmgr
select * from pg_stat_wal_receiver;
postgres=# select * from pg_stat_wal_receiver;
pid | status | receive_start_lsn | receive_start_tli | received_lsn | received_tli | last_msg_send_time | last_msg_receipt_time | latest_end_lsn | latest_end_time | slot_name | sender_host | sender_port | conninfo
-------+-----------+-------------------+-------------------+--------------+--------------+-------------------------------+-------------------------------+----------------+-------------------------------+-----------+--------------+-------------+----------
27507 | streaming | 0/8000000 | 1 | 0/A000970 | 1 | 2023-03-01 10:36:14.223493+08 | 2023-03-01 10:36:14.220426+08 | 0/A000970 | 2023-03-01 10:24:43.053232+08 | | 10.100.2.250 | 5432 | user=repmgr password=******** connect_timeout=2 dbname=replication host=10.100.2.250 port=5432 application_name=pg1 fallback_application_name=walreceiver sslmode=disable sslcompression=0 gssencmode=disable krbsrvname=postgres target_session_attrs=any
(1 row)

Standby 2:

repmgr -h 10.100.2.250 -U repmgr -d repmgr -f /postgresql/pg12/repmgr.conf standby clone --dry-run
repmgr -h 10.100.2.250 -U repmgr -d repmgr -f /postgresql/pg12/repmgr.conf standby clone

09. Register the standbys

On standbys 62/63

repmgr -f /postgresql/pg12/repmgr.conf standby register
repmgr -f /postgresql/pg12/repmgr.conf cluster show
[pgsql@pg1:/postgresql/soft/repmgr-5.0.0]$repmgr -f /postgresql/pg12/repmgr.conf cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+------+---------+-----------+----------+----------+----------+----------+-------------------------------------------------------------------------------
1 | pg1 | standby | running | pg3 | default | 100 | 1 | host=10.100.2.31 user=repmgr password=repmgr dbname=repmgr connect_timeout=2
2 | pg2 | standby | running | pg3 | default | 100 | 1 | host=10.100.2.57 user=repmgr password=repmgr dbname=repmgr connect_timeout=2
3 | pg3 | primary | * running | | default | 100 | 1 | host=10.100.2.250 user=repmgr password=repmgr dbname=repmgr connect_timeout=2

10. Configure the witness node

Witness node 64:

vi /postgresql/pg12/repmgr.conf
node_id=4
node_name='pg4'
conninfo='host=10.100.2.130 user=repmgr password=repmgr dbname=repmgr connect_timeout=2'
data_directory='/postgresql/pgdata'
pg_bindir='/postgresql/pg12/bin'

Initialize the database

/postgresql/pg12/bin/initdb -D /postgresql/pgdata -E UTF8 --lc-collate=C --locale=en_US.utf8 -U postgres

Copy postgresql.conf and pg_hba.conf

createuser -s repmgr -h 127.0.0.1
createdb repmgr -O repmgr -h 127.0.0.1
psql -h 127.0.0.1 -c "alter user repmgr with password 'repmgr'"
psql -h 127.0.0.1 -c "alter user repmgr set search_path to repmgr, \"\$user\", public"

Register as a witness node

repmgr -f /postgresql/pg12/repmgr.conf -h 10.100.2.250 -U repmgr -d repmgr witness register
repmgr -f /postgresql/pg12/repmgr.conf cluster show
[pgsql@pg4:/postgresql/soft/repmgr-5.0.0]$repmgr -f /postgresql/pg12/repmgr.conf cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+------+---------+-----------+----------+----------+----------+----------+-------------------------------------------------------------------------------
1 | pg1 | standby | running | pg3 | default | 100 | 1 | host=10.100.2.31 user=repmgr password=repmgr dbname=repmgr connect_timeout=2
2 | pg2 | standby | running | pg3 | default | 100 | 1 | host=10.100.2.57 user=repmgr password=repmgr dbname=repmgr connect_timeout=2
3 | pg3 | primary | * running | | default | 100 | 1 | host=10.100.2.250 user=repmgr password=repmgr dbname=repmgr connect_timeout=2
4 | pg4 | witness | * running | pg3 | default | 0 | 1 | host=10.100.2.130 user=repmgr password=repmgr dbname=repmgr connect_timeout=2

11. Switchover (planned primary/standby switch)

On all nodes

echo "max_replication_slots=10" >> /postgresql/pgdata/postgresql.conf
echo "wal_log_hints=on" >> /postgresql/pgdata/postgresql.conf
echo "shared_preload_libraries=repmgr" >> /postgresql/pgdata/postgresql.conf
pg_ctl stop
pg_ctl start

Promote standby 62 to primary:

On the standby

repmgr -f /postgresql/pg12/repmgr.conf cluster show
repmgr -f /postgresql/pg12/repmgr.conf standby switchover --siblings-follow --dry-run --force-rewind   # dry run
repmgr -f /postgresql/pg12/repmgr.conf standby switchover --siblings-follow --force-rewind

Test writing data

repmgr -f /postgresql/pg12/repmgr.conf cluster show
psql -h 10.100.2.31 -U itpux -d itpuxdb
insert into t_itpux values('杉欣虞浙峰');
insert into t_itpux values('杉欣陈志豪');
select * from t_itpux;

Test writing data on the primary

psql -h 10.100.2.57 -U itpux -d itpuxdb
insert into t_itpux values('开拓刘开杰');
insert into t_itpux values('点春');
insert into t_itpux values('证券');
insert into t_itpux values('基金');
select * from t_itpux;


12. Failover (automatic primary/standby switch)

vi /postgresql/pg12/repmgr.conf
monitoring_history=yes
monitor_interval_secs=5
failover=automatic
reconnect_attempts=6
reconnect_interval=5
promote_command='repmgr standby promote -f /postgresql/pg12/repmgr.conf --log-to-file'
follow_command='repmgr standby follow -f /postgresql/pg12/repmgr.conf --log-to-file --upstream-node-id=%n'
log_level=INFO
log_status_interval=10
log_file='/postgresql/pg12/repmgr.log'

vi /etc/logrotate.conf
/postgresql/pg12/repmgr.log {
missingok
compress
rotate 30
daily
dateext
create 0600 pgsql pgsql
}
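
The retry settings determine roughly how long repmgrd waits before it treats the primary as failed: reconnect_attempts × reconnect_interval, which is 6 × 5 = 30 seconds with the values above. The sketch below reads both values from a key=value file and multiplies them. It is an approximation only (the connection timeout of each attempt is ignored), and conf_value and failover_window are hypothetical helper names.

```shell
#!/bin/sh
# conf_value FILE KEY: print the value of KEY from a key=value file,
# with surrounding single quotes removed (last occurrence wins).
conf_value() {
    sed -n "s/^[[:space:]]*$2[[:space:]]*=[[:space:]]*//p" "$1" | tail -n 1 | tr -d "'"
}

# failover_window FILE: approximate seconds before failover starts.
failover_window() {
    attempts=$(conf_value "$1" reconnect_attempts)
    interval=$(conf_value "$1" reconnect_interval)
    # Both settings must be present in the file for this sketch.
    [ -n "$attempts" ] && [ -n "$interval" ] || return 1
    echo $(( attempts * interval ))
}

# Demo with the values from this section, written to a temporary file.
conf=$(mktemp)
printf '%s\n' 'reconnect_attempts=6' 'reconnect_interval=5' > "$conf"
failover_window "$conf"
rm -f "$conf"
```

For real use, pass /postgresql/pg12/repmgr.conf as the argument.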

Starting and stopping repmgrd

Start:

repmgrd -f /postgresql/pg12/repmgr.conf --pid-file /tmp/repmgrd.pid --daemonize

Stop:

kill `cat /tmp/repmgrd.pid`

[pgsql@pg4:/postgresql/soft/repmgr-5.0.0]$repmgr -f /postgresql/pg12/repmgr.conf cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+------+---------+----------------------+----------+----------+----------+----------+-------------------------------------------------------------------------------
1 | pg1 | primary | ? unreachable | | default | 100 | ? | host=10.100.2.31 user=repmgr password=repmgr dbname=repmgr connect_timeout=2
2 | pg2 | standby | ! running as primary | | default | 100 | 3 | host=10.100.2.57 user=repmgr password=repmgr dbname=repmgr connect_timeout=2
3 | pg3 | standby | running | ! pg2 | default | 100 | 2 | host=10.100.2.250 user=repmgr password=repmgr dbname=repmgr connect_timeout=2
4 | pg4 | witness | * running | ? pg1 | default | 0 | 1 | host=10.100.2.130 user=repmgr password=repmgr dbname=repmgr connect_timeout=2
WARNING: following issues were detected
- unable to connect to node "pg1" (ID: 1)
- node "pg1" (ID: 1) is registered as an active primary but is unreachable
- node "pg2" (ID: 2) is registered as standby but running as primary
- node "pg3" (ID: 3) reports a different upstream (reported: "pg2", expected "pg1")
- unable to connect to node "pg4" (ID: 4)'s upstream node "pg1" (ID: 1)
[pgsql@pg1:/home/pgsql]$repmgr -f /postgresql/pg12/repmgr.conf cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+------+---------+----------------------+----------+----------+----------+----------+-------------------------------------------------------------------------------
1 | pg1 | primary | * running | | default | 100 | 2 | host=10.100.2.31 user=repmgr password=repmgr dbname=repmgr connect_timeout=2
2 | pg2 | standby | ! running as primary | | default | 100 | 3 | host=10.100.2.57 user=repmgr password=repmgr dbname=repmgr connect_timeout=2
3 | pg3 | standby | running | ! pg2 | default | 100 | 3 | host=10.100.2.250 user=repmgr password=repmgr dbname=repmgr connect_timeout=2
4 | pg4 | witness | * running | pg1 | default | 0 | 1 | host=10.100.2.130 user=repmgr password=repmgr dbname=repmgr connect_timeout=2

WARNING: following issues were detected
- node "pg2" (ID: 2) is registered as standby but running as primary
- node "pg3" (ID: 3) reports a different upstream (reported: "pg2", expected "pg1")
[pgsql@pg3:/home/pgsql]$repmgr -f /postgresql/pg12/repmgr.conf cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+------+---------+-----------+----------+----------+----------+----------+-------------------------------------------------------------------------------
1 | pg1 | primary | ! running | | default | 100 | 2 | host=10.100.2.31 user=repmgr password=repmgr dbname=repmgr connect_timeout=2
2 | pg2 | primary | * running | | default | 100 | 3 | host=10.100.2.57 user=repmgr password=repmgr dbname=repmgr connect_timeout=2
3 | pg3 | standby | running | pg2 | default | 100 | 3 | host=10.100.2.250 user=repmgr password=repmgr dbname=repmgr connect_timeout=2
4 | pg4 | witness | * running | pg1 | default | 0 | 1 | host=10.100.2.130 user=repmgr password=repmgr dbname=repmgr connect_timeout=2

WARNING: following issues were detected
- node "pg1" (ID: 1) is running but the repmgr node record is inactive
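
In the outputs above, pg1 and pg2 both act as primary at the same time. A script can spot this state by scanning the cluster show table. The following is a rough sketch that parses the human-readable table exactly as captured here; running_primaries is a hypothetical helper, and the parsing assumes the column order ID | Name | Role | Status shown in this post.

```shell
#!/bin/sh
# running_primaries: read "repmgr cluster show" output on stdin and print the
# name of every node that is currently acting as a primary.
running_primaries() {
    awk -F'|' '
        # Data rows start with a numeric node ID; header, separator and
        # warning lines are skipped.
        $1 ~ /^ *[0-9]+ *$/ {
            name = $2; role = $3; status = $4
            gsub(/ /, "", name)
            gsub(/ /, "", role)
            if (status ~ /running as primary/ || (role == "primary" && status ~ /running/))
                print name
        }'
}

# Demo with the table captured on pg1 above; more than one name printed
# means more than one node is acting as primary.
running_primaries <<'EOF'
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+------+---------+----------------------+----------+----------+----------+----------+------
1 | pg1 | primary | * running | | default | 100 | 2 | host=10.100.2.31 user=repmgr password=repmgr dbname=repmgr connect_timeout=2
2 | pg2 | standby | ! running as primary | | default | 100 | 3 | host=10.100.2.57 user=repmgr password=repmgr dbname=repmgr connect_timeout=2
3 | pg3 | standby | running | ! pg2 | default | 100 | 3 | host=10.100.2.250 user=repmgr password=repmgr dbname=repmgr connect_timeout=2
4 | pg4 | witness | * running | pg1 | default | 0 | 1 | host=10.100.2.130 user=repmgr password=repmgr dbname=repmgr connect_timeout=2
EOF
```

On a live node the same function can be fed from the command used throughout this post: repmgr -f /postgresql/pg12/repmgr.conf cluster show | running_primaries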

61: rejoin the cluster

pg_ctl stop
repmgr -f /postgresql/pg12/repmgr.conf node rejoin -d 'host=10.100.2.57 user=repmgr dbname=repmgr connect_timeout=2' --force-rewind --dry-run --verbose
repmgr -f /postgresql/pg12/repmgr.conf node rejoin -d 'host=10.100.2.57 user=repmgr dbname=repmgr connect_timeout=2' --force-rewind --verbose
[pgsql@pg3:/home/pgsql]$repmgr -f /postgresql/pg12/repmgr.conf cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+------+---------+-----------+----------+----------+----------+----------+-------------------------------------------------------------------------------
1 | pg1 | standby | running | pg2 | default | 100 | 2 | host=10.100.2.31 user=repmgr password=repmgr dbname=repmgr connect_timeout=2
2 | pg2 | primary | * running | | default | 100 | 3 | host=10.100.2.57 user=repmgr password=repmgr dbname=repmgr connect_timeout=2
3 | pg3 | standby | running | pg2 | default | 100 | 3 | host=10.100.2.250 user=repmgr password=repmgr dbname=repmgr connect_timeout=2
4 | pg4 | witness | * running | pg1 | default | 0 | 1 | host=10.100.2.130 user=repmgr password=repmgr dbname=repmgr connect_timeout=2
