PostgreSQL repmgr (MHA)
Preface: this article was compiled by the editors of cha138.com and introduces postgresql repmgr (MHA); we hope you find it useful.
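How the election works
During an automatic failover, a standby first tries repeatedly to reconnect to the primary. The number of attempts and the interval between them are set in repmgr.conf. If every attempt fails, repmgrd elects one candidate standby from among all the standbys (see the rules below) and promotes it to be the new primary. The other standbys then follow the new primary, and together they form a new cluster.
repmgr ranks candidate standbys in this order: LSN > Priority > Node_ID
- The node with the highest LSN is preferred as the candidate;
- If the LSNs are equal, the nodes are compared by priority and the higher priority wins (priority is set with the priority parameter in the configuration file; a priority of 0 means the node can never be promoted to primary);
- If the priorities are also equal, the node IDs are compared and the node with the smaller ID is chosen.
In PostgreSQL, the LSN (Log Sequence Number) is a unique, global identifier of a position in the WAL.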
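The ranking order above can be illustrated with a small shell sketch. The function names `lsn_to_int` and `pick_candidate` are hypothetical, and so is the input format (one `node_id priority lsn` line per standby). The sketch only mimics the described ordering. It is not repmgr's own code, and the real repmgrd performs further checks before it promotes a node.

```shell
# Hypothetical helpers illustrating the order LSN > priority > node_id.
# Not repmgr code; each input line is "node_id priority lsn".

# Convert a PostgreSQL LSN "X/Y" (two hex halves) to a single integer:
# the LSN's 64-bit value is X * 2^32 + Y.
lsn_to_int() {
  local hi lo
  hi=$(printf '%d' "0x${1%/*}")
  lo=$(printf '%d' "0x${1#*/}")
  echo $(( hi * 4294967296 + lo ))
}

# Read candidate lines on stdin and print the node_id that would win.
pick_candidate() {
  while read -r id prio lsn; do
    [ -z "$id" ] && continue
    # priority 0 means "never promote this node", so drop it
    [ "$prio" -eq 0 ] && continue
    echo "$(lsn_to_int "$lsn") $prio $id"
  done |
    # highest LSN first, then highest priority, then lowest node_id
    sort -k1,1nr -k2,2nr -k3,3n |
    awk 'NR == 1 { print $3 }'
}
```

For example, `printf '1 100 0/A000970\n2 100 0/A000970\n3 0 0/B000000\n' | pick_candidate` prints `1`. Node 3 has the highest LSN but priority 0. Nodes 1 and 2 tie on both LSN and priority, so the lower node ID wins.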
Hosts:
61: primary (repmgr + master)
62: standby 1 (repmgr + standby)
63: standby 2 (repmgr + standby)
64: witness (repmgr + witness)
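Environment preparation
All machines: install the operating system, create the pgsql user and its directories, and install PostgreSQL.
Primary: initialize only the primary's database cluster and enable WAL archiving on it.
01. Initialize the environment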
yum install -y cmake make gcc zlib gcc-c++ perl readline readline-devel zlib zlib-devel perl python36 tcl openssl ncurses-devel openldap pam
yum -y groupinstall "Development Tools"
yum -y install yum-utils openjade docbook-dtds docbook-style-dsssl docbook-style-xsl
02. Set up mutual SSH trust (passwordless login) between hosts
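repmgr's `standby switchover` (step 11) runs commands on the current primary over SSH, so the pgsql user needs passwordless SSH between the nodes. The following is a minimal sketch of a hypothetical `setup_ssh_trust` function. It assumes OpenSSH's `ssh-keygen` and `ssh-copy-id` are installed and that the account is called pgsql on every host. The `DRY_RUN` switch is also an assumption, added so the commands can be reviewed before they run.

```shell
# Hypothetical helper: key-based SSH from the current user to the pgsql
# account on every host given as an argument.
# Assumes OpenSSH's ssh-keygen and ssh-copy-id are installed.
# With DRY_RUN=1 the commands are only printed, not executed.
setup_ssh_trust() {
  run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
  }
  # Generate a passphrase-less key only if none exists yet
  if [ ! -f "$HOME/.ssh/id_rsa" ]; then
    run ssh-keygen -t rsa -N '' -f "$HOME/.ssh/id_rsa"
  fi
  # Install the public key on each host (asks for each password once)
  for host in "$@"; do
    run ssh-copy-id "pgsql@$host"
  done
}
```

For example, run `setup_ssh_trust 10.100.2.31 10.100.2.57 10.100.2.250 10.100.2.130` as pgsql on each of the four hosts.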
03. Install repmgr
On the primary and the standbys (as the pgsql user, via su - pgsql):
cd /postgresql/soft
tar zxvf repmgr*
cd /postgresql/soft/repmgr-5.0.0
./configure
make
make install
04. Create the repmgr user and database on the primary
su - pgsql
pg_ctl start
createuser -s repmgr -h 127.0.0.1
createdb repmgr -O repmgr -h 127.0.0.1
psql -h 127.0.0.1 -c "alter user repmgr with password 'repmgr'"
psql -h 127.0.0.1 -c "alter user repmgr set search_path to repmgr, \"\$user\", public"
05. Edit pg_hba.conf on the primary
Add the following entries to /postgresql/pgdata/pg_hba.conf, then reload:
local repmgr repmgr md5
host repmgr repmgr 127.0.0.1/32 md5
host repmgr repmgr 10.100.2.0/24 md5
local replication repmgr md5
host replication repmgr 127.0.0.1/32 md5
host replication repmgr 10.100.2.0/24 md5
pg_ctl reload
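Editing pg_hba.conf by hand on several nodes makes it easy to add the same entry twice. The following is a sketch of a hypothetical `add_repmgr_hba` function. It appends the six entries above for a given file and subnet, and it skips any entry that is already present verbatim. A line written with different whitespace (for example tabs) counts as different, so in that case a duplicate would still be added.

```shell
# Hypothetical helper: append the repmgr entries shown above to a
# pg_hba.conf file, skipping any line that is already present verbatim.
add_repmgr_hba() {
  local hba=$1 subnet=$2 db entry
  for db in repmgr replication; do
    for entry in \
      "local $db repmgr md5" \
      "host $db repmgr 127.0.0.1/32 md5" \
      "host $db repmgr $subnet md5"; do
      # -x: whole-line match, -F: fixed string (dots are not wildcards)
      grep -qxF "$entry" "$hba" || printf '%s\n' "$entry" >> "$hba"
    done
  done
}
```

Usage on the primary: `add_repmgr_hba /postgresql/pgdata/pg_hba.conf 10.100.2.0/24 && pg_ctl reload`.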
06. Edit repmgr.conf
61:
vi /postgresql/pg12/repmgr.conf
node_id=1
node_name=pg1
conninfo='host=10.100.2.31 user=repmgr password=repmgr dbname=repmgr connect_timeout=2'
data_directory=/postgresql/pgdata
pg_bindir=/postgresql/pg12/bin
62:
vi /postgresql/pg12/repmgr.conf
node_id=2
node_name=pg2
conninfo='host=10.100.2.57 user=repmgr password=repmgr dbname=repmgr connect_timeout=2'
data_directory=/postgresql/pgdata
pg_bindir=/postgresql/pg12/bin
63:
vi /postgresql/pg12/repmgr.conf
node_id=3
node_name=pg3
conninfo='host=10.100.2.250 user=repmgr password=repmgr dbname=repmgr connect_timeout=2'
data_directory=/postgresql/pgdata
pg_bindir=/postgresql/pg12/bin
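The three files above differ only in node_id, node_name and the host in conninfo. The following is a minimal sketch of a hypothetical `write_repmgr_conf` function that writes such a file, assuming the same paths and credentials as above. The conninfo value contains spaces, so the sketch wraps it in single quotes, as the repmgr documentation does in its examples.

```shell
# Hypothetical helper: write a repmgr.conf for one node.
# Arguments: node_id node_name host [output file]
write_repmgr_conf() {
  local id=$1 name=$2 host=$3 out=${4:-/postgresql/pg12/repmgr.conf}
  cat > "$out" <<EOF
node_id=$id
node_name=$name
conninfo='host=$host user=repmgr password=repmgr dbname=repmgr connect_timeout=2'
data_directory=/postgresql/pgdata
pg_bindir=/postgresql/pg12/bin
EOF
}
```

For example, on 61 run `write_repmgr_conf 1 pg1 10.100.2.31`, and on 62 run `write_repmgr_conf 2 pg2 10.100.2.57`.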
07. Register the primary
repmgr -f /postgresql/pg12/repmgr.conf primary register
repmgr -f /postgresql/pg12/repmgr.conf cluster show
psql -U repmgr -d repmgr -h 10.100.2.250
select * from repmgr.nodes;
repmgr=# select * from repmgr.nodes;
 node_id | upstream_node_id | active | node_name |  type   | location | priority |                                   conninfo                                    | repluser | slot_name |         config_file
---------+------------------+--------+-----------+---------+----------+----------+-------------------------------------------------------------------------------+----------+-----------+------------------------------
       3 |                  | t      | pg3       | primary | default  |      100 | host=10.100.2.250 user=repmgr password=repmgr dbname=repmgr connect_timeout=2 | repmgr   |           | /postgresql/pg12/repmgr.conf
(1 row)
08. Configure the .pgpass password file on the standbys
su - pgsql
echo "#ip:port:db:user:pwd" >> ~/.pgpass
echo "10.100.2.31:5432:repmgr:repmgr:repmgr" >> ~/.pgpass
echo "10.100.2.57:5432:repmgr:repmgr:repmgr" >> ~/.pgpass
echo "10.100.2.250:5432:repmgr:repmgr:repmgr" >> ~/.pgpass
echo "10.100.2.130:5432:repmgr:repmgr:repmgr" >> ~/.pgpass
chmod 0600 ~/.pgpass
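Each .pgpass line has the form hostname:port:database:username:password, and any of the first four fields may be `*` to match anything. The following is a sketch of a hypothetical `write_pgpass` function. It adds the repmgr entry for each host only if that entry is missing, so running it again does not duplicate lines. It then applies the 0600 mode, because on Unix libpq ignores a password file that is readable by group or others.

```shell
# Hypothetical helper: add one repmgr entry per host to a .pgpass file,
# skipping entries that already exist, then restrict its permissions.
# Line format: hostname:port:database:username:password
write_pgpass() {
  local file=$1 host entry
  shift
  touch "$file"
  for host in "$@"; do
    entry="$host:5432:repmgr:repmgr:repmgr"
    grep -qxF "$entry" "$file" || printf '%s\n' "$entry" >> "$file"
  done
  # libpq ignores a .pgpass that group or others can access
  chmod 0600 "$file"
}
```

Usage: `write_pgpass ~/.pgpass 10.100.2.31 10.100.2.57 10.100.2.250 10.100.2.130`.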
Standby 1:
repmgr -h 10.100.2.250 -U repmgr -d repmgr -f /postgresql/pg12/repmgr.conf standby clone --dry-run    # dry run first
repmgr -h 10.100.2.250 -U repmgr -d repmgr -f /postgresql/pg12/repmgr.conf standby clone
pg_ctl -D /postgresql/pgdata start
psql -h 10.100.2.31 -U repmgr
select * from pg_stat_wal_receiver;
postgres=# select * from pg_stat_wal_receiver;
  pid  |  status   | receive_start_lsn | receive_start_tli | received_lsn | received_tli |      last_msg_send_time       |     last_msg_receipt_time     | latest_end_lsn |        latest_end_time        | slot_name | sender_host  | sender_port | conninfo
-------+-----------+-------------------+-------------------+--------------+--------------+-------------------------------+-------------------------------+----------------+-------------------------------+-----------+--------------+-------------+----------
 27507 | streaming | 0/8000000         |                 1 | 0/A000970    |            1 | 2023-03-01 10:36:14.223493+08 | 2023-03-01 10:36:14.220426+08 | 0/A000970      | 2023-03-01 10:24:43.053232+08 |           | 10.100.2.250 |        5432 | user=repmgr password=******** connect_timeout=2 dbname=replication host=10.100.2.250 port=5432 application_name=pg1 fallback_application_name=walreceiver sslmode=disable sslcompression=0 gssencmode=disable krbsrvname=postgres target_session_attrs=any
(1 row)
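In the output above, received_lsn minus receive_start_lsn is the amount of WAL, in bytes, that this standby has received since streaming started. On the server, `pg_wal_lsn_diff(lsn1, lsn2)` computes such differences. The hypothetical shell function below does the same arithmetic offline, treating an LSN `X/Y` as X * 2^32 + Y.

```shell
# Hypothetical helper: byte distance between two LSNs ("X/Y", hex halves),
# mirroring what pg_wal_lsn_diff(a, b) returns on the server.
lsn_diff() {
  local ahi alo bhi blo
  ahi=$(printf '%d' "0x${1%/*}"); alo=$(printf '%d' "0x${1#*/}")
  bhi=$(printf '%d' "0x${2%/*}"); blo=$(printf '%d' "0x${2#*/}")
  echo $(( (ahi - bhi) * 4294967296 + (alo - blo) ))
}
```

`lsn_diff 0/A000970 0/8000000` prints `33556848`, which is about 32 MiB of WAL received so far. `select pg_wal_lsn_diff('0/A000970', '0/8000000');` returns the same number in psql.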
Standby 2:
repmgr -h 10.100.2.250 -U repmgr -d repmgr -f /postgresql/pg12/repmgr.conf standby clone --dry-run
repmgr -h 10.100.2.250 -U repmgr -d repmgr -f /postgresql/pg12/repmgr.conf standby clone
pg_ctl -D /postgresql/pgdata start
09. Register the standbys
On standbys 62/63:
repmgr -f /postgresql/pg12/repmgr.conf standby register
repmgr -f /postgresql/pg12/repmgr.conf cluster show
[pgsql@pg1:/postgresql/soft/repmgr-5.0.0]$repmgr -f /postgresql/pg12/repmgr.conf cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+------+---------+-----------+----------+----------+----------+----------+-------------------------------------------------------------------------------
1 | pg1 | standby | running | pg3 | default | 100 | 1 | host=10.100.2.31 user=repmgr password=repmgr dbname=repmgr connect_timeout=2
2 | pg2 | standby | running | pg3 | default | 100 | 1 | host=10.100.2.57 user=repmgr password=repmgr dbname=repmgr connect_timeout=2
3 | pg3 | primary | * running | | default | 100 | 1 | host=10.100.2.250 user=repmgr password=repmgr dbname=repmgr connect_timeout=2
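Scripts often need to know which node is currently the primary. `cluster show` prints a `|`-separated table, so it can be filtered with awk. The following is a hypothetical sketch that relies only on the column order visible above (ID | Name | Role | Status | ...), which may differ between repmgr versions. In the failover outputs later in this article, repmgr marks problem rows with `!`. For that reason, the sketch accepts only a primary whose status is exactly `* running`.

```shell
# Hypothetical helper: read `repmgr cluster show` output on stdin and print
# the name of every node listed with role "primary" and status "* running".
# Relies on the column order seen above: ID | Name | Role | Status | ...
find_primary() {
  awk -F'|' 'NR > 2 {
    name = $2; role = $3; status = $4
    # trim the padding around each field
    gsub(/^[ \t]+|[ \t]+$/, "", name)
    gsub(/^[ \t]+|[ \t]+$/, "", role)
    gsub(/^[ \t]+|[ \t]+$/, "", status)
    if (role == "primary" && status == "* running") print name
  }'
}
```

Piped from `repmgr -f /postgresql/pg12/repmgr.conf cluster show`, it would print `pg3` for the output above.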
10. Configure the witness node
On 64 (the witness node):
vi /postgresql/pg12/repmgr.conf
node_id=4
node_name=pg4
conninfo='host=10.100.2.130 user=repmgr password=repmgr dbname=repmgr connect_timeout=2'
data_directory=/postgresql/pgdata
pg_bindir=/postgresql/pg12/bin
Initialize the database:
/postgresql/pg12/bin/initdb -D /postgresql/pgdata -E UTF8 --lc-collate=C --locale=en_US.utf8 -U postgres
Copy postgresql.conf and pg_hba.conf into /postgresql/pgdata, start the instance, and create the repmgr user and database:
pg_ctl start
createuser -s repmgr -h 127.0.0.1
createdb repmgr -O repmgr -h 127.0.0.1
psql -h 127.0.0.1 -c "alter user repmgr with password 'repmgr'"
psql -h 127.0.0.1 -c "alter user repmgr set search_path to repmgr, \"\$user\", public"
Register it as the witness node:
repmgr -f /postgresql/pg12/repmgr.conf -h 10.100.2.250 -U repmgr -d repmgr witness register
repmgr -f /postgresql/pg12/repmgr.conf cluster show
[pgsql@pg4:/postgresql/soft/repmgr-5.0.0]$repmgr -f /postgresql/pg12/repmgr.conf cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+------+---------+-----------+----------+----------+----------+----------+-------------------------------------------------------------------------------
1 | pg1 | standby | running | pg3 | default | 100 | 1 | host=10.100.2.31 user=repmgr password=repmgr dbname=repmgr connect_timeout=2
2 | pg2 | standby | running | pg3 | default | 100 | 1 | host=10.100.2.57 user=repmgr password=repmgr dbname=repmgr connect_timeout=2
3 | pg3 | primary | * running | | default | 100 | 1 | host=10.100.2.250 user=repmgr password=repmgr dbname=repmgr connect_timeout=2
4 | pg4 | witness | * running | pg3 | default | 0 | 1 | host=10.100.2.130 user=repmgr password=repmgr dbname=repmgr connect_timeout=2
11. Switchover (planned role swap between primary and standby)
On all nodes:
echo "max_replication_slots=10" >> /postgresql/pgdata/postgresql.conf
echo "wal_log_hints=on" >> /postgresql/pgdata/postgresql.conf
echo "shared_preload_libraries=repmgr" >> /postgresql/pgdata/postgresql.conf
pg_ctl stop
pg_ctl start
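Appending with `echo >>` works because, when a parameter appears more than once in postgresql.conf, only the last occurrence is used. However, repeating the step leaves stale duplicates behind. wal_log_hints is what lets pg_rewind (used by `--force-rewind`) work on a cluster initialized without data checksums. shared_preload_libraries and max_replication_slots take effect only at server start, which is why the instance is restarted. The following is a sketch of a hypothetical `set_pg_param` function that removes earlier active settings of a parameter before appending the new value.

```shell
# Hypothetical helper: set NAME = VALUE in a postgresql.conf-style file.
# Active (uncommented) "NAME = ..." lines are removed first, so re-running
# it never piles up duplicates; commented-out defaults are left untouched.
# Only lines that use "=" are recognized.
set_pg_param() {
  local conf=$1 name=$2 value=$3 tmp
  tmp=$(mktemp)
  # grep -v exits 1 when no line survives, which is not an error here
  grep -v "^[[:space:]]*$name[[:space:]]*=" "$conf" > "$tmp" || true
  printf '%s = %s\n' "$name" "$value" >> "$tmp"
  # copy back with cat so the original file keeps its owner and mode
  cat "$tmp" > "$conf"
  rm -f "$tmp"
}
```

Usage: `set_pg_param /postgresql/pgdata/postgresql.conf wal_log_hints on`, and `set_pg_param /postgresql/pgdata/postgresql.conf shared_preload_libraries "'repmgr'"`. If other libraries are already preloaded, list them all in the value (for example `'pg_stat_statements,repmgr'`), because this replaces the whole setting.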
Promote standby 62 to primary:
On that standby:
repmgr -f /postgresql/pg12/repmgr.conf cluster show
repmgr -f /postgresql/pg12/repmgr.conf standby switchover --siblings-follow --dry-run --force-rewind    # trial run
repmgr -f /postgresql/pg12/repmgr.conf standby switchover --siblings-follow --force-rewind
Test writing data:
repmgr -f /postgresql/pg12/repmgr.conf cluster show
psql -h 10.100.2.31 -U itpux -d itpuxdb
insert into t_itpux values('杉欣虞浙峰');
insert into t_itpux values('杉欣陈志豪');
select * from t_itpux;
Test writing data to the primary:
psql -h 10.100.2.57 -U itpux -d itpuxdb
insert into t_itpux values('开拓刘开杰');
insert into t_itpux values('点春');
insert into t_itpux values('证券');
insert into t_itpux values('基金');
select * from t_itpux;
12. Failover (automatic primary/standby switch)
vi /postgresql/pg12/repmgr.conf
monitoring_history=yes
monitor_interval_secs=5
failover=automatic
reconnect_attempts=6
reconnect_interval=5
promote_command='repmgr standby promote -f /postgresql/pg12/repmgr.conf --log-to-file'
follow_command='repmgr standby follow -f /postgresql/pg12/repmgr.conf --log-to-file --upstream-node-id=%n'
log_level=INFO
log_status_interval=10
log_file=/postgresql/pg12/repmgr.log
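With these settings, repmgrd retries an unreachable primary reconnect_attempts times, reconnect_interval seconds apart, before it starts the election. That is roughly 6 × 5 = 30 seconds of failed reconnection attempts, and connection timeouts add to it. The following hypothetical helper reads the two settings from a repmgr.conf and prints this rough estimate. It assumes both settings are written explicitly in the file; if either is missing, it prints 0.

```shell
# Hypothetical helper: print reconnect_attempts * reconnect_interval from a
# repmgr.conf, a rough estimate (seconds) of how long repmgrd keeps retrying
# an unreachable primary before failing over. Connection timeouts and
# monitor_interval_secs make the real delay longer.
failover_wait_secs() {
  awk -F'=' '
    {
      key = $1; val = $2
      gsub(/[ \t]/, "", key)
      gsub(/[ \t]/, "", val)
    }
    key == "reconnect_attempts" { attempts = val }
    key == "reconnect_interval" { interval = val }
    END { print attempts * interval }
  ' "$1"
}
```

Usage: `failover_wait_secs /postgresql/pg12/repmgr.conf` prints `30` with the values above.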
vi /etc/logrotate.conf
/postgresql/pg12/repmgr.log {
    missingok
    compress
    rotate 30
    daily
    dateext
    create 0600 pgsql pgsql
    postrotate
        kill -HUP $(cat /tmp/repmgrd.pid) 2>/dev/null || true
    endscript
}
Starting and stopping repmgrd
Start:
repmgrd -f /postgresql/pg12/repmgr.conf --pid-file /tmp/repmgrd.pid --daemonize
Stop:
kill $(cat /tmp/repmgrd.pid)
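repmgrd writes its PID to the file given by `--pid-file`. A hypothetical status check can therefore test whether that process still exists with `kill -0`, which sends no signal. The check has to run as the same user as repmgrd (or as root). A stale PID file whose number has since been reused by another process would give a false "running".

```shell
# Hypothetical helper: report whether the repmgrd recorded in a PID file
# is still alive. kill -0 sends no signal; it only checks the process.
repmgrd_status() {
  local pidfile=${1:-/tmp/repmgrd.pid}
  if [ -r "$pidfile" ] && kill -0 "$(cat "$pidfile")" 2>/dev/null; then
    echo running
  else
    echo stopped
  fi
}
```

Usage: `repmgrd_status /tmp/repmgrd.pid`.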
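After the primary pg1 became unreachable and pg2 was promoted, cluster show on the witness (pg4) reported:
[pgsql@pg4:/postgresql/soft/repmgr-5.0.0]$repmgr -f /postgresql/pg12/repmgr.conf cluster show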
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+------+---------+----------------------+----------+----------+----------+----------+-------------------------------------------------------------------------------
1 | pg1 | primary | ? unreachable | | default | 100 | ? | host=10.100.2.31 user=repmgr password=repmgr dbname=repmgr connect_timeout=2
2 | pg2 | standby | ! running as primary | | default | 100 | 3 | host=10.100.2.57 user=repmgr password=repmgr dbname=repmgr connect_timeout=2
3 | pg3 | standby | running | ! pg2 | default | 100 | 2 | host=10.100.2.250 user=repmgr password=repmgr dbname=repmgr connect_timeout=2
4 | pg4 | witness | * running | ? pg1 | default | 0 | 1 | host=10.100.2.130 user=repmgr password=repmgr dbname=repmgr connect_timeout=2
WARNING: following issues were detected
- unable to connect to node "pg1" (ID: 1)
- node "pg1" (ID: 1) is registered as an active primary but is unreachable
- node "pg2" (ID: 2) is registered as standby but running as primary
- node "pg3" (ID: 3) reports a different upstream (reported: "pg2", expected "pg1")
- unable to connect to node "pg4" (ID: 4)'s upstream node "pg1" (ID: 1)
[pgsql@pg1:/home/pgsql]$repmgr -f /postgresql/pg12/repmgr.conf cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+------+---------+----------------------+----------+----------+----------+----------+-------------------------------------------------------------------------------
1 | pg1 | primary | * running | | default | 100 | 2 | host=10.100.2.31 user=repmgr password=repmgr dbname=repmgr connect_timeout=2
2 | pg2 | standby | ! running as primary | | default | 100 | 3 | host=10.100.2.57 user=repmgr password=repmgr dbname=repmgr connect_timeout=2
3 | pg3 | standby | running | ! pg2 | default | 100 | 3 | host=10.100.2.250 user=repmgr password=repmgr dbname=repmgr connect_timeout=2
4 | pg4 | witness | * running | pg1 | default | 0 | 1 | host=10.100.2.130 user=repmgr password=repmgr dbname=repmgr connect_timeout=2
WARNING: following issues were detected
- node "pg2" (ID: 2) is registered as standby but running as primary
- node "pg3" (ID: 3) reports a different upstream (reported: "pg2", expected "pg1")
[pgsql@pg3:/home/pgsql]$repmgr -f /postgresql/pg12/repmgr.conf cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+------+---------+-----------+----------+----------+----------+----------+-------------------------------------------------------------------------------
1 | pg1 | primary | ! running | | default | 100 | 2 | host=10.100.2.31 user=repmgr password=repmgr dbname=repmgr connect_timeout=2
2 | pg2 | primary | * running | | default | 100 | 3 | host=10.100.2.57 user=repmgr password=repmgr dbname=repmgr connect_timeout=2
3 | pg3 | standby | running | pg2 | default | 100 | 3 | host=10.100.2.250 user=repmgr password=repmgr dbname=repmgr connect_timeout=2
4 | pg4 | witness | * running | pg1 | default | 0 | 1 | host=10.100.2.130 user=repmgr password=repmgr dbname=repmgr connect_timeout=2
WARNING: following issues were detected
- node "pg1" (ID: 1) is running but the repmgr node record is inactive
On 61 (pg1): rejoin the cluster
pg_ctl stop
repmgr -f /postgresql/pg12/repmgr.conf node rejoin -d 'host=10.100.2.57 user=repmgr dbname=repmgr connect_timeout=2' --force-rewind --dry-run --verbose
repmgr -f /postgresql/pg12/repmgr.conf node rejoin -d 'host=10.100.2.57 user=repmgr dbname=repmgr connect_timeout=2' --force-rewind --verbose
[pgsql@pg3:/home/pgsql]$repmgr -f /postgresql/pg12/repmgr.conf cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+------+---------+-----------+----------+----------+----------+----------+-------------------------------------------------------------------------------
1 | pg1 | standby | running | pg2 | default | 100 | 2 | host=10.100.2.31 user=repmgr password=repmgr dbname=repmgr connect_timeout=2
2 | pg2 | primary | * running | | default | 100 | 3 | host=10.100.2.57 user=repmgr password=repmgr dbname=repmgr connect_timeout=2
3 | pg3 | standby | running | pg2 | default | 100 | 3 | host=10.100.2.250 user=repmgr password=repmgr dbname=repmgr connect_timeout=2
4 | pg4 | witness | * running | pg1 | default | 0 | 1 | host=10.100.2.130 user=repmgr password=repmgr dbname=repmgr connect_timeout=2