PostgreSQL repmgr (MHA)

Posted 2023-03-02

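Election mechanism

During an automatic failover, once a standby has failed to reconnect to the primary after the configured number of attempts (the attempt count and interval are set in repmgr.conf), repmgrd elects one candidate standby from among all standbys (see the ordering below) and promotes it to be the new primary. The remaining standbys then follow the new primary, forming a new cluster.

repmgr ranks candidate standbys in this order: LSN > Priority > Node_ID

The node with the highest LSN is preferred as the candidate;
if the LSNs are equal, the nodes are compared by Priority (set in the configuration file; a Priority of 0 means the node may never be promoted to primary);
if the priorities are also equal, the Node IDs are compared and the lower one wins.

The PostgreSQL LSN (Log Sequence Number) is a globally unique identifier for a position in the WAL.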
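
The ordering above can be sketched as a small shell function. This is only an illustrative model of the LSN > Priority > Node_ID rule, not repmgr's own code: the input format (one "node_id lsn priority" line per standby) and the function names lsn_to_int and pick_candidate are made up for this example. An LSN such as 0/A000970 is written as two hexadecimal numbers, the high and low 32 bits of a WAL position, so it is converted to a single integer before comparing.

```shell
#!/bin/sh
# Convert an LSN such as "0/A000970" into one integer so that
# two WAL positions can be compared numerically.
lsn_to_int() {
    hi=${1%%/*}   # part before the slash: high 32 bits (hex)
    lo=${1##*/}   # part after the slash: low 32 bits (hex)
    echo $(( (0x$hi << 32) + 0x$lo ))
}

# Read lines of "node_id lsn priority" (hypothetical format) from stdin
# and print the node_id of the promotion candidate.
# Order: highest LSN, then highest priority, then lowest node_id.
# Nodes with priority 0 are never eligible.
pick_candidate() {
    while read -r id lsn prio; do
        [ "$prio" -gt 0 ] || continue
        echo "$(lsn_to_int "$lsn") $prio $id"
    done | sort -k1,1nr -k2,2nr -k3,3n | awk 'NR == 1 { print $3 }'
}
```

For example, `printf '%s\n' '1 0/A000970 100' '2 0/A000970 100' | pick_candidate` picks node 1, because LSN and priority tie and the lower node ID wins. A node with priority 0 is skipped even if it is listed.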

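61: primary: repmgr + master

62: standby 1: repmgr + standby

63: standby 2: repmgr + standby

64: witness node: repmgr + witness

Note: in the command output captured later in this post, the node registered as the initial primary is pg3 (10.100.2.250), and pg1 and pg2 are cloned from it as standbys.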

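Environment preparation

All machines: install the operating system, create the user and directories, install PostgreSQL

Primary: initialize the database on the primary only, and enable WAL archiving on it

01. Initialize the environment

yum install -y cmake make gcc gcc-c++ perl readline readline-devel zlib zlib-devel python36 tcl openssl ncurses-devel openldap pam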
yum -y groupinstall "Development Tools"
yum -y install yum-utils openjade docbook-dtds docbook-style-dsssl docbook-style-xsl

02. Set up mutual trust between the hosts (passwordless SSH)

03. Install repmgr

Install on the primary and the standbys (su - pgsql)

cd /postgresql/soft
tar zxvf repmgr*
cd /postgresql/soft/repmgr-5.0.0
./configure
make
make install

04. Create the repmgr user and database on the primary

su - pgsql
pg_ctl start

createuser -s repmgr -h 127.0.0.1
createdb repmgr -O repmgr -h 127.0.0.1
psql -h 127.0.0.1 -c "alter user repmgr with password 'repmgr'"
psql -h 127.0.0.1 -c "alter user repmgr set search_path to repmgr, \"\$user\", public"

05. Edit pg_hba.conf on the primary

/postgresql/pgdata/pg_hba.conf

local repmgr repmgr md5
host repmgr repmgr 127.0.0.1/32 md5
host repmgr repmgr 10.100.2.0/24 md5

local replication repmgr md5
host replication repmgr 127.0.0.1/32 md5
host replication repmgr 10.100.2.0/24 md5

pg_ctl reload

06. Edit repmgr.conf

Host 61:
vi /postgresql/pg12/repmgr.conf
node_id=1
node_name='pg1'
conninfo='host=10.100.2.31 user=repmgr password=repmgr dbname=repmgr connect_timeout=2'
data_directory='/postgresql/pgdata'
pg_bindir='/postgresql/pg12/bin'

Host 62:
vi /postgresql/pg12/repmgr.conf
node_id=2
node_name='pg2'
conninfo='host=10.100.2.57 user=repmgr password=repmgr dbname=repmgr connect_timeout=2'
data_directory='/postgresql/pgdata'
pg_bindir='/postgresql/pg12/bin'

Host 63:
vi /postgresql/pg12/repmgr.conf
node_id=3
node_name='pg3'
conninfo='host=10.100.2.250 user=repmgr password=repmgr dbname=repmgr connect_timeout=2'
data_directory='/postgresql/pgdata'
pg_bindir='/postgresql/pg12/bin'

07. Register the primary

repmgr -f /postgresql/pg12/repmgr.conf primary register
repmgr -f /postgresql/pg12/repmgr.conf cluster show

psql -U repmgr -d repmgr -h 10.100.2.250

select * from repmgr.nodes;

repmgr=# select * from repmgr.nodes;
 node_id | upstream_node_id | active | node_name |  type   | location | priority |                                   conninfo                                    | repluser | slot_name |         config_file          
---------+------------------+--------+-----------+---------+----------+----------+-------------------------------------------------------------------------------+----------+-----------+------------------------------
       3 |                  | t      | pg3       | primary | default  |      100 | host=10.100.2.250 user=repmgr password=repmgr dbname=repmgr connect_timeout=2 | repmgr   |           | /postgresql/pg12/repmgr.conf
(1 row)

08. Configure the .pgpass password file on the standbys

su - pgsql
echo "#ip:port:db:user:pwd" >> ~/.pgpass
echo "10.100.2.31:5432:repmgr:repmgr:repmgr" >> ~/.pgpass
echo "10.100.2.57:5432:repmgr:repmgr:repmgr" >> ~/.pgpass
echo "10.100.2.250:5432:repmgr:repmgr:repmgr" >> ~/.pgpass
echo "10.100.2.130:5432:repmgr:repmgr:repmgr" >> ~/.pgpass
chmod 0600 ~/.pgpass
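
libpq ignores a .pgpass file whose permissions are looser than 0600, which is why the chmod above matters. A malformed line is harder to spot, so a quick format check along the following lines may help. This is a minimal sketch: check_pgpass is a hypothetical helper name, and it assumes no field contains an escaped colon ("\:").

```shell
#!/bin/sh
# Report lines in a .pgpass-style file that do not have exactly
# five colon-separated fields (host:port:database:user:password).
# Assumption: no field contains an escaped colon ("\:").
check_pgpass() {
    awk -F: '
        /^#/ || /^[[:space:]]*$/ { next }   # skip comments and blank lines
        NF != 5 {
            printf "line %d: expected 5 fields, got %d\n", NR, NF
            bad = 1
        }
        END { exit bad + 0 }                # non-zero exit if any line was bad
    ' "$1"
}
```

Running `check_pgpass ~/.pgpass` prints nothing and returns 0 when every entry has five fields.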

Standby 1:

repmgr -h 10.100.2.250 -U repmgr -d repmgr -f /postgresql/pg12/repmgr.conf standby clone --dry-run  # dry run first
repmgr -h 10.100.2.250 -U repmgr -d repmgr -f /postgresql/pg12/repmgr.conf standby clone

pg_ctl -D /postgresql/pgdata start
psql -h 10.100.2.31 -U repmgr
select * from pg_stat_wal_receiver;

postgres=# select * from pg_stat_wal_receiver;
  pid  |  status   | receive_start_lsn | receive_start_tli | received_lsn | received_tli |      last_msg_send_time       |     last_msg_receipt_time     | latest_end_lsn |        latest_end_time        | slot_name | sender_host  | sender_port |                                                                                                                          conninfo
-------+-----------+-------------------+-------------------+--------------+--------------+-------------------------------+-------------------------------+----------------+-------------------------------+-----------+--------------+-------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 27507 | streaming | 0/8000000         |                 1 | 0/A000970    |            1 | 2023-03-01 10:36:14.223493+08 | 2023-03-01 10:36:14.220426+08 | 0/A000970      | 2023-03-01 10:24:43.053232+08 |           | 10.100.2.250 |        5432 | user=repmgr password=******** connect_timeout=2 dbname=replication host=10.100.2.250 port=5432 application_name=pg1 fallback_application_name=walreceiver sslmode=disable sslcompression=0 gssencmode=disable krbsrvname=postgres target_session_attrs=any
(1 row)

Standby 2:

repmgr -h 10.100.2.250 -U repmgr -d repmgr -f /postgresql/pg12/repmgr.conf standby clone --dry-run
repmgr -h 10.100.2.250 -U repmgr -d repmgr -f /postgresql/pg12/repmgr.conf standby clone

09. Register the standbys

On standbys 62 and 63:

repmgr -f /postgresql/pg12/repmgr.conf standby register
repmgr -f /postgresql/pg12/repmgr.conf cluster show

[pgsql@pg1:/postgresql/soft/repmgr-5.0.0]$repmgr -f /postgresql/pg12/repmgr.conf cluster show
 ID | Name | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                                            
----+------+---------+-----------+----------+----------+----------+----------+-------------------------------------------------------------------------------
 1  | pg1  | standby |   running | pg3      | default  | 100      | 1        | host=10.100.2.31 user=repmgr password=repmgr dbname=repmgr connect_timeout=2 
 2  | pg2  | standby |   running | pg3      | default  | 100      | 1        | host=10.100.2.57 user=repmgr password=repmgr dbname=repmgr connect_timeout=2 
 3  | pg3  | primary | * running |          | default  | 100      | 1        | host=10.100.2.250 user=repmgr password=repmgr dbname=repmgr connect_timeout=2

10. Configure the witness node

Witness node (host 64):

vi /postgresql/pg12/repmgr.conf
node_id=4
node_name='pg4'
conninfo='host=10.100.2.130 user=repmgr password=repmgr dbname=repmgr connect_timeout=2'
data_directory='/postgresql/pgdata'
pg_bindir='/postgresql/pg12/bin'

Initialize the database

/postgresql/pg12/bin/initdb -D /postgresql/pgdata -E UTF8 --lc-collate=C --locale=en_US.utf8 -U postgres

Copy postgresql.conf and pg_hba.conf from an existing node

createuser -s repmgr -h 127.0.0.1
createdb repmgr -O repmgr -h 127.0.0.1
psql -h 127.0.0.1 -c "alter user repmgr with password 'repmgr'"
psql -h 127.0.0.1 -c "alter user repmgr set search_path to repmgr, \"\$user\", public"

Register as the witness node

repmgr -f /postgresql/pg12/repmgr.conf -h 10.100.2.250 -U repmgr -d repmgr witness register
repmgr -f /postgresql/pg12/repmgr.conf cluster show

[pgsql@pg4:/postgresql/soft/repmgr-5.0.0]$repmgr -f /postgresql/pg12/repmgr.conf cluster show
 ID | Name | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                                            
----+------+---------+-----------+----------+----------+----------+----------+-------------------------------------------------------------------------------
 1  | pg1  | standby |   running | pg3      | default  | 100      | 1        | host=10.100.2.31 user=repmgr password=repmgr dbname=repmgr connect_timeout=2 
 2  | pg2  | standby |   running | pg3      | default  | 100      | 1        | host=10.100.2.57 user=repmgr password=repmgr dbname=repmgr connect_timeout=2 
 3  | pg3  | primary | * running |          | default  | 100      | 1        | host=10.100.2.250 user=repmgr password=repmgr dbname=repmgr connect_timeout=2
 4  | pg4  | witness | * running | pg3      | default  | 0        | 1        | host=10.100.2.130 user=repmgr password=repmgr dbname=repmgr connect_timeout=2

11. Switchover (planned primary/standby role switch)

On all nodes:

echo "max_replication_slots=10" >> /postgresql/pgdata/postgresql.conf
echo "wal_log_hints=on" >> /postgresql/pgdata/postgresql.conf
echo "shared_preload_libraries=repmgr"  >> /postgresql/pgdata/postgresql.conf
pg_ctl stop
pg_ctl start
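
Appending with echo >> adds a new line every time the step is repeated. PostgreSQL uses the last occurrence of a setting, so duplicates are harmless, but they make the file harder to read. A small guard like the one below avoids them. This is a sketch only: ensure_setting is a hypothetical helper, not part of PostgreSQL or repmgr, and it only recognises uncommented key=value lines.

```shell
#!/bin/sh
# Append "key=value" to a config file only if no active (uncommented)
# line already sets that key. Commented defaults such as
# "#wal_log_hints = off" do not count as already set.
ensure_setting() {
    file=$1
    key=$2
    value=$3
    if grep -Eq "^[[:space:]]*${key}[[:space:]]*=" "$file"; then
        echo "$key already set in $file; leaving it unchanged" >&2
    else
        echo "${key}=${value}" >> "$file"
    fi
}
```

Usage would look like `ensure_setting /postgresql/pgdata/postgresql.conf wal_log_hints on`, followed by the same restart as above.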

Promote standby 62 to primary:

On the standby:

repmgr -f /postgresql/pg12/repmgr.conf cluster show
repmgr -f /postgresql/pg12/repmgr.conf standby switchover --siblings-follow --dry-run --force-rewind  # dry run first
repmgr -f /postgresql/pg12/repmgr.conf standby switchover --siblings-follow --force-rewind

Test writing data

repmgr -f /postgresql/pg12/repmgr.conf cluster show
psql -h 10.100.2.31 -U itpux -d itpuxdb
insert into t_itpux values('杉欣虞浙峰');
insert into t_itpux values('杉欣陈志豪');
select * from t_itpux;

Test writing data to the primary

psql -h 10.100.2.57 -U itpux -d itpuxdb
insert into t_itpux values('开拓刘开杰');
insert into t_itpux values('点春');
insert into t_itpux values('证券');
insert into t_itpux values('基金');
select * from t_itpux;

12. Failover (automatic primary/standby switch)

vi /postgresql/pg12/repmgr.conf
monitoring_history=yes
monitor_interval_secs=5
failover=automatic
reconnect_attempts=6
reconnect_interval=5
promote_command='repmgr standby promote -f /postgresql/pg12/repmgr.conf --log-to-file'
follow_command='repmgr standby follow -f /postgresql/pg12/repmgr.conf --log-to-file --upstream-node-id=%n'
log_level=INFO
log_status_interval=10
log_file='/postgresql/pg12/repmgr.log'
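
With the values above, repmgrd retries the primary reconnect_attempts times at reconnect_interval-second intervals before starting a failover, so the outage has to last roughly 6 x 5 = 30 seconds (plus connection timeouts) before a standby is promoted. The sketch below reads the two values from a repmgr.conf-style file and multiplies them. The helper names conf_int and failover_window are made up for this example, and the parsing assumes plain key=value lines with no spaces around the equals sign.

```shell
#!/bin/sh
# Print the integer value of "key=value" from a repmgr.conf-style file.
# If the key appears more than once, the last occurrence is used.
conf_int() {
    awk -F= -v key="$2" '
        $1 == key { v = $2 }
        END { gsub(/[^0-9]/, "", v); print v }
    ' "$1"
}

# Approximate number of seconds repmgrd keeps retrying the primary
# before it starts a failover (connection timeouts not included).
failover_window() {
    attempts=$(conf_int "$1" reconnect_attempts)
    interval=$(conf_int "$1" reconnect_interval)
    echo $(( ${attempts:-0} * ${interval:-0} ))
}
```

For the configuration shown here, `failover_window /postgresql/pg12/repmgr.conf` would print 30.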

vi /etc/logrotate.conf
/postgresql/pg12/repmgr.log {
    missingok
    compress
    rotate 30
    daily
    dateext
    create 0600 pgsql pgsql
}

Starting and stopping repmgrd

Start:

repmgrd -f /postgresql/pg12/repmgr.conf --pid-file /tmp/repmgrd.pid --daemonize

Stop:

kill `cat /tmp/repmgrd.pid`

[pgsql@pg4:/postgresql/soft/repmgr-5.0.0]$repmgr -f /postgresql/pg12/repmgr.conf cluster show
 ID | Name | Role    | Status               | Upstream | Location | Priority | Timeline | Connection string                                                            
----+------+---------+----------------------+----------+----------+----------+----------+-------------------------------------------------------------------------------
 1  | pg1  | primary | ? unreachable        |          | default  | 100      | ?        | host=10.100.2.31 user=repmgr password=repmgr dbname=repmgr connect_timeout=2 
 2  | pg2  | standby | ! running as primary |          | default  | 100      | 3        | host=10.100.2.57 user=repmgr password=repmgr dbname=repmgr connect_timeout=2 
 3  | pg3  | standby |   running            | ! pg2    | default  | 100      | 2        | host=10.100.2.250 user=repmgr password=repmgr dbname=repmgr connect_timeout=2
 4  | pg4  | witness | * running            | ? pg1    | default  | 0        | 1        | host=10.100.2.130 user=repmgr password=repmgr dbname=repmgr connect_timeout=2

WARNING: following issues were detected
  - unable to connect to node "pg1" (ID: 1)
  - node "pg1" (ID: 1) is registered as an active primary but is unreachable
  - node "pg2" (ID: 2) is registered as standby but running as primary
  - node "pg3" (ID: 3) reports a different upstream (reported: "pg2", expected "pg1")
  - unable to connect to node "pg4" (ID: 4)'s upstream node "pg1" (ID: 1)

[pgsql@pg1:/home/pgsql]$repmgr -f /postgresql/pg12/repmgr.conf cluster show
 ID | Name | Role    | Status               | Upstream | Location | Priority | Timeline | Connection string                                                            
----+------+---------+----------------------+----------+----------+----------+----------+-------------------------------------------------------------------------------
 1  | pg1  | primary | * running            |          | default  | 100      | 2        | host=10.100.2.31 user=repmgr password=repmgr dbname=repmgr connect_timeout=2 
 2  | pg2  | standby | ! running as primary |          | default  | 100      | 3        | host=10.100.2.57 user=repmgr password=repmgr dbname=repmgr connect_timeout=2 
 3  | pg3  | standby |   running            | ! pg2    | default  | 100      | 3        | host=10.100.2.250 user=repmgr password=repmgr dbname=repmgr connect_timeout=2
 4  | pg4  | witness | * running            | pg1      | default  | 0        | 1        | host=10.100.2.130 user=repmgr password=repmgr dbname=repmgr connect_timeout=2

WARNING: following issues were detected
  - node "pg2" (ID: 2) is registered as standby but running as primary
  - node "pg3" (ID: 3) reports a different upstream (reported: "pg2", expected "pg1")

[pgsql@pg3:/home/pgsql]$repmgr -f /postgresql/pg12/repmgr.conf cluster show
 ID | Name | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                                            
----+------+---------+-----------+----------+----------+----------+----------+-------------------------------------------------------------------------------
 1  | pg1  | primary | ! running |          | default  | 100      | 2        | host=10.100.2.31 user=repmgr password=repmgr dbname=repmgr connect_timeout=2 
 2  | pg2  | primary | * running |          | default  | 100      | 3        | host=10.100.2.57 user=repmgr password=repmgr dbname=repmgr connect_timeout=2 
 3  | pg3  | standby |   running | pg2      | default  | 100      | 3        | host=10.100.2.250 user=repmgr password=repmgr dbname=repmgr connect_timeout=2
 4  | pg4  | witness | * running | pg1      | default  | 0        | 1        | host=10.100.2.130 user=repmgr password=repmgr dbname=repmgr connect_timeout=2

WARNING: following issues were detected
  - node "pg1" (ID: 1) is running but the repmgr node record is inactive
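
The outputs above show the old primary pg1 back online and still running as a primary next to the newly promoted pg2. A check along the following lines could flag that situation from the cluster show table. It is a rough sketch that parses the human-readable table exactly as printed above; running_primaries is a hypothetical helper name, and the column positions are assumed to match this repmgr version's layout.

```shell
#!/bin/sh
# Read `repmgr cluster show` table output on stdin and print the names
# of all nodes that are currently running as a primary.
# Columns (split on "|"): ID, Name, Role, Status, ...
running_primaries() {
    awk -F'|' '
        NF >= 4 {
            name = $2; role = $3; status = $4
            gsub(/ /, "", name)
            gsub(/ /, "", role)
            # Either registered as primary and running, or registered as
            # something else but reported as "running as primary".
            if ((role == "primary" && status ~ /running/) ||
                status ~ /running as primary/)
                print name
        }'
}
```

Piping the output in, for example `repmgr -f /postgresql/pg12/repmgr.conf cluster show | running_primaries`, and getting more than one name back means the old primary has to be rejoined as a standby, as done in the next step.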

Host 61: rejoin the cluster

pg_ctl stop
repmgr -f /postgresql/pg12/repmgr.conf node rejoin -d 'host=10.100.2.57 user=repmgr dbname=repmgr connect_timeout=2' --force-rewind --dry-run --verbose
repmgr -f /postgresql/pg12/repmgr.conf node rejoin -d 'host=10.100.2.57 user=repmgr dbname=repmgr connect_timeout=2' --force-rewind --verbose

[pgsql@pg3:/home/pgsql]$repmgr -f /postgresql/pg12/repmgr.conf cluster show
 ID | Name | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                                            
----+------+---------+-----------+----------+----------+----------+----------+-------------------------------------------------------------------------------
 1  | pg1  | standby |   running | pg2      | default  | 100      | 2        | host=10.100.2.31 user=repmgr password=repmgr dbname=repmgr connect_timeout=2 
 2  | pg2  | primary | * running |          | default  | 100      | 3        | host=10.100.2.57 user=repmgr password=repmgr dbname=repmgr connect_timeout=2 
 3  | pg3  | standby |   running | pg2      | default  | 100      | 3        | host=10.100.2.250 user=repmgr password=repmgr dbname=repmgr connect_timeout=2
 4  | pg4  | witness | * running | pg1      | default  | 0        | 1        | host=10.100.2.130 user=repmgr password=repmgr dbname=repmgr connect_timeout=2
