Configure PostgreSQL Replication With Repmgr

This article describes how to configure replication and failover for PostgreSQL 12 using the open-source repmgr tool.

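1. Environment

The node names, addresses, and roles below are those used in the commands throughout this article:

hwd04  192.168.120.25  primary
hwd05  192.168.120.26  standby
hwd06  192.168.120.27  standby
hwd12  192.168.120.50  witness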

2. Install the PostgreSQL packages

Install PostgreSQL 12 and the repmgr package on all nodes.

[root@hwd04 ~]# dnf -y install https://download.postgresql.org/pub/repos/yum/reporpms/EL-8-x86_64/pgdg-redhat-repo-latest.noarch.rpm
[root@hwd04 ~]# dnf -qy module disable postgresql
[root@hwd04 ~]# dnf install postgresql12-server postgresql12-contrib repmgr12

3. Configure the primary node

3.1 Initialize the PostgreSQL database

[root@hwd04 ~]# /usr/pgsql-12/bin/postgresql-12-setup initdb
Initializing database ... OK

3.2 Configure PostgreSQL parameters

[root@hwd04 ~]# vi /var/lib/pgsql/12/data/postgresql.conf
listen_addresses = '*'
max_wal_senders = 10
max_replication_slots = 10
wal_level = 'replica'
wal_log_hints = on
hot_standby = on
archive_mode = on
archive_command = '/bin/true'
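
Before restarting, it can help to confirm that every parameter listed above is actually present in the file. The following is a minimal sketch, not part of the original procedure: the function name check_pg_conf is hypothetical, and it only looks for uncommented "name = value" lines without validating the values.

```shell
# Sketch: report which of the settings from section 3.2 are missing from a
# postgresql.conf file. Only uncommented "name = value" lines are matched;
# the values themselves are not checked.
check_pg_conf() {
    local conf="$1" missing=0 param
    for param in listen_addresses max_wal_senders max_replication_slots \
                 wal_level wal_log_hints hot_standby archive_mode archive_command; do
        if ! grep -Eq "^[[:space:]]*${param}[[:space:]]*=" "$conf"; then
            echo "missing: ${param}"
            missing=1
        fi
    done
    # Return non-zero if at least one parameter was not found.
    return "$missing"
}
```

It could be called as check_pg_conf /var/lib/pgsql/12/data/postgresql.conf; a non-zero exit status means at least one setting was not found.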

Restart the PostgreSQL service:

[root@hwd04 ~]# systemctl enable postgresql-12.service
[root@hwd04 ~]# systemctl restart postgresql-12.service 

3.3 Create the repmgr database and user

[root@hwd04 ~]# su - postgres
[postgres@hwd04 ~]$ createuser --superuser repmgr
[postgres@hwd04 ~]$ createdb --owner=repmgr repmgr
[postgres@hwd04 ~]$ psql -c "ALTER USER repmgr SET search_path TO repmgr, public;"

Edit postgresql.conf and add the following line so that the repmgr library is loaded when PostgreSQL starts:

[root@hwd04 ~]# vi /var/lib/pgsql/12/data/postgresql.conf
shared_preload_libraries = 'repmgr'

3.4 Configure repmgr

The default repmgr configuration file is /etc/repmgr/12/repmgr.conf. Add the following to it on the primary and on each standby node.

-- hwd04 (primary)
[root@hwd04 ~]# vi /etc/repmgr/12/repmgr.conf
node_id=1
node_name='hwd04'
conninfo='host=192.168.120.25 user=repmgr dbname=repmgr connect_timeout=2'
data_directory='/var/lib/pgsql/12/data'
-- hwd05 (standby)
[root@hwd05 ~]# vi /etc/repmgr/12/repmgr.conf
node_id=2
node_name='hwd05'
conninfo='host=192.168.120.26 user=repmgr dbname=repmgr connect_timeout=2'
data_directory='/var/lib/pgsql/12/data'
-- hwd06 (standby)
[root@hwd06 ~]# vi /etc/repmgr/12/repmgr.conf
node_id=3
node_name='hwd06'
conninfo='host=192.168.120.27 user=repmgr dbname=repmgr connect_timeout=2'
data_directory='/var/lib/pgsql/12/data'
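
Because the three files differ only in node_id, node_name, and the host address, they could also be generated rather than typed by hand. The following is a rough sketch under that assumption; the function name make_repmgr_conf is hypothetical and is not a repmgr command.

```shell
# Sketch: print a minimal repmgr.conf for one node.
# Arguments: node_id, node_name, host IP, and optionally the data directory
# (defaults to the PostgreSQL 12 path used in this article).
make_repmgr_conf() {
    local node_id="$1" node_name="$2" host="$3"
    local data_dir="${4:-/var/lib/pgsql/12/data}"
    cat <<EOF
node_id=${node_id}
node_name='${node_name}'
conninfo='host=${host} user=repmgr dbname=repmgr connect_timeout=2'
data_directory='${data_dir}'
EOF
}
```

For example, make_repmgr_conf 2 hwd05 192.168.120.26 prints the hwd05 settings shown above, which can then be redirected into /etc/repmgr/12/repmgr.conf on that node.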

3.5 Configure pg_hba.conf on the primary node

#For Replication
local   replication     repmgr                              trust
host    replication     repmgr      127.0.0.1/32            trust
host    replication     repmgr      192.168.120.0/24        trust

local   repmgr          repmgr                              trust
host    repmgr          repmgr      127.0.0.1/32            trust
host    repmgr          repmgr      192.168.120.0/24        trust

Restart the PostgreSQL service:

[root@hwd04 ~]# systemctl restart postgresql-12.service

From each standby node, verify that the primary node is reachable:

[postgres@hwd05 ~]$ psql 'host=192.168.120.25 user=repmgr dbname=repmgr connect_timeout=2'
psql (12.3)
Type "help" for help.

repmgr=# \q
[postgres@hwd06 ~]$ psql 'host=192.168.120.25 user=repmgr dbname=repmgr connect_timeout=2'
psql (12.3)
Type "help" for help.

repmgr=# \q

3.6 Register the primary node with repmgr

[postgres@hwd04 ~]$ /usr/pgsql-12/bin/repmgr -f /etc/repmgr/12/repmgr.conf primary register
INFO: connecting to primary database...
NOTICE: attempting to install extension "repmgr"
NOTICE: "repmgr" extension successfully installed
NOTICE: primary node record (ID: 1) registered

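After registration, verify the cluster status, for example with the cluster show command used in section 4.3:

[postgres@hwd04 ~]$ /usr/pgsql-12/bin/repmgr -f /etc/repmgr/12/repmgr.conf cluster show --compact

[Figure: cluster status output]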

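4. Clone the standby nodes

Before the actual clone, run a dry run first. If it reports no errors, proceed with the clone; otherwise, resolve the problems it reports and then clone.

4.1 Dry run of the standby clone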

[postgres@hwd05 ~]$ /usr/pgsql-12/bin/repmgr -h 192.168.120.25 -U repmgr -d repmgr -f /etc/repmgr/12/repmgr.conf standby clone --dry-run
NOTICE: destination directory "/var/lib/pgsql/12/data" provided
INFO: connecting to source node
DETAIL: connection string is: host=192.168.120.25 user=repmgr dbname=repmgr
DETAIL: current installation size is 31 MB
INFO: "repmgr" extension is installed in database "repmgr"
INFO: parameter "max_wal_senders" set to 10
NOTICE: checking for available walsenders on the source node (2 required)
INFO: sufficient walsenders available on the source node
DETAIL: 2 required, 10 available
NOTICE: checking replication connections can be made to the source server (2 required)
INFO: required number of replication connections could be made to the source server
DETAIL: 2 replication connections required
WARNING: data checksums are not enabled and "wal_log_hints" is "off"
DETAIL: pg_rewind requires "wal_log_hints" to be enabled
NOTICE: standby will attach to upstream node 1
HINT: consider using the -c/--fast-checkpoint option
INFO: all prerequisites for "standby clone" are met

[postgres@hwd06 ~]$ /usr/pgsql-12/bin/repmgr -h 192.168.120.25 -U repmgr -d repmgr -f /etc/repmgr/12/repmgr.conf standby clone --dry-run
NOTICE: destination directory "/var/lib/pgsql/12/data" provided
INFO: connecting to source node
DETAIL: connection string is: host=192.168.120.25 user=repmgr dbname=repmgr
DETAIL: current installation size is 31 MB
INFO: "repmgr" extension is installed in database "repmgr"
INFO: parameter "max_wal_senders" set to 10
NOTICE: checking for available walsenders on the source node (2 required)
INFO: sufficient walsenders available on the source node
DETAIL: 2 required, 10 available
NOTICE: checking replication connections can be made to the source server (2 required)
INFO: required number of replication connections could be made to the source server
DETAIL: 2 replication connections required
WARNING: data checksums are not enabled and "wal_log_hints" is "off"
DETAIL: pg_rewind requires "wal_log_hints" to be enabled
NOTICE: standby will attach to upstream node 1
HINT: consider using the -c/--fast-checkpoint option
INFO: all prerequisites for "standby clone" are met

4.2 Clone the standbys

Run the clone once on each standby node: with N standby nodes, the clone is executed N times.

[postgres@hwd05 ~]$ /usr/pgsql-12/bin/repmgr -h 192.168.120.25 -U repmgr -d repmgr -f /etc/repmgr/12/repmgr.conf standby clone
NOTICE: destination directory "/var/lib/pgsql/12/data" provided
INFO: connecting to source node
DETAIL: connection string is: host=192.168.120.25 user=repmgr dbname=repmgr
DETAIL: current installation size is 31 MB
NOTICE: checking for available walsenders on the source node (2 required)
NOTICE: checking replication connections can be made to the source server (2 required)
INFO: checking and correcting permissions on existing directory "/var/lib/pgsql/12/data"
NOTICE: starting backup (using pg_basebackup)...
HINT: this may take some time; consider using the -c/--fast-checkpoint option
INFO: executing:
  /usr/pgsql-12/bin/pg_basebackup -l "repmgr base backup"  -D /var/lib/pgsql/12/data -h 192.168.120.25 -p 5432 -U repmgr -X stream 
NOTICE: standby clone (using pg_basebackup) complete
NOTICE: you can now start your PostgreSQL server
HINT: for example: pg_ctl -D /var/lib/pgsql/12/data start
HINT: after starting the server, you need to register this standby with "repmgr standby register"
[postgres@hwd06 ~]$ /usr/pgsql-12/bin/repmgr -h 192.168.120.25 -U repmgr -d repmgr -f /etc/repmgr/12/repmgr.conf standby clone
NOTICE: destination directory "/var/lib/pgsql/12/data" provided
INFO: connecting to source node
DETAIL: connection string is: host=192.168.120.25 user=repmgr dbname=repmgr
DETAIL: current installation size is 31 MB
NOTICE: checking for available walsenders on the source node (2 required)
NOTICE: checking replication connections can be made to the source server (2 required)
INFO: checking and correcting permissions on existing directory "/var/lib/pgsql/12/data"
NOTICE: starting backup (using pg_basebackup)...
HINT: this may take some time; consider using the -c/--fast-checkpoint option
INFO: executing:
  /usr/pgsql-12/bin/pg_basebackup -l "repmgr base backup"  -D /var/lib/pgsql/12/data -h 192.168.120.25 -p 5432 -U repmgr -X stream 
NOTICE: standby clone (using pg_basebackup) complete
NOTICE: you can now start your PostgreSQL server
HINT: for example: pg_ctl -D /var/lib/pgsql/12/data start
HINT: after starting the server, you need to register this standby with "repmgr standby register"
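
The dry-run-then-clone sequence from sections 4.1 and 4.2 can be combined so that the real clone only runs when the dry run succeeds. The following is a minimal sketch that relies on the exit status of the dry run; the function name clone_standby and the REPMGR_BIN and REPMGR_CONF override variables are hypothetical and exist only in this example.

```shell
# Sketch: run "standby clone --dry-run" first and clone only if it succeeds.
# REPMGR_BIN and REPMGR_CONF are hypothetical overrides for this example;
# the defaults are the paths used in this article.
clone_standby() {
    local primary_host="$1"
    local repmgr_bin="${REPMGR_BIN:-/usr/pgsql-12/bin/repmgr}"
    local conf="${REPMGR_CONF:-/etc/repmgr/12/repmgr.conf}"

    if ! "$repmgr_bin" -h "$primary_host" -U repmgr -d repmgr -f "$conf" \
            standby clone --dry-run; then
        echo "dry run failed; fix the reported problems before cloning" >&2
        return 1
    fi
    # The dry run passed, so perform the real clone with the same options.
    "$repmgr_bin" -h "$primary_host" -U repmgr -d repmgr -f "$conf" standby clone
}
```

On each standby it would be run as the postgres user, for example clone_standby 192.168.120.25.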

After the clone completes, start the PostgreSQL service on each standby node:

[root@hwd05 ~]# systemctl enable postgresql-12.service
[root@hwd05 ~]# systemctl restart postgresql-12.service

4.3 Register the standby nodes with repmgr

[postgres@hwd05 ~]$ /usr/pgsql-12/bin/repmgr -f /etc/repmgr/12/repmgr.conf standby register
INFO: connecting to local node "hwd05" (ID: 2)
INFO: connecting to primary database
WARNING: --upstream-node-id not supplied, assuming upstream node is primary (node ID 1)
INFO: standby registration complete
NOTICE: standby node "hwd05" (ID: 2) successfully registered
[postgres@hwd06 ~]$ /usr/pgsql-12/bin/repmgr -f /etc/repmgr/12/repmgr.conf standby register
INFO: connecting to local node "hwd06" (ID: 3)
INFO: connecting to primary database
WARNING: --upstream-node-id not supplied, assuming upstream node is primary (node ID 1)
INFO: standby registration complete
NOTICE: standby node "hwd06" (ID: 3) successfully registered

After registration, check the cluster status:

[postgres@hwd04 ~]$ /usr/pgsql-12/bin/repmgr -f /etc/repmgr/12/repmgr.conf cluster show --compact

[Figure: cluster show output]

At this point, streaming replication is fully configured.

5. Configure automatic failover

5.1 Configure the PostgreSQL service on the witness node (hwd12)

[root@hwd12 ~]# /usr/pgsql-12/bin/postgresql-12-setup initdb
Initializing database ... OK
[root@hwd12 ~]# vi /var/lib/pgsql/12/data/postgresql.conf
listen_addresses = '*'
shared_preload_libraries = 'repmgr'
[root@hwd12 ~]# vi /var/lib/pgsql/12/data/pg_hba.conf 
local   replication     repmgr                              trust
host    replication     repmgr      127.0.0.1/32            trust
host    replication     repmgr      192.168.120.0/24        trust
local   repmgr          repmgr                              trust
host    repmgr          repmgr      127.0.0.1/32            trust
host    repmgr          repmgr      192.168.120.0/24        trust
[root@hwd12 ~]# systemctl enable postgresql-12.service
[root@hwd12 ~]# systemctl restart postgresql-12.service

5.2 Create the repmgr database and user

[root@hwd12 ~]# su - postgres
[postgres@hwd12 ~]$ createuser --superuser repmgr
[postgres@hwd12 ~]$ createdb --owner=repmgr repmgr
[postgres@hwd12 ~]$ psql -c "ALTER USER repmgr SET search_path TO repmgr, public;"

Test the connection from the primary node to the witness node:

[postgres@hwd04 ~]$ psql 'host=192.168.120.50 user=repmgr dbname=repmgr connect_timeout=2'
psql (12.3)
Type "help" for help.

repmgr=# \q

5.3 Edit the repmgr configuration file

[root@hwd12 ~]# vi /etc/repmgr/12/repmgr.conf
node_id=4
node_name='hwd12'
conninfo='host=192.168.120.50 user=repmgr dbname=repmgr connect_timeout=2'
data_directory='/var/lib/pgsql/12/data'

5.4 Register the witness node with repmgr

[postgres@hwd12 ~]$ /usr/pgsql-12/bin/repmgr -f /etc/repmgr/12/repmgr.conf witness register -h 192.168.120.25
INFO: connecting to witness node "hwd12" (ID: 4)
INFO: connecting to primary node
NOTICE: attempting to install extension "repmgr"
NOTICE: "repmgr" extension successfully installed
INFO: witness registration complete
NOTICE: witness node "hwd12" (ID: 4) successfully registered

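After registration, the cluster status is as shown in the following figure:
[Figure: cluster status including the witness node]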

5.5 Edit the sudoers file on all nodes

Add the following:

[root@hwd12 ~]# vi /etc/sudoers
Defaults:postgres !requiretty
postgres ALL = NOPASSWD: /usr/bin/systemctl stop postgresql-12.service, /usr/bin/systemctl start postgresql-12.service, /usr/bin/systemctl restart postgresql-12.service, /usr/bin/systemctl reload postgresql-12.service, /usr/bin/systemctl start repmgr12.service, /usr/bin/systemctl stop repmgr12.service
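
As an alternative to editing the main sudoers file, the same two lines could be generated and placed in a drop-in file. The following is only a sketch of that idea: the function name make_sudoers_fragment and the drop-in approach are assumptions of this example, not part of the original setup.

```shell
# Sketch: print a sudoers fragment that lets the postgres user run the
# systemctl commands listed in section 5.5 without a password.
make_sudoers_fragment() {
    local systemctl=/usr/bin/systemctl cmds="" action
    # PostgreSQL service actions, in the same order as in the article.
    for action in stop start restart reload; do
        cmds="${cmds}${systemctl} ${action} postgresql-12.service, "
    done
    # repmgrd service actions.
    cmds="${cmds}${systemctl} start repmgr12.service, ${systemctl} stop repmgr12.service"
    printf '%s\n' 'Defaults:postgres !requiretty' "postgres ALL = NOPASSWD: ${cmds}"
}
```

The output could be written as root to a file such as /etc/sudoers.d/repmgr (a hypothetical name) and checked with visudo -cf on that file before relying on it.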

5.6 Configure repmgr parameters

Edit repmgr.conf on all nodes and add the following:

failover='automatic'
priority=60
connection_check_type=ping
reconnect_attempts=6
reconnect_interval=10
promote_command='/usr/pgsql-12/bin/repmgr standby promote -f /etc/repmgr/12/repmgr.conf --log-to-file'
follow_command='/usr/pgsql-12/bin/repmgr standby follow -f /etc/repmgr/12/repmgr.conf --log-to-file --upstream-node-id=%n'
monitoring_history=yes
monitor_interval_secs=2
standby_disconnect_on_failover=true
primary_visibility_consensus=true
log_status_interval=60
service_start_command = 'sudo /usr/bin/systemctl start postgresql-12.service'
service_stop_command = 'sudo /usr/bin/systemctl stop postgresql-12.service'
service_restart_command = 'sudo /usr/bin/systemctl restart postgresql-12.service'
service_reload_command = 'sudo /usr/bin/systemctl reload postgresql-12.service'
repmgrd_service_start_command = 'sudo /usr/bin/systemctl start repmgr12.service'
repmgrd_service_stop_command = 'sudo /usr/bin/systemctl stop repmgr12.service'
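
The reconnect_attempts and reconnect_interval values together determine roughly how long repmgrd keeps retrying an unreachable primary before it starts a failover. The small calculation below is a sketch of that arithmetic; the function name failover_delay_secs is hypothetical, and the result is an approximation that ignores connection timeouts and the promotion itself.

```shell
# Sketch: approximate time repmgrd spends retrying before a failover begins,
# computed as reconnect_attempts * reconnect_interval (in seconds).
failover_delay_secs() {
    local attempts="$1" interval="$2"
    echo $(( attempts * interval ))
}

# Values from the configuration above: 6 attempts, 10 seconds apart.
failover_delay_secs 6 10   # prints 60
```

With the values above the result is 60 seconds, which is consistent with the roughly one-minute delay observed in the failure test in section 5.7.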

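Note: the priority value must be changed on the standby nodes, because the default is 100 and the primary keeps that default. Here hwd05 is given a priority of 60 and hwd06 a priority of 40. The witness node hwd12 does not need a priority setting. The higher the priority value, the higher the node's precedence for becoming the new primary.

After editing, start the repmgr service on each node: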

[postgres@hwd04 ~]$ /usr/pgsql-12/bin/repmgr -f /etc/repmgr/12/repmgr.conf daemon start --dry-run
INFO: prerequisites for starting repmgrd met
DETAIL: following command would be executed:
  sudo /usr/bin/systemctl start repmgr12.service
[postgres@hwd04 ~]$ /usr/pgsql-12/bin/repmgr -f /etc/repmgr/12/repmgr.conf daemon start
NOTICE: executing: "sudo /usr/bin/systemctl start repmgr12.service"
NOTICE: repmgrd was successfully started

Once the daemons are running, the cluster events can be queried on the primary or on a standby node:

[postgres@hwd04 ~]$ /usr/pgsql-12/bin/repmgr -f /etc/repmgr/12/repmgr.conf cluster event --event=repmgrd_start

[Figure: cluster event output]

Information about repmgr can also be found in the operating system log files.

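5.7 Simulate a primary failure

Here the PostgreSQL service on hwd04 is stopped, and the logs are then used to check whether a standby is automatically promoted to primary and whether the remaining healthy nodes reconnect to the new primary.

  • Stop the service on the primary node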
    [postgres@hwd04 ~]$ sudo systemctl stop postgresql-12.service

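    After the service is stopped, the cluster information shows the primary node as unreachable.
    [Figure: cluster status with hwd04 unreachable]
    About one minute later, the witness node's log shows that hwd05 has become the new primary and that the other nodes have reconnected to hwd05. The witness log is as follows:
    [Figures: witness node log]
    After the old primary recovers, it is not automatically converted to a standby; it runs on its own as a primary. It therefore has to be converted to a standby manually by forcing (-F) a clone from the new primary, as follows: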

    [postgres@hwd04 ~]$ /usr/pgsql-12/bin/repmgr -h 192.168.120.26 -U repmgr -d repmgr -f /etc/repmgr/12/repmgr.conf standby clone -F
    NOTICE: destination directory "/var/lib/pgsql/12/data" provided
    INFO: connecting to source node
    DETAIL: connection string is: host=192.168.120.26 user=repmgr dbname=repmgr
    DETAIL: current installation size is 15 GB
    NOTICE: checking for available walsenders on the source node (2 required)
    NOTICE: checking replication connections can be made to the source server (2 required)
    WARNING: directory "/var/lib/pgsql/12/data" exists but is not empty
    NOTICE: -F/--force provided - deleting existing data directory "/var/lib/pgsql/12/data"
    NOTICE: starting backup (using pg_basebackup)...
    HINT: this may take some time; consider using the -c/--fast-checkpoint option
    INFO: executing:
    /usr/pgsql-12/bin/pg_basebackup -l "repmgr base backup"  -D /var/lib/pgsql/12/data -h 192.168.120.26 -p 5432 -U repmgr -X stream 
    NOTICE: standby clone (using pg_basebackup) complete
    NOTICE: you can now start your PostgreSQL server
    HINT: for example: sudo /usr/bin/systemctl start postgresql-12.service
    HINT: after starting the server, you need to re-register this standby with "repmgr standby register --force" to update the existing node record
    [postgres@hwd04 ~]$ sudo systemctl start postgresql-12.service
    [postgres@hwd04 ~]$ repmgr -f /etc/repmgr/12/repmgr.conf standby register -F                                                  
    INFO: connecting to local node "hwd04" (ID: 1)
    INFO: connecting to primary database
    INFO: standby registration complete
    NOTICE: standby node "hwd04" (ID: 1) successfully registered

    [Figure: cluster status after hwd04 rejoins as a standby]
    The same information can also be obtained by querying the pg_stat_replication view:

    postgres=# select pid,usesysid,usename,application_name,client_addr,client_port,state,sent_lsn,write_lsn,flush_lsn,sync_state from pg_stat_replication;  

    [Figure: pg_stat_replication query output]
