Reinitializing the database in an HAC cluster while reusing the original cluster configuration
Posted by Highgo PG Lab (瀚高PG实验室)
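Highgo Database
Contents
Environment
Purpose
Details
Environment
Platform: N/A
Version: 4.5
Purpose
In an HAC cluster, when the current data directory has to be deleted and the database re-created for some special reason, this procedure rebuilds the cluster quickly without reinstalling the software.
Details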
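1. Stop the hghac service on all nodes, delete the original data directory, and run initdb again on the primary node (the existing HAC cluster configuration files are left unchanged).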
[root@db data]# systemctl stop hghac-vip
[root@db data]# initdb -e sm4 -c "echo *******" -D /db/hgdbdata/data
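Step 1 can be scripted from the primary. The following is a hypothetical sketch, not part of the original procedure: it assumes passwordless root SSH to every node, takes the three node addresses from the etcd output shown later in this article, treats 192.168.80.111 as the primary, and reads step 1 as removing the data directory on every node so that the replicas are re-created from the new primary. It renames the old data directory with a timestamp instead of deleting it, and with DRY_RUN=1 it only prints the commands.

```shell
#!/bin/sh
# Hypothetical helper for step 1; assumes root SSH access to every node.
NODES="192.168.80.111 192.168.80.112 192.168.80.113"   # assumed node list
PRIMARY="192.168.80.111"                                 # assumed primary node
PGDATA_DIR="/db/hgdbdata/data"

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# Stop the HAC service on every node before touching any data directory.
stop_hac_all() {
    for node in $NODES; do
        run ssh "root@$node" systemctl stop hghac-vip
    done
}

# Keep the old data directory (renamed with a timestamp) instead of deleting it.
move_data_aside() {
    stamp=$(date +%Y%m%d%H%M%S)
    for node in $NODES; do
        run ssh "root@$node" mv "$PGDATA_DIR" "$PGDATA_DIR.bak.$stamp"
    done
}

# Re-create the database on the primary only, with the options used in the article.
# The remote command is passed as one quoted string so the remote shell keeps the
# quoting. The key command is masked ("*******") in the source; substitute your own.
reinit_primary() {
    run ssh "root@$PRIMARY" "initdb -e sm4 -c 'echo *******' -D '$PGDATA_DIR'"
}
```

Typical order: `DRY_RUN=1 stop_hac_all` to review the commands, then `stop_hac_all`, `move_data_aside` and `reinit_primary`. The source runs initdb from a root prompt; stock PostgreSQL's initdb refuses to run as root, so if yours does too, run it as the database OS user instead.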
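2. Start the HAC service on node 1. The cluster information is now abnormal: hghactl list still reports the old cluster ID with no members, and hghac-vip exits with a failure (starting it a second time fails the same way):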
[root@db data]# /opt/HighGo/tools/hghac/hghactl -c /opt/HighGo/tools/hghac/hghac.yaml list
+ Cluster: ha (7072987311974756506) +-----------+
| Member | Host | Role | State | TL | Lag in MB |
+--------+------+------+-------+----+-----------+
+--------+------+------+-------+----+-----------+
[root@db data]# systemctl status hghac-vip
● hghac-vip.service - hghac
Loaded: loaded (/etc/systemd/system/hghac-vip.service; enabled; vendor preset: disabled)
Active: failed (Result: exit-code) since Fri 2022-03-18 12:16:07 CST; 3min 7s ago
Process: 44961 ExecStart=/opt/HighGo/tools/hghac/hghac /opt/HighGo/tools/hghac/hghac.yaml (code=exited, status=1/FAILURE)
Main PID: 44961 (code=exited, status=1/FAILURE)
Mar 18 12:16:05 db systemd[1]: Started hghac.
Mar 18 12:16:07 db systemd[1]: hghac-vip.service: main process exited, code=exited, status=1/FAILURE
Mar 18 12:16:07 db systemd[1]: Unit hghac-vip.service entered failed state.
Mar 18 12:16:07 db systemd[1]: hghac-vip.service failed.
[root@db data]# systemctl start hghac-vip
[root@db data]# systemctl status hghac-vip
● hghac-vip.service - hghac
Loaded: loaded (/etc/systemd/system/hghac-vip.service; enabled; vendor preset: disabled)
Active: failed (Result: exit-code) since Fri 2022-03-18 12:19:26 CST; 2min 13s ago
Process: 45581 ExecStart=/opt/HighGo/tools/hghac/hghac /opt/HighGo/tools/hghac/hghac.yaml (code=exited, status=1/FAILURE)
Main PID: 45581 (code=exited, status=1/FAILURE)
Mar 18 12:19:24 db systemd[1]: Started hghac.
Mar 18 12:19:26 db systemd[1]: hghac-vip.service: main process exited, code=exited, status=1/FAILURE
Mar 18 12:19:26 db systemd[1]: Unit hghac-vip.service entered failed state.
Mar 18 12:19:26 db systemd[1]: hghac-vip.service failed.
3. The HAC cluster log reports that the cluster identifier no longer matches the original one (because the database was re-created):
[root@db hghalog]# pwd
/db/hgdbdata/hghalog
[root@db hghalog]# tail -f patroni.log
2022-03-18 12:16:06,807 INFO: Selected new etcd server http://192.168.80.111:2379
2022-03-18 12:16:06,828 INFO: No PostgreSQL configuration items changed, nothing to reload.
2022-03-18 12:16:06,890 CRITICAL: system ID mismatch, node hghaca belongs to a different cluster: 7072987311974756506 != 7076286699020760566
2022-03-18 12:19:25,967 INFO: Selected new etcd server http://192.168.80.113:2379
2022-03-18 12:19:25,992 INFO: No PostgreSQL configuration items changed, nothing to reload.
2022-03-18 12:19:26,063 CRITICAL: system ID mismatch, node hghaca belongs to a different cluster: 7072987311974756506 != 7076286699020760566
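Before resetting etcd, it helps to confirm which side holds which identifier. The sketch below (function names are hypothetical) extracts both numbers from the latest mismatch line and reads the identifier of the freshly initialized data directory with PostgreSQL's pg_controldata, which is assumed to be on PATH in this installation. Judging by the article's own output, the first number is the identifier still stored for the cluster (hghactl list reports 7072987311974756506 before the reset) and the second belongs to the new data directory (the cluster reports 7076286699020760566 after the reset).

```shell
#!/bin/sh
# Diagnostic sketch for the "system ID mismatch" error in patroni.log.
# Print "<stored cluster id> <local data directory id>" from the most recent
# mismatch line in the given log file; prints nothing if there is none.
mismatch_ids() {
    grep 'system ID mismatch' "$1" | tail -n 1 |
        sed -n 's/.*different cluster: \([0-9][0-9]*\) != \([0-9][0-9]*\).*/\1 \2/p'
}

# Print the system identifier recorded in a data directory's control file.
# LC_ALL=C keeps the output label in English; assumes pg_controldata is on PATH.
local_sysid() {
    LC_ALL=C pg_controldata "$1" | sed -n 's/^Database system identifier: *//p'
}
```

Usage: `mismatch_ids /db/hgdbdata/hghalog/patroni.log` and `local_sysid /db/hgdbdata/data`. If the local identifier equals the second number, the stale value is the one still kept in etcd, which matches the root cause described in step 5.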
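4. Restarting the etcd and hghac services on every node does not help; the same error is still reported. etcd itself is healthy, and hghactl still shows the old cluster ID with no members: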
[root@db ~]# /opt/HighGo/tools/hghac/etcd/amd64/etcdctl endpoint status --write-out=table
+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| http://192.168.80.111:2379 | ddbfd190d03ca278 | 3.4.15 | 20 kB | false | false | 218 | 1686066 | 1686066 | |
| http://192.168.80.112:2379 | 1c703f0b65f7bddb | 3.4.15 | 20 kB | false | false | 218 | 1686066 | 1686066 | |
| http://192.168.80.113:2379 | 92255e8f5c9ebfcd | 3.4.15 | 20 kB | true | false | 218 | 1686066 | 1686066 | |
+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
[root@db ~]# systemctl start hghac-vip
[root@db ~]# /opt/HighGo/tools/hghac/hghactl -c /opt/HighGo/tools/hghac/hghac.yaml list
+ Cluster: ha (7072987311974756506) +-----------+
| Member | Host | Role | State | TL | Lag in MB |
+--------+------+------+-------+----+-----------+
+--------+------+------+-------+----+-----------+
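5. Root cause: etcd's data files still record the system identifier of the old cluster (7072987311974756506), so the etcd data has to be regenerated. Rename the etcd data directory on every node and restart etcd: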
[root@db etcd]# pwd
/opt/HighGo/tools/etcd
[root@db etcd]# ls
hgdw1.etcd
[root@db etcd]# mv hgdw1.etcd hgdw1.etcd.bak <-- rename (or delete) this directory on every node
[root@db etcd]# systemctl stop etcd
[root@db etcd]# systemctl start etcd
[root@db etcd]# pwd
/opt/HighGo/tools/etcd
[root@db etcd]# ll
total 0
drwx------ 3 root root 20 Mar 18 12:34 hgdw1.etcd
drwx------. 3 root root 20 Mar 18 12:27 hgdw1.etcd.bak <-- restarting etcd re-creates hgdw1.etcd and all files under it
[root@db etcd]# /opt/HighGo/tools/hghac/etcd/amd64/etcdctl endpoint status --write-out=table
+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| http://192.168.80.111:2379 | ddbfd190d03ca278 | 3.4.15 | 20 kB | true | false | 2 | 8 | 8 | |
| http://192.168.80.112:2379 | 1c703f0b65f7bddb | 3.4.15 | 20 kB | false | false | 2 | 8 | 8 | |
| http://192.168.80.113:2379 | 92255e8f5c9ebfcd | 3.4.15 | 20 kB | false | false | 2 | 8 | 8 | |
+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
[root@db etcd]#
[root@db etcd]# systemctl start hghac-vip <-- start HAC now; the cluster information is displayed correctly
[root@db etcd]# /opt/HighGo/tools/hghac/hghactl -c /opt/HighGo/tools/hghac/hghac.yaml list
+ Cluster: ha (7076286699020760566) ----+---------+----+-----------+-----------------+
| Member | Host | Role | State | TL | Lag in MB | Pending restart |
+--------+---------------------+--------+---------+----+-----------+-----------------+
| hghaca | 192.168.80.111:5866 | Leader | running | 2 | | * |
+--------+---------------------+--------+---------+----+-----------+-----------------+
[root@db etcd]#
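The transcript above shows a single node. Below is a hypothetical sketch of the same reset for all three nodes, under the same assumptions as the step 1 sketch (root SSH, node list taken from the etcd output). It differs from the transcript in two deliberate ways: etcd is stopped on every node before its data directory is renamed, and `systemctl start --no-block` is used so the loop does not hang in case the etcd unit waits for quorum before reporting that it has started. Note that this discards everything stored in etcd, not just the HAC cluster state.

```shell
#!/bin/sh
# Hypothetical sketch of step 5 across all nodes; assumes root SSH access.
# WARNING: this discards everything stored in etcd, not only the HAC cluster state.
NODES="192.168.80.111 192.168.80.112 192.168.80.113"   # assumed node list
ETCD_DIR="/opt/HighGo/tools/etcd"
ETCD_DATA="hgdw1.etcd"                                   # data directory name from the article

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

reset_etcd_all() {
    stamp=$(date +%Y%m%d%H%M%S)
    # Stop etcd everywhere first, so that no member keeps running on the old data.
    for node in $NODES; do
        run ssh "root@$node" systemctl stop etcd
    done
    # Keep the old data directory as a timestamped backup instead of deleting it.
    for node in $NODES; do
        run ssh "root@$node" mv "$ETCD_DIR/$ETCD_DATA" "$ETCD_DIR/$ETCD_DATA.bak.$stamp"
    done
    # Queue the start on every node without waiting for each one to finish.
    for node in $NODES; do
        run ssh "root@$node" systemctl start --no-block etcd
    done
}
```

Afterwards, check `etcdctl endpoint status --write-out=table` as above (one leader and the same RAFT INDEX on all three endpoints) before starting hghac-vip, on the primary first.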
Start HAC on the other nodes. The result:
[root@db etcd]# /opt/HighGo/tools/hghac/hghactl -c /opt/HighGo/tools/hghac/hghac.yaml list
+ Cluster: ha (7076286699020760566) -----+---------+----+-----------+-----------------+
| Member | Host | Role | State | TL | Lag in MB | Pending restart |
+--------+---------------------+---------+---------+----+-----------+-----------------+
| hghaca | 192.168.80.111:5866 | Leader | running | 2 | | * |
| hghacb | 192.168.80.112:5866 | Replica | running | 2 | 0 | * |
| hghacc | 192.168.80.113:5866 | Replica | running | 2 | 0 | * |
+--------+---------------------+---------+---------+----+-----------+-----------------+
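To check the final state from a script rather than by eye, you can count the members whose State column reads running. This is a minimal sketch that assumes the table layout shown above (State is the fourth column); `running_members` and `check_members` are hypothetical names.

```shell
#!/bin/sh
# Count members whose State column is "running" in hghactl list output read from stdin.
# Splitting on "|" makes $5 the State column; border lines contain no "|" and are skipped.
running_members() {
    awk -F'|' 'NF > 5 { s = $5; gsub(/ /, "", s); if (s == "running") n++ } END { print n + 0 }'
}

# Succeed only when the expected number of members is running.
check_members() {
    expected=$1
    actual=$(/opt/HighGo/tools/hghac/hghactl -c /opt/HighGo/tools/hghac/hghac.yaml list | running_members)
    [ "$actual" -eq "$expected" ]
}
```

Usage: `check_members 3 && echo "all three members running"`. Against the empty listing from step 2 the count is 0, so the check fails as expected.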