Reinitializing the Database in an HAC Cluster While Reusing the Existing Cluster Configuration

Posted 2022-06-27 瀚高PG实验室

HighGo Database
Contents
Environment
Purpose
Details

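Environment
Platform: N/A
Version: 4.5
Purpose
In an HAC cluster, when the current data directory has to be deleted and the database rebuilt for some special reason, this procedure brings the cluster back up quickly without reinstalling it.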

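Details
1. Stop the hghac service on all nodes, delete the original data directory, and run initdb again on the primary node (the existing HAC cluster configuration files stay unchanged).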

[root@db data]# systemctl stop hghac-vip

[root@db data]# initdb -e sm4 -c "echo *******" -D  /db/hgdbdata/data

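2. Start the HAC service on node 1. The cluster information now displays abnormally: the member list is empty and the service exits with a failure.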

[root@db data]# /opt/HighGo/tools/hghac/hghactl -c /opt/HighGo/tools/hghac/hghac.yaml list

+ Cluster: ha (7072987311974756506) +-----------+

| Member | Host | Role | State | TL | Lag in MB |

+--------+------+------+-------+----+-----------+

+--------+------+------+-------+----+-----------+

[root@db data]# systemctl status hghac-vip

● hghac-vip.service - hghac

  Loaded: loaded (/etc/systemd/system/hghac-vip.service; enabled; vendor preset: disabled)

   Active: failed (Result: exit-code) since Fri 2022-03-18 12:16:07 CST; 3min 7s ago

  Process: 44961 ExecStart=/opt/HighGo/tools/hghac/hghac /opt/HighGo/tools/hghac/hghac.yaml (code=exited, status=1/FAILURE)

 Main PID: 44961 (code=exited, status=1/FAILURE)



Mar 18 12:16:05 db systemd[1]: Started hghac.

Mar 18 12:16:07 db systemd[1]: hghac-vip.service: main process exited, code=exited, status=1/FAILURE

Mar 18 12:16:07 db systemd[1]: Unit hghac-vip.service entered failed state.

Mar 18 12:16:07 db systemd[1]: hghac-vip.service failed.

[root@db data]# systemctl start hghac-vip

[root@db data]# systemctl status hghac-vip

● hghac-vip.service - hghac

 Loaded: loaded (/etc/systemd/system/hghac-vip.service; enabled; vendor preset: disabled)

   Active: failed (Result: exit-code) since Fri 2022-03-18 12:19:26 CST; 2min 13s ago

  Process: 45581 ExecStart=/opt/HighGo/tools/hghac/hghac /opt/HighGo/tools/hghac/hghac.yaml (code=exited, status=1/FAILURE)

 Main PID: 45581 (code=exited, status=1/FAILURE)



Mar 18 12:19:24 db systemd[1]: Started hghac.

Mar 18 12:19:26 db systemd[1]: hghac-vip.service: main process exited, code=exited, status=1/FAILURE

Mar 18 12:19:26 db systemd[1]: Unit hghac-vip.service entered failed state.

Mar 18 12:19:26 db systemd[1]: hghac-vip.service failed.

3. The HAC cluster log reports that the cluster's system identifier no longer matches the original one (because the database was re-created):

[root@db hghalog]# pwd

/db/hgdbdata/hghalog

[root@db hghalog]# tail -f patroni.log 

2022-03-18 12:16:06,807 INFO: Selected new etcd server http://192.168.80.111:2379

2022-03-18 12:16:06,828 INFO: No PostgreSQL configuration items changed, nothing to reload.

2022-03-18 12:16:06,890 CRITICAL: system ID mismatch, node hghaca belongs to a different cluster: 7072987311974756506 != 7076286699020760566

2022-03-18 12:19:25,967 INFO: Selected new etcd server http://192.168.80.113:2379

2022-03-18 12:19:25,992 INFO: No PostgreSQL configuration items changed, nothing to reload.

2022-03-18 12:19:26,063 CRITICAL: system ID mismatch, node hghaca belongs to a different cluster: 7072987311974756506 != 7076286699020760566
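
The two numbers in the CRITICAL line are the identifiers being compared. Judging from the `hghactl list` headers in this article, the first is the one the cluster still records and the second belongs to the freshly initialized data directory. A minimal sketch for pulling both values out of the log follows; the function name `mismatch_ids` is a hypothetical helper, not part of hghac.

```shell
#!/usr/bin/env bash
# Print the two system IDs from the most recent "system ID mismatch" line
# of the given log file, as "<first id> <second id>".
# mismatch_ids is a hypothetical helper name introduced for this sketch.
mismatch_ids() {
    grep 'system ID mismatch' "$1" \
        | tail -n 1 \
        | sed -E 's/.*: ([0-9]+) != ([0-9]+).*/\1 \2/'
}
```

With the log location shown above, it could be called as `mismatch_ids /db/hgdbdata/hghalog/patroni.log`.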

4. After restarting the etcd and hghac services on every node, the same error is still reported.

[root@db ~]# /opt/HighGo/tools/hghac/etcd/amd64/etcdctl endpoint status --write-out=table

+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+

|          ENDPOINT          |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |

+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+

| http://192.168.80.111:2379 | ddbfd190d03ca278 |  3.4.15 |   20 kB |     false |      false |       218 |    1686066 |            1686066 |        |

| http://192.168.80.112:2379 | 1c703f0b65f7bddb |  3.4.15 |   20 kB |     false |      false |       218 |    1686066 |            1686066 |        |

| http://192.168.80.113:2379 | 92255e8f5c9ebfcd |  3.4.15 |   20 kB |      true |      false |       218 |    1686066 |            1686066 |        |

+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+

[root@db ~]# systemctl start hghac-vip

[root@db ~]# /opt/HighGo/tools/hghac/hghactl -c /opt/HighGo/tools/hghac/hghac.yaml list

+ Cluster: ha (7072987311974756506) +-----------+

| Member | Host | Role | State | TL | Lag in MB |

+--------+------+------+-------+----+-----------+

+--------+------+------+-------+----+-----------+

5. Root cause: etcd's data files still record the old system identifier, so the etcd data has to be regenerated.

[root@db etcd]# pwd

/opt/HighGo/tools/etcd

[root@db etcd]# ls

hgdw1.etcd

[root@db etcd]# mv hgdw1.etcd hgdw1.etcd.bak  <-- rename or delete this directory on every node

[root@db etcd]# systemctl stop etcd

[root@db etcd]# systemctl start etcd

[root@db etcd]# pwd

/opt/HighGo/tools/etcd

[root@db etcd]# ll

total 0

drwx------  3 root root 20 Mar 18 12:34 hgdw1.etcd

drwx------. 3 root root 20 Mar 18 12:27 hgdw1.etcd.bak  <-- restarting etcd regenerates the hgdw1.etcd directory and all files under it

[root@db etcd]# /opt/HighGo/tools/hghac/etcd/amd64/etcdctl endpoint status --write-out=table

+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+

|          ENDPOINT          |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |

+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+

| http://192.168.80.111:2379 | ddbfd190d03ca278 |  3.4.15 |   20 kB |      true |      false |         2 |          8 |                  8 |        |

| http://192.168.80.112:2379 | 1c703f0b65f7bddb |  3.4.15 |   20 kB |     false |      false |         2 |          8 |                  8 |        |

| http://192.168.80.113:2379 | 92255e8f5c9ebfcd |  3.4.15 |   20 kB |     false |      false |         2 |          8 |                  8 |        |

+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+

[root@db etcd]# 

[root@db etcd]# systemctl start hghac-vip   <-- start HAC now; the cluster information displays normally

[root@db etcd]# /opt/HighGo/tools/hghac/hghactl -c /opt/HighGo/tools/hghac/hghac.yaml list

+ Cluster: ha (7076286699020760566) ----+---------+----+-----------+-----------------+

| Member | Host                | Role   | State   | TL | Lag in MB | Pending restart |

+--------+---------------------+--------+---------+----+-----------+-----------------+

| hghaca | 192.168.80.111:5866 | Leader | running |  2 |           | *               |

+--------+---------------------+--------+---------+----+-----------+-----------------+

[root@db etcd]#
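
The rename-and-restart sequence in step 5 has to be repeated on every node. A minimal sketch of that per-node sequence follows. The `run` helper and the `RUN` switch are hypothetical additions that allow a dry run, and the path and directory name are the ones from the transcript, which may differ on other hosts. Unlike the transcript, the sketch stops etcd before renaming the directory; that ordering is an assumption of this sketch, not something the source states.

```shell
#!/usr/bin/env bash
# Values taken from the transcript; override via the environment if they differ.
ETCD_DIR="${ETCD_DIR:-/opt/HighGo/tools/etcd}"
ETCD_DATA="${ETCD_DATA:-hgdw1.etcd}"

# Hypothetical dry-run helper: print the command unless RUN=1 is set.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "+ $*"
    fi
}

# Set the old etcd data directory aside so etcd rebuilds it on start.
reset_etcd_data() {
    run systemctl stop etcd
    run mv "$ETCD_DIR/$ETCD_DATA" "$ETCD_DIR/$ETCD_DATA.bak"
    run systemctl start etcd
}
```

Calling `reset_etcd_data` only prints the three commands; `RUN=1 reset_etcd_data` executes them. Once etcd is back on all nodes, `systemctl start hghac-vip` is run as shown in the transcript.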

Start HAC on the other nodes. The result is as follows:

[root@db etcd]# /opt/HighGo/tools/hghac/hghactl -c /opt/HighGo/tools/hghac/hghac.yaml list

+ Cluster: ha (7076286699020760566) -----+---------+----+-----------+-----------------+

| Member | Host                | Role    | State   | TL | Lag in MB | Pending restart |

+--------+---------------------+---------+---------+----+-----------+-----------------+

| hghaca | 192.168.80.111:5866 | Leader  | running |  2 |           | *               |

| hghacb | 192.168.80.112:5866 | Replica | running |  2 |         0 | *               |

| hghacc | 192.168.80.113:5866 | Replica | running |  2 |         0 | *               |

+--------+---------------------+---------+---------+----+-----------+-----------------+
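
The final table can also be checked mechanically. The sketch below reads `hghactl ... list` output on standard input and succeeds only if at least one member is listed and every member's State column reads `running`. The function name `all_members_running` is hypothetical, and the column positions are assumed from the tables shown in this article.

```shell
#!/usr/bin/env bash
# Exit status 0 only if the table on stdin has at least one member row
# and every member's State column is "running".
# all_members_running is a hypothetical helper name introduced for this sketch.
all_members_running() {
    awk -F'|' '
        # Rows of interest start with "|"; border lines start with "+".
        /^[|]/ {
            member = $2; state = $5
            gsub(/ /, "", member); gsub(/ /, "", state)
            if (member == "Member") next   # skip the header row
            total++
            if (state != "running") bad++
        }
        END { exit !(total > 0 && bad == 0) }
    '
}
```

For example: `/opt/HighGo/tools/hghac/hghactl -c /opt/HighGo/tools/hghac/hghac.yaml list | all_members_running && echo "all members running"`. The empty member list from steps 2 and 4 makes the check fail. It looks at the State column only and says nothing about replication lag.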