lightdb: distributed physical backup script
Posted by 紫无之紫
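lightdb distributed physical backup script v1.0
lightdb has supported distributed physical backup since lightdb22.4. The lt_distributed_probackup.py script backs up a distributed cluster in one step, so lt_probackup no longer has to be run on every node. It does not guarantee that the data of a given backup is consistent across nodes: a restore does not bring the cluster to its latest state, only to the state captured by one particular backup.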
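Introduction
Given the connection info of the cn node, lt_distributed_probackup.py backs up the entire distributed cluster. If the original cluster fails, it can be restored in place directly; restoring to a new cluster requires specifying the new cn and dn hosts.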
Backup and restore workflow
This workflow is based on continuous archiving and uses ARCHIVE-mode backups.
Environment
backup server: 192.168.247.126
backup dir: /home/lightdb/backup
coordinator info: 192.168.247.127:54332
coordinator data dir: /home/lightdb/data
datanode1 info: 192.168.247.128:54332
datanode2 info: 192.168.247.129:54332
new cluster(cn;dn1;dn2): 192.168.247.130;192.168.247.131;192.168.247.132
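Steps
-
Passwordless SSH must be configured. Depending on how the script is used, this may also have to include passwordless access to the local machine itself, for both its own IP and 127.0.0.1.
-
Initialize the backup directory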
lt_distributed_probackup.py init -B /home/lightdb/backup
-
Add the instance
lt_distributed_probackup.py add-instance -B /home/lightdb/backup -D /home/lightdb/data --instance cn -h192.168.247.127 -p54332
-
Configure continuous archiving
# cn
archive_mode=on
archive_command="/home/lightdb/lightdb-x/13.8-22.4/bin/lt_probackup archive-push -B /home/lightdb/backup --instance=cn --wal-file-name=%f --remote-host=192.168.247.126"
# dn1: the --instance value is reported when the instance is added (add-instance), or can be derived from the naming rule; see the notes below
archive_mode=on
archive_command="/home/lightdb/lightdb-x/13.8-22.4/bin/lt_probackup archive-push -B /home/lightdb/backup --instance=cn_dn_1 --wal-file-name=%f --remote-host=192.168.247.126"
-
Back up the distributed cluster
lt_distributed_probackup.py backup -B /home/lightdb/backup -D /home/lightdb/data --instance cn -h192.168.247.127 -p54332 -b full --parallel-num=1
-
Restore to the original cluster or to a new cluster
# original cluster
lt_distributed_probackup.py restore -B /home/lightdb/backup --instance cn --parallel-num=1
# new cluster: the host list is ordered cn;dn1;dn2, where dn1 is the dn with the smaller node_id (in pg_dist_node)
lt_distributed_probackup.py restore -B /home/lightdb/backup --instance cn --parallel-num=1 --remote-host='192.168.247.130;192.168.247.131;192.168.247.132'
-
checkdb: verify the correctness of the database physical files
lt_distributed_probackup.py checkdb -B /home/lightdb/backup --instance cn
lt_distributed_probackup.py checkdb -h10.19.70.50 -p54332
-
Set configuration
lt_distributed_probackup.py set-config -B /home/lightdb/backup --instance cn --archive-timeout=6min
-
Show configuration
lt_distributed_probackup.py show-config -B /home/lightdb/backup --instance cn
-
Validate backup files
lt_distributed_probackup.py validate -B /home/lightdb/backup --instance cn
-
Show backups
# Show the overall result of each distributed backup. Status is the overall status, ID is the distributed backup ID, and the other fields are properties of the cn backup.
lt_distributed_probackup.py show -B /home/lightdb/backup --instance cn
[lightdb@lightdb bin]$ lt_distributed_probackup.py show -B /data/lightdb/chuhx/backup --instance cn
=================================================================================================================================================
 Instance  Version  ID                  Recovery Time           Mode  WAL Mode  TLI  Time  Data    WAL    Zratio  Start LSN   Stop LSN    Status
=================================================================================================================================================
 cn        13       DISTRIBUTED_RO1W1T  2023-01-06 14:15:40+08  PAGE  ARCHIVE   1/1  14s   60MB    512MB  1.00    A/20000028  A/4001CD38  ERROR
 cn        13       DISTRIBUTED_RO03Y6  2023-01-05 15:10:59+08  PAGE  ARCHIVE   1/1  7s    3698kB  512MB  1.00    9/60000120  9/8000B3D0  OK
 cn        13       DISTRIBUTED_RO03UR  2023-01-05 15:09:02+08  FULL  ARCHIVE   1/0  13s   938MB   512MB  1.00    8/A00000B8  8/C0064920  OK
# Use the --detail option to also show the backup properties of each dn
[lightdb@lightdb bin]$ lt_distributed_probackup.py show -B /data/lightdb/chuhx/backup --instance cn --detail
********Summary*********
BACKUP INSTANCE 'cn'
----------------
=====================================================================================================================================
 Instance  Version  ID      Recovery Time           Mode  WAL Mode  TLI  Time  Data    WAL    Zratio  Start LSN   Stop LSN    Status
=====================================================================================================================================
 cn        13       RO1W1T  2023-01-06 14:15:40+08  PAGE  ARCHIVE   1/1  14s   60MB    512MB  1.00    A/20000028  A/4001CD38  OK
 cn        13       RO03Y6  2023-01-05 15:10:59+08  PAGE  ARCHIVE   1/1  7s    3698kB  512MB  1.00    9/60000120  9/8000B3D0  OK
 cn        13       RO03UR  2023-01-05 15:09:02+08  FULL  ARCHIVE   1/0  13s   937MB   512MB  1.00    8/A00000B8  8/C0064920  OK
BACKUP INSTANCE 'cn_dn_2'
----------------
=====================================================================================================================================
 Instance  Version  ID      Recovery Time           Mode  WAL Mode  TLI  Time  Data    WAL    Zratio  Start LSN   Stop LSN    Status
=====================================================================================================================================
 cn_dn_2   13       RO1W27  ----                    PAGE  ARCHIVE   1/1  3s    0       0      1.00    8/E0000028  0/0         ERROR
 cn_dn_2   13       RO03YD  2023-01-05 15:11:09+08  PAGE  ARCHIVE   1/1  10s   3706kB  512MB  1.00    8/20000028  8/4000C468  OK
 cn_dn_2   13       RO03V5  2023-01-05 15:09:16+08  FULL  ARCHIVE   1/0  13s   937MB   512MB  1.00    7/600000F8  7/8001C8D0  OK
BACKUP INSTANCE 'cn_dn_3'
----------------
=====================================================================================================================================
 Instance  Version  ID      Recovery Time           Mode  WAL Mode  TLI  Time  Data    WAL    Zratio  Start LSN   Stop LSN    Status
=====================================================================================================================================
 cn_dn_3   13       RO1W2A  2023-01-06 14:15:56+08  PAGE  ARCHIVE   1/1  13s   61MB    512MB  1.00    8/A0000028  8/C00158E8  OK
 cn_dn_3   13       RO03YN  2023-01-05 15:11:17+08  PAGE  ARCHIVE   1/1  9s    3930kB  512MB  1.00    7/E0000028  8/124C0     OK
 cn_dn_3   13       RO03VJ  2023-01-05 15:09:28+08  FULL  ARCHIVE   1/0  13s   937MB   512MB  1.00    7/200000F8  7/4001B920  OK
-
Delete backup files
lt_distributed_probackup.py delete -B /home/lightdb/backup --instance cn --delete-expired --retention-redundancy=1
-
Delete the instance
lt_distributed_probackup.py del-instance -B /home/lightdb/backup --instance cn
-
merge and set-backup act only on a specific backup, the same as in lt_probackup
lt_distributed_probackup.py merge -B /home/lightdb/backup --instance cn -i RMICH6 -j 2 --progress --no-validate --no-sync
lt_distributed_probackup.py set-backup -B /home/lightdb/backup --instance cn_dn_2 -i RMI6YR --note='cn_dn_2'
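The per-node values used in the steps above (dn instance names, archive settings, and the host list for restoring to a new cluster) follow a regular pattern, so they can be derived rather than typed by hand. The following is a minimal Python sketch and is not part of lt_distributed_probackup.py: the helper names and the `Node` type are hypothetical, the node_id values 1 and 2 are assumed for illustration, and the `_dn_` naming pattern is inferred from the instance names that appear in this document (cn_dn_1, cn_dn_2). Real node_id values should be read from pg_dist_node.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Node:
    """A dn node: node_id as listed in pg_dist_node, plus its host (hypothetical type)."""
    node_id: int
    host: str


def dn_instance_name(cn_instance: str, node_id: int) -> str:
    # Assumed pattern, inferred from names such as cn_dn_1 in this document.
    return f"{cn_instance}_dn_{node_id}"


def archive_settings(lt_probackup: str, backup_dir: str,
                     instance: str, backup_host: str) -> List[str]:
    # Mirrors the archive_mode / archive_command lines of the archiving step.
    command = (
        f"{lt_probackup} archive-push -B {backup_dir} "
        f"--instance={instance} --wal-file-name=%f --remote-host={backup_host}"
    )
    return ["archive_mode=on", f'archive_command="{command}"']


def restore_remote_hosts(cn_host: str, dns: List[Node]) -> str:
    # Order required by restore: cn first, then dns by ascending node_id.
    ordered = sorted(dns, key=lambda node: node.node_id)
    return ";".join([cn_host] + [node.host for node in ordered])


if __name__ == "__main__":
    lt_probackup = "/home/lightdb/lightdb-x/13.8-22.4/bin/lt_probackup"
    backup_dir = "/home/lightdb/backup"
    backup_host = "192.168.247.126"

    # node_id values here are assumptions for illustration only.
    for node_id in (1, 2):
        name = dn_instance_name("cn", node_id)
        print(f"# {name}")
        for line in archive_settings(lt_probackup, backup_dir, name, backup_host):
            print(line)

    new_dns = [Node(2, "192.168.247.132"), Node(1, "192.168.247.131")]
    print(restore_remote_hosts("192.168.247.130", new_dns))
```

Sorting by node_id inside `restore_remote_hosts` keeps the host list correct even when the dn nodes are supplied in arbitrary order.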
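Notes
-
The instance name of each dn node is derived from the instance name given for the cn, in the form <cn instance name>_dn_<id>, where id is the node_id in pg_dist_node. For example, with cn instance name cn and node_id 2, the dn instance is cn_dn_2.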
-
When a new node is added to the cluster, register its instance by running add-instance with --no_distribution.
-
--backup-id can be set to a distributed backup ID (for example DISTRIBUTED_RO03UR); the show command lists the distributed backup IDs.
-
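If the following error is reported, it can be resolved by raising MaxStartups in /etc/ssh/sshd_config. MaxStartups limits the number of SSH clients that are authenticating at the same time. It counts connections that are still authenticating, not logged-in sessions, so connections that have already logged in are not included. A value of 10:30:100 means that once 10 unauthenticated connections are open, new connections are refused with 30% probability, and once 100 are open, every new connection is refused.
ERROR: Agent error: kex_exchange_identification: read: Connection reset by peer
This error occurs when -j or --parallel-num is set too high: too many SSH connections are opened at once, so the number of concurrent authentications exceeds the limit.
-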
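When the init command is run without permission to create the directory, the command still reports success, but the directory is not actually created.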
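To make the MaxStartups note concrete, the sketch below models the start:rate:full form as the OpenSSH documentation describes it: no refusals below start, a refusal probability of rate percent at start, rising linearly to 100 percent at full. It is an illustrative model only, not sshd's actual code. The function names are hypothetical, and the single-integer form of MaxStartups is not handled.

```python
from typing import Tuple


def parse_max_startups(value: str) -> Tuple[int, int, int]:
    """Split a 'start:rate:full' MaxStartups value into three integers."""
    start, rate, full = (int(part) for part in value.split(":"))
    return start, rate, full


def refusal_probability(unauthenticated: int, value: str = "10:30:100") -> float:
    """Approximate probability that sshd refuses a new connection.

    `unauthenticated` is the number of connections currently in the
    authentication phase (logged-in sessions do not count).
    """
    start, rate, full = parse_max_startups(value)
    if unauthenticated < start:
        return 0.0
    if unauthenticated >= full:
        return 1.0
    # Linear interpolation from rate% at `start` to 100% at `full`.
    percent = rate + (100 - rate) * (unauthenticated - start) / (full - start)
    return percent / 100.0


if __name__ == "__main__":
    for count in (5, 10, 55, 100):
        print(count, refusal_probability(count))
```

With such a model, one can estimate how many simultaneous SSH connections a given -j or --parallel-num setting can open before refusals become likely, and then either lower the parallelism or raise MaxStartups.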