数据库服务是否运行及运行状态的检查方法
Posted 瀚高PG实验室
tags:
篇首语:本文由小常识网(cha138.com)小编为大家整理,主要介绍了数据库服务是否运行及运行状态的检查方法相关的知识,希望对你有一定的参考价值。
目录
文档用途
详细信息
相关文档
本文主要介绍检查数据库单机, 流复制, repmgr集群服务是否运行及检查运行状态的方法。
注意事项:
该文档涉及到recovery.conf或done文件只是针对安全版V4.3.X或企业版V5版本,之后的安全版V4.5.X或企业版V6版本不再有recovery.conf或done文件
详细信息
一、单机版数据库的运行状态查看
1、通过数据库自带的操作系统命令 pg_ctl 查看数据库运行状态
数据库正常运行显示如下:
[highgo@localhost ~]$ pg_ctl status
pg_ctl: server is running (PID: 7482)
/opt/HighGoDB-5.6.4/bin/postgres “-D” “/opt/HighGoDB-5.6.4/data”
数据库关闭状态显示如下:
[highgo@localhost ~]$ pg_ctl status
pg_ctl: no server running
2、通过数据库自带的操作系统命令 pg_controldata 查看数据库运行状态
数据库正常运行显示如下:
[highgo@localhost ~]$ pg_controldata
…
Database cluster state: in production
…
数据库关闭状态显示如下:
[highgo@localhost ~]$ pg_controldata
…
Database cluster state: shut down
…
3、通过后台进程看数据库运行状态
正常运行的数据库会存在如下进程
[highgo@localhost ~]$ ps -ef | grep -v grep | grep postgres
highgo 8002 1 0 10:41 pts/0 00:00:00 /opt/HighGoDB-5.6.4/bin/postgres
highgo 8007 8002 0 10:41 ? 00:00:00 postgres: logger process
highgo 8009 8002 0 10:41 ? 00:00:00 postgres: checkpointer process
highgo 8010 8002 0 10:41 ? 00:00:00 postgres: writer process
highgo 8011 8002 0 10:41 ? 00:00:00 postgres: wal writer process
highgo 8012 8002 0 10:41 ? 00:00:00 postgres: autovacuum launcher process
highgo 8013 8002 0 10:41 ? 00:00:00 postgres: archiver process
highgo 8014 8002 0 10:41 ? 00:00:00 postgres: stats collector process
highgo 8015 8002 0 10:41 ? 00:00:00 postgres: bgworker: pg_wait_sampling collector
highgo 8017 8002 0 10:41 ? 00:00:00 postgres: bgworker: performance diagnosis collector
highgo 8018 8002 0 10:41 ? 00:00:00 postgres: bgworker: hg_job_scheduler
highgo 8019 8002 0 10:41 ? 00:00:00 postgres: bgworker: logical replication launcher
ps:(1)显示内容第一列代表数据库在哪个用户底下运行,上方显示数据库运行在highgo用户底下。
(2)显示内容第一行代表数据库运行的主进程。
4、通过文件系统查看数据库是否处于运行状态
数据库处于运行状态时会存在下列文件:
[highgo@localhost data]$ ls -lrth | grep postmaster.pid
-rw------- 1 highgo highgo 82 Jul 23 10:41 postmaster.pid
数据库处于关闭状态时:
[highgo@localhost data]$ ls -lrth | grep postmaster
ps: (1)运行上方命令需要进到数据库data目录,命令:cd $PGDATA
(2)但是如果数据库由于异常宕机,可能会存在残留进程,所以不能仅凭这一条来判定数据库处于正常运行状态
(3)如果查看数据库处于运行状态,但是连接不到数据库,查看进程存在很少,有可能是数据库异常宕机,此时需要删除掉数据目录中的 postmaster.pid 文件并重启数据库
二、主备流复制数据库运行状态的查询
1、通过数据库自带的操作系统命令 pg_ctl 查看数据库运行状态
数据库正常运行显示如下:
[highgo@localhost ~]$ pg_ctl status
pg_ctl: server is running (PID: 7465)
/opt/HighGoDB-5.6.4/bin/postgres “-D” “/opt/HighGoDB-5.6.4/data”
数据库关闭状态显示如下:
[highgo@localhost ~]$ pg_ctl status
pg_ctl: no server running
2、通过数据库自带的操作系统命令 pg_controldata 查看数据库运行状态
流复制主库正常运行时:
[highgo@node1 ~]$ pg_controldata
pg_control version number: 1002
Catalog version number: 201707211
Database system identifier: 6837288046332662480
Database cluster state: in production
…
流复制主库处于关闭状态时:
[highgo@node1 ~]$ pg_controldata
pg_control version number: 1002
Catalog version number: 201707211
Database system identifier: 6837288046332662480
Database cluster state: shut down
…
流复制备库正常运行时:
[highgo@node2 ~]$ pg_controldata
pg_control version number: 1002
Catalog version number: 201707211
Database system identifier: 6837288046332662480
Database cluster state: in archive recovery
…
流复制备库处于关闭状态时:
[highgo@node2 ~]$ pg_controldata
pg_control version number: 1002
Catalog version number: 201707211
Database system identifier: 6837288046332662480
Database cluster state: shut down in recovery
…
3、通过后台进程看数据库运行状态
流复制主库正常运行时:
[highgo@node1 ~]$ ps -ef | grep postgres
highgo 8038 1 0 11:07 pts/0 00:00:00 /opt/HighGoDB-5.6.4/bin/postgres
highgo 8043 8038 0 11:07 ? 00:00:00 postgres: logger process
highgo 8045 8038 0 11:07 ? 00:00:00 postgres: checkpointer process
highgo 8046 8038 0 11:07 ? 00:00:00 postgres: writer process
highgo 8047 8038 0 11:07 ? 00:00:00 postgres: wal writer process
highgo 8048 8038 0 11:07 ? 00:00:00 postgres: autovacuum launcher process
highgo 8049 8038 0 11:07 ? 00:00:00 postgres: archiver process
highgo 8050 8038 0 11:07 ? 00:00:00 postgres: stats collector process
highgo 8051 8038 0 11:07 ? 00:00:00 postgres: bgworker: pg_wait_sampling collector
highgo 8053 8038 0 11:07 ? 00:00:00 postgres: bgworker: performance diagnosis collector
highgo 8054 8038 0 11:07 ? 00:00:00 postgres: bgworker: logical replication launcher
highgo 8055 8038 0 11:07 ? 00:00:00 postgres: wal sender process hgreplica 192.168.179.161(35068) streaming 0/4A000098
流复制备库正常运行时:
[highgo@node2 ~]$ ps -ef | grep postgres
highgo 8004 1 1 11:07 pts/0 00:00:00 /opt/HighGoDB-5.6.4/bin/postgres
highgo 8009 8004 0 11:07 ? 00:00:00 postgres: logger process
highgo 8010 8004 0 11:07 ? 00:00:00 postgres: startup process recovering 00000001000000000000004A
highgo 8011 8004 0 11:07 ? 00:00:00 postgres: checkpointer process
highgo 8012 8004 0 11:07 ? 00:00:00 postgres: writer process
highgo 8013 8004 0 11:07 ? 00:00:00 postgres: stats collector process
highgo 8014 8004 0 11:07 ? 00:00:00 postgres: bgworker: pg_wait_sampling collector
highgo 8015 8004 1 11:07 ? 00:00:00 postgres: wal receiver process streaming 0/4A000098
从上述加粗的内容可以看到,主备库的 wal 进程是有区别的,流复制中主库会向备库发送 wal 日志,所以主库存在 wal sender 进程,备库则会有 wal receiver 进程
ps:
(1)也可以直接进行 wal 进程的查看来进行主备库的区分
流复制主库显示如下:
[highgo@node1 ~]$ ps -ef | grep wal
highgo 8047 8038 0 11:07 ? 00:00:00 postgres: wal writer process
highgo 8055 8038 0 11:07 ? 00:00:00 postgres: wal sender process hgreplica 192.168.179.161(35068) streaming 0/4A00C5B8
流复制备库显示如下:
[highgo@node2 ~]$ ps -ef | grep wal
highgo 8015 8004 0 11:07 ? 00:00:00 postgres: wal receiver process streaming 0/4A00C5B8
(2)另外,如果存在级联备库(向流复制备库请求 wal ),则流复制备库上也会存在 wal sender 进程
4、通过文件系统查看数据库的运行状态
(1)数据库是否处于运行状态:
数据库处于运行状态时会存在下列文件:
[highgo@localhost data]$ ls -lrth | grep postmaster.pid
-rw------- 1 highgo highgo 82 Jul 23 10:41 postmaster.pid
数据库处于关闭状态时:
[highgo@localhost data]$ ls -lrth | grep postmaster
(2)主备库的差别
备库存在 recovery.conf 文件
[highgo@node2 data]$ ls -lrth | grep recovery
-rw-rw-r-- 1 highgo highgo 375 Jun 17 10:12 recovery.conf
如果主备流复制已经做过至少一次主备切换,主库上会存在 recovery.done 文件
[highgo@node2 data]$ ls -lrth | grep recovery
-rw-rw-r-- 1 highgo highgo 375 Jun 15 10:12 recovery.done
三、repmgr 数据库集群运行状态的查询
1、通过数据库自带的操作系统命令 pg_ctl 查看数据库运行状态
数据库正常运行显示如下:
[highgo@node1 ~]$ pg_ctl status
pg_ctl: server is running (PID: 11727)
/opt/HighGo5.6.5-cluster/bin/postgres “-D” “/opt/HighGo5.6.5-cluster/data”
数据库关闭状态显示如下:
[highgo@localhost ~]$ pg_ctl status
pg_ctl: no server running
2、通过数据库自带的操作系统命令 pg_controldata 查看数据库运行状态
集群主库正常运行状态
[highgo@node2 ~]$ pg_controldata
pg_control version number: 1002
Catalog version number: 201707211
Database system identifier: 6836156492751540059
Database cluster state: in production
…
集群备库正常运行状态
[highgo@node1 ~]$ pg_controldata
pg_control version number: 1002
Catalog version number: 201707211
Database system identifier: 6836156492751540059
Database cluster state: in archive recovery
…
集群主库处于关闭状态时:
[highgo@node2 ~]$ pg_controldata
pg_control version number: 1002
Catalog version number: 201707211
Database system identifier: 6836156492751540059
Database cluster state: shut down
…
集群备库处于关闭状态时:
[highgo@node2 ~]$ pg_controldata
pg_control version number: 1002
Catalog version number: 201707211
Database system identifier: 6836156492751540059
Database cluster state: shut down in recovery
3、通过后台进程看数据库运行状态
集群主库正常运行时:
[highgo@node2 ~]$ ps -ef | grep postgres
highgo 19930 1 0 14:21 pts/1 00:00:00 /opt/HighGo5.6.5-cluster/bin/postgres
highgo 19934 19930 0 14:21 ? 00:00:00 postgres: logger process
highgo 19936 19930 0 14:21 ? 00:00:00 postgres: checkpointer process
highgo 19937 19930 0 14:21 ? 00:00:00 postgres: writer process
highgo 19938 19930 0 14:21 ? 00:00:00 postgres: wal writer process
highgo 19939 19930 0 14:21 ? 00:00:00 postgres: autovacuum launcher process
highgo 19940 19930 0 14:21 ? 00:00:00 postgres: archiver process
highgo 19941 19930 0 14:21 ? 00:00:00 postgres: stats collector process
highgo 19942 19930 0 14:21 ? 00:00:00 postgres: bgworker: pg_wait_sampling collector
highgo 19944 19930 0 14:21 ? 00:00:00 postgres: bgworker: performance diagnosis collector
highgo 19945 19930 0 14:21 ? 00:00:00 postgres: bgworker: logical replication launcher
highgo 19946 19930 0 14:21 ? 00:00:00 postgres: wal sender process highgo 192.168.179.153(63390) streaming 0/27023650
集群备库正常运行时:
[highgo@node1 ~]$ ps -ef | grep postgres
highgo 13966 1 0 14:20 pts/1 00:00:00 /opt/HighGo5.6.5-cluster/bin/postgres
highgo 13970 13966 0 14:20 ? 00:00:00 postgres: logger process
highgo 13971 13966 0 14:20 ? 00:00:00 postgres: startup process recovering 000000060000000000000027
highgo 13972 13966 0 14:20 ? 00:00:00 postgres: checkpointer process
highgo 13973 13966 0 14:20 ? 00:00:00 postgres: writer process
highgo 13974 13966 0 14:20 ? 00:00:00 postgres: stats collector process
highgo 13975 13966 0 14:20 ? 00:00:00 postgres: bgworker: pg_wait_sampling collector
highgo 13981 13966 0 14:21 ? 00:00:00 postgres: wal receiver process streaming 0/27023650
ps: 也可以使用流复制中所描述的另一个命令(ps -ef | grep wal)来看 wal 进程
4、通过文件系统查看数据库的运行状态
(1)数据库是否处于以运行状态:
数据库处于运行状态时会存在下列文件:
[highgo@localhost data]$ ls -lrth | grep postmaster.pid
-rw------- 1 highgo highgo 82 Jul 23 10:41 postmaster.pid
数据库处于关闭状态时:
[highgo@localhost data]$ ls -lrth | grep postmaster
(2)主库备库的判断
没有做过主备切换时,备库中只有一个 recovery.conf 文件
做过主备切换后,主备库都会有 recovery.conf 和 recovery.done 两个文件
[highgo@node1 data]$ ls -lrth | grep recovery
-rw------- 1 highgo highgo 229 Jul 23 09:10 recovery.done
-rw------- 1 highgo highgo 229 Jul 23 09:12 recovery.conf
主库中 recovery.conf 时间早于 recovery.done; 备库中 recovery.done 早于 recovery.conf
可以通过查看数据库集群状态的命令;如果主备库显示一致且如下所示,说明集群状态正常
在 status 一栏中出现 * 表示正常状态,如果出现了其他的符号(比如 !,?),表明集群状态出现了问题
[highgo@node2 ~]$ repmgr cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Replication lag | Last replayed LSN
----±----------------±--------±----------±----------------±---------±---------±----------------±------------------
1 | 192.168.179.153 | standby | running | 192.168.179.154 | default | 100 | 0 bytes | 0/2703D720
2 | 192.168.179.154 | primary | * running | | default | 100 | n/a | none
如果两个数据库中都出现了 recovery.conf 时间早于 recovery.done 的情况,数据库集群出错出现了双主
出现双主的例子(status 中出现了 !,表明集群状态不正确)
[highgo@node2 ~]$ repmgr cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Replication lag | Last replayed LSN
----±----------------±--------±---------------------±----------------±---------±---------±----------------±------------------
1 | 192.168.179.153 | standby | ! running as primary | 192.168.179.154 | default | 100 | 624 MB | 0/2703D720
2 | 192.168.179.154 | primary | * running | | default | 100 | n/a | none
WARNING: following issues were detected
- node “192.168.179.153” (ID: 1) is registered as standby but running as primary
更多详细信息请登录【瀚高技术支持平台】查看https://support.highgo.com/#/index/docContent/4d39c9fe03ed984d
以上是关于数据库服务是否运行及运行状态的检查方法的主要内容,如果未能解决你的问题,请参考以下文章
使用linux脚本shell检查大数据各节点服务是否运行正常