Oracle Cluster (RAC) Time Synchronization (NTP and CTSS)

Posted ^_^小麦苗^_^

http://blog.itpub.net/26736162/viewspace-2157130/

 

crsctl stat res -t -init

ps -ef|grep ctss

crsctl check ctss

cluvfy comp clocksync -n all -verbose

 

 crsctl start res ora.ctssd -init

 crsctl stop res ora.ctssd -init

 

 

Network Time Protocol Setting

You have two options for time synchronization: an operating system configured network time protocol (NTP), or Oracle Cluster Time Synchronization Service.

Oracle Cluster Time Synchronization Service is designed for organizations whose cluster servers are unable to access NTP services.

If you use NTP, then the Oracle Cluster Time Synchronization daemon (ctssd) starts up in observer mode. If you do not have NTP daemons, then ctssd starts up in active mode and synchronizes time among cluster members without contacting an external time server.

 

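You can use the operating system's NTP service or Oracle's own CTSS service. If NTP is not enabled, Oracle automatically activates its own ctssd process.

Starting with Oracle 11gR2, RAC uses the Cluster Time Synchronization Service (CTSS) to synchronize time across nodes. When the installer finds that NTP is inactive, CTSS is installed in active mode and synchronizes the time on all nodes. If NTP is found to be configured, CTSS starts in observer mode and Oracle Clusterware performs no active time synchronization in the cluster.

In RAC, the node clocks should stay synchronized; otherwise many problems can arise. For example, time-dependent applications may produce incorrect data, and log entries may appear out of order, which hampers troubleshooting. In severe cases the cluster may go down, or a node may be unable to rejoin the cluster when the cluster is restarted.

Before Oracle 11gR2, cluster time was synchronized by NTP. Since 11gR2, Oracle has provided the CTSS component: if NTP is not configured on the system, CTSS synchronizes the cluster time.

NTP and CTSS can coexist, and NTP takes precedence over CTSS. That is, if both NTP and CTSS are present, cluster time is synchronized by NTP and CTSS stays in Observer mode; CTSS switches to Active mode only when NTP is disabled on every node of the cluster. If NTP is active on even one node, CTSS runs in Observer mode on all nodes of the cluster.

Note that to put CTSS in Active mode, it is not enough to stop the ntpd service (/sbin/service ntpd stop); you must also remove the /etc/ntp.conf file (mv /etc/ntp.conf /etc/ntp.conf.bak). Otherwise CTSS will not become active.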
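The rule above can be sketched as a small shell check for a single node. This is only a minimal sketch and not an Oracle utility: the function names `predict_ctss_mode` and `ntpd_is_running` are hypothetical, and the logic encodes nothing more than the two conditions stated here (ntpd stopped and /etc/ntp.conf absent). The real decision is made by ctssd itself.

```shell
# Hypothetical helper (not part of Oracle Clusterware): predicts the CTSS
# mode on one node from the two conditions described in the text.
#   $1 = path to the NTP config file (normally /etc/ntp.conf)
#   $2 = "yes" if an ntpd process is running, anything else otherwise
predict_ctss_mode() {
    conf_file=$1
    ntpd_running=$2
    # Either a running ntpd or a leftover config file keeps CTSS passive.
    if [ "$ntpd_running" = "yes" ] || [ -e "$conf_file" ]; then
        echo "Observer"
    else
        echo "Active"
    fi
}

# Hypothetical helper: prints "yes" when a process named exactly ntpd exists.
ntpd_is_running() {
    if pgrep -x ntpd >/dev/null 2>&1; then
        echo yes
    else
        echo no
    fi
}
```

On a node, this could be called as `predict_ctss_mode /etc/ntp.conf "$(ntpd_is_running)"` and the result compared with what `crsctl check ctss` actually reports.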

 

1.1.1      CTSS Synchronization Mode

Disable NTP:

/sbin/service ntpd stop

mv /etc/ntp.conf /etc/ntp.conf.bak

service ntpd status

chkconfig ntpd off

 

[root@raclhr-11gR2-N2 ~]# ps -ef|grep ctss

root     19678     1  0 19:22 ?        00:00:02 /u01/app/11.2.0/grid/bin/octssd.bin reboot

root     20970 20623  0 19:35 pts/4    00:00:00 grep ctss

[root@raclhr-11gR2-N2 ~]#

[root@raclhr-11gR2-N2 ~]# crsctl stat res -t -init

--------------------------------------------------------------------------------

NAME           TARGET  STATE        SERVER                   STATE_DETAILS      

--------------------------------------------------------------------------------

Cluster Resources

--------------------------------------------------------------------------------

ora.asm

      1        ONLINE  ONLINE       raclhr-11gr2-n2          Started            

ora.cluster_interconnect.haip

      1        ONLINE  ONLINE       raclhr-11gr2-n2                             

ora.crf

      1        ONLINE  ONLINE       raclhr-11gr2-n2                             

ora.crsd

      1        ONLINE  ONLINE       raclhr-11gr2-n2                             

ora.cssd

      1        ONLINE  ONLINE       raclhr-11gr2-n2                             

ora.cssdmonitor

      1        ONLINE  ONLINE       raclhr-11gr2-n2                             

ora.ctssd

      1        ONLINE  ONLINE       raclhr-11gr2-n2          ACTIVE:0           

ora.diskmon

      1        OFFLINE OFFLINE                                                  

ora.evmd

      1        ONLINE  ONLINE       raclhr-11gr2-n2                             

ora.gipcd

      1        ONLINE  ONLINE       raclhr-11gr2-n2                             

ora.gpnpd

      1        ONLINE  ONLINE       raclhr-11gr2-n2                             

ora.mdnsd

      1        ONLINE  ONLINE       raclhr-11gr2-n2                             

[root@raclhr-11gR2-N2 ~]#

 

CTSS status on node 1:

[root@raclhr-11gR2-N1 ~]# crsctl check ctss

CRS-4701: The Cluster Time Synchronization Service is in Active mode.

CRS-4702: Offset (in msec): 0

[root@raclhr-11gR2-N1 ~]#
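For monitoring scripts it can help to pull the mode and the offset out of this output. The following is a minimal sketch that assumes the CRS-4700/CRS-4701/CRS-4702 message wording shown in this article; the function names `ctss_mode` and `ctss_offset_msec` are hypothetical, and the handling of a leading minus sign on the offset is an assumption.

```shell
# Hypothetical helpers: both read `crsctl check ctss` output on stdin.

# Prints "Active" or "Observer" (the word between "is in" and "mode.").
ctss_mode() {
    sed -n 's/^CRS-470[01]: The Cluster Time Synchronization Service is in \(.*\) mode\.$/\1/p'
}

# Prints the offset in milliseconds from the CRS-4702 line, if present.
# Observer-mode output has no CRS-4702 line, so nothing is printed then.
ctss_offset_msec() {
    sed -n 's/^CRS-4702: Offset (in msec): \(-\{0,1\}[0-9][0-9]*\)$/\1/p'
}
```

Typical use would be `crsctl check ctss | ctss_mode` or `crsctl check ctss | ctss_offset_msec`.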

octssd log on node 1:

/u01/app/11.2.0/grid/log/raclhr-11gr2-n1/ctssd/octssd.log

2018-06-30 19:25:56.369: [    CTSS][899475200]sclsctss_gvss2: NTP default pid file not found

2018-06-30 19:25:56.369: [    CTSS][899475200]sclsctss_gvss8: Return [0] and NTP status [1].

2018-06-30 19:25:56.369: [    CTSS][899475200]ctss_check_vendor_sw: Vendor time sync software is not detected. status [1].

2018-06-30 19:25:57.002: [    CTSS][916338432]ctss_checkcb: clsdm requested check alive. checkcb_data{mode[0xcc], offset[0 ms]}, length=[8].

2018-06-30 19:26:01.263: [    CTSS][901576448]ctsscomm_recv_cb2: Receive incoming message event. Msgtype [1].

2018-06-30 19:26:01.264: [    CTSS][901576448]ctsscomm_msg_hndlr: Received sync msg

2018-06-30 19:26:01.264: [    CTSS][901576448]ctsscomm_msg_hndlr: Received from slave ( mode [0xc4] nodenum [2] hostname [raclhr-11gr2-n2] )

2018-06-30 19:26:09.267: [    CTSS][901576448]ctsscomm_recv_cb2: Receive incoming message event. Msgtype [1].

The octssd.log on node 1 shows that no NTP service was detected, so CTSS is running in Active mode.

 

CTSS status on node 2:

[root@raclhr-11gR2-N2 ~]# crsctl check ctss

CRS-4701: The Cluster Time Synchronization Service is in Active mode.

CRS-4702: Offset (in msec): 0

[root@raclhr-11gR2-N2 ~]#

octssd log on node 2:

/u01/app/11.2.0/grid/log/raclhr-11gr2-n2/ctssd/octssd.log

2018-06-30 19:28:49.539: [    CTSS][839321344]sclsctss_gvss2: NTP default pid file not found

2018-06-30 19:28:49.539: [    CTSS][839321344]sclsctss_gvss8: Return [0] and NTP status [1].

2018-06-30 19:28:49.539: [    CTSS][839321344]ctss_check_vendor_sw: Vendor time sync software is not detected. status [1].

2018-06-30 19:29:05.544: [    CTSS][839321344]ctsselect_msm: CTSS mode is [0xc4]

2018-06-30 19:29:05.544: [    CTSS][839321344]ctssslave_swm1_2: Ready to initiate new time sync process.

2018-06-30 19:29:05.545: [    CTSS][839321344]ctssslave_swm2_1: Waiting for time sync message from master. sync_state[2].

2018-06-30 19:29:05.546: [    CTSS][845625088]ctsscomm_recv_cb2: Receive incoming message event. Msgtype [2].

2018-06-30 19:29:05.546: [    CTSS][845625088]ctssslave_msg_handler4_1: Waiting for slave_sync_with_master to finish sync process. sync_state[3].

2018-06-30 19:29:05.547: [    CTSS][839321344]ctssslave_swm2_3: Received time sync message from master.

2018-06-30 19:29:05.547: [    CTSS][839321344]ctssslave_swm: The system time difference is too small [243] usec. Not adjusting time.

2018-06-30 19:29:05.547: [    CTSS][839321344]ctssslave_swm17: LT [1530358145sec 546888usec], MT [1530358145sec 140655884523349usec], Delta [2314usec]

2018-06-30 19:29:05.547: [    CTSS][839321344]ctssslave_swm19: The offset is [243 usec] and sync interval set to [1]

2018-06-30 19:29:05.547: [    CTSS][839321344]ctssslave_swm: Received from master (mode [0xcc] nodenum [1] hostname [raclhr-11gr2-n1] )

2018-06-30 19:29:05.547: [    CTSS][839321344]ctsselect_msm: Sync interval returned in [1]

2018-06-30 19:29:05.547: [    CTSS][845625088]ctssslave_msg_handler4_3: slave_sync_with_master finished sync process. Exiting clsctssslave_msg_handler

2018-06-30 19:29:07.910: [    CTSS][860387072]ctss_checkcb: clsdm requested check alive. checkcb_data{mode[0xc4], offset[0 ms]}, length=[8].

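The octssd.log on node 2 also shows that no NTP service was detected and that CTSS is in Active mode. The master node for time synchronization is node 1. The log reports a time difference between the nodes, but because the difference is so small (243 usec), no adjustment is made.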
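To follow how the offset on a slave node changes over time, the `ctssslave_swm19` lines can be extracted from octssd.log. This is a minimal sketch that assumes the log line layout shown in the excerpt above; the function name `ctss_log_offsets` is hypothetical, and allowing a negative offset is an assumption.

```shell
# Hypothetical helper: reads octssd.log content on stdin and prints
# "date time offset_usec" for every ctssslave_swm19 offset line.
ctss_log_offsets() {
    sed -n 's/^\([0-9-]* [0-9:.]*\): .*ctssslave_swm19: The offset is \[\(-\{0,1\}[0-9][0-9]*\) usec].*$/\1 \2/p'
}
```

For example, `ctss_log_offsets < /u01/app/11.2.0/grid/log/raclhr-11gr2-n2/ctssd/octssd.log` would list one timestamp and offset per synchronization round on node 2 of this test cluster.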

 

Verify the cluster time:

 cluvfy comp clocksync -n all -verbose

 

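Although the node clocks are not perfectly identical, the check still passes in this case, and CTSS automatically brings small differences back into sync.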

[grid@raclhr-11gR2-N1 ~]$  cluvfy comp clocksync -n all -verbose

 

Verifying Clock Synchronization across the cluster nodes

 

Checking if Clusterware is installed on all nodes...

Check of Clusterware install passed

 

Checking if CTSS Resource is running on all nodes...

Check: CTSS Resource running on all nodes

  Node Name                             Status                 

  ------------------------------------  ------------------------

  raclhr-11gr2-n2                       passed                 

  raclhr-11gr2-n1                       passed                  

Result: CTSS resource check passed

 

 

Querying CTSS for time offset on all nodes...

Result: Query of CTSS for time offset passed

 

Check CTSS state started...

Check: CTSS state

  Node Name                             State                  

  ------------------------------------  ------------------------

  raclhr-11gr2-n2                       Active                 

  raclhr-11gr2-n1                       Active                 

CTSS is in Active state. Proceeding with check of clock time offsets on all nodes...

Reference Time Offset Limit: 1000.0 msecs

Check: Reference Time Offset

  Node Name     Time Offset               Status                 

  ------------  ------------------------  ------------------------

  raclhr-11gr2-n2  0.0                       passed                 

  raclhr-11gr2-n1  0.0                       passed                 

 

Time offset is within the specified limits on the following set of nodes:

"[raclhr-11gr2-n2, raclhr-11gr2-n1]"

Result: Check of clock time offsets passed

 

 

Oracle Cluster Time Synchronization Services check passed

 

Verification of Clock Synchronization across the cluster nodes was successful.
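When the verbose report is collected by a script, the per-node rows of the "Reference Time Offset" table can be compared against the reported limit. The following is a minimal sketch that assumes the table layout shown above (node name, offset, status); the function name `clocksync_offsets` and the OK/EXCEEDED labels are hypothetical, and cluvfy's own verdict remains the authoritative one.

```shell
# Hypothetical helper: reads `cluvfy comp clocksync -n all -verbose` output
# on stdin and prints "node offset_msec OK|EXCEEDED" for each node row.
clocksync_offsets() {
    awk '
        # Remember the limit from "Reference Time Offset Limit: 1000.0 msecs".
        /^Reference Time Offset Limit:/ { limit = $5 + 0 }
        # Node rows have exactly three fields and a numeric second field.
        NF == 3 && $2 ~ /^-?[0-9]+(\.[0-9]+)?$/ {
            off = $2 + 0
            if (off < 0) off = -off
            print $1, $2, (off <= limit ? "OK" : "EXCEEDED")
        }
    '
}
```

It could be run as `cluvfy comp clocksync -n all -verbose | clocksync_offsets` from the grid user's shell.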

 

1.1.2      NTP Synchronization Mode

Enable NTP:

mv /etc/ntp.conf.bak /etc/ntp.conf

service ntpd status

/sbin/service ntpd start

chkconfig ntpd on

ps -ef|grep ntp

 

Node 1:

[root@raclhr-11gR2-N1 ~]# crsctl check ctss

CRS-4700: The Cluster Time Synchronization Service is in Observer mode.

 

[root@raclhr-11gR2-N1 ~]#  crsctl stat res -t -init

ora.ctssd

      1        ONLINE  ONLINE       raclhr-11gr2-n1          OBSERVER           

 

CTSS log on node 1:

/u01/app/11.2.0/grid/log/raclhr-11gr2-n1/ctssd/octssd.log

2018-06-30 20:51:29.388: [    CTSS][899475200]sclsctss_gvss1: NTP default config file found

2018-06-30 20:51:29.389: [    CTSS][899475200]sclsctss_gvss8: Return [0] and NTP status [2].

2018-06-30 20:51:29.389: [    CTSS][899475200]ctss_check_vendor_sw: Vendor time sync software is detected. status [2].

2018-06-30 20:51:29.389: [    CTSS][899475200]ctss_check_vendor_sw: Ctssd is switching to observer role

2018-06-30 20:51:29.389: [    CTSS][899475200]clsctsselect_update_mbrdata: Updating pridata: { version[1] node[1] swversion[186647296] mode[0xee] }.

2018-06-30 20:51:29.639: [  CRSCCL][671086336]clsCclGetPriMemberData: Detected pridata change for node[1]. Retrieving it to the cache.

2018-06-30 20:51:31.434: [    CTSS][916338432]ctss_checkcb: clsdm requested check alive. checkcb_data{mode[0xee], offset[0 ms]}, length=[8].

2018-06-30 20:51:35.258: [    CTSS][901576448]ctsscomm_recv_cb2: Receive incoming message event. Msgtype [1].

2018-06-30 20:51:35.258: [    CTSS][901576448]ctsscomm_msg_hndlr: Received sync msg

2018-06-30 20:51:35.259: [    CTSS][901576448]ctsscomm_msg_hndlr: Received from slave ( mode [0xc4] nodenum [2] hostname [raclhr-11gr2-n2] )

2018-06-30 20:51:35.656: [  CRSCCL][671086336]clsCclGetPriMemberData: Detected pridata change for node[2]. Retrieving it to the cache.

2018-06-30 20:51:43.240: [    CTSS][901576448]ctsscomm_recv_cb2: Receive incoming message event. Msgtype [1].

2018-06-30 20:51:43.240: [    CTSS][901576448]ctsscomm_msg_hndlr: Received sync msg

2018-06-30 20:51:43.240: [    CTSS][901576448]ctsscomm_msg_hndlr: Received from slave ( mode [0xc6] nodenum [2] hostname [raclhr-11gr2-n2] )

2018-06-30 20:51:51.217: [    CTSS][901576448]ctsscomm_recv_cb2: Receive incoming message event. Msgtype [1].

2018-06-30 20:51:51.217: [    CTSS][901576448]ctsscomm_msg_hndlr: Received sync msg

2018-06-30 20:51:51.218: [    CTSS][901576448]ctsscomm_msg_hndlr: Received from slave ( mode [0xc6] nodenum [2] hostname [raclhr-11gr2-n2] )

2018-06-30 20:51:59.194: [    CTSS][901576448]ctsscomm_recv_cb2: Receive incoming message event. Msgtype [1].

2018-06-30 20:51:59.194: [    CTSS][901576448]ctsscomm_msg_hndlr: Received sync msg

2018-06-30 20:51:59.195: [    CTSS][901576448]ctsscomm_msg_hndlr: Received from slave ( mode [0xc6] nodenum [2] hostname [raclhr-11gr2-n2] )

The octssd.log on node 1 shows that an NTP service was detected, so CTSS automatically switched to Observer mode.

octssd log on node 2:

2018-06-30 20:57:27.608: [    CTSS][839321344]ctsselect_msm: CTSS mode is [0xc6]

2018-06-30 20:57:27.608: [    CTSS][839321344]ctssslave_swm1_2: Ready to initiate new time sync process.

2018-06-30 20:57:27.609: [    CTSS][839321344]ctssslave_swm2_1: Waiting for time sync message from master. sync_state[2].

2018-06-30 20:57:27.612: [    CTSS][845625088]ctsscomm_recv_cb2: Receive incoming message event. Msgtype [2].

2018-06-30 20:57:27.613: [    CTSS][845625088]ctssslave_msg_handler4_1: Waiting for slave_sync_with_master to finish sync process. sync_state[3].

2018-06-30 20:57:27.613: [    CTSS][839321344]ctssslave_swm2_3: Received time sync message from master.

2018-06-30 20:57:27.613: [    CTSS][839321344]ctssslave_swm17: LT [1530363447sec 613028usec], MT [1530363447sec 140655884569984usec], Delta [4410usec]

2018-06-30 20:57:27.613: [    CTSS][839321344]ctssslave_swm19: The offset is [19748 usec] and sync interval set to [1]

2018-06-30 20:57:27.613: [    CTSS][839321344]ctssslave_swm: Received from master (mode [0xee] nodenum [1] hostname [raclhr-11gr2-n1] )

2018-06-30 20:57:27.613: [    CTSS][839321344]ctsselect_msm: Sync interval returned in [1]

2018-06-30 20:57:27.613: [    CTSS][845625088]ctssslave_msg_handler4_3: slave_sync_with_master finished sync process. Exiting clsctssslave_msg_handler

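The octssd.log on node 2 also records that an NTP service was detected and that CTSS is in Observer mode; the master node for time synchronization is still node 1.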

 

 

 

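1.1.3      Simulating Inconsistent Cluster Time

What happens if the cluster time becomes inconsistent in a production system, and how should we troubleshoot it? The following scenario simulates inconsistent cluster time.

Change the time on node 2, moving it forward by 2 days.

To set the system date to July 2, 2018:

# date -s 07/02/2018

To set the system time to 23:23:06:

# date -s 23:23:06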

 

[root@raclhr-11gR2-N2 ctssd]# crsctl stat res -t -init

ora.ctssd

      1        ONLINE  ONLINE       raclhr-11gr2-n2          ACTIVE:172768000   

[root@raclhr-11gR2-N2 ctssd]# crsctl check ctss

CRS-4701: The Cluster Time Synchronization Service is in Active mode.

CRS-4702: Offset (in msec): 172768000

172768000 milliseconds is approximately 2 days:

SYS@lhrrac11> select 172768000/1000/24/60/60 from dual;

 

172768000/1000/24/60/60

-----------------------

             1.99962963
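The same conversion can be done directly in the shell without a database session. This is a minimal sketch; the function name `msec_to_days` is hypothetical.

```shell
# Hypothetical helper: converts a CTSS offset in milliseconds to days,
# mirroring the SQL expression 172768000/1000/24/60/60.
msec_to_days() {
    awk -v ms="$1" 'BEGIN { printf "%.8f\n", ms / 1000 / 60 / 60 / 24 }'
}

msec_to_days 172768000   # prints 1.99962963
```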

 

After the time on node 2 was changed, the following warning appeared in the ASM and DB alert logs:

Time drift detected. Please check VKTM trace file for more details.
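A quick way to see whether an alert log contains this warning is to count the matching lines. This is a minimal sketch; the function name `count_time_drift` is hypothetical, and the alert log path is left as an argument because it depends on the installation.

```shell
# Hypothetical helper: prints how many lines of the given alert log
# contain the "Time drift detected" warning.
#   $1 = path to an ASM or DB alert log
count_time_drift() {
    # grep -c prints 0 but exits non-zero when nothing matches;
    # "|| true" keeps the function's exit status at zero in that case.
    grep -c 'Time drift detected' "$1" || true
}
```

A non-zero count on a node suggests checking `crsctl check ctss` there and reviewing the VKTM trace file that the message mentions.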

 
