
Oracle RAC Cluster Time Synchronization (CTSS and NTP)

DBA小记 2020-10-27

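Time synchronization in a RAC cluster can be handled either by the operating system's NTP service or by Oracle's own Cluster Time Synchronization Service (CTSS). If NTP is not enabled, Oracle automatically activates its own ctssd process.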

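Starting with Oracle 11gR2 RAC, the Cluster Time Synchronization Service (CTSS) synchronizes time across the nodes. CTSS is installed as part of Clusterware. If it detects a time synchronization service or a time synchronization configuration (NTP) on the system, CTSS starts and runs in Observer Mode and performs no time synchronization. The CTSS daemon is always installed and always running, but it only acts when the system meets the required conditions. If NTP is absent from every server in the cluster, CTSS is activated and takes over time management for the cluster: it starts and runs in Active Mode, uses one server in the cluster as the reference server, and synchronizes the other servers to it.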

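The clocks in a RAC cluster should be kept in sync; otherwise many problems can follow. For example, time-dependent applications can produce incorrect data, and log entries can appear out of order, which hampers troubleshooting. In severe cases the cluster can go down, or a node may be unable to rejoin the cluster when it restarts.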

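NTP and CTSS can coexist, and NTP takes precedence over CTSS. In other words, if both NTP and CTSS are present, the cluster time is synchronized by NTP and CTSS stays in Observer mode. CTSS switches to Active mode only when NTP is shut down on every node of the cluster. As long as ntp is active on even one node, CTSS stays in Observer mode on all nodes of the cluster.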
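The precedence rule can be written down as a small decision function. This is only an illustrative sketch: `expected_ctss_mode` is a hypothetical helper, and the per-node states `present` and `absent` are labels made up for this example (where `present` means NTP is running or configured on that node). It does not query a real cluster.

```shell
# Hypothetical helper: predict the CTSS mode from per-node NTP states.
# Each argument is "present" (NTP running or configured) or "absent".
expected_ctss_mode() {
    local state
    for state in "$@"; do
        if [ "$state" = "present" ]; then
            # One node with NTP is enough to keep CTSS in Observer mode everywhere.
            echo "Observer"
            return 0
        fi
    done
    echo "Active"
}

expected_ctss_mode absent absent    # no node has NTP
expected_ctss_mode present absent   # one node still has NTP
```

The first call corresponds to a cluster where NTP has been removed everywhere, the second to a cluster where a single node still has it.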

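Note that to put CTSS into Active mode it is not enough to stop the ntp service (/sbin/service ntpd stop); the /etc/ntp.conf file must also be removed (renaming it also works: mv /etc/ntp.conf /etc/ntp.conf.bak). Otherwise CTSS cannot be activated.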
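The steps for one node could be collected in a script along the following lines. This is a sketch, not a tested procedure: the `run` and `disable_ntp` helpers and the `DRY_RUN` variable are assumptions introduced here, and `chkconfig ntpd off` is added on the assumption that ntpd should also stay disabled after a reboot. By default the script only prints what it would do.

```shell
# Sketch: print (dry run) or execute the steps that remove NTP from one node.
# DRY_RUN defaults to 1 so nothing is changed unless it is set to 0 explicitly.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

disable_ntp() {
    run /sbin/service ntpd stop                # stop the running daemon
    run chkconfig ntpd off                     # assumption: keep it off after reboot
    run mv /etc/ntp.conf /etc/ntp.conf.bak     # CTSS also checks for this file
}

disable_ntp
```

Running it as root with `DRY_RUN=0` would execute the commands; it has to be repeated on every node before CTSS can leave Observer mode.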

1. CTSS synchronization mode

    [root@rac1centorder ~]# service ntpd status
    ntpd is stopped
    [root@rac1centorder ~]# ll /etc/ntp.*
    -rw-r--r--. 1 root root 1778 Dec 18 2017 /etc/ntp.conf.bak
    [root@rac1centorder ~]# chkconfig --list ntpd
    ntpd 0:off 1:off 2:off 3:off 4:off 5:off 6:off

Check the ctss process:

      [root@rac1centorder ~]# ps -ef |grep ctss
      root 129931 1 0 2019 ? 23:33:10 /u01/app/11.2.0/grid_1/bin/octssd.bin reboot
      root 217171 155615 0 11:24 pts/0 00:00:00 grep ctss

Check the CTSS status on cluster node 1:

        [root@rac1centorder ~]# su - grid
        [grid@rac1centorder ~]$ crsctl check ctss
        CRS-4701: The Cluster Time Synchronization Service is in Active mode.
        CRS-4702: Offset (in msec): 0
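When the status has to be checked on several nodes, the output can be condensed to one line per node. The sketch below relies only on the message codes visible in this article (CRS-4700 for Observer, CRS-4701 for Active, CRS-4702 for the offset); `parse_ctss_check` is a hypothetical helper name, and the sample input is copied from the node 1 output.

```shell
# Sketch: summarise `crsctl check ctss` output read from stdin.
# Prints "<mode> <offset_msec>"; the offset is "-" when no CRS-4702 line is present.
parse_ctss_check() {
    awk '
        /CRS-4701/ { mode = "Active" }
        /CRS-4700/ { mode = "Observer" }
        /CRS-4702/ { offset = $NF; seen = 1 }
        END {
            if (mode == "") mode = "Unknown"
            if (!seen) offset = "-"
            print mode, offset
        }'
}

# Sample copied from the node 1 output; on a real node one would run:
#   crsctl check ctss | parse_ctss_check
parse_ctss_check <<'EOF'
CRS-4701: The Cluster Time Synchronization Service is in Active mode.
CRS-4702: Offset (in msec): 0
EOF
```

In Observer mode no offset line is printed, so the helper reports `-` for the offset.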

octssd log on node 1:

          [grid@rac1centorder ~]$ tail -30 /u01/app/11.2.0/grid_1/log/rac1centorder/ctssd/octssd.log
          2020-10-26 11:31:19.472: [ CTSS][2587350784]ctssslave_swm: The system time difference is too small [753] usec. Not adjusting time.
          2020-10-26 11:31:19.472: [ CTSS][2587350784]ctssslave_swm17: LT [1603683079sec 472234usec], MT [1603683079sec 140694539155355usec], Delta [1920usec]
          2020-10-26 11:31:19.472: [ CTSS][2587350784]ctssslave_swm19: The offset is [-753 usec] and sync interval set to [1]
          2020-10-26 11:31:19.472: [ CTSS][2587350784]ctssslave_swm: Received from master (mode [0xcc] nodenum [2] hostname [rac2centorder] )
          2020-10-26 11:31:19.472: [ CTSS][2587350784]ctsselect_msm: Sync interval returned in [1]
          2020-10-26 11:31:19.472: [ CTSS][2591553280]ctssslave_msg_handler4_3: slave_sync_with_master finished sync process. Exiting clsctssslave_msg_handler
          2020-10-26 11:31:27.472: [ CTSS][2587350784]ctsselect_msm: CTSS mode is [0xc4]
          2020-10-26 11:31:27.472: [ CTSS][2587350784]ctssslave_swm1_2: Ready to initiate new time sync process.
          2020-10-26 11:31:27.473: [ CTSS][2587350784]ctssslave_swm2_1: Waiting for time sync message from master. sync_state[2].
          2020-10-26 11:31:27.474: [ CTSS][2591553280]ctsscomm_recv_cb2: Receive incoming message event. Msgtype [2].
          2020-10-26 11:31:27.474: [ CTSS][2591553280]ctssslave_msg_handler4_1: Waiting for slave_sync_with_master to finish sync process. sync_state[3].
          2020-10-26 11:31:27.474: [ CTSS][2587350784]ctssslave_swm2_3: Received time sync message from master.

Check the CTSS status on cluster node 2:

            [grid@rac2centorder ~]$ crsctl check ctss
            CRS-4701: The Cluster Time Synchronization Service is in Active mode.
            CRS-4702: Offset (in msec): 0

octssd log on node 2:

              [grid@rac2centorder ~]$ tail -30 /u01/app/11.2.0/grid_1/log/rac2centorder/ctssd/octssd.log
              2020-10-26 11:35:03.532: [ CTSS][2221074176]ctsscomm_msg_hndlr: Received from slave ( mode [0xc4] nodenum [1] hostname [rac1centorder] )
              2020-10-26 11:35:10.688: [ CTSS][2236086016]ctss_checkcb: clsdm requested check alive. checkcb_data{mode[0xcc], offset[0 ms]}, length=[8].
              2020-10-26 11:35:11.533: [ CTSS][2221074176]ctsscomm_recv_cb2: Receive incoming message event. Msgtype [1].
              2020-10-26 11:35:11.533: [ CTSS][2221074176]ctsscomm_msg_hndlr: Received sync msg
              2020-10-26 11:35:11.534: [ CTSS][2221074176]ctsscomm_msg_hndlr: Received from slave ( mode [0xc4] nodenum [1] hostname [rac1centorder] )
              2020-10-26 11:35:19.536: [ CTSS][2221074176]ctsscomm_recv_cb2: Receive incoming message event. Msgtype [1].
              2020-10-26 11:35:19.536: [ CTSS][2221074176]ctsscomm_msg_hndlr: Received sync msg
              2020-10-26 11:35:19.536: [ CTSS][2221074176]ctsscomm_msg_hndlr: Received from slave ( mode [0xc4] nodenum [1] hostname [rac1centorder] )
              2020-10-26 11:35:26.720: [ CTSS][2216871680]sclsctss_gvss2: NTP default pid file not found
              2020-10-26 11:35:26.720: [ CTSS][2216871680]sclsctss_gvss8: Return [0] and NTP status [1].
              2020-10-26 11:35:26.720: [ CTSS][2216871680]ctss_check_vendor_sw: Vendor time sync software is not detected. status [1].

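The logs show that no NTP service was found and that CTSS is running in Active mode. The reference (master) node for synchronization is node 2 (rac2centorder): node 1 logs "Received from master ... nodenum [2] hostname [rac2centorder]", and node 2 logs sync messages "Received from slave ... nodenum [1]". The node 1 log also reports that a time difference exists between the nodes, but that it is too small to need adjusting.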
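To follow how the offset develops over time, the relevant values can be pulled out of octssd.log. The sketch below assumes the line format seen in the node 1 excerpt ("The offset is [N usec]"); `extract_offsets` is a hypothetical helper name, and other Grid Infrastructure versions may word the message differently.

```shell
# Sketch: print "<date> <time> <offset_usec>" for every offset line in an octssd.log
# read from stdin. Lines that do not report an offset are ignored.
extract_offsets() {
    sed -n 's/^\([0-9-]* [0-9:.]*\): .*The offset is \[\(-\{0,1\}[0-9][0-9]*\) usec\].*/\1 \2/p'
}

# Sample lines copied from the node 1 log; on a real node one would run:
#   extract_offsets < /u01/app/11.2.0/grid_1/log/rac1centorder/ctssd/octssd.log
extract_offsets <<'EOF'
2020-10-26 11:31:19.472: [ CTSS][2587350784]ctssslave_swm: The system time difference is too small [753] usec. Not adjusting time.
2020-10-26 11:31:19.472: [ CTSS][2587350784]ctssslave_swm19: The offset is [-753 usec] and sync interval set to [1]
EOF
```

For the two sample lines only the second one matches, giving the timestamp followed by an offset of -753 microseconds.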

Verify the cluster time:

                [grid@rac1centorder ~]$  cluvfy comp clocksync -n all -verbose
                Verifying Clock Synchronization across the cluster nodes
                Checking if Clusterware is installed on all nodes...
                Check of Clusterware install passed
                Checking if CTSS Resource is running on all nodes...
                Check: CTSS Resource running on all nodes
                Node Name Status
                ------------------------------------ ------------------------
                rac2centorder passed
                rac1centorder passed
                Result: CTSS resource check passed
                Querying CTSS for time offset on all nodes...
                Result: Query of CTSS for time offset passed
                Check CTSS state started...
                Check: CTSS state
                Node Name State
                ------------------------------------ ------------------------
                rac2centorder Active
                rac1centorder Active
                CTSS is in Active state. Proceeding with check of clock time offsets on all nodes...
                Reference Time Offset Limit: 1000.0 msecs
                Check: Reference Time Offset
                Node Name Time Offset Status
                ------------ ------------------------ ------------------------
                rac2centorder 0.0 passed
                rac1centorder 0.0 passed
                Time offset is within the specified limits on the following set of nodes:
                "[rac2centorder, rac1centorder]"
                Result: Check of clock time offsets passed
                Oracle Cluster Time Synchronization Services check passed
                Verification of Clock Synchronization across the cluster nodes was successful.

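The node clocks are not perfectly identical, but the check still passes in this case, and the cluster corrects such small differences automatically.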

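Notes:

(1) CTSS does not set the system time backward. Oracle 10.2 RAC had a bug in which setting the time backward caused nodes to reboot.

(2) CTSS keeps the nodes in sync with each other, but it cannot guarantee that they agree with an external standard clock (for example Beijing time).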

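2. Linux NTP synchronization mode

This method keeps the nodes in sync with each other and also keeps their clocks in sync with standard time.

Configure the NTP service:

Edit /etc/ntp.conf on all nodes. 192.168.7.2 is the company's internal time server (already synchronized with the standard clock).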

                  [root@rac1centorder ~]# vi /etc/ntp.conf
                  server 192.168.7.2
                  driftfile /var/lib/ntp/drift
                  broadcastdelay 0.008
                  disable monitor

Note: disable monitor turns off ntpd's monitoring facility, which is a mitigation against DDoS attacks that abuse the NTP service.

                    [root@rac1centorder ~]# vi /etc/sysconfig/ntpd    
                    # Drop root to id 'ntp:ntp' by default.
                    OPTIONS=" -x -u ntp:ntp -p /var/run/ntpd.pid -g"

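Note: the -x option makes ntpd synchronize by clock slewing (gradual adjustment), which avoids large clock jumps that would trigger a cluster reconfiguration. A large jump ahead in time can make Clusterware believe that a check-in was missed, which results in node eviction.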
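Because a missing -x is easy to overlook, the setting can be checked on each node before starting ntpd. The following is a sketch: `ntpd_uses_slew` is a hypothetical helper, it only looks for a standalone `-x` inside a double-quoted `OPTIONS=` line, and it would not recognise flags that are bundled together (for example `-gx`). The demonstration runs against a temporary file instead of the real /etc/sysconfig/ntpd.

```shell
# Sketch: succeed if the OPTIONS line of an ntpd sysconfig file contains a standalone -x.
# The file path is passed as the first argument.
ntpd_uses_slew() {
    local file="$1" opts
    # Take the content between the quotes of the last OPTIONS="..." line.
    opts="$(sed -n 's/^OPTIONS="\(.*\)"[[:space:]]*$/\1/p' "$file" | tail -n 1)"
    case " $opts " in
        *" -x "*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Demonstration on a temporary copy of the setting shown in this article.
sample="$(mktemp)"
echo 'OPTIONS=" -x -u ntp:ntp -p /var/run/ntpd.pid -g"' > "$sample"
if ntpd_uses_slew "$sample"; then
    echo "slew mode (-x) is configured"
else
    echo "-x is missing from OPTIONS"
fi
rm -f "$sample"
```

On a cluster node the call would be `ntpd_uses_slew /etc/sysconfig/ntpd`.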

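Aside:

The hardware clock and the system clock are sometimes out of sync. The following commands synchronize them:

                      clock             show the hardware clock
                      clock --hctosys   set the system time from the hardware clock
                      clock --systohc   set the hardware clock from the system time
                      hwclock -r        show the hardware clock
                      hwclock -s        set the system time from the hardware clock
                      hwclock -w        set the hardware clock from the system time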

Enable automatic start at boot:

                        [root@rac1centorder ~]# chkconfig --list ntpd
                        ntpd 0:off 1:off 2:off 3:off 4:off 5:off 6:off
                        [root@rac1centorder ~]# chkconfig ntpd on
                        [root@rac1centorder ~]# chkconfig --list ntpd
                        ntpd 0:off 1:off 2:on 3:on 4:on 5:on 6:off

Start the NTP service:

                          [root@rac1centorder ~]# service ntpd status
                          ntpd is stopped
                          [root@rac1centorder ~]# service ntpd restart
                          Shutting down ntpd: [FAILED]
                          Starting ntpd: [OK]
                          [root@rac1centorder ~]# ntpq -p
                          remote refid st t when poll reach delay offset jitter
                          ==============================================================================
                          *192.168.7.2 120.25.115.20 3 u 27 64 377 0.187 5.667 3.343
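In the `ntpq -p` output, the peer that ntpd has currently selected is marked with a leading `*`, and the offset column is given in milliseconds. A check for "has this node selected its server yet, and how far off is it" could be sketched as follows; `selected_peer` is a hypothetical helper, and it assumes the standard ten-column layout shown in the transcript.

```shell
# Sketch: print "<peer> <offset_ms>" for the peer marked with "*" in `ntpq -p` output
# read from stdin. Returns non-zero if no peer has been selected yet.
selected_peer() {
    awk '
        /^\*/ { sub(/^\*/, "", $1); print $1, $9; found = 1 }
        END   { exit(found ? 0 : 1) }'
}

# Sample copied from the transcript; on a real node one would run:
#   ntpq -p | selected_peer
selected_peer <<'EOF'
remote refid st t when poll reach delay offset jitter
==============================================================================
*192.168.7.2 120.25.115.20 3 u 27 64 377 0.187 5.667 3.343
EOF
```

Right after ntpd starts, no peer carries the `*` mark yet, so the helper returning non-zero at that point is expected.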

Check the CTSS status after starting NTP:

                            [grid@rac1centorder ~]$ crsctl check ctss
                            CRS-4700: The Cluster Time Synchronization Service is in Observer mode.
                            [grid@rac2centorder ~]$ crsctl check ctss
                            CRS-4700: The Cluster Time Synchronization Service is in Observer mode.
                            [grid@rac1centorder ~]$ tail -30 /u01/app/11.2.0/grid_1/log/rac1centorder/ctssd/octssd.log
                            2020-10-26 13:54:25.782: [ CTSS][2587350784]sclsctss_gvss1: NTP default config file found
                            2020-10-26 13:54:25.782: [ CTSS][2587350784]sclsctss_gvss8: Return [0] and NTP status [2].
                            2020-10-26 13:54:25.782: [ CTSS][2587350784]ctss_check_vendor_sw: Vendor time sync software is detected. status [2].
                            2020-10-26 13:54:25.782: [ CTSS][2587350784]ctss_check_vendor_sw: Ctssd is switching to observer role
                            2020-10-26 13:54:25.782: [ CTSS][2587350784]clsctsselect_update_mbrdata: Updating pridata: { version[1] node[1] swversion[186647552] mode[0xe6] }.
                            2020-10-26 13:54:25.783: [ CTSS][2587350784]ctsselect_msm: CTSS mode is [0xe6]

The octssd.log on node 1 records that an NTP service was detected, and CTSS switches to Observer mode automatically.

                              [grid@rac2centorder ~]$ tail -30 /u01/app/11.2.0/grid_1/log/rac2centorder/ctssd/octssd.log
                              2020-10-26 14:28:56.783: [ CTSS][2216871680]sclsctss_gvss1: NTP default config file found
                              2020-10-26 14:28:56.783: [ CTSS][2216871680]sclsctss_gvss8: Return [0] and NTP status [2].
                              2020-10-26 14:28:56.783: [ CTSS][2216871680]ctss_check_vendor_sw: Vendor time sync software is detected. status [2].
                              2020-10-26 14:28:56.783: [ CTSS][2216871680]clsctsselect_update_mbrdata: Updating pridata: { version[1] node[2] swversion[186647552] mode[0xee] }.
                              2020-10-26 14:28:57.034: [ CRSCCL][2013263616]clsCclGetPriMemberData: Detected pridata change for node[2]. Retrieving it to the cache.
                              2020-10-26 14:28:58.337: [ CTSS][2221074176]ctsscomm_recv_cb2: Receive incoming message event. Msgtype [1].
                              2020-10-26 14:28:58.337: [ CTSS][2221074176]ctsscomm_msg_hndlr: Received sync msg
                              2020-10-26 14:28:58.337: [ CTSS][2221074176]ctsscomm_msg_hndlr: Received from slave ( mode [0xe6] nodenum [1] hostname [rac1centorder] )
                              2020-10-26 14:29:06.339: [ CTSS][2221074176]ctsscomm_recv_cb2: Receive incoming message event. Msgtype [1].

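The octssd.log on node 2 also records that an NTP service was detected and that CTSS is in Observer mode. Node 2 still receives sync messages from node 1 as a slave ("Received from slave ... nodenum [1]"), which indicates that node 2 remains the reference (master) node.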

