Oracle Cluster Health Monitor (CHM)

oracle分享技术 2021-02-06


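Cluster Health Monitor (CHM below) is a tool provided by Oracle that automatically collects operating-system resource usage: CPU, memory, swap, processes, I/O, and network. Unlike OSWatcher, which runs UNIX commands, CHM calls the OS APIs directly, which keeps its overhead lower. CHM is also closer to real time: it collects data once per second (changed to once every 5 seconds in 11.2.0.3).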

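CHM consists of two main services:
1). System Monitor Service (osysmond): runs on every node. osysmond sends each node's resource usage to the cluster logger service, which receives the data from all nodes and stores it in the CHM repository.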
$ ps -ef|grep osysmond
root 7984 1 0 Jun05 ? 01:16:14 /u01/app/11.2.0/grid/bin/osysmond.bin

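2). Cluster Logger Service (ologgerd): within a cluster, ologgerd runs on one master node, with another node acting as standby. If ologgerd runs into a problem on the current node and cannot start, it is started on the standby node.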
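The `ps -ef | grep osysmond` check above can be extended to both daemons. This is a minimal sketch, not an Oracle tool: the function name `chm_daemons` is hypothetical, and it assumes both daemons appear in `ps -ef` output under a path containing `bin/osysmond` or `bin/ologgerd` (as osysmond does in the output above). Since ologgerd runs only on the master and standby nodes, "ologgerd: not found" is expected on the other nodes.

```shell
# Hypothetical helper: report which CHM daemons appear in `ps -ef` output
# read from stdin. Matching "bin/<daemon>" (an assumed path layout) also
# skips the "grep osysmond" process itself.
chm_daemons() {
  local ps_out
  ps_out=$(cat)
  for d in osysmond ologgerd; do
    if printf '%s\n' "$ps_out" | grep -q "bin/$d"; then
      echo "$d: running"
    else
      # Normal for ologgerd on nodes that are neither master nor standby.
      echo "$d: not found"
    fi
  done
}

# Usage on a cluster node:
#   ps -ef | chm_daemons
```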

CHM diagnostic logs: if CHM is not running properly, check the following logs:
$GRID_HOME/log/<nodename>/crflogd/crflogd.log
$GRID_HOME/log/<nodename>/crfmond/crfmond.log
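A small helper can print both log paths for a given node so they can be passed to `tail` or `less`. This is a sketch: `chm_log_paths` is a hypothetical name, and using the short hostname for `<nodename>` is an assumption to verify against the directory names under `$GRID_HOME/log`.

```shell
# Hypothetical helper: print the two CHM log paths for one node.
# $1 = Grid home, $2 = node name as used under $GRID_HOME/log.
chm_log_paths() {
  local grid_home=$1 node=$2
  printf '%s\n' "$grid_home/log/$node/crflogd/crflogd.log" \
    "$grid_home/log/$node/crfmond/crfmond.log"
}

# Usage (assumes <nodename> matches `hostname -s`):
#   chm_log_paths "$GRID_HOME" "$(hostname -s)" | xargs tail -n 50
```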

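In Grid Infrastructure (GI), the resource ora.crf corresponds to CHM. Use the following commands to stop, disable (ENABLED=0), and start CHM (stopping this service is not recommended):
As the root user: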
$GRID_HOME/bin/crsctl stop res ora.crf -init
$GRID_HOME/bin/crsctl modify res ora.crf -attr ENABLED=0 -init
$GRID_HOME/bin/crsctl start res ora.crf -init

There are two ways to retrieve the data collected by CHM:
1. Using Grid_home/bin/diagcollection.pl:
1). First, identify the master node of the cluster logger service:
$ oclumon manage -get master
Master = rac2

2). As root, run the following command on the master node, rac2:
# <Grid_home>/bin/diagcollection.pl -collect -chmos -incidenttime inc_time -incidentduration duration
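inc_time is the time to start collecting data from, in the format MM/DD/YYYYHH24:MM:SS (date and time run together with no space); duration is how much data to collect after that start time, in HH:MM.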

For example:
# diagcollection.pl -collect -crshome /u01/app/11.2.0/grid -chmoshome /u01/app/11.2.0/grid -chmos -incidenttime 06/15/201215:30:00 -incidentduration 00:05
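The two diagcollection.pl steps above can be scripted: read the master node name from the `oclumon manage -get master` output, then build the `-incidenttime` argument from an ordinary timestamp. This is a sketch under assumptions: `chm_master` and `to_incidenttime` are hypothetical helper names, and the parser assumes the `Master = <node>` output format shown above, which may differ between versions.

```shell
# Hypothetical helper: extract the node name from `oclumon manage -get master`
# output on stdin, assuming the "Master = <node>" format.
chm_master() {
  sed -n 's/^[[:space:]]*Master[[:space:]]*=[[:space:]]*//p' | head -n 1
}

# Hypothetical helper: convert "YYYY-MM-DD HH:MM:SS" into the
# MM/DD/YYYYHH24:MM:SS form used by -incidenttime (no space before the time).
to_incidenttime() {
  local date_part=${1%% *} time_part=${1#* }
  local y=${date_part%%-*} md=${date_part#*-}
  local m=${md%%-*} d=${md#*-}
  printf '%s/%s/%s%s\n' "$m" "$d" "$y" "$time_part"
}

# Usage on a cluster node:
#   master=$(oclumon manage -get master | chm_master)
#   echo "On $master, as root: diagcollection.pl -collect -chmos" \
#     "-incidenttime $(to_incidenttime '2012-06-15 15:30:00') -incidentduration 00:05"
```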

2. The other way to retrieve CHM data is oclumon:
$ oclumon dumpnodeview [[-allnodes] | [-n node1 node2] [-last "duration"] | [-s "time_stamp" -e "time_stamp"] [-v] [-warning]] [-h]

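-s is the start time and -e is the end time, both given as "YYYY-MM-DD HH24:MM:SS"; -last retrieves the most recent data for the given duration in "HH24:MM:SS"; -v produces verbose output.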
$ oclumon dumpnodeview -allnodes -v -s "2012-06-15 07:40:00" -e "2012-06-15 07:57:00" > /tmp/chm1.txt

$ oclumon dumpnodeview -n node1 node2 node3 -last "12:00:00" >/tmp/chm1.txt
$ oclumon dumpnodeview -allnodes -last "00:15:00" >/tmp/chm1.txt
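The `-last` and `-s`/`-e` arguments above can be generated from minutes or from an incident time. This is a sketch: `last_duration` and `chm_window` are hypothetical helper names, and `chm_window` assumes GNU `date` (its `-d` option and `@epoch` input), as found on most Linux systems.

```shell
# Hypothetical helper: format a number of minutes as the "HH:MM:SS"
# duration taken by -last (e.g. 15 -> 00:15:00).
last_duration() {
  printf '%02d:%02d:00\n' $(($1 / 60)) $(($1 % 60))
}

# Hypothetical helper: print the start and end (one per line) of a window
# of +/- N minutes around a "YYYY-MM-DD HH:MM:SS" timestamp, for -s and -e.
# Assumes GNU date; times are interpreted in the current time zone.
chm_window() {
  local epoch
  epoch=$(date -d "$1" +%s) || return 1
  date -d "@$((epoch - $2 * 60))" '+%Y-%m-%d %H:%M:%S'
  date -d "@$((epoch + $2 * 60))" '+%Y-%m-%d %H:%M:%S'
}

# Usage on a cluster node:
#   oclumon dumpnodeview -allnodes -last "$(last_duration 15)" > /tmp/chm1.txt
#   start=$(chm_window "2012-06-15 07:50:00" 10 | sed -n 1p)
#   end=$(chm_window "2012-06-15 07:50:00" 10 | sed -n 2p)
#   oclumon dumpnodeview -allnodes -v -s "$start" -e "$end" > /tmp/chm1.txt
```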


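This article is reposted from oracle分享技术. To report suspected infringement, email contact@modb.pro with supporting evidence; once verified, 墨天轮 (modb.pro) will promptly remove the content.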
