
OEM 13.5: No Data for Monitored Targets

Original · 二两烧麦 · 2025-02-08


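Problem description

On an Oracle engineered system, the agents on compute node 1 and compute node 3 were running normally, but no operating system CPU, memory, I/O, or network data was being collected.
Restarting the agent did not fix it.
Time to troubleshoot.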

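1 Check whether the Agent certificate is valid

  • The Agent uses an SSL certificate for secure communication, so first check whether the certificate has expired or is corrupt. The detailed status output below does not print the certificate itself, but it does show whether heartbeats and uploads to the OMS are getting through:
    emctl status agent -details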
[oracle@gsydbadm01 bin]$ ./emctl status agent -details
Oracle Enterprise Manager Cloud Control 13c Release 5
Copyright (c) 1996, 2021 Oracle Corporation.  All rights reserved.
Agent Version          : 13.5.0.0.0
OMS Version            : 13.5.0.0.0
Protocol Version       : 12.1.0.1.0
Agent Home             : /u01/app/oracle/agent135/agent_inst
Agent Log Directory    : /u01/app/oracle/agent135/agent_inst/sysman/log
Agent Binaries         : /u01/app/oracle/agent135/agent_13.5.0.0.0
Core JAR Location      : /u01/app/oracle/agent135/agent_13.5.0.0.0/jlib
Agent Process ID       : 133557
Parent Process ID      : 121999
Agent URL              : https://gsydbadm01.local:3872/emd/main/
Local Agent URL in NAT : https://gsydbadm01.local:3872/emd/main/
Repository URL         : https://em13c:4903/empbs/upload
Started at             : 2025-02-06 17:01:21
Started by user        : oracle
Operating System       : Linux version 2.6.39-400.284.1.el6uek.x86_64 (amd64)
Number of Targets      : 73
Last Reload            : (none)           <-- no data
Last successful upload                       : (none)  <-- no data: uploads are not succeeding
Last attempted upload                        : 2025-02-08 08:42:56
Total Megabytes of XML files uploaded so far : 0
Number of XML files pending upload           : 5,444
Size of XML files pending upload(MB)         : 3.93
Available disk space on upload filesystem    : 14.36%
Collection Status                            : [COLLECTIONS_HALTED(
UPLOAD_SYSTEM Threshold (UploadMaxNumberXML: 5000) exceeded with 5110 files)]
Backoff Expiration                           : 2025-02-08 08:43:11
Heartbeat Status                             : Ok
Last attempted heartbeat to OMS              : 2025-02-08 08:42:19
Last successful heartbeat to OMS             : 2025-02-08 08:42:19
Next scheduled heartbeat to OMS              : 2025-02-08 08:43:19
-----------------------------------------------------------------------------
Agent is Running and Ready

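The status output shows that uploads are failing: there is no successful upload on record, 5,444 XML files are waiting, and collections were halted once the backlog passed UploadMaxNumberXML (5000). Why are the uploads failing? The network is reachable, and the heartbeat to the OMS is Ok.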
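The same reading of the status output can be scripted. The following is a minimal Python sketch, not an Oracle tool: the function names `parse_agent_status` and `upload_problems` are made up for illustration, and the sample text is a trimmed copy of the output shown above.

```python
import re

# Trimmed copy of the 'emctl status agent -details' output shown above.
SAMPLE_STATUS = """\
Last successful upload                       : (none)
Last attempted upload                        : 2025-02-08 08:42:56
Number of XML files pending upload           : 5,444
Collection Status                            : [COLLECTIONS_HALTED(
UPLOAD_SYSTEM Threshold (UploadMaxNumberXML: 5000) exceeded with 5110 files)]
Heartbeat Status                             : Ok
"""


def parse_agent_status(text):
    """Turn 'key : value' lines into a dict (hypothetical helper)."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" : ")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def upload_problems(text):
    """Return human-readable findings about the upload subsystem."""
    fields = parse_agent_status(text)
    problems = []
    if fields.get("Last successful upload") == "(none)":
        problems.append("no successful upload yet")
    pending = int(
        fields.get("Number of XML files pending upload", "0").replace(",", "")
    )
    # The limit is only printed once it has been exceeded.
    limit = re.search(r"UploadMaxNumberXML:\s*(\d+)", text)
    if limit and pending > int(limit.group(1)):
        problems.append(
            f"{pending} pending files exceed UploadMaxNumberXML={limit.group(1)}"
        )
    if "COLLECTIONS_HALTED" in text:
        problems.append("collections halted")
    return problems
```

Calling `upload_problems(SAMPLE_STATUS)` lists all three findings, while the `Heartbeat Status` field still parses as `Ok`, which matches the observation that the network path itself is fine.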

2 Look for the problem in the logs

 cd /u01/app/oracle/agent135/agent_inst/sysman/log
 (the logs of interest here include gcagent.log and emctl.log)

tail -f emagent.nohup
EONSPROVIDER: oracle.eons.proxy.impl.ONSFactoryImpl
Feb 06, 2025 5:01:40 PM oracle.eons.proxy.impl.ConnectionManagerImpl readFormFactor
WARNING: unable to locate formfactor file - /u01/app/oracle/agent135/agent_13.5.0.0.0/eons/conf/.formfactor
Feb 06, 2025 5:01:42 PM oracle.sysman.diag.EMDiagImpl captureDiagData.478
SEVERE: Critical error: java.time.OffsetDateTime cannot be cast to java.sql.Timestamp
java.lang.ClassCastException: java.time.OffsetDateTime cannot be cast to java.sql.Timestamp
    at oracle.sysman.db.receivelet.aqmetricsdb.SvrGenAlertType.setValue(SvrGenAlertType.java:228)
    at oracle.sysman.db.receivelet.aqmetricsdb.SvrGenAlertType.<init>(SvrGenAlertType.java:131)
    at oracle.sysman.db.receivelet.aqmetricsdb.SvrGenAlrt$QueueListener.report(SvrGenAlrt.java:687)
    at oracle.sysman.db.receivelet.aqmetricsdb.SvrGenAlrt$QueueListener.run(SvrGenAlrt.java:1228)
    at oracle.sysman.gcagent.target.interaction.execution.ReceiveletInteractionMgr$3$1.run(ReceiveletInteractionMgr.java:1554)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at oracle.sysman.gcagent.util.system.GCAThread$RunnableWrapper.run(GCAThread.java:198)
    at java.lang.Thread.run(Thread.java:748)

Feb 06, 2025 5:01:42 PM oracle.sysman.diag.EMDiagImpl createIncident.648
INFO: incident 650 created with problem key java.lang.ClassCastException:oracle.sysman.db.receivelet.aqmetricsdb.SvrGenAlertType:228, in directory /u01/app/oracle/agent135/agent_inst/diag/ofm/emagent/emagent/incident/incdir_650
Feb 06, 2025 5:01:43 PM oracle.sysman.diag.EMDiagImpl captureDiagData.478
SEVERE: Critical error: java.time.OffsetDateTime cannot be cast to java.sql.Timestamp
java.lang.ClassCastException: java.time.OffsetDateTime cannot be cast to java.sql.Timestamp
    at oracle.sysman.db.receivelet.aqmetricsdb.SvrGenAlertType.setValue(SvrGenAlertType.java:228)
    at oracle.sysman.db.receivelet.aqmetricsdb.SvrGenAlertType.<init>(SvrGenAler
tail -f emagent.nohup
Feb 08, 2025 9:31:38 AM oracle.sysman.diag.EMDiagImpl createIncident.648
INFO: incident 654 created with problem key java.lang.ClassCastException:oracle.sysman.db.receivelet.aqmetricsdb.SvrGenAlertType:228, in directory /u01/app/oracle/agent135/agent_inst/diag/ofm/emagent/emagent/incident/incdir_654
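
When the log is long, it can help to pull out only the incident lines and see which problem keys repeat. The sketch below is an illustration in Python: the regular expression is written against the two `createIncident` lines above, and `find_incidents` is a made-up helper name.

```python
import re
from collections import Counter

# Matches the 'incident N created with problem key K, in directory D' lines.
INCIDENT_RE = re.compile(
    r"incident (\d+) created with problem key (\S+), in directory (\S+)"
)

# The two incident lines from emagent.nohup shown above (unwrapped).
SAMPLE_LOG = (
    "INFO: incident 650 created with problem key "
    "java.lang.ClassCastException:oracle.sysman.db.receivelet.aqmetricsdb."
    "SvrGenAlertType:228, in directory /u01/app/oracle/agent135/agent_inst/"
    "diag/ofm/emagent/emagent/incident/incdir_650\n"
    "INFO: incident 654 created with problem key "
    "java.lang.ClassCastException:oracle.sysman.db.receivelet.aqmetricsdb."
    "SvrGenAlertType:228, in directory /u01/app/oracle/agent135/agent_inst/"
    "diag/ofm/emagent/emagent/incident/incdir_654\n"
)


def find_incidents(log_text):
    """Return (incident_id, problem_key, directory) tuples in log order."""
    return [
        (int(number), key, path)
        for number, key, path in INCIDENT_RE.findall(log_text)
    ]


def problem_key_counts(log_text):
    """Count how often each problem key was reported."""
    return Counter(key for _number, key, _path in find_incidents(log_text))
```

For the sample, both incidents share one problem key, and the directory of the last tuple is the `incdir_654` folder whose `readme.txt` is worth reading next.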

Change into the incident directory reported in the log and read its readme.txt:

more readme.txt
Problem Key: java.lang.ClassCastException:oracle.sysman.db.receivelet.aqmetricsdb.SvrGenAlertType:228
ECID: 0000PJZBkDkBx0w0wFw0zk1bdfFr000003
Thread Id: 46
Error Message Id: OFM-99999

Context Values
--------------
threadName : AQMetricsDB


Stack Trace
-----------
java.lang.ClassCastException: java.time.OffsetDateTime cannot be cast to java.sql.Timestamp
    at oracle.sysman.db.receivelet.aqmetricsdb.SvrGenAlertType.setValue(SvrGenAlertType.java:228)
    at oracle.sysman.db.receivelet.aqmetricsdb.SvrGenAlertType.<init>(SvrGenAlertType.java:131)
    at oracle.sysman.db.receivelet.aqmetricsdb.SvrGenAlrt$QueueListener.report(SvrGenAlrt.java:687)
    at oracle.sysman.db.receivelet.aqmetricsdb.SvrGenAlrt$QueueListener.run(SvrGenAlrt.java:1228)
    at oracle.sysman.gcagent.target.interaction.execution.ReceiveletInteractionMgr$3$1.run(ReceiveletInteractionMgr.java:1554)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at oracle.sysman.gcagent.util.system.GCAThread$RunnableWrapper.run(GCAThread.java:198)
    at java.lang.Thread.run(Thread.java:748)


Supplemental Files
------------------

3 Search Oracle's official support site

Applies to: Enterprise Manager Base Platform - Version 13.5.0.0.0 and later
Information in this document applies to any platform.

Symptoms
On : 13.5.0.0.0 version, OMS Upgrade
ACTUAL BEHAVIOR
---------------
EM 13.5 - We applied RU3 Patch but still see below issue 
Receiving following incidents:
Observed following error in <AGENT_INST>/sysman/log/gcagent.log:
INFO - ADR Incident created: Id=4, message=[java.time.OffsetDateTime cannot be cast to java.sql.Timestamp], module=oracle.sysman.db.receivelet.aqmetricsdb.SvrGenAlertType, problemKey='java.lang.ClassCastException:oracle.sysman.db.receivelet.aqmetricsdb.SvrGenAlertType:228', directory=<AGENT_INST>/diag/ofm/emagent/emagent/incident/incdir_4
 
Cause
This issue is being addressed in the following bug:
BUG 33125216 - ClassCastException:oracle.sysman.db.receivelet.aqmetricsdb.SvrGenAlertType:228
Solution
 Run the root.sh on the problematic agent server and it should resolve the issue. 

4 Attempted fix

Re-run root.sh.
It must be run as the root user:
cd /u01/app/oracle/agent135/agent_13.5.0.0.0
./root.sh
Then restart the agent.
Keep watching emagent.nohup. The output shows that the error no longer appears:

--- EMState agent
----- 2025-02-08 09:48:38,319::119713::Mismatch detected between timezone in env (Asia/Shanghai) and in /u01/app/oracle/agent135/agent_inst/sysman/config/emd.properties (PRC). Forcing value to latter.. -----
----- 2025-02-08 09:48:38,764::119713::Auto tuning the agent at time 2025-02-08 09:48:38,764 -----
----- 2025-02-08 09:48:39,526::119713::Finished auto tuning the agent at time 2025-02-08 09:48:39,526 -----
----- 2025-02-08 09:48:39,529::119713::Launching the JVM with following options: -Xmx240M -XX:MaxMetaspaceSize=224M -server -Djava.security.egd=file:///dev/./urandom -Dsun.lang.ClassLoader.allowArraySyntax=true -XX:-UseLargePages -XX:+UseLinuxPosixThreadCPUClocks -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled -XX:+UseCompressedOops -DHTTPClient.dontSeekTerminatingChunk=true -----
----- 2025-02-08 09:48:39,530::119713::Agent Launched with PID 123615 at time 2025-02-08 09:48:39,530 -----
----- 2025-02-08 09:48:39,530::123615::Time elapsed between Launch of Watchdog process and execing EMAgent is 2 secs -----
----- 2025-02-08 09:48:39,531::119713::Previous Thrash State(-1,-1) -----
2025-02-08 09:48:39,745 [1:main (@ 2025-02-08 09:48:39 CST)] WARN - Missing filename for log handler 'wsm'
2025-02-08 09:48:39,753 [1:main (@ 2025-02-08 09:48:39 CST)] WARN - Missing filename for log handler 'opss'
2025-02-08 09:48:39,754 [1:main (@ 2025-02-08 09:48:39 CST)] WARN - Missing filename for log handler 'opsscfg'
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
EONSPROVIDER: oracle.eons.proxy.impl.ONSFactoryImpl
Feb 08, 2025 9:48:53 AM oracle.eons.proxy.impl.ConnectionManagerImpl readFormFactor
WARNING: unable to locate formfactor file - /u01/app/oracle/agent135/agent_13.5.0.0.0/eons/conf/.formfactor
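
Rather than eyeballing the tail of the log, one can check whether any SEVERE entry carries a timestamp later than the restart. This is a rough Python sketch under two assumptions: each java.util.logging entry starts with a header line such as `Feb 06, 2025 5:01:42 PM ...`, and month names are in English (the default C locale). The helper name `severe_errors_after` is made up.

```python
import re
from datetime import datetime

# Header line of a java.util.logging entry, e.g. 'Feb 06, 2025 5:01:42 PM ...'
STAMP_RE = re.compile(r"^(\w{3} \d{2}, \d{4} \d{1,2}:\d{2}:\d{2} [AP]M) ")

# Lines copied from the emagent.nohup excerpts in this post.
SAMPLE_LOG = """\
Feb 06, 2025 5:01:42 PM oracle.sysman.diag.EMDiagImpl captureDiagData.478
SEVERE: Critical error: java.time.OffsetDateTime cannot be cast to java.sql.Timestamp
Feb 06, 2025 5:01:43 PM oracle.sysman.diag.EMDiagImpl captureDiagData.478
SEVERE: Critical error: java.time.OffsetDateTime cannot be cast to java.sql.Timestamp
Feb 08, 2025 9:48:53 AM oracle.eons.proxy.impl.ConnectionManagerImpl readFormFactor
WARNING: unable to locate formfactor file - /u01/app/oracle/agent135/agent_13.5.0.0.0/eons/conf/.formfactor
"""

# Watchdog start time taken from the '--- EMState agent' block above.
RESTART = datetime(2025, 2, 8, 9, 48, 38)


def severe_errors_after(log_text, restart_time):
    """Return (timestamp, line) pairs for SEVERE lines logged after restart_time."""
    hits = []
    last_stamp = None
    for line in log_text.splitlines():
        header = STAMP_RE.match(line)
        if header:
            last_stamp = datetime.strptime(header.group(1), "%b %d, %Y %I:%M:%S %p")
        elif line.startswith("SEVERE:") and last_stamp and last_stamp > restart_time:
            hits.append((last_stamp, line))
    return hits
```

An empty result for `severe_errors_after(SAMPLE_LOG, RESTART)` agrees with the observation above. A quiet period right after a restart is fairly weak evidence on its own, so it may be worth repeating the check later.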

5 The problem persists

2025-02-08 10:00:59,871 [282:F37ECCBE:GC.SysExecutor.8 (AgentSystemMonitorTask)] WARN - Subsystem (Upload Manager) returned bad status of {+ Upload Manager: *Critical, but not mandatory component* +}
2025-02-08 10:01:00,224 [130:980F9148:GC.SysExecutor.2 (Ping OMS)] INFO - attempting another heartbeat

The log shows that data still cannot be uploaded to the OMS.

Check the logs on the OMS server:

cd /u01/gc_inst/em/EMGC_OMS1/sysman/log
tail -200 emoms_pbs.log|more
2025-02-08 10:21:56,396 [GCLoader[response_severity] - https://gsydbadm03.local:3872/emd/main/] ERROR gcloader.DataLoader logp.251 - LOADER ERROR: Loader already procesing this request:  tracking_key=26179.1738832488000 emd_url=https://gsydbadm03.local:3872/emd/main/ loadEntryGuid=78A31DA12406AE511D4587933BE24246 upload_type=response_severity stream_id=1
2025-02-08 10:21:56,396 [GCLoader[response_severity] - https://gsydbadm03.local:3872/emd/main/] ERROR gcloader.Receiver logp.251 - Upload failed: emdURL=https://gsydbadm03.local:3872/emd/main/ trackingKey=26179.1738832488000 type=response_severity e=ERROR-800|LOADER ERROR: Loader already procesing this request:  tracking_key=26179.1738832488000 emd_url=https://gsydbadm03.local:3872/emd/main/ loadEntryGuid=78A31DA12406AE511D4587933BE24246 upload_type=response_severity stream_id=1
ERROR-800|LOADER ERROR: Loader already procesing this request:  tracking_key=26179.1738832488000 emd_url=https://gsydbadm03.local:3872/emd/main/ loadEntryGuid=78A31DA12406AE511D4587933BE24246 upload_type=response_severity stream_id=1
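
Each failed upload is logged several times, so it is easier to reduce the log to the distinct agent URL, upload type, and stream that the loader reports as busy. The Python sketch below is only an illustration: `stuck_streams` is a made-up name, the pattern is derived from the lines above (it tolerates both the `procesing` spelling seen in this log and `processing`), and it assumes the log is read from the file directly rather than from pager output that wraps long lines.

```python
import re

# Fields printed after 'Loader already proces(s)ing this request:'.
LOADER_RE = re.compile(
    r"Loader already process?ing this request:\s+"
    r"tracking_key=(\S+) emd_url=(\S+) loadEntryGuid=(\S+) "
    r"upload_type=(\S+) stream_id=(\d+)"
)

# First two entries from emoms_pbs.log shown above, unwrapped.
SAMPLE_LOG = (
    "2025-02-08 10:21:56,396 [GCLoader[response_severity] - "
    "https://gsydbadm03.local:3872/emd/main/] ERROR gcloader.DataLoader logp.251 - "
    "LOADER ERROR: Loader already procesing this request:  "
    "tracking_key=26179.1738832488000 "
    "emd_url=https://gsydbadm03.local:3872/emd/main/ "
    "loadEntryGuid=78A31DA12406AE511D4587933BE24246 "
    "upload_type=response_severity stream_id=1\n"
    "2025-02-08 10:21:56,396 [GCLoader[response_severity] - "
    "https://gsydbadm03.local:3872/emd/main/] ERROR gcloader.Receiver logp.251 - "
    "Upload failed: emdURL=https://gsydbadm03.local:3872/emd/main/ "
    "trackingKey=26179.1738832488000 type=response_severity "
    "e=ERROR-800|LOADER ERROR: Loader already procesing this request:  "
    "tracking_key=26179.1738832488000 "
    "emd_url=https://gsydbadm03.local:3872/emd/main/ "
    "loadEntryGuid=78A31DA12406AE511D4587933BE24246 "
    "upload_type=response_severity stream_id=1\n"
)


def stuck_streams(log_text):
    """Return the distinct (emd_url, upload_type, stream_id) combinations."""
    found = {
        (url, upload_type, int(stream_id))
        for _key, url, _guid, upload_type, stream_id in LOADER_RE.findall(log_text)
    }
    return sorted(found)
```

For the sample this collapses the repeated messages into a single combination for the node 3 agent, which is a convenient search key for the next step.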

6 Search MOS again

Applies to: Enterprise Manager Base Platform - Version 13.2.0.0.0 and later
Information in this document applies to any platform.

Symptoms
On : 13.2.1.0.0 version, Agent
Agent Status shows running and ready. However the Section under Collection Status : [COLLECTIONS_HALTED(
UPLOAD SYSTEM Threshold - unable to purge files in upload system)]
 
Oracle Enterprise Manager Cloud Control 13c Release 2
Copyright (c) 1996, 2016 Oracle Corporation. All rights reserved.
---------------------------------------------------------------
Agent Version : 13.2.0.0.0
OMS Version : 13.2.0.0.0
Protocol Version : 12.1.0.1.0
Agent Home : <AGENT BASE DIRECTORY>/agent_inst
Agent Log Directory : <AGENT BASE DIRECTORY>/agent_inst/sysman/log
Agent Binaries : <AGENT BASE DIRECTORY>/agent_13.2.0.0.0
Core JAR Location :<AGENT BASE DIRECTORY>/agent_13.2.0.0.0/jlib
Agent Process ID : 383892
Parent Process ID : 383541
Agent URL : https://<AGENT HOSTNAME>.<DOMAINNAME>:3872/emd/main/
Local Agent URL in NAT : https://<AGENT HOSTNAME>.<DOMAINNAME>:3872/emd/main/
Repository URL : https://<OMS HOSTNAME>.<DOMAINNAME>:4900/empbs/upload
Started at : 2018-10-04 09:25:05
Started by user : oracle
Operating System : Linux version 4.1.12-94.8.4.el6uek.x86_64 (amd64)
Number of Targets : 37
Last Reload : (none)
Last successful upload : 2018-10-08 08:00:13
Last attempted upload : 2018-10-09 08:25:54
Total Megabytes of XML files uploaded so far : 0.06
Number of XML files pending upload : 4,905
Size of XML files pending upload(MB) : 4.8
Available disk space on upload filesystem : 38.94%
Collection Status : [COLLECTIONS_HALTED(
UPLOAD SYSTEM Threshold - unable to purge files in upload system)]
Backoff Expiration : 2018-10-09 08:26:17
Heartbeat Status : Ok
Last attempted heartbeat to OMS : 2018-10-09 08:25:12
Last successful heartbeat to OMS : 2018-10-09 08:25:12
Next scheduled heartbeat to OMS : 2018-10-09 08:26:12
---------------------------------------------------------------
Agent is Running and Ready
 
The file .../gc_inst/em/EMGC_OMS1/sysman/log/emoms_pbs.trc shows the following error regarding the Loader System:
2018-10-09 12:52:40,620 [GCLoader[severity] - https://<HOSTNAME>.<DOMAINNAME>:3872/emd/main/] ERROR gcloader.DataLoader logp.251 - LOADER ERROR: Loader already procesing this request: tracking_key=14693.1538138172000 emd_url=https://<HOSTNAME>.<DOMAINNAME>:3872/emd/main/ loadEntryGuid=6E19BEA94FCEC5091969986F77065F7F upload_type=severity stream_id=2
2018-10-09 12:52:40,621 [GCLoader[severity] - https://<HOSTNAME>.<DOMAINNAME>:3872/emd/main/] ERROR gcloader.Receiver logp.251 - Upload failed: emdURL=https://<HOSTNAME>.<DOMAINNAME>:3872/emd/main/ trackingKey=14693.1538138172000 type=severity e=ERROR-800|LOADER ERROR: Loader already processing this request: tracking_key=14693.1538138172000 emd_url=https://<HOSTNAME>.<DOMAINNAME>:3872/emd/main/ loadEntryGuid=6E19BEA94FCEC5091969986F77065F7F upload_type=severity stream_id=2
ERROR-800|LOADER ERROR: Loader already processing this request: tracking_key=14693.1538138172000 emd_url=https://<HOSTNAME>.<DOMAINNAME>:3872/emd/main/ loadEntryGuid=6E19BEA94FCEC5091969986F77065F7F upload_type=severity stream_id=2
at oracle.sysman.core.pbs.gcloader.DataLoader.startUpload(DataLoader.java:2257)
at oracle.sysman.core.pbs.gcloader.RequestMapper.processAll(RequestMapper.java:160)
at oracle.sysman.core.pbs.gcloader.Receiver.processFile(Receiver.java:2835)
at oracle.sysman.core.pbs.gcloader.Receiver.doPost(Receiver.java:2329)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:751)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:844)
 
 
Cause
This issue happens when the row found in the MGMT_LOAD_ENTRIES table for the emd_url + upload_type + stream_id combination is currently locked and being processed by another loader thread.
When that condition is detected, an error is logged in emoms_pbs.trc: ERROR-800|LOADER ERROR: Loader already processing this request
The most probable reason is that the previous process that uploaded the data got stuck without releasing the lock. This can be a locking problem in the repository database, as documented in the two bugs below:
Bug 26522375 Agent Upload timed out before completion
Bug 23509601 EM Agent upload failing due to backoff event
 
Solution
1. Stop the Oracle Management Server (OMS).
cd <OMS_HOME>/bin
./emctl stop oms -all -force
2. After the OMS is stopped, verify that no processes are left running or hanging:
ps -ef | grep EMGC_ADMINSERVER
ps -ef | grep EMGC_OMS1
ps -ef | grep java
ps -ef | grep opmn
3. Kill any leftover OMS Java processes.
$ kill -9 <PID from the results above>
4. Stop and start the repository database.
- Connect with sqlplus as <SYS USER>/<SYS PASSWORD> as sysdba and shut the database down
SQL> shutdown
- Stop the listener
lsnrctl stop
- Start the listener again
lsnrctl start
- Start the repository database: connect with sqlplus as <SYS USER>/<SYS PASSWORD> as sysdba
SQL> startup
5. Bounce the job subsystem.
- Log in to the repository database as SYS and check the value of the job_queue_processes parameter
SQL> show parameter job_queue_processes    <-- note this value down
SQL> alter system set job_queue_processes=0 scope=BOTH;
- Connect to the repository database as the <SYSMAN USER> user and run the following
SQL> connect SYSMAN/<SYSMAN PASSWORD>
SQL> exec emd_maintenance.remove_em_dbms_jobs;
SQL> commit;
- Reconnect to the repository database as the user with SYSDBA permission (<SYS USER>) and reset job_queue_processes to the original value noted earlier
SQL> alter system set job_queue_processes=<original value> scope=BOTH;
For example:
SQL> alter system set job_queue_processes=1000 scope=BOTH;
- Connect to the repository database as the <SYSMAN USER> and re-submit the DBMS_SCHEDULER jobs
SQL> exec emd_maintenance.submit_em_dbms_jobs;
SQL> commit;
6. Start the OMS and re-check the repository jobs on both nodes.
cd <OMS_HOME>/bin
./emctl start oms
./emctl status oms -details
Wait for the OMS to start.
7. On each affected agent:
emctl stop agent
emctl clearstate agent
emctl start agent
emctl upload agent
The agent may have many files to upload, so it may take several upload runs before all of them are sent.
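
Because the agent-side commands may need a few upload runs, they lend themselves to a small wrapper. What follows is a hedged Python sketch, not part of the MOS note: `recover_agent`, `run_emctl`, and `pending_files` are made-up names, `emctl` is assumed to be on the PATH of the agent owner, and the loop treats a pending count of 0 in `emctl status agent -details` as "done", which is an assumption of this sketch.

```python
import re
import subprocess
import time


def run_emctl(args, emctl="emctl"):
    """Run emctl with the given arguments and return its standard output."""
    result = subprocess.run([emctl, *args], capture_output=True, text=True)
    return result.stdout


def pending_files(status_text):
    """Read 'Number of XML files pending upload' from the status output."""
    match = re.search(
        r"Number of XML files pending upload\s*:\s*([\d,]+)", status_text
    )
    return int(match.group(1).replace(",", "")) if match else None


def recover_agent(run=run_emctl, max_uploads=10, pause_seconds=60):
    """Stop, clear state, start, then upload until the backlog is empty.

    Returns the number of upload runs needed, or None if the backlog
    was still not empty after max_uploads runs.
    """
    for verb in ("stop", "clearstate", "start"):
        run([verb, "agent"])
    for attempt in range(1, max_uploads + 1):
        run(["upload", "agent"])
        if pending_files(run(["status", "agent", "-details"])) == 0:
            return attempt
        if attempt < max_uploads:
            time.sleep(pause_seconds)
    return None
```

The `run` parameter exists so the sequence can be dry-run with a stand-in function before it is pointed at a real agent. On a real system the pause between runs would likely need tuning to the size of the backlog.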

7 Attempted fix

I worked through all of the steps in the Solution section of the MOS note quoted in section 6, from stopping the OMS to clearing the agent state and re-uploading.

Problem solved.
Pay particular attention to these two bugs:

Bug 26522375 Agent Upload timed out before completion
Bug 23509601 EM Agent upload failing due to backoff event