暂无图片
暂无图片
9
暂无图片
暂无图片
4
暂无图片

记一次Oracle数据库SQL执行超时产生ORA-609报错导致进程被abort问题分析及处理

原创 尚雷 2022-11-29
5453

一、现象描述

研发中心开发部一同事反馈Oracle一开发测试库无法访问,业务无法测试,让我协助帮忙处理下。

尝试登陆该服务器失败,但可以ping通该服务器,于是让系统组同事帮忙通过控制台登陆该服务器看下什么问题,被告知无法进入服务器,报内核错误,于是强制重启。

服务器重启一段时间后,可以正常登陆服务器,数据库已被自动拉起。

告知研发同事进行测试,被告知应用连接超时。

观察数据库日志,发现大量如下报错:
image.png
开发测试人员反馈,因为大量超时,应用无法正常进行测试。

二、问题分析

由于ORA-609,导致ospid(xxxx)进程被aborting了,同时由于TNS-12537报错,最终导致应用连接被关闭。
ORA-609错误原因是因为无法与进入的连接进行联系,所以无法将此连接转入监听器,所以数据库的process中止此进程,因此服务器进程被终止关闭。

根据报错信息,在网上及MOS上搜索了下相关信息,并根据相关信息,在Oracle用户$ORACLE_HOME/network/admin下创建sqlnet.ora,并配置

sqlnet.expire_time=10
SQLNET.INBOUND_CONNECT_TIMEOUT=180
DIAG_ADR_ENABLED = OFF
复制

让研发同事连接测试,结果应用连接后,很快超时,数据库负载压力突增,就连编辑文件都会被夯住。

根据报错,又反复调整了几个参数值,应用测试还是超时。

另外观察到ASM日志里保留大量如下错误:
image.png
根据资料显示 WARNING: Waited 139 secs for write IO to PST disk 0 in group 1 这种报错大意为PST通信链路在访问磁盘组中的磁盘时候等待了XXX秒,而且触发了持续的等待。超时等待会在频率触发的基础上递增每次的等待时间。

出现这种状况的原因一般与操作系统网络通信链路,数据库主机磁盘或者超时参数的设置有关。

并根据建议将_asm_hbeatiowait值进行了调整:alter system set “_asm_hbeatiowait”=120 scope=spfile;

并重启了ASM实例,重启完后,依然应用报连接超时。

此时观察到系统日志/var/log/messages里有报如下信息:
image.png
根据 “echo 0 > /proc/sys/kernel/hung_task_timeout_secs” disables this message 查询了相关资料,对此的解释是:

默认情况下, Linux会最多使用40%的可用内存作为文件系统缓存。当超过这个阈值后,文件系统会把将缓存中的内存全部写入磁盘, 导致后续的IO请求都是同步的。

将缓存写入磁盘时,有一个默认120秒的超时时间。 出现上面的问题的原因是IO子系统的处理速度不够快,不能在120秒将缓存中的数据全部写入磁盘。IO系统响应缓慢,导致越来越多的请求堆积,最终系统内存全部被占用,导致系统失去响应。

对此的建议修改是:

sysctl -w vm.dirty_ratio=10
sysctl -w vm.dirty_background_ratio=5
sysctl -p
复制

如果系统永久生效,修改/etc/sysctl.conf文件。加入如下两行

vm.dirty_background_ratio = 5
vm.dirty_ratio = 10
复制

重启了数据库和操作系统。

此时也到了周末,期待着经过周末,该问题能否自动修复。

但很遗憾,周一早上观察,依然报连接超时。

研发的同事反馈,部分SQL可以正常查询,但几个关键SQL查询很慢,超过了应用设置的超时时间,其中如下这个SQL要求在3毫米内查出。

-- SQL已做了脱敏处理
select * from (
select q.*,rownum rnum from (
select *
  from (select *
          from (select inquiry_id id,
                       d.*,
                       case
                         when quo_num > 0 then
                          '1'
                         when gi_flag = '1' then
                          '4'
                         when left_day <= 0 then
                          '3'
                         else
                          '3'
                       end quotation_con,
                       row_number() over(partition by inquiry_id order by so_flag asc,quo_id desc) rn
                  from (select m.com_name comName,
                               m.com_id,
                               i.info_title,
                               i.inquiry_id,
                               (select count(ip.inquiry_prod_id) from inq_pr_in ip where ip.relation_id = i.inquiry_id and ip.relation_type = '1' and ip.delete_flag = '1') inq_prod,
                               (select ip.pu_un_text from inq_pr_in ip where ip.relation_id = i.inquiry_id and ip.relation_type = '1' and ip.delete_flag = '1' and rownum=1) in_prod_un,
                               (select sum(ip.pu_qua) from inq_pr_in ip where ip.relation_id = i.inquiry_id and ip.relation_type = '1' and ip.delete_flag = '1') in_prod_qu,
                               i.member_id,
                               (select com_id from cor_co_cn_ma_chk where xxx_member_id = i.member_id and rownum<=1) mic_com_id,
                               (select mm.MEMBER_TYPE
                                  from xxx.CO_ME_INFO mm
                                 where mm.MEMBER_ID = i.member_id) member_type,
                               (select l.log_user from xxx.core_logon_info l where l.member_id=i.member_id and l.user_role='0' and rownum<=1) log_user,
                               i.delivery_province,
                               i.delivery_city,
                               i.eff_ti,
                               i.added_time added_time_inquiry,
                               trunc(i.eff_ti) - trunc(sysdate) + 1 left_day,
                               f_get_big_buyer_flag(i.member_id) big_flag,
                               f_get_com_cert_st(i.member_id) cert_st,
                               q.invite_read_flag read_flag,
                               q.gi_flag,
                               '1' so_flag,
                               q.quo_id quo_id,
                               (select count(1)
                                  from quo_info q
                                 where q.relation_id = i.inquiry_id
                                   and q.relation_type = '1'
                                   and q.quotation_flag = '1'
                                   and q.member_id = xxxx) quo_num,
                               (select max(q.added_time)
                                  from quo_info q
                                 where q.relation_id = i.inquiry_id
                                   and q.relation_type = '1'
                                   and q.quotation_flag = '1') added_time_order,
                               '0' selection
                          from inquiry_info       i,
                               quo_info     q,
                               cor_com_ma m
                         where i.bu_flag = '1'
                           and i.sensetive_flag = '0'
                           and i.CON_ST in ('0', '3')
                           and m.member_id = i.member_id
                           and m.com_sta in ('1', '3')
                           and q.relation_type = '1'
                           and i.inquiry_id = q.relation_id
                           and q.member_id = xxxx
                           and q.invite_email is not null
                        union all
                        select m.com_name comName,
                               m.com_id,
                               i.info_title,
                               i.inquiry_id,
                               (select count(ip.inquiry_prod_id) from inq_pr_in ip where ip.relation_id = i.inquiry_id and ip.relation_type = '1' and ip.delete_flag = '1') inq_prod,
                               (select ip.pu_un_text from inq_pr_in ip where ip.relation_id = i.inquiry_id and ip.relation_type = '1' and ip.delete_flag = '1' and rownum=1) in_prod_un,
                               (select sum(ip.pu_qua) from inq_pr_in ip where ip.relation_id = i.inquiry_id and ip.relation_type = '1' and ip.delete_flag = '1') in_prod_qu,
                               i.member_id,
                               (select com_id from cor_co_cn_ma_chk where xxx_member_id = i.member_id and rownum<=1) mic_com_id,
                               (select mm.MEMBER_TYPE
                                  from xxx.CO_ME_INFO mm
                                 where mm.MEMBER_ID = i.member_id) member_type,
                               (select l.log_user from xxx.core_logon_info l where l.member_id=i.member_id and l.user_role='0' and rownum<=1) log_user,
                               i.delivery_province,
                               i.delivery_city,
                               i.eff_ti,
                               i.added_time added_time_inquiry,
                               trunc(i.eff_ti) - trunc(sysdate) + 1 left_day,
                               f_get_big_buyer_flag(i.member_id) big_flag,
                               f_get_com_cert_st(i.member_id) cert_st,
                               q.invite_read_flag read_flag,
                               q.gi_flag,
                               '1' so_flag,
                               q.quo_id quo_id,
                               (select count(1)
                                  from quo_info q
                                 where q.relation_id = i.inquiry_id
                                   and q.relation_type = '1'
                                   and q.quotation_flag = '1'
                                   and q.member_id = xxxx) quo_num,
                               (select max(q.added_time)
                                  from quo_info q
                                 where q.relation_id = i.inquiry_id
                                   and q.relation_type = '1'
                                   and q.quotation_flag = '1') added_time_order,
                               '0' selection
                          from inquiry_info           i,
                               quo_info         q,
                               core_pr_se p,
                               cor_com_ma     m
                         where i.bu_flag = '1'
                           and i.sensetive_flag = '0'
                           and i.CON_ST in ('0', '3')
                           and m.member_id = i.member_id
                           and m.com_sta in ('1', '3')
                           and q.relation_type = '1'
                           and i.inquiry_id = q.relation_id
                           and i.mic_prod_id = p.prod_id
                           and q.member_id = xxxx
                           and q.mic_com_id is not null
                        union all
                        select i.comName,
                               i.com_id,
                               i.info_title,
                               i.inquiry_id,
                               (select count(ip.inquiry_prod_id) from inq_pr_in ip where ip.relation_id = i.inquiry_id and ip.relation_type = '1' and ip.delete_flag = '1') inq_prod,
                               (select ip.pu_un_text from inq_pr_in ip where ip.relation_id = i.inquiry_id and ip.relation_type = '1' and ip.delete_flag = '1' and rownum=1) in_prod_un,
                               (select sum(ip.pu_qua) from inq_pr_in ip where ip.relation_id = i.inquiry_id and ip.relation_type = '1' and ip.delete_flag = '1') in_prod_qu,
                               i.member_id,
                               (select com_id from cor_co_cn_ma_chk where xxx_member_id = i.member_id and rownum<=1) mic_com_id,
                               (select mm.MEMBER_TYPE
                                  from xxx.CO_ME_INFO mm
                                 where mm.MEMBER_ID = i.member_id) member_type,
                               (select l.log_user from xxx.core_logon_info l where l.member_id=i.member_id and l.user_role='0' and rownum<=1) log_user,
                               i.delivery_province,
                               i.delivery_city,
                               i.eff_ti,
                               i.added_time added_time_inquiry,
                               trunc(i.eff_ti) - trunc(sysdate) + 1 left_day,
                               f_get_big_buyer_flag(i.member_id) big_flag,
                               f_get_com_cert_st(i.member_id) cert_st,
                               i.readFlag read_flag,
                               i.gi_flag,
                               '2' so_flag,
                               i.quo_id quo_id,
                               (select count(1)
                                  from quo_info q
                                 where q.relation_id = i.inquiry_id
                                   and q.relation_type = '1'
                                   and q.quotation_flag = '1'
                                   and q.member_id = xxxx) quo_num,
                               nvl((select max(q.added_time)
                                     from quo_info q
                                    where q.relation_id = i.inquiry_id
                                      and q.relation_type = '1'
                                      and q.quotation_flag = '1'),
                                   i.s_added_time) added_time_order,
                               i.selection
                          from (select row_number() over(partition by s.ma_rel_id order by s.added_time asc) rn,
                                       s.gi_flag,
                                       s.added_time s_added_time,
                                       s.read_flag_supp readFlag,
                                       s.match_id quo_id,
                                       m.com_id,
                                       m.com_name comName,
                                       s.selection,
                                       q.*
                                  from xxx.sy_ma_info         s,
                                       xxx.inquiry_info              q,
                                       xxx.cor_com_ma        m,
                                       xxx.cor_co_cn_ma_chk k
                                 where s.ma_rel_id = q.inquiry_id
                                   and s.mic_com_id = k.com_id
                                   and s.relation_type in ('3', '4', '5')
                                   and s.delete_flag = '1'
                                   and q.bu_flag = '1'
                                   and q.sensetive_flag = '0'
                                   and q.CON_ST in ('0', '3')
                                   and m.member_id = s.member_id
                                   and m.com_sta in ('1', '3')
                                   and k.xxx_member_id = xxxx) i
                        union all
                        select i.comName,
                               i.com_id,
                               i.info_title,
                               i.inquiry_id,
                 (select count(ip.inquiry_prod_id) from inq_pr_in ip where ip.relation_id = i.inquiry_id and ip.relation_type = '1' and ip.delete_flag = '1') inq_prod,
                 (select ip.pu_un_text from inq_pr_in ip where ip.relation_id = i.inquiry_id and ip.relation_type = '1' and ip.delete_flag = '1' and rownum=1) in_prod_un,
                 (select sum(ip.pu_qua) from inq_pr_in ip where ip.relation_id = i.inquiry_id and ip.relation_type = '1' and ip.delete_flag = '1') in_prod_qu,
                               i.member_id,
                               (select com_id from cor_co_cn_ma_chk where xxx_member_id = i.member_id and rownum<=1) mic_com_id,
                               (select mm.MEMBER_TYPE
                                  from xxx.CO_ME_INFO mm
                                 where mm.MEMBER_ID = i.member_id) member_type,
                               (select l.log_user from xxx.core_logon_info l where l.member_id=i.member_id and l.user_role='0' and rownum<=1) log_user,
                               i.delivery_province,
                               i.delivery_city,
                               i.eff_ti,
                               i.added_time added_time_inquiry,
                               trunc(i.eff_ti) - trunc(sysdate) + 1 left_day,
                               f_get_big_buyer_flag(i.member_id) big_flag,
                               f_get_com_cert_st(i.member_id) cert_st,
                               i.readFlag read_flag,
                               i.gi_flag,
                               '3' so_flag,
                               i.quo_id quo_id,
                               (select count(1)
                                  from quo_info q
                                 where q.relation_id = i.inquiry_id
                                   and q.relation_type = '1'
                                   and q.quotation_flag = '1'
                                   and q.member_id = xxxx) quo_num,
                               nvl((select max(q.added_time)
                                     from quo_info q
                                    where q.relation_id = i.inquiry_id
                                      and q.relation_type = '1'
                                      and q.quotation_flag = '1'),
                                   i.s_added_time) added_time_order,
                               '0' selection
                          from (select row_number() over(partition by s.ma_rel_id order by s.added_time asc) rn,
                                       s.gi_flag,
                                       s.added_time s_added_time,
                                       s.read_flag readFlag,
                                       s.match_id quo_id,
                                       m.com_id,
                                       m.com_name comName,
                                       q.*
                                  from sys_mat_cn s,
                                       inquiry_info       q,
                                       cor_com_ma m
                                 where  s.supp_member_id = xxxx and s.ma_rel_id = q.inquiry_id
                                   and s.ma_meth = '2'
                                   and q.bu_flag = '1'
                                   and q.sensetive_flag = '0'
                                   and q.CON_ST in ('0', '3')
                                   and m.member_id = q.member_id
                                   and m.com_sta in ('1', '3')
                                   AND ((   ((TRUNC(s.ADDED_TIME) >= TRUNC(SYSDATE - 2)) OR q.inquiry_id = 2000522250 )
                                       AND NOT EXISTS
                                        (SELECT *
                                            FROM quo_info k
                                           WHERE k.relation_id = q.inquiry_id
                                             AND k.relation_type = '1'
                                             AND k.quotation_flag = '1'
                                             AND k.member_id = xxxx)) OR
                                       s.gi_flag = '1' OR EXISTS
                                        (SELECT *
                                           FROM quo_info k
                                          WHERE k.relation_id = q.inquiry_id
                                            AND k.relation_type = '1'
                                            AND k.quotation_flag = '1'
                                            AND k.member_id = xxxx))
                                   ) i
                          union all
                        select i.comName,
                               i.com_id,
                               i.info_title,
                               i.inquiry_id,
                 (select count(ip.inquiry_prod_id) from inq_pr_in ip where ip.relation_id = i.inquiry_id and ip.relation_type = '1' and ip.delete_flag = '1') inq_prod,
                 (select ip.pu_un_text from inq_pr_in ip where ip.relation_id = i.inquiry_id and ip.relation_type = '1' and ip.delete_flag = '1' and rownum=1) in_prod_un,
                 (select sum(ip.pu_qua) from inq_pr_in ip where ip.relation_id = i.inquiry_id and ip.relation_type = '1' and ip.delete_flag = '1') in_prod_qu,
                               i.member_id,
                               (select com_id from cor_co_cn_ma_chk where xxx_member_id = i.member_id and rownum<=1) mic_com_id,
                               (select mm.MEMBER_TYPE
                                  from xxx.CO_ME_INFO mm
                                 where mm.MEMBER_ID = i.member_id) member_type,
                               (select l.log_user from xxx.core_logon_info l where l.member_id=i.member_id and l.user_role='0' and rownum<=1) log_user,
                               i.delivery_province,
                               i.delivery_city,
                               i.eff_ti,
                               i.added_time added_time_inquiry,
                               trunc(i.eff_ti) - trunc(sysdate) + 1 left_day,
                               f_get_big_buyer_flag(i.member_id) big_flag,
                               f_get_com_cert_st(i.member_id) cert_st,
                               i.readFlag read_flag,
                               i.gi_flag,
                               '2' so_flag,
                               i.quo_id quo_id,
                               (select count(1)
                                  from quo_info q
                                 where q.relation_id = i.inquiry_id
                                   and q.relation_type = '1'
                                   and q.quotation_flag = '1'
                                   and q.member_id = xxxx) quo_num,
                               nvl((select max(q.added_time)
                                     from quo_info q
                                    where q.relation_id = i.inquiry_id
                                      and q.relation_type = '1'
                                      and q.quotation_flag = '1'),
                                   i.s_added_time) added_time_order,
                               i.selection
                          from (select row_number() over(partition by s.ma_rel_id order by s.added_time asc) rn,
                                       s.gi_flag,
                                       s.added_time s_added_time,
                                       s.read_flag readFlag,
                                       s.match_id quo_id,
                                       m.com_id,
                                       m.com_name comName,
                                       '1' as selection,
                                       q.*
                                  from xxx.sys_mat_cn         s,
                                       xxx.inquiry_info              q,
                                       xxx.cor_com_ma        m,
                                       xxx.cor_co_cn_ma_chk k
                                 where s.ma_rel_id = q.inquiry_id
                                   and s.mic_com_id = k.com_id
                                   and s.ma_meth ='2'
                                   and q.bu_flag = '1'
                                   and q.sensetive_flag = '0'
                                   and q.CON_ST in ('0', '3')
                                   and q.member_id = m.member_id
                                   and m.com_sta in ('1', '3')
                                   and k.xxx_member_id = xxxx) i
                         where rn <= 1) d)
         where rn <= 1) d where d.quo_num=0 and d.gi_flag='0' and d.eff_ti>=trunc(sysdate) order by eff_ti DESC ) q ) p
 where rnum > 1
   and rnum <= 10
复制

尝试执行了下该SQL,发现当查到有数据的时候,会非常慢,超过一分钟,当查询无数据的时候会很快,也就一秒左右。
备注:
在处理过程中,也遇到了监听报WARNING: Subscription for node down event still pending告警,如下所示:
image.png

在10g以后,涉及到Oracle监听器的消息默认会订阅到ONS(Oracle Notification Service),在非RAC环境,建议禁止订阅这些消息。

并根据MOS资料,在GRID用户 $ORACLE_HOME/network/admin下修改 listener.ora

添加 SUBSCRIBE_FOR_NODE_DOWN_EVENT_LISTENER=OFF

并对监听执行 lsnrctl reload。

根据一段时间观察,监听日志未再报 WARNING: Subscription for node down event still pending告警。

三、解决方法

该数据库服务器为64G,SGA和PGA分别设置了30G和10G。

对比了一个线上11G和此环境配置差不多的生产库,调整了下SGA和PGA以及sysctl.conf、limits.conf的参数,重启数据库和服务器依然如此。

并查询了SQL的缓存命中率。

于是将应用程序运行了一段时间,抓取问题时段的ASH报告,如下:
image.png
发现主要是direct path read等待事件。direct path read是 11G推出的一个新特性,

Direct path read的目的是让一些不常使用的大表数据(冷数据)在全表扫描时,每次都从磁盘读到用户的私有内存(PGA),而不要去挤占有限的、宝贵的、频繁使用的数据(热数据)所在的共享内存(SGA-buffer cache)。

热数据只在第一次访问时从磁盘读,读到SGA的buffer cache后,再次访问会直接从内存读,效率高、对存储压力小。

从共享内存读到直接路径读,这个变化在不频繁的全表扫描时是起到积极作用的,但某些SQL由于业务设计问题,导致SQL执行计划不合适,就会对数据库产生致命影响。

于是就尝试将 direct path read这一特性进行强制关闭。

ALTER SYSTEM SET _serial_direct_read='NEVER' SCOPE=BOTH;
复制

并通过如下语句查询执行结果:

SELECT x.ksppinm NAME, y.ksppstvl VALUE, x.ksppdesc describ
FROM SYS.x$ksppi x, SYS.x$ksppcv y
WHERE x.indx = y.indx
AND (x.ksppinm = '_small_table_threshold'
OR x.ksppinm = '_serial_direct_read');
复制

我将之前执行的SQL重新又执行了一次,耗时从原来的一分多钟一下子降到了一秒多,于是又重新找了几个变量带入进去,查询时间基本在3秒以下。

通知研发人员复测下,反馈时间确实块了很多,业务测试通过了。

四、问题反馈

对于该环境,通过和研发的沟通,目前存在硬解析较高,因为研发未对SQL添加绑定变量,导致硬解析高。另外禁用了direct path read是否会对其它SQL产生影响,还在观察中。

最后修改时间:2022-11-29 10:57:43
「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论

zhangyfr
暂无图片
2年前
评论
暂无图片 0
😂
2年前
暂无图片 点赞
评论
Raptor
暂无图片
2年前
评论
暂无图片 0
完全错误的做法,饮鸩止渴
2年前
暂无图片 点赞
评论
布衣
暂无图片 暂无图片
2年前
评论
暂无图片 0
很赞,学到了
2年前
暂无图片 点赞
评论
猫瞳映月
暂无图片
2年前
评论
暂无图片 0
2年前
暂无图片 点赞
评论