oracle19C数据库一条正则函数语句的优化

原创 11 0 2023-09-01

163

环境描述： linux 7.6 + oracle 19C

前几日，软件做了新功能的发布后，数据库的awr报告里出现一条很慢的语句（表名、列名做了替换）：

select count(t1.bxbm) from ( select yy as bxbm from (with t as (select id x, bxbm y from uf_testxxx ) select x, regexp_substr(y, :"SYS_B_00", :"SYS_B_01", level) yy from t where instr(y, :"SYS_B_02") >= :"SYS_B_03" connect by level <= regexp_count(y, :"SYS_B_04") + :"SYS_B_05" and y=prior y and prior dbms_random.value is not null ) group by yy )t1, ( SELECT REGEXP_SUBSTR(:"SYS_B_06", :"SYS_B_07", :"SYS_B_08", LEVEL, :"SYS_B_09") AS bxbm FROM DUAL CONNECT BY LEVEL <= LENGTH(:"SYS_B_10") - LENGTH(REGEXP_REPLACE(:"SYS_B_11", :"SYS_B_12", :"SYS_B_13"))+:"SYS_B_14" )t2 where t1.bxbm = t2.bxbm;

经过统计，表uf_testxxx的记录1000多。

该语句每次执行2000多秒。

经过测试，该语句在Oracle11G上运行正常，不到1秒.

可以确定是19c的bug.

优化的思路是使用表连接替换递归。

select count(t1.bxbm)

from (
select yy as bxbm
from (with t as (select id x, bhbm y
from uf_testxxx where bxbm is not null)
select x, regexp_substr(y, :"SYS_B_00", :"SYS_B_01", le) yy
from t，( select level le from dual connect by level <= 200) b
where b.le <= length(y) - length(replace(y,',','')) +1 and regexp_substr(y, '[^,]+', 1, b.le) is not null
)
group by yy
)t1, (
SELECT REGEXP_SUBSTR(:"SYS_B_06", :"SYS_B_07", :"SYS_B_08", LEVEL, :"SYS_B_09") AS bxbm
FROM DUAL
CONNECT BY LEVEL <=
LENGTH(:"SYS_B_10") - LENGTH(REGEXP_REPLACE(:"SYS_B_11", :"SYS_B_12", :"SYS_B_13"))+:"SYS_B_14"
)t2 where t1.bxbm = t2.bxbm;

其中的200，是根据业务情况下确定的。经过分析，业务数据最大分隔是145个，取200完全可以满足业务的需求。

优化后的效果是每次执行0.7秒.

oracle 19c