OCEANBASE 处理客户端请求(SQL)为线程池模式,在出现性能问题时可以一定程度上避免OS LOAD过高,但也会导致慢SQL(大查询)占用worker线程导致正常业务SQL小查询不能被处理响应变慢、无法响应而影响业务。
OB相关与线程池及相关机制
线程分类
OBServer 会启动很多不同功能的线程,细化的分类请参考ObServer线程的相关章节,本节按照最粗略的标准可以分为以下两类:
-
一类是处理 SQL 和事务提交的线程,统称为 Worker 线程。
-
其余的是处理网络 IO、磁盘 IO、Compaction 以及定时任务的线程。
Worker 线程是分租户的,非 Worker 线程是所有租户共享的。本节的租户 CPU 隔离都是针对 Worker线程的。
基于线程数的 CPU 隔离
Unit 的 CPU 隔离是通过一个 Unit 的活跃 Worker 线程数实现的。
由于 SQL 执行过程中可能会有 IO 等待、锁等待等,所以一个线程无法用满一个物理 CPU,故在缺省配置下,OBServer 会给每个 CPU 启动 4 个线程,4 这个倍数可以通过配置 cpu_quota_concurrency 来控制。这就意味着如果一个 Unit 的 MAX_CPU 是10,那么它能同时运行的活跃线程是 40,最大物理 CPU 的占用是 400%, 也就是 10 个 CPU 被超卖成了 40 个 CPU。
大查询处理
我们认为相比于大查询,让短查询尽快返回对用户更有意义,即大查询的查询优先级更低,当大查询和短查询同时争抢 CPU 时,系统会限制大查询的 CPU 使用。
当一个线程执行的 SQL 查询耗时太长,这条查询就会被判定为大查询, 一旦判定为大查询,执行大查询的 Worker 会等在一个 Pthread Condition 上,这样就为其它的 Worker 线程让出了 CPU。
具体实现上,OBServer 在代码中插入了很多检查点,Worker 线程在运行过程中会通过检查点定期检查自己的状态,如果判断应该挂起,那么线程就会等待在一个 Pthread Conditon 上,等到合适的时机再被唤醒。
如果同时有大查询和小查询,大查询最多占用 30% 的 Worker 线程,30% 这个百分比值可以通过配置项 large_query_worker_percentage 来设置。
- 最大线程数 = unit_max_cpu * worker_per_cpu_quota
- 活跃线程数 = unit_min_cpu * cpu_quota_concurrency
- 大查询线程数 = 活跃线程数 * large_query_worker_percentage
有两点需要说明:
-
当没有小查询的时候,大查询可以用到 100% 的 Worker 线程。只有当同时有大查询和小查询时,30% 的比例才生效。
-
一个 Worker 因为执行大查询被挂起时,作为补偿,系统可能会新创建一个 Worker 线程,但是总的 Worker 线程不能超过
MAX_CPU的 10 倍,10 这个倍数可以通过配置项workers_per_cpu_quota来设置。 -
large_query_worker_percentage 当大查询较多,并发较高时,并不能限制住活跃线程数的使用,仍然会导致活跃线程数全部占用,导致无法运行小查询,未能起到隔离大查询的效果。
提前识别大查询
由于 OBServer 挂起一个大查询线程,就会启动一个新的 Worker 线程, 但是如果有大量大查询涌入,OBServer 新创建的线程还是被用来处理大查询,很快达到 Worker 数上限,在这批大查询消耗完之前就没有机会再处理短查询了。
为了优化这个场景,OBServer 会在 SQL 开始执行之前预判它是不是大查询,预判的本质就是估计 SQL 的执行时间。预判主要依据以下假设场景:如果两条 SQL 的执行 Plan 是一样的,可以猜测它们的执行时间也是相似的,这样就可以用 Plan 最近的执行时间来判断 SQL 会不会是大查询。
如果某条 SQL 被预判为大查询,那么该查询就会被放入一个特殊的大查询队列,其 Worker 线程会被释放,系统就会接着执行后面的请求了。
相关参数
-
cpu_quota_concurrency
配置项用于设置集群中每个 CPU 所允许的最大并发数,也可以理解为活跃线程数,限制的最大并发处理客户端请求的线程数,超过会队列阻塞,包括登录
-
workers_per_cpu_quota
最大worker线程数,目前还不确定作用,因为有cpu_quota_concurrency限制,不确定内部rpc调用或者px是否能额外使用
-
large_query_worker_percentage
大查询可用的worker比例
上面3个参数均为集群级参数
针对租户最大活跃线程数、大查询隔离效果、队列测试
调整参数
cpu_quota_concurrency <<<限制的最大并发处理客户端请求的线程数,超过会队列阻塞,包括登录
workers_per_cpu_quota <<<最大worker线程数,目前还不确定作用,因为有cpu_quota_concurrency限制,不确定内部rpc调用或者px是否能额外使用
large_query_worker_percentage <<<大查询可用的worker比例
MySQL [oceanbase]> alter system set cpu_quota_concurrency=2;
Query OK, 0 rows affected (0.070 sec)
MySQL [oceanbase]> alter system set workers_per_cpu_quota=4;
Query OK, 0 rows affected (0.066 sec)
MySQL [oceanbase]>
MySQL [oceanbase]> select distinct name,value from __all_virtual_sys_parameter_stat where name in ('workers_per_cpu_quota','cpu_quota_concurrency','px_workers_per_cpu_quota')
-> ;
+--------------------------+-------+
| name | value |
+--------------------------+-------+
| px_workers_per_cpu_quota | 10 |
| cpu_quota_concurrency | 2 |
| workers_per_cpu_quota | 4 |
+--------------------------+-------+
3 rows in set (0.030 sec)
obclient [SYS]> show parameters like '%large_query_worker_percentage%';
+------+----------+---------------+----------+-------------------------------+-----------+-------+----------------------------------------------------------------------------------------------------+---------+---------+---------+-------------------+
| ZONE | SVR_TYPE | SVR_IP | SVR_PORT | NAME | DATA_TYPE | VALUE | INFO | SECTION | SCOPE | SOURCE | EDIT_LEVEL |
+------+----------+---------------+----------+-------------------------------+-----------+-------+----------------------------------------------------------------------------------------------------+---------+---------+---------+-------------------+
| z2 | observer | 192.168.56.37 | 2882 | large_query_worker_percentage | NULL | 30 | the percentage of the workers reserved to serve large query request. Range: [0, 100] in percentage | TENANT | CLUSTER | DEFAULT | DYNAMIC_EFFECTIVE |
| z1 | observer | 192.168.56.36 | 2882 | large_query_worker_percentage | NULL | 30 | the percentage of the workers reserved to serve large query request. Range: [0, 100] in percentage | TENANT | CLUSTER | DEFAULT | DYNAMIC_EFFECTIVE |
+------+----------+---------------+----------+-------------------------------+-----------+-------+----------------------------------------------------------------------------------------------------+---------+---------+---------+-------------------+
2 rows in set (0.060 sec)
租户max_cpu为3
cpu_quota_concurrency = 2 <<允许最大并发活动线程数为6
workers_per_cpu_quota = 4
large_query_worker_percentage = 30 << large query 限制 30%线程,也就是1-2个
[root@ocp ~]# sh obm queue
+-----------+---------------+--------------+-----------+---------------+----------------+--------------------+----------------------+---------+---------+---------+---------+---------+---------+--------------+
| tenant_id | svr_ip | unit_max_cpu | token_cnt | ass_token_cnt | used_lq_tokens | lq_waiting_workers | req_queue_total_size | queue_0 | queue_1 | queue_2 | queue_3 | queue_4 | queue_5 | large_queued |
+-----------+---------------+--------------+-----------+---------------+----------------+--------------------+----------------------+---------+---------+---------+---------+---------+---------+--------------+
| 1004 | 192.168.56.36 | 12 | 24 | 24 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1006 | 192.168.56.36 | 3 | 6 | 6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1004 | 192.168.56.37 | 12 | 24 | 24 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1006 | 192.168.56.37 | 3 | 6 | 6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1006 | 192.168.56.38 | 3 | 6 | 6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+-----------+---------------+--------------+-----------+---------------+----------------+--------------------+----------------------+---------+---------+---------+---------+---------+---------+--------------+
查询测试租户ID
obclient [oceanbase]> select tenant_id,tenant_name from __all_tenant;
+-----------+-------------+
| tenant_id | tenant_name |
+-----------+-------------+
| 1 | sys |
| 1004 | test |
| 1006 | test2 |<<<
+-----------+-------------+
3 rows in set (0.003 sec)
准备测试角本,执行,把active worker压满
角本连接192.168.56.36这台observer,并发6个客户端,运行慢查询
vi test.sh
for i in {1..6}
do
obclient -h192.168.56.36 -P2881 -usys@test2 -pEnmo_2024 -c<<EOF &
with t as (select rownum n from dual connect by rownum<=100000) select count(*) from t a,t b,t c;
EOF
done;
wait
sh test.sh
---6个正在执行的客户端进程
[root@ocp ~]# ps -ef |grep obclient
root 3739370 3739369 0 15:09 pts/0 00:00:00 obclient -h192.168.56.36 -P2881 -usys@test2 -px xxxxxxxxx -c
root 3739371 3739369 0 15:09 pts/0 00:00:00 obclient -h192.168.56.36 -P2881 -usys@test2 -px xxxxxxxxx -c
root 3739372 3739369 0 15:09 pts/0 00:00:00 obclient -h192.168.56.36 -P2881 -usys@test2 -px xxxxxxxxx -c
root 3739373 3739369 0 15:09 pts/0 00:00:00 obclient -h192.168.56.36 -P2881 -usys@test2 -px xxxxxxxxx -c
root 3739374 3739369 0 15:09 pts/0 00:00:00 obclient -h192.168.56.36 -P2881 -usys@test2 -px xxxxxxxxx -c
root 3739375 3739369 0 15:09 pts/0 00:00:00 obclient -h192.168.56.36 -P2881 -usys@test2 -px xxxxxxxxx -c
[root@ocp ~]#
查询队列
[root@ocp ~]# sh obm queue
+-----------+---------------+--------------+-----------+---------------+----------------+--------------------+----------------------+---------+---------+---------+---------+---------+---------+--------------+
| tenant_id | svr_ip | unit_max_cpu | token_cnt | ass_token_cnt | used_lq_tokens | lq_waiting_workers | req_queue_total_size | queue_0 | queue_1 | queue_2 | queue_3 | queue_4 | queue_5 | large_queued |
+-----------+---------------+--------------+-----------+---------------+----------------+--------------------+----------------------+---------+---------+---------+---------+---------+---------+--------------+
| 1004 | 192.168.56.37 | 12 | 24 | 24 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1006 | 192.168.56.37 | 3 | 6 | 6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1004 | 192.168.56.36 | 12 | 24 | 24 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1006 | 192.168.56.36 | 3 | 6 | 6 | 1 | 0 | 2 | 0 | 0 | 0 | 0 | 2 | 0 | 0 |
| 1006 | 192.168.56.38 | 3 | 6 | 6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+-----------+---------------+--------------+-----------+---------------+----------------+--------------------+----------------------+---------+---------+---------+---------+---------+---------+--------------+
关注192.168.56.36
used_lq_tokens 1
req_queue_total_size 2
查询活动会话
查看活动会话,test2 只有 4个
[root@ocp ~]# sh obm as2
+---------------+------------+-------------+--------+-----------+----------------------------------+------------+-----------+---------+--------------------------------------------------------------+----------------+
| svr_ip | id | user | tenant | db | sql_id | time | retry_cnt | command | info | user_client_ip |
+---------------+------------+-------------+--------+-----------+----------------------------------+------------+-----------+---------+--------------------------------------------------------------+----------------+
| 192.168.56.36 | 3221588599 | SYS | test2 | SYS | B384CE21C242FBA38FB4FF826A7080B6 | 128.443141 | 0 | Query | with t as (select rownum n from dual connect by rownum<= | 192.168.56.43 |
| 192.168.56.36 | 3221617751 | SYS | test2 | SYS | B384CE21C242FBA38FB4FF826A7080B6 | 128.443264 | 0 | Query | with t as (select rownum n from dual connect by rownum<= | 192.168.56.43 |
| 192.168.56.36 | 3221598994 | SYS | test2 | SYS | B384CE21C242FBA38FB4FF826A7080B6 | 128.443415 | 0 | Query | with t as (select rownum n from dual connect by rownum<= | 192.168.56.43 |
| 192.168.56.36 | 3221594000 | SYS | test2 | SYS | B384CE21C242FBA38FB4FF826A7080B6 | 128.443523 | 0 | Query | with t as (select rownum n from dual connect by rownum<= | 192.168.56.43 |
| 192.168.56.37 | 3221849116 | ocp_monitor | sys | oceanbase | 2A6D6C59E6D86C57519267B5EC2CBEC3 | 0.506379 | 0 | Query | select /*+ query_timeout(10000000) */ sid as session_id, eve | 127.0.0.1 |
| 192.168.56.38 | 3222511209 | root | sys | oceanbase | 675CEF94E296A769020C364B4BF14F4A | 0.02959 | 0 | Query | select svr_ip,id,user,tenant,db,ifnull(sql_id,'') sql_id,tim | 192.168.56.43 |
+---------------+------------+-------------+--------+-----------+----------------------------------+------------+-----------+---------+--------------------------------------------------------------+----------------+
这时登录192.168.56.36,被阻塞,登录无响应
登录不进去
[root@ocp ~]# obclient -h192.168.56.36 -P2881 -usys@test2 -pEnmo_2024 -c
无响应
做简单分析
[root@ocp ~]# sh obm queue
+-----------+---------------+--------------+-----------+---------------+----------------+--------------------+----------------------+---------+---------+---------+---------+---------+---------+--------------+
| tenant_id | svr_ip | unit_max_cpu | token_cnt | ass_token_cnt | used_lq_tokens | lq_waiting_workers | req_queue_total_size | queue_0 | queue_1 | queue_2 | queue_3 | queue_4 | queue_5 | large_queued |
+-----------+---------------+--------------+-----------+---------------+----------------+--------------------+----------------------+---------+---------+---------+---------+---------+---------+--------------+
| 1004 | 192.168.56.36 | 12 | 24 | 24 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1006 | 192.168.56.36 | 3 | 6 | 6 | 1 | 0 | 3 | 0 | 0 | 0 | 0 | 3 | 0 | 0 |
| 1006 | 192.168.56.38 | 3 | 6 | 6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1004 | 192.168.56.37 | 12 | 24 | 24 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1006 | 192.168.56.37 | 3 | 6 | 6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+-----------+---------------+--------------+-----------+---------------+----------------+--------------------+----------------------+---------+---------+---------+---------+---------+---------+--------------+
queue0: high prio rpc,事务相关 rpc
queue1: lock retry,锁冲突重试
queue2: normal prio rpc,普通 rpc,inner sql,sql 触发的 rpc
queue3: lock retry SQL req;OB_TASK;OB_GTS_TASK
queue4: normal SQL req;OB_SQL_TASK
queue5: warmup req
应该有6个可以运行的worker,为什么看不到?
[root@ocp ~]# sh obm as2
+---------------+------------+-------------+--------+-----------+----------------------------------+------------+-----------+---------+--------------------------------------------------------------+----------------+
| svr_ip | id | user | tenant | db | sql_id | time | retry_cnt | command | info | user_client_ip |
+---------------+------------+-------------+--------+-----------+----------------------------------+------------+-----------+---------+--------------------------------------------------------------+----------------+
| 192.168.56.36 | 3221588599 | SYS | test2 | SYS | B384CE21C242FBA38FB4FF826A7080B6 | 128.443141 | 0 | Query | with t as (select rownum n from dual connect by rownum<= | 192.168.56.43 |
| 192.168.56.36 | 3221617751 | SYS | test2 | SYS | B384CE21C242FBA38FB4FF826A7080B6 | 128.443264 | 0 | Query | with t as (select rownum n from dual connect by rownum<= | 192.168.56.43 |
| 192.168.56.36 | 3221598994 | SYS | test2 | SYS | B384CE21C242FBA38FB4FF826A7080B6 | 128.443415 | 0 | Query | with t as (select rownum n from dual connect by rownum<= | 192.168.56.43 |
| 192.168.56.36 | 3221594000 | SYS | test2 | SYS | B384CE21C242FBA38FB4FF826A7080B6 | 128.443523 | 0 | Query | with t as (select rownum n from dual connect by rownum<= | 192.168.56.43 |
| 192.168.56.37 | 3221849116 | ocp_monitor | sys | oceanbase | 2A6D6C59E6D86C57519267B5EC2CBEC3 | 0.506379 | 0 | Query | select /*+ query_timeout(10000000) */ sid as session_id, eve | 127.0.0.1 |
| 192.168.56.38 | 3222511209 | root | sys | oceanbase | 675CEF94E296A769020C364B4BF14F4A | 0.02959 | 0 | Query | select svr_ip,id,user,tenant,db,ifnull(sql_id,'') sql_id,tim | 192.168.56.43 |
+---------------+------------+-------------+--------+-----------+----------------------------------+------------+-----------+---------+--------------------------------------------------------------+----------------+
活动会话未达到6, 就不能登录了,是不是有固定2个预留worker,不能用来执行普通请求?
used_lq_tokens(1)+req_queue_total_size(3)=4 < token_cnt(6)
cat observer.log|grep "dump tenant info(tenant={id:1006"|tr ',' '\n'
[2024-10-21 15:15:14.083539] INFO [SERVER.OMT] ob_multi_tenant.cpp:834 [3714414][0][Y0-0000000000000000-0-0] [lt=10] [dc=0] dump tenant info(tenant={id:1006
compat_mode:1
unit_min_cpu:"3.000000000000000000e+00"
unit_max_cpu:"3.000000000000000000e+00"
slice:"0.000000000000000000e+00"
slice_remain:"0.000000000000000000e+00"
token_cnt:6
sug_token_cnt:6
ass_token_cnt:6
lq_wait_timeout_workers:0
lq_tokens:1
used_lq_tokens:1
stopped:false
idle_us:925888
recv_hp_rpc_cnt:372446
recv_np_rpc_cnt:763336
recv_lp_rpc_cnt:0
recv_mysql_ps_close_cnt:0
recv_mysql_cnt:222
recv_task_cnt:64
recv_large_req_cnt:0
tt_large_quries:58110948
pop_normal_cnt:5094841
actives:6
workers:6
nesting workers:7
lq waiting workers:0
req_queue:total_size=3 queue[0]=0 queue[1]=0 queue[2]=0 queue[3]=0 queue[4]=3 queue[5]=0
large queued:0
reserve queued:0
multi_level_queue:total_size=0 queue[0]=0 queue[1]=0 queue[2]=0 queue[3]=0 queue[4]=0 queue[5]=0 queue[6]=0 queue[7]=0
recv_level_rpc_cnt:cnt[0]=0 cnt[1]=0 cnt[2]=0 cnt[3]=0 cnt[4]=0 cnt[5]=45424 cnt[6]=0 cnt[7]=0
group_map:
rpc_stat_info: pcode=0x717:cnt=397 pcode=0x710:cnt=200 pcode=0x51c:cnt=32})
[admin@ob1 log]$
下一步测试计划
调整参数再测试能否看到更多活动会话,是否最后2个worker不能使用?
调大cpu_quota_concurrency 未生效
cpu_quota_concurrency修改后未能立即生效,token_cnt增加,ass_token_cnt未增加
这次测试还到一个信息,大查询的worker比例是基于token_cnt计算的
obclient [oceanbase]> alter system set cpu_quota_concurrency=4;
Query OK, 0 rows affected (0.072 sec)
obclient [oceanbase]> alter system set workers_per_cpu_quota=8;
Query OK, 0 rows affected (0.075 sec)
obclient [oceanbase]> select distinct name,value from __all_virtual_sys_parameter_stat where name in ('workers_per_cpu_quota','cpu_quota_concurrency','px_workers_per_cpu_quota')
-> ;
+--------------------------+-------+
| name | value |
+--------------------------+-------+
| px_workers_per_cpu_quota | 10 |
| cpu_quota_concurrency | 4 |
| workers_per_cpu_quota | 8 |
+--------------------------+-------+
3 rows in set (0.035 sec)
[root@ocp ~]# sh obm queue
+-----------+---------------+--------------+-----------+---------------+----------------+--------------------+----------------------+---------+---------+---------+---------+---------+---------+--------------+
| tenant_id | svr_ip | unit_max_cpu | token_cnt | ass_token_cnt | used_lq_tokens | lq_waiting_workers | req_queue_total_size | queue_0 | queue_1 | queue_2 | queue_3 | queue_4 | queue_5 | large_queued |
+-----------+---------------+--------------+-----------+---------------+----------------+--------------------+----------------------+---------+---------+---------+---------+---------+---------+--------------+
| 1004 | 192.168.56.37 | 12 | 48 | 24 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1006 | 192.168.56.37 | 3 | 12 | 6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1004 | 192.168.56.36 | 12 | 48 | 24 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1006 | 192.168.56.36 | 3 |
| 6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1006 | 192.168.56.38 | 3 | 12 | 6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+-----------+---------------+--------------+-----------+---------------+----------------+--------------------+----------------------+---------+---------+---------+---------+---------+---------+--------------+
cpu_quota_concurrency从2改为4,token_cnt增长到12了,为什么ass_token_cnt没有动?
cpu_quota_concurrency修改后未能立即生效
把并发调到12,重复进行前面的测试
vi test.sh
for i in {1..12}
do
obclient -h192.168.56.36 -P2881 -usys@test2 -pEnmo_2024 -c<<EOF &
with t as (select rownum n from dual connect by rownum<=100000) select count(*) from t a,t b,t c;
EOF
done;
wait
[root@ocp ~]# sh obm as2
+---------------+------------+-------------+--------+-----------+----------------------------------+-----------+-----------+---------+--------------------------------------------------------------+----------------+
| svr_ip | id | user | tenant | db | sql_id | time | retry_cnt | command | info | user_client_ip |
+---------------+------------+-------------+--------+-----------+----------------------------------+-----------+-----------+---------+--------------------------------------------------------------+----------------+
| 192.168.56.36 | 3221551599 | SYS | test2 | SYS | B384CE21C242FBA38FB4FF826A7080B6 | 18.921777 | 0 | Query | with t as (select rownum n from dual connect by rownum<= | 192.168.56.43 |
| 192.168.56.36 | 3221535022 | SYS | test2 | SYS | B384CE21C242FBA38FB4FF826A7080B6 | 18.921817 | 0 | Query | with t as (select rownum n from dual connect by rownum<= | 192.168.56.43 |
| 192.168.56.36 | 3221531880 | SYS | test2 | SYS | B384CE21C242FBA38FB4FF826A7080B6 | 18.921851 | 0 | Query | with t as (select rownum n from dual connect by rownum<= | 192.168.56.43 |
| 192.168.56.36 | 3221543406 | SYS | test2 | SYS | B384CE21C242FBA38FB4FF826A7080B6 | 18.921883 | 0 | Query | with t as (select rownum n from dual connect by rownum<= | 192.168.56.43 |
| 192.168.56.38 | 3222513182 | root | sys | oceanbase | 675CEF94E296A769020C364B4BF14F4A | 0.026282 | 0 | Query | select svr_ip,id,user,tenant,db,ifnull(sql_id,'') sql_id,tim | 192.168.56.43 |
| 192.168.56.38 | 3222513092 | ocp_monitor | sys | oceanbase | 2A6D6C59E6D86C57519267B5EC2CBEC3 | 2.911484 | 0 | Query | select /*+ query_timeout(10000000) */ sid as session_id, eve | 127.0.0.1 |
+---------------+------------+-------------+--------+-----------+----------------------------------+-----------+-----------+---------+--------------------------------------------------------------+----------------+
可使用以下方法获取trace_id
select svr_ip,trace_id from __all_virtual_processlist where sql_id='xxxx'
select svr_ip,trace_id from __all_virtual_processlist where id='xxxx'
[root@ocp ~]#
[root@ocp ~]# sh obm queue
+-----------+---------------+--------------+-----------+---------------+----------------+--------------------+----------------------+---------+---------+---------+---------+---------+---------+--------------+
| tenant_id | svr_ip | unit_max_cpu | token_cnt | ass_token_cnt | used_lq_tokens | lq_waiting_workers | req_queue_total_size | queue_0 | queue_1 | queue_2 | queue_3 | queue_4 | queue_5 | large_queued |
+-----------+---------------+--------------+-----------+---------------+----------------+--------------------+----------------------+---------+---------+---------+---------+---------+---------+--------------+
| 1004 | 192.168.56.37 | 12 | 48 | 24 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1006 | 192.168.56.37 | 3 | 12 | 6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1004 | 192.168.56.36 | 12 | 48 | 24 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1006 | 192.168.56.36 | 3 | 12 | 6 | 3 | 0 | 10 | 0 | 0 | 0 | 0 | 10 | 0 | 0 |
| 1006 | 192.168.56.38 | 3 | 12 | 6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+-----------+---------------+--------------+-----------+---------------+----------------+--------------------+----------------------+---------+---------+---------+---------+---------+---------+--------------+
这时虽然ass_token_cnt 还是6,但used_lq_tokens 增加到了3,req_queue_total_size增加到了10
并发运行worker没有增长,但可用的lq worker增加了,queue增加了
[root@ocp ~]# sh obm as2
+---------------+------------+------+--------+-----------+----------------------------------+------------+-----------+---------+--------------------------------------------------------------+----------------+
| svr_ip | id | user | tenant | db | sql_id | time | retry_cnt | command | info | user_client_ip |
+---------------+------------+------+--------+-----------+----------------------------------+------------+-----------+---------+--------------------------------------------------------------+----------------+
| 192.168.56.36 | 3221551599 | SYS | test2 | SYS | B384CE21C242FBA38FB4FF826A7080B6 | 211.08908 | 0 | Query | with t as (select rownum n from dual connect by rownum<= | 192.168.56.43 |
| 192.168.56.36 | 3221535022 | SYS | test2 | SYS | B384CE21C242FBA38FB4FF826A7080B6 | 211.089114 | 0 | Query | with t as (select rownum n from dual connect by rownum<= | 192.168.56.43 |
| 192.168.56.36 | 3221531880 | SYS | test2 | SYS | B384CE21C242FBA38FB4FF826A7080B6 | 211.089161 | 0 | Query | with t as (select rownum n from dual connect by rownum<= | 192.168.56.43 |
| 192.168.56.36 | 3221543406 | SYS | test2 | SYS | B384CE21C242FBA38FB4FF826A7080B6 | 211.089199 | 0 | Query | with t as (select rownum n from dual connect by rownum<= | 192.168.56.43 |
| 192.168.56.37 | 3221851781 | root | sys | oceanbase | 675CEF94E296A769020C364B4BF14F4A | 0.027516 | 0 | Query | select svr_ip,id,user,tenant,db,ifnull(sql_id,'') sql_id,tim | 192.168.56.43 |
+---------------+------------+------+--------+-----------+----------------------------------+------------+-----------+---------+--------------------------------------------------------------+----------------+
在另一个unit较大的租户再次进行测试,验证会预留2个worker不能使用
# 1004 test 租户测试
[root@ocp ~]# sh obm queue
+-----------+---------------+--------------+-----------+---------------+----------------+--------------------+----------------------+---------+---------+---------+---------+---------+---------+--------------+
| tenant_id | svr_ip | unit_max_cpu | token_cnt | ass_token_cnt | used_lq_tokens | lq_waiting_workers | req_queue_total_size | queue_0 | queue_1 | queue_2 | queue_3 | queue_4 | queue_5 | large_queued |
+-----------+---------------+--------------+-----------+---------------+----------------+--------------------+----------------------+---------+---------+---------+---------+---------+---------+--------------+
| 1004 | 192.168.56.36 | 12 | 48 | 24 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1006 | 192.168.56.36 | 3 | 12 | 6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1006 | 192.168.56.38 | 3 | 12 | 6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1004 | 192.168.56.37 | 12 | 48 | 24 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1006 | 192.168.56.37 | 3 | 12 | 6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+-----------+---------------+--------------+-----------+---------------+----------------+--------------------+----------------------+---------+---------+---------+---------+---------+---------+--------------+
create or replace function test_func(wait_second in number) return number
as
begin
dbms_lock.sleep(wait_second);
return wait_second;
end;
/
for i in {1..24}
do
obclient -h192.168.56.36 -P2881 -usys@test -pEnmo_2024 -c<<EOF &
with t as (select rownum n from dual connect by rownum<=100) select sum(test_func(5)) from t;
EOF
done;
wait
[root@ocp ~]# sh obm queue
+-----------+---------------+--------------+-----------+---------------+----------------+--------------------+----------------------+---------+---------+---------+---------+---------+---------+--------------+
| tenant_id | svr_ip | unit_max_cpu | token_cnt | ass_token_cnt | used_lq_tokens | lq_waiting_workers | req_queue_total_size | queue_0 | queue_1 | queue_2 | queue_3 | queue_4 | queue_5 | large_queued |
+-----------+---------------+--------------+-----------+---------------+----------------+--------------------+----------------------+---------+---------+---------+---------+---------+---------+--------------+
| 1004 | 192.168.56.36 | 12 | 48 | 24 | 14 | 0 | 2 | 0 | 0 | 0 | 0 | 2 | 0 | 0 |
| 1006 | 192.168.56.36 | 3 | 12 | 6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1004 | 192.168.56.37 | 12 | 48 | 24 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1006 | 192.168.56.37 | 3 | 12 | 6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1006 | 192.168.56.38 | 3 | 12 | 6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+-----------+---------------+--------------+-----------+---------------+----------------+--------------------+----------------------+---------+---------+---------+---------+---------+---------+--------------+
used_lq_tokens(14)+req_queue_total_size(2)= 16
通过队列无法正确判断并发运行中的worker
-- 22个活动会话,也是最后2个worker无法被使用 ass_token_cnt(24)-活动会话(22)=2
[root@ocp ~]# sh obm as2
+---------------+------------+------+--------+-----------+----------------------------------+-----------+-----------+---------+--------------------------------------------------------------+----------------+
| svr_ip | id | user | tenant | db | sql_id | time | retry_cnt | command | info | user_client_ip |
+---------------+------------+------+--------+-----------+----------------------------------+-----------+-----------+---------+--------------------------------------------------------------+----------------+
| 192.168.56.36 | 3221584420 | SYS | test | oceanbase | 1B1CB04BA6F21B2561F8367C8BE88F7A | 28.043479 | 0 | Query | with t as (select rownum n from dual connect by rownum<= | 192.168.56.43 |
| 192.168.56.36 | 3221572879 | SYS | test | oceanbase | 1B1CB04BA6F21B2561F8367C8BE88F7A | 28.046967 | 0 | Query | with t as (select rownum n from dual connect by rownum<= | 192.168.56.43 |
| 192.168.56.36 | 3221576580 | SYS | test | oceanbase | 1B1CB04BA6F21B2561F8367C8BE88F7A | 28.047764 | 0 | Query | with t as (select rownum n from dual connect by rownum<= | 192.168.56.43 |
| 192.168.56.36 | 3221557740 | SYS | test | oceanbase | 1B1CB04BA6F21B2561F8367C8BE88F7A | 28.047777 | 0 | Query | with t as (select rownum n from dual connect by rownum<= | 192.168.56.43 |
| 192.168.56.36 | 3221585289 | SYS | test | oceanbase | 1B1CB04BA6F21B2561F8367C8BE88F7A | 28.04987 | 0 | Query | with t as (select rownum n from dual connect by rownum<= | 192.168.56.43 |
| 192.168.56.36 | 3221581799 | SYS | test | oceanbase | 1B1CB04BA6F21B2561F8367C8BE88F7A | 28.05049 | 0 | Query | with t as (select rownum n from dual connect by rownum<= | 192.168.56.43 |
| 192.168.56.36 | 3221579547 | SYS | test | oceanbase | 1B1CB04BA6F21B2561F8367C8BE88F7A | 28.050655 | 0 | Query | with t as (select rownum n from dual connect by rownum<= | 192.168.56.43 |
| 192.168.56.36 | 3221549268 | SYS | test | oceanbase | 1B1CB04BA6F21B2561F8367C8BE88F7A | 28.050743 | 0 | Query | with t as (select rownum n from dual connect by rownum<= | 192.168.56.43 |
| 192.168.56.36 | 3221554868 | SYS | test | oceanbase | 1B1CB04BA6F21B2561F8367C8BE88F7A | 28.050785 | 0 | Query | with t as (select rownum n from dual connect by rownum<= | 192.168.56.43 |
| 192.168.56.36 | 3221562763 | SYS | test | oceanbase | 1B1CB04BA6F21B2561F8367C8BE88F7A | 28.050881 | 0 | Query | with t as (select rownum n from dual connect by rownum<= | 192.168.56.43 |
| 192.168.56.36 | 3221586303 | SYS | test | oceanbase | 1B1CB04BA6F21B2561F8367C8BE88F7A | 28.050897 | 0 | Query | with t as (select rownum n from dual connect by rownum<= | 192.168.56.43 |
| 192.168.56.36 | 3221569728 | SYS | test | oceanbase | 1B1CB04BA6F21B2561F8367C8BE88F7A | 28.050943 | 0 | Query | with t as (select rownum n from dual connect by rownum<= | 192.168.56.43 |
| 192.168.56.36 | 3221552406 | SYS | test | oceanbase | 1B1CB04BA6F21B2561F8367C8BE88F7A | 28.050945 | 0 | Query | with t as (select rownum n from dual connect by rownum<= | 192.168.56.43 |
| 192.168.56.36 | 3221572694 | SYS | test | oceanbase | 1B1CB04BA6F21B2561F8367C8BE88F7A | 28.05097 | 0 | Query | with t as (select rownum n from dual connect by rownum<= | 192.168.56.43 |
| 192.168.56.36 | 3221577831 | SYS | test | oceanbase | 1B1CB04BA6F21B2561F8367C8BE88F7A | 28.051235 | 0 | Query | with t as (select rownum n from dual connect by rownum<= | 192.168.56.43 |
| 192.168.56.36 | 3221708742 | SYS | test | oceanbase | 1B1CB04BA6F21B2561F8367C8BE88F7A | 28.051412 | 0 | Query | with t as (select rownum n from dual connect by rownum<= | 192.168.56.43 |
| 192.168.56.36 | 3221555810 | SYS | test | oceanbase | 1B1CB04BA6F21B2561F8367C8BE88F7A | 28.051475 | 0 | Query | with t as (select rownum n from dual connect by rownum<= | 192.168.56.43 |
| 192.168.56.36 | 3221581413 | SYS | test | oceanbase | 1B1CB04BA6F21B2561F8367C8BE88F7A | 28.051669 | 0 | Query | with t as (select rownum n from dual connect by rownum<= | 192.168.56.43 |
| 192.168.56.36 | 3221579358 | SYS | test | oceanbase | 1B1CB04BA6F21B2561F8367C8BE88F7A | 28.051845 | 0 | Query | with t as (select rownum n from dual connect by rownum<= | 192.168.56.43 |
| 192.168.56.36 | 3221572829 | SYS | test | oceanbase | 1B1CB04BA6F21B2561F8367C8BE88F7A | 28.051887 | 0 | Query | with t as (select rownum n from dual connect by rownum<= | 192.168.56.43 |
| 192.168.56.36 | 3221720029 | SYS | test | oceanbase | 1B1CB04BA6F21B2561F8367C8BE88F7A | 28.052159 | 0 | Query | with t as (select rownum n from dual connect by rownum<= | 192.168.56.43 |
| 192.168.56.36 | 3221561335 | SYS | test | oceanbase | 1B1CB04BA6F21B2561F8367C8BE88F7A | 28.053014 | 0 | Query | with t as (select rownum n from dual connect by rownum<= | 192.168.56.43 |
| 192.168.56.38 | 3222522024 | root | sys | oceanbase | 675CEF94E296A769020C364B4BF14F4A | 0.025835 | 0 | Query | select svr_ip,id,user,tenant,db,ifnull(sql_id,'') sql_id,tim | 192.168.56.43 |
+---------------+------------+------+--------+-----------+----------------------------------+-----------+-----------+---------+--------------------------------------------------------------+----------------+
可使用以下方法获取trace_id
select svr_ip,trace_id from __all_virtual_processlist where sql_id='xxxx'
select svr_ip,trace_id from __all_virtual_processlist where id='xxxx'
[root@ocp ~]#
[root@ocp ~]# obsys
无法登录
select * from __all_virtual_dump_tenant_info
where tenant_id=1004\G
obclient [oceanbase]> select * from __all_virtual_dump_tenant_info
-> where tenant_id=1004\G
*************************** 2. row ***************************
svr_ip: 192.168.56.36
svr_port: 2882
tenant_id: 1004
compat_mode: 1
unit_min_cpu: 12
unit_max_cpu: 12
slice: 0
remain_slice: 0
token_cnt: 48
ass_token_cnt: 24
lq_tokens: 14
used_lq_tokens: 2
stopped: 0
idle_us: 3507120
recv_hp_rpc_cnt: 43055525
recv_np_rpc_cnt: 45501678
recv_lp_rpc_cnt: 0
recv_mysql_cnt: 20066
recv_task_cnt: 310
recv_large_req_cnt: 2
recv_large_queries: 454297
actives: 24
workers: 24
lq_waiting_workers: 0
req_queue_total_size: 0
queue_0: 0
queue_1: 0
queue_2: 0
queue_3: 0
queue_4: 0
queue_5: 0
large_queued: 0
2 rows in set (0.031 sec)
# 前面的测试是当大查询并发高时,无法登录,如果是已经登录上的长连接,执行小查询是否受影响?
前面的测试是当大查询并发高时,无法登录,如果是已经登录上的长连接,执行小查询是否受影响?
提前登录
obclient [SYS]>
执行并发大查询
[root@ocp test]# sh test.sh
看到已经发生队列等待req_queue_total_size
[root@ocp ~]# sh obm queue
+-----------+---------------+--------------+-----------+---------------+----------------+--------------------+----------------------+---------+---------+---------+---------+---------+---------+--------------+
| tenant_id | svr_ip | unit_max_cpu | token_cnt | ass_token_cnt | used_lq_tokens | lq_waiting_workers | req_queue_total_size | queue_0 | queue_1 | queue_2 | queue_3 | queue_4 | queue_5 | large_queued |
+-----------+---------------+--------------+-----------+---------------+----------------+--------------------+----------------------+---------+---------+---------+---------+---------+---------+--------------+
| 1004 | 192.168.56.37 | 12 | 48 | 24 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1006 | 192.168.56.37 | 3 | 12 | 6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1004 | 192.168.56.36 | 12 | 48 | 24 | 14 | 0 | 2 | 0 | 0 | 0 | 0 | 2 | 0 | 0 |
| 1006 | 192.168.56.36 | 3 | 12 | 6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1006 | 192.168.56.38 | 3 | 12 | 6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+-----------+---------------+--------------+-----------+---------------+----------------+--------------------+----------------------+---------+---------+---------+---------+---------+---------+--------------+
在原来已经连上库上窗口,执行小查询,同样被阻塞,无法执行,登录队列2增加到3,增加的这个为刚刚发起的小查询
obclient [SYS]> select count(*) from dba_objects;
[root@ocp ~]# sh obm queue
+-----------+---------------+--------------+-----------+---------------+----------------+--------------------+----------------------+---------+---------+---------+---------+---------+---------+--------------+
| tenant_id | svr_ip | unit_max_cpu | token_cnt | ass_token_cnt | used_lq_tokens | lq_waiting_workers | req_queue_total_size | queue_0 | queue_1 | queue_2 | queue_3 | queue_4 | queue_5 | large_queued |
+-----------+---------------+--------------+-----------+---------------+----------------+--------------------+----------------------+---------+---------+---------+---------+---------+---------+--------------+
| 1006 | 192.168.56.38 | 3 | 12 | 6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1004 | 192.168.56.36 | 12 | 48 | 24 | 14 | 0 | 3 | 0 | 0 | 0 | 0 | 3 | 0 | 0 |
| 1006 | 192.168.56.36 | 3 | 12 | 6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1004 | 192.168.56.37 | 12 | 48 | 24 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1006 | 192.168.56.37 | 3 | 12 | 6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+-----------+---------------+--------------+-----------+---------------+----------------+--------------------+----------------------+---------+---------+---------+---------+---------+---------+--------------+
[root@ocp ~]# sh obm as2
+---------------+------------+-------------+--------+-----------+----------------------------------+------------+-----------+---------+--------------------------------------------------------------+----------------+
| svr_ip | id | user | tenant | db | sql_id | time | retry_cnt | command | info | user_client_ip |
+---------------+------------+-------------+--------+-----------+----------------------------------+------------+-----------+---------+--------------------------------------------------------------+----------------+
| 192.168.56.36 | 3221608445 | SYS | test | oceanbase | FF0D04A1B1C77C3799B47FF5FB3BAB85 | 320.099773 | 0 | Query | with t as (select rownum n from dual connect by rownum<=20 | 192.168.56.43 |
| 192.168.56.36 | 3221604096 | SYS | test | oceanbase | FF0D04A1B1C77C3799B47FF5FB3BAB85 | 320.100249 | 0 | Query | with t as (select rownum n from dual connect by rownum<=20 | 192.168.56.43 |
| 192.168.56.36 | 3221610132 | SYS | test | oceanbase | FF0D04A1B1C77C3799B47FF5FB3BAB85 | 320.101039 | 0 | Query | with t as (select rownum n from dual connect by rownum<=20 | 192.168.56.43 |
| 192.168.56.36 | 3221596916 | SYS | test | oceanbase | FF0D04A1B1C77C3799B47FF5FB3BAB85 | 320.101338 | 0 | Query | with t as (select rownum n from dual connect by rownum<=20 | 192.168.56.43 |
| 192.168.56.36 | 3221607117 | SYS | test | oceanbase | FF0D04A1B1C77C3799B47FF5FB3BAB85 | 320.101355 | 0 | Query | with t as (select rownum n from dual connect by rownum<=20 | 192.168.56.43 |
| 192.168.56.36 | 3221609368 | SYS | test | oceanbase | FF0D04A1B1C77C3799B47FF5FB3BAB85 | 320.101423 | 0 | Query | with t as (select rownum n from dual connect by rownum<=20 | 192.168.56.43 |
| 192.168.56.36 | 3221601517 | SYS | test | oceanbase | FF0D04A1B1C77C3799B47FF5FB3BAB85 | 320.101603 | 0 | Query | with t as (select rownum n from dual connect by rownum<=20 | 192.168.56.43 |
| 192.168.56.36 | 3221591591 | SYS | test | oceanbase | FF0D04A1B1C77C3799B47FF5FB3BAB85 | 320.101705 | 0 | Query | with t as (select rownum n from dual connect by rownum<=20 | 192.168.56.43 |
| 192.168.56.36 | 3221610301 | SYS | test | oceanbase | FF0D04A1B1C77C3799B47FF5FB3BAB85 | 320.101782 | 0 | Query | with t as (select rownum n from dual connect by rownum<=20 | 192.168.56.43 |
| 192.168.56.36 | 3221603451 | SYS | test | oceanbase | FF0D04A1B1C77C3799B47FF5FB3BAB85 | 320.101818 | 0 | Query | with t as (select rownum n from dual connect by rownum<=20 | 192.168.56.43 |
| 192.168.56.36 | 3221612605 | SYS | test | oceanbase | FF0D04A1B1C77C3799B47FF5FB3BAB85 | 320.101933 | 0 | Query | with t as (select rownum n from dual connect by rownum<=20 | 192.168.56.43 |
| 192.168.56.36 | 3221583930 | SYS | test | oceanbase | FF0D04A1B1C77C3799B47FF5FB3BAB85 | 320.101952 | 0 | Query | with t as (select rownum n from dual connect by rownum<=20 | 192.168.56.43 |
| 192.168.56.36 | 3221588358 | SYS | test | oceanbase | FF0D04A1B1C77C3799B47FF5FB3BAB85 | 320.101986 | 0 | Query | with t as (select rownum n from dual connect by rownum<=20 | 192.168.56.43 |
| 192.168.56.36 | 3221605576 | SYS | test | oceanbase | FF0D04A1B1C77C3799B47FF5FB3BAB85 | 320.102175 | 0 | Query | with t as (select rownum n from dual connect by rownum<=20 | 192.168.56.43 |
| 192.168.56.36 | 3221607591 | SYS | test | oceanbase | FF0D04A1B1C77C3799B47FF5FB3BAB85 | 320.102185 | 0 | Query | with t as (select rownum n from dual connect by rownum<=20 | 192.168.56.43 |
| 192.168.56.36 | 3221604791 | SYS | test | oceanbase | FF0D04A1B1C77C3799B47FF5FB3BAB85 | 320.102193 | 0 | Query | with t as (select rownum n from dual connect by rownum<=20 | 192.168.56.43 |
| 192.168.56.36 | 3221603794 | SYS | test | oceanbase | FF0D04A1B1C77C3799B47FF5FB3BAB85 | 320.102234 | 0 | Query | with t as (select rownum n from dual connect by rownum<=20 | 192.168.56.43 |
| 192.168.56.36 | 3221610659 | SYS | test | oceanbase | FF0D04A1B1C77C3799B47FF5FB3BAB85 | 320.102287 | 0 | Query | with t as (select rownum n from dual connect by rownum<=20 | 192.168.56.43 |
| 192.168.56.36 | 3221561649 | SYS | test | oceanbase | FF0D04A1B1C77C3799B47FF5FB3BAB85 | 320.102416 | 0 | Query | with t as (select rownum n from dual connect by rownum<=20 | 192.168.56.43 |
| 192.168.56.36 | 3221557310 | SYS | test | oceanbase | FF0D04A1B1C77C3799B47FF5FB3BAB85 | 320.102453 | 0 | Query | with t as (select rownum n from dual connect by rownum<=20 | 192.168.56.43 |
| 192.168.56.36 | 3221598819 | SYS | test | oceanbase | FF0D04A1B1C77C3799B47FF5FB3BAB85 | 320.102535 | 0 | Query | with t as (select rownum n from dual connect by rownum<=20 | 192.168.56.43 |
| 192.168.56.36 | 3221584417 | SYS | test | oceanbase | FF0D04A1B1C77C3799B47FF5FB3BAB85 | 320.102605 | 0 | Query | with t as (select rownum n from dual connect by rownum<=20 | 192.168.56.43 |
| 192.168.56.37 | 3221865739 | ocp_monitor | sys | oceanbase | 2A6D6C59E6D86C57519267B5EC2CBEC3 | 1.024772 | 0 | Query | select /*+ query_timeout(10000000) */ sid as session_id, eve | 127.0.0.1 |
| 192.168.56.38 | 3222525230 | root | sys | oceanbase | 675CEF94E296A769020C364B4BF14F4A | 0.026861 | 0 | Query | select svr_ip,id,user,tenant,db,ifnull(sql_id,'') sql_id,tim | 192.168.56.43 |
+---------------+------------+-------------+--------+-----------+----------------------------------+------------+-----------+---------+--------------------------------------------------------------+----------------+
查看活跃会话,可以发现刚刚执行的小查询并未出现在活动会话中,也就是在队列等待时,SQL还未开始执行,不算活跃会话
总结与思考
- large_query_worker_percentage` 无法避免慢SQL把worker耗尽,导致ob无法处理新请求
- 行锁冲突等待时,需要回滚掉当前DML,然后让出线程,避免长时间等待消耗活跃线程,处理能力下降,所以active查询不到
- 合理的设置相关参数,可以避免过度消耗OS资源导致服务中断,同时也要避免过于保守active worker不足降低处理能力
- 要合理利用ob sql限流功能,限制慢SQL并行,避免导致active work不足导致无法处理正常业务请求
- 要理解ob sql限流机制,限流数量基于每个plan_id计数,如某个SQL_ID 限流10,但由于plan cache未能共享,存在3个plan,某该SQL可以达到30并发。
- cpu_quota_concurrency修改后未能立即生效,token_cnt增加,ass_token_cnt未增加,大查询的worker是基于token_cnt计算,也就是cpu_quota_concurrency调大后,实际可用的worker不会增加,但可用于大查询的worker增加了




