概述
OCEANBASE高可用可以做到分区级,相当多的操作都是分区级的,在某些情况主分区切主不影响事务的继续运行,下面测试验证什么情况下切主事务可以继续,无感知。
相关参数
trx_force_kill_threshold 用于设置因冻结或切主需要杀事务时的最长等待时间。
| 属性 | 描述 |
|---|---|
| 参数类型 | 时间类型 |
| 默认值 | 100ms |
| 取值范围 | [1ms, 10s] |
| 是否重启 OBServer 生效 | 否 |
OceanBase 数据库 V3.2 版本对于分布式事务中的 Leader 切换做了优化,尽量避免 Leader 切换导致的事务回滚。优化内容为:
- 在进行有主改选时,如果旧的 Leader 上正在执行的事务日志能在规定的时间内(
trx_force_kill_threshold, default=100ms)把同步到新 Leader 上,则新 Leader 上任后可以继续推进事务的执行。 - 如果不能在规定的时间内完成事务日志的同步,则需要回滚正在执行的事务。
该机制是否只能用在原主副本可用时,计划内切主,或内部操作预期内切主的场景?
测试切主是否对小事务无影响
-
创建测试表
---用户租户(oracle模式) create table tabha(id number,c varchar2(100)); insert into tabha(id,c) select rownum,'test'||rownum from dual connect by rownum<=10000; commit; create index idx_tabha on tab1(id); --sys租户 MySQL [oceanbase]> alter system major freeze; Query OK, 0 rows affected (0.016 sec) -
修改查询,事务超时时间为7200,避免操作过程中超时
set ob_query_timeout=7200000000; set ob_trx_idle_timeout=7200000000; set ob_trx_lock_timeout=7200000000; set ob_trx_timeout=7200000000; obclient [SYS]> show variables like '%timeout%'; +---------------------+------------------+ | VARIABLE_NAME | VALUE | +---------------------+------------------+ | connect_timeout | 10 | | interactive_timeout | 28800 | | net_read_timeout | 30 | | net_write_timeout | 60 | | ob_pl_block_timeout | 3216672000000000 | | ob_query_timeout | 7200000000 | | ob_trx_idle_timeout | 7200000000 | | ob_trx_lock_timeout | 7200000000 | | ob_trx_timeout | 7200000000 | | wait_timeout | 259200 | +---------------------+------------------+ 10 rows in set (0.004 sec) -
执行一条UPDATE,不提交
update tabha set c=c where id=100; obclient [SYS]> update tabha set c=c where id=100; Query OK, 1 row affected (0.005 sec) Rows matched: 1 Changed: 1 Warnings: 0 -
查询tabha 主副本位置
select * from __all_virtual_meta_table where table_id in (select table_id from __all_virtual_table where table_name='tabha') and role=1\G MySQL [oceanbase]> select * from __all_virtual_meta_table where table_id in (select table_id from __all_virtual_table where table_name='tabha') and role=1\G *************************** 1. row *************************** tenant_id: 1001 table_id: 1100611139454075 partition_id: 0 svr_ip: 192.168.56.36 svr_port: 2882 gmt_create: 2024-06-05 10:45:56.786010 gmt_modified: 2024-06-05 10:47:14.871100 sql_port: 2881 unit_id: 1002 partition_cnt: 0 zone: zone1 role: 1 member_list: 192.168.56.36:2882:1717555556783731,192.168.56.37:2882:1717555556783731,192.168.56.38:2882:1717555556783731 row_count: 10000 data_size: 52933 data_version: 212 data_checksum: 263436233 row_checksum: 0 column_checksum: is_original_leader: 0 is_previous_leader: 1717555556815486 create_time: 0 rebuild: 0 replica_type: 0 required_size: 2097152 status: REPLICA_STATUS_NORMAL is_restore: 0 partition_checksum: 0 quorum: 3 fail_list: recovery_timestamp: 0 memstore_percent: 100 data_file_id: 1700968058986177 1 row in set (0.037 sec) -
对表tabha进行手工切主操作(SWITCH REPLICA LEADER)
MySQL [oceanbase]> select *from __all_virtual_election_info where table_id=1100611139454075 and role=1\G *************************** 1. row *************************** svr_ip: 192.168.56.36 svr_port: 2882 table_id: 1100611139454075 partition_idx: 0 is_running: 1 is_changing_leader: 0 current_leader: "192.168.56.36:2882" previous_leader: 0.0.0.0 proposal_leader: 0.0.0.0 member_list: 3{server:"192.168.56.36:2882", timestamp:1717555556783731, flag:0, need_encrypt:false}{server:"192.168.56.37:2882", timestamp:1717555556783731, flag:0, need_encrypt:false}{server:"192.168.56.38:2882", timestamp:1717555556783731, flag:0, need_encrypt:false} replica_num: 3 lease_start: 1717555561250000 lease_end: 1717555571450000 time_offset: 4050000 active_timestamp: 1717555556791050 T1_timestamp: 1717555557200000 leader_epoch: 1717555551600000 state: 1 role: 1 stage: 1 eg_id: 6565001008975067605 remaining_time_in_blacklist: 0 1 row in set (0.048 sec) ALTER SYSTEM SWITCH REPLICA LEADER PARTITION_ID '0%0@1100611139454075' SERVER '192.168.56.36:2882' ; MySQL [oceanbase]> ALTER SYSTEM SWITCH REPLICA LEADER PARTITION_ID '0%0@1100611139454075' SERVER '192.168.56.36:2882' ; Query OK, 0 rows affected (0.006 sec) MySQL [oceanbase]> select *from __all_virtual_election_info where table_id=1100611139454075 and role=1\G *************************** 1. row *************************** svr_ip: 192.168.56.36 svr_port: 2882 table_id: 1100611139454075 partition_idx: 0 is_running: 1 is_changing_leader: 0 current_leader: "192.168.56.36:2882" previous_leader: 0.0.0.0 proposal_leader: 0.0.0.0 member_list: 3{server:"192.168.56.36:2882", timestamp:1717555556783731, flag:0, need_encrypt:false}{server:"192.168.56.37:2882", timestamp:1717555556783731, flag:0, need_encrypt:false}{server:"192.168.56.38:2882", timestamp:1717555556783731, flag:0, need_encrypt:false} replica_num: 3 lease_start: 1717555561250000 lease_end: 1717555571450000 time_offset: 4050000 active_timestamp: 1717555556791050 T1_timestamp: 1717555557200000 leader_epoch: 1717555551600000 state: 1 role: 1 stage: 1 eg_id: 6565001008975067605 remaining_time_in_blacklist: 0 1 row in set (0.047 sec) -
事务继续update仍然不会中断
obclient [SYS]> update tabha set c=c where id=101; Query OK, 1 row affected (0.005 sec) Rows matched: 1 Changed: 1 Warnings: 0 -
primary_zone从’zone1;zone2;zone3’改为’zone2;zone1;zone3’,发生主副本所在节点变化
alter table tabha set primary_zone='zone2;zone1;zone3'; obclient [SYS]> alter table tabha set primary_zone='zone2;zone1;zone3'; Query OK, 0 rows affected (0.111 sec) MySQL [oceanbase]> select *from __all_virtual_election_info where table_id=1100611139454075 and role=1\G *************************** 1. row *************************** svr_ip: 192.168.56.37 svr_port: 2882 table_id: 1100611139454075 partition_idx: 0 is_running: 1 is_changing_leader: 0 current_leader: "192.168.56.37:2882" <<< previous_leader: 192.168.56.36 <<< proposal_leader: 0.0.0.0 member_list: 3{server:"192.168.56.36:2882", timestamp:1717555556783731, flag:0, need_encrypt:false}{server:"192.168.56.37:2882", timestamp:1717555556783731, flag:0, need_encrypt:false}{server:"192.168.56.38:2882", timestamp:1717555556783731, flag:0, need_encrypt:false} replica_num: 3 lease_start: 1717556607050000 lease_end: 1717556617250000 time_offset: 4050000 active_timestamp: 1717555556790992 T1_timestamp: 1717556603000000 leader_epoch: 1717556597400000 state: 1 role: 1 stage: 1 eg_id: 11361271041241770193 remaining_time_in_blacklist: 0 1 row in set (0.047 sec) -
事务继续update仍然不会中断
obclient [SYS]> update tabha set c=c where id=102; Query OK, 1 row affected (0.018 sec) Rows matched: 1 Changed: 1 Warnings: 0 -
进行大事务测试,大事务切主后不能继续
insert 300万行数据 obclient [SYS]> insert into tabha select rownum,'test'||rownum from dual connect by rownum<=3000000; Query OK, 3000000 rows affected (49.525 sec) Records: 3000000 Duplicates: 0 Warnings: 0 切主 MySQL [oceanbase]> ALTER SYSTEM SWITCH REPLICA LEADER PARTITION_ID '0%0@1100611139454075' SERVER '192.168.56.36:2882' ; Query OK, 0 rows affected (0.004 sec) 继续insert会提示事务需要rollback; obclient [SYS]> insert into tabha select rownum,'test'||rownum from dual connect by rownum<=3000000; ORA-24761: transaction rolled back: transaction needs rollback obclient [SYS]> rollback; Query OK, 0 rows affected (0.001 sec) -
正在执行中的DML会失败
obclient [SYS]> update tabha set c=c where id=1 and (select avg(dbms_random.random()) from dual connect by rownum<100000)<100; MySQL [oceanbase]> ALTER SYSTEM SWITCH REPLICA LEADER PARTITION_ID '0%0@1100611139454075' SERVER '192.168.56.36:2882' ; Query OK, 0 rows affected (0.004 sec) obclient [SYS]> update tabha set c=c where id=1 and (select avg(dbms_random.random()) from dual connect by rownum<100000)<100; ORA-24761: transaction rolled back: transaction context does not exist
测试结论
1、小事务未提交(初步定义为日志传输小于100ms),涉及的分区发生切主,事务可以继续
2、大事务,涉及的分区发生切主,事务需要回滚
3、正在进行中的DML,涉及的分区发生切主,会中断,事务需要回滚
「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。




