暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

OCEANBASE分区切主对事务的影响

概述

OCEANBASE高可用可以做到分区级,相当多的操作都是分区级的,在某些情况主分区切主不影响事务的继续运行,下面测试验证什么情况下切主事务可以继续,无感知。

相关参数

trx_force_kill_threshold 用于设置因冻结或切主需要杀事务时的最长等待时间。

属性 描述
参数类型 时间类型
默认值 100ms
取值范围 [1ms, 10s]
是否重启 OBServer 生效

OceanBase 数据库 V3.2 版本对于分布式事务中的 Leader 切换做了优化,尽量避免 Leader 切换导致的事务回滚。优化内容为:

  • 在进行有主改选时,如果旧的 Leader 上正在执行的事务日志能在规定的时间内(trx_force_kill_threshold, default=100ms)把同步到新 Leader 上,则新 Leader 上任后可以继续推进事务的执行。
  • 如果不能在规定的时间内完成事务日志的同步,则需要回滚正在执行的事务。

该机制是否只能用在原主副本可用时,计划内切主,或内部操作预期内切主的场景?

测试切主是否对小事务无影响

  1. 创建测试表

    ---用户租户(oracle模式)
    create table tabha(id number,c varchar2(100));
    insert into tabha(id,c) select rownum,'test'||rownum from dual connect by rownum<=10000;
    commit;
    create index idx_tabha on tab1(id);
    --sys租户
    MySQL [oceanbase]> alter system major freeze;
    Query OK, 0 rows affected (0.016 sec)
    
    
  2. 修改查询,事务超时时间为7200,避免操作过程中超时

    set ob_query_timeout=7200000000;
    set ob_trx_idle_timeout=7200000000;
    set ob_trx_lock_timeout=7200000000;
    set ob_trx_timeout=7200000000;
    
    obclient [SYS]> show variables like '%timeout%';
    +---------------------+------------------+
    | VARIABLE_NAME       | VALUE            |
    +---------------------+------------------+
    | connect_timeout     | 10               |
    | interactive_timeout | 28800            |
    | net_read_timeout    | 30               |
    | net_write_timeout   | 60               |
    | ob_pl_block_timeout | 3216672000000000 |
    | ob_query_timeout    | 7200000000       |
    | ob_trx_idle_timeout | 7200000000       |
    | ob_trx_lock_timeout | 7200000000       |
    | ob_trx_timeout      | 7200000000       |
    | wait_timeout        | 259200           |
    +---------------------+------------------+
    10 rows in set (0.004 sec)
    
    
    
  3. 执行一条UPDATE,不提交

    update tabha set c=c where id=100;
    
    
    obclient [SYS]> update tabha set c=c where id=100;
    Query OK, 1 row affected (0.005 sec)
    Rows matched: 1  Changed: 1  Warnings: 0
    
    
    
    
  4. 查询tabha 主副本位置

    select * from __all_virtual_meta_table where table_id in (select table_id from __all_virtual_table where table_name='tabha') and role=1\G
    
    MySQL [oceanbase]> select * from __all_virtual_meta_table where table_id in (select table_id from __all_virtual_table where table_name='tabha') and role=1\G
    *************************** 1. row ***************************
             tenant_id: 1001
              table_id: 1100611139454075
          partition_id: 0
                svr_ip: 192.168.56.36
              svr_port: 2882
            gmt_create: 2024-06-05 10:45:56.786010
          gmt_modified: 2024-06-05 10:47:14.871100
              sql_port: 2881
               unit_id: 1002
         partition_cnt: 0
                  zone: zone1
                  role: 1
           member_list: 192.168.56.36:2882:1717555556783731,192.168.56.37:2882:1717555556783731,192.168.56.38:2882:1717555556783731
             row_count: 10000
             data_size: 52933
          data_version: 212
         data_checksum: 263436233
          row_checksum: 0
       column_checksum:
    is_original_leader: 0
    is_previous_leader: 1717555556815486
           create_time: 0
               rebuild: 0
          replica_type: 0
         required_size: 2097152
                status: REPLICA_STATUS_NORMAL
            is_restore: 0
    partition_checksum: 0
                quorum: 3
             fail_list:
    recovery_timestamp: 0
      memstore_percent: 100
          data_file_id: 1700968058986177
    1 row in set (0.037 sec)
    
    
  5. 对表tabha进行手工切主操作(SWITCH REPLICA LEADER)

    MySQL [oceanbase]> select *from __all_virtual_election_info where table_id=1100611139454075 and role=1\G
    *************************** 1. row ***************************
                         svr_ip: 192.168.56.36
                       svr_port: 2882
                       table_id: 1100611139454075
                  partition_idx: 0
                     is_running: 1
             is_changing_leader: 0
                 current_leader: "192.168.56.36:2882"
                previous_leader: 0.0.0.0
                proposal_leader: 0.0.0.0
                    member_list: 3{server:"192.168.56.36:2882", timestamp:1717555556783731, flag:0, need_encrypt:false}{server:"192.168.56.37:2882", timestamp:1717555556783731, flag:0, need_encrypt:false}{server:"192.168.56.38:2882", timestamp:1717555556783731, flag:0, need_encrypt:false}
                    replica_num: 3
                    lease_start: 1717555561250000
                      lease_end: 1717555571450000
                    time_offset: 4050000
               active_timestamp: 1717555556791050
                   T1_timestamp: 1717555557200000
                   leader_epoch: 1717555551600000
                          state: 1
                           role: 1
                          stage: 1
                          eg_id: 6565001008975067605
    remaining_time_in_blacklist: 0
    1 row in set (0.048 sec)
    
    
    ALTER SYSTEM SWITCH REPLICA LEADER PARTITION_ID  '0%0@1100611139454075' SERVER  '192.168.56.36:2882' ;
    
    
    
    MySQL [oceanbase]> ALTER SYSTEM SWITCH REPLICA LEADER PARTITION_ID  '0%0@1100611139454075' SERVER  '192.168.56.36:2882' ;
    Query OK, 0 rows affected (0.006 sec)
    
    
    MySQL [oceanbase]> select *from __all_virtual_election_info where table_id=1100611139454075 and role=1\G
    *************************** 1. row ***************************
                         svr_ip: 192.168.56.36
                       svr_port: 2882
                       table_id: 1100611139454075
                  partition_idx: 0
                     is_running: 1
             is_changing_leader: 0
                 current_leader: "192.168.56.36:2882"
                previous_leader: 0.0.0.0
                proposal_leader: 0.0.0.0
                    member_list: 3{server:"192.168.56.36:2882", timestamp:1717555556783731, flag:0, need_encrypt:false}{server:"192.168.56.37:2882", timestamp:1717555556783731, flag:0, need_encrypt:false}{server:"192.168.56.38:2882", timestamp:1717555556783731, flag:0, need_encrypt:false}
                    replica_num: 3
                    lease_start: 1717555561250000
                      lease_end: 1717555571450000
                    time_offset: 4050000
               active_timestamp: 1717555556791050
                   T1_timestamp: 1717555557200000
                   leader_epoch: 1717555551600000
                          state: 1
                           role: 1
                          stage: 1
                          eg_id: 6565001008975067605
    remaining_time_in_blacklist: 0
    1 row in set (0.047 sec)
    
  6. 事务继续update仍然不会中断

    obclient [SYS]> update tabha set c=c where id=101;
    Query OK, 1 row affected (0.005 sec)
    Rows matched: 1  Changed: 1  Warnings: 0
    
  7. primary_zone从’zone1;zone2;zone3’改为’zone2;zone1;zone3’,发生主副本所在节点变化

    alter table tabha set primary_zone='zone2;zone1;zone3';
    
    
    obclient [SYS]> alter table tabha set primary_zone='zone2;zone1;zone3';
    Query OK, 0 rows affected (0.111 sec)
    
    MySQL [oceanbase]> select *from __all_virtual_election_info where table_id=1100611139454075 and role=1\G
    *************************** 1. row ***************************
                         svr_ip: 192.168.56.37
                       svr_port: 2882
                       table_id: 1100611139454075
                  partition_idx: 0
                     is_running: 1
             is_changing_leader: 0
                 current_leader: "192.168.56.37:2882" <<<
                previous_leader: 192.168.56.36 <<<
                proposal_leader: 0.0.0.0
                    member_list: 3{server:"192.168.56.36:2882", timestamp:1717555556783731, flag:0, need_encrypt:false}{server:"192.168.56.37:2882", timestamp:1717555556783731, flag:0, need_encrypt:false}{server:"192.168.56.38:2882", timestamp:1717555556783731, flag:0, need_encrypt:false}
                    replica_num: 3
                    lease_start: 1717556607050000
                      lease_end: 1717556617250000
                    time_offset: 4050000
               active_timestamp: 1717555556790992
                   T1_timestamp: 1717556603000000
                   leader_epoch: 1717556597400000
                          state: 1
                           role: 1
                          stage: 1
                          eg_id: 11361271041241770193
    remaining_time_in_blacklist: 0
    1 row in set (0.047 sec)
    
    
  8. 事务继续update仍然不会中断

    obclient [SYS]> update tabha set c=c where id=102;
    Query OK, 1 row affected (0.018 sec)
    Rows matched: 1  Changed: 1  Warnings: 0
    
    
  9. 进行大事务测试,大事务切主后不能继续

    insert 300万行数据
    obclient [SYS]> insert into tabha select rownum,'test'||rownum from dual  connect by rownum<=3000000;
    Query OK, 3000000 rows affected (49.525 sec)
    Records: 3000000  Duplicates: 0  Warnings: 0
    
    切主
    MySQL [oceanbase]> ALTER SYSTEM SWITCH REPLICA LEADER PARTITION_ID  '0%0@1100611139454075' SERVER  '192.168.56.36:2882' ;
    Query OK, 0 rows affected (0.004 sec)
    
    继续insert会提示事务需要rollback;
    obclient [SYS]> insert into tabha select rownum,'test'||rownum from dual  connect by rownum<=3000000;
    ORA-24761: transaction rolled back: transaction needs rollback
    
    obclient [SYS]> rollback;
    Query OK, 0 rows affected (0.001 sec)
    
    
  10. 正在执行中的DML会失败

    obclient [SYS]> update tabha set c=c where id=1 and (select  avg(dbms_random.random()) from dual connect by rownum<100000)<100;
    
    
    
    MySQL [oceanbase]> ALTER SYSTEM SWITCH REPLICA LEADER PARTITION_ID  '0%0@1100611139454075' SERVER  '192.168.56.36:2882' ;
    Query OK, 0 rows affected (0.004 sec)
    
    obclient [SYS]> update tabha set c=c where id=1 and (select  avg(dbms_random.random()) from dual connect by rownum<100000)<100;
    ORA-24761: transaction rolled back: transaction context does not exist
    
    
    

测试结论

1、小事务未提交(初步定义为日志传输小于100ms),涉及的分区发生切主,事务可以继续

2、大事务,涉及的分区发生切主,事务需要回滚

3、正在进行中的DML,涉及的分区发生切主,会中断,事务需要回滚

「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论