暂无图片
暂无图片
2
暂无图片
暂无图片
暂无图片

MySQL主从数据不一致-BUG

原创 冯刚 2022-10-13
1824

前言

前两天看到 Kevin崔 的文章(下面链接,转发请注明),提到MySQL主从数据不一致的几种情况,里面提到MySQL自身bug的原因,但是没展开。

MySQL 主从数据不一致有哪些情况:https://www.modb.pro/db/500145

想到之前碰到过的案例,高度怀疑是MySQL BUG导致,记录一下。

MySQL基于行的复制Row-Based Replication (RBR) 下,如果一张没有主键的表被并发更新,过一段时间新增一个自增字段作为主键,会导致主从该表数据不一致。

1 环境信息

RDS MySQL:5.7.30

2 错误描述

业务反馈生产RDS数据库有张表,主从数据不一致。

主库:

mysql> select * from t1 where v_template_id='1031';
+-------+---------------------+--------+
| id    |    v_template_id    | sku_id |
+-------+---------------------+--------+
| 70238 |                1031 |  23363 |
+-------+---------------------+--------+
1 row in set (10.87 sec)

从库1和从库2和从库3:

mysql> select * from t1 where v_template_id='1031';
+-------+---------------------+--------+
| id    |     v_template_id   | sku_id |
+-------+---------------------+--------+
| 71294 |                1031 |  23363 |
+-------+---------------------+--------+
1 row in set (0.01 sec)

可以看到,三个从库t1表的数据和主库不一致。比如上面 v_template_id='1031',对应主键ID不一致。

3 排查

经过统计,这种情况大约有上千条。

mysql> show create table t1\G
*************************** 1. row ***************************
       Table: t1
Create Table: CREATE TABLE `t1` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `v_template_id` bigint(16) NOT NULL,
  `sku_id` bigint(16) NOT NULL,
  PRIMARY KEY (`id`),
  KEY `sku_id` (`sku_id`)
) ENGINE=InnoDB AUTO_INCREMENT=47627822 DEFAULT CHARSET=utf8
1 row in set (0.00 sec)

4 原因

后面推测是MySQL BUG导致。

4.1 bug名称

Bug #92949 : add auto_increment column as PK cause RBR replication inconsistent

MySQL BUG链接:https://bugs.mysql.com/bug.php?id=92949

4.2 bug描述

Description:
This issue comes from one of our customers.
Under RBR replication, If a table without PK is concurrently updated, and an auto_increment column is added to this table as PK after a while of updating. The table on master and slave will be inconsistent with each other.

4.3 复现

## Use two concurrent connecitons on master
## Insert in each connection, and make sure the insert order is different
## from commit order.

步骤1 主库会话1操作

create table t1(tid int, name varchar(30), index idx_tid(tid)) engine=InnoDB;
begin;
insert into t1 values (1, 'fun');

步骤2 主库会话2操作

begin;
insert into t1 values (2, 'go');
commit;

步骤3 主库会话1操作

commit;

步骤4 主库会话1和从库会话操作

## 主库会话1	
[root@localhost] {16:28:51} (test) [13]> select * from t1;


## 从库会话
[root@localhost] {16:32:09} (test) [13]> select * from t1;


步骤5 主库会话1操作

alter table t1 add column id int not null auto_increment primary key;

步骤6 主库会话1和从库会话操作

## 主库会话1	
[root@localhost] {16:28:51} (test) [13]> select * from t1;


## 从库会话
[root@localhost] {16:32:09} (test) [13]> select * from t1;


步骤7 可模拟更多的delete和update操作

第4步的时候,主从查到的数据条数一致,但是顺序不一致;第5步新增自增主键id后,第6步主从上查到的数据已经不一样了。tid相当于生产案例中的v_template_id,此时已经复现了BUG。

4.4 原因

Suggested fix:
I think there are 2 points that cause this inconsistency.

1. For table without PK, the internal cluster index is ordered as insert order. In the above scenario, master and slave have different order, so the data is organized in different order, which is not a big deal in normal case. 
But when the auto_incremental column is added, and PK is create on this column, the whole table is rebuild. The auto_increment value is assigned monotonically increasing, corresponding to the internal cluster index order. So the same auto_increment PK value mapped to different business data.

2. For RBR replication, if PK exists, Update/Delete row lookup is based on PK values, they other columns are not considered, as long as we can fetch data from engine according PK.

I'm not sure whether this is a bug or incorrect usage of MySQL.
But I think we can do some improvements to the RBR row lookup logic, if we found match PK, but they other fields are mismatching, an error or warning should be emitted in error log to notify user about such inconsistency.

5 解决方案

生产问题已经出现,只能手动修复数据。

对于无主键的表,如果要新增自增主键,需要按下面方法操作

# Instead of ALTER TABLE t ADD COLUMN c INT NOT NULL AUTO_INCREMENT PRIMARY KEY:
    ALTER TABLE t ADD COLUMN c INT NOT NULL;
    SET @c = 0;
    UPDATE t SET c = (@c := @c + 1);
    ALTER TABLE t ADD PRIMARY KEY(c);
    ALTER TABLE t CHANGE c c INT NOT NULL AUTO_INCREMENT;

The update will generate row events, so slave will apply this deterministically.


最后修改时间:2022-10-17 16:16:35
「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论