MySQL ProxySQL Sharding, Part 1

DBA圈 2021-04-13

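This article demonstrates how sharding works with MySQL and ProxySQL.

Recently a colleague asked me to write a simple example of how ProxySQL handles sharding.

In response, I wrote this short tutorial, hoping that it illustrates ProxySQL's sharding functionality well and helps people understand how to use it.

ProxySQL is a powerful platform that lets us manipulate and manage database connections and queries in a simple but effective way. This article shows you how it does that.

Before starting, let's go over some basic concepts.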

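ProxySQL organizes its internal set of servers into host groups (HG), and each HG can be associated with users and with query rules (QR);
Each QR can be final (apply=1), or it can let ProxySQL continue parsing other QRs;
A QR can be a rewrite action, a simple match, can have a specific target HG, or be generic;
QRs are defined using regular expressions.

As you can see, query rules (QRs) work as a sequence of filters and transformations that you can arrange as you like.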

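These simple basic rules give us enormous flexibility. They allow us to create very simple actions, such as a simple query rewrite, or very complex chains made of dozens of QRs. The related documentation can be found here.

Information about HGs and QRs is easily accessible through the ProxySQL administrator interface, in the tables mysql_servers, mysql_query_rules and stats.stats_mysql_query_rules. The last one allows us to evaluate if and how the rules are used.

As for sharding, what can ProxySQL do to help us achieve what we need (in a relatively easy way)? Some programmers and companies build the sharding logic into the application, use multiple connections to reach different targets, or have some logic that splits the load across several schemas/tables. ProxySQL allows us to simplify the way connectivity and query distribution are supposed to work, either by reading data in the query or by accepting hints.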

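Whatever the requirements, sharding can be summarized in a few different categories:

By splitting the data inside the same container (for example, sharding by State, where each State is a schema)
By physical data location (this can be multiple MySQL servers in the same room, or servers that are geographically distributed)
A combination of the two, using dedicated servers split by State, and also splitting by schema/table on some other criterion (say by gender)

In the examples below, I show how to use ProxySQL to cover the three scenarios defined above (plus a few extra cases).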

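The examples below report text from the ProxySQL Admin interface and from the MySQL console. I mark each one as follows:

Mc for MySQL console
Pa for ProxySQL Admin

Please note that the MySQL console must be started with the -c flag to pass comments in the query. This is because the default behavior of the MySQL console is to strip comments.

I will walk through the whole process so that you can run the same experiment on your laptop, and where possible I will mention what a real implementation would look like, because I want you to test the ProxySQL functionality directly.

For the examples described below I use ProxySQL v1.2.2, which is about to become the mainline release. You can get it as follows: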

git clone https://github.com/sysown/proxysql.git

git checkout v1.2.2

Compile:

cd <path to proxy source code>

make

make install

If you need full instructions on how to install and configure ProxySQL, please read here and here.

Finally, you need to load the WORLD test DB. The WORLD test DB can be found here.

Shard inside the same MySQL Server using three different schemas split by continent

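Obviously, you can have any number of shards and related schemas. What matters here is showing how traffic gets redirected to different targets (schemas) while keeping the same structure (tables), by identifying the target either from information carried in the data or from what the application passes in.

OK, let us roll the ball.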
[Mc]
+---------------+-------------+
| Continent | count(Code) |
+---------------+-------------+
| Asia | 51 | <--
| Europe | 46 | <--
| North America | 37 |
| Africa | 58 | <--
| Oceania | 28 |
| Antarctica | 5 |
| South America | 14 |
+---------------+-------------+
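For this exercise, I will use three hosts.

To summarize, I will need:

3 hosts: 192.168.1.[5-6-7]
3 schemas: Continent X + world schema
1 user: user_shardRW
3 host groups: 10, 20, 30 (for later use)

First, create the schemas Asia, Africa, Europe and North_America: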
[Mc]
create schema Asia;
create schema Europe;
create schema North_America;
create schema Africa;
create table Asia.City as select a.* from world.City a join Country on a.CountryCode = Country.code where Continent='Asia' ;
create table Europe.City as select a.* from world.City a join Country on a.CountryCode = Country.code where Continent='Europe' ;
create table Africa.City as select a.* from world.City a join Country on a.CountryCode = Country.code where Continent='Africa' ;
create table North_America.City as select a.* from world.City a join Country on a.CountryCode = Country.code where Continent='North America' ;
create table Asia.Country as select * from world.Country where Continent='Asia' ;
create table Europe.Country as select * from world.Country where Continent='Europe' ;
create table Africa.Country as select * from world.Country where Continent='Africa' ;
create table North_America.Country as select * from world.Country where Continent='North America' ;

Then create the user:
grant all on *.* to user_shardRW@'%' identified by 'test';

Now we can start configuring ProxySQL:
[Pa]
insert into mysql_users (username,password,active,default_hostgroup,default_schema) values ('user_shardRW','test',1,10,'test_shard1');
LOAD MYSQL USERS TO RUNTIME;SAVE MYSQL USERS TO DISK;
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.5',10,3306,100);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.6',20,3306,100);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.7',30,3306,100);
LOAD MYSQL SERVERS TO RUNTIME; SAVE MYSQL SERVERS TO DISK;
At this point we have defined the user, the servers and the host groups.

Now let's define the logic with the query rules:
[Pa]
delete from mysql_query_rules where rule_id > 30;
INSERT INTO mysql_query_rules (rule_id,active,username,match_pattern,replace_pattern,apply) VALUES (31,1,'user_shardRW',"^SELECT\s*(.*)\s*from\s*world.(\S*)\s(.*).*Continent='(\S*)'\s*(\s*.*)$","SELECT \1 from \4.\2 WHERE 1=1 \5",1);
LOAD MYSQL QUERY RULES TO RUNTIME;SAVE MYSQL QUERY RULES TO DISK;
I now query the master (or a single node), but I expect ProxySQL to redirect the query to the right shard by picking up the value of Continent:
[Mc]
SELECT name,population from world.City WHERE Continent='Europe' and CountryCode='ITA' order by population desc limit 1;
+------+------------+
| name | population |
+------+------------+
| Roma |    2643581 |
+------+------------+
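You might say: "You are querying the world schema, so of course you get the right data back."

That is not what actually happened: ProxySQL did not query the world schema, it queried the Europe schema.

Let's look at the details: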

[Pa]
select * from stats_mysql_query_digest;
Original :SELECT name,population from world.City WHERE Continent='Europe' and CountryCode='ITA' order by population desc limit 1;
Transformed :SELECT name,population from Europe.City WHERE ?=? and CountryCode=? order by population desc limit ?
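Let me explain what happened.

Rule 31 in ProxySQL captures all the fields we want to select, extracts the Continent value from the WHERE clause, takes every condition that follows it, and uses the regular expression to reassemble the query.

Does this mechanism work for all the shards (schemas)? Yes, it does.

The same applies to a query such as: SELECT name,population from world.Country WHERE Continent='Asia' ;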
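
To make the rewrite easier to follow, here is a minimal sketch that applies the match_pattern and replace_pattern of rule 31 with Python's re module. This is only an approximation: ProxySQL uses RE2, not Python's engine, and the helper names rewrite and normalize are hypothetical, introduced for illustration. Whitespace is collapsed before printing because the captured groups keep their surrounding spaces.

```python
import re

# Patterns copied from rule 31 in this article.
MATCH_PATTERN = r"^SELECT\s*(.*)\s*from\s*world.(\S*)\s(.*).*Continent='(\S*)'\s*(\s*.*)$"
REPLACE_PATTERN = r"SELECT \1 from \4.\2 WHERE 1=1 \5"


def rewrite(query: str) -> str:
    """Apply the rule once; queries that do not match are returned unchanged."""
    return re.sub(MATCH_PATTERN, REPLACE_PATTERN, query, count=1)


def normalize(query: str) -> str:
    """Collapse runs of whitespace so the rewritten query is easier to read."""
    return " ".join(query.split())


city_query = (
    "SELECT name,population from world.City WHERE Continent='Europe' "
    "and CountryCode='ITA' order by population desc limit 1;"
)
country_query = "SELECT name,population from world.Country WHERE Continent='Asia' ;"

print(normalize(rewrite(city_query)))
print(normalize(rewrite(country_query)))
```

Group 4 (the continent) becomes the schema name, group 2 is the table, and the `WHERE 1=1` placeholder keeps the remaining conditions in group 5 syntactically valid.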

First, I disable the rule I just inserted. This is not strictly necessary, but I do it so that you can follow the whole process.
[Pa]
mysql> update mysql_query_rules set active=0 where rule_id=31;
Query OK, 1 row affected (0.00 sec)
mysql> LOAD MYSQL QUERY RULES TO RUNTIME;SAVE MYSQL QUERY RULES TO DISK;
Query OK, 0 rows affected (0.00 sec)
Done.

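What I want now is for every query that contains the comment /* continent=X */ to go to the continent X schema, on the same server.

To do that, I tell ProxySQL to replace every reference to the world schema inside the queries.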
[Pa]
delete from mysql_query_rules where rule_id in (31,33,34,35,36);
INSERT INTO mysql_query_rules (rule_id,active,username,match_pattern,replace_pattern,apply,FlagOUT,FlagIN) VALUES (31,1,'user_shardRW',"\S*\s*\/\*\s*continent=.*Asia\s*\*.*",null,0,23,0);
INSERT INTO mysql_query_rules (rule_id,active,username,match_pattern,replace_pattern,apply,FlagIN,FlagOUT) VALUES (32,1,'user_shardRW','world.','Asia.',0,23,23);
INSERT INTO mysql_query_rules (rule_id,active,username,match_pattern,replace_pattern,apply,FlagOUT,FlagIN) VALUES (33,1,'user_shardRW',"\S*\s*\/\*\s*continent=.*Europe\s*\*.*",null,0,25,0);
INSERT INTO mysql_query_rules (rule_id,active,username,match_pattern,replace_pattern,apply,FlagIN,FlagOUT) VALUES (34,1,'user_shardRW','world.','Europe.',0,25,25);
INSERT INTO mysql_query_rules (rule_id,active,username,match_pattern,replace_pattern,apply,FlagOUT,FlagIN) VALUES (35,1,'user_shardRW',"\S*\s*\/\*\s*continent=.*Africa\s*\*.*",null,0,24,0);
INSERT INTO mysql_query_rules (rule_id,active,username,match_pattern,replace_pattern,apply,FlagIN,FlagOUT) VALUES (36,1,'user_shardRW','world.','Africa.',0,24,24);
LOAD MYSQL QUERY RULES TO RUNTIME;SAVE MYSQL QUERY RULES TO DISK;
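How does it work?

I have defined two cascaded rules.

The first rule captures incoming queries that contain the desired value (such as continent = Asia). On a match, ProxySQL leaves that rule and checks its apply field. If apply is 0, it reads the FlagOUT value and moves on to the first rule in sequence whose FlagIN equals that FlagOUT.

The second rule takes the request and replaces the value world with the one I have defined. In short, it replaces whatever matches match_pattern with the value in replace_pattern.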
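
The cascade can be sketched as a small Python model. It is a simplified reading of the flag logic described above, not ProxySQL's actual code: the Rule class and run_chain function are hypothetical names, Python's re stands in for RE2, rules are walked once in rule_id order, and each rewrite replaces a single occurrence.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Rule:
    rule_id: int
    match_pattern: str
    replace_pattern: Optional[str]
    flag_in: int
    flag_out: Optional[int]
    apply: bool


# Rules 31-36 as inserted in this article (all with apply=0).
RULES = [
    Rule(31, r"\S*\s*\/\*\s*continent=.*Asia\s*\*.*", None, 0, 23, False),
    Rule(32, r"world.", "Asia.", 23, 23, False),
    Rule(33, r"\S*\s*\/\*\s*continent=.*Europe\s*\*.*", None, 0, 25, False),
    Rule(34, r"world.", "Europe.", 25, 25, False),
    Rule(35, r"\S*\s*\/\*\s*continent=.*Africa\s*\*.*", None, 0, 24, False),
    Rule(36, r"world.", "Africa.", 24, 24, False),
]


def run_chain(query: str, rules: List[Rule]) -> Tuple[str, List[int]]:
    """Walk the rules once and return the rewritten query plus the rules that fired."""
    flag = 0  # every query enters the chain with flag 0
    fired = []
    for rule in sorted(rules, key=lambda r: r.rule_id):
        if rule.flag_in != flag:
            continue  # rule belongs to a different chain
        if not re.search(rule.match_pattern, query):
            continue
        fired.append(rule.rule_id)
        if rule.replace_pattern is not None:
            # Assumption: one replacement per rule evaluation.
            query = re.sub(rule.match_pattern, rule.replace_pattern, query, count=1)
        if rule.flag_out is not None:
            flag = rule.flag_out  # hand over to the chain with this FlagIN
        if rule.apply:
            break  # apply=1 ends the processing
    return query, fired


query = (
    "Select /* continent=Europe */ * from world.Country join world.City "
    "on world.City.CountryCode=world.Country.Code where Country.code='ITA' ;"
)
rewritten, fired = run_chain(query, RULES)
print(fired)
print(rewritten)
```

In this model the Europe hint fires rule 33, which hands the query to rule 34 through flag 25; the Asia and Africa rules are skipped because either their pattern or their FlagIN does not match.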

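ProxySQL implements regular expressions using Google's RE2 library. RE2 is very fast but has some limitations; for example, it does not support the g (global) flag option. In other words, if I select from several tables, so that "world" appears more than once, RE2 replaces only the first occurrence.

So a query like this:

Select /* continent=Europe */ * from world.Country join world.City on world.City.CountryCode=world.Country.Code where Country.code='ITA' ;

is transformed into:

Select /* continent=Europe */ * from Europe.Country join world.City on world.City.CountryCode=world.Country.Code where Country.code='ITA' ;

and the query fails.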

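Rene and I later discussed how to work around this RE2 limitation. In the end, we opted for recursive actions.

What does that mean? It means that ProxySQL v1.2.2 now has a new feature that allows query rules to be invoked recursively. The maximum number of iterations ProxySQL can run is controlled by the global variable mysql-query_processor_iterations, which defines how many operations a query process can execute (from start to end).

This new implementation allows a query rule to reference itself, and therefore to be executed several times.

If you go back, you will notice that QR 34 has FlagIN and FlagOUT pointing to the same value, 25, with apply=0. This makes ProxySQL call rule 34 recursively until it has changed every occurrence of the word world.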
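
The effect of the recursion can be illustrated with a short Python sketch. It only models the behavior described above and is not ProxySQL's implementation: rewrite_recursively is a hypothetical helper, max_iterations is a stand-in for mysql-query_processor_iterations, and Python's re replaces one occurrence per pass to imitate the single replacement attributed to RE2.

```python
import re
from typing import Tuple


def rewrite_recursively(query: str, match_pattern: str, replace_pattern: str,
                        max_iterations: int) -> Tuple[str, int]:
    """Re-apply one rewrite rule until it stops matching or the iteration cap is hit."""
    passes = 0
    while passes < max_iterations and re.search(match_pattern, query):
        # One replacement per pass, as in the single-occurrence behavior described above.
        query = re.sub(match_pattern, replace_pattern, query, count=1)
        passes += 1
    return query, passes


query = (
    "Select /* continent=Europe */ * from world.Country join world.City "
    "on world.City.CountryCode=world.Country.Code where Country.code='ITA' ;"
)

# A single pass leaves references to world behind.
single_pass, _ = rewrite_recursively(query, r"world.", "Europe.", max_iterations=1)
print(single_pass)

# With enough iterations, every reference is rewritten.
final, passes = rewrite_recursively(query, r"world.", "Europe.", max_iterations=10)
print(passes)
print(final)
```

The loop stops on its own because the replacement text no longer matches the pattern; the iteration cap is what protects against a rule whose replacement would keep matching forever.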

Because of the WeChat length limit, the rest of the content continues in the next part.

