别裁员!别裁员!别裁员! 一招降低IT数据库成本

digoal 2020-02-14

409

作者

digoal

日期

2020-02-14

背景

2020疫情无情，多数企业因此受挫，特别中小企业，甚至到了要裁员的地步, 但是人才是最宝贵的，裁员一定是下下策，如何渡过这个难关，疫情带给我们什么反思?

开源节流有新方法，通常数据库在企业IT支出中的占比将近一半，降低数据库成本对降低企业IT成本效果明显，但是一般企业没有专业DBA，很难在这方面下手，不过没关系，有了云厂商，一切变得简单。借助阿里云我们找到了可以为企业IT节省至少一倍成本的方法.

到底时什么方法呢? 回顾一下年前做的一系列MySQL+PG联合解决方案的课程.

《阿里云 RDS PostgreSQL+MySQL 联合解决方案课程 - 汇总视频、课件》

在众多数据库中, PG是一个企业级的开源数据库, 各方面的功能与Oracle对齐, 适合范围广, 能处理的数据量庞大. 采用PG的大型企业例如平安，邮储银行，阿里，华为，中兴，人保, 招商, 富士康, 苹果, SAP, saleforce等以及全球财富1000强等众多企业。《外界对PostgreSQL 的评价》

阿里云RDS PG的优势:

支持完整生命周期管理,包括高可用, 容灾, 备份, 安全, 审计, 加密, cloud dba等模块, 大幅降低企业的使用和管理成本.
专业内核和DBA团队 7*24小时服务.
支持并行计算，LLVM，GPU加速，向量计算，分析能力更强。
PG的优化器强大，应对复杂SQL处理效率更高，适合复杂业务场景, 更适合新零售、制造业、工业、在线教育、游戏、金融、政府、企业ERP等行业或领域。
内核扩展, 根据垂直领域的需求定制化。
- Ganos插件, GIS功能更强更专业，支持平面、球面几何，栅格，时空轨迹，点云，拓扑网络模型。
- pase插件, 支持高维向量搜索, 支持精确的图像搜索, 人脸识别, 相似查询.
- roaringbitmap插件, 支持实时大数据用户画像, 精准营销.
- rdkit插件, 支持化学分析, 分子式的相似搜索, 化学机器学习等.
多模能力更强，其表现在索引更丰富，除了btree,hash还支持gin,gist,spgist,brin,bloom,rum等索引接口，适合模糊搜索，全文检索，多维任意搜索，时空搜索，高维向量（广泛应用于图像识别、相似特征扩选，时序搜索，用户画像，化学分析，DNA检索等。
类型更加丰富，同时支持扩展类型，除了基本类型以外，支持网络、全文检索、数组、xml、JSON、范围、域、树、多维、分子、GIS等类型。支持更丰富的应用场景。
支持oss_fdw, 可以将数据库的归档数据存储在oss中, 降低成本, 并且访问方法不变.

本文将对PG和MySQL进行多方位对比, 在某些方面PG的综合性能比MySQL高出一个数量级, PG+MySQL结合使用, 可以大幅降低企业成本.

疫情无情PG有情, 别裁员了, 建立多元化的技术栈, 强化企业IT能力更重要.

环境

申请阿里云RDS PG 12实例, 8核32G 1500G ESSD

同硬件配置的MySQL 8.0

用户密码:

user123 xxxxxxxx!

库:

db1

连接串:

PG:

export PGPASSWORD=xxxxxxxx! psql -h pgm-bp1z26gbo3gx893a129310.pg.rds.aliyuncs.com -p 1433 -U user123 db1

MySQL:

mysql -h rm-bp1wv992ym962k85888370.mysql.rds.aliyuncs.com -P 3306 -u user123 --password=xxxxxx! -D db1

测试用的客户端ecs centos 7.x x64安装mysql, pg客户端

yum install -y mysql-* yum install https://download.postgresql.org/pub/repos/yum/reporpms/EL-7-x86_64/pgdg-redhat-repo-latest.noarch.rpm yum install -y postgresql12

MySQL 8.0测试

测试表

CREATE TABLE employees ( id INT NOT NULL, fname VARCHAR(30), lname VARCHAR(30), birth TIMESTAMP, hired DATE NOT NULL DEFAULT '1970-01-01', separated DATE NOT NULL DEFAULT '9999-12-31', job_code INT NOT NULL, store_id INT NOT NULL );

批量写入存储过程

```
DROP PROCEDURE IF EXISTS BatchInsert;

delimiter // -- 把界定符改成双斜杠
CREATE PROCEDURE BatchInsert(IN init INT, IN loop_time INT) -- 第一个参数为初始ID号（可自定义），第二个位生成MySQL记录个数
BEGIN
DECLARE Var INT;
DECLARE ID INT;
SET Var = 0;
SET ID = init;
WHILE Var < loop_time DO
insert into employees
(id, fname, lname, birth, hired, separated, job_code, store_id)
values
(ID, CONCAT('chen', ID), CONCAT('haixiang', ID), Now(), Now(), Now(), 1, ID);
SET ID = ID + 1;
SET Var = Var + 1;
END WHILE;
END;
//
delimiter ; -- 界定符改回分号
```

批量写入20万条

```
-- 开启事务插入,否则会很慢

begin;
CALL BatchInsert(1, 200000);
commit;

Query OK, 1 row affected (7.53 sec)
```

使用insert into继续批量写入

```
mysql> insert into employees select * from employees;
Query OK, 200000 rows affected (1.61 sec)
Records: 200000 Duplicates: 0 Warnings: 0

mysql> insert into employees select * from employees;
Query OK, 400000 rows affected (3.25 sec)
Records: 400000 Duplicates: 0 Warnings: 0

mysql> insert into employees select * from employees;
Query OK, 800000 rows affected (6.51 sec)
Records: 800000 Duplicates: 0 Warnings: 0

mysql> insert into employees select * from employees;
Query OK, 1600000 rows affected (12.93 sec)
Records: 1600000 Duplicates: 0 Warnings: 0

mysql> insert into employees select * from employees;
Query OK, 3200000 rows affected (28.61 sec)
Records: 3200000 Duplicates: 0 Warnings: 0

mysql> insert into employees select * from employees;
Query OK, 6400000 rows affected (56.48 sec)
Records: 6400000 Duplicates: 0 Warnings: 0

mysql> insert into employees select * from employees;
Query OK, 12800000 rows affected (1 min 55.30 sec)
Records: 12800000 Duplicates: 0 Warnings: 0
```

查询性能

mysql> select count(*) from employees; +----------+ | count(*) | +----------+ | 25600000 | +----------+ 1 row in set (6.15 sec)

求distinct性能

mysql> select count(distinct id) from employees ; +--------------------+ | count(distinct id) | +--------------------+ | 200000 | +--------------------+ 1 row in set (16.67 sec)

分组求distinct性能

mysql> select count(*) from (select id from employees group by id) t; +----------+ | count(*) | +----------+ | 200000 | +----------+ 1 row in set (15.52 sec)

再写入200万

begin; CALL BatchInsert(1, 2000000); commit;

测试表2, 写入200万.

```
CREATE TABLE employees1 (
id INT NOT NULL,
fname VARCHAR(30),
lname VARCHAR(30),
birth TIMESTAMP,
hired DATE NOT NULL DEFAULT '1970-01-01',
separated DATE NOT NULL DEFAULT '9999-12-31',
job_code INT NOT NULL,
store_id INT NOT NULL
);

DROP PROCEDURE IF EXISTS BatchInser1;

delimiter // -- 把界定符改成双斜杠
CREATE PROCEDURE BatchInsert1(IN init INT, IN loop_time INT) -- 第一个参数为初始ID号（可自定义），第二个位生成MySQL记录个数
BEGIN
DECLARE Var INT;
DECLARE ID INT;
SET Var = 0;
SET ID = init;
WHILE Var < loop_time DO
insert into employees1
(id, fname, lname, birth, hired, separated, job_code, store_id)
values
(ID, CONCAT('chen', ID), CONCAT('haixiang', ID), Now(), Now(), Now(), 1, ID);
SET ID = ID + 1;
SET Var = Var + 1;
END WHILE;
END;
//
delimiter ; -- 界定符改回分号
```

使用loop insert写入200万行

```
-- 开启事务插入,否则会很慢

begin;
CALL BatchInsert1(1, 2000000);
commit;

Query OK, 1 row affected (1 min 7.06 sec)
```

2560万多对一JOIN 200万, 分组,排序

select t1.lname,count(*) from employees t1 join employees1 t2 using (id) group by t1.lname order by count(*) desc,lname limit 10;

简单查询性能(因为以上查询几个小时都没有出结果, 不得不新建一个200万的表进行查询测试):

```
CREATE TABLE employees2 (
id INT NOT NULL,
fname VARCHAR(30),
lname VARCHAR(30),
birth TIMESTAMP,
hired DATE NOT NULL DEFAULT '1970-01-01',
separated DATE NOT NULL DEFAULT '9999-12-31',
job_code INT NOT NULL,
store_id INT NOT NULL
);

DROP PROCEDURE IF EXISTS BatchInser2;

delimiter // -- 把界定符改成双斜杠
CREATE PROCEDURE BatchInsert2(IN init INT, IN loop_time INT) -- 第一个参数为初始ID号（可自定义），第二个位生成MySQL记录个数
BEGIN
DECLARE Var INT;
DECLARE ID INT;
SET Var = 0;
SET ID = init;
WHILE Var < loop_time DO
insert into employees2
(id, fname, lname, birth, hired, separated, job_code, store_id)
values
(ID, CONCAT('chen', ID), CONCAT('haixiang', ID), Now(), Now(), Now(), 1, ID);
SET ID = ID + 1;
SET Var = Var + 1;
END WHILE;
END;
//
delimiter ; -- 界定符改回分号

-- 开启事务插入,否则会很慢

begin;
CALL BatchInsert2(1, 2000000);
commit;

Query OK, 1 row affected (1 min 7.06 sec)
```

创建索引

create index idx_employees2_1 on employees2(id);

建立查询存储过程, 查询200万次.

```
DROP PROCEDURE IF EXISTS select1;

delimiter // -- 把界定符改成双斜杠
CREATE PROCEDURE select1(IN init INT, IN loop_time INT) -- 第一个参数为初始ID号（可自定义），第二个位生成MySQL记录个数
BEGIN
DECLARE Var INT;
DECLARE ID1 INT;
DECLARE vid INT;
DECLARE vfname VARCHAR(30);
DECLARE vlname VARCHAR(30);
DECLARE vbirth TIMESTAMP;
DECLARE vhired DATE;
DECLARE vseparated DATE;
DECLARE vjob_code INT;
DECLARE vstore_id INT;
SET Var = 0;
SET ID1 = init;
WHILE Var < loop_time DO
select t.id,t.fname,t.lname,t.birth,t.hired,t.separated,t.job_code,t.store_id
into
vid,vfname,vlname,vbirth,vhired,vseparated,vjob_code,vstore_id
from employees2 t
where t.id=id1;
SET ID1 = ID1 + 1;
SET Var = Var + 1;
END WHILE;
END;
//
delimiter ; -- 界定符改回分号
```

基于KEY简单查询, 查询200万次的耗时.

```
-- 开启事务查询

begin;
CALL select1(1, 2000000);
commit;

Query OK, 1 row affected (1 min 10.23 sec)
```

MySQL 1亿+:

继续测试到1亿数据量.

```
mysql> insert into employees select * from employees;
Query OK, 27600000 rows affected (4 min 38.62 sec)
Records: 27600000 Duplicates: 0 Warnings: 0

mysql> insert into employees select * from employees;
Query OK, 55200000 rows affected (11 min 13.40 sec)
Records: 55200000 Duplicates: 0 Warnings: 0

mysql> select count() from employees;
+-----------+
| count() |
+-----------+
| 110400000 |
+-----------+
1 row in set (28.00 sec)

mysql> select count(distinct id) from employees ;
+--------------------+
| count(distinct id) |
+--------------------+
| 2000000 |
+--------------------+
1 row in set (1 min 17.73 sec)

mysql> select count() from (select id from employees group by id) t;
+----------+
| count() |
+----------+
| 2000000 |
+----------+
1 row in set (1 min 24.64 sec)
```

1.1亿全量数据更新

mysql> update employees set lname=lname||'new'; Query OK, 110400000 rows affected, 65535 warnings (21 min 30.34 sec) Rows matched: 110400000 Changed: 110400000 Warnings: 220800000

1.1亿多对一JOIN 200万, 分组,排序, 超过3小时没有查询出结果.

select t1.lname,count(*) from employees t1 join employees1 t2 using (id) group by t1.lname order by count(*) desc,lname limit 10;

1.1亿创建索引

mysql> create index idx_employees_1 on employees(id); Query OK, 0 rows affected (3 min 49.04 sec) Records: 0 Duplicates: 0 Warnings: 0

阿里云RDS PostgreSQL 12测试

测试表

直接使用srf快速写入20万数据

```
\timing

insert into employees
(id, fname, lname, birth, hired, separated, job_code, store_id)
select
ID, CONCAT('chen', ID), CONCAT('haixiang', ID), Now(), Now(), Now(), 1, ID
from generate_series(1,200000) id;

INSERT 0 200000
Time: 355.652 ms
```

也可以使用和mysql一样的方法loop insert写入20万

```
create or replace function BatchInsert(IN init INT, IN loop_time INT) -- 第一个参数为初始ID号（可自定义），第二个位生成记录个数
returns void as $$
DECLARE
Var INT := 0;
begin
for id in init..init+loop_time-1 loop
insert into employees
(id, fname, lname, birth, hired, separated, job_code, store_id)
values
(ID, CONCAT('chen', ID), CONCAT('haixiang', ID), Now(), Now(), Now(), 1, ID);
end loop;
end;
$$ language plpgsql strict;

db1=# select batchinsert(1,200000);
batchinsert

(1 row)
Time: 1292.559 ms (00:01.293)
```

使用insert into继续批量写入

db1=> insert into employees select * from employees ; INSERT 0 400000 Time: 322.335 ms db1=> insert into employees select * from employees ; INSERT 0 800000 Time: 835.365 ms db1=> insert into employees select * from employees ; INSERT 0 1600000 Time: 1622.475 ms (00:01.622) db1=> insert into employees select * from employees ; INSERT 0 3200000 Time: 3583.787 ms (00:03.584) db1=> insert into employees select * from employees ; INSERT 0 6400000 Time: 7277.764 ms (00:07.278) db1=> insert into employees select * from employees ; INSERT 0 12800000 Time: 15639.482 ms (00:15.639)

查询性能

```
db1=> select count(*) from employees ;
count

25600000
(1 row)

Time: 604.982 ms
```

求distinct性能

```
db1=> select count(distinct id) from employees ;
count

200000
(1 row)

Time: 7852.604 ms (00:07.853)
```

分组求distinct性能

```
db1=> select count(*) from (select id from employees group by id) t;
count

200000
(1 row)

Time: 2982.907 ms (00:02.983)
```

再写入200万

insert into employees (id, fname, lname, birth, hired, separated, job_code, store_id) select ID, CONCAT('chen', ID), CONCAT('haixiang', ID), Now(), Now(), Now(), 1, ID from generate_series(1,2000000) id;

测试表2, 写入200万.

insert into employees1
(id, fname, lname, birth, hired, separated, job_code, store_id)
select
ID, CONCAT('chen', ID), CONCAT('haixiang', ID), Now(), Now(), Now(), 1, ID
from generate_series(1,2000000) id;

INSERT 0 2000000
Time: 3037.777 ms (00:03.038)
```

2560万多对一JOIN 200万, 分组,排序

Time: 8897.907 ms (00:08.898)
```

简单查询性能:

创建索引

create index idx_employees1_1 on employees1(id); CREATE INDEX Time: 1436.346 ms (00:01.436)

基于KEY简单查询, 查询200万次的耗时.

```
do language plpgsql $$
declare
begin
for i in 1..2000000 loop
perform * from employees1 where id=i;
end loop;
end;
$$;

DO
Time: 9515.728 ms (00:09.516)
```

```
db1=> select 9515.728/2000000;
?column?

0.00475786400000000000
(1 row)
```

PG 1亿+:

```
db1=> INSERT INTO employees select * from employees;
INSERT 0 27600000
Time: 25050.665 ms (00:25.051)

db1=> INSERT INTO employees select * from employees;
INSERT 0 55200000
Time: 64726.430 ms (01:04.726)
```

```
db1=> select count(*) from employees;
count

110400000
(1 row)

Time: 7286.152 ms (00:07.286)
```

```
db1=> select count(distinct id) from employees;
count

2000000
(1 row)

Time: 39783.068 ms (00:39.783)
```

```
db1=> select count(*) from (select id from employees group by id) t;
count

2000000
(1 row)

Time: 14668.305 ms (00:14.668)
```

Time: 33731.431 ms (00:33.731)
```

更新1.1亿

db1=> update employees set lname=lname||'new'; UPDATE 110400000 Time: 385372.063 ms (06:25.372)

创建索引:

db1=> create index idx_employees_1 on employees(id); CREATE INDEX Time: 70450.491 ms (01:10.450)

MySQL vs PG 性能报表

8核32G 1500G essd云盘， MySQL 8.0 vs PG 12

数据量 | sql | MySQL耗时 | PG耗时 | PG vs MySQL性能倍数
---|---|---|---|---
20万 | {写入}
存储过程loop insert | 7.53 s | 1.29 s | 5.84
20万 | {写入}
SRF insert | 不支持 | 0.36 s | -
40万 | {写入}
INSERT INTO employees select * from employees; | 3.25 s | 0.32 s | 10.16
80万 | {写入}
INSERT INTO employees select * from employees; | 6.51 s | 0.84 s | 7.75
160万 | {写入}
INSERT INTO employees select * from employees; | 12.93 s | 1.62 s | 7.95
320万 | {写入}
INSERT INTO employees select * from employees; | 28.61 s | 3.58 s | 7.99
640万 | {写入}
INSERT INTO employees select * from employees; | 56.48 s | 7.28 s | 7.76
1280万 | {写入}
INSERT INTO employees select * from employees; | 115.30 s | 15.64 s | 7.37
2760万 | {写入}
INSERT INTO employees select * from employees; | 278.62 s | 25.05 s | 11.12
5520万 | {写入}
INSERT INTO employees select * from employees; | 673.40 s | 64.73 s | 10.40
200万 | {普通查询}
KV查询200万次.
PS: 进程模型,建议实际应用时使用连接池，总连接控制在1000以内绝佳,未来支持内置线程池,几万连接完全没问题. | 70.23 s | 9.52 s | 7.38
2560万 | {复杂查询}
select count(*) from employees; | 6.15 s | 0.60 s | 10.25
2560万 | {复杂查询}
select count(distinct id) from employees; | 16.67 s | 7.85 s | 2.12
2560万 | {复杂查询}
select count(*) from (select id from employees group by id) t; | 15.52 s | 2.98 s | 5.21
1.1亿 | {复杂查询}
select count(*) from employees; | 28 s | 7.29 s | 3.84
1.1亿 | {复杂查询}
select count(distinct id) from employees; | 77.73 s | 39.78 s | 1.95
1.1亿 | {复杂查询}
select count(*) from (select id from employees group by id) t; | 84.64 s | 14.67 s | 5.77
2760万多对一JOIN 200万 | {JOIN + 运算}
select t1.lname,count(*) from employees t1 join employees1 t2 using (id) group by t1.lname order by count(*) desc,lname limit 10; | 超过3小时未出结果 | 8.90 s | 至少 1213.48
1.1亿多对一JOIN 200万 | {JOIN + 运算}
select t1.lname,count(*) from employees t1 join employees1 t2 using (id) group by t1.lname order by count(*) desc,lname limit 10; | 超过3小时未出结果 | 33.73 s | 至少 320.19
1.1亿 | {更新}
update employees set lname=concat(lname,'new'); | 1290.34 s | 70.45 s | 18.32
1.1亿 | {创建索引}
create index idx_employees_1 on employees(id); | 229.04 s | 70.45 s | 3.25