DorisDB、TiDB/TiFlash、ClickHouse性能对比-单多表场景
--2021-02-07 刘春雷
因分析型业务猛烈增长,本文将对3种OLAP列存数据库进行性能对比,功能及易运维等方面此次先不讨论~
因个人刚刚接触不久,如有错误的地方,大家参考着看哈~
1、汇总
1.1、结论
DorisDB总体时间最短
DorisDB最快次数最多,ClickHouse次之
TiDB/TiFlash总体时间最长
TiDB执行计划多数走TiKV,导致执行时间长,且数据量越多,执行时间越长
TiDB强制走TiFlash ,单表多数提速多,多表多数变慢,但4.0.10 版本的执行计划多数不走
ClickHouse多表查询均报错,需要转换成大单表
1.2、单表查询
1.3、多表Join
2、信息
2.1、数据库信息
数据库均为默认配置,无特殊参数调整,可能会导致相关测试没有达到性能极致
2.2、表信息
表名 | 行数 | 解释 |
lineorder | 6亿 | SSB商品订单表 |
customer | 300万 | SSB客户表 |
part | 140万 | SSB 零部件表 |
supplier | 20万 | SSB 供应商表 |
dates | 2556 | 日期表 |
lineorder_flat | 6亿 | SSB打平后的宽表 |
3、DorisDB模拟数据
3.1、构建数据
【安装测试工具】
https://www.dorisdb.com/zh-CN/blog/1.8
wget http://dorisdb-public.oss-cn-zhangjiakou.aliyuncs.com/ssb-poc-0.9.zip
tar xvf ssb-poc.tar.gz
cd ssb-poc
make && make install
【构建数据】
cd output
bin/gen-ssb.sh 100 data_dir
【修改配置】:
conf/doris.conf
【执行脚本建表】:
bin/create_db_table.sh ddl_100
sql: ssb_create success
sql: ssb_flat_create success
【使用Stream load导入单表数据】
bin/stream_load.sh data_dir
stream load start. table: lineorder, path: data_dir/lineorder.tbl.1
.......
stream load success. table: lineorder, path: data_dir/lineorder.tbl.96
stream load start. table: customer, path: data_dir/customer.tbl
stream load success. table: customer, path: data_dir/customer.tbl
stream load start. table: dates, path: data_dir/dates.tbl
stream load success. table: dates, path: data_dir/dates.tbl
stream load start. table: part, path: data_dir/part.tbl
stream load success. table: part, path: data_dir/part.tbl
stream load start. table: supplier, path: data_dir/supplier.tbl
stream load success. table: supplier, path: data_dir/supplier.tbl
【插入数据到宽表lineorder_flat】
bin/flat_insert.sh
sql: ssb_flat_insert start
sql: ssb_flat_insert success
3.2、测试SQL
首先在客户端执行命令,修改Doris的并行度(类似clickhouse set max_threads= 8)
mysql -h127.0.0.1 -uroot -pxxx -P9030
set global parallel_fragment_exec_instance_num = 8;
【多表】:
bin/benchmark.sh -p -d ssb
------ dataset: ssb, concurrency: 1 ------
sql\time(ms)\parallel_num 1
q1 326.0
q2 214.0
q3 199.0
q4 2316.0
q5 1474.0
q6 1158.0
q7 4683.0
q8 1878.0
q9 1418.0
q10 277.0
q11 4675.0
q12 1346.0
q13 736.0
【单表】:
bin/benchmark.sh -p -d ssb-flat
------ dataset: ssb-flat, concurrency: 1 ------
sql\time(ms)\parallel_num 1
q1 294.0
q2 169.0
q3 172.0
q4 307.0
q5 366.0
q6 172.0
q7 584.0
q8 241.0
q9 134.0
q10 136.0
q11 391.0
q12 433.0
q13 270.0
4、ClickHouse模拟数据
4.1、构建数据
【下载】:
https://clickhouse.tech/docs/en/getting-started/example-datasets/star-schema/
【参考文档执行】
https://github.com/vadimtk/ssb-dbgen
【生成数据】
cd ssb-dbgen-master
./dbgen -s 100 -T c
./dbgen -s 100 -T l
./dbgen -s 100 -T p
SSBM (Star Schema Benchmark) Population Generator (Version 1.0.0)
Copyright Transaction Processing Performance Council 1994 - 2000
./dbgen -s 100 -T s
SSBM (Star Schema Benchmark) Population Generator (Version 1.0.0)
Copyright Transaction Processing Performance Council 1994 - 2000
./dbgen -s 100 -T d
SSBM (Star Schema Benchmark) Population Generator (Version 1.0.0)
Copyright Transaction Processing Performance Council 1994 - 2000
【创建表】:
clickhouse-client --user=default --host=127.0.0.1
use lcl
CREATE TABLE customer( C_CUSTKEY UInt32, C_NAME String, C_ADDRESS String, C_CITY LowCardinality(String), C_NATION LowCardinality(String), C_REGION LowCardinality(String), C_PHONE String, C_MKTSEGMENT LowCardinality(String))ENGINE = MergeTree ORDER BY (C_CUSTKEY);
CREATE TABLE lineorder( LO_ORDERKEY UInt32, LO_LINENUMBER UInt8, LO_CUSTKEY UInt32, LO_PARTKEY UInt32, LO_SUPPKEY UInt32, LO_ORDERDATE Date, LO_ORDERPRIORITY LowCardinality(String), LO_SHIPPRIORITY UInt8, LO_QUANTITY UInt8, LO_EXTENDEDPRICE UInt32, LO_ORDTOTALPRICE UInt32, LO_DISCOUNT UInt8, LO_REVENUE UInt32, LO_SUPPLYCOST UInt32, LO_TAX UInt8, LO_COMMITDATE Date, LO_SHIPMODE LowCardinality(String))ENGINE = MergeTree PARTITION BY toYear(LO_ORDERDATE) ORDER BY (LO_ORDERDATE, LO_ORDERKEY);
CREATE TABLE part( P_PARTKEY UInt32, P_NAME String, P_MFGR LowCardinality(String), P_CATEGORY LowCardinality(String), P_BRAND LowCardinality(String), P_COLOR LowCardinality(String), P_TYPE LowCardinality(String), P_SIZE UInt8, P_CONTAINER LowCardinality(String))ENGINE = MergeTree ORDER BY P_PARTKEY;
CREATE TABLE supplier( S_SUPPKEY UInt32, S_NAME String, S_ADDRESS String, S_CITY LowCardinality(String), S_NATION LowCardinality(String), S_REGION LowCardinality(String), S_PHONE String)ENGINE = MergeTree ORDER BY S_SUPPKEY;
【导入数据】:
clickhouse-client --user=default --host=127.0.0.1 --query "INSERT INTO customer FORMAT CSV" < customer.tbl
clickhouse-client --user=default --host=127.0.0.1 --query "INSERT INTO part FORMAT CSV" < part.tbl
clickhouse-client --user=default --host=127.0.0.1 --query "INSERT INTO supplier FORMAT CSV" < supplier.tbl
clickhouse-client --user=default --host=127.0.0.1 --query "INSERT INTO lineorder FORMAT CSV" < lineorder.tbl
【创建大单表及灌入数据】:
CREATE TABLE lineorder_flat ENGINE = MergeTree PARTITION BY toYear(LO_ORDERDATE) ORDER BY (LO_ORDERDATE, LO_ORDERKEY) AS SELECT l.LO_ORDERKEY AS LO_ORDERKEY, l.LO_LINENUMBER AS LO_LINENUMBER, l.LO_CUSTKEY AS LO_CUSTKEY, l.LO_PARTKEY AS LO_PARTKEY, l.LO_SUPPKEY AS LO_SUPPKEY, l.LO_ORDERDATE AS LO_ORDERDATE, l.LO_ORDERPRIORITY AS LO_ORDERPRIORITY, l.LO_SHIPPRIORITY AS LO_SHIPPRIORITY, l.LO_QUANTITY AS LO_QUANTITY, l.LO_EXTENDEDPRICE AS LO_EXTENDEDPRICE, l.LO_ORDTOTALPRICE AS LO_ORDTOTALPRICE, l.LO_DISCOUNT AS LO_DISCOUNT, l.LO_REVENUE AS LO_REVENUE, l.LO_SUPPLYCOST AS LO_SUPPLYCOST, l.LO_TAX AS LO_TAX, l.LO_COMMITDATE AS LO_COMMITDATE, l.LO_SHIPMODE AS LO_SHIPMODE, c.C_NAME AS C_NAME, c.C_ADDRESS AS C_ADDRESS, c.C_CITY AS C_CITY, c.C_NATION AS C_NATION, c.C_REGION AS C_REGION, c.C_PHONE AS C_PHONE, c.C_MKTSEGMENT AS C_MKTSEGMENT, s.S_NAME AS S_NAME, s.S_ADDRESS AS S_ADDRESS, s.S_CITY AS S_CITY, s.S_NATION AS S_NATION, s.S_REGION AS S_REGION, s.S_PHONE AS S_PHONE, p.P_NAME AS P_NAME, p.P_MFGR AS P_MFGR, p.P_CATEGORY AS P_CATEGORY, p.P_BRAND AS P_BRAND, p.P_COLOR AS P_COLOR, p.P_TYPE AS P_TYPE, p.P_SIZE AS P_SIZE, p.P_CONTAINER AS P_CONTAINER FROM lineorder AS l INNER JOIN customer AS c ON c.C_CUSTKEY = l.LO_CUSTKEY INNER JOIN supplier AS s ON s.S_SUPPKEY = l.LO_SUPPKEY INNER JOIN part AS p ON p.P_PARTKEY = l.LO_PARTKEY;
速度:
Progress: 38.35 million rows, 1.77 GB (263.42 thousand rows/s., 12.14 MB/s.)
4.2、测试SQL
4.3、多表join测试
导入dates表 完成
因SQL均报错,本人比较懒惰,尝试修复失败....所以ClickHouse 多表Join为无,测试具体如下
5、TiDB模拟数据
5.1、构建数据
use lcl
创建相关表
【导入数据】:
for i in {1..100};do echo "----------$i-------------";mysql -uroot -pxxx -h127.0.0.1 -Pxxx lcl -e "LOAD DATA LOCAL INFILE 'lineorder.tbl.$i' INTO TABLE lineorder FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n'";done
mysql -uroot -pxxx -h127.0.0.1 -Pxxx lcl -e "LOAD DATA LOCAL INFILE '/opt/soft/data_dir/dates.tbl' INTO TABLE dates FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n'"
mysql -uroot -pxxx -h127.0.0.1 -Pxxx lcl -e "LOAD DATA LOCAL INFILE '/opt/soft/data_dir/customer.tbl' INTO TABLE customer FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n'"
mysql -uroot -pxxx -h127.0.0.1 -Pxxx lcl -e "LOAD DATA LOCAL INFILE '/opt/soft/data_dir/supplier.tbl' INTO TABLE supplier FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n'"
mysql -uroot -pxxx -h127.0.0.1 -Pxxx lcl -e "LOAD DATA LOCAL INFILE '/opt/soft/data_dir/part.tbl' INTO TABLE part FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n'"
【添加至TiFlash】:
alter table lineorder SET TIFLASH REPLICA 1;
alter table customer SET TIFLASH REPLICA 1;
alter table dates SET TIFLASH REPLICA 1;
alter table part SET TIFLASH REPLICA 1;
alter table supplier SET TIFLASH REPLICA 1;
【查看】:
SELECT * FROM information_schema.tiflash_replica ;
【汇聚数据到大单表】:
导出:
time ./dumpling -uroot -pxxx -h127.0.0.1 -Pxxx -o dumping_lineorder_flat/ --filetype sql --no-header -F 256MiB --threads 4 --sql 'SELECT `LO_ORDERKEY` , `LO_ORDERDATE` , `LO_LINENUMBER` , `LO_CUSTKEY` , `LO_PARTKEY` , `LO_SUPPKEY` , `LO_ORDERPRIORITY` , `LO_SHIPPRIORITY` , `LO_QUANTITY` , `LO_EXTENDEDPRICE` , `LO_ORDTOTALPRICE` , `LO_DISCOUNT` , `LO_REVENUE` , `LO_SUPPLYCOST` , `LO_TAX` , `LO_COMMITDATE` , `LO_SHIPMODE` , `C_NAME` , `C_ADDRESS` , `C_CITY` , `C_NATION` , `C_REGION` , `C_PHONE` , `C_MKTSEGMENT` , `S_NAME` , `S_ADDRESS` , `S_CITY` , `S_NATION` , `S_REGION` , `S_PHONE` , `P_NAME` , `P_MFGR` , `P_CATEGORY` , `P_BRAND` , `P_COLOR` , `P_TYPE` , `P_SIZE` , `P_CONTAINER` FROM lcl.lineorder l INNER JOIN lcl.customer c ON (c.C_CUSTKEY = l.LO_CUSTKEY) INNER JOIN lcl.supplier s ON (s.S_SUPPKEY = l.LO_SUPPKEY) INNER JOIN lcl.part p ON (p.P_PARTKEY = l.LO_PARTKEY) '
【导入到大单表】:
【模拟loader 相关文件-创建文件】:
lcl.lineorder_flat-schema.sql
lcl-schema-create.sql
【修改文件名】:
for i in {000..885};do echo "------- $i -----";mv result.000000$i.sql lcl.lineorder_flat.000000$i.sql;done
【替换文件内容】:
for i in {000..885};do echo "------- $i -----";sed -i 's/INSERT INTO ``/INSERT INTO `lineorder_flat`/g' lcl.lineorder_flat.000000$i.sql;done
【导入至大单表】:
cd /opt/soft/tidb-enterprise-tools-nightly-linux-amd64
./bin/loader -c=loader.toml
5.2、测试SQL
参考上面结果: