暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

DorisDB、TiDB/TiFlash、ClickHouse性能对比-单多表场景

雷雷DBA 2021-02-25
5333

DorisDB、TiDB/TiFlash、ClickHouse性能对比-单多表场景

                                                                                            --2021-02-07  刘春雷

因分析型业务猛烈增长,本文将对3种OLAP列存数据库进行性能对比,功能及易运维等方面此次先不讨论~

因个人刚刚接触不久,如有错误的地方,大家参考着看哈~

1、汇总

1.1、结论

  • DorisDB总体时间最短

  • DorisDB最快次数最多,ClickHouse次之

  • TiDB/TiFlash总体时间最长

  • TiDB执行计划多数走TiKV,导致执行时间长,且数据量越多,执行时间越长

  • TiDB强制走TiFlash ,单表多数提速多,多表多数变慢,但4.0.10 版本的执行计划多数不走

  • ClickHouse多表查询均报错,需要转换成大单表

1.2、单表查询


1.3、多表Join


2、信息

2.1、数据库信息

数据库均为默认配置,无特殊参数调整,可能会导致相关测试没有达到性能极致

2.2、表信息

表名行数解释
lineorder6亿SSB商品订单表
customer300万SSB客户表
part140万SSB 零部件表
supplier20万SSB 供应商表
dates2556日期表
lineorder_flat6亿SSB打平后的宽表

3、DorisDB模拟数据

3.1、构建数据

【安装测试工具】

https://www.dorisdb.com/zh-CN/blog/1.8

wget http://dorisdb-public.oss-cn-zhangjiakou.aliyuncs.com/ssb-poc-0.9.zip

tar xvf ssb-poc.tar.gz
cd ssb-poc
make && make install  

【构建数据】

cd output
bin/gen-ssb.sh 100 data_dir

【修改配置】:

conf/doris.conf

【执行脚本建表】:

bin/create_db_table.sh ddl_100

sql: ssb_create success
sql: ssb_flat_create success

【使用Stream load导入单表数据】

bin/stream_load.sh data_dir

stream load start. table: lineorder, path: data_dir/lineorder.tbl.1
.......
stream load success. table: lineorder, path: data_dir/lineorder.tbl.96
stream load start. table: customer, path: data_dir/customer.tbl
stream load success. table: customer, path: data_dir/customer.tbl
stream load start. table: dates, path: data_dir/dates.tbl
stream load success. table: dates, path: data_dir/dates.tbl
stream load start. table: part, path: data_dir/part.tbl
stream load success. table: part, path: data_dir/part.tbl
stream load start. table: supplier, path: data_dir/supplier.tbl
stream load success. table: supplier, path: data_dir/supplier.tbl

【插入数据到宽表lineorder_flat】

bin/flat_insert.sh

sql: ssb_flat_insert start
sql: ssb_flat_insert success

3.2、测试SQL

首先在客户端执行命令,修改Doris的并行度(类似clickhouse set max_threads= 8)

mysql -h127.0.0.1 -uroot -pxxx  -P9030

set global  parallel_fragment_exec_instance_num  = 8;

【多表】:

bin/benchmark.sh -p -d ssb

------ dataset: ssb, concurrency: 1 ------
sql\time(ms)\parallel_num       1
q1      326.0
q2      214.0
q3      199.0
q4      2316.0
q5      1474.0
q6      1158.0
q7      4683.0
q8      1878.0
q9      1418.0
q10     277.0
q11     4675.0
q12     1346.0
q13     736.0

【单表】:

bin/benchmark.sh -p -d ssb-flat

------ dataset: ssb-flat, concurrency: 1 ------
sql\time(ms)\parallel_num       1
q1      294.0
q2      169.0
q3      172.0
q4      307.0
q5      366.0
q6      172.0
q7      584.0
q8      241.0
q9      134.0
q10     136.0
q11     391.0
q12     433.0
q13     270.0

4、ClickHouse模拟数据

4.1、构建数据

【下载】:

https://clickhouse.tech/docs/en/getting-started/example-datasets/star-schema/

【参考文档执行】

https://github.com/vadimtk/ssb-dbgen

【生成数据】

cd ssb-dbgen-master

./dbgen -s 100 -T c

./dbgen -s 100 -T l

./dbgen -s 100 -T p
SSBM (Star Schema Benchmark) Population Generator (Version 1.0.0)
Copyright Transaction Processing Performance Council 1994 - 2000

 ./dbgen -s 100 -T s
SSBM (Star Schema Benchmark) Population Generator (Version 1.0.0)
Copyright Transaction Processing Performance Council 1994 - 2000

./dbgen -s 100 -T d
SSBM (Star Schema Benchmark) Population Generator (Version 1.0.0)
Copyright Transaction Processing Performance Council 1994 - 2000

【创建表】:

clickhouse-client --user=default --host=127.0.0.1
use lcl

CREATE TABLE customer(        C_CUSTKEY       UInt32,        C_NAME          String,        C_ADDRESS       String,        C_CITY          LowCardinality(String),        C_NATION        LowCardinality(String),        C_REGION        LowCardinality(String),        C_PHONE         String,        C_MKTSEGMENT    LowCardinality(String))ENGINE = MergeTree ORDER BY (C_CUSTKEY);
CREATE TABLE lineorder(    LO_ORDERKEY             UInt32,    LO_LINENUMBER           UInt8,    LO_CUSTKEY              UInt32,    LO_PARTKEY              UInt32,    LO_SUPPKEY              UInt32,    LO_ORDERDATE            Date,    LO_ORDERPRIORITY        LowCardinality(String),    LO_SHIPPRIORITY         UInt8,    LO_QUANTITY             UInt8,    LO_EXTENDEDPRICE        UInt32,    LO_ORDTOTALPRICE        UInt32,    LO_DISCOUNT             UInt8,    LO_REVENUE              UInt32,    LO_SUPPLYCOST           UInt32,    LO_TAX                  UInt8,    LO_COMMITDATE           Date,    LO_SHIPMODE             LowCardinality(String))ENGINE = MergeTree PARTITION BY toYear(LO_ORDERDATE) ORDER BY (LO_ORDERDATE, LO_ORDERKEY);
CREATE TABLE part(        P_PARTKEY       UInt32,        P_NAME          String,        P_MFGR          LowCardinality(String),        P_CATEGORY      LowCardinality(String),        P_BRAND         LowCardinality(String),        P_COLOR         LowCardinality(String),        P_TYPE          LowCardinality(String),        P_SIZE          UInt8,        P_CONTAINER     LowCardinality(String))ENGINE = MergeTree ORDER BY P_PARTKEY;
CREATE TABLE supplier(        S_SUPPKEY       UInt32,        S_NAME          String,        S_ADDRESS       String,        S_CITY          LowCardinality(String),        S_NATION        LowCardinality(String),        S_REGION        LowCardinality(String),        S_PHONE         String)ENGINE = MergeTree ORDER BY S_SUPPKEY;

【导入数据】:

clickhouse-client --user=default --host=127.0.0.1 --query "INSERT INTO customer FORMAT CSV" < customer.tbl
clickhouse-client --user=default --host=127.0.0.1 --query "INSERT INTO part FORMAT CSV" < part.tbl
clickhouse-client --user=default --host=127.0.0.1 --query  "INSERT INTO supplier FORMAT CSV" < supplier.tbl
clickhouse-client --user=default --host=127.0.0.1 --query  "INSERT INTO lineorder FORMAT CSV" < lineorder.tbl

【创建大单表及灌入数据】:

CREATE TABLE lineorder_flat ENGINE = MergeTree PARTITION BY toYear(LO_ORDERDATE) ORDER BY (LO_ORDERDATE, LO_ORDERKEY) AS SELECT     l.LO_ORDERKEY AS LO_ORDERKEY,     l.LO_LINENUMBER AS LO_LINENUMBER,     l.LO_CUSTKEY AS LO_CUSTKEY,     l.LO_PARTKEY AS LO_PARTKEY,     l.LO_SUPPKEY AS LO_SUPPKEY,     l.LO_ORDERDATE AS LO_ORDERDATE,     l.LO_ORDERPRIORITY AS LO_ORDERPRIORITY,     l.LO_SHIPPRIORITY AS LO_SHIPPRIORITY,     l.LO_QUANTITY AS LO_QUANTITY,     l.LO_EXTENDEDPRICE AS LO_EXTENDEDPRICE,     l.LO_ORDTOTALPRICE AS LO_ORDTOTALPRICE,     l.LO_DISCOUNT AS LO_DISCOUNT,     l.LO_REVENUE AS LO_REVENUE,     l.LO_SUPPLYCOST AS LO_SUPPLYCOST,     l.LO_TAX AS LO_TAX,     l.LO_COMMITDATE AS LO_COMMITDATE,     l.LO_SHIPMODE AS LO_SHIPMODE,     c.C_NAME AS C_NAME,     c.C_ADDRESS AS C_ADDRESS,     c.C_CITY AS C_CITY,     c.C_NATION AS C_NATION,     c.C_REGION AS C_REGION,     c.C_PHONE AS C_PHONE,     c.C_MKTSEGMENT AS C_MKTSEGMENT,     s.S_NAME AS S_NAME,     s.S_ADDRESS AS S_ADDRESS,     s.S_CITY AS S_CITY,     s.S_NATION AS S_NATION,     s.S_REGION AS S_REGION,     s.S_PHONE AS S_PHONE,     p.P_NAME AS P_NAME,     p.P_MFGR AS P_MFGR,     p.P_CATEGORY AS P_CATEGORY,     p.P_BRAND AS P_BRAND,     p.P_COLOR AS P_COLOR,     p.P_TYPE AS P_TYPE,     p.P_SIZE AS P_SIZE,     p.P_CONTAINER AS P_CONTAINER FROM lineorder AS l INNER JOIN customer AS c ON c.C_CUSTKEY = l.LO_CUSTKEY INNER JOIN supplier AS s ON s.S_SUPPKEY = l.LO_SUPPKEY INNER JOIN part AS p ON p.P_PARTKEY = l.LO_PARTKEY;

速度:

Progress: 38.35 million rows, 1.77 GB (263.42 thousand rows/s., 12.14 MB/s.)

4.2、测试SQL


4.3、多表join测试

导入dates表 完成

因SQL均报错,本人比较懒惰,尝试修复失败....所以ClickHouse 多表Join为无,测试具体如下


5、TiDB模拟数据

5.1、构建数据

use lcl

创建相关表

【导入数据】:

for i in {1..100};do echo "----------$i-------------";mysql -uroot -pxxx  -h127.0.0.1 -Pxxx  lcl -e "LOAD DATA LOCAL INFILE 'lineorder.tbl.$i' INTO TABLE lineorder FIELDS TERMINATED BY '|'  LINES TERMINATED BY '\n'";done

mysql -uroot -pxxx -h127.0.0.1 -Pxxx lcl -e "LOAD DATA LOCAL INFILE '/opt/soft/data_dir/dates.tbl' INTO TABLE dates FIELDS TERMINATED BY '|'  LINES TERMINATED BY '\n'"

 mysql -uroot -pxxx -h127.0.0.1 -Pxxx lcl -e "LOAD DATA LOCAL INFILE '/opt/soft/data_dir/customer.tbl' INTO TABLE customer FIELDS TERMINATED BY '|'  LINES TERMINATED BY '\n'" 

mysql -uroot -pxxx -h127.0.0.1 -Pxxx lcl -e "LOAD DATA LOCAL INFILE '/opt/soft/data_dir/supplier.tbl' INTO TABLE supplier FIELDS TERMINATED BY '|'  LINES TERMINATED BY '\n'"

 mysql -uroot -pxxx -h127.0.0.1 -Pxxx lcl -e "LOAD DATA LOCAL INFILE '/opt/soft/data_dir/part.tbl' INTO TABLE part FIELDS TERMINATED BY '|'  LINES TERMINATED BY '\n'" 

【添加至TiFlash】:

alter table lineorder SET TIFLASH REPLICA 1;    

alter table customer SET TIFLASH REPLICA 1;    

alter table dates SET TIFLASH REPLICA 1;        

alter table part SET TIFLASH REPLICA 1;     
alter table supplier SET TIFLASH REPLICA 1;    

【查看】:

SELECT * FROM information_schema.tiflash_replica ;

【汇聚数据到大单表】:

导出:

time ./dumpling -uroot -pxxx -h127.0.0.1 -Pxxx -o dumping_lineorder_flat/ --filetype sql --no-header -F 256MiB --threads 4 --sql 'SELECT `LO_ORDERKEY` , `LO_ORDERDATE` , `LO_LINENUMBER` , `LO_CUSTKEY` , `LO_PARTKEY` , `LO_SUPPKEY` , `LO_ORDERPRIORITY` , `LO_SHIPPRIORITY` , `LO_QUANTITY` , `LO_EXTENDEDPRICE` , `LO_ORDTOTALPRICE` , `LO_DISCOUNT` , `LO_REVENUE` , `LO_SUPPLYCOST` , `LO_TAX` , `LO_COMMITDATE` , `LO_SHIPMODE` , `C_NAME` , `C_ADDRESS` , `C_CITY` , `C_NATION` , `C_REGION` , `C_PHONE` , `C_MKTSEGMENT` , `S_NAME` , `S_ADDRESS` , `S_CITY` , `S_NATION` , `S_REGION` ,  `S_PHONE` , `P_NAME` , `P_MFGR` , `P_CATEGORY` , `P_BRAND` , `P_COLOR` , `P_TYPE` , `P_SIZE` , `P_CONTAINER`  FROM lcl.lineorder l INNER JOIN  lcl.customer c  ON (c.C_CUSTKEY = l.LO_CUSTKEY)  INNER JOIN lcl.supplier s  ON (s.S_SUPPKEY = l.LO_SUPPKEY)  INNER JOIN lcl.part p  ON  (p.P_PARTKEY = l.LO_PARTKEY) '     

【导入到大单表】:

【模拟loader 相关文件-创建文件】:

lcl.lineorder_flat-schema.sql

lcl-schema-create.sql

【修改文件名】:

for i in {000..885};do echo "------- $i -----";mv result.000000$i.sql lcl.lineorder_flat.000000$i.sql;done

【替换文件内容】:

for i in {000..885};do echo "------- $i -----";sed -i 's/INSERT INTO ``/INSERT INTO `lineorder_flat`/g'  lcl.lineorder_flat.000000$i.sql;done 

【导入至大单表】:

cd /opt/soft/tidb-enterprise-tools-nightly-linux-amd64

./bin/loader -c=loader.toml

5.2、测试SQL

参考上面结果:


文章转载自雷雷DBA,如果涉嫌侵权,请发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论