
Greenplum: Row Store vs Column Store for Large Wide Tables

Author: digoal

Date: 2018-04-24

Tags: PostgreSQL , Greenplum , wide table , row store , column store


Background

GPDB supports both row storage and column storage. We have analyzed the selection principles in detail in earlier articles, and we have also run fairly extensive performance comparisons between the two.

What we had not yet measured is how large the gap becomes on a wide table. The main differences come from tuple deforming and from how much storage a query has to scan (in theory, the more data there is and the fewer columns a query touches, the greater the column store's advantage; likewise, the further toward the end of the row the accessed column sits, the greater the advantage).
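To make the scan-volume argument concrete, here is a back-of-envelope sketch in plain Python, using the dimensions of the test below (1,000 INT8 columns, 1,000,000 rows). It is illustrative only: it ignores tuple headers, padding, and compression, but the row-store figure lands close to the 7653 MB measured in the size comparison below.

```python
# Back-of-envelope: bytes a scan must read to answer a one-column
# aggregate over a 1,000-column INT8 table with 1,000,000 rows.
# (Illustrative only; ignores tuple headers, padding and compression.)

N_ROWS = 1_000_000
N_COLS = 1_000
INT8_BYTES = 8

# Row store: every tuple is read in full, even when only one column
# is referenced, and the tuple must be deformed to reach that column.
row_store_bytes = N_ROWS * N_COLS * INT8_BYTES

# Column store: only the referenced column's storage is read.
column_store_bytes = N_ROWS * 1 * INT8_BYTES

print(row_store_bytes // 2**20, "MB vs", column_store_bytes // 2**20, "MB")
```

The raw data volume alone differs by a factor equal to the column count, which is why the gap widens as tables get wider.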

Some background on the underlying principles:

《PostgreSQL VOPS 向量计算 + DBLINK异步并行 - 单实例 10亿 聚合计算跑进2秒》

《PostgreSQL 向量化执行插件(瓦片式实现-vops) 10x提速OLAP》

For guidance on choosing between the two, see the articles linked at the end of this post.

Test

1. Create a function that builds a table with 1,000 columns (each of type INT8)

```
create or replace function crt_tbl(name, text, text) returns void as $$
declare
  sql text;
begin
  sql := 'create table '||$1||'(';
  for i in 1..1000 loop
    sql := sql||'c'||i||' int8 default random()*10000,';
  end loop;
  sql := rtrim(sql, ',');
  sql := sql||') with (APPENDONLY=true, ORIENTATION='||$2||', COMPRESSTYPE='||$3||')';
  execute sql;
end;
$$ language plpgsql strict;
```

2. Create the row-store tables

```
select crt_tbl('tbl_1000_row_nonc', 'row', 'none');

select crt_tbl('tbl_1000_row_c', 'row', 'zlib');
```

3. Create the column-store tables

```
select crt_tbl('tbl_1000_column_nonc', 'column', 'none');

select crt_tbl('tbl_1000_column_c', 'column', 'zlib');
```

4. Insert 1,000,000 rows of test data

```
postgres=# insert into tbl_1000_row_nonc (c1) select generate_series(1,1000000);
INSERT 0 1000000
Time: 221003.467 ms
postgres=# insert into tbl_1000_row_c select * from tbl_1000_row_nonc;
INSERT 0 1000000
Time: 12298.931 ms
postgres=# insert into tbl_1000_column_nonc select * from tbl_1000_row_nonc;
INSERT 0 1000000
Time: 23332.039 ms
postgres=# insert into tbl_1000_column_c select * from tbl_1000_row_nonc;
INSERT 0 1000000
Time: 17017.119 ms
```

5. Size comparison

```
postgres=# select pg_size_pretty(pg_total_relation_size('tbl_1000_row_nonc'));
 pg_size_pretty
----------------
 7653 MB
(1 row)

Time: 282.180 ms
postgres=# select pg_size_pretty(pg_total_relation_size('tbl_1000_column_nonc'));
 pg_size_pretty
----------------
 7647 MB
(1 row)

Time: 55.315 ms

postgres=# select pg_size_pretty(pg_total_relation_size('tbl_1000_row_c'));
 pg_size_pretty
----------------
 2522 MB
(1 row)

Time: 56.017 ms
postgres=# select pg_size_pretty(pg_total_relation_size('tbl_1000_column_c'));
 pg_size_pretty
----------------
 2520 MB
(1 row)

Time: 55.557 ms
```

6. Query comparison on the first column

```
postgres=# explain analyze select c1,count(*) from tbl_1000_row_nonc group by c1;
QUERY PLAN
-----------------------------------------------------------------------------------------------------
Gather Motion 48:1 (slice1; segments: 48) (cost=259714.00..272214.00 rows=1000000 width=16)
Rows out: 1000000 rows at destination with 352 ms to end, start offset by 1.483 ms.
-> HashAggregate (cost=259714.00..272214.00 rows=20834 width=16)
Group By: c1
Rows out: Avg 20833.3 rows x 48 workers. Max 20854 rows (seg42) with 0.005 ms to first row, 101 ms to end, start offset by 63 ms.
-> Append-only Scan on tbl_1000_row_nonc (cost=0.00..254714.00 rows=20834 width=8)
Rows out: 0 rows (seg0) with 19 ms to end, start offset by 82 ms.
Slice statistics:
(slice0) Executor memory: 405K bytes.
(slice1) Executor memory: 500K bytes avg x 48 workers, 500K bytes max (seg0).
Statement statistics:
Memory used: 128000K bytes
Settings: enable_bitmapscan=off; enable_seqscan=off; optimizer=off
Optimizer status: legacy query optimizer
Total runtime: 353.793 ms
(15 rows)

postgres=# explain analyze select c1,count(*) from tbl_1000_row_c group by c1;
QUERY PLAN
-----------------------------------------------------------------------------------------------------
Gather Motion 48:1 (slice1; segments: 48) (cost=95537.00..108037.00 rows=1000000 width=16)
Rows out: 1000000 rows at destination with 1128 ms to end, start offset by 1.360 ms.
-> HashAggregate (cost=95537.00..108037.00 rows=20834 width=16)
Group By: c1
Rows out: Avg 20833.3 rows x 48 workers. Max 20854 rows (seg42) with 0.003 ms to first row, 699 ms to end, start offset by 42 ms.
-> Append-only Scan on tbl_1000_row_c (cost=0.00..90537.00 rows=20834 width=8)
Rows out: 0 rows (seg0) with 13 ms to end, start offset by 42 ms.
Slice statistics:
(slice0) Executor memory: 405K bytes.
(slice1) Executor memory: 500K bytes avg x 48 workers, 500K bytes max (seg0).
Statement statistics:
Memory used: 128000K bytes
Settings: enable_bitmapscan=off; enable_seqscan=off; optimizer=off
Optimizer status: legacy query optimizer
Total runtime: 1130.006 ms
(15 rows)

postgres=# explain analyze select c1,count(*) from tbl_1000_column_nonc group by c1;
QUERY PLAN
-----------------------------------------------------------------------------------------------------
Gather Motion 48:1 (slice1; segments: 48) (cost=259518.00..272018.00 rows=1000000 width=16)
Rows out: 1000000 rows at destination with 245 ms to end, start offset by 1.365 ms.
-> HashAggregate (cost=259518.00..272018.00 rows=20834 width=16)
Group By: c1
Rows out: Avg 20833.3 rows x 48 workers. Max 20854 rows (seg42) with 0.005 ms to first row, 9.968 ms to end, start offset by 60 ms.
-> Append-only Columnar Scan on tbl_1000_column_nonc (cost=0.00..254518.00 rows=20834 width=8)
Rows out: 0 rows (seg0) with 16 ms to end, start offset by 60 ms.
Slice statistics:
(slice0) Executor memory: 405K bytes.
(slice1) Executor memory: 821K bytes avg x 48 workers, 821K bytes max (seg0).
Statement statistics:
Memory used: 128000K bytes
Settings: enable_bitmapscan=off; enable_seqscan=off; optimizer=off
Optimizer status: legacy query optimizer
Total runtime: 246.967 ms
(15 rows)

postgres=# explain analyze select c1,count(*) from tbl_1000_column_c group by c1;
QUERY PLAN
-----------------------------------------------------------------------------------------------------
Gather Motion 48:1 (slice1; segments: 48) (cost=259518.00..272018.00 rows=1000000 width=16)
Rows out: 1000000 rows at destination with 249 ms to end, start offset by 1.450 ms.
-> HashAggregate (cost=259518.00..272018.00 rows=20834 width=16)
Group By: c1
Rows out: Avg 20833.3 rows x 48 workers. Max 20854 rows (seg42) with 0.004 ms to first row, 8.861 ms to end, start offset by 46 ms.
-> Append-only Columnar Scan on tbl_1000_column_c (cost=0.00..254518.00 rows=20834 width=8)
Rows out: 0 rows (seg0) with 16 ms to end, start offset by 45 ms.
Slice statistics:
(slice0) Executor memory: 405K bytes.
(slice1) Executor memory: 853K bytes avg x 48 workers, 853K bytes max (seg0).
Statement statistics:
Memory used: 128000K bytes
Settings: enable_bitmapscan=off; enable_seqscan=off; optimizer=off
Optimizer status: legacy query optimizer
Total runtime: 250.583 ms
(15 rows)
```

7. Query comparison on the last column

```
postgres=# explain analyze select c1000,count(*) from tbl_1000_row_nonc group by c1000;
QUERY PLAN
-----------------------------------------------------------------------------------------------------
Gather Motion 48:1 (slice2; segments: 48) (cost=260048.04..260167.34 rows=9544 width=16)
Rows out: 10001 rows at destination with 296 ms to end, start offset by 1.806 ms.
-> HashAggregate (cost=260048.04..260167.34 rows=199 width=16)
Group By: tbl_1000_row_nonc.c1000
Rows out: Avg 208.4 rows x 48 workers. Max 223 rows (seg17) with 0.002 ms to first row, 133 ms to end, start offset by 21 ms.
-> Redistribute Motion 48:48 (slice1; segments: 48) (cost=259714.00..259904.88 rows=199 width=16)
Hash Key: tbl_1000_row_nonc.c1000
Rows out: Avg 8749.4 rows x 48 workers at destination. Max 9360 rows (seg46) with 141 ms to end, start offset by 21 ms.
-> HashAggregate (cost=259714.00..259714.00 rows=199 width=16)
Group By: tbl_1000_row_nonc.c1000
Rows out: Avg 8749.4 rows x 48 workers. Max 8814 rows (seg18) with 0.003 ms to first row, 209 ms to end, start offset by 63 ms.
-> Append-only Scan on tbl_1000_row_nonc (cost=0.00..254714.00 rows=20834 width=8)
Rows out: 0 rows (seg0) with 16 ms to end, start offset by 64 ms.
Slice statistics:
(slice0) Executor memory: 417K bytes.
(slice1) Executor memory: 1818K bytes avg x 48 workers, 1818K bytes max (seg0).
(slice2) Executor memory: 434K bytes avg x 48 workers, 434K bytes max (seg0).
Statement statistics:
Memory used: 128000K bytes
Settings: enable_bitmapscan=off; enable_seqscan=off; optimizer=off
Optimizer status: legacy query optimizer
Total runtime: 298.222 ms
(22 rows)

postgres=# explain analyze select c1000,count(*) from tbl_1000_row_c group by c1000;
QUERY PLAN
-----------------------------------------------------------------------------------------------------
Gather Motion 48:1 (slice2; segments: 48) (cost=95886.65..96011.53 rows=9990 width=16)
Rows out: 10001 rows at destination with 1244 ms to end, start offset by 1.760 ms.
-> HashAggregate (cost=95886.65..96011.53 rows=209 width=16)
Group By: tbl_1000_row_c.c1000
Rows out: Avg 208.4 rows x 48 workers. Max 223 rows (seg17) with 0.002 ms to first row, 432 ms to end, start offset by 32 ms.
-> Redistribute Motion 48:48 (slice1; segments: 48) (cost=95537.00..95736.80 rows=209 width=16)
Hash Key: tbl_1000_row_c.c1000
Rows out: Avg 8749.4 rows x 48 workers at destination. Max 9360 rows (seg46) with 755 ms to end, start offset by 57 ms.
-> HashAggregate (cost=95537.00..95537.00 rows=209 width=16)
Group By: tbl_1000_row_c.c1000
Rows out: Avg 8749.4 rows x 48 workers. Max 8814 rows (seg18) with 0.004 ms to first row, 1085 ms to end, start offset by 63 ms.
-> Append-only Scan on tbl_1000_row_c (cost=0.00..90537.00 rows=20834 width=8)
Rows out: 0 rows (seg0) with 22 ms to end, start offset by 59 ms.
Slice statistics:
(slice0) Executor memory: 417K bytes.
(slice1) Executor memory: 1818K bytes avg x 48 workers, 1818K bytes max (seg0).
(slice2) Executor memory: 434K bytes avg x 48 workers, 434K bytes max (seg0).
Statement statistics:
Memory used: 128000K bytes
Settings: enable_bitmapscan=off; enable_seqscan=off; optimizer=off
Optimizer status: legacy query optimizer
Total runtime: 1246.276 ms
(22 rows)

postgres=# explain analyze select c1000,count(*) from tbl_1000_column_c group by c1000;
QUERY PLAN
-----------------------------------------------------------------------------------------------------
Gather Motion 48:1 (slice2; segments: 48) (cost=259870.80..259996.80 rows=10080 width=16)
Rows out: 10001 rows at destination with 78 ms to end, start offset by 1.783 ms.
-> HashAggregate (cost=259870.80..259996.80 rows=210 width=16)
Group By: tbl_1000_column_c.c1000
Rows out: Avg 208.4 rows x 48 workers. Max 223 rows (seg17) with 0.004 ms to first row, 24 ms to end, start offset by 26 ms.
-> Redistribute Motion 48:48 (slice1; segments: 48) (cost=259518.00..259719.60 rows=210 width=16)
Hash Key: tbl_1000_column_c.c1000
Rows out: Avg 8749.4 rows x 48 workers at destination. Max 9360 rows (seg46) with 31 ms to end, start offset by 23 ms.
-> HashAggregate (cost=259518.00..259518.00 rows=210 width=16)
Group By: tbl_1000_column_c.c1000
Rows out: Avg 8749.4 rows x 48 workers. Max 8814 rows (seg18) with 0.004 ms to first row, 5.962 ms to end, start offset by 42 ms.
-> Append-only Columnar Scan on tbl_1000_column_c (cost=0.00..254518.00 rows=20834 width=8)
Rows out: 0 rows (seg0) with 28 ms to end, start offset by 30 ms.
Slice statistics:
(slice0) Executor memory: 417K bytes.
(slice1) Executor memory: 1787K bytes avg x 48 workers, 1787K bytes max (seg0).
(slice2) Executor memory: 434K bytes avg x 48 workers, 434K bytes max (seg0).
Statement statistics:
Memory used: 128000K bytes
Settings: enable_bitmapscan=off; enable_seqscan=off; optimizer=off
Optimizer status: legacy query optimizer
Total runtime: 80.152 ms
(22 rows)

postgres=# explain analyze select c1000,count(*) from tbl_1000_column_nonc group by c1000;
QUERY PLAN
-----------------------------------------------------------------------------------------------------
Gather Motion 48:1 (slice2; segments: 48) (cost=259856.48..259977.37 rows=9671 width=16)
Rows out: 10001 rows at destination with 77 ms to end, start offset by 2.019 ms.
-> HashAggregate (cost=259856.48..259977.37 rows=202 width=16)
Group By: tbl_1000_column_nonc.c1000
Rows out: Avg 208.4 rows x 48 workers. Max 223 rows (seg17) with 0.003 ms to first row, 26 ms to end, start offset by 35 ms.
-> Redistribute Motion 48:48 (slice1; segments: 48) (cost=259518.00..259711.42 rows=202 width=16)
Hash Key: tbl_1000_column_nonc.c1000
Rows out: Avg 8749.4 rows x 48 workers at destination. Max 9360 rows (seg46) with 20 ms to end, start offset by 30 ms.
-> HashAggregate (cost=259518.00..259518.00 rows=202 width=16)
Group By: tbl_1000_column_nonc.c1000
Rows out: Avg 8749.4 rows x 48 workers. Max 8814 rows (seg18) with 0.006 ms to first row, 5.476 ms to end, start offset by 25 ms.
-> Append-only Columnar Scan on tbl_1000_column_nonc (cost=0.00..254518.00 rows=20834 width=8)
Rows out: 0 rows (seg0) with 27 ms to end, start offset by 25 ms.
Slice statistics:
(slice0) Executor memory: 417K bytes.
(slice1) Executor memory: 1755K bytes avg x 48 workers, 1755K bytes max (seg0).
(slice2) Executor memory: 1330K bytes avg x 48 workers, 1330K bytes max (seg0).
Statement statistics:
Memory used: 128000K bytes
Settings: enable_bitmapscan=off; enable_seqscan=off; optimizer=off
Optimizer status: legacy query optimizer
Total runtime: 78.976 ms
(22 rows)
```

8. Full-table update comparison

```
postgres=# update tbl_1000_row_nonc set c1000=c1000+1;
UPDATE 1000000
Time: 7085.645 ms
postgres=# update tbl_1000_row_c set c1000=c1000+1;
UPDATE 1000000
Time: 6734.279 ms
postgres=# update tbl_1000_column_nonc set c1000=c1000+1;
UPDATE 1000000
Time: 13514.749 ms
postgres=# update tbl_1000_column_c set c1000=c1000+1;
UPDATE 1000000
Time: 10629.104 ms
```

Summary

Table | Size | Query first column | Query last column | Full-table update
---|---|---|---|---
AO row store, uncompressed | 7653 MB | 353.793 ms | 298.222 ms | 7085.645 ms
AO row store, zlib compressed | 2522 MB | 1130.006 ms | 1246.276 ms | 6734.279 ms
AO column store, uncompressed | 7647 MB | 246.967 ms | 80.152 ms | 13514.749 ms
AO column store, zlib compressed | 2520 MB | 250.583 ms | 78.976 ms | 10629.104 ms

For large wide tables, the column store is clearly much better for queries, and its performance stays consistent whether the first or the last column is accessed. Its write and update performance is somewhat worse, because it has to operate on many more data files.
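The write-side cost is easy to see from the file layout. In Greenplum, a column-oriented append-only table keeps each column's data in its own file on every segment instance, while a row-oriented AO table keeps one data file per segment. A rough count for this test setup (48 segments, 1,000 columns; the counts are illustrative, since real layouts add auxiliary files):

```python
# Rough file-count comparison for one full-row write, assuming the
# simplified Greenplum AO layout described above: row-oriented AO has
# one data file per segment, column-oriented AO has one data file per
# column per segment. (Illustrative; ignores auxiliary files.)

N_SEGMENTS = 48   # segments in the test cluster used in this post
N_COLS = 1_000

row_store_files = N_SEGMENTS * 1
column_store_files = N_SEGMENTS * N_COLS

print(row_store_files, "files vs", column_store_files, "files")
```

Touching a thousand times as many files per segment is what makes the column-store insert and update runs measurably slower in the table above, even though they read and write the same logical data.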

The remaining performance difference between the first and the last column in the column store comes from the number of rows returned: every column except c1 draws its values from a space of about 10,000 distinct values, whereas c1 has 1,000,000 distinct values.
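This follows directly from how the test data was generated, and can be sketched in a few lines of Python: c1 came from generate_series, so every value is distinct, while the other 999 columns were filled by the column default random()*10000, whose bigint cast rounds into 0..10000, so at most 10,001 group keys exist.

```python
import random

N_ROWS = 1_000_000

# c1 was filled by generate_series(1, 1000000): every value distinct.
c1 = range(1, N_ROWS + 1)

# Every other column got "default random()*10000"; the cast to int8
# rounds the double into 0..10000, giving at most 10,001 group keys.
c1000 = [round(random.random() * 10000) for _ in range(N_ROWS)]

print(len(set(c1)), "groups for c1,", len(set(c1000)), "groups for c1000")
```

So the c1 aggregate must build and ship 1,000,000 groups, while the c1000 aggregate ships only about 10,000, which accounts for the 247 ms vs 80 ms gap in the column-store rows of the table.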

If you are interested, you can also test other parameters, such as the compression level, the block size, or a heap table:

```
where storage_parameter is:
   APPENDONLY={TRUE|FALSE}
   BLOCKSIZE={8192-2097152}
   ORIENTATION={COLUMN|ROW}
   COMPRESSTYPE={ZLIB|QUICKLZ|RLE_TYPE|NONE}
   COMPRESSLEVEL={0-9}
   CHECKSUM={TRUE|FALSE}
   FILLFACTOR={10-100}
   OIDS[=TRUE|FALSE]
```

References

《Greenplum 行存、列存,堆表、AO表性能对比 - 阿里云HDB for PostgreSQL最佳实践》

《Greenplum 行存、列存,堆表、AO表的原理和选择》

《Greenplum 最佳实践 - 行存与列存的选择以及转换方法》

《PostgreSQL VOPS 向量计算 + DBLINK异步并行 - 单实例 10亿 聚合计算跑进2秒》

《PostgreSQL 向量化执行插件(瓦片式实现-vops) 10x提速OLAP》
