Greenplum Database Parameter Tuning

数据库平台优化 (Database Platform Optimization) 2021-11-09

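The Greenplum parameters that were adjusted are listed below.

(1) Global deadlock detector switch
It is off by default in Greenplum 6 and must be turned on to support concurrent UPDATE/DELETE operations.
gpconfig -c gp_enable_global_deadlock_detector -v on

(2) Disable the GPORCA optimizer (GPORCA is reportedly the default optimizer in GPDB 6)
gpconfig -c optimizer -v off

(3) Turn off statement logging
This GUC cuts unnecessary log output so that logging does not interfere with I/O performance.

gpconfig -c log_statement -v none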

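gpconfig -c max_connections -v 1250 -m 250 # the segment value should be 5-10x the master value
gpconfig -c max_prepared_transactions -v 250 # keep equal to the master's max_connections
gpconfig -c shared_buffers -v 8192MB # must be at least max_connections * 16kB
gpconfig -c gp_vmem_protect_limit -v 46933 # maximum memory (MB) for all work done in each segment database
# gp_vmem (host memory available to Greenplum Database):
#   gp_vmem = ((SWAP + RAM) - (7.5GB + 0.05 * RAM)) / 1.7
#           = (500 - (7.5 + 0.05 * 500)) / 1.7 = 275GB
#   max_acting_primary_segments = 4 primaries per host + 2 extra acting primaries on failover = 6
#   gp_vmem_protect_limit = gp_vmem / max_acting_primary_segments
#                         = 275GB / 6 = 45.83GB = 46933MB
gpconfig -c statement_mem -v 1408MB # memory allocated to any single query in a segment database
#   statement_mem = (gp_vmem_protect_limit * 0.9) / max_expected_concurrent_queries
#                 = 46933 * 0.9 / 30 = 1408MB
gpconfig -c effective_cache_size -v 250GB # half of physical memory. Sets the Postgres planner's assumption about the effective size of the disk cache available to a single query; it feeds into the cost estimate for using an index. Higher values make index scans more likely, lower values make sequential scans more likely.
gpconfig -c work_mem -v 25600MB # in-memory sort size per segment; as a first test use 5% of total system memory = 5% * 500GB = 25GB = 25 * 1024 = 25600MB
gpconfig -c temp_buffers -v 2048 # default 1024; temporary buffers each database session may use. These are session-local buffers used only to access temporary tables.
gpconfig -c gp_fts_probe_threadcount -v 32 # default 16; number of fault-detection threads. Should be greater than or equal to the number of segments per host.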
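
The sizing arithmetic for gp_vmem_protect_limit and statement_mem above can be re-derived with a short script. This is only a sketch of the article's own formulas; the function names are hypothetical, and the inputs (500GB for SWAP + RAM with RAM taken as 500GB, 6 acting primaries, 30 concurrent queries) are the figures quoted in the text, not recommendations.

```python
def gp_vmem_gb(swap_gb: float, ram_gb: float) -> float:
    """Host memory available to Greenplum, using the formula quoted in the article."""
    return ((swap_gb + ram_gb) - (7.5 + 0.05 * ram_gb)) / 1.7


def vmem_protect_limit_mb(gp_vmem: float, max_acting_primary_segments: int) -> int:
    """Per-segment memory ceiling in MB (gp_vmem is given in GB)."""
    return int(gp_vmem * 1024 / max_acting_primary_segments)


def statement_mem_mb(protect_limit_mb: int, max_concurrent_queries: int) -> int:
    """Per-query memory in MB: 90% of the segment ceiling shared by the expected queries."""
    return round(protect_limit_mb * 0.9 / max_concurrent_queries)


# Figures from the article: SWAP + RAM = 500GB, 6 acting primaries, 30 queries.
vmem = gp_vmem_gb(swap_gb=0, ram_gb=500)
limit = vmem_protect_limit_mb(vmem, 6)
print(round(vmem), limit, statement_mem_mb(limit, 30))
```

Changing the inputs to match another host reproduces the same chain of calculations before any value is passed to gpconfig.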

gpconfig -c gp_hashjoin_tuples_per_bucket -v 5 # default 5; target density of the hash table used by HashJoin operations. Smaller values tend to produce larger hash tables, which can improve join performance.

gpconfig -c gp_interconnect_setup_timeout -v 2h # default 2h; how long to wait for the Greenplum Database interconnect to finish setup before timing out
gpconfig -c max_statement_mem -v 8192MB # default 2000MB; maximum memory limit for a query
gpconfig -c gp_resqueue_priority_cpucores_per_segment -v 6 # default 4; number of CPU units allocated to each segment instance
gpconfig -c maintenance_work_mem -v 12GB # default 16MB; maximum memory used by maintenance operations such as VACUUM and CREATE INDEX
gpconfig -c gp_vmem_protect_segworker_cache_limit -v 2048 # default 500; if a query executor process consumes more than this amount, it is not cached for reuse by later queries after it finishes
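
With this many settings it is easy to mistype a command name or flag. One way to avoid that is to generate the command lines from a table of values. The helper below is a hypothetical sketch, not part of Greenplum; it only assembles strings using the -c, -v and -m flags already shown above and does not run anything.

```python
import shlex


def gpconfig_command(name, value, master_value=None):
    """Build a `gpconfig -c NAME -v VALUE [-m MASTER_VALUE]` command line."""
    args = ["gpconfig", "-c", name, "-v", str(value)]
    if master_value is not None:
        # -m sets a different value on the master than on the segments.
        args += ["-m", str(master_value)]
    return " ".join(shlex.quote(a) for a in args)


# A few of the settings from the article: (name, segment value, master value).
settings = [
    ("max_connections", 1250, 250),
    ("max_prepared_transactions", 250, None),
    ("gp_vmem_protect_segworker_cache_limit", 2048, None),
]

for name, value, master in settings:
    print(gpconfig_command(name, value, master))
```

The printed lines can be reviewed and then pasted into a shell on the master host.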

Another approach to shrinking PostgreSQL system tables

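VACUUM FULL locks the table and is very slow, so in practice it cannot be used to shrink pg_class: the downtime would be far too long.
The simplest way to get the effect of VACUUM FULL is to copy the table, create table b as select * from a;, and then use table b in place of table a.
Because pg_class underlies every other table, a copy of it cannot simply take its place. What can be done instead is to replace the underlying data files. pg_class also has a system column, oid, which cannot be copied directly, so a roundabout method is needed.
1. Create a new table with oids: create table cxf with oids as select * from pg_class limit 0;
Build the same indexes on this table as on pg_class, because replacing the underlying data file while keeping the old index files would leave the two inconsistent.
2. Export all of pg_class to a text file with the copy command (with the oids option) so that the oid is exported too.
3. Load that file into the table created in step 1 with copy ... with oids.
4. Stop the database (so that all cached data is written out to the files).
5. Replace the underlying data files and index files.
6. Restart the database.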

With this method the pg_class data file went from 1.4G to 63M and pg_attribute from 1.5G to 602M, the same result VACUUM FULL would give. The detailed steps follow:

2. Look at these files under the database's base directory
www.linuxidc.com @linuxidc:/home/gpadmin/cxf/aligp-1/base/16384>ll -h 1259 1259.* 15687137 15687137.* 15687138 15687138.*
ls: 15687137.*: No such file or directory
ls: 15687138.*: No such file or directory
-rw------- 1 gpadmin gpadmin 1.0G Dec 11 20:14 1259
-rw------- 1 gpadmin gpadmin 395M Dec 11 20:16 1259.1
-rw------- 1 gpadmin gpadmin 20M Dec 11 20:16 15687137
-rw------- 1 gpadmin gpadmin 83M Dec 11 20:16 15687138
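
In the listing above pg_class occupies two files, 1259 and 1259.1, because PostgreSQL splits a large relation into numbered segment files. A minimal sketch for totalling them is shown below; the function name is hypothetical, and it assumes the base directory is readable by the user running the script.

```python
import os
import re


def relation_file_size(base_dir: str, relfilenode: int) -> int:
    """Sum the bytes of `relfilenode` and its numbered segments (`relfilenode.1`, ...)."""
    # Match "1259" and "1259.1" but not "12590" or "1259.bak".
    pattern = re.compile(rf"^{relfilenode}(\.\d+)?$")
    total = 0
    for entry in os.listdir(base_dir):
        if pattern.match(entry):
            total += os.path.getsize(os.path.join(base_dir, entry))
    return total
```

Called with the directory shown in the prompt and relfilenode 1259, it would add up the 1.0G and 395M files, which is where the 1.4G figure for pg_class comes from.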

3. Create a table with the same structure as pg_class; it must be created with oids
aligputf8=# create table cxf with oids as select * from pg_class limit 0;
SELECT 0
aligputf8=# create index cxf_pg_class_oid_index on cxf(oid);
CREATE INDEX
aligputf8=# create index cxf_pg_class_relname_nsp_index on cxf(relname, relnamespace);
CREATE INDEX
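Create the indexes. When the database starts it looks up pg_class and finds rows through its indexes, so the indexes have to be rebuilt here as well, and their underlying files are overwritten together with the table file at the end.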
   oid    |            relname             | relfilenode
----------+--------------------------------+-------------
19317362 | cxf                            |    19317362
19317367 | cxf_pg_class_oid_index         |    19317367
19317368 | cxf_pg_class_relname_nsp_index |    19317368
(3 rows)

The two tables have matching column definitions, as the identical column counts show:
aligputf8=# select count(*) from pg_attribute where attrelid = 19317362;
count
-------
38
(1 row)

aligputf8=# select count(*) from pg_attribute where attrelid = 1259;
count
-------
38
(1 row)

4. Check the current row count of pg_class
aligputf8=# select count(*) from pg_class;
count
--------
331799
(1 row)

5. Export pg_class to a file, then load it into cxf
aligputf8=# copy pg_class to '/tmp/pg_class_cxf' with null as '' delimiter E'\5' oids;
COPY 331799
aligputf8=# copy cxf from '/tmp/pg_class_cxf' with null as '' delimiter E'\5' oids;
COPY 331799

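6. Shut down the database, back up the existing pg_class data and index files in case something goes wrong, then replace the underlying data files. The database must be shut down: otherwise the rows just loaded by copy may not yet have been flushed to disk, and overwriting the original files at that point causes problems. I tried it once without shutting down; data was lost, pg_class itself could no longer be found, and the whole database became unusable.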
$GPHOME/bin/pg_ctl -w -D /home/gpadmin/cxf/aligp-1/ -o " -E -i -p 5132 --silent-mode=true " stop
www.linuxidc.com @linuxidc:/home/gpadmin/cxf/aligp-1/base/16384>ll -h 19317362 19317367 19317368
-rw------- 1 gpadmin gpadmin 63M Dec 11 20:39 19317362
-rw------- 1 gpadmin gpadmin 9.8M Dec 11 20:39 19317367
-rw------- 1 gpadmin gpadmin 47M Dec 11 20:39 19317368

www.linuxidc.com @linuxidc:/home/gpadmin/cxf/aligp-1/base/16384>mv 1259 1259.bak
www.linuxidc.com @linuxidc:/home/gpadmin/cxf/aligp-1/base/16384>mv 1259.1 1259.1.bak
www.linuxidc.com @linuxidc:/home/gpadmin/cxf/aligp-1/base/16384>mv 15687137 15687137.bak
www.linuxidc.com @linuxidc:/home/gpadmin/cxf/aligp-1/base/16384>mv 15687138 15687138.bak
www.linuxidc.com @linuxidc:/home/gpadmin/cxf/aligp-1/base/16384>cp 19317362 1259
www.linuxidc.com @linuxidc:/home/gpadmin/cxf/aligp-1/base/16384>cp 19317367 15687137
www.linuxidc.com @linuxidc:/home/gpadmin/cxf/aligp-1/base/16384>cp 19317368 15687138
www.linuxidc.com @linuxidc:/home/gpadmin/cxf/aligp-1/base/16384>
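
The backup-and-replace sequence above follows a fixed pattern: move every old file aside with a .bak suffix, then copy each new file onto the old name. The sketch below only builds that list of commands so it can be reviewed first; the function name is hypothetical, it executes nothing, and the file numbers are the ones from this session.

```python
def swap_plan(mapping, extra_old_files=()):
    """Return the mv/cp commands for replacing old relation files with new ones.

    mapping: {old_file: new_file}, in the order the files should be handled.
    extra_old_files: old files with no replacement (for example leftover
    segment files such as 1259.1) that are only moved aside.
    """
    cmds = []
    # Back up every old file before anything is overwritten.
    for old in list(mapping) + list(extra_old_files):
        cmds.append(f"mv {old} {old}.bak")
    # Copy each new file into place under the old file name.
    for old, new in mapping.items():
        cmds.append(f"cp {new} {old}")
    return cmds


plan = swap_plan(
    {"1259": "19317362", "15687137": "19317367", "15687138": "19317368"},
    extra_old_files=["1259.1"],
)
print("\n".join(plan))
```

Keeping the .bak files until the restarted database has been verified leaves a way back if the swap goes wrong.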

7. Restart the database and verify
www.linuxidc.com @linuxidc:/home/gpadmin/cxf/aligp-1/base/16384>PGOPTIONS="-c gp_session_role=utility" psql -E
psql (8.2.13)
Type "help" for help.

aligputf8=# select count(*) from pg_class;
count
--------
331799
(1 row)

aligputf8=# explain select * from pg_class where oid = 1259;
QUERY PLAN
---------------------------------------------------------------------------------------
Index Scan using pg_class_oid_index on pg_class (cost=0.00..200.58 rows=1 width=268)
Index Cond: oid = 1259::oid
(2 rows)

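8. Apply the same method to pg_attribute. Here a plain insert is enough; there is no need to copy the data out to an external file, because this table has no oid column.
The data size went from 1.5G to 602M.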

This article is reposted from 数据库平台优化.
