暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

postgresql测试copy和pg_bulkload导入效率,使用pg_bulkload性能提升大于30%

原创 jieguo 2023-02-12
1492

使用pg_bulkload比copy方式性能提升30%以上,数据量越大提升越明显!而且pg_bulkload命令不需要交互式了,方便写脚本。比较可惜的是不支持分区表的导入。

系统环境:

[postgres@lnpg o2p]$ lscpu|grep CPU
CPU op-mode(s):        32-bit, 64-bit
CPU(s):                8
On-line CPU(s) list:   0-7
CPU 系列:          6
型号名称:        Intel(R) Xeon(R) Gold 6258R CPU @ 2.70GHz
CPU MHz:             2693.671
NUMA 节点0 CPU:    0-7
[postgres@lnpg o2p]$ free -g
              total        used        free      shared  buff/cache   available
Mem:             15           0           0           8          14           6
Swap:             7           0           7
[postgres@lnpg o2p]$ cat /etc/system-release
CentOS Linux release 7.9.2009 (Core)
[postgres@lnpg o2p]$ psql -U gistar -d resdb -p 5432 --host 192.168.207.143
psql (15.1)
Type "help" for help.
resdb=# select version();
                                                 version                                                 
---------------------------------------------------------------------------------------------------------
 PostgreSQL 15.1 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44), 64-bit
(1 row)
resdb=# SHOW server_version_num;
 server_version_num 
--------------------
 150001
(1 row)

resdb=# SELECT current_setting('server_version_num');
 current_setting 
-----------------
 150001
(1 row)

resdb=# show server_version;
 server_version 
----------------
 15.1
(1 row)
复制

安装pg_bulkload

下载地址:https://pgxn.org/dist/pg_bulkload/
root安装编译:

unzip pg_bulkload-3.1.20.zip
cd pg_bulkload-3.1.20
make
make install

复制

进入pg创建扩展插件:

[postgres@lnpg o2p]$ psql -U gistar -d resdb -p 5432 --host 192.168.207.143
psql (15.1)
Type "help" for help.
resdb=# create extension pg_bulkload;
CREATE EXTENSION

resdb=# \dx
                                      List of installed extensions
      Name      | Version |   Schema   |                           Description                           
----------------+---------+------------+-----------------------------------------------------------------
 oracle_fdw     | 1.2     | public     | foreign data wrapper for Oracle access
 pg_bulkload    | 1.0     | public     | pg_bulkload is a high speed data loading utility for PostgreSQL
 pgrouting      | 2.6.3   | public     | pgRouting Extension
 plpgsql        | 1.0     | pg_catalog | PL/pgSQL procedural language
 postgis        | 3.3.1   | public     | PostGIS geometry and geography spatial types and functions
 postgis_raster | 3.3.1   | public     | PostGIS raster types and functions
 postgis_sfcgal | 3.3.1   | public     | PostGIS SFCGAL functions
(7 rows)

复制

测试copy和pg_bulkload结果:

测试记录3687361条,
copy使用10.32秒
image.png
pg_bulkload使用6.59秒
image.png
image.png

使用pg_bulkload比copy方式性能提升30%以上,数据量越大提升越明显!而且pg_bulkload命令不需要交互式了,方便写脚本。

测试记录如下:

[postgres@lnpg o2p]$ 
[postgres@lnpg o2p]$ psql -U gistar -d resdb -p 5432 --host 192.168.207.143
psql (15.1)
Type "help" for help.

resdb=# truncate table jyc;
TRUNCATE TABLE
resdb=# \timing
Timing is on.
resdb=# select count(*) from jyc;
 count 
-------
     0
(1 row)

Time: 0.991 ms
resdb=# copy jyc from '/db/o2p/9.csv' with (FORMAT csv,quote '"', DELIMITER ',',HEADER false,encoding 'UTF8');
COPY 3687361
Time: 10321.795 ms (00:10.322)
resdb=# select count(*) from jyc;
  count  
---------
 3687361
(1 row)

Time: 199.546 ms
resdb=# truncate table jyc;
TRUNCATE TABLE
Time: 156.815 ms
resdb=# select count(*) from jyc;
 count 
-------
     0
(1 row)

Time: 0.368 ms
resdb=# \q
[postgres@lnpg o2p]$ pg_bulkload -h 192.168.207.143 -p 5432 -P ./error.log -l ./input.log -i /db/o2p/9.csv -O jyc -d resdb -U gistar
NOTICE: BULK LOAD START
NOTICE: BULK LOAD END
        0 Rows skipped.
        3687361 Rows successfully loaded.
        0 Rows not loaded due to parse errors.
        0 Rows not loaded due to duplicate errors.
        0 Rows replaced with new rows.
[postgres@lnpg o2p]$ more input.log 

pg_bulkload 3.1.20 on 2023-02-12 13:12:57.084554+08

INPUT = /db/o2p/9.csv
PARSE_BADFILE = /db/o2p/error.log
LOGFILE = /db/o2p/input.log
LIMIT = INFINITE
PARSE_ERRORS = 0
CHECK_CONSTRAINTS = NO
TYPE = CSV
SKIP = 0
DELIMITER = ,
QUOTE = "\""
ESCAPE = "\""
NULL = 
OUTPUT = public.jyc
MULTI_PROCESS = NO
VERBOSE = NO
WRITER = DIRECT
DUPLICATE_BADFILE = /db/postgresql/data/pg_bulkload/20230212131257_resdb_public_jyc.dup.csv
DUPLICATE_ERRORS = 0
ON_DUPLICATE_KEEP = NEW
TRUNCATE = NO


  0 Rows skipped.
  3687361 Rows successfully loaded.
  0 Rows not loaded due to parse errors.
  0 Rows not loaded due to duplicate errors.
  0 Rows replaced with new rows.

Run began on 2023-02-12 13:12:57.084554+08
Run ended on 2023-02-12 13:13:04.64806+08

CPU 0.94s/5.86u sec elapsed 7.56 sec

pg_bulkload 3.1.20 on 2023-02-12 13:17:54.893514+08

INPUT = /db/o2p/9.csv
PARSE_BADFILE = /db/o2p/error.log
LOGFILE = /db/o2p/input.log
LIMIT = INFINITE
PARSE_ERRORS = 0
CHECK_CONSTRAINTS = NO
TYPE = CSV
SKIP = 0
DELIMITER = ,
QUOTE = "\""
ESCAPE = "\""
NULL = 
OUTPUT = public.jyc
MULTI_PROCESS = NO
VERBOSE = NO
WRITER = DIRECT
DUPLICATE_BADFILE = /db/postgresql/data/pg_bulkload/20230212131754_resdb_public_jyc.dup.csv
DUPLICATE_ERRORS = 0
ON_DUPLICATE_KEEP = NEW
TRUNCATE = NO


  0 Rows skipped.
  3687361 Rows successfully loaded.
  0 Rows not loaded due to parse errors.
  0 Rows not loaded due to duplicate errors.
  0 Rows replaced with new rows.

Run began on 2023-02-12 13:17:54.893514+08
Run ended on 2023-02-12 13:18:02.73289+08

CPU 0.68s/5.92u sec elapsed 7.84 sec

pg_bulkload 3.1.20 on 2023-02-12 13:20:28.948894+08

INPUT = /db/o2p/9.csv
PARSE_BADFILE = /db/o2p/error.log
LOGFILE = /db/o2p/input.log
LIMIT = INFINITE
PARSE_ERRORS = 0
CHECK_CONSTRAINTS = NO
TYPE = CSV
SKIP = 0
DELIMITER = ,
QUOTE = "\""
ESCAPE = "\""
NULL = 
OUTPUT = public.jyc
MULTI_PROCESS = NO
VERBOSE = NO
WRITER = DIRECT
DUPLICATE_BADFILE = /db/postgresql/data/pg_bulkload/20230212132028_resdb_public_jyc.dup.csv
DUPLICATE_ERRORS = 0
ON_DUPLICATE_KEEP = NEW
TRUNCATE = NO


  0 Rows skipped.
  3687361 Rows successfully loaded.
  0 Rows not loaded due to parse errors.
  0 Rows not loaded due to duplicate errors.
  0 Rows replaced with new rows.

Run began on 2023-02-12 13:20:28.948894+08
Run ended on 2023-02-12 13:20:35.540349+08

CPU 0.58s/5.87u sec elapsed 6.59 sec
[postgres@lnpg o2p]$ psql -U gistar -d resdb -p 5432 --host 192.168.207.143
psql (15.1)
Type "help" for help.

resdb=# select count(*) from jyc;
  count  
---------
 3687361
(1 row)

resdb=# 

复制

不支持分区表:
https://github.com/ossc-db/pg_bulkload/issues/58
image.png

最后修改时间:2023-02-14 10:36:03
「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论