使用pg_bulkload比copy方式性能提升30%以上,数据量越大提升越明显!而且pg_bulkload命令不需要交互式了,方便写脚本。比较可惜的是不支持分区表的导入。
系统环境:
[postgres@lnpg o2p]$ lscpu|grep CPU CPU op-mode(s): 32-bit, 64-bit CPU(s): 8 On-line CPU(s) list: 0-7 CPU 系列: 6 型号名称: Intel(R) Xeon(R) Gold 6258R CPU @ 2.70GHz CPU MHz: 2693.671 NUMA 节点0 CPU: 0-7 [postgres@lnpg o2p]$ free -g total used free shared buff/cache available Mem: 15 0 0 8 14 6 Swap: 7 0 7 [postgres@lnpg o2p]$ cat /etc/system-release CentOS Linux release 7.9.2009 (Core) [postgres@lnpg o2p]$ psql -U gistar -d resdb -p 5432 --host 192.168.207.143 psql (15.1) Type "help" for help. resdb=# select version(); version --------------------------------------------------------------------------------------------------------- PostgreSQL 15.1 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44), 64-bit (1 row) resdb=# SHOW server_version_num; server_version_num -------------------- 150001 (1 row) resdb=# SELECT current_setting('server_version_num'); current_setting ----------------- 150001 (1 row) resdb=# show server_version; server_version ---------------- 15.1 (1 row)
复制
安装pg_bulkload
下载地址:https://pgxn.org/dist/pg_bulkload/
root安装编译:
unzip pg_bulkload-3.1.20.zip cd pg_bulkload-3.1.20 make make install
复制
进入pg创建扩展插件:
[postgres@lnpg o2p]$ psql -U gistar -d resdb -p 5432 --host 192.168.207.143 psql (15.1) Type "help" for help. resdb=# create extension pg_bulkload; CREATE EXTENSION resdb=# \dx List of installed extensions Name | Version | Schema | Description ----------------+---------+------------+----------------------------------------------------------------- oracle_fdw | 1.2 | public | foreign data wrapper for Oracle access pg_bulkload | 1.0 | public | pg_bulkload is a high speed data loading utility for PostgreSQL pgrouting | 2.6.3 | public | pgRouting Extension plpgsql | 1.0 | pg_catalog | PL/pgSQL procedural language postgis | 3.3.1 | public | PostGIS geometry and geography spatial types and functions postgis_raster | 3.3.1 | public | PostGIS raster types and functions postgis_sfcgal | 3.3.1 | public | PostGIS SFCGAL functions (7 rows)
复制
测试copy和pg_bulkload结果:
测试记录3687361条,
copy使用10.32秒
pg_bulkload使用6.59秒
使用pg_bulkload比copy方式性能提升30%以上,数据量越大提升越明显!而且pg_bulkload命令不需要交互式了,方便写脚本。
测试记录如下:
[postgres@lnpg o2p]$ [postgres@lnpg o2p]$ psql -U gistar -d resdb -p 5432 --host 192.168.207.143 psql (15.1) Type "help" for help. resdb=# truncate table jyc; TRUNCATE TABLE resdb=# \timing Timing is on. resdb=# select count(*) from jyc; count ------- 0 (1 row) Time: 0.991 ms resdb=# copy jyc from '/db/o2p/9.csv' with (FORMAT csv,quote '"', DELIMITER ',',HEADER false,encoding 'UTF8'); COPY 3687361 Time: 10321.795 ms (00:10.322) resdb=# select count(*) from jyc; count --------- 3687361 (1 row) Time: 199.546 ms resdb=# truncate table jyc; TRUNCATE TABLE Time: 156.815 ms resdb=# select count(*) from jyc; count ------- 0 (1 row) Time: 0.368 ms resdb=# \q [postgres@lnpg o2p]$ pg_bulkload -h 192.168.207.143 -p 5432 -P ./error.log -l ./input.log -i /db/o2p/9.csv -O jyc -d resdb -U gistar NOTICE: BULK LOAD START NOTICE: BULK LOAD END 0 Rows skipped. 3687361 Rows successfully loaded. 0 Rows not loaded due to parse errors. 0 Rows not loaded due to duplicate errors. 0 Rows replaced with new rows. [postgres@lnpg o2p]$ more input.log pg_bulkload 3.1.20 on 2023-02-12 13:12:57.084554+08 INPUT = /db/o2p/9.csv PARSE_BADFILE = /db/o2p/error.log LOGFILE = /db/o2p/input.log LIMIT = INFINITE PARSE_ERRORS = 0 CHECK_CONSTRAINTS = NO TYPE = CSV SKIP = 0 DELIMITER = , QUOTE = "\"" ESCAPE = "\"" NULL = OUTPUT = public.jyc MULTI_PROCESS = NO VERBOSE = NO WRITER = DIRECT DUPLICATE_BADFILE = /db/postgresql/data/pg_bulkload/20230212131257_resdb_public_jyc.dup.csv DUPLICATE_ERRORS = 0 ON_DUPLICATE_KEEP = NEW TRUNCATE = NO 0 Rows skipped. 3687361 Rows successfully loaded. 0 Rows not loaded due to parse errors. 0 Rows not loaded due to duplicate errors. 0 Rows replaced with new rows. Run began on 2023-02-12 13:12:57.084554+08 Run ended on 2023-02-12 13:13:04.64806+08 CPU 0.94s/5.86u sec elapsed 7.56 sec pg_bulkload 3.1.20 on 2023-02-12 13:17:54.893514+08 INPUT = /db/o2p/9.csv PARSE_BADFILE = /db/o2p/error.log LOGFILE = /db/o2p/input.log LIMIT = INFINITE PARSE_ERRORS = 0 CHECK_CONSTRAINTS = NO TYPE = CSV SKIP = 0 DELIMITER = , QUOTE = "\"" ESCAPE = "\"" NULL = OUTPUT = public.jyc MULTI_PROCESS = NO VERBOSE = NO WRITER = DIRECT DUPLICATE_BADFILE = /db/postgresql/data/pg_bulkload/20230212131754_resdb_public_jyc.dup.csv DUPLICATE_ERRORS = 0 ON_DUPLICATE_KEEP = NEW TRUNCATE = NO 0 Rows skipped. 3687361 Rows successfully loaded. 0 Rows not loaded due to parse errors. 0 Rows not loaded due to duplicate errors. 0 Rows replaced with new rows. Run began on 2023-02-12 13:17:54.893514+08 Run ended on 2023-02-12 13:18:02.73289+08 CPU 0.68s/5.92u sec elapsed 7.84 sec pg_bulkload 3.1.20 on 2023-02-12 13:20:28.948894+08 INPUT = /db/o2p/9.csv PARSE_BADFILE = /db/o2p/error.log LOGFILE = /db/o2p/input.log LIMIT = INFINITE PARSE_ERRORS = 0 CHECK_CONSTRAINTS = NO TYPE = CSV SKIP = 0 DELIMITER = , QUOTE = "\"" ESCAPE = "\"" NULL = OUTPUT = public.jyc MULTI_PROCESS = NO VERBOSE = NO WRITER = DIRECT DUPLICATE_BADFILE = /db/postgresql/data/pg_bulkload/20230212132028_resdb_public_jyc.dup.csv DUPLICATE_ERRORS = 0 ON_DUPLICATE_KEEP = NEW TRUNCATE = NO 0 Rows skipped. 3687361 Rows successfully loaded. 0 Rows not loaded due to parse errors. 0 Rows not loaded due to duplicate errors. 0 Rows replaced with new rows. Run began on 2023-02-12 13:20:28.948894+08 Run ended on 2023-02-12 13:20:35.540349+08 CPU 0.58s/5.87u sec elapsed 6.59 sec [postgres@lnpg o2p]$ psql -U gistar -d resdb -p 5432 --host 192.168.207.143 psql (15.1) Type "help" for help. resdb=# select count(*) from jyc; count --------- 3687361 (1 row) resdb=#
复制
不支持分区表:
https://github.com/ossc-db/pg_bulkload/issues/58
最后修改时间:2023-02-14 10:36:03
「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。
评论
相关阅读
王炸!OGG 23ai 终于支持从PostgreSQL备库抽取数据了
曹海峰
429次阅读
2025-03-09 12:54:06
明明想执行的SQL是DELETE、UPDATE,但为什么看到的是SELECT(FDW的实现原理解析)
小满未满、
371次阅读
2025-03-19 23:11:26
PostgreSQL初/中/高级认证考试(3.15)通过考生公示
开源软件联盟PostgreSQL分会
335次阅读
2025-03-20 09:50:36
IvorySQL 4.4 发布 - 基于 PostgreSQL 17.4,增强平台支持
通讯员
214次阅读
2025-03-20 15:31:04
套壳论
梧桐
209次阅读
2025-03-09 10:58:17
命名不规范,事后泪两行
xiongcc
193次阅读
2025-03-13 14:26:08
openHalo问世,全球首款基于PostgreSQL兼容MySQL协议的国产开源数据库
严少安
166次阅读
2025-04-07 12:14:29
PG vs MySQL 执行计划解读的异同点
进击的CJR
136次阅读
2025-03-21 10:50:08
版本发布| IvorySQL 4.4 发布
IvorySQL开源数据库社区
126次阅读
2025-03-13 09:52:33
宝藏PEV,助力你成为SQL优化高手
xiongcc
124次阅读
2025-03-09 23:34:23
热门文章
oracle19c rac grid安装报错的快速处理libasmclntsh19.ohso libasmperl19.ohso client_sharedlib' of makefile
2023-03-15 12827浏览
奇怪的ORA-12514: TNS:listener does not currently know of service requested in connect descriptor
2021-11-26 8323浏览
oracle19c rac+asm-->oracle 19c single+fs的adg搭建(实战于生产)
2021-11-11 5776浏览
(1)一次失败的割接经历:麒麟linux6.5+oracle 12c rac打PSU补丁
2022-09-21 5676浏览
gpu p2p多卡训练运行不正常问题
2023-03-09 5281浏览
目录