作者
digoal
日期
2019-10-11
标签
PostgreSQL , postgres_fdw , file_fdw , fdw , 外部表 , 同步 , 迁移 , 汇聚 , 归档 , 冷热分离 , etl , SHARDING
背景
PostgreSQL fdw是一种外部访问接口,可以在PG数据库中创建外部表,用户访问的时候与访问本地表的方法一样,支持增删改查。
而数据则是存储在外部,外部可以是一个远程的pg数据库或者其他数据库(mysql, oracle等),又或者是文件等。
这是目前支持的fdw远程数据源汇总:
https://wiki.postgresql.org/wiki/Foreign_data_wrappers
为什么PostgreSQL支持这么多外部数据源?开放了fdw的开发接口。
https://www.postgresql.org/docs/12/fdwhandler.html
通常FDW可以被应用于什么场景?
1、sharding
2、
同步数据,
etl,
迁移数据,
汇聚,
归档数据,
冷热分离存储(阿里rds pg, OSS_FDW)
3、偶尔需要访问外部数据(dba,分析师)
postgres_fdw的使用举例
场景:
```
local db1 -> remote db(user1) (table1, table2)
postgres=# create database db01;
CREATE DATABASE
postgres=# create database db02;
CREATE DATABASE
postgres=# create role user1 superuser login encrypted password 'digoal';
CREATE ROLE
postgres=# create role user2 superuser login encrypted password 'digoal';
CREATE ROLE
db02=# create table table1 (id int, crt_Time timestamp, info text, c1 int);
CREATE TABLE
db02=# create table table2 (id int, crt_Time timestamp, info text, c1 int);
CREATE TABLE
db02=# insert into table1 select generate_series(1,1000000), clock_timestamp(), md5(random()::text), random()1000;
INSERT 0 1000000
db02=# insert into table2 select generate_series(1,1000000), clock_timestamp(), md5(random()::text), random()1000;
INSERT 0 1000000
```
1、前提条件
网络通:本地数据库可以访问远程目标库
数据库防火墙(pg_hba.conf):远程数据库防火墙,允许本地数据库发起的访问
远程库用户:必须有远程数据库用户和账号(或者是trust访问控制)
权限:远程库提供的用户,必须有访问目标表、视图、物化视图、外部表的权限
2、安装插件
db02=# \c db01 user1
You are now connected to database "db01" as user "user1".
db01=# create extension postgres_fdw;
CREATE EXTENSION
3、创建远程-server
```
CREATE SERVER db02
FOREIGN DATA WRAPPER postgres_fdw
OPTIONS (host '127.0.0.1', port '1921', dbname 'db02');
db01=# select * from pg_foreign_server ;
srvname | srvowner | srvfdw | srvtype | srvversion | srvacl | srvoptions
---------+-----------+-----------+---------+------------+--------+----------------------------------------
db02 | 158005434 | 158005453 | | | | {host=127.0.0.1,port=1921,dbname=db02}
(1 row)
```
4、配置远程访问用户密码-mapping
db01=# CREATE USER MAPPING FOR user1
db01-# SERVER db02
db01-# OPTIONS (user 'user2', password 'digoal');
CREATE USER MAPPING
注意权限设计的理念
mapping (local user -> remote server -> remote user-pwd)
5、创建foreign table
方法一批量导入:
```
db01=# \h import
Command: IMPORT FOREIGN SCHEMA
Description: import table definitions from a foreign server
Syntax:
IMPORT FOREIGN SCHEMA remote_schema
[ { LIMIT TO | EXCEPT } ( table_name [, ...] ) ]
FROM SERVER server_name
INTO local_schema
[ OPTIONS ( option 'value' [, ... ] ) ]
db01=# import foreign schema public from server db02 into ft;
IMPORT FOREIGN SCHEMA
db01=# \det ft.*
List of foreign tables
Schema | Table | Server
--------+--------+--------
ft | table1 | db02
ft | table2 | db02
(2 rows)
```
方法二单个创建:
```
db01=# drop foreign table ft.table1;
DROP FOREIGN TABLE
db01=# CREATE FOREIGN TABLE ft.table1 (
db01(# id int, crt_Time timestamp, info text, c1 int
db01(# )
db01-# SERVER db02
db01-# OPTIONS (schema_name 'public', table_name 'table1');
CREATE FOREIGN TABLE
```
6、操作foreign table
```
db01=# select count(*) from ft.table1;
count
1000000
(1 row)
db01=# select count(*) from ft.table2;
count
1000000
(1 row)
```
7、查询远程sql: EXPLAIN VERBOSE
```
db01=# explain verbose select count(*) from ft.table2;
QUERY PLAN
Foreign Scan (cost=107.40..108.20 rows=1 width=8)
Output: (count())
Relations: Aggregate on (ft.table2)
Remote SQL: SELECT count() FROM public.table2
(4 rows)
```
8、了解pushdown, pull, parallel, dml, conn pool|cache, 事务
什么可以push down?
built-in data types, IMMUTABLE operators, or IMMUTABLE functions
8.1、projection
```
db01=# explain verbose select id from ft.table2;
QUERY PLAN
Foreign Scan on ft.table2 (cost=100.00..130.50 rows=2925 width=4)
Output: id
Remote SQL: SELECT id FROM public.table2
(3 rows)
```
8.2、where
```
db01=# explain verbose select * from ft.table1 where id=1;
QUERY PLAN
Foreign Scan on ft.table1 (cost=100.00..2604.07 rows=1 width=49)
Output: id, crt_time, info, c1
Remote SQL: SELECT id, crt_time, info, c1 FROM public.table1 WHERE ((id = 1))
(3 rows)
```
8.3、agg
```
db01=# explain verbose select count(*) from ft.table1;
QUERY PLAN
Foreign Scan (cost=2267.49..2604.07 rows=1 width=8)
Output: (count())
Relations: Aggregate on (ft.table1)
Remote SQL: SELECT count() FROM public.table1
(4 rows)
```
8.4、join (cbo)
```
db01=# explain verbose select t1.* from ft.table1 t1 inner join ft.table2 using (id) limit 2;
QUERY PLAN
Limit (cost=100.00..4434.99 rows=2 width=49)
Output: t1.id, t1.crt_time, t1.info, t1.c1
-> Foreign Scan (cost=100.00..2167496388.07 rows=1000000 width=49)
Output: t1.id, t1.crt_time, t1.info, t1.c1
Relations: (ft.table1 t1) INNER JOIN (ft.table2)
Remote SQL: SELECT r1.id, r1.crt_time, r1.info, r1.c1 FROM (public.table1 r1 INNER JOIN public.table2 r2 ON (((r1.id = r2.id))))
(6 rows)
```
8.5、limit
```
db01=# explain verbose select * from ft.table2 limit 10;
QUERY PLAN
Limit (cost=100.00..100.11 rows=10 width=49)
Output: id, crt_time, info, c1
-> Foreign Scan on ft.table2 (cost=100.00..10625.41 rows=1000000 width=49)
Output: id, crt_time, info, c1
Remote SQL: SELECT id, crt_time, info, c1 FROM public.table2
(5 rows)
fetch_size相关
```
8.6、sort
```
db01=# explain verbose select * from ft.table2 order by id desc limit 10;
QUERY PLAN
Limit (cost=100.00..100.11 rows=10 width=49)
Output: id, crt_time, info, c1
-> Foreign Scan on ft.table2 (cost=100.00..10692.73 rows=1000000 width=49)
Output: id, crt_time, info, c1
Remote SQL: SELECT id, crt_time, info, c1 FROM public.table2 ORDER BY id DESC NULLS FIRST
(5 rows)
```
9、控制参数
foreign table, server : fetch_size
foreign server : extensions
```
db01=# alter server db02 options (add extensions 'dblink', fetch_size '200');
WARNING: extension "dblink" is not installed
ALTER SERVER
db01=# alter server db02 options (set extensions 'dblink', set fetch_size '200');
WARNING: extension "dblink" is not installed
ALTER SERVER
db01=# alter server db02 options (drop extensions );
ALTER SERVER
db01=# alter server db02 options (drop fetch_size );
ALTER SERVER
```
10、pull
```
db01=# create table t as select * from ft.table1;
SELECT 1000000
db01=# explain verbose select count(*) from t join ft.table1 t1 on (t.id=t1.id and t1.c1=1); QUERY PLAN
Aggregate (cost=5424.74..5424.74 rows=1 width=8) Output: count(*) -> Hash Join (cost=2848.88..5422.59 rows=994 width=0) Hash Cond: (t1.id = t.id) -> Foreign Scan on ft.table1 t1 (cost=100.00..2614.19 rows=994 width=4) Output: t1.id Remote SQL: SELECT id FROM public.table1 WHERE ((c1 = 1)) -> Hash (cost=336.57..336.57 rows=1000000 width=4) Output: t.id -> Seq Scan on public.t (cost=0.00..336.57 rows=1000000 width=4) Output: t.id (11 rows) ```
11、parallel
https://www.postgresql.org/docs/12/fdwhandler.html
12、dml
```
db01=# explain verbose update ft.table1 set crt_time=now() where id=1;
QUERY PLAN
Update on ft.table1 (cost=100.00..2604.07 rows=1 width=55)
Remote SQL: UPDATE public.table1 SET crt_time = $2 WHERE ctid = $1
-> Foreign Scan on ft.table1 (cost=100.00..2604.07 rows=1 width=55)
Output: id, now(), info, c1, ctid
Remote SQL: SELECT id, info, c1, ctid FROM public.table1 WHERE ((id = 1)) FOR UPDATE
(5 rows)
```
13、环境变量
-
In the remote sessions opened by postgres_fdw, the search_path parameter is set to just pg_catalog
-
postgres_fdw likewise establishes remote session settings for various parameters:
TimeZone is set to UTC
DateStyle is set to ISO
IntervalStyle is set to postgres
extra_float_digits is set to 3 for remote servers 9.0 and newer and is set to 2 for older versions
14、连接池-conn pool|cache
-
postgres_fdw establishes a connection to a foreign server during the first query that uses a foreign table associated with the foreign server. This connection is kept and re-used for subsequent queries in the same session.
-
However, if multiple user identities (user mappings) are used to access the foreign server, a connection is established for each user mapping.
session 1: 访问了这些server的外部表:rmt_server1(user1), rmt_server2(user), rmt_server1(user1), rmt_server1(user2) 几个链接?
15、事务
-
During a query that references any remote tables on a foreign server, postgres_fdw opens a transaction on the remote server if one is not already open corresponding to the current local transaction.
-
The remote transaction is committed or aborted when the local transaction commits or aborts.
-
Savepoints are similarly managed by creating corresponding remote savepoints.
-
The remote transaction uses SERIALIZABLE isolation level when the local transaction has SERIALIZABLE isolation level;
-
otherwise it uses REPEATABLE READ isolation level.
16、性能:
1亿tpcb
```
pgbench -i -s 1000 db02
db01=# import foreign schema public except (table1,table2) from server db02 into public;
IMPORT FOREIGN SCHEMA
db01=# \det
List of foreign tables
Schema | Table | Server
--------+------------------+--------
public | pgbench_accounts | db02
public | pgbench_branches | db02
public | pgbench_history | db02
public | pgbench_tellers | db02
(4 rows)
```
直接操作表
```
pgbench -M prepared -n -r -P 1 -c 32 -j 32 -T 120 -S db02
transaction type:
scaling factor: 1000
query mode: prepared
number of clients: 32
number of threads: 32
duration: 120 s
number of transactions actually processed: 54167206
latency average = 0.071 ms
latency stddev = 0.014 ms
tps = 451390.893220 (including connections establishing)
tps = 451509.501209 (excluding connections establishing)
statement latencies in milliseconds:
0.001 \set aid random(1, 100000 * :scale)
0.070 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
```
操作postgres_fdw外表
```
pgbench -M prepared -n -r -P 1 -c 32 -j 32 -T 120 -S db01 -U user1
transaction type:
scaling factor: 1000
query mode: prepared
number of clients: 32
number of threads: 32
duration: 120 s
number of transactions actually processed: 5917365
latency average = 0.649 ms
latency stddev = 0.099 ms
tps = 49310.904457 (including connections establishing)
tps = 49330.503336 (excluding connections establishing)
statement latencies in milliseconds:
0.002 \set aid random(1, 100000 * :scale)
0.647 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
```
why?
xact
declare
open cursor
fetch
close cursor
end
参考
https://www.postgresql.org/docs/12/postgres-fdw.html
https://www.postgresql.org/docs/12/libpq-connect.html#LIBPQ-PARAMKEYWORDS
https://www.postgresql.org/docs/12/sql-createextension.html
https://www.postgresql.org/docs/12/sql-createserver.html
https://www.postgresql.org/docs/12/sql-createusermapping.html
https://www.postgresql.org/docs/12/sql-createforeigntable.html
https://www.postgresql.org/docs/12/sql-importforeignschema.html
https://wiki.postgresql.org/wiki/Foreign_data_wrappers
https://www.postgresql.org/docs/12/fdwhandler.html
《使用SQL查询数据库日志 - file_fdw , csvlog , program , find - 1》
《使用SQL查询数据库日志 - file_fdw , csvlog , program , find - 2》
《[未完待续] PostgreSQL PRO 特性 - sharding(pg_shardman)》
https://github.com/postgrespro/pg_shardman
《PostgreSQL 9.6 sharding based on FDW & pg_pathman》
PostgreSQL 许愿链接
您的愿望将传达给PG kernel hacker、数据库厂商等, 帮助提高数据库产品质量和功能, 说不定下一个PG版本就有您提出的功能点. 针对非常好的提议,奖励限量版PG文化衫、纪念品、贴纸、PG热门书籍等,奖品丰富,快来许愿。开不开森.