暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

PostgreSQL 大对象使用

digoal 2020-12-05
2475

作者

digoal

日期

2020-12-05

标签

PostgreSQL , 大对象 , 垃圾 , 服务端 , 客户端 , 风险 , 注入 , 文件操作


背景

大对象实际上就是结合了数据库的ACID(可靠性、一致性、隔离性、持久性)的操作系统文件操作API的数据库API封装.

创建文件、打开文件返回文件描述符FD、移动已打开文件位点、写入、读出、截断、删除文件、关闭FD 等.

对应大对象也有类似接口.

为什么数据库需要支持大对象呢?

1、存储单个大小超过1GB的文件. 因为bytea类型一行最多只能存储1GB.
2、方便超大文件数据内容的小范围编辑, 例如只修改其中的某些字节. 使用大对象可以seek, 写入或修改部分内容. 普通类型只能整个内容replace.
3、支持稀疏写入, 未写入数据的block不占用空间, 例如直接seed到10GB后写入1MB, 只占用1MB, 未写入的BLOCK在读取时返回zero bits.

注意: 大对象的使用接口分为客户端接口、数据库端接口.

客户端接口被封装在客户端驱动中, 例如libpq, 如果需要把客户端文件导入到数据库的大对象中, 需要使用客户端接口.

数据库端接口是数据库的SQL接口, 与libpq的接口功能对应, 但是export, import的sql接口操作的是数据库所在服务器的文件, 不是客户端的文件(切记).

熟悉PG COPY的小伙伴一定直到数据库端和客户端接口的区别:

《PostgreSQL 服务端COPY和客户端COPY - 暨PG有哪些服务端操作接口》

《PostgreSQL copy (quote,DELIMITER,...) single byte char 的输入》

PG大对象设计比较奇特, 大对象数据都存储在pg_largeobjects表中, 通过OID来引用, 即每创建一个大对象, 会分配一个OID, 以后要引用到这个大对象时, 使用它对应的OID来定位. 一个大对象可以被任意数量的记录和表引用.

所以在表里面, 我们用OID字段来引用, 有点类似我把图片存储在某个文件系统里, 这个表的字段存储内容是文件路径, 而不是图片文件的内容本身.

也因为这样, 通常我们在删除大对象时, 经常出现大对象的数据没被删, 删掉的只是存储了引用大对象OID的记录而已. OID还在, 对应大对象内容也还在.

user import large object -导入> pg_largeobjects(large obj)

user use normal table(oid) -引用> pg_largeobjects(large obj)

清理没有被引用的大对象: 找出所有大对象对应的OID, 搜索数据库里面包含OID数据类型的表, 如果在这些表中, 没有任何oid内容匹配大对象的OID, 说明这些OID没有被引用. 使用lo_unlink(oid)就可以删除对应的大对象数据.

还有一种方法避免产生大对象垃圾, 使用lo插件, 并在oid字段创建触发器, 在update, delete时触发, 自动调用lo_unlink(oid)清理不需要的大对象数据. https://www.postgresql.org/docs/current/lo.html

客户端驱动接口介绍

https://www.postgresql.org/docs/current/lo-interfaces.html

1、creates a new large object.

creates a new large object. The return value is the OID that was assigned to the new large object, or InvalidOid (zero) on failure.
mode is unused and ignored as of PostgreSQL 8.1;
however, for backward compatibility with earlier releases it is best to set it to INV_READ, INV_WRITE, or INV_READ | INV_WRITE.
(These symbolic constants are defined in the header file libpq/libpq-fs.h.)

Oid lo_creat(PGconn *conn, int mode);

SQL接口的lo_OPEN和lo_CREAT使用十进制代替mode值. (参考 libpq/libpq-fs.h 头文件)

```
postgres=> select x'20000'::int; -- 读
int4


131072
(1 row)

postgres=> select x'40000'::int; -- 写
int4


262144
(1 row)

postgres=> select 131072+262144; -- 读写
?column?


393216
(1 row)
```

建议直接使用下面这个lo_create.

Oid lo_create(PGconn *conn, Oid lobjId); --lobjId=0表示数据库自动分配未使用的OID并返回 inv_oid = lo_create(conn, desired_oid);

2、To import an operating system file as a large object, call

Oid lo_import(PGconn *conn, const char *filename); -- 数据库自动分配未使用的OID并返回 Oid lo_import_with_oid(PGconn *conn, const char *filename, Oid lobjId); -- 导入到指定的OID

import的是客户端的文件

3、export a large object into an operating system file, call

int lo_export(PGconn *conn, Oid lobjId, const char *filename);

export到客户端的文件

4、To open an existing large object for reading or writing, call

int lo_open(PGconn *conn, Oid lobjId, int mode); -- 模式参考头文件libpq/libpq-fs.h inv_fd = lo_open(conn, inv_oid, INV_READ|INV_WRITE);

文件描述符只能在事务中使用, 事务结束自动释放. 所以LO操作都需要在事务中完成.

5、writes len bytes from buf (which must be of size len) to large object descriptor fd.

int lo_write(PGconn *conn, int fd, const char *buf, size_t len); -- 建议一次写若干MB

6、reads up to len bytes from large object descriptor fd into buf (which must be of size len).

int lo_read(PGconn *conn, int fd, char *buf, size_t len); -- 建议一次读若干MB

7、To change the current read or write location associated with a large object descriptor, call

int lo_lseek(PGconn *conn, int fd, int offset, int whence); pg_int64 lo_lseek64(PGconn *conn, int fd, pg_int64 offset, int whence); -- 超过2GB使用64位接口

从哪个位置开始移动

The valid values for whence are

SEEK_SET (seek from object start), SEEK_CUR (seek from current position), and SEEK_END (seek from object end).

8、To obtain the current read or write location of a large object descriptor, call

int lo_tell(PGconn *conn, int fd); pg_int64 lo_tell64(PGconn *conn, int fd); -- 超过2GB使用64位接口

9、To truncate a large object to a given length, call

int lo_truncate(PGcon *conn, int fd, size_t len); int lo_truncate64(PGcon *conn, int fd, pg_int64 len); -- 超过2GB使用64位接口

lo_truncate和SQL truncate意义一样, 但是lo_truncate支持指定截断多少字节, 而sql truncate只能整张表.

10、A large object descriptor can be closed by calling

int lo_close(PGconn *conn, int fd);

Any large object descriptors that remain open at the end of a transaction will be closed automatically.

事务结束, 打开的lo会自动关闭.

11、To remove a large object from the database, call

int lo_unlink(PGconn *conn, Oid lobjId);

数据库端SQL接口介绍

https://www.postgresql.org/docs/current/lo-funcs.html

postgres=> \df lo* List of functions Schema | Name | Result data type | Argument data types | Type ------------+-------------------------+------------------+------------------------------------------------------------------------------------------------------+------ pg_catalog | lo_close | integer | integer | func pg_catalog | lo_creat | oid | integer | func pg_catalog | lo_create | oid | oid | func pg_catalog | lo_export | integer | oid, text | func pg_catalog | lo_from_bytea | oid | oid, bytea | func pg_catalog | lo_get | bytea | oid | func pg_catalog | lo_get | bytea | oid, bigint, integer | func pg_catalog | lo_import | oid | text | func pg_catalog | lo_import | oid | text, oid | func pg_catalog | lo_lseek | integer | integer, integer, integer | func pg_catalog | lo_lseek64 | bigint | integer, bigint, integer | func pg_catalog | lo_open | integer | oid, integer | func pg_catalog | lo_put | void | oid, bigint, bytea | func pg_catalog | lo_tell | integer | integer | func pg_catalog | lo_tell64 | bigint | integer | func pg_catalog | lo_truncate | integer | integer, integer | func pg_catalog | lo_truncate64 | integer | integer, bigint | func pg_catalog | lo_unlink | integer | oid | func pg_catalog | loread | bytea | integer, integer | func pg_catalog | lowrite | integer | integer, bytea | func

loread 对应 lo_get, 但是loread需要先打开文件, 通过FD进行操作. lo_get封装了lo_open和loread可以直接读大对象.
lowrite 对应 lo_put, 但是lowrite需要先打开文件, 通过FD进行操作. lo_put封装了lo_open和lowrite可以直接写入大对象. 而且lo_put还封装了lo_lseek, 可以直接从目标位置写入.

1、Creates a large object and stores data in it. If loid is zero then the system will choose a free OID, otherwise that OID is used (with an error if some large object already has that OID). On success, the large object's OID is returned.

lo_from_bytea ( loid oid, data bytea ) → oid lo_from_bytea(0, '\xffffff00') → 24528

2、Writes data starting at the given offset within the large object; the large object is enlarged if necessary.

lo_put ( loid oid, offset bigint, data bytea ) → void lo_put(24528, 1, '\xaa')

3、Extracts the large object's contents, or a substring thereof.

lo_get ( loid oid [, offset bigint, length integer ] ) → bytea lo_get(24528, 0, 3) → \xffaaff

其他函数和客户端函数使用类似, 记住FD只能在事务中存活, 所以开启事务操作即可.

例子:

《PostgreSQL large row|column performance tuning case》

```
CREATE TABLE image (
name text,
raster oid
);

SELECT lo_creat(-1); -- returns OID of new, empty large object

SELECT lo_create(43213); -- attempts to create large object with OID 43213

或 SELECT lo_create(0); -- returns OID of new, empty large object

SELECT lo_unlink(173454); -- deletes large object with OID 173454

INSERT INTO image (name, raster)
VALUES ('beautiful image', lo_import('/etc/motd'));

INSERT INTO image (name, raster) -- same as above, but specify OID to use
VALUES ('beautiful image', lo_import('/etc/motd', 68583));

SELECT lo_export(image.raster, '/tmp/motd') FROM image
WHERE name = 'beautiful image';
```

只有超级用户有SQL接口(即数据库端大对象导出、导入函数)lo_export, lo_import的权限. 因为这两个函数可以读写文件系统. 必须数据库超级用户才能执行. 否则可能被攻击:

《[转] 关于入侵PostgreSQL的那些事儿(文件读取写入、命令执行的办法) -大对象》

《大对象 - 数据库common安全自动渗透测试与防范 - sqlmap》

《大对象攻击 - Hacking PostgreSQL》

大对象清理

dead 大对象清理 (未被任何引用的大对象清理)

方法1、直接调用lo_unlink

select lo_unlink(指定large oid);

方法2、使用vacuumlo

vacuumlo -n -v postgres Connected to database "postgres" Test run: no large objects will be removed! Checking lo in public.t_lob Checking lo in public.t_lob1 Would remove 0 large objects from database "postgres".

大对象权限

1、当lo_compat_privileges设置为ON时, 任意用户创建的larobject都可以被其他用户访问, 不需要赋予任何权限.

```
postgres=> select *,lomowner::regrole from pg_largeobject_metadata;
oid | lomowner | lomacl | lomowner
---------+----------+----------+----------
1093397 | 10 | | postgres
1093419 | 10 | | postgres
1093420 | 1093417 | {a=rw/a} | a
(3 rows)

postgres=# \c postgres b
You are now connected to database "postgres" as user "b".

postgres=> show lo_compat_privileges;
lo_compat_privileges


on
(1 row)

postgres=> select lo_open(1093420,393216);
lo_open


   0
复制

(1 row)
```

2、当lo_compat_privileges设置为OFF时, 赋予了对应权限的大对象, 才能读写. 否则只有owner或超级用户能读写.

GRANT { { SELECT | UPDATE } [, ...] | ALL [ PRIVILEGES ] } ON LARGE OBJECT loid [, ...] TO role_specification [, ...] [ WITH GRANT OPTION ]

lo_open will fail if SELECT privilege is not available for the large object, or if INV_WRITE is specified and UPDATE privilege is not available.
(Prior to PostgreSQL 11, these privilege checks were instead performed at the first actual read or write call using the descriptor.)
These privilege checks can be disabled with the lo_compat_privileges run-time parameter.

```
postgres=# \c postgres b
You are now connected to database "postgres" as user "b".
postgres=> show lo_compat_privileges;
lo_compat_privileges


off
(1 row)

-- lo_compat_privileges=off, 当large obj无权限, 报错
postgres=> select lo_open(1093420,393216);
ERROR: permission denied for large object 1093420

-- 赋权
postgres=> \c postgres a
You are now connected to database "postgres" as user "a".
postgres=> grant select , update on LARGE OBJECT 1093420 to b;
GRANT

-- 赋权后, 可以读写
postgres=> \c postgres b
You are now connected to database "postgres" as user "b".
postgres=> select lo_open(1093420,393216);
lo_open


   0
复制

(1 row)
```

src/include/libpq/libpq-fs.h

```
/
* Read/write mode flags for inversion (large object) calls
/

define INV_WRITE 0x00020000

define INV_READ 0x00040000

```

```
postgres=> select x'20000'::int;
int4


131072
(1 row)

postgres=> select x'40000'::int;
int4


262144
(1 row)

postgres=> select 131072+262144;
?column?


393216
(1 row)
```

其他例子

```
postgres=> begin;
BEGIN
postgres=> select lo_create(0);
lo_create


982275
复制

(1 row)

postgres=> select lo_open(982275,393216);
lo_open


   0
复制

(1 row)

postgres=> select lo_truncate(0, 100000);
lo_truncate


       0
复制

(1 row)

postgres=> end;
COMMIT

postgres=> begin;
BEGIN
postgres=> select lo_open(982275,393216);
lo_open


   0
复制

(1 row)

postgres=> select loread(0,100);


\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\0
00\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000
(1 row)

db1=> begin;
BEGIN
db1=> select lo_create(0);
lo_create


982277
复制

(1 row)

db1=> select lo_open(982277,393216);
lo_open


   0
复制

(1 row)

If len is greater than the large object's current length, the large object is extended to the specified length with null bytes ('\0'). On success, lo_truncate returns zero.
When dealing with large objects that might exceed 2GB in size, instead use
int lo_truncate64(PGcon *conn, int fd, pg_int64 len);

db1=> select lo_truncate64(0, 1000000000000);
lo_truncate64


         0
复制

(1 row)

db1=> end;
COMMIT

db1=> \l+
List of databases
Name | Owner | Encoding | Collate | Ctype | Access privileges | Size | Tablespace | Description
-----------+------------+----------+---------+------------+---------------------------+---------+------------+--------------------------------------------
db1 | digoal | UTF8 | C | en_US.utf8 | | 15 MB | pg_default |
```

The chunks stored for a large object do not have to be contiguous. For example, if an application opens a new large object, seeks to offset 1000000, and writes a few bytes there, this does not result in allocation of 1000000 bytes worth of storage; only of chunks covering the range of data bytes actually written. A read operation will, however, read out zeroes for any unallocated locations preceding the last existing chunk. This corresponds to the common behavior of “sparsely allocated” files in Unix file systems.

参考

https://www.postgresql.org/docs/current/lo.html

https://www.postgresql.org/docs/current/largeobjects.html

《[转] 关于入侵PostgreSQL的那些事儿(文件读取写入、命令执行的办法) -大对象》

《大对象 - 数据库common安全自动渗透测试与防范 - sqlmap》

《大对象攻击 - Hacking PostgreSQL》

《PostgreSQL large row|column performance tuning case》

《PostgreSQL psql的元素周期表 - 包括大对象操作》

《在java中正确使用PostgreSQL大对象和字节流(bytea)类型的姿势》

《PostgreSQL 大对象或bytea存取pdf,jpg等文件 - PostgreSQL export regular file like pdf, word, text, doc stored in bytea type》

《大对象 - PostgreSQL 9.3 Add API for 64-bit large object access》

《大对象 - Use PostgreSQL server program import binary data into database bytea type》

PostgreSQL 许愿链接

您的愿望将传达给PG kernel hacker、数据库厂商等, 帮助提高数据库产品质量和功能, 说不定下一个PG版本就有您提出的功能点. 针对非常好的提议,奖励限量版PG文化衫、纪念品、贴纸、PG热门书籍等,奖品丰富,快来许愿。开不开森.

9.9元购买3个月阿里云RDS PostgreSQL实例

PostgreSQL 解决方案集合

德哥 / digoal's github - 公益是一辈子的事.

digoal's wechat

文章转载自digoal,如果涉嫌侵权,请发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论