暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

PostgreSQL 通过old version tuple模拟update, delete闪回

原创 digoal 2022-01-20
414

作者

digoal

日期

2021-11-18

标签

PostgreSQL , old version tuple , update , delete


《PostgreSQL flashback(闪回) 功能实现与介绍》

《PostgreSQL 14 preview - corrupted tuple 修复功能 - pg_surgery》

http://www.df7cb.de/blog/2021/postgresql-undelete.html

Earlier this week, I updated pg_dirtyread to work with PostgreSQL 14. pg_dirtyread is a PostgreSQL extension that allows reading "dead" rows from tables, i.e. rows that have already been deleted, or updated. Of course that works only if the table has not been cleaned-up yet by a VACUUM command or autovacuum, which is PostgreSQL's garbage collection machinery.

Here's an example of pg_dirtyread in action:

# create table foo (id int, t text);  
CREATE TABLE  
# insert into foo values (1, 'Doc1');  
INSERT 0 1  
# insert into foo values (2, 'Doc2');  
INSERT 0 1  
# insert into foo values (3, 'Doc3');  
INSERT 0 1  
# select * from foo;  
 id │  t  
────┼──────  
  1 │ Doc1  
  2 │ Doc2  
  3 │ Doc3  
(3 rows)  
# delete from foo where id < 3;  
DELETE 2  
# select * from foo;  
 id │  t  
────┼──────  
  3 │ Doc3  
(1 row)  
复制

Oops! The first two documents have disappeared.

Now let's use pg_dirtyread to look at the table:

# create extension pg_dirtyread;  
CREATE EXTENSION  
# select * from pg_dirtyread('foo') t(id int, t text);  
 id │  t  
────┼──────  
  1 │ Doc1  
  2 │ Doc2  
  3 │ Doc3  
复制

All three documents are still there, but only one of them is visible.

pg_dirtyread can also show PostgreSQL's system colums with the row location and visibility information. For the first two documents, xmax is set, which means the row has been deleted:

# select * from pg_dirtyread('foo') t(ctid tid, xmin xid, xmax xid, id int, t text);  
 ctid  │ xmin │ xmax │ id │  t  
───────┼──────┼──────┼────┼──────  
 (0,1) │ 1577 │ 1580 │  1 │ Doc1  
 (0,2) │ 1578 │ 1580 │  2 │ Doc2  
 (0,3) │ 1579 │    0 │  3 │ Doc3  
(3 rows)  
复制

I always had plans to extend pg_dirtyread to include some "undelete" command to make deleted rows reappear, but never got around to trying that. But rows can already be restored by using the output of pg_dirtyread itself:

# insert into foo select * from pg_dirtyread('foo') t(id int, t text) where id = 1;  
复制

This is not a true "undelete", though - it just inserts new rows from the data read from the table.

Enter pg_surgery, which is a new PostgreSQL extension supplied with PostgreSQL 14. It contains two functions to "perform surgery on a damaged relation". As a side-effect, they can also make delete tuples reappear.

As I discovered now, one of the functions, heap_force_freeze(), works nicely with pg_dirtyread. It takes a list of ctids (row locations) that it marks "frozen", but at the same time as "not deleted".

Let's apply it to our test table, using the ctids that pg_dirtyread can read:

# create extension pg_surgery;  
CREATE EXTENSION  
# select heap_force_freeze('foo', array_agg(ctid))  
    from pg_dirtyread('foo') t(ctid tid, xmin xid, xmax xid, id int, t text) where id = 1;  
 heap_force_freeze  
───────────────────  
(1 row)  
复制

Et voilà, our deleted document is back:

# select * from foo;  
 id │  t  
────┼──────  
  1 │ Doc1  
  3 │ Doc3  
(2 rows)  
# select * from pg_dirtyread('foo') t(ctid tid, xmin xid, xmax xid, id int, t text);  
 ctid  │ xmin │ xmax │ id │  t  
───────┼──────┼──────┼────┼──────  
 (0,1) │    2 │    0 │  1 │ Doc1  
 (0,2) │ 1578 │ 1580 │  2 │ Doc2  
 (0,3) │ 1579 │    0 │  3 │ Doc3  
(3 rows)  
复制

Most importantly, none of the above methods will work if the data you just deleted has already been purged by VACUUM or autovacuum. These actively zero out reclaimed space. Restore from backup to get your data back.

Since both pg_dirtyread and pg_surgery operate outside the normal PostgreSQL MVCC machinery, it's easy to create corrupt data using them. This includes duplicated rows, duplicated primary key values, indexes being out of sync with tables, broken foreign key constraints, and others. You have been warned.

pg_dirtyread does not work (yet) if the deleted rows contain any toasted values. Possible other approaches include using pageinspect and pg_filedump to retrieve the ctids of deleted rows.

Please make sure you have working backups and don't need any of the above.

202107/20210715_03.md 《PostgreSQL 数据块恢复 - pg_filedump》
201703/20170310_03.md 《PostgreSQL 数据文件灾难恢复 - 解析与数据pg_filedump》
201105/20110526_01.md 《use pg_filedump dump block contents》
202102/20210215_03.md 《PostgreSQL 14 preview - pageinspect 新增gist索引inspect, 同时支持 lp_dead 展示》
202101/20210113_02.md 《PostgreSQL 14 preview - pageinspect 内窥heap,index存储结构 , 新增对gist索引的支持》
201809/20180919_02.md 《PostgreSQL pageinspect 诊断与优化GIN (倒排) 索引合并延迟导致的查询性能下降问题》
201105/20110527_02.md 《Use pageinspect EXTENSION view PostgreSQL Page's raw infomation》

期望 PostgreSQL 增加什么功能?

类似Oracle RAC架构的PostgreSQL已开源: 阿里云PolarDB for PostgreSQL云原生分布式开源数据库!

PostgreSQL 解决方案集合

德哥 / digoal's github - 公益是一辈子的事.

digoal's wechat

「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论