作者
digoal
日期
2020-04-03
标签
PostgreSQL , undo , undam , access method
背景
Postgrespro出品的undo access method存储引擎. 非常轻量, 值得拥有, PG的AM接口必将大放异彩, 内存表、列存表、lsm表等等, 针对不同行业最适合的存储结构都可以通过am来支持.
This is PostgreSQL extension implementing UNDO storage based on table-AM API. The primary goal is to address write amplification problem. Right now Postgres MVCC implementation is creating new version of the object for each update and unused version are collected later by vacuum. Hot update mechanism is used to avoid insertion of new versions in indexes but it is applicable only when version is located at the same page as update record. Also hot updates does't eliminate table bloating on frequent updates.
Undam storage performs in-place update and stores old versions in undo chains. Also tuple is splitted in fixed size chunks linked in L1-list. So TOAST is not needed.
Undam relation is stored in two forks: main fork is used only for tuples headers (first tuple chunk). Tail chunks as well as undo versions are stored in the extension fork. Old versions are also linked in L1 UNDO list. This list is traversed by transaction which snapshot precedes creation of the last version and by vacuum.
Unfortunately current table-AM was developed mostly for hot-update model and doesn't allow to update individual indexes which keys are affected by update. It certainly limits advantages of Undam storage.
Undam storage is using the same visibility checking mechanism as standard PostgreSQL heap (based on tuple's XMIN/XMAX and snapshots).
Undam storage also requires vacuum which freeze old tuples and truncated unneeded undo chains.
Undam use fixed size allocator for relation chunks. Head of the listis stored in one of root pages of relations. To prevent this root page from becoming battelenck we use several lists (undam.alloc_chains), header of each is stored in its own root page. List is choosed randomely. List with index K is used for allocation of pages which (blockno % undam.alloc_chains) == K.
Undam storage is using generic WAL messages to provide ACID transaction behavior. As far as it never shift data on the page, generic WAL messages delta calculation mechanism is quite efficient in this case.
参考
https://github.com/postgrespro/undam
PostgreSQL 许愿链接
您的愿望将传达给PG kernel hacker、数据库厂商等, 帮助提高数据库产品质量和功能, 说不定下一个PG版本就有您提出的功能点. 针对非常好的提议,奖励限量版PG文化衫、纪念品、贴纸、PG热门书籍等,奖品丰富,快来许愿。开不开森.