PostgreSQL 数据rotate用法介绍 - 按时间覆盖历史数据

digoal 2017-03-21

842

TAG 15 , TAG 18

作者

digoal

日期

2017-03-21

背景

在某些业务场景中，数据有冷热之分，例如业务只关心最近一天、一周或者一个月的数据。对于历史的数据可以丢弃。

比如某些指标的监控场景，保留一周的监控数据，历史的都可以丢弃。

如何丢弃历史数据？或者说如何实现rotate？

1. 使用delete, 删除7天前的数据。

delete from table where crt_time<=now()-interval '7 day';

这种方法会带来额外的开销，包括写REDO日志，垃圾回收等。如果删除的数据量很大，还需要重新收集统计信息，甚至收集不及时会导致统计信息不准确。

另一方面，还可能引入merge join的问题。

《PostgreSQL merge join 评估成本时可能会查询索引 - 硬解析务必引起注意 - 批量删除数据后, 未释放empty索引页导致mergejoin执行计划变慢 case》

2. 使用分区表，轮询使用，并且使用truncate清理分区。

这种方法可以避免DELETE带来的问题。

但是使用不当也可能引入新的问题: truncate是DDL操作，rename table也是DDL操作，建议对DDL加上锁超时，否则DDL等待会堵塞任意其他SQL。

交换表名时，需要一个临时表名，名字不能被占用。

下面就以分区表为例，讲一下数据rotate用法。

例子

以保留一周数据为例，看看第二种方法如何来实施。

得益于PostgreSQL支持DDL事务。

方法

一共9张表，一张主表，8张分区表，其中7个对应dow，还有一个对应默认分区(交换分区)。

dow分区使用约束，好处是查询时可以根据约束直接过滤分区。

1. 创建主表

create table test(id int primary key, info text, crt_time timestamp);

2. 创建分区

```
create table test0(like test including all, constraint ck check(extract(dow from crt_time)=0)) inherits(test);
create table test1(like test including all, constraint ck check(extract(dow from crt_time)=1)) inherits(test);
create table test2(like test including all, constraint ck check(extract(dow from crt_time)=2)) inherits(test);
create table test3(like test including all, constraint ck check(extract(dow from crt_time)=3)) inherits(test);
create table test4(like test including all, constraint ck check(extract(dow from crt_time)=4)) inherits(test);
create table test5(like test including all, constraint ck check(extract(dow from crt_time)=5)) inherits(test);
create table test6(like test including all, constraint ck check(extract(dow from crt_time)=6)) inherits(test);

create table test_def(like test including all) inherits(test);
```

3. SELECT/UPDATE/DELETE数据时，直接操作主表，代入时间条件，可以过滤分区

```
postgres=# explain select * from test where crt_time=now() and extract(dow from crt_time)=0; QUERY PLAN

Append (cost=0.00..65.20 rows=3 width=44) -> Seq Scan on test (cost=0.00..0.00 rows=1 width=44) Filter: ((crt_time = now()) AND (date_part('dow'::text, crt_time) = '0'::double precision)) -> Seq Scan on test0 (cost=0.00..32.60 rows=1 width=44) Filter: ((crt_time = now()) AND (date_part('dow'::text, crt_time) = '0'::double precision)) -> Seq Scan on test_def (cost=0.00..32.60 rows=1 width=44) Filter: ((crt_time = now()) AND (date_part('dow'::text, crt_time) = '0'::double precision)) (7 rows) ```

4. INSERT时，建议程序根据crt_time的dow自动拼接表名,直接对分区表进行操作。

如果程序不想对分区进行操作，那么可以使用触发器或规则。

例子

``` create or replace function ins_tg() returns trigger as $$ declare begin case extract(dow from NEW.crt_time) when 0 then insert into test0 values (NEW.); when 1 then insert into test1 values (NEW.); when 2 then insert into test2 values (NEW.); when 3 then insert into test3 values (NEW.); when 4 then insert into test4 values (NEW.); when 5 then insert into test5 values (NEW.); when 6 then insert into test6 values (NEW.*); end case; return null; end; $$ language plpgsql strict;

create trigger tg before insert on test for each row execute procedure ins_tg();

insert into test values (1,'test',now()+interval '1 day'); insert into test values (1,'test',now()+interval '2 day'); insert into test values (1,'test',now()+interval '3 day'); insert into test values (1,'test',now()+interval '4 day'); insert into test values (1,'test',now()+interval '5 day'); insert into test values (1,'test',now()+interval '6 day'); insert into test values (1,'test',now()+interval '7 day');

rotate用法(每天在空闲时间点处理一次即可，DDL支持事务，如果事务失败，可重新发起重试)

在一个事务中完成如下动作

1. 计算明天的dow

2. 清除test_def约束

3. 清除test_def数据

4. test_def重命名test_def_tmp(一个不存在的表名)

5. 明天的分区表，重命名为test_def

6. test_def_tmp添加约束

7. test_def_tmp重命名为明天的分区

例子

```
1. 开始事务
begin;

设置锁超时
set lock_timeout = '60s';
查询明天的dow
select extract(dow from current_date+1);
date_part

(1 row)

清除test_def约束
alter table test_def drop constraint IF EXISTS ck;
清除test_def数据
truncate test_def;
重命名test_def
alter table test_def rename to test_def_tmp;
重命名明天的分区表
alter table test3 rename to test_def;
test_def_tmp添加约束(明天)
alter table test_def_tmp add constraint ck check(extract(dow from crt_time)=3);
test_def_tmp重命名为明天的分区
alter table test_def_tmp rename to test3;
提交或回滚 commit;

如果失败，回滚事务。
rollback; ```

注意事项:

1. 锁超时

2. 事务失败注意回滚

3. 中间表名必须不存在

4. 约束名统一

小结

使用DELETE的方法清除历史数据，会带来额外的开销，包括写REDO日志，垃圾回收等。如果删除的数据量很大，还需要重新收集统计信息，甚至收集不及时会导致统计信息不准确。

另一方面，还可能引入merge join的问题。

《PostgreSQL merge join 评估成本时可能会查询索引 - 硬解析务必引起注意 - 批量删除数据后, 未释放empty索引页导致mergejoin执行计划变慢 case》

因为PostgreSQL支持DDL封装在事务中，所以也可以使用分区表，轮询使用，并且使用truncate清理分区。

这种方法可以避免DELETE带来的问题。

但是使用不当也可能引入新的问题: truncate是DDL操作，rename table也是DDL操作，建议对DDL加上锁超时，否则DDL等待会堵塞任意其他SQL。

交换表名时，需要一个临时表名，名字不能被占用。

祝使用愉快。

文章转载自digoal，如果涉嫌侵权，请发送邮件至：contact@modb.pro进行举报，并提供相关证据，一经查实，墨天轮将立刻删除相关内容。

PostgreSQL 数据rotate用法介绍 - 按时间覆盖历史数据

TAG 15 , TAG 18

作者

日期

标签

背景

例子

方法

rotate用法(每天在空闲时间点处理一次即可，DDL支持事务，如果事务失败，可重新发起重试)

小结

PostgreSQL 许愿链接

9.9元购买3个月阿里云RDS PostgreSQL实例

PostgreSQL 解决方案集合

德哥 / digoal's github - 公益是一辈子的事.

评论