PG生成列(Generated Columns)
问题背景
偶然遇到一个项目需要优化慢SQL,其中的一个条件是and coalesce(zc.n_qxzc_zje,0)< zc.n_je可以看到就是判断一个表中两个金额的大小
该SQL不能走索引,并且还使用了coalesce函数
测试
测试例子如下
select
distinct crzgxB.c_jkdj_id
from
db_test.test crzgxB inner join db_test.tbl zc on zc.c_bh = crzgxB.c_ywid
where
zc.n_zt = 20
and coalesce(zc.n_qxzc_zje,0)- zc.n_je>100;
postgres=# explain (analyze,buffers)
postgres-# select
postgres-# distinct crzgxB.c_jkdj_id
postgres-# from db_test.test crzgxB inner join db_test.tbl zc on zc.c_bh = crzgxB.c_ywid
postgres-# where
postgres-# zc.n_zt = 20
postgres-# and coalesce(zc.n_qxzc_zje,0)- zc.n_je>100;
QUERY PLAN
HashAggregate (cost=115926.73…115928.73 rows=200 width=132) (actual time=1491.445…1491.445 rows=0 loops=1)
Group Key: crzgxb.c_jkdj_id
Buffers: shared hit=87551, temp read=14076 written=14074
-> Hash Join (cost=67835.54…115635.36 rows=116548 width=132) (actual time=1491.434…1491.434 rows=0 loops=1)
Hash Cond: (crzgxb.c_ywid = zc.c_bh)
Buffers: shared hit=87551, temp read=14076 written=14074
-> Seq Scan on test crzgxb (cost=0.00…41892.88 rows=1264388 width=165) (actual time=0.033…372.691 rows=1264388 loops=1)
Buffers: shared hit=29249
-> Hash (cost=67246.62…67246.62 rows=47114 width=33) (actual time=576.678…576.678 rows=107355 loops=1)
Buckets: 65536 (originally 65536) Batches: 2 (originally 1) Memory Usage: 3918kB
Buffers: shared hit=58302, temp written=347
-> Seq Scan on tbl zc (cost=0.00…67246.62 rows=47114 width=33) (actual time=3.073…541.440 rows=107355 loops=1)
Filter: ((n_zt = 20) AND ((COALESCE(n_qxzc_zje, ‘0’::numeric) - n_je) > ‘100’::numeric))
Rows Removed by Filter: 403766
Buffers: shared hit=58302
Planning time: 4.308 ms
Execution time: 1491.610 ms
(17 rows)
创建一个函数
create or replace function db_test.f_test(a numeric ,b numeric) returns int as $$
declare v_re int;
begin
v_re :=coalesce(a,0) -b;
RETURN v_re;
end;
language plpgsql IMMUTABLE;
--创建索引
--create index i_test1_qxzc_zje on db_test.test1(db_test.f_test(n_qxzc_zje,n_je));
create index i_test1_qxzc_zje_n_zt on db_test.test1(db_test.f_add(n_qxzc_zje,n_je),n_zt);
--改写sql如下:
select
distinct crzgxB.c_jkdj_id
from
db_test.test crzgxB inner join db_test.test1 zc on zc.c_bh = crzgxB.c_ywid
where
zc.n_zt = 20
and db_test.f_test(zc.n_qxzc_zje,zc.n_je)<0
postgres=# explain (analyze,buffers)
postgres-# select
postgres-# distinct crzgxB.c_jkdj_id
postgres-# from db_test.test crzgxB inner join db_test.test1 zc on zc.c_bh = crzgxB.c_ywid
postgres-# where
postgres-# zc.n_zt = 20
postgres-# and db_test.f_add(zc.n_qxzc_zje,zc.n_je)>100;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=122609.15..122611.15 rows=200 width=132) (actual time=768.911..768.911 rows=0 loops=1)
Group Key: crzgxb.c_jkdj_id
Buffers: shared hit=43011, temp read=14076 written=14074
-> Hash Join (cost=74517.96..122317.78 rows=116548 width=132) (actual time=768.906..768.906 rows=0 loops=1)
Hash Cond: (crzgxb.c_ywid = zc.c_bh)
Buffers: shared hit=43011, temp read=14076 written=14074
-> Seq Scan on test crzgxb (cost=0.00..41892.88 rows=1264388 width=165) (actual time=0.008..184.230 rows=1264388 loops=1)
Buffers: shared hit=29249
-> Hash (cost=73929.04..73929.04 rows=47114 width=33) (actual time=91.050..91.050 rows=107355 loops=1)
Buckets: 65536 (originally 65536) Batches: 2 (originally 1) Memory Usage: 3918kB
Buffers: shared hit=13762, temp written=347
-> Bitmap Heap Scan on test1 zc (cost=3591.94..73929.04 rows=47114 width=33) (actual time=17.254..64.301 rows=107355 loops=1)
Recheck Cond: ((db_test.f_add(n_qxzc_zje, n_je) > 100) AND (n_zt = 20))
Heap Blocks: exact=13173
Buffers: shared hit=13762
-> Bitmap Index Scan on i_test1_qxzc_zje_n_zt (cost=0.00..3580.16 rows=47114 width=0) (actual time=14.806..14.806 rows=107355 loops=1)
Index Cond: ((db_test.f_add(n_qxzc_zje, n_je) > 100) AND (n_zt = 20))
Buffers: shared hit=589
Planning time: 0.435 ms
Execution time: 769.009 ms
(20 rows)
创建函数索引可以走索引,但是这个方法太复杂了,而且增加了维护难度
生成列介绍
PG 12带有一个很棒的新功能--生成列(Generated Columns),也就是计算列,在之前的版本也可以实现,但需要定义函数和触发器,利用该功能可以更容易使用并可以提升性能
生成列是给表指定计算列,其数据可以根据其他列数据自动生成,当原数据更新时其自动更新
在PostgreSQL 12+中生成列作为内置功能,可以在创建或修改表时指定列作为生成列,指定其内容通过表达式自动填充,可以为简单基于其他进行数学运算,或更复杂的函数。其优势包括:
无需在插入或修改操作时需要应用代码负责生成数据,生成列作为计算列自动实现。
在频繁的select语句避免计算处理时间。因为插入或修改时计算列已经填充,因此查询无需临时计算,但需要额外空间进行存储。
因为更新原数据时,生成列数据自动更新,因此计算列的值正确性有了保障。
在PostgreSQL 12+中仅STORED类型生成列有效。其他数据库类型,VIRTUAL类型的生成列也有效,该实现更类似于视图方式实现,在数据返回时进行实时计算。这里不讨论两者的优劣,业务未来版本会支持两种方式。
生成列示例
下面创建表演示生成列。我们指定计算列自动计算利润,依据销售价格和采购价格,公示为:
profit = ((sale_price - purchase_price) * quantity_sold)
创建交易表transactions:
CREATE TABLE public.transactions (
transactions_sid serial primary key,
transaction_date timestamp with time zone DEFAULT now() NOT NULL,
product_name character varying NOT NULL,
purchase_price double precision NOT NULL,
sale_price double precision NOT NULL,
quantity_sold integer NOT NULL,
profit double precision NOT NULL GENERATED ALWAYS AS ((sale_price - purchase_price) * quantity_sold) STORED
);
插入示例数据,假设交易表存储咖啡店的交易信息:
INSERT INTO public.transactions (product_name, purchase_price, sale_price, quantity_sold) VALUES ('House Blend Coffee', 5, 11.99, 1);
INSERT INTO public.transactions (product_name, purchase_price, sale_price, quantity_sold) VALUES ('French Roast Coffee', 6, 12.99, 4);
INSERT INTO public.transactions (product_name, purchase_price, sale_price, quantity_sold) VALUES ('BULK: House Blend Coffee, 10LB', 40, 100, 6);
SELECT * FROM public.transactions;
transactions_sid transaction_date product_name purchase_price sale_price quantity_sold profit
1 2021-09-13 17:06:12 House Blend Coffee 5.0 11.99 1 6.99
2 2021-09-13 17:06:12 French Roast Coffee 6.0 12.99 4 27.96
3 2021-09-13 17:06:12 BULK: House Blend Coffee, 10LB 40.0 100.0 6 360.0
返回结果:
图片
我们更新数据验证是否会自动变化:
UPDATE public.transactions SET sale_price = 95 WHERE transactions_sid = 3;
SELECT * FROM public.transactions;
返回结果:
transactions_sid transaction_date product_name purchase_price sale_price quantity_sold profit
1 2021/9/13 17:06 House Blend Coffee 5 11.99 1 6.99
2 2021/9/13 17:06 French Roast Coffee 6 12.99 4 27.96
3 2021/9/13 17:06 BULK: House Blend Coffee, 10LB 40 95 6 330
与期望一致,这种机制确保计算列值的正确性,无需额外的应用程序代码负责实现。
注意:计算列(生成列)不能被直接插入或更新,否则会返回错误。
生成列可以创建索引
我们加压一些数据,然后观察查询,可以看到生成列上可以创建索引,并且查询生效
postgres=# insert into transactions(product_name,purchase_price,sale_price,quantity_sold) select 'ceshi',random()*100,random()*100,random()*10 from generate_series(1,100000);
INSERT 0 100000
postgres=# select count(*) from transactions where profit >100;
count
-------
53189
(1 row)
postgres=# explain select * from transactions where profit >100;
QUERY PLAN
---------------------------------------------------------------------------------------------
Bitmap Heap Scan on transactions (cost=1129.16..3856.87 rows=53257 width=46)
Recheck Cond: (profit > '100'::double precision)
-> Bitmap Index Scan on i_t_transactions_profit (cost=0.00..1115.85 rows=53257 width=0)
Index Cond: (profit > '100'::double precision)
(4 rows)
总结
本文介绍了PostgreSQL 12+ 生成列,可以方便实现计算列。数据库中存储计算列,便不需要查询的时候再去计算,也可以在计算列上创建索引