PostgreSQL生成列(Generated Columns)

原创 yBmZlQzJ 2023-01-02

658

PG生成列(Generated Columns)
问题背景
偶然遇到一个项目需要优化慢SQL，其中的一个条件是and coalesce(zc.n_qxzc_zje,0)< zc.n_je可以看到就是判断一个表中两个金额的大小

该SQL不能走索引，并且还使用了coalesce函数

测试
测试例子如下

select
distinct crzgxB.c_jkdj_id
from
db_test.test crzgxB inner join db_test.tbl zc on zc.c_bh = crzgxB.c_ywid
where
zc.n_zt = 20
and coalesce(zc.n_qxzc_zje,0)- zc.n_je>100；

postgres=# explain (analyze,buffers)
postgres-# select
postgres-# distinct crzgxB.c_jkdj_id
postgres-# from db_test.test crzgxB inner join db_test.tbl zc on zc.c_bh = crzgxB.c_ywid
postgres-# where
postgres-# zc.n_zt = 20
postgres-# and coalesce(zc.n_qxzc_zje,0)- zc.n_je>100;
QUERY PLAN

HashAggregate (cost=115926.73…115928.73 rows=200 width=132) (actual time=1491.445…1491.445 rows=0 loops=1)
Group Key: crzgxb.c_jkdj_id
Buffers: shared hit=87551, temp read=14076 written=14074
-> Hash Join (cost=67835.54…115635.36 rows=116548 width=132) (actual time=1491.434…1491.434 rows=0 loops=1)
Hash Cond: (crzgxb.c_ywid = zc.c_bh)
Buffers: shared hit=87551, temp read=14076 written=14074
-> Seq Scan on test crzgxb (cost=0.00…41892.88 rows=1264388 width=165) (actual time=0.033…372.691 rows=1264388 loops=1)
Buffers: shared hit=29249
-> Hash (cost=67246.62…67246.62 rows=47114 width=33) (actual time=576.678…576.678 rows=107355 loops=1)
Buckets: 65536 (originally 65536) Batches: 2 (originally 1) Memory Usage: 3918kB
Buffers: shared hit=58302, temp written=347
-> Seq Scan on tbl zc (cost=0.00…67246.62 rows=47114 width=33) (actual time=3.073…541.440 rows=107355 loops=1)
Filter: ((n_zt = 20) AND ((COALESCE(n_qxzc_zje, ‘0’::numeric) - n_je) > ‘100’::numeric))
Rows Removed by Filter: 403766
Buffers: shared hit=58302
Planning time: 4.308 ms
Execution time: 1491.610 ms
(17 rows)
创建一个函数

create or replace function db_test.f_test(a numeric ,b numeric) returns int as $$
declare v_re int;
begin
v_re :=coalesce(a,0) -b;
RETURN v_re;
end;

language plpgsql IMMUTABLE; --创建索引 --create index i_test1_qxzc_zje on db_test.test1(db_test.f_test(n_qxzc_zje,n_je)); create index i_test1_qxzc_zje_n_zt on db_test.test1(db_test.f_add(n_qxzc_zje,n_je),n_zt); --改写sql如下： select distinct crzgxB.c_jkdj_id from db_test.test crzgxB inner join db_test.test1 zc on zc.c_bh = crzgxB.c_ywid where zc.n_zt = 20 and db_test.f_test(zc.n_qxzc_zje,zc.n_je)<0 postgres=# explain (analyze,buffers) postgres-# select postgres-# distinct crzgxB.c_jkdj_id postgres-# from db_test.test crzgxB inner join db_test.test1 zc on zc.c_bh = crzgxB.c_ywid postgres-# where postgres-# zc.n_zt = 20 postgres-# and db_test.f_add(zc.n_qxzc_zje,zc.n_je)>100; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------ HashAggregate (cost=122609.15..122611.15 rows=200 width=132) (actual time=768.911..768.911 rows=0 loops=1) Group Key: crzgxb.c_jkdj_id Buffers: shared hit=43011, temp read=14076 written=14074 -> Hash Join (cost=74517.96..122317.78 rows=116548 width=132) (actual time=768.906..768.906 rows=0 loops=1) Hash Cond: (crzgxb.c_ywid = zc.c_bh) Buffers: shared hit=43011, temp read=14076 written=14074 -> Seq Scan on test crzgxb (cost=0.00..41892.88 rows=1264388 width=165) (actual time=0.008..184.230 rows=1264388 loops=1) Buffers: shared hit=29249 -> Hash (cost=73929.04..73929.04 rows=47114 width=33) (actual time=91.050..91.050 rows=107355 loops=1) Buckets: 65536 (originally 65536) Batches: 2 (originally 1) Memory Usage: 3918kB Buffers: shared hit=13762, temp written=347 -> Bitmap Heap Scan on test1 zc (cost=3591.94..73929.04 rows=47114 width=33) (actual time=17.254..64.301 rows=107355 loops=1) Recheck Cond: ((db_test.f_add(n_qxzc_zje, n_je) > 100) AND (n_zt = 20)) Heap Blocks: exact=13173 Buffers: shared hit=13762 -> Bitmap Index Scan on i_test1_qxzc_zje_n_zt (cost=0.00..3580.16 rows=47114 width=0) (actual time=14.806..14.806 rows=107355 loops=1) Index Cond: ((db_test.f_add(n_qxzc_zje, n_je) > 100) AND (n_zt = 20)) Buffers: shared hit=589 Planning time: 0.435 ms Execution time: 769.009 ms (20 rows) 创建函数索引可以走索引，但是这个方法太复杂了，而且增加了维护难度生成列介绍 PG 12带有一个很棒的新功能--生成列(Generated Columns)，也就是计算列，在之前的版本也可以实现，但需要定义函数和触发器，利用该功能可以更容易使用并可以提升性能生成列是给表指定计算列，其数据可以根据其他列数据自动生成，当原数据更新时其自动更新在PostgreSQL 12+中生成列作为内置功能，可以在创建或修改表时指定列作为生成列，指定其内容通过表达式自动填充，可以为简单基于其他进行数学运算，或更复杂的函数。其优势包括：无需在插入或修改操作时需要应用代码负责生成数据，生成列作为计算列自动实现。在频繁的select语句避免计算处理时间。因为插入或修改时计算列已经填充，因此查询无需临时计算，但需要额外空间进行存储。因为更新原数据时，生成列数据自动更新，因此计算列的值正确性有了保障。在PostgreSQL 12+中仅STORED类型生成列有效。其他数据库类型，VIRTUAL类型的生成列也有效，该实现更类似于视图方式实现，在数据返回时进行实时计算。这里不讨论两者的优劣，业务未来版本会支持两种方式。生成列示例下面创建表演示生成列。我们指定计算列自动计算利润，依据销售价格和采购价格，公示为： profit = ((sale_price - purchase_price) * quantity_sold) 创建交易表transactions： CREATE TABLE public.transactions ( transactions_sid serial primary key, transaction_date timestamp with time zone DEFAULT now() NOT NULL, product_name character varying NOT NULL, purchase_price double precision NOT NULL, sale_price double precision NOT NULL, quantity_sold integer NOT NULL, profit double precision NOT NULL GENERATED ALWAYS AS ((sale_price - purchase_price) * quantity_sold) STORED ); 插入示例数据，假设交易表存储咖啡店的交易信息： INSERT INTO public.transactions (product_name, purchase_price, sale_price, quantity_sold) VALUES ('House Blend Coffee', 5, 11.99, 1); INSERT INTO public.transactions (product_name, purchase_price, sale_price, quantity_sold) VALUES ('French Roast Coffee', 6, 12.99, 4); INSERT INTO public.transactions (product_name, purchase_price, sale_price, quantity_sold) VALUES ('BULK: House Blend Coffee, 10LB', 40, 100, 6); SELECT * FROM public.transactions; transactions_sid transaction_date product_name purchase_price sale_price quantity_sold profit 1 2021-09-13 17:06:12 House Blend Coffee 5.0 11.99 1 6.99 2 2021-09-13 17:06:12 French Roast Coffee 6.0 12.99 4 27.96 3 2021-09-13 17:06:12 BULK: House Blend Coffee, 10LB 40.0 100.0 6 360.0 返回结果：图片我们更新数据验证是否会自动变化： UPDATE public.transactions SET sale_price = 95 WHERE transactions_sid = 3; SELECT * FROM public.transactions; 返回结果： transactions_sid transaction_date product_name purchase_price sale_price quantity_sold profit 1 2021/9/13 17:06 House Blend Coffee 5 11.99 1 6.99 2 2021/9/13 17:06 French Roast Coffee 6 12.99 4 27.96 3 2021/9/13 17:06 BULK: House Blend Coffee, 10LB 40 95 6 330 与期望一致，这种机制确保计算列值的正确性，无需额外的应用程序代码负责实现。注意：计算列（生成列）不能被直接插入或更新，否则会返回错误。生成列可以创建索引我们加压一些数据，然后观察查询，可以看到生成列上可以创建索引，并且查询生效 postgres=# insert into transactions(product_name,purchase_price,sale_price,quantity_sold) select 'ceshi',random()*100,random()*100,random()*10 from generate_series(1,100000); INSERT 0 100000 postgres=# select count(*) from transactions where profit >100; count ------- 53189 (1 row) postgres=# explain select * from transactions where profit >100; QUERY PLAN --------------------------------------------------------------------------------------------- Bitmap Heap Scan on transactions (cost=1129.16..3856.87 rows=53257 width=46) Recheck Cond: (profit > '100'::double precision) -> Bitmap Index Scan on i_t_transactions_profit (cost=0.00..1115.85 rows=53257 width=0) Index Cond: (profit > '100'::double precision) (4 rows) 总结本文介绍了PostgreSQL 12+ 生成列，可以方便实现计算列。数据库中存储计算列，便不需要查询的时候再去计算，也可以在计算列上创建索引

postgresql

「喜欢这篇文章，您的关注和赞赏是给作者最好的鼓励」

关注作者

PostgreSQL生成列(Generated Columns)

postgres=# explain (analyze,buffers)
postgres-# select
postgres-# distinct crzgxB.c_jkdj_id
postgres-# from db_test.test crzgxB inner join db_test.tbl zc on zc.c_bh = crzgxB.c_ywid
postgres-# where
postgres-# zc.n_zt = 20
postgres-# and coalesce(zc.n_qxzc_zje,0)- zc.n_je>100;
QUERY PLAN

评论

相关阅读

PostgreSQL生成列(Generated Columns)

postgres=# explain (analyze,buffers) postgres-# select postgres-# distinct crzgxB.c_jkdj_id postgres-# from db_test.test crzgxB inner join db_test.tbl zc on zc.c_bh = crzgxB.c_ywid postgres-# where postgres-# zc.n_zt = 20 postgres-# and coalesce(zc.n_qxzc_zje,0)- zc.n_je>100; QUERY PLAN

评论

相关阅读

postgres=# explain (analyze,buffers)
postgres-# select
postgres-# distinct crzgxB.c_jkdj_id
postgres-# from db_test.test crzgxB inner join db_test.tbl zc on zc.c_bh = crzgxB.c_ywid
postgres-# where
postgres-# zc.n_zt = 20
postgres-# and coalesce(zc.n_qxzc_zje,0)- zc.n_je>100;
QUERY PLAN