
PostgreSQL Generated Columns

Original by yBmZlQzJ, 2023-01-02

Background
I happened to come across a project that needed a slow SQL statement optimized. One of its conditions was and coalesce(zc.n_qxzc_zje,0) < zc.n_je, which, as you can see, simply compares two amount columns of the same table.

The statement could not use an index, and on top of that the condition wraps one of the columns in the coalesce function.

Test
The test case is as follows:
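The original post does not include the table definitions or data, so the setup below is a minimal sketch for readers who want to follow along: the column types, row counts, and value distributions are assumptions, and the quoted plans further down come from the author's real data, so they will not be reproduced exactly.

-- Hypothetical schemas inferred from the query below; types and data are made up.
create schema if not exists db_test;

create table db_test.test (
    c_ywid    varchar,      -- joins to tbl.c_bh
    c_jkdj_id varchar
);

create table db_test.tbl (
    c_bh       varchar,
    n_zt       int,
    n_qxzc_zje numeric,      -- may be NULL, hence the coalesce() in the predicate
    n_je       numeric
);

-- Row counts roughly matching the quoted plans (about 1.26 million and 0.5 million rows).
insert into db_test.test (c_ywid, c_jkdj_id)
select 'id' || g, 'jkdj' || (g % 1000)
from generate_series(1, 1264388) g;

insert into db_test.tbl (c_bh, n_zt, n_qxzc_zje, n_je)
select 'id' || g,
       (random() * 30)::int,
       case when random() < 0.1 then null else (random() * 1000)::numeric end,
       (random() * 1000)::numeric
from generate_series(1, 511121) g;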

select
distinct crzgxB.c_jkdj_id
from
db_test.test crzgxB inner join db_test.tbl zc on zc.c_bh = crzgxB.c_ywid
where
zc.n_zt = 20
and coalesce(zc.n_qxzc_zje, 0) - zc.n_je > 100;

postgres=# explain (analyze,buffers)
postgres-# select
postgres-# distinct crzgxB.c_jkdj_id
postgres-# from db_test.test crzgxB inner join db_test.tbl zc on zc.c_bh = crzgxB.c_ywid
postgres-# where
postgres-# zc.n_zt = 20
postgres-# and coalesce(zc.n_qxzc_zje,0)- zc.n_je>100;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
HashAggregate  (cost=115926.73..115928.73 rows=200 width=132) (actual time=1491.445..1491.445 rows=0 loops=1)
  Group Key: crzgxb.c_jkdj_id
  Buffers: shared hit=87551, temp read=14076 written=14074
  ->  Hash Join  (cost=67835.54..115635.36 rows=116548 width=132) (actual time=1491.434..1491.434 rows=0 loops=1)
        Hash Cond: (crzgxb.c_ywid = zc.c_bh)
        Buffers: shared hit=87551, temp read=14076 written=14074
        ->  Seq Scan on test crzgxb  (cost=0.00..41892.88 rows=1264388 width=165) (actual time=0.033..372.691 rows=1264388 loops=1)
              Buffers: shared hit=29249
        ->  Hash  (cost=67246.62..67246.62 rows=47114 width=33) (actual time=576.678..576.678 rows=107355 loops=1)
              Buckets: 65536 (originally 65536)  Batches: 2 (originally 1)  Memory Usage: 3918kB
              Buffers: shared hit=58302, temp written=347
              ->  Seq Scan on tbl zc  (cost=0.00..67246.62 rows=47114 width=33) (actual time=3.073..541.440 rows=107355 loops=1)
                    Filter: ((n_zt = 20) AND ((COALESCE(n_qxzc_zje, '0'::numeric) - n_je) > '100'::numeric))
                    Rows Removed by Filter: 403766
                    Buffers: shared hit=58302
Planning time: 4.308 ms
Execution time: 1491.610 ms
(17 rows)
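As an aside, PostgreSQL can also index the bare expression, since coalesce() with a constant and numeric subtraction are both immutable. The statement below is a sketch that does not appear in the original post; the index name and the extra n_zt key are assumptions that mirror the index built in the next step.

-- Hypothetical expression index on the raw predicate, as an alternative to a wrapper function.
create index i_tbl_diff_n_zt
    on db_test.tbl ((coalesce(n_qxzc_zje, 0) - n_je), n_zt);

The catch is that every query has to repeat the exact expression coalesce(n_qxzc_zje, 0) - n_je for the planner to match the index. The author instead wraps the expression in an IMMUTABLE plpgsql function, which keeps the queries shorter but adds one more object to maintain; generated columns, introduced later in the article, remove that bookkeeping altogether.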
Create a function

create or replace function db_test.f_test(a numeric, b numeric) returns int as $$
declare
    v_re int;
begin
    -- note: assigning the numeric difference to an int rounds it; use numeric for
    -- v_re and for the return type if exact comparisons matter
    v_re := coalesce(a, 0) - b;
    return v_re;
end;
$$ language plpgsql IMMUTABLE;

--create the index
--create index i_test1_qxzc_zje on db_test.test1(db_test.f_test(n_qxzc_zje,n_je));
create index i_test1_qxzc_zje_n_zt on db_test.test1(db_test.f_test(n_qxzc_zje,n_je),n_zt);

--rewritten SQL (db_test.test1 appears to be a copy of db_test.tbl used for this test; the original post does not show how it was created):
select
distinct crzgxB.c_jkdj_id
from
db_test.test crzgxB inner join db_test.test1 zc on zc.c_bh = crzgxB.c_ywid
where
zc.n_zt = 20
and db_test.f_test(zc.n_qxzc_zje, zc.n_je) > 100;

postgres=# explain (analyze,buffers)
postgres-# select
postgres-# distinct crzgxB.c_jkdj_id
postgres-# from db_test.test crzgxB inner join db_test.test1 zc on zc.c_bh = crzgxB.c_ywid
postgres-# where
postgres-# zc.n_zt = 20
postgres-# and db_test.f_test(zc.n_qxzc_zje,zc.n_je)>100;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
HashAggregate  (cost=122609.15..122611.15 rows=200 width=132) (actual time=768.911..768.911 rows=0 loops=1)
  Group Key: crzgxb.c_jkdj_id
  Buffers: shared hit=43011, temp read=14076 written=14074
  ->  Hash Join  (cost=74517.96..122317.78 rows=116548 width=132) (actual time=768.906..768.906 rows=0 loops=1)
        Hash Cond: (crzgxb.c_ywid = zc.c_bh)
        Buffers: shared hit=43011, temp read=14076 written=14074
        ->  Seq Scan on test crzgxb  (cost=0.00..41892.88 rows=1264388 width=165) (actual time=0.008..184.230 rows=1264388 loops=1)
              Buffers: shared hit=29249
        ->  Hash  (cost=73929.04..73929.04 rows=47114 width=33) (actual time=91.050..91.050 rows=107355 loops=1)
              Buckets: 65536 (originally 65536)  Batches: 2 (originally 1)  Memory Usage: 3918kB
              Buffers: shared hit=13762, temp written=347
              ->  Bitmap Heap Scan on test1 zc  (cost=3591.94..73929.04 rows=47114 width=33) (actual time=17.254..64.301 rows=107355 loops=1)
                    Recheck Cond: ((db_test.f_test(n_qxzc_zje, n_je) > 100) AND (n_zt = 20))
                    Heap Blocks: exact=13173
                    Buffers: shared hit=13762
                    ->  Bitmap Index Scan on i_test1_qxzc_zje_n_zt  (cost=0.00..3580.16 rows=47114 width=0) (actual time=14.806..14.806 rows=107355 loops=1)
                          Index Cond: ((db_test.f_test(n_qxzc_zje, n_je) > 100) AND (n_zt = 20))
                          Buffers: shared hit=589
Planning time: 0.435 ms
Execution time: 769.009 ms
(20 rows)

With a function-based index the query does use an index, but this approach is complicated and adds maintenance overhead.

Introduction to generated columns
PostgreSQL 12 ships a great new feature: generated columns, also known as computed columns. The same effect could be achieved in earlier versions, but only by defining a function and a trigger (a sketch of that trigger-based approach is shown at the end of this section); the built-in feature is easier to use and can improve performance.

A generated column is a computed column of a table: its value is derived automatically from other columns, and it is refreshed automatically whenever the source data is updated.

In PostgreSQL 12+, generated columns are built in. When creating or altering a table you can declare a column as generated, with its contents filled in automatically by an expression, anything from simple arithmetic over other columns to more complex functions. The advantages include:

- No application code has to produce the value on insert or update; the generated column is computed automatically.
- Frequent SELECT statements avoid the computation cost. The value is already materialized when the row is inserted or updated, so queries do not recompute it on the fly, at the price of some extra storage.
- Because the generated value is refreshed automatically whenever the source data changes, its correctness is guaranteed.

PostgreSQL 12+ only supports STORED generated columns. Some other databases also support VIRTUAL generated columns, which behave more like a view and are computed on the fly when the data is returned. The pros and cons of the two are not discussed here; perhaps a future version will support both kinds.
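For comparison, the trigger-based approach that the paragraph above mentions for versions before PostgreSQL 12 looks roughly like this. It is a minimal sketch rather than code from the original post: the table, function, and trigger names are invented, and it mirrors the transactions example that follows.

-- Pre-PG12 style: keep a computed column in sync with a trigger instead of GENERATED ... STORED.
create table public.transactions_legacy (
    transactions_sid serial primary key,
    purchase_price   double precision not null,
    sale_price       double precision not null,
    quantity_sold    integer not null,
    profit           double precision            -- maintained by the trigger below
);

create or replace function public.trg_set_profit() returns trigger as $$
begin
    new.profit := (new.sale_price - new.purchase_price) * new.quantity_sold;
    return new;
end;
$$ language plpgsql;

create trigger transactions_legacy_set_profit
    before insert or update on public.transactions_legacy
    for each row execute procedure public.trg_set_profit();

Every write pays the trigger call, and nothing prevents someone from disabling the trigger or writing profit directly, which is exactly the bookkeeping that GENERATED ALWAYS AS ... STORED removes.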
Generated column example
The table below demonstrates a generated column. We declare a computed column that automatically calculates the profit from the sale price, the purchase price, and the quantity sold, using the formula:

profit = ((sale_price - purchase_price) * quantity_sold)

Create the transactions table:

CREATE TABLE public.transactions (
    transactions_sid serial primary key,
    transaction_date timestamp with time zone DEFAULT now() NOT NULL,
    product_name character varying NOT NULL,
    purchase_price double precision NOT NULL,
    sale_price double precision NOT NULL,
    quantity_sold integer NOT NULL,
    profit double precision NOT NULL GENERATED ALWAYS AS ((sale_price - purchase_price) * quantity_sold) STORED
);

Insert some sample data, assuming the table records a coffee shop's transactions:

INSERT INTO public.transactions (product_name, purchase_price, sale_price, quantity_sold) VALUES ('House Blend Coffee', 5, 11.99, 1);
INSERT INTO public.transactions (product_name, purchase_price, sale_price, quantity_sold) VALUES ('French Roast Coffee', 6, 12.99, 4);
INSERT INTO public.transactions (product_name, purchase_price, sale_price, quantity_sold) VALUES ('BULK: House Blend Coffee, 10LB', 40, 100, 6);

SELECT * FROM public.transactions;

Result:

transactions_sid | transaction_date    | product_name                   | purchase_price | sale_price | quantity_sold | profit
1                | 2021-09-13 17:06:12 | House Blend Coffee             | 5.0            | 11.99      | 1             | 6.99
2                | 2021-09-13 17:06:12 | French Roast Coffee            | 6.0            | 12.99      | 4             | 27.96
3                | 2021-09-13 17:06:12 | BULK: House Blend Coffee, 10LB | 40.0           | 100.0      | 6             | 360.0

Now update a row to verify that the generated column changes automatically:

UPDATE public.transactions SET sale_price = 95 WHERE transactions_sid = 3;
SELECT * FROM public.transactions;

Result:

transactions_sid | transaction_date | product_name                   | purchase_price | sale_price | quantity_sold | profit
1                | 2021/9/13 17:06  | House Blend Coffee             | 5              | 11.99      | 1             | 6.99
2                | 2021/9/13 17:06  | French Roast Coffee            | 6              | 12.99      | 4             | 27.96
3                | 2021/9/13 17:06  | BULK: House Blend Coffee, 10LB | 40             | 95         | 6             | 330

Exactly as expected: the mechanism keeps the computed value correct without any extra application code.

Note: a generated (computed) column cannot be inserted into or updated directly; attempting to do so raises an error.

Generated columns can be indexed
First create an index on the generated column (the plan below references it as i_t_transactions_profit), for example: create index i_t_transactions_profit on public.transactions (profit); Then load some more data and look at a query: the index on the generated column is there, and the query uses it.

postgres=# insert into transactions(product_name,purchase_price,sale_price,quantity_sold) select 'ceshi',random()*100,random()*100,random()*10 from generate_series(1,100000);
INSERT 0 100000
postgres=# select count(*) from transactions where profit >100;
 count
-------
 53189
(1 row)

postgres=# explain select * from transactions where profit >100;
                                          QUERY PLAN
---------------------------------------------------------------------------------------------
 Bitmap Heap Scan on transactions  (cost=1129.16..3856.87 rows=53257 width=46)
   Recheck Cond: (profit > '100'::double precision)
   ->  Bitmap Index Scan on i_t_transactions_profit  (cost=0.00..1115.85 rows=53257 width=0)
         Index Cond: (profit > '100'::double precision)
(4 rows)

Summary
This article introduced generated columns in PostgreSQL 12+, a convenient way to implement computed columns. Because the computed value is stored in the database, queries no longer have to calculate it at run time, and indexes can be created on the generated column.
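To connect this back to the slow query at the start of the article: under the same assumptions as the earlier sketches, a stored generated column could replace the wrapper-function index entirely. The column and index names below are invented, and the statements are a sketch rather than something from the original post.

-- Hypothetical rework of the original predicate with a stored generated column (PostgreSQL 12+).
alter table db_test.tbl
    add column n_zje_diff numeric
    generated always as (coalesce(n_qxzc_zje, 0) - n_je) stored;

create index i_tbl_zje_diff_n_zt on db_test.tbl (n_zje_diff, n_zt);

-- The query now filters on an ordinary column: no wrapper function, no expression to repeat.
select distinct crzgxB.c_jkdj_id
from db_test.test crzgxB
    inner join db_test.tbl zc on zc.c_bh = crzgxB.c_ywid
where zc.n_zt = 20
  and zc.n_zje_diff > 100;

One operational note: adding a STORED generated column rewrites the whole table, so on a large, busy table this is something to schedule rather than run casually.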
[Copyright notice] This article is original content by a 墨天轮 (ModB) user. When reposting it you must credit the source (墨天轮) and include the article link, the author's name, and other basic attribution; otherwise the author and 墨天轮 reserve the right to pursue liability. If you find content on 墨天轮 that appears plagiarized or infringing, please report it by email to contact@modb.pro with supporting evidence; once verified, 墨天轮 will remove the content immediately.
