作者简介
张晋 来自沈阳美行科技
背景
模拟
数据库表中约有500w条车牌号记录,对表中的车牌号进行模糊查询。即支持 car_id like ‘%XXXX%XXX%’ 查询
---创建表
create table t_car (id int , car_id text);
--插入500万车牌数据
insert into t_car select generate_series(1,5000000), (array['辽A','辽B','吉A','吉B','黑A','黑B'])[floor(random()*6+1)] || substring(md5(random()::text),0,6);
--查看数据
select * from t_car limit 5;
id | car_id
----+----------
1 | 吉A43bb9
2 | 吉B19b64
3 | 辽Afb04e
4 | 吉Bcf90c
5 | 辽Be67df
(5 行记录)
索引
· 顺序扫描
explain analyze verbose select * from t_car where car_id = '辽Be67df';
QUERY PLAN
-----------------------------------------------------------------
Gather (cost=1000.00..54069.87 rows=2 width=14) (actual time=0.458..268.782 rows=4 loops=1)
Output: id, car_id
Workers Planned: 2
Workers Launched: 2
-> Parallel Seq Scan on public.t_car (cost=0.00..53069.67 rows=1 width=14) (actual time=140.151..253.061 rows=1 loops=3)
Output: id, car_id
Filter: (t_car.car_id = '辽Be67df'::text)
Rows Removed by Filter: 1666665
Worker 0: actual time=246.618..246.619 rows=0 loops=1
Worker 1: actual time=173.812..251.916 rows=1 loops=1
Planning time: 0.174 ms
Execution time: 268.820 ms
(12 行记录)
时间:269.684 ms
· btree
创建btree类型索引
create index btree_index_01 on t_car using btree (car_id);
等值查询,结果还是相当给力
postgres=# explain analyze verbose select * from t_car where car_id = '辽Be67df';
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
Index Scan using btree_index_01 on public.t_car (cost=0.43..3.77 rows=2 width=14) (actual time=0.047..0.057 rows=4 loops=1)
Output: id, car_id
Index Cond: (t_car.car_id = '辽Be67df'::text)
Planning time: 0.218 ms
Execution time: 0.091 ms
(5 行记录)
时间:0.971 ms
后模糊查询,不尽人意。不符合预期。
postgres=# explain analyze verbose select * from t_car where car_id like '辽Be67df%';
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
Gather (cost=1000.00..54119.47 rows=498 width=14) (actual time=0.426..281.444 rows=4 loops=1)
Output: id, car_id
Workers Planned: 2
Workers Launched: 2
-> Parallel Seq Scan on public.t_car (cost=0.00..53069.67 rows=208 width=14) (actual time=138.857..270.492 rows=1 loops=3)
Output: id, car_id
Filter: (t_car.car_id ~~ '辽Be67df%'::text)
Rows Removed by Filter: 1666665
Worker 0: actual time=148.488..267.226 rows=2 loops=1
Worker 1: actual time=268.058..268.058 rows=0 loops=1
Planning time: 0.153 ms
Execution time: 281.481 ms
(12 行记录)
时间:282.275 ms
以上btree索引没有起到作用的原因,是因为在创建索引时,默认的opclass为等值查询。
详情:https://www.postgresql.org/docs/10/indexes-opclass.html
重新创建btree索引
drop index btree_index_01;
create index btree_index_01 on t_car using btree (car_id text_pattern_ops);
这次查询结果符合预期
explain analyze verbose select * from t_car where car_id like '辽Be67df%';
QUERY PLAN
-----------------------------------------------------------------
Index Scan using btree_index_01 on public.t_car (cost=0.43..2.66 rows=498 width=14) (actual time=0.050..0.073 rows=4 loops=1)
Output: id, car_id
Index Cond: ((t_car.car_id ~>=~ '辽Be67df'::text) AND (t_car.car_id ~<~ '辽Be67dg'::text))
Filter: (t_car.car_id ~~ '辽Be67df%'::text)
Planning time: 0.473 ms
Execution time: 0.106 ms
(6 行记录)
时间:1.407 ms
前模糊查询还是不行
explain analyze verbose select * from t_car where car_id like '%辽Be67df';
QUERY PLAN
----------------------------------------------------------------- Gather (cost=1000.00..54119.47 rows=498 width=14) (actual time=0.476..309.504 rows=4 loops=1)
Output: id, car_id
Workers Planned: 2
Workers Launched: 2
-> Parallel Seq Scan on public.t_car (cost=0.00..53069.67 rows=208 width=14) (actual time=121.528..297.927 rows=1 loops=3)
Output: id, car_id
Filter: (t_car.car_id ~~ '%辽Be67df'::text)
Rows Removed by Filter: 1666665
Worker 0: actual time=200.075..294.789 rows=1 loops=1
Worker 1: actual time=164.486..295.015 rows=1 loops=1
Planning time: 0.139 ms
Execution time: 309.536 ms
(12 行记录)
时间:310.278 ms
解决前模糊查询的方法,反转查询字段。在逻辑上变成后模糊查询。
create index btree_index_02 on t_car using btree (reverse(car_id) text_pattern_ops);
· gin
真正的支持模糊查询
创建扩展
create extension pg_trgm;
创建gin索引
create index gin_index_01 on t_car using gin(car_id gin_trgm_ops);
模糊查询速度也能飞
explain analyze verbose select * from t_car where car_id like '%辽Be67df%';
QUERY PLAN
-----------------------------------------------------------------
Bitmap Heap Scan on public.t_car (cost=36.86..579.80 rows=498 width=14) (actual time=3.936..4.018 rows=4 loops=1)
Output: id, car_id
Recheck Cond: (t_car.car_id ~~ '%辽Be67df%'::text)
Heap Blocks: exact=4
-> Bitmap Index Scan on gin_index_01 (cost=0.00..36.73 rows=498 width=0) (actual time=3.900..3.900 rows=4 loops=1)
Index Cond: (t_car.car_id ~~ '%辽Be67df%'::text)
Planning time: 2.918 ms
Execution time: 4.359 ms
explain analyze verbose select * from t_car where car_id like '%辽Be6%df%';
QUERY PLAN
-----------------------------------------------------------------
Bitmap Heap Scan on public.t_car (cost=17.06..560.00 rows=498 width=14) (actual time=5.958..15.353 rows=35 loops=1)
Output: id, car_id
Recheck Cond: (t_car.car_id ~~ '%辽Be6%df%'::text)
Rows Removed by Index Recheck: 3273
Heap Blocks: exact=3098
-> Bitmap Index Scan on gin_index_01 (cost=0.00..16.93 rows=498 width=0) (actual time=5.172..5.172 rows=3308 loops=1)
Index Cond: (t_car.car_id ~~ '%辽Be6%df%'::text)
Planning time: 0.224 ms
Execution time: 15.462 ms
(9 行记录)
扩展阅读
rum(https://github.com/postgrespro/rum?spm=a2c4e.11153940.blogcont111793.51.50575bf0HjdIsl) 是一个索引插件,由Postgrespro开源,适合全文检索,属于GIN的增强版本。
增强包括:
1、在RUM索引中,存储了lexem的位置信息,所以在计算ranking时,不需要回表查询(而GIN需要回表查询)。
2、RUM支持phrase搜索,而GIN无法支持。
3、在一个RUM索引中,允许用户在posting tree中存储除ctid(行号)以外的字段VALUE,例如时间戳。
如果这种需求多了还是考虑elasticsearch吧
zombodb是PostgreSQL与ElasticSearch结合的一个索引接口,可以直接读写ES。
请点击文章底部“阅读原文”查看原文内容。
PostgreSQL中文社区欢迎广大技术人员投稿
投稿邮箱:press@postgres.cn
最后修改时间:2021-03-23 09:33:04
文章转载自PostgreSQL中文社区,如果涉嫌侵权,请发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。