暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

PostgreSQL 数组里面的元素,模糊搜索,模糊查询,like,前后百分号,正则查询,倒排索引

digoal 2019-03-20
2570

作者

digoal

日期

2019-03-20

标签

PostgreSQL , 数组 , 元素模糊匹配 , 倒排索引 , gin


背景

PostgreSQL通过倒排索引,支持模糊查询。

《PostgreSQL 模糊查询最佳实践 - (含单字、双字、多字模糊查询方法)》

如果你使用了数组类型,那么在数组里面的元素如何实现模糊查询呢?

同样可以使用PG的GIN倒排索引。

例子

1、新建一个数据库,为了支持中文模糊查询,请务必不要使用lc_ctype='C'的数据库,这个无法改,不能支持中文模糊查询。

db4=# create database newdb1 with template template0 encoding 'UTF8' lc_collate 'C' lc_ctype 'en_US.utf8'; CREATE DATABASE

2、创建pg_trgm插件,支持模糊查询。

db4=# \c newdb1 You are now connected to database "newdb1" as user "postgres". newdb1=# create extension pg_trgm; CREATE EXTENSION

3、新建测试表,文本数组

newdb1=# create table t (i text[]); CREATE TABLE

4、新建immutable函数,将数组转换为文本类型(因为表达式索引的表达式、函数、操作符等都必须是immutable的)

newdb1=# create or replace function imm_array_to_string (anyarray,text) returns text as $$ select array_to_string($1,$2); $$ language sql strict immutable; CREATE FUNCTION

5、创建索引

newdb1=# create index idx_t_1 on t using gin ( imm_array_to_string(i,',') gin_trgm_ops); CREATE INDEX

6、模糊查询语法测试,已支持索引检索。

```
newdb1=# explain select * from t where imm_array_to_string(i,',') like '你好PostgreSQL%';
QUERY PLAN


Bitmap Heap Scan on t (cost=14.35..22.33 rows=7 width=32)
Recheck Cond: (imm_array_to_string(i, ','::text) ~~ '你好PostgreSQL%'::text)
-> Bitmap Index Scan on idx_t_1 (cost=0.00..14.35 rows=7 width=0)
Index Cond: (imm_array_to_string(i, ','::text) ~~ '你好PostgreSQL%'::text)
(4 rows)

newdb1=# explain select * from t where imm_array_to_string(i,',') like '你好PostgreSQL%';
QUERY PLAN


Bitmap Heap Scan on t (cost=668.78..1051.63 rows=282 width=312)
Recheck Cond: (imm_array_to_string(i, ','::text) ~~ '你好PostgreSQL%'::text)
-> Bitmap Index Scan on idx_t_1 (cost=0.00..668.71 rows=282 width=0)
Index Cond: (imm_array_to_string(i, ','::text) ~~ '你好PostgreSQL%'::text)
(4 rows)

Time: 0.462 ms
newdb1=# explain select * from t where imm_array_to_string(i,',') like '%你好PostgreSQL%';
QUERY PLAN


Bitmap Heap Scan on t (cost=557.68..940.53 rows=282 width=312)
Recheck Cond: (imm_array_to_string(i, ','::text) ~~ '%你好PostgreSQL%'::text)
-> Bitmap Index Scan on idx_t_1 (cost=0.00..557.61 rows=282 width=0)
Index Cond: (imm_array_to_string(i, ','::text) ~~ '%你好PostgreSQL%'::text)
(4 rows)

Time: 0.484 ms
newdb1=# explain select * from t where imm_array_to_string(i,',') like '%你好PostgreSQL';
QUERY PLAN


Bitmap Heap Scan on t (cost=612.68..995.53 rows=282 width=312)
Recheck Cond: (imm_array_to_string(i, ','::text) ~~ '%你好PostgreSQL'::text)
-> Bitmap Index Scan on idx_t_1 (cost=0.00..612.61 rows=282 width=0)
Index Cond: (imm_array_to_string(i, ','::text) ~~ '%你好PostgreSQL'::text)
(4 rows)

Time: 0.473 ms
newdb1=# explain select * from t where imm_array_to_string(i,',') ilike '%你好PostgreSQL';
QUERY PLAN


Bitmap Heap Scan on t (cost=612.68..995.53 rows=282 width=312)
Recheck Cond: (imm_array_to_string(i, ','::text) ~~ '%你好PostgreSQL'::text)
-> Bitmap Index Scan on idx_t_1 (cost=0.00..612.61 rows=282 width=0)
Index Cond: (imm_array_to_string(i, ','::text) ~~
'%你好PostgreSQL'::text)
(4 rows)

Time: 0.918 ms
newdb1=# explain select * from t where imm_array_to_string(i,',') ilike '你好PostgreSQL';
QUERY PLAN


Bitmap Heap Scan on t (cost=723.78..1106.63 rows=282 width=312)
Recheck Cond: (imm_array_to_string(i, ','::text) ~~ '你好PostgreSQL'::text)
-> Bitmap Index Scan on idx_t_1 (cost=0.00..723.71 rows=282 width=0)
Index Cond: (imm_array_to_string(i, ','::text) ~~
'你好PostgreSQL'::text)
(4 rows)

Time: 0.816 ms
newdb1=# explain select * from t where imm_array_to_string(i,',') ilike '你好%PostgreSQL';
QUERY PLAN


Bitmap Heap Scan on t (cost=612.68..995.53 rows=282 width=312)
Recheck Cond: (imm_array_to_string(i, ','::text) ~~ '你好%PostgreSQL'::text)
-> Bitmap Index Scan on idx_t_1 (cost=0.00..612.61 rows=282 width=0)
Index Cond: (imm_array_to_string(i, ','::text) ~~
'你好%PostgreSQL'::text)
(4 rows)

Time: 0.796 ms
```

包含任意元素:元素以xx开头

```
select * from t where imm_array_to_string(i,',') ~ '[,|^]d0dc1' limit 5;

不区分大小写

select * from t where imm_array_to_string(i,',') ~* '[,|^]d0dc1' limit 5;
```

包含任意元素:元素以xx结尾

```
select * from t where imm_array_to_string(i,',') ~ '19d0b[,|$]' limit 5;

不区分大小写

select * from t where imm_array_to_string(i,',') ~* '19d0b[,|$]' limit 5;
```

包含任意元素:元素内包含xx

```
select * from t where imm_array_to_string(i,',') ~ 'abce' limit 5;

不区分大小写

select * from t where imm_array_to_string(i,',') ~ 'abce' limit 5;
```

性能测试

1、创建函数,生成随机字符串数组

create or replace function gen_rand_arr(int) returns text[] as $$ select array(select md5(random()::text) from generate_series(1,$1)); $$ language sql strict;

newdb1=# select gen_rand_arr(5); -[ RECORD 1 ]+----------------------------------------------------------------------------------------------------------------------------------------------------------------------- gen_rand_arr | {9d193327808319b0d66c12e29eaa33f6,5d761dad5b6e886eb87480d86d8640fb,07856d9f11b156bde20b527192c00361,35895b270c7665afd34d62f58220a83a,309b4d117f07974cb938bab890a7f327}

2、写入大量数据

```
vi test.sql
insert into t select gen_rand_arr(8) from generate_series(1,100);

pgbench -M prepared -n -r -P 1 -f ./test.sql -c 8 -j 8 -T 120 newdb1
```

3、记录数

```
newdb1=# select count(*) from t;
count


2816900
(1 row)
```

4、收集统计信息,将索引的pending list合并。

newdb1=# vacuum (analyze,verbose) t;

5、查询效率

包含任意元素:元素以xx开头

```
select * from t where imm_array_to_string(i,',') ~ '[,|^]d0dc1' limit 5;

newdb1=# explain select * from t where imm_array_to_string(i,',') ~ '[,|^]d0dc1' limit 5;
QUERY PLAN


Limit (cost=224.38..231.17 rows=5 width=312)
-> Bitmap Heap Scan on t (cost=224.38..607.23 rows=282 width=312)
Recheck Cond: (imm_array_to_string(i, ','::text) ~ '[,|^]d0dc1'::text)
-> Bitmap Index Scan on idx_t_1 (cost=0.00..224.31 rows=282 width=0)
Index Cond: (imm_array_to_string(i, ','::text) ~ '[,|^]d0dc1'::text)
(5 rows)

newdb1=# select * from t where imm_array_to_string(i,',') ~ '[,|^]d0dc1' limit 5;
i

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

{1a2dfc55b7b988a2483c6243d8c2ecf9,d0dc1a12d874d6a93fe769982d0095d4,362c933460bf5a5c4920153b4dc7de69,cef755962c451ccb892ff23405763a14,82466b1477f74e136b4c0d38fcee703e,26e04120f3d16a3f6763e5d95bfa6069,10dccf2d7372f88437dbc501c2a7181d,2b9b
8fef33d8ed03178d77b7ab0eabe1}
{2a6beb43a3dcb0f880f564b402931ece,6f623041e6291ddef51c2b353ff9b5c3,d0dc13e47510225b130de68332a5cf82,5abb5ec622a3bcde0e28e8ca9df9b0b2,123440fe5c6d1000223aaa667219fb02,7f25ad33f3114ded94c86c12681450f5,fa6748e69665fa5c1b5b497457bd3e59,40f8
3894401e278d72ffe923cd106f69}
{5a826d56b29a6d35df7c026875f4d88c,ca998171aeadf1a9aa4ce5cf29216984,27cae196fbf43d352828271b236535d2,8d31bfd2dd50d0321de6d88a441ffdb0,fc405e0388c6420d647421cb31bc29d7,096af6a4e8b9e945392b8419ffa987b6,2243b26416b365ece9287f3c2251a1bd,d0dc
146dff328d79311b120efd91ce59}
{708d002fc1c940ae2f89b0dee1420f26,bcd71f398c104ed22b46afe551822f5c,d0dc1a1929ecf7df838012d40accf4dc,49627dd60c3639fcb5d4e5414a8e9417,5bf98eb3cf0f2740f8f25370972e63e8,df8b198bfe0fc1a465dc875fce4c7651,12525bdd07fcde399a5fe7673ba1f139,61f2
d94d7a710670f7115bab85502893}
{963c013b3569bf5fbe3f68eeb51117d4,8d572873fd6c828c50be6a6e30c9bba5,d0dc1c96169c37d05b4979e7a4195005,6195d7bebf0a92380c1896ac9f2ad9b6,1ef2fba453aa86e571b338c188ddfea6,683b48f7bfcccd030700f00e28430b07,b413909126a0f427efc7ed64d25c4dd3,6a9f
3ded525226d5d689adc6d611b9db}
(5 rows)

Time: 23.184 ms
```

包含任意元素:元素以xx结尾

```
select * from t where imm_array_to_string(i,',') ~ '19d0b[,|$]' limit 5;

newdb1=# explain select * from t where imm_array_to_string(i,',') ~ '19d0b[,|$]' limit 5;
QUERY PLAN


Limit (cost=224.38..231.17 rows=5 width=312)
-> Bitmap Heap Scan on t (cost=224.38..607.23 rows=282 width=312)
Recheck Cond: (imm_array_to_string(i, ','::text) ~ '19d0b[,|$]'::text)
-> Bitmap Index Scan on idx_t_1 (cost=0.00..224.31 rows=282 width=0)
Index Cond: (imm_array_to_string(i, ','::text) ~ '19d0b[,|$]'::text)
(5 rows)

newdb1=# select * from t where imm_array_to_string(i,',') ~ '19d0b[,|$]' limit 5;
-[ RECORD 1 ]----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
i | {68e92a1eae2aab0283c003bf1abd5f90,f9dc4ce34dd4cac3fa8f26e816705227,41200ccb84d0dc364e0127b681010858,8636f55c5bfc09bbe3ccb2acefa19d0b,eb3579e6b44896bd04e1edade3f44de6,88a6c079f261d45f7ed9793bfd877bff,af4bc563d1be87bf9fb6953269fb7466,da73c8735908799ab1b168b24a7b73ab}
-[ RECORD 2 ]----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
i | {0661d60b8dffee5f51c6b6d7403aeceb,ba6434c35e17dc53e0c981136128a3d9,12ecc4a208c48bfc7870efcf01dc0a12,e9ab9565953b828e727dfb62222061d4,41d1d82a7a397de69d0e0436ea05d639,45f6564f866de964599850fc12119d0b,9e6415c9da0c83375926940684b330bd,a5981c3baf438e3bd66df816c3cd85fa}
-[ RECORD 3 ]----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
i | {dce1c900a5a4b8ff88694e4dd1f7727b,92d2f6b5e46139f854f0f0f795b04e6f,e6aae989ef89fcbe2aa045a250c19f0f,dea731b2a24ce4e91d9a5b83a0e35bd2,74aec9033979243c149b029a1c84948d,1f0021e7cd8ea1a32d480068d2aac632,bdb3d27aef3d8d9a8c87c52c25c19d0b,cf02176aa82cb369ba55de698047780e}
-[ RECORD 4 ]----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
i | {1866ab11ebacdc621c320f3376fe4b5f,efbcd77cb4f44e562bd19a6a50a38960,76cbcbc1c0f05ec4121ef20962baaaba,7ac92fb5785eda268b456675b7bf7dd9,1fb6e2ead6f5c9fa7959572d61e9a746,7b08eac90dcba3857a53eeeb43e19d0b,c1e7c7d29329243cf8c0a1bee25b44bf,40c27c32a3961f2b9182efbd6eb51ec5}
-[ RECORD 5 ]----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
i | {fd8103d4db72cdcfb4768febd61abeb4,b128ccb9e419f3b4be6c35b6e3b7ccd8,5e39686fd5c527e6d1905c31d1cabfa6,6e37d73b2838c5f74134a64d683b602b,1d6587aa90385e4f1f421cec319d4f7d,423b4ab4eeb87a15f3b164a8ecc69366,4a723c53547effc20c3a3b5916a19d0b,5d0db5a0b90ec54cc474456576156d87}

Time: 23.950 ms
```

包含任意元素:元素内包含xx

```
select * from t where imm_array_to_string(i,',') ~ 'abce' limit 5;

newdb1=# explain select * from t where imm_array_to_string(i,',') ~ 'abce' limit 5;
QUERY PLAN


Limit (cost=113.28..120.07 rows=5 width=312)
-> Bitmap Heap Scan on t (cost=113.28..496.13 rows=282 width=312)
Recheck Cond: (imm_array_to_string(i, ','::text) ~ 'abce'::text)
-> Bitmap Index Scan on idx_t_1 (cost=0.00..113.21 rows=282 width=0)
Index Cond: (imm_array_to_string(i, ','::text) ~ 'abce'::text)
(5 rows)

newdb1=# select * from t where imm_array_to_string(i,',') ~ 'abce' limit 5;
i

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

{adfa0cebeef5e67f15d1428c1c42f9ec,95abce5cece413de8fbc2a90846e37bc,86b367a1d39b0de39b2259d368d7fe2e,02d1ffd0f89676139b88fcd1422bb9c5,374a49309103cfd1ff4deeb61d650e32,67da44f5ee6ee28f960d847c72bfdea5,78d39fab51e2ee91756c9c5dd955cc85,cca5
f52519170de192873ac5caf4a4c9}
{7e45e4c81c9f8a184f1dab58adce061a,281bf5ac39c801d65ce5e5939a43d852,42c4a6a28d851653001b24a5296803e2,21cad5d25b736c70369fabcec9336c50,be9260c156f7165781db45419c1b64a8,20ed1a34337781b7bfbd6714d0fe89a5,9df8459f5b68ba52dd20c96baef200a6,d2b5
052977df6c4377b963d9dfd80914}
{8f99bf47bdbeb6972e564b2ef0abce9f,e705fbcc909792a7ef9585e2a381ed6d,015eafcb49e4558ab99733b95c0a895b,34b2762f16a85f1a10bb4e1c21228591,2e86f833ca4f5208e84a57f0ef1e945e,f1262c8a852ab4e174627cf5e091f91c,888f4f36bfb0e4274e576f1afc831117,fe1c
3638c18cd54da49311c09a279e5d}
{95926e58ed85b5e40a748cf5894b1ed5,571a4aa15aad73036fd570356e16bd60,af23841c7dc79ce8c4a10d21e914f7fe,8306d637609dab1e6d8aced9d29cb172,c8ca176600bffa9de6f1d0e20c0aa490,f29522edfdbcf1b00d62a978ee1182fd,19d77522d9b1c85b419fdca00eb2320c,991d
b9143192dabcee56b0bd5370b523}
{e1378cb77c91879455302cda26cf8c84,0605fb91660d844acf975aa6a2501d91,1013e3eab01c7aea8f4f2ccd1dcffd4f,5f63e3543e8a0850a456c0d83d53f0f3,b3abceebb447b11314e841d26922dd59,27b6139b685aab9cfeb699d12eb536df,7cee2103fc14869cc7e49f41a191da54,9ca8
6b54ed3f97870f15ddcde27e4de3}
(5 rows)

Time: 28.734 ms
```

中文,单字,双字匹配,请参考

《PostgreSQL 模糊查询最佳实践 - (含单字、双字、多字模糊查询方法)》

参考

《PostgreSQL 模糊查询最佳实践 - (含单字、双字、多字模糊查询方法)》

PostgreSQL 许愿链接

您的愿望将传达给PG kernel hacker、数据库厂商等, 帮助提高数据库产品质量和功能, 说不定下一个PG版本就有您提出的功能点. 针对非常好的提议,奖励限量版PG文化衫、纪念品、贴纸、PG热门书籍等,奖品丰富,快来许愿。开不开森.

9.9元购买3个月阿里云RDS PostgreSQL实例

PostgreSQL 解决方案集合

德哥 / digoal's github - 公益是一辈子的事.

digoal's wechat

文章转载自digoal,如果涉嫌侵权,请发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论