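PostgreSQL ad hoc (arbitrary column combination) queries (accelerated by RUM indexes) - without dictionary encoding: combining scalar and array columns into a new array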

digoal 2018-05-18


Author

digoal

Date

2018-05-18

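Background

"PostgreSQL ad hoc (arbitrary column combination) queries with dictionary encoding (RUM index acceleration) - practice and solution 1"

That article explains how PostgreSQL accelerates ad hoc queries: a RUM index is used to search exactly on arbitrary column combinations. On data at the billion-row scale, arbitrary combination queries reach response times as low as milliseconds and throughput on the order of tens of thousands of TPS. The article relies on dictionary encoding: several columns are converted into one large array, which is then indexed with RUM.

If that feels like too much work, there is a simpler approach. Suppose your table already mixes ordinary (scalar) columns and array columns, you do not want to dictionary-encode the data, and you still need contains, overlaps, and equality conditions over arbitrary column combinations (that is, more complex ad hoc queries). What then?

Use a UDF to combine the columns that take part in the ad hoc search (equality, contains, overlaps, and other combined conditions) into a single array, then create an expression RUM index on that array.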

Example

1. Test table structure

create table test ( column1 varchar, column2 int[], column3 int[], column4 text );

Goal

Convert each row into an array like this

[column1_val, column2_val1, column2_val2,..., column3_val1, column3_val2,..., column4_val]

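Then run the combined queries against this array (exact search through the RUM index).

2. Create a function that adds a prefix to every element of an array and returns a TEXT array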

postgres=# create or replace function f_array_prefix(text, anyarray) returns text[] as $$ select array(select $1||unnest($2)); $$ language sql strict immutable;

Result

```
postgres=# select f_array_prefix('abc_', array[1,2,3]);
   f_array_prefix
---------------------
 {abc_1,abc_2,abc_3}
(1 row)

postgres=# select f_array_prefix('abc_', array['a','b','c']);
   f_array_prefix
---------------------
 {abc_a,abc_b,abc_c}
(1 row)
```
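The reason for the prefix can be shown with built-in operators only. The sketch below is an illustration rather than part of the original test: it merges two integer arrays with and without per-column prefixes (the prefixes `column2_` and `column3_` follow the naming used later in this article). Without a prefix, a value from one column cannot be told apart from the same value in another column.

```sql
-- Illustration only: the literal arrays stand in for column2 and column3 of one row.
select
  -- Unprefixed merge: 5 is found, but we cannot tell that it came from "column3".
  (array[1,2,3] || array[5,6,7]) @> array[5] as unprefixed_contains_5,
  -- Prefixed merge: asking for column2_5 does not match, because 5 only exists in "column3".
  (array(select 'column2_'::text || unnest(array[1,2,3]))
   || array(select 'column3_'::text || unnest(array[5,6,7]))) @> array['column2_5'] as prefixed_contains_column2_5;
```

The first column should be true and the second false, which is exactly the distinction the prefixed array is meant to preserve.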

2.1. Extra: from a prefixed array, extract the suffixes of the elements that carry a given prefix.

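```
postgres=# create or replace function get_suffix(text[], text) returns text[] as $$
  -- '^' anchors the pattern so that only elements starting with the prefix match;
  -- non-matching elements yield NULL and are removed by array_remove
  select array_remove(array(select substring(unnest, '^'||$2||'(.*)') from unnest($1)), null);
$$ language sql strict immutable;
CREATE FUNCTION

postgres=# select get_suffix(array['abc_1','abc_2','t_2','t_123'], 't_');
 get_suffix
------------
 {2,123}
(1 row)
```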

3. Create the expression function that combines the columns of test that take part in ad hoc queries into one new TEXT array

create or replace function f_search(varchar, int[], int[], text) returns text[] as $$
  select array_remove(
    array_append(
      array_cat(
        array_cat(
          array_append(array[]::text[], 'column1_'||$1),  -- start from an empty array
          f_array_prefix('column2_', $2)
        ),
        f_array_prefix('column3_', $3)
      ),
      'column4_'||$4
    ),
    null  -- drop the NULL elements produced by NULL scalar inputs
  );
$$ language sql CALLED ON NULL INPUT immutable;

Result

```
postgres=# select f_search('abcde', array[1,2,3], array[5,6,7], 'hello');
                                         f_search
------------------------------------------------------------------------------------------
 {column1_abcde,column2_1,column2_2,column2_3,column3_5,column3_6,column3_7,column4_hello}
(1 row)
```
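Because f_search is declared CALLED ON NULL INPUT and wraps everything in array_remove(..., null), rows with NULL columns should still produce a usable array. The sketch below assumes the functions f_array_prefix and f_search from steps 2 and 3 have been created as shown; the input values are made up for illustration.

```sql
-- column1 and column3 are NULL here:
--   'column1_' || NULL yields a NULL element, which array_remove drops;
--   f_array_prefix is STRICT, so it returns a NULL array, which array_cat ignores.
select f_search(null::varchar, array[1,2], null::int[], 'x'::text);
-- should return {column2_1,column2_2,column4_x}
```

This matters for the index: a NULL column contributes no elements, so such rows can still be found through their other columns.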

4. Create a function that generates a random array: 20 random values drawn from a value space of about 10,000 (0 to 10000).

create or replace function gen_rand() returns int[] as $$ select array(select (10000*random())::int from generate_series(1,20)); $$ language sql strict volatile;

Result

```
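postgres=# select gen_rand();
                                              gen_rand
-----------------------------------------------------------------------------------------------------
 {6714,935,1593,8801,4097,5959,2059,3306,8710,4663,8671,7999,9122,4405,8874,236,822,6524,8093,8368}
(1 row)

postgres=# select gen_rand();
                                              gen_rand
------------------------------------------------------------------------------------------------------
 {3640,5125,5307,4672,1943,9987,6141,8813,6347,6007,9652,3061,6942,1245,1862,1039,7204,3921,4345,5914}
(1 row)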
```

5. Generate 1 million rows of test data

insert into test select md5(random()::text), gen_rand(), gen_rand(), md5(random()::text) from generate_series(1,1000000);

6. Create the expression RUM index

create index idx_test_1 on test using rum (f_search(column1, column2, column3, column4) rum_anyarray_ops);
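The index statement above relies on the `rum` access method, which comes from an extension rather than from core PostgreSQL. The following is a minimal sketch of the surrounding housekeeping, assuming the rum extension is already installed on the server; the ANALYZE step is an addition of this sketch, not part of the original test.

```sql
-- Run before creating the index: make the rum access method and its operator classes available.
create extension if not exists rum;

-- Optional check: should return one row once the extension is present.
select amname from pg_am where amname = 'rum';

-- Run after creating the index: collect statistics, including those for the indexed expression,
-- so the planner can estimate row counts for f_search(...) conditions.
analyze test;
```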

7. Table and index sizes
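The sizes can be read with the built-in size functions. This is a sketch that assumes the table `test` and the index `idx_test_1` created above; no figures are quoted here because they depend on the generated data.

```sql
select
  pg_size_pretty(pg_table_size('test'))          as table_size,  -- heap plus TOAST, without indexes
  pg_size_pretty(pg_relation_size('idx_test_1')) as index_size;  -- the RUM expression index
```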

8. Equivalent query examples

```
explain select * from test where f_search(column1, column2, column3, column4) @> array['column2_1', 'column3_5'];

-- is equivalent to

explain select * from test where column2 @> array[1] and column3 @> array[5];
```

```
explain select * from test where f_search(column1, column2, column3, column4) && array['column2_1', 'column3_5'];

-- is equivalent to

explain select * from test where column2 @> array[1] or column3 @> array[5];
```

```
explain select * from test
where
f_search(column1, column2, column3, column4) @> array['column2_1', 'column3_5', 'column1_abc']
or
f_search(column1, column2, column3, column4) @> array['column2_2', 'column3_5', 'column1_abc']
or
f_search(column1, column2, column3, column4) @> array['column2_3', 'column3_5', 'column1_abc'];

-- is equivalent to

explain select * from test where column2 && array[1,2,3] and column3 @> array[5] and column1='abc';
```
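One way to convince yourself that such a pair really is equivalent is to count the matches through both forms and compare. The sketch below does this for the first pair, assuming the table `test` and the function f_search from the earlier steps.

```sql
-- Both counts should be identical if the rewrite is correct.
select
  (select count(*) from test
    where f_search(column1, column2, column3, column4) @> array['column2_1', 'column3_5']) as via_expression,
  (select count(*) from test
    where column2 @> array[1] and column3 @> array[5]) as via_columns;
```

The same pattern can be applied to the `&&` and OR examples by swapping in the corresponding conditions.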

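9. Example

Arbitrary column combinations with exact retrieval: on this 1-million-row test table the query responds in under 1 millisecond.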

```
postgres=# explain (analyze,verbose,timing,costs,buffers)
select * from test
where -- expression query
f_search(column1, column2, column3, column4) @> array['column2_1', 'column3_5'];

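                                                      QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------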
Index Scan using idx_test_1 on public.test (cost=12.00..43.64 rows=25 width=268) (actual time=0.511..0.530 rows=10 loops=1)
Output: column1, column2, column3, column4
Index Cond: (f_search(test.column1, test.column2, test.column3, test.column4) @> '{column2_1,column3_5}'::text[])
Buffers: shared hit=20
Planning Time: 0.245 ms
Execution Time: 0.548 ms
(6 rows)
```
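The index is only considered when the query repeats the indexed expression exactly, which is easy to get wrong when typing f_search(column1, column2, column3, column4) by hand. One possible convenience, sketched below, is a view that exposes the expression as a column. The view name `test_search` and the column name `tags` are hypothetical and not part of the original article.

```sql
-- Hypothetical helper view: "tags" is the same expression that idx_test_1 is built on.
create or replace view test_search as
select t.*, f_search(t.column1, t.column2, t.column3, t.column4) as tags
from test t;

-- After the view is expanded, the condition is on the indexed expression,
-- so the planner should be able to choose idx_test_1 here as well.
explain
select column1, column2, column3, column4
from test_search
where tags @> array['column2_1', 'column3_5'];
```

Checking the plan with explain, as above, is the way to confirm that the index is actually used in your environment.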