PostgreSQL 电商业务(任意维度商品圈选应用) - json包range数组的命中优化 - 展开+索引优化

digoal 2018-07-03

252

作者

digoal

日期

2018-07-03

背景

电商，任意维度商品圈选应用，其中一个查询请求是这样的：

求 "某个国家、某个时间点、调价+折扣后的价格" 落在某个价格范围的商品。

首先需要有的要素包括：

1、商品ID

2、不同国家的商品价格

3、商品原价

4、商品日常价

5、不同时间段的价格折扣

6、调价系数

例子

1、表结构设计

create table t_item ( id int8 primary key, -- 商品ID country jsonb, -- 每个国家的价格取值范围 price jsonb, -- 每个时间段的折扣，（时间可能重叠，根据优先级LIMIT 1个折扣） ratio float4 -- 调价比例 -- 其他属性scalar类型, 使用rum或gin索引，本文末尾有案例 );

2、数据样本

insert into t_item values ( 1, jsonb '{"global":{"min": 100, "max":200}, "china":{"min": 120, "max":260}, "us":{"min": 170, "max":300}}', jsonb '{"100|[1514764800,1515542400)":0.4, "200|[1514764800,1515542400)":0.9, "0|[-62135596800,253402214400)":1}', 0.1 );

其中时间区间可以使用epoch表示

```
postgres=# select extract(epoch from date '2018-01-01');
date_part

1514764800
(1 row)

postgres=# select extract(epoch from date '2018-01-10');
date_part

1515542400
(1 row)

postgres=# select extract(epoch from date '0001-01-01');
date_part

-62135596800
(1 row)

postgres=# select extract(epoch from date '9999-12-31');
date_part

253402214400
(1 row)
```

3、由于不同时间段的折扣不一样，并且优先级也不一样，所以，使用一个函数来获取某个时间点的这块。

当输入的时间点有多个时间区间包括它时，取优先级最高的那个折扣，并返回，如果没有任何匹配的时间区间，则返回1。

create or replace function get_discount( jsonb, -- 每个时间段的折扣字段 int8 -- epoch 时间值 ) returns float4 as $$ declare res float4; begin -- select split_part(key,'|',1) as priority, split_part(key,'|',2) as ts, value from jsonb_each_text($1); select value into res from jsonb_each_text($1) where split_part(key,'|',2)::int8range @> $2 order by split_part(key,'|',1)::numeric desc limit 1; if found then return res; end if; return 1; end; $$ language plpgsql strict parallel safe;

例子

```
postgres=# select get_discount(jsonb '{"100|[1514764800,1515542400)":0.4, "200|[1514764800,1515542400)":0.9, "0|[-62135596800,253402214400)":1}', 100000);
get_discount

(1 row)

postgres=# select get_discount(jsonb '{"100|[1514764800,1515542400)":0.4, "200|[1514764800,1515542400)":0.9, "0|[-62135596800,253402214400)":1}', 1515542200);
get_discount

0.9

(1 row)
```

4、不同的国家，价格不一样，输入国家编码，返回对应国家的价格，如果输入的编码在JSONB中没有，则返回global的价格。

create or replace function get_price( jsonb, -- 国家价格区间 text -- 国家编码 ) returns float8 as $$ select case when ($1->$2->>'max')::float8 is not null then ($1->$2->>'max')::float8 else ($1->'global'->>'max')::float8 end; $$ language sql strict parallel safe;

例子

```
postgres=# select get_price(jsonb '{"global":{"min": 100, "max":200}, "china":{"min": 120, "max":260}, "us":{"min": 170, "max":300}}', 'hello');
get_price

(1 row)

postgres=# select get_price(jsonb '{"global":{"min": 100, "max":200}, "china":{"min": 120, "max":260}, "us":{"min": 170, "max":300}}', 'china');
get_price

(1 row)
```

5、求 "某个国家、某个时间点、调价+折扣后的价格" 落在某个价格范围的商品。

SQL

```
postgres=# select * from t_item where get_price(country, 'china') * get_discount(price, 1515542200) * (1+ratio) < 100;
id | country | price | ratio
----+---------+-------+-------
(0 rows)

postgres=# select * from t_item where get_price(country, 'china') * get_discount(price, 1515542200) * (1+ratio) < 1000;
id | country | price | ratio
----+---------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------+-------
1 | {"us": {"max": 300, "min": 170}, "china": {"max": 260, "min": 120}, "global": {"max": 200, "min": 100}} | {"100|[1514764800,1515542400)": 0.4, "200|[1514764800,1515542400)": 0.9, "0|[-62135596800,253402214400)": 1} | 0.1
(1 row)
```

6、压测

写入5.3亿数据

insert into t_item select * from t_item ; ..... insert into t_item select * from t_item ;

单表约 186 GB

7、使用并行计算

```
postgres=# alter function get_price ;
ALTER FUNCTION
postgres=# alter function get_discount parallel safe;
ALTER FUNCTION

postgres=# set max_parallel_workers_per_gather =56;
SET
postgres=# alter table t_item set (parallel_workers =56);
ALTER TABLE

postgres=# set min_parallel_table_scan_size =0;
SET
postgres=# set min_parallel_index_scan_size =0;
SET
postgres=# set parallel_setup_cost =0;
SET
postgres=# set parallel_tuple_cost =0;
SET
```

8、最差的情况，没有一条命中的数据，耗时为处理完5.3亿条记录的耗时

```
postgres=# explain select * from t_item where get_price(country, 'china') * get_discount(price, 1515542200) * (1+ratio) < 100 ;
QUERY PLAN

Gather (cost=1000.00..51024073.42 rows=178956971 width=332)
Workers Planned: 32
-> Parallel Seq Scan on t_item (cost=0.00..33127376.32 rows=5592405 width=332)
Filter: (((get_price(country, 'china'::text) * get_discount(price, '1515542200'::bigint)) * ('1'::double precision + ratio)) < '100'::double precision)
(4 rows)

postgres=# explain analyze select * from t_item where get_price(country, 'china') * get_discount(price, 1515542200) * (1+ratio) < 100 ;
QUERY PLAN

Gather (cost=1000.00..47285151.00 rows=178956971 width=332) (actual time=444448.106..444448.106 rows=0 loops=1)
Workers Planned: 56
Workers Launched: 56
-> Parallel Seq Scan on t_item (cost=0.00..29388453.90 rows=3195660 width=332) (actual time=444292.055..444292.055 rows=0 loops=57)
Filter: (((get_price(country, 'china'::text) * get_discount(price, '1515542200'::bigint)) * ('1'::double precision + ratio)) < '100'::double precision)
Rows Removed by Filter: 9418788
Planning Time: 0.072 ms
Execution Time: 462253.627 ms
(8 rows)
```

56 core 虚拟机，耗时462秒。

索引优化

将数据展开为两张表(其中一张可以使用原始表，不需要新建)

其中表1的数据，需要业务方维护，当原价、折扣、调价系数发生变化时，需要实时的更新这里的记录。

表1

折扣区间展开表：

商品ID 国家时间区间折后价

```
create table t_item1 (
id int8,
country text,
ts int8range,
price float8,
exclude using gist (id with =, country with =, ts with &&) -- 排他约束，同一个商品ID同一个国家不允许有TS相交的折扣数据 );

create extension IF NOT EXISTS btree_gist;

create index idx_t_item1_1 on t_item1 using gist (country,price,ts);
```

表2

常规价格查原始表：

商品ID 国家日常价原价调价比例

```
create table t_item2 (
id int8,
country text,
price1 float8,
price2 float8,
ratio float4,
primary key (country, id)
);

create index idx_t_item2_1 on t_item2 (country, (least(price1ratio,price2ratio)));
```

SQL

求 "某个国家、某个时间点、调价+折扣后的价格" 落在某个价格范围的商品。

```
select id from t_item1 where country ='china' and price < 50::float8 and ts @> 10000000::int8
union
select id from t_item2 where country='china' and (least(price1ratio,price2ratio)) < 50;

                                                       QUERY PLAN

HashAggregate (cost=5.91..5.94 rows=3 width=8) Group Key: t_item1.id -> Append (cost=0.14..5.90 rows=3 width=8) -> Index Scan using idx_t_item1_1 on t_item1 (cost=0.14..2.37 rows=1 width=8) Index Cond: ((country = 'china'::text) AND (price < '50'::double precision) AND (ts @> '10000000'::bigint)) -> Index Scan using idx_t_item2_1 on t_item2 (cost=0.15..3.49 rows=2 width=8) Index Cond: ((country = 'china'::text) AND (LEAST((price1 * ratio), (price2 * ratio)) < '50'::double precision)) (7 rows) ```