PostgreSQL中的虚拟索引

飞象数据 2019-07-23

1167

在最近的会议上，我们收到了过去使用Oracle并且现在使用PostgreSQL的用户发来的一个奇怪的问题：“我们在PostgreSQL中有虚拟索引吗 ” 。这个问题的答案是肯定的。虚拟的真正含义是想象的、不真实的。我们可以试着为PostgreSQL的优化器创建虚构的索引，这些索引在现实中并不存在。我们来看一些例子。

虚拟索引有什么用？

大多数关系数据库中SQL的执行时间取决于执行的成本。优化器选择具有最低成本的计划，从而使用该计划考虑进一步的阶段，例如执行和获取。优化编写良好的SQL的最简单的方法之一是使用适合该查询的适当索引。索引可能会减少从磁盘中获取的页面数，并且可能会根据其大小（<<<表大小）存在于缓存中。所以，对于希望调优SQL的管理员和开发人员来说，索引始终是一个容易实现的目标。

但通常开发人员希望看一个索引是否真的有用。唯一可用的选项是创建索引并检查查询是否能够使用它缩减成本。如果它在一个测试环境中，或者一个规模不大的表，那么创建索引可能没问题，但是由于某种原因你想看一个索引是否对SQL有帮助而没有实际创建它，那么可以测试并尝试用此扩展来创建虚拟索引。

用于虚拟索引的PostgreSQL扩展

为了实现在PostgreSQL中创建虚构索引的功能，我们需要使用名为 hypopg的扩展。此扩展从PostgreSQL 9.2的PGDG存储库中可用，并一直工作到最新的活动版本PostgreSQL 11.3。为了创建这个扩展，你可以根据你的linux发行版使用yum或apt，或者从源码编译它。

安装hypopg

在RedHat/CentOS中

# yum install hypopg
复制

对于Debian/Ubuntu，你必须确保拥有名为postgresql-server-dev-X的开发包，其中X为主要版本。

# apt install postgresql-server-dev-11 (Example for PostgreSQL 11)
# apt install postgresql-11-hypopg
复制

**创建扩展**

# su - postgres -c "psql -d percona -c 'CREATE EXTENSION hypopg WITH SCHEMA myextensions'"
CREATE EXTENSION
复制

为了向你展示由hypopg创建的函数列表，我在不同的模式中创建了此扩展。这是列表：

percona=# select proname from pg_proc where pronamespace IN 
(select oid from pg_namespace where nspname = 'myextensions');
proname
----------------------
hypopg
hypopg_create_index
hypopg_drop_index
hypopg_get_indexdef
hypopg_list_indexes
hypopg_relation_size
hypopg_reset
(7 rows)
复制

hypopg的函数

hypopg_create_index: 此函数用于创建虚拟索引。我们不必指定索引的名称，因为此函数无论如何都会忽略它。

percona=# SELECT * FROM myextensions.hypopg_create_index('CREATE INDEX btree_hypo_idx on foo.products USING BTREE(quantity)'); 
indexrelid  | indexname 
------------+------------------------------------ 
16665       | <16665>btree_foo_products_quantity 
(1 row)
复制

hypopg_drop_index: 此函数用于在测试时删除虚拟索引。一旦断开会话，索引就会自动被删除，因为它只在该会话的私有内存中被创建。

percona=# select * from myextensions.hypopg_drop_index(16687);
 hypopg_drop_index 
-------------------
 t
(1 row)
复制

hypopg_get_indexdef: 我们使用此函数创建虚拟索引的定义。

percona=# select * from hypopg_get_indexdef(16713);
                 hypopg_get_indexdef                 
-----------------------------------------------------
 CREATE INDEX ON foo.products USING btree (quantity)
(1 row)
复制

hypopg_list_indexes: 此函数帮助我们列出我们在会话中创建的所有虚拟索引。

percona=# select * from hypopg_list_indexes();
 indexrelid |             indexname              | nspname | relname  | amname 
------------+------------------------------------+---------+----------+--------
      16713 | <16713>btree_foo_products_quantity | foo     | products | btree
(1 row)
复制

hypopg_relation_size: 此函数帮助我们估算索引大小，因此我们可以知道这个索引占用的大概空间量。

percona=# CREATE INDEX idx1 ON foo.products (product_sku);
CREATE INDEX
percona=# \di+ idx1
                         List of relations
 Schema | Name | Type  |  Owner   |  Table   |  Size  | Description 
--------+------+-------+----------+----------+--------+-------------
 foo    | idx1 | index | postgres | products | 631 MB | 
(1 row)

percona=# SELECT * FROM myextensions.hypopg_create_index('CREATE INDEX on foo.products USING BTREE(product_sku)');
 indexrelid |               indexname               
------------+---------------------------------------
      16718 | <16718>btree_foo_products_product_sku
(1 row)

percona=# select * from pg_size_pretty(hypopg_relation_size(16718));
 pg_size_pretty 
----------------
 653 MB
(1 row)
复制

hypopg_reset: 此函数删除我们在单个命令中创建的所有虚拟索引。

percona=# select * from hypopg_list_indexes();
 indexrelid |               indexname               | nspname | relname  | amname 
------------+---------------------------------------+---------+----------+--------
      16715 | <16715>btree_foo_products_quantity    | foo     | products | btree
      16716 | <16716>btree_foo_products_product_sku | foo     | products | btree
(2 rows)

percona=# select * from hypopg_reset();
 hypopg_reset 
--------------
 
(1 row)

percona=# select * from hypopg_list_indexes();
 indexrelid | indexname | nspname | relname | amname 
------------+-----------+---------+---------+--------
(0 rows)
复制

虚拟索引如何在现实中发挥作用？

当你使用hypopg_create_index
函数创建虚拟索引时，它会在我们连接的私有内存中创建索引。实际上没有触及任何目录表和原始表。查看我们是否可以从该索引中受益的唯一方法是运行EXPLAIN<QUERY> 。但是，如果你希望运行执行该SQL并且提供运行时间统计信息的EXPLAIN ANALYZE
，则它将无法使用该虚拟索引。这是因为现实中不存在虚拟索引。

测试虚拟索引

在测试虚拟索引时，我想检查是否存在无效的情况。因为我们使用此扩展的主要原因是测试优化器是否可以识别我们希望创建的索引，所以我考虑测试PostgreSQL中最常用索引的所有类型，例如BTREE,HASH,GIN,GiST,BRIN和LOOM索引。目前仅提到BTREE索引适用于虚拟索引。

出于这个原因，我创建了下面的表，它对于实时用例可能没有太大的意义，但它帮我们测试那些我们可以创建的所有类型的索引。

percona=# CREATE TABLE foo.products (product_sku character(8), quantity int, location point, comments text, 
comments_tsv tsvector, assembled_days integer[], manufactured_date timestamptz);
CREATE TABLE
percona=# INSERT INTO foo.products VALUES ('a1b2c3d4','2',point '(1,1)','This is my first message',NULL,ARRAY[1,3,5], now() - interval '11 days');
.......................................................................................................................................
.......................................................................................................................................
percona=# INSERT INTO foo.products VALUES ('j1l2m3n4','1000000',point '(5,2)','This is my millionth message',NULL,ARRAY[2,5,7], now() - interval '10 days');

percona=# update foo.products set comments_tsv = to_tsvector(comments);
UPDATE 1000000
percona=# select * from foo.products LIMIT 1;
 product_sku | quantity | location |          comments          |      comments_tsv      | assembled_days |       manufactured_date       
-------------+----------+----------+----------------------------+------------------------+----------------+-------------------------------
 a1b2c3d4    |        2 | (1,1)    | This is my first message   | 'first':4 'messag':5   | {1,3,5}        | 2019-06-01 17:31:35.632891-04
(1 row)
复制

Btree索引

我们可以尝试在quantity列上创建BTREE索引，以提高使用该字段作为谓词查询的性能。正如你在下面的日志中所看到的，我创建了一个hypothetical btree index
，SQL上的EXPLAIN显示了一个索引扫描，其中使用索引的成本远低于没有索引的序列扫描。通过本次测试，我们知道创建此索引可以帮助优化我们测试的SQL。

WITHOUT Index
--------------
percona=# EXPLAIN select * from foo.products where quantity IN (4,5,6);
                             QUERY PLAN                              
---------------------------------------------------------------------
 Seq Scan on products  (cost=0.00..716349.40 rows=6246717 width=128)
   Filter: (quantity = ANY ('{4,5,6}'::integer[]))
(2 rows)

Create Hypothetical Index 
--------------------------
percona=# SELECT * FROM 
myextensions.hypopg_create_index('CREATE INDEX btree_hypo_idx on foo.products USING BTREE(quantity)');
 indexrelid |             indexname              
------------+------------------------------------
      16665 | <16665>btree_foo_products_quantity
(1 row)

WITH Hypothetical Index 
------------------------
percona=# EXPLAIN select * from foo.products where quantity IN (4,5,6);
                                                   QUERY PLAN                                                   
----------------------------------------------------------------------------------------------------------------
 Index Scan using <16665>btree_foo_products_quantity on products  (cost=0.06..546930.72 rows=6246729 width=128)
   Index Cond: (quantity = ANY ('{4,5,6}'::integer[]))
(2 rows)
复制

BLOOM索引

文档没有提到虚拟索引当前适用于BLOOM索引。但是，我已经测试过看看EXPLAIN是否可以显示使用虚拟bloom索引的计划，它确实有效。

percona=# CREATE EXTENSION bloom ;
CREATE EXTENSION

percona=# SELECT * FROM
myextensions.hypopg_create_index('create index on foo.products using bloom(quantity);');
 indexrelid |             indexname              
------------+------------------------------------
      16703 | <16703>bloom_foo_products_quantity
(1 row)

percona=# SET enable_seqscan TO OFF;
SET
percona=# EXPLAIN select * from foo.products WHERE quantity = 4;
                                                 QUERY PLAN                                                 
------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on products  (cost=199449.64..873500.25 rows=4164944 width=128)
   Recheck Cond: (quantity = 4)
   ->  Bitmap Index Scan on <16703>bloom_foo_products_quantity  (cost=0.00..198408.40 rows=4164944 width=0)
         Index Cond: (quantity = 4)
(4 rows)
复制

BRIN索引

然后我尝试创建一个虚拟的BRIN索引，看看我是否得到任何错误，因为目前只支持btree虚拟索引。令我惊讶的是，我没有看到任何错误。但是当我尝试对SQL执行EXPLAIN时，或者当我尝试在该表上运行SELECT时，我确实看到了一个奇怪的错误，如下面的日志中所示。所以，只是因为它允许你创建BRIN虚拟索引但不意味着它有用。

percona=# SELECT * FROM
myextensions.hypopg_create_index('create index on foo.products using brin(manufactured_date) with (pages_per_range=4);');
 indexrelid |                 indexname                  
------------+--------------------------------------------
      16669 | <16669>brin_foo_products_manufactured_date
(1 row)
percona=# select * from hypopg_list_indexes();
 indexrelid |                 indexname                  | nspname | relname  | amname 
------------+--------------------------------------------+---------+----------+--------
      16669 | <16669>brin_foo_products_manufactured_date | foo     | products | brin
(1 row)

percona=# EXPLAIN select * from foo.products WHERE manufactured_date < '2019-06-03 17:31:35';
ERROR:  could not open relation with OID 16669

percona=# select count(*) from foo.products WHERE manufactured_date < '2019-06-03 17:31:35';
ERROR:  could not open relation with OID 16669
复制

Hash,Gin,GiST或其他索引

与BRIN索引不同，当我们尝试创建任何其他类型的虚拟索引时，它会抛出一条确切的错误消息，指出我们制定的索引类型不受支持。

Hash Index
-----------
percona=# SELECT * FROM 
myextensions.hypopg_create_index('CREATE INDEX hash_hypo_idx on foo.products USING HASH(product_sku)'); 
ERROR: hypopg: access method "hash" is not supported

GiST Index
-----------
percona=# SELECT * FROM                                                                                                             
myextensions.hypopg_create_index('create index on foo.products using gist(location);');
ERROR:  hypopg: access method "gist" is not supported
percona=#

GIN Index
----------
percona=# SELECT * FROM
myextensions.hypopg_create_index('create index on foo.products using gin(comments_tsv);');
ERROR:  hypopg: access method "gin" is not supported
percona=#
复制

结论

很高兴看到在帮助开发人员和管理员使用扩展方面做的一些努力，这有助于他们知道是否可以使用索引而无需实际创建它。这是我们可以用来使索引建议自动化的东西，它结合了另一个名为pg_qualstats的扩展，我们将在下一篇博客文章中讨论。感谢所有帮助我们在PostgreSQL中实现此功能的贡献者。

本文翻译自：Hypothetical Indexes in PostgreSQL

postgresql

文章转载自飞象数据，如果涉嫌侵权，请发送邮件至：contact@modb.pro进行举报，并提供相关证据，一经查实，墨天轮将立刻删除相关内容。