PostgreSQL-Indexes

原创 Oracle 2022-11-22

204

Indexes

In PostgreSQL, indexes are special database objects mainly designed to speed up data access. They are auxiliary structures: each index can be deleted and recreated back from the information in the table. You may sometimes happen to hear that a DBMS can work without indexes although slowly. However, this is not the case, since indexes also serve to enforce some integrity constraints.

At present, six different kinds of indexes are built into PostgreSQL 9.6, and one more index is available as an extension — thanks to significant changes in version 9.6. So expect new types of indexes in a near future.

Despite all differences between types of indexes (also called access methods), each of them eventually associates a key (for example, the value of the indexed column) with table rows that contain this key. Each row is identified by TID (tuple id), which consists of the number of block in the file and the position of the row inside the block. That said, with the known key or some information about it we can quickly read those rows that may contain the information of our interest without scanning the entire table.

It is important to understand that an index speeds up data access at a certain maintenance cost. For each operation on indexed data, whether it be insertion, deletion, or update of table rows, indexes for that table need to be updated too, and in the same transaction. Note that update of table fields for which indexes haven't been built does not result in index update; this technique is called HOT (Heap-Only Tuples).

Extensibility entails some implications. To enable easy addition of a new access method to the system, an interface of the general indexing engine has been implemented. Its main task is to get TIDs from the access method and to work with them:

Read data from corresponding versions of table rows.
Fetch row versions TID by TID or in a batch using a prebuilt bitmap.
Check visibility of row versions for the current transaction taking into account its isolation level.

Indexing engine is involved in performing queries. It is called according to a plan created at the optimization stage. The optimizer, sorting out and evaluating different ways to perform the query, should understand capabilities of all access methods that are potentially applicable. Will the method be able to return data in the needed order or should we anticipate sorting? Can we use this method to search for NULL? These are problems that the optimizer is regularly solving.

Not only the optimizer needs information about the access method. When building an index, the system must decide whether the index can be built on several columns and whether this index ensures uniqueness.

So, each access method should provide all the necessary information about itself. Versions lower than 9.6 used the «pg_am» table for this, while starting with version 9.6 the data moved to deeper levels, inside special functions. We will get acquainted with this interface a bit further.

All the rest is the task of the access method:

Implement an algorithm for building the index and map the data into pages (for the buffer cache manager to uniformly process each index).
Search information in the index by a predicate in the form "indexed-field operator expression".
Evaluate the index usage cost.
Manipulate the locks required for correct parallel processing.
Generate write-ahead log (WAL) records.

We will first consider capabilities of the general indexing engine and then proceed to considering different access methods.

postgresql

「喜欢这篇文章，您的关注和赞赏是给作者最好的鼓励」

关注作者

PostgreSQL-Indexes

评论