PolarDB PostgreSQL版(以下简称 PolarDB-PG)是一款阿里云自主研发的企业级数据库产品,采用计算存储分离架构,兼容 PostgreSQL 与 Oracle。PolarDB-PG 的存储与计算能力均可横向扩展,具有高可靠、高可用、弹性扩展等企业级数据库特性。同时,PolarDB-PG 具有大规模并行计算能力,可以应对 OLTP 与 OLAP 混合负载;还具有时空、向量、搜索、图谱等多模创新特性,可以满足企业对数据处理日新月异的新需求。
背景
PostgreSQL 在优化器中为一个查询树输出一个执行效率最高的物理计划树。其中,执行效率高低的衡量是通过代价估算实现的。比如通过估算查询返回元组的条数,和元组的宽度,就可以计算出 I/O 开销;也可以根据将要执行的物理操作估算出可能需要消耗的 CPU 代价。优化器通过系统表 pg_statistic
获得这些在代价估算过程需要使用到的关键统计信息,而 pg_statistic
系统表中的统计信息又是通过自动或手动的 ANALYZE
操作(或 VACUUM
)计算得到的。ANALYZE
将会扫描表中的数据并按列进行分析,将得到的诸如每列的数据分布、最常见值、频率等统计信息写入系统表。
本文从源码的角度分析一下 ANALYZE
操作的实现机制。源码使用目前 PostgreSQL 最新的稳定版本 PostgreSQL 14。
统计信息
首先,我们应当搞明白分析操作的输出是什么。所以我们可以看一看 pg_statistic
中有哪些列,每个列的含义是什么。这个系统表中的每一行表示其它数据表中 每一列的统计信息。
postgres=# \d+ pg_statistic
Table "pg_catalog.pg_statistic"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
-------------+----------+-----------+----------+---------+----------+--------------+-------------
starelid | oid | | not null | | plain | |
staattnum | smallint | | not null | | plain | |
stainherit | boolean | | not null | | plain | |
stanullfrac | real | | not null | | plain | |
stawidth | integer | | not null | | plain | |
stadistinct | real | | not null | | plain | |
stakind1 | smallint | | not null | | plain | |
stakind2 | smallint | | not null | | plain | |
stakind3 | smallint | | not null | | plain | |
stakind4 | smallint | | not null | | plain | |
stakind5 | smallint | | not null | | plain | |
staop1 | oid | | not null | | plain | |
staop2 | oid | | not null | | plain | |
staop3 | oid | | not null | | plain | |
staop4 | oid | | not null | | plain | |
staop5 | oid | | not null | | plain | |
stanumbers1 | real[] | | | | extended | |
stanumbers2 | real[] | | | | extended | |
stanumbers3 | real[] | | | | extended | |
stanumbers4 | real[] | | | | extended | |
stanumbers5 | real[] | | | | extended | |
stavalues1 | anyarray | | | | extended | |
stavalues2 | anyarray | | | | extended | |
stavalues3 | anyarray | | | | extended | |
stavalues4 | anyarray | | | | extended | |
stavalues5 | anyarray | | | | extended | |
Indexes:
"pg_statistic_relid_att_inh_index" UNIQUE, btree (starelid, staattnum, stainherit)
复制
/* ----------------
* pg_statistic definition. cpp turns this into
* typedef struct FormData_pg_statistic
* ----------------
*/
CATALOG(pg_statistic,2619,StatisticRelationId)
{
/* These fields form the unique key for the entry: */
Oid starelid BKI_LOOKUP(pg_class); /* relation containing
* attribute */
int16 staattnum; /* attribute (column) stats are for */
bool stainherit; /* true if inheritance children are included */
/* the fraction of the column's entries that are NULL: */
float4 stanullfrac;
/*
* stawidth is the average width in bytes of non-null entries. For
* fixed-width datatypes this is of course the same as the typlen, but for
* var-width types it is more useful. Note that this is the average width
* of the data as actually stored, post-TOASTing (eg, for a
* moved-out-of-line value, only the size of the pointer object is
* counted). This is the appropriate definition for the primary use of
* the statistic, which is to estimate sizes of in-memory hash tables of
* tuples.
*/
int32 stawidth;
/* ----------------
* stadistinct indicates the (approximate) number of distinct non-null
* data values in the column. The interpretation is:
* 0 unknown or not computed
* > 0 actual number of distinct values
* < 0 negative of multiplier for number of rows
* The special negative case allows us to cope with columns that are
* unique (stadistinct = -1) or nearly so (for example, a column in which
* non-null values appear about twice on the average could be represented
* by stadistinct = -0.5 if there are no nulls, or -0.4 if 20% of the
* column is nulls). Because the number-of-rows statistic in pg_class may
* be updated more frequently than pg_statistic is, it's important to be
* able to describe such situations as a multiple of the number of rows,
* rather than a fixed number of distinct values. But in other cases a
* fixed number is correct (eg, a boolean column).
* ----------------
*/
float4 stadistinct;
/* ----------------
* To allow keeping statistics on different kinds of datatypes,
* we do not hard-wire any particular meaning for the remaining
* statistical fields. Instead, we provide several "slots" in which
* statistical data can be placed. Each slot includes:
* kind integer code identifying kind of data (see below)
* op OID of associated operator, if needed
* coll OID of relevant collation, or 0 if none
* numbers float4 array (for statistical values)
* values anyarray (for representations of data values)
* The ID, operator, and collation fields are never NULL; they are zeroes
* in an unused slot. The numbers and values fields are NULL in an
* unused slot, and might also be NULL in a used slot if the slot kind
* has no need for one or the other.
* ----------------
*/
int16 stakind1;
int16 stakind2;
int16 stakind3;
int16 stakind4;
int16 stakind5;
Oid staop1 BKI_LOOKUP_OPT(pg_operator);
Oid staop2 BKI_LOOKUP_OPT(pg_operator);
Oid staop3 BKI_LOOKUP_OPT(pg_operator);
Oid staop4 BKI_LOOKUP_OPT(pg_operator);
Oid staop5 BKI_LOOKUP_OPT(pg_operator);
Oid stacoll1 BKI_LOOKUP_OPT(pg_collation);
Oid stacoll2 BKI_LOOKUP_OPT(pg_collation);
Oid stacoll3 BKI_LOOKUP_OPT(pg_collation);
Oid stacoll4 BKI_LOOKUP_OPT(pg_collation);
Oid stacoll5 BKI_LOOKUP_OPT(pg_collation);
#ifdef CATALOG_VARLEN /* variable-length fields start here */
float4 stanumbers1[1];
float4 stanumbers2[1];
float4 stanumbers3[1];
float4 stanumbers4[1];
float4 stanumbers5[1];
/*
* Values in these arrays are values of the column's data type, or of some
* related type such as an array element type. We presently have to cheat
* quite a bit to allow polymorphic arrays of this kind, but perhaps
* someday it'll be a less bogus facility.
*/
anyarray stavalues1;
anyarray stavalues2;
anyarray stavalues3;
anyarray stavalues4;
anyarray stavalues5;
#endif
} FormData_pg_statistic;
复制
从数据库命令行的角度和内核 C 代码的角度来看,统计信息的内容都是一致的。所有的属性都以 sta
开头。其中:
starelid
表示当前列所属的表或索引staattnum
表示本行统计信息属于上述表或索引中的第几列stainherit
表示统计信息是否包含子列stanullfrac
表示该列中值为 NULL 的行数比例stawidth
表示该列非空值的平均宽度stadistinct
表示列中非空值的唯一值数量0
表示未知或未计算> 0
表示唯一值的实际数量< 0
表示 negative of multiplier for number of rows
由于不同数据类型所能够被计算的统计信息可能会有一些细微的差别,在接下来的部分中,PostgreSQL 预留了一些存放统计信息的 槽(slots)。目前的内核里暂时预留了五个槽:
#define STATISTIC_NUM_SLOTS 5
复制
每一种特定的统计信息可以使用一个槽,具体在槽里放什么完全由这种统计信息的定义自由决定。每一个槽的可用空间包含这么几个部分(其中的 N
表示槽的编号,取值为 1
到 5
):
stakindN
:标识这种统计信息的整数编号staopN
:用于计算或使用统计信息的运算符 OIDstacollN
:排序规则 OIDstanumbersN
:浮点数数组stavaluesN
:任意值数组
PostgreSQL 内核中规定,统计信息的编号 1
至 99
被保留给 PostgreSQL 核心统计信息使用,其它部分的编号安排如内核注释所示:
/*
* The present allocation of "kind" codes is:
*
* 1-99: reserved for assignment by the core PostgreSQL project
* (values in this range will be documented in this file)
* 100-199: reserved for assignment by the PostGIS project
* (values to be documented in PostGIS documentation)
* 200-299: reserved for assignment by the ESRI ST_Geometry project
* (values to be documented in ESRI ST_Geometry documentation)
* 300-9999: reserved for future public assignments
*
* For private use you may choose a "kind" code at random in the range
* 10000-30000. However, for code that is to be widely disseminated it is
* better to obtain a publicly defined "kind" code by request from the
* PostgreSQL Global Development Group.
*/
复制
目前可以在内核代码中看到的 PostgreSQL 核心统计信息有 7 个,编号分别从 1
到 7
。我们可以看看这 7 种统计信息分别如何使用上述的槽。