
PolarDB-PG Internals: ANALYZE Source Code Walkthrough (Part 2)

PolarDB农夫山泉 2023-08-29

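PolarDB for PostgreSQL (PolarDB-PG for short) is an enterprise-grade database product developed in-house by Alibaba Cloud. It uses a compute-storage separation architecture and is compatible with PostgreSQL and Oracle. Both its storage and its compute capacity can scale out horizontally, and it offers enterprise database features such as high reliability, high availability, and elastic scaling. PolarDB-PG also provides large-scale parallel computing to handle mixed OLTP and OLAP workloads, along with multi-model features for spatio-temporal, vector, search, and graph data, to meet enterprises' fast-changing data processing needs.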

Most Common Values (MCV)

/*
 * In a "most common values" slot, staop is the OID of the "=" operator
 * used to decide whether values are the same or not, and stacoll is the
 * collation used (same as column's collation).  stavalues contains
 * the K most common non-null values appearing in the column, and stanumbers
 * contains their frequencies (fractions of total row count).  The values
 * shall be ordered in decreasing frequency.  Note that since the arrays are
 * variable-size, K may be chosen by the statistics collector.  Values should
 * not appear in MCV unless they have been observed to occur more than once;
 * a unique column will have no MCV slot.
 */
#define STATISTIC_KIND_MCV 1

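For a column's most common values, staop stores the OID of the = operator used to decide whether a value equals one of the most common values. stavalues holds the K most common non-null values in the column, and stanumbers holds the frequency of each of those K values, expressed as a fraction of the total row count and ordered by decreasing frequency. A value that was seen only once is not listed, so a unique column has no MCV slot.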
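To make the slot layout concrete, here is a minimal sketch of how an MCV list could be derived from a sample of integer values. It is not the PolarDB-PG or PostgreSQL implementation: the names McvEntry and build_mcv are hypothetical, the sample is assumed to contain no NULLs, and K is simply passed in by the caller. The sketch sorts the sample, counts runs of equal values, drops values seen only once, and keeps the K most frequent ones in decreasing frequency.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical entry: one most-common value and its statistics. */
typedef struct McvEntry
{
    int    value;      /* would be stored in stavalues */
    size_t count;      /* occurrences in the sample */
    double frequency;  /* would be stored in stanumbers: count / sample size */
} McvEntry;

/* Ascending comparison of two ints for qsort. */
static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *) a;
    int y = *(const int *) b;
    return (x > y) - (x < y);
}

/* Descending by count; ties broken by ascending value for determinism. */
static int cmp_entry_desc(const void *a, const void *b)
{
    const McvEntry *x = a;
    const McvEntry *y = b;
    if (x->count != y->count)
        return (x->count < y->count) ? 1 : -1;
    return (x->value > y->value) - (x->value < y->value);
}

/*
 * Build at most max_k MCV entries from n sampled values.
 * The sample array is sorted in place. Returns the number of entries
 * written to out, or -1 if memory allocation fails.
 */
static int build_mcv(int *sample, size_t n, McvEntry *out, size_t max_k)
{
    if (n == 0 || max_k == 0)
        return 0;

    McvEntry *runs = malloc(n * sizeof(McvEntry));
    if (runs == NULL)
        return -1;

    qsort(sample, n, sizeof(int), cmp_int);

    size_t nruns = 0;
    size_t i = 0;
    while (i < n)
    {
        size_t j = i;
        while (j < n && sample[j] == sample[i])
            j++;
        if (j - i > 1)  /* only values observed more than once qualify */
        {
            runs[nruns].value = sample[i];
            runs[nruns].count = j - i;
            runs[nruns].frequency = (double) (j - i) / (double) n;
            nruns++;
        }
        i = j;
    }

    qsort(runs, nruns, sizeof(McvEntry), cmp_entry_desc);

    size_t k = (nruns < max_k) ? nruns : max_k;
    for (i = 0; i < k; i++)
        out[i] = runs[i];

    free(runs);
    return (int) k;
}
```

With the sample {7, 3, 7, 5, 3, 7, 9, 1}, only 7 and 3 occur more than once, so the list has two entries even if the caller asks for four. A sample of all-distinct values yields an empty list, matching the rule that a unique column has no MCV slot.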

Histogram

/*
 * A "histogram" slot describes the distribution of scalar data.  staop is
 * the OID of the "<" operator that describes the sort ordering, and stacoll
 * is the relevant collation.  (In theory more than one histogram could appear,
 * if a datatype has more than one useful sort operator or we care about more
 * than one collation.  Currently the collation will always be that of the
 * underlying column.)  stavalues contains M (>=2) non-null values that
 * divide the non-null column data values into M-1 bins of approximately equal
 * population.  The first stavalues item is the MIN and the last is the MAX.
 * stanumbers is not used and should be NULL.  IMPORTANT POINT: if an MCV
 * slot is also provided, then the histogram describes the data distribution
 * *after removing the values listed in MCV* (thus, it's a "compressed
 * histogram" in the technical parlance).  This allows a more accurate
 * representation of the distribution of a column with some very-common
 * values.  In a column with only a few distinct values, it's possible that
 * the MCV list describes the entire data population; in this case the
 * histogram reduces to empty and should be omitted.
 */
#define STATISTIC_KIND_HISTOGRAM 2

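This slot is the data distribution histogram of a scalar column. staop stores the OID of the < operator that defines the sort order of the distribution. stavalues contains M non-null values that divide the column's non-null values into M - 1 buckets of roughly equal population; the first value is the minimum and the last is the maximum. If the column also has an MCV slot, the histogram excludes the values listed in the MCV, which gives a more accurate picture of the remaining distribution.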
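The bucket boundaries described above can be illustrated with a short equi-depth sketch. This is an illustration under stated assumptions, not the actual ANALYZE code: the function name equi_depth_bounds is hypothetical, the input is assumed to be already sorted with the "<" ordering, and the caller is assumed to have already removed NULLs and any values listed in the MCV slot. Boundary i is picked at position i * (n - 1) / (m - 1), so the first boundary is the minimum and the last is the maximum.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Pick m boundary values from n sorted values so that the m - 1 buckets
 * between them hold roughly the same number of values.
 * If m exceeds n, it is reduced to n. Returns the number of boundaries
 * written to bounds, or 0 when a histogram cannot be built (fewer than
 * two values or fewer than two boundaries requested).
 */
static int equi_depth_bounds(const int *sorted, size_t n, int *bounds, size_t m)
{
    if (n < 2 || m < 2)
        return 0;
    if (m > n)
        m = n;

    for (size_t i = 0; i < m; i++)
    {
        /* i = 0 maps to index 0 (MIN), i = m - 1 maps to index n - 1 (MAX) */
        size_t pos = i * (n - 1) / (m - 1);
        bounds[i] = sorted[pos];
    }
    return (int) m;
}
```

For the nine sorted values 1 through 9 and m = 5, the boundaries come out as 1, 3, 5, 7, 9, which gives four buckets of about equal population. Returning 0 for fewer than two values mirrors the case where the MCV list already covers the whole population and the histogram is omitted.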

Correlation

/*
 * A "correlation" slot describes the correlation between the physical order
 * of table tuples and the ordering of data values of this column, as seen
 * by the "<" operator identified by staop with the collation identified by
 * stacoll.  (As with the histogram, more than one entry could theoretically
 * appear.)  stavalues is not used and should be NULL.  stanumbers contains
 * a single entry, the correlation coefficient between the sequence of data
 * values and the sequence of their actual tuple positions.  The coefficient
 * ranges from +1 to -1.
 */
#define STATISTIC_KIND_CORRELATION 3

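stanumbers holds a single entry: the correlation coefficient between the sequence of data values and the sequence of their actual tuple positions. The coefficient ranges from +1 to -1.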
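As a rough illustration of what this coefficient measures, the sketch below correlates each value's physical position with its rank in sorted order. It is a simplified sketch, not the ANALYZE implementation: the names Item and physical_order_correlation are hypothetical, values are plain ints given in physical (heap) order, and equal values are ranked by physical position. Because both sequences are permutations of 0..n-1, their sums and sums of squares have closed forms, and only the cross product has to be accumulated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: a value together with its physical position. */
typedef struct Item
{
    int    value;
    size_t pos;
} Item;

/* Sort by value; ties are ordered by physical position (an assumption). */
static int cmp_item(const void *a, const void *b)
{
    const Item *x = a;
    const Item *y = b;
    if (x->value != y->value)
        return (x->value > y->value) - (x->value < y->value);
    return (x->pos > y->pos) - (x->pos < y->pos);
}

/*
 * Compute the correlation between physical position and sort rank.
 * values[i] is the value stored at physical position i.
 * Writes the coefficient (in [-1, +1]) to *corr and returns 0,
 * or returns -1 if n < 2 or memory allocation fails.
 */
static int physical_order_correlation(const int *values, size_t n, double *corr)
{
    if (n < 2)
        return -1;

    Item *items = malloc(n * sizeof(Item));
    if (items == NULL)
        return -1;

    for (size_t i = 0; i < n; i++)
    {
        items[i].value = values[i];
        items[i].pos = i;
    }
    qsort(items, n, sizeof(Item), cmp_item);

    /* Sum of rank * physical position over all values. */
    double xysum = 0.0;
    for (size_t rank = 0; rank < n; rank++)
        xysum += (double) rank * (double) items[rank].pos;
    free(items);

    /* Closed forms for sum(i) and sum(i^2) over i = 0..n-1. */
    double dn = (double) n;
    double xsum = (dn - 1.0) * dn / 2.0;
    double x2sum = (dn - 1.0) * dn * (2.0 * dn - 1.0) / 6.0;

    *corr = (dn * xysum - xsum * xsum) / (dn * x2sum - xsum * xsum);
    return 0;
}
```

A column stored in ascending order, such as {1, 2, 3, 4, 5}, gives +1, and the same values stored in descending order give -1. Values near 0 indicate that physical order says little about value order.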

