暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

PostgreSQL 分位数聚合计算插件quantile

原创 digoal 2022-01-20
684

作者

digoal

日期

2021-09-14

标签

PostgreSQL , 分位


quantile 1.1.7

https://pgxn.org/dist/quantile/1.1.7/

不过PG现在的版本内置分位计算效率已经提高了很多, 建议使用内置的聚合函数:
https://www.postgresql.org/docs/current/functions-aggregate.html

Quantile aggregates

This extension provides three simple aggregate functions to compute
quantiles (http://en.wikipedia.org/wiki/Quantile). There are two
forms of aggregate functions available - the first one returns
a single quantile, the second one returns an arbitrary number of
quantiles (as an array).

This extension was created in 2011, before PostgreSQL added functions to
compute percentiles (percentile_cont and percentile_disc) in 9.4,
which was released in December 2014. Even after introduction of those
built-in functions it made sense to use this extension, because it was
significantly faster in various cases.

The performance of the built-in functions improved a lot since then, and
is usually very close or even faster than this extension. In some cases
the extension is perhaps 2x faster than the built-in functions, but that
may be (at least partially) attributed to not respecting work_mem.

It's therefore recommended to evaluate the built-in functions first, and
only use this extension if it's provably (and consistently) faster than
the built-in functions and the risk of running out of memory is low, or
when it's necessary to support older PostgreSQL releases (pre-9.4) that
do not have the built-in alternatives.

Computes arbitrary quantile of the values - the p_quantile has to be
between 0 and 1. For example this should return 500 because 500 is the
median of a sequence 1 .. 1000.

SELECT quantile(i, 0.5) FROM generate_series(1,1000) s(i);  
复制

but you can choose arbitrary quantile (for example 0.95).

This function is overloaded for the four basic numeric types, i.e.
int, bigint, double precision and numeric.

If you need multiple quantiles at the same time (e.g. all four
quartiles), you can use this function instead of the one described
above. This version allows you to pass an array of quantiles and
returns an array of values.

So if you need all three quartiles, you may do this

SELECT quantile(i, ARRAY[0.25, 0.5, 0.75])  
  FROM generate_series(1,1000) s(i);  
复制

and it should return ARRAY[250, 500, 750]. Compared to calling
the simple quantile function like this

SELECT quantile(i, 0.25), quantile(i, 0.5), quantile(i, 0.75)  
 FROM generate_series(1,1000) s(i);  
复制

the advantage is that the values are collected just once (into
a single array), not for each expression separately. If you're
working with large data sets, this may save a significant amount
of time and memory (if may even be the factor that allows the query
to finish and not being killed by OOM killer or something).

Just as in the first case, there are four functions handling other
basic numeric types: int, bigint, double precision and numeric.

Installing this is very simple, especially if you're using pgxn client.
All you need to do is this:

$ pgxn install quantile  
$ pgxn load -d mydb quantile
复制

and you're done. You may also install the extension manually:

$ make install  
$ psql dbname -c "CREATE EXTENSION quantile"
复制

And if you're on an older version (pre-9.1), you have to run the SQL
script manually

$ psql dbname < `pg_config --sharedir`/contrib/quantile--1.1.7.sql
复制

That's all.

This software is distributed under the terms of BSD 2-clause license.
See LICENSE or http://www.opensource.org/licenses/bsd-license.php for
more details.

PostgreSQL 许愿链接

您的愿望将传达给PG kernel hacker、数据库厂商等, 帮助提高数据库产品质量和功能, 说不定下一个PG版本就有您提出的功能点. 针对非常好的提议,奖励限量版PG文化衫、纪念品、贴纸、PG热门书籍等,奖品丰富,快来许愿。开不开森.

9.9元购买3个月阿里云RDS PostgreSQL实例

PostgreSQL 解决方案集合

德哥 / digoal's github - 公益是一辈子的事.

digoal's wechat

「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论