暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

pg_stat_kcache 2.2.0 发布, 支持PG 13 - 跟踪cpu、文件系统真实读写行为

digoal 2020-12-15
402

作者

digoal

日期

2020-12-15

标签

PostgreSQL , linux , file system


背景

PG 的bgwriter使用buffer io, shared buffer的统计信息容易让人产生错觉. 例如shared buffer配置较小, 通过统计信息发现命中率低, 但实际上也可能没有发生真实的disk io. 因为read可能发生在page cache中, 这些行为是file system接口来完成的, 所以数据库内核并不知情.

pg_stat_kcache用于跟踪文件系统层行为, 例如 可以分辨到底发生了多少page cache, 多少真实的disk io.

major page fault 说明发生了disk 访问
minor page fault 说明发生了page cache访问

《page fault的两种区别(major、minor)》

https://pgxn.org/dist/pg_stat_kcache/2.2.0/

pg_stat_kcache

Features

Gathers statistics about real reads and writes done by the filesystem layer.
It is provided in the form of an extension for PostgreSQL >= 9.4., and requires
pg_stat_statements extension to be installed. PostgreSQL 9.4 or more is
required as previous version of provided pg_stat_statements didn't expose the
queryid field.

Installation

Compiling

The module can be built using the standard PGXS infrastructure. For this to
work, the pg_config program must be available in your $PATH. Instruction to
install follows::

git clone https://github.com/powa-team/pg_stat_kcache.git
cd pg_stat_kcache
make
make install

PostgreSQL setup

The extension is now available. But, as it requires some shared memory to hold
its counters, the module must be loaded at PostgreSQL startup. Thus, you must
add the module to shared_preload_libraries in your postgresql.conf. You
need a server restart to take the change into account. As this extension
depends on pg_stat_statements, it also need to be added to
shared_preload_libraries.

Add the following parameters into you postgresql.conf::

# postgresql.conf
shared_preload_libraries = 'pg_stat_statements,pg_stat_kcache'

Once your PostgreSQL cluster is restarted, you can install the extension in
every database where you need to access the statistics::

mydb=# CREATE EXTENSION pg_stat_kcache;

Configuration

The following GUCs can be configured, in postgresql.conf:

  • pg_stat_kcache.linux_hz (int, default -1): informs pg_stat_kcache of the
    linux CONFIG_HZ config option. This is used by pg_stat_kcache to compensate
    for sampling errors. The default value is -1, tries to guess it at startup.
  • pg_stat_kcache.track (enum, default top): controls which statements are
    tracked by pg_stat_kcache. Specify top to track top-level statements (those
    issued directly by clients), all to also track nested statements (such as
    statements invoked within functions), or none to disable statement statistics
    collection.
  • pg_stat_kcache.track_planning (bool, default off): controls whether
    planning operations and duration are tracked by pg_stat_kcache (requires
    PostgreSQL 13 or above).

Usage

pg_stat_kcache create several objects.

pg_stat_kcache view

+------------------+------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+
| Name | Type | Description |
+==================+==================+=========================================================================================================================================================+
| datname | name | Name of the database |
+------------------+------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_user_time | double precision | User CPU time used planning statements in this database, in seconds and milliseconds (if pg_stat_kcache.track_planning is enabled, otherwise zero) |
+------------------+------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_system_time | double precision | System CPU time used planning statements in this database, in seconds and milliseconds (if pg_stat_kcache.track_planning is enabled, otherwise zero) |
+------------------+------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_minflts | bigint | Number of page reclaims (soft page faults) planning statements in this database (if pg_stat_kcache.track_planning is enabled, otherwise zero) |
+------------------+------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_majflts | bigint | Number of page faults (hard page faults) planning statements in this database (if pg_stat_kcache.track_planning is enabled, otherwise zero) |
+------------------+------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_nswaps | bigint | Number of swaps planning statements in this database (if pg_stat_kcache.track_planning is enabled, otherwise zero) |
+------------------+------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_reads | bigint | Number of bytes read by the filesystem layer planning statements in this database (if pg_stat_kcache.track_planning is enabled, otherwise zero) |
+------------------+------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_reads_blks | bigint | Number of 8K blocks read by the filesystem layer planning statements in this database (if pg_stat_kcache.track_planning is enabled, otherwise zero) |
+------------------+------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_writes | bigint | Number of bytes written by the filesystem layer planning statements in this database (if pg_stat_kcache.track_planning is enabled, otherwise zero) |
+------------------+------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_writes_blks | bigint | Number of 8K blocks written by the filesystem layer planning statements in this database (if pg_stat_kcache.track_planning is enabled, otherwise zero) |
+------------------+------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_msgsnds | bigint | Number of IPC messages sent planning statements in this database (if pg_stat_kcache.track_planning is enabled, otherwise zero) |
+------------------+------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_msgrcvs | bigint | Number of IPC messages received planning statements in this database (if pg_stat_kcache.track_planning is enabled, otherwise zero) |
+------------------+------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_nsignals | bigint | Number of signals received planning statements in this database (if pg_stat_kcache.track_planning is enabled, otherwise zero) |
+------------------+------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_nvcsws | bigint | Number of voluntary context switches planning statements in this database (if pg_stat_kcache.track_planning is enabled, otherwise zero) |
+------------------+------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_nivcsws | bigint | Number of involuntary context switches planning statements in this database (if pg_stat_kcache.track_planning is enabled, otherwise zero) |
+------------------+------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+
| exec_user_time | double precision | User CPU time used executing statements in this database, in seconds and milliseconds |
+------------------+------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+
| exec_system_time | double precision | System CPU time used executing statements in this database, in seconds and milliseconds |
+------------------+------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+
| exec_minflts | bigint | Number of page reclaims (soft page faults) executing statements in this database |
+------------------+------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+
| exec_majflts | bigint | Number of page faults (hard page faults) executing statements in this database |
+------------------+------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+
| exec_nswaps | bigint | Number of swaps executing statements in this database |
+------------------+------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+
| exec_reads | bigint | Number of bytes read by the filesystem layer executing statements in this database |
+------------------+------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+
| exec_reads_blks | bigint | Number of 8K blocks read by the filesystem layer executing statements in this database |
+------------------+------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+
| exec_writes | bigint | Number of bytes written by the filesystem layer executing statements in this database |
+------------------+------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+
| exec_writes_blks | bigint | Number of 8K blocks written by the filesystem layer executing statements in this database |
+------------------+------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+
| exec_msgsnds | bigint | Number of IPC messages sent executing statements in this database |
+------------------+------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+
| exec_msgrcvs | bigint | Number of IPC messages received executing statements in this database |
+------------------+------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+
| exec_nsignals | bigint | Number of signals received executing statements in this database |
+------------------+------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+
| exec_nvcsws | bigint | Number of voluntary context switches executing statements in this database |
+------------------+------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+
| exec_nivcsws | bigint | Number of involuntary context switches executing statements in this database |
+------------------+------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+

pg_stat_kcache_detail view

+------------------+------------------+------------------------------------------------------------------------------------------------------------------------------------------+
| Name | Type | Description |
+==================+==================+==========================================================================================================================================+
| query | text | Query text |
+------------------+------------------+------------------------------------------------------------------------------------------------------------------------------------------+
| top | bool | True if the statement is top-level |
+------------------+------------------+------------------------------------------------------------------------------------------------------------------------------------------+
| datname | name | Database name |
+------------------+------------------+------------------------------------------------------------------------------------------------------------------------------------------+
| rolname | name | Role name |
+------------------+------------------+------------------------------------------------------------------------------------------------------------------------------------------+
| plan_user_time | double precision | User CPU time used planning the statement, in seconds and milliseconds (if pg_stat_kcache.track_planning is enabled, otherwise zero) |
+------------------+------------------+------------------------------------------------------------------------------------------------------------------------------------------+
| plan_system_time | double precision | System CPU time used planning the statement, in seconds and milliseconds (if pg_stat_kcache.track_planning is enabled, otherwise zero) |
+------------------+------------------+------------------------------------------------------------------------------------------------------------------------------------------+
| plan_minflts | bigint | Number of page reclaims (soft page faults) planning the statement (if pg_stat_kcache.track_planning is enabled, otherwise zero) |
+------------------+------------------+------------------------------------------------------------------------------------------------------------------------------------------+
| plan_majflts | bigint | Number of page faults (hard page faults) planning the statement (if pg_stat_kcache.track_planning is enabled, otherwise zero) |
+------------------+------------------+------------------------------------------------------------------------------------------------------------------------------------------+
| plan_nswaps | bigint | Number of swaps planning the statement (if pg_stat_kcache.track_planning is enabled, otherwise zero) |
+------------------+------------------+------------------------------------------------------------------------------------------------------------------------------------------+
| plan_reads | bigint | Number of bytes read by the filesystem layer planning the statement (if pg_stat_kcache.track_planning is enabled, otherwise zero) |
+------------------+------------------+------------------------------------------------------------------------------------------------------------------------------------------+
| plan_reads_blks | bigint | Number of 8K blocks read by the filesystem layer planning the statement (if pg_stat_kcache.track_planning is enabled, otherwise zero) |
+------------------+------------------+------------------------------------------------------------------------------------------------------------------------------------------+
| plan_writes | bigint | Number of bytes written by the filesystem layer planning the statement (if pg_stat_kcache.track_planning is enabled, otherwise zero) |
+------------------+------------------+------------------------------------------------------------------------------------------------------------------------------------------+
| plan_writes_blks | bigint | Number of 8K blocks written by the filesystem layer planning the statement (if pg_stat_kcache.track_planning is enabled, otherwise zero) |
+------------------+------------------+------------------------------------------------------------------------------------------------------------------------------------------+
| plan_msgsnds | bigint | Number of IPC messages sent planning the statement (if pg_stat_kcache.track_planning is enabled, otherwise zero) |
+------------------+------------------+------------------------------------------------------------------------------------------------------------------------------------------+
| plan_msgrcvs | bigint | Number of IPC messages received planning the statement (if pg_stat_kcache.track_planning is enabled, otherwise zero) |
+------------------+------------------+------------------------------------------------------------------------------------------------------------------------------------------+
| plan_nsignals | bigint | Number of signals received planning the statement (if pg_stat_kcache.track_planning is enabled, otherwise zero) |
+------------------+------------------+------------------------------------------------------------------------------------------------------------------------------------------+
| plan_nvcsws | bigint | Number of voluntary context switches planning the statement (if pg_stat_kcache.track_planning is enabled, otherwise zero) |
+------------------+------------------+------------------------------------------------------------------------------------------------------------------------------------------+
| plan_nivcsws | bigint | Number of involuntary context switches planning the statement (if pg_stat_kcache.track_planning is enabled, otherwise zero) |
+------------------+------------------+------------------------------------------------------------------------------------------------------------------------------------------+
| exec_user_time | double precision | User CPU time used executing the statement, in seconds and milliseconds |
+------------------+------------------+------------------------------------------------------------------------------------------------------------------------------------------+
| exec_system_time | double precision | System CPU time used executing the statement, in seconds and milliseconds |
+------------------+------------------+------------------------------------------------------------------------------------------------------------------------------------------+
| exec_minflts | bigint | Number of page reclaims (soft page faults) executing the statements |
+------------------+------------------+------------------------------------------------------------------------------------------------------------------------------------------+
| exec_majflts | bigint | Number of page faults (hard page faults) executing the statements |
+------------------+------------------+------------------------------------------------------------------------------------------------------------------------------------------+
| exec_nswaps | bigint | Number of swaps executing the statements |
+------------------+------------------+------------------------------------------------------------------------------------------------------------------------------------------+
| exec_reads | bigint | Number of bytes read by the filesystem layer executing the statements |
+------------------+------------------+------------------------------------------------------------------------------------------------------------------------------------------+
| exec_reads_blks | bigint | Number of 8K blocks read by the filesystem layer executing the statements |
+------------------+------------------+------------------------------------------------------------------------------------------------------------------------------------------+
| exec_writes | bigint | Number of bytes written by the filesystem layer executing the statements |
+------------------+------------------+------------------------------------------------------------------------------------------------------------------------------------------+
| exec_writes_blks | bigint | Number of 8K blocks written by the filesystem layer executing the statements |
+------------------+------------------+------------------------------------------------------------------------------------------------------------------------------------------+
| exec_msgsnds | bigint | Number of IPC messages sent executing the statements |
+------------------+------------------+------------------------------------------------------------------------------------------------------------------------------------------+
| exec_msgrcvs | bigint | Number of IPC messages received executing the statements |
+------------------+------------------+------------------------------------------------------------------------------------------------------------------------------------------+
| exec_nsignals | bigint | Number of signals received executing the statements |
+------------------+------------------+------------------------------------------------------------------------------------------------------------------------------------------+
| exec_nvcsws | bigint | Number of voluntary context switches executing the statements |
+------------------+------------------+------------------------------------------------------------------------------------------------------------------------------------------+
| exec_nivcsws | bigint | Number of involuntary context switches executing the statements |
+------------------+------------------+------------------------------------------------------------------------------------------------------------------------------------------+

pg_stat_kcache_reset function

Resets the statistics gathered by pg_stat_kcache. Can be called by superusers::

pg_stat_kcache_reset()

pg_stat_kcache function

This function is a set-returning functions that dumps the containt of the counters
of the shared memory structure. This function is used by the pg_stat_kcache view.
The function can be called by any user::

SELECT * FROM pg_stat_kcache();

It provides the following columns:

+------------------+------------------+------------------------------------------------------------------------------------------------------------------------------------------+
| Name | Type | Description |
+==================+==================+==========================================================================================================================================+
| queryid | bigint | pg_stat_statements' query identifier |
+------------------+------------------+------------------------------------------------------------------------------------------------------------------------------------------+
| top | bool | True if the statement is top-level |
+------------------+------------------+------------------------------------------------------------------------------------------------------------------------------------------+
| userid | oid | Database OID |
+------------------+------------------+------------------------------------------------------------------------------------------------------------------------------------------+
| dbid | oid | Database OID |
+------------------+------------------+------------------------------------------------------------------------------------------------------------------------------------------+
| plan_user_time | double precision | User CPU time used planning the statement, in seconds and milliseconds (if pg_stat_kcache.track_planning is enabled, otherwise zero) |
+------------------+------------------+------------------------------------------------------------------------------------------------------------------------------------------+
| plan_system_time | double precision | System CPU time used planning the statement, in seconds and milliseconds (if pg_stat_kcache.track_planning is enabled, otherwise zero) |
+------------------+------------------+------------------------------------------------------------------------------------------------------------------------------------------+
| plan_minflts | bigint | Number of page reclaims (soft page faults) planning the statement (if pg_stat_kcache.track_planning is enabled, otherwise zero) |
+------------------+------------------+------------------------------------------------------------------------------------------------------------------------------------------+
| plan_majflts | bigint | Number of page faults (hard page faults) planning the statement (if pg_stat_kcache.track_planning is enabled, otherwise zero) |
+------------------+------------------+------------------------------------------------------------------------------------------------------------------------------------------+
| plan_nswaps | bigint | Number of swaps planning the statement (if pg_stat_kcache.track_planning is enabled, otherwise zero) |
+------------------+------------------+------------------------------------------------------------------------------------------------------------------------------------------+
| plan_reads | bigint | Number of bytes read by the filesystem layer planning the statement (if pg_stat_kcache.track_planning is enabled, otherwise zero) |
+------------------+------------------+------------------------------------------------------------------------------------------------------------------------------------------+
| plan_reads_blks | bigint | Number of 8K blocks read by the filesystem layer planning the statement (if pg_stat_kcache.track_planning is enabled, otherwise zero) |
+------------------+------------------+------------------------------------------------------------------------------------------------------------------------------------------+
| plan_writes | bigint | Number of bytes written by the filesystem layer planning the statement (if pg_stat_kcache.track_planning is enabled, otherwise zero) |
+------------------+------------------+------------------------------------------------------------------------------------------------------------------------------------------+
| plan_writes_blks | bigint | Number of 8K blocks written by the filesystem layer planning the statement (if pg_stat_kcache.track_planning is enabled, otherwise zero) |
+------------------+------------------+------------------------------------------------------------------------------------------------------------------------------------------+
| plan_msgsnds | bigint | Number of IPC messages sent planning the statement (if pg_stat_kcache.track_planning is enabled, otherwise zero) |
+------------------+------------------+------------------------------------------------------------------------------------------------------------------------------------------+
| plan_msgrcvs | bigint | Number of IPC messages received planning the statement (if pg_stat_kcache.track_planning is enabled, otherwise zero) |
+------------------+------------------+------------------------------------------------------------------------------------------------------------------------------------------+
| plan_nsignals | bigint | Number of signals received planning the statement (if pg_stat_kcache.track_planning is enabled, otherwise zero) |
+------------------+------------------+------------------------------------------------------------------------------------------------------------------------------------------+
| plan_nvcsws | bigint | Number of voluntary context switches planning the statement (if pg_stat_kcache.track_planning is enabled, otherwise zero) |
+------------------+------------------+------------------------------------------------------------------------------------------------------------------------------------------+
| plan_nivcsws | bigint | Number of involuntary context switches planning the statement (if pg_stat_kcache.track_planning is enabled, otherwise zero) |
+------------------+------------------+------------------------------------------------------------------------------------------------------------------------------------------+
| exec_user_time | double precision | User CPU time used executing the statement, in seconds and milliseconds |
+------------------+------------------+------------------------------------------------------------------------------------------------------------------------------------------+
| exec_system_time | double precision | System CPU time used executing the statement, in seconds and milliseconds |
+------------------+------------------+------------------------------------------------------------------------------------------------------------------------------------------+
| exec_minflts | bigint | Number of page reclaims (soft page faults) executing the statements |
+------------------+------------------+------------------------------------------------------------------------------------------------------------------------------------------+
| exec_majflts | bigint | Number of page faults (hard page faults) executing the statements |
+------------------+------------------+------------------------------------------------------------------------------------------------------------------------------------------+
| exec_nswaps | bigint | Number of swaps executing the statements |
+------------------+------------------+------------------------------------------------------------------------------------------------------------------------------------------+
| exec_reads | bigint | Number of bytes read by the filesystem layer executing the statements |
+------------------+------------------+------------------------------------------------------------------------------------------------------------------------------------------+
| exec_reads_blks | bigint | Number of 8K blocks read by the filesystem layer executing the statements |
+------------------+------------------+------------------------------------------------------------------------------------------------------------------------------------------+
| exec_writes | bigint | Number of bytes written by the filesystem layer executing the statements |
+------------------+------------------+------------------------------------------------------------------------------------------------------------------------------------------+
| exec_writes_blks | bigint | Number of 8K blocks written by the filesystem layer executing the statements |
+------------------+------------------+------------------------------------------------------------------------------------------------------------------------------------------+
| exec_msgsnds | bigint | Number of IPC messages sent executing the statements |
+------------------+------------------+------------------------------------------------------------------------------------------------------------------------------------------+
| exec_msgrcvs | bigint | Number of IPC messages received executing the statements |
+------------------+------------------+------------------------------------------------------------------------------------------------------------------------------------------+
| exec_nsignals | bigint | Number of signals received executing the statements |
+------------------+------------------+------------------------------------------------------------------------------------------------------------------------------------------+
| exec_nvcsws | bigint | Number of voluntary context switches executing the statements |
+------------------+------------------+------------------------------------------------------------------------------------------------------------------------------------------+
| exec_nivcsws | bigint | Number of involuntary context switches executing the statements |
+------------------+------------------+------------------------------------------------------------------------------------------------------------------------------------------+

Bugs and limitations

No known bugs.

Tracking planner resources usage requires PostgreSQL 13 or above.

We assume that a kernel block is 512 bytes. This is true for Linux, but may not
be the case for another Unix implementation.

See: http://lkml.indiana.edu/hypermail/linux/kernel/0703.2/0937.html

On platforms without a native getrusage(2), all fields except user_time and
system_time will be NULL.

On platforms with a native getrusage(2), some of the fields may not be
maintained. This is a platform dependent behavior, please refer to your
platform getrusage(2) manual page for more details.

If pg_stat_kcache.track is all, pg_stat_kcache tracks nested statements.
The max number of nesting level that will be tracked is is limited to 64, in
order to keep implementation simple, but this should be enough for reasonable
use cases.

Even if pg_stat_kcache.track is all, pg_stat_kcache view considers only
statistics of top-level statements. So, there is the case which even though
user cpu time used planning a nested statement is high, plan_user_time of
pg_stat_kcache view is small. In such a case, user cpu time used planning a
nested statement is counted in exec_user_time.

Authors

pg_stat_kcache is an original development from Thomas Reiss, with large
portions of code inspired from pg_stat_plans. Julien Rouhaud also contributed
some parts of the extension.

Thanks goes to Peter Geoghegan for providing much inspiration with
pg_stat_plans so we could write this extension quite straightforward.

License

pg_stat_kcache is free software distributed under the PostgreSQL license.

Copyright (c) 2014-2017, Dalibo
Copyright (c) 2018-2020, The PoWA-team

PostgreSQL 许愿链接

您的愿望将传达给PG kernel hacker、数据库厂商等, 帮助提高数据库产品质量和功能, 说不定下一个PG版本就有您提出的功能点. 针对非常好的提议,奖励限量版PG文化衫、纪念品、贴纸、PG热门书籍等,奖品丰富,快来许愿。开不开森.

9.9元购买3个月阿里云RDS PostgreSQL实例

PostgreSQL 解决方案集合

德哥 / digoal's github - 公益是一辈子的事.

digoal's wechat

文章转载自digoal,如果涉嫌侵权,请发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论