PostgreSQL pglog 插件, 分析PG csvlog产生各种报告

原创 digoal 2022-01-20

173

作者

digoal

日期

2021-09-18

Parameter	Setting	Comment
log_destination	csvlog	Mandatory
logging_collector	on	Mandatory for csv loggin
log_min_duration_statement	250 ms	For a transactional workload, for Business Intelligence worload, 1s or 5s can be fine
log_autovacuum_min_duration	0	To see all autovacuum operation occuring
log_checkpoints	on	To see when checkpoints occur
log_connections	on	To see the rythm of connections
log_disconnections	on	To see how long sessions are
log_lock_waits	on	To see when locks occur
log_temp_files	0	To see if PostgreSQL is swapping

使用, 创建需要分析的csvlog文件或目录

create extension pglog cascade;  
call pglog.create_logtable_from_file (file, directory, version[, schema][, materialized]);  
call pglog.create_logtable_from_dir (directory, version[, schema][, materialized]);  
复制

an error report
a detailed error analysis for each level of error
an autovacuum report
a checkpoints report
an hourly checkpoints report
a tempfile usage report
a tempfile queries report
a top n slow queries report

Based on the pglog partitionned table, you can generate dashboards.

Error report

This view will query the partitionned table pglog for log levels analysis.
Example:

pglog=# select * from error_report;  
 error_severity |   nb    | percentage   
----------------+---------+------------  
 FATAL          |       8 |       0.00  
 ERROR          |     106 |       0.00  
 WARNING        |       3 |       0.00  
 LOG            | 6055776 |     100.00  
(4 rows)  
复制

Detail error analysis

The views error_analyze_(warning|error|fatal|panic) will display a more detail
analysis of the given log level. For lower severity level, I recommend using
adding limit n to your query.
Example:

select * from error_analyze_fatal;  
                       message                       | count |            min             |            max               
-----------------------------------------------------+-------+----------------------------+----------------------------  
 terminating connection due to administrator command |     7 | 2021-08-25 13:01:32.992+00 | 2021-08-27 09:12:44.884+00  
 connection to client lost                           |     2 | 2021-08-25 12:20:24.044+00 | 2021-08-25 12:20:24.044+00  
(2 rows)  
复制

Autovacuum report

This view will display the number of autovacuum and autoanalyze per table. You
might want to add a limit n part to your query so you don't display all your
tables in the resault.
Example:

pglog=# select * from autovacuum_report limit 5;  
 schema_name |  table_name   | vacuum_count | analyze_count   
-------------+---------------+--------------+---------------  
 pg_catalog  | pg_statistic  |           24 |             0  
 pg_catalog  | pg_attribute  |           15 |            14  
 pglog       | postgresqlwed |            8 |             7  
 pg_catalog  | pg_class      |            7 |             7  
 pg_catalog  | pg_type       |            6 |             6  
(5 rows)  
复制

Checkpoints

This view will give you statistics about your checkpoints as how often they're
launched and how much time it takes to complete.
Example:

pglog=# select * from checkpoints_stats;  
 avg_checkpoint_freq | median_checkpoint_freq | avg_checkpoint_duration | median_checkpoint_duration   
---------------------+------------------------+-------------------------+----------------------------  
 06:01:55.491981     | 00:15:00.809           | 00:00:32.447288         | 00:00:06.6675  
(1 row)  
复制

Checkpoints by hour

It's rare that your database has a flat workload. It usually works by peaks.
This view will give you chackpoints statistics by hours. Once the peak hours
are identified, I strongly suggest to craft a query that will get the same stats
per minute during the peak times.
Example:

pglog=# select * from checkpoints_stats_by_hour;  
 date_part | avg_checkpoint_freq | median_checkpoint_freq | avg_checkpoint_duration | median_checkpoint_duration   
-----------+---------------------+------------------------+-------------------------+----------------------------  
         6 | 03:31:43.989333     | 00:04:59.941           | 00:00:06.375            | 00:00:04.431  
         7 | 22:36:05.281917     | 00:15:00.809           | 00:00:17.00225          | 00:00:03.916  
         8 | 00:21:22.5615       | 00:15:01.661           | 00:00:17.9445           | 00:00:09.3765  
         9 | 00:39:21.674167     | 00:30:17.2415          | 00:00:33.430833         | 00:00:11.2735  
        10 | 00:16:09.9175       | 00:16:09.9175          | 00:00:06.032            | 00:00:06.032  
        11 | 00:37:25.941667     | 00:12:22.328           | 00:01:40.044667         | 00:02:29.37  
        12 | 01:11:27.7846       | 00:40:00.957           | 00:01:05.7842           | 00:00:16.016  
        13 | 00:15:19.55725      | 00:07:56.6295          | 00:01:18.24225          | 00:01:21.253  
        15 | 04:13:58.037        | 04:13:58.037           | 00:02:30.109            | 00:02:30.109  
        16 | 02:58:23.038333     | 00:25:00.427           | 00:00:05.165667         | 00:00:05.012  
        17 | 00:15:00.34025      | 00:07:30.346           | 00:00:04.27425          | 00:00:03.195  
        19 | 01:02:31.026        | 01:02:31.026           | 00:00:06.4115           | 00:00:06.4115  
        20 | 00:09:59.441        | 00:09:59.441           | 00:00:06.241            | 00:00:06.241  
(13 rows)  
复制

Tempfile report

Tempfiles are crucial when it comes to performance because when tempfiles are
used, it means Postgres is swapping on disc. These statistics might help you
adjusting work_mem to a better value for your system.
Example:

pglog=# select * from tempfile_report ;  
   min   |  max   | global_average |   10%   |   20%   |   30%   |   40% |  50%  |  60%  |  70%  |  80%   |  90%   |  100%    
---------+--------+----------------+---------+---------+---------+---------+-------+-------+-------+--------+--------+--------  
 8192 kB | 374 GB | 57 GB          | 5720 MB | 5720 MB | 5720 MB | 6000 MB | 32 GB | 33 GB | 36 GB | 142 GB | 144 GB | 374 GB  
(1 row)  
复制

Tempfile queries

It happens that most tempfiles are used by one or two specific queries. In that
case, you can set work_mem to a highest value right before those queries on
the session level.
Example:

pglog=# select * from tempfile_queries limit 2 \gx  
-[ RECORD 1 ]--------------+----------------------------------------------------------------------  
frequency                  | 33  
query_tempfilesize_median  | 142 GB  
query_tempfilesize_average | 142 GB  
total_size                 | 4689 GB  
query                      | create materialized view pglog.top_n_queries as (                    +  
                           |       with queries as (                                              +  
                           |       select                                                         +  
                           |         split_part(message, ?)                                       +  
                           |     )                                                                +  
                           |     select query,                                                    +  
                           |       count(1),                                                      +  
                           |       avg(duration) as average,                                      +  
                           |       percentile_disc(0.5) within group (order by duration) as median+  
                           |     from queries                                                     +  
                           |     group by query                                                   +  
                           |     order by average desc                                            +  
                           |     )  
-[ RECORD 2 ]--------------+----------------------------------------------------------------------  
frequency                  | 154  
query_tempfilesize_median  | 6000 MB  
query_tempfilesize_average | 30 GB  
total_size                 | 4655 GB  
query                      | call create_mylogs (?,true);  
复制

Top n slow queries

This view will give the top n slowest query in the partitionned table
pglog. Don't forget to add a limit n when you'll query the view unless you
want to see all your slow queries.
Example:

pglog=# select *  
from top_n_slowest_queries  
limit 5;  
-[ RECORD 1 ]----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------  
query   | with queries as ( select split_part(message, ? ) ) select query, count ? , avg(duration) as average, percentile_disc ? within group (order by duration) as median from queries group by query order by average desc limit ?   
count   | 1  
average | 26384.688000000000  
median  | 26384.688  
-[ RECORD 2 ]----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------  
query   | with total as ( select count ? as total from pglog.pglog ) select error_severity, count ? as nb, round(count ? /total::numeric ? 2) as percentage from pglog.pglog inner join (values ? ? )) as severity(level,name) on pglog.pglog.error_severity = severity.name, total group by error_severity, severity.level, total order by severity.level;  
count   | 1  
average | 24090.557000000000  
median  | 24090.557  
-[ RECORD 3 ]----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------  
query   | with total as ( select count ? as total from pglog.pglog ) select error_severity, count ? as nb, round(count ? /total::numeric ? 2) as percentage from pglog.pglog, total group by error_severity, total order by percentage desc;  
count   | 2  
average | 23530.949500000000  
median  | 23210.202  
-[ RECORD 4 ]----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------  
query   | UPDATE pgbench_branches SET bbalance = bbalance + ? WHERE bid = ?   
count   | 173009  
average | 2.5784063256824789  
median  | 1.666  
-[ RECORD 5 ]---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------  
query   | UPDATE pgbench_tellers SET tbalance = tbalance + ? WHERE tid = ?   
count   | 173009  
average | 1.9884613054812177  
median  | 0.593  
复制

「喜欢这篇文章，您的关注和赞赏是给作者最好的鼓励」

关注作者

PostgreSQL pglog 插件, 分析PG csvlog产生各种报告

作者

日期

标签

Error report

Detail error analysis

Autovacuum report

Checkpoints

Checkpoints by hour

Tempfile report

Tempfile queries

Top n slow queries

PostgreSQL 许愿链接

9.9元购买3个月阿里云RDS PostgreSQL实例

PostgreSQL 解决方案集合

德哥 / digoal's github - 公益是一辈子的事.

评论

相关阅读