作者
digoal
日期
2021-09-18
标签
PostgreSQL , pglog
解析csvlog, 获得数据库报告
https://mydbanotebook.org/post/pglog/
https://gitlab.com/l_avrot/pglog
建议的参数配置如下:
Parameter | Setting | Comment |
---|---|---|
log_destination | csvlog | Mandatory |
logging_collector | on | Mandatory for csv loggin |
log_min_duration_statement | 250 ms | For a transactional workload, for Business Intelligence worload, 1s or 5s can be fine |
log_autovacuum_min_duration | 0 | To see all autovacuum operation occuring |
log_checkpoints | on | To see when checkpoints occur |
log_connections | on | To see the rythm of connections |
log_disconnections | on | To see how long sessions are |
log_lock_waits | on | To see when locks occur |
log_temp_files | 0 | To see if PostgreSQL is swapping |
使用, 创建需要分析的csvlog文件或目录
create extension pglog cascade; call pglog.create_logtable_from_file (file, directory, version[, schema][, materialized]); call pglog.create_logtable_from_dir (directory, version[, schema][, materialized]);
复制
- an error report
- a detailed error analysis for each level of error
- an autovacuum report
- a checkpoints report
- an hourly checkpoints report
- a tempfile usage report
- a tempfile queries report
- a top n slow queries report
Based on the pglog partitionned table, you can generate dashboards.
Error report
This view will query the partitionned table pglog for log levels analysis.
Example:
pglog=# select * from error_report; error_severity | nb | percentage ----------------+---------+------------ FATAL | 8 | 0.00 ERROR | 106 | 0.00 WARNING | 3 | 0.00 LOG | 6055776 | 100.00 (4 rows)
复制
Detail error analysis
The views error_analyze_(warning|error|fatal|panic) will display a more detail
analysis of the given log level. For lower severity level, I recommend using
adding limit n to your query.
Example:
select * from error_analyze_fatal; message | count | min | max -----------------------------------------------------+-------+----------------------------+---------------------------- terminating connection due to administrator command | 7 | 2021-08-25 13:01:32.992+00 | 2021-08-27 09:12:44.884+00 connection to client lost | 2 | 2021-08-25 12:20:24.044+00 | 2021-08-25 12:20:24.044+00 (2 rows)
复制
Autovacuum report
This view will display the number of autovacuum and autoanalyze per table. You
might want to add a limit n part to your query so you don't display all your
tables in the resault.
Example:
pglog=# select * from autovacuum_report limit 5; schema_name | table_name | vacuum_count | analyze_count -------------+---------------+--------------+--------------- pg_catalog | pg_statistic | 24 | 0 pg_catalog | pg_attribute | 15 | 14 pglog | postgresqlwed | 8 | 7 pg_catalog | pg_class | 7 | 7 pg_catalog | pg_type | 6 | 6 (5 rows)
复制
Checkpoints
This view will give you statistics about your checkpoints as how often they're
launched and how much time it takes to complete.
Example:
pglog=# select * from checkpoints_stats; avg_checkpoint_freq | median_checkpoint_freq | avg_checkpoint_duration | median_checkpoint_duration ---------------------+------------------------+-------------------------+---------------------------- 06:01:55.491981 | 00:15:00.809 | 00:00:32.447288 | 00:00:06.6675 (1 row)
复制
Checkpoints by hour
It's rare that your database has a flat workload. It usually works by peaks.
This view will give you chackpoints statistics by hours. Once the peak hours
are identified, I strongly suggest to craft a query that will get the same stats
per minute during the peak times.
Example:
pglog=# select * from checkpoints_stats_by_hour; date_part | avg_checkpoint_freq | median_checkpoint_freq | avg_checkpoint_duration | median_checkpoint_duration -----------+---------------------+------------------------+-------------------------+---------------------------- 6 | 03:31:43.989333 | 00:04:59.941 | 00:00:06.375 | 00:00:04.431 7 | 22:36:05.281917 | 00:15:00.809 | 00:00:17.00225 | 00:00:03.916 8 | 00:21:22.5615 | 00:15:01.661 | 00:00:17.9445 | 00:00:09.3765 9 | 00:39:21.674167 | 00:30:17.2415 | 00:00:33.430833 | 00:00:11.2735 10 | 00:16:09.9175 | 00:16:09.9175 | 00:00:06.032 | 00:00:06.032 11 | 00:37:25.941667 | 00:12:22.328 | 00:01:40.044667 | 00:02:29.37 12 | 01:11:27.7846 | 00:40:00.957 | 00:01:05.7842 | 00:00:16.016 13 | 00:15:19.55725 | 00:07:56.6295 | 00:01:18.24225 | 00:01:21.253 15 | 04:13:58.037 | 04:13:58.037 | 00:02:30.109 | 00:02:30.109 16 | 02:58:23.038333 | 00:25:00.427 | 00:00:05.165667 | 00:00:05.012 17 | 00:15:00.34025 | 00:07:30.346 | 00:00:04.27425 | 00:00:03.195 19 | 01:02:31.026 | 01:02:31.026 | 00:00:06.4115 | 00:00:06.4115 20 | 00:09:59.441 | 00:09:59.441 | 00:00:06.241 | 00:00:06.241 (13 rows)
复制
Tempfile report
Tempfiles are crucial when it comes to performance because when tempfiles are
used, it means Postgres is swapping on disc. These statistics might help you
adjusting work_mem to a better value for your system.
Example:
pglog=# select * from tempfile_report ; min | max | global_average | 10% | 20% | 30% | 40% | 50% | 60% | 70% | 80% | 90% | 100% ---------+--------+----------------+---------+---------+---------+---------+-------+-------+-------+--------+--------+-------- 8192 kB | 374 GB | 57 GB | 5720 MB | 5720 MB | 5720 MB | 6000 MB | 32 GB | 33 GB | 36 GB | 142 GB | 144 GB | 374 GB (1 row)
复制
Tempfile queries
It happens that most tempfiles are used by one or two specific queries. In that
case, you can set work_mem to a highest value right before those queries on
the session level.
Example:
pglog=# select * from tempfile_queries limit 2 \gx -[ RECORD 1 ]--------------+---------------------------------------------------------------------- frequency | 33 query_tempfilesize_median | 142 GB query_tempfilesize_average | 142 GB total_size | 4689 GB query | create materialized view pglog.top_n_queries as ( + | with queries as ( + | select + | split_part(message, ?) + | ) + | select query, + | count(1), + | avg(duration) as average, + | percentile_disc(0.5) within group (order by duration) as median+ | from queries + | group by query + | order by average desc + | ) -[ RECORD 2 ]--------------+---------------------------------------------------------------------- frequency | 154 query_tempfilesize_median | 6000 MB query_tempfilesize_average | 30 GB total_size | 4655 GB query | call create_mylogs (?,true);
复制
Top n slow queries
This view will give the top n slowest query in the partitionned table
pglog. Don't forget to add a limit n when you'll query the view unless you
want to see all your slow queries.
Example:
pglog=# select * from top_n_slowest_queries limit 5; -[ RECORD 1 ]---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- query | with queries as ( select split_part(message, ? ) ) select query, count ? , avg(duration) as average, percentile_disc ? within group (order by duration) as median from queries group by query order by average desc limit ? count | 1 average | 26384.688000000000 median | 26384.688 -[ RECORD 2 ]---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- query | with total as ( select count ? as total from pglog.pglog ) select error_severity, count ? as nb, round(count ? /total::numeric ? 2) as percentage from pglog.pglog inner join (values ? ? )) as severity(level,name) on pglog.pglog.error_severity = severity.name, total group by error_severity, severity.level, total order by severity.level; count | 1 average | 24090.557000000000 median | 24090.557 -[ RECORD 3 ]---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- query | with total as ( select count ? as total from pglog.pglog ) select error_severity, count ? as nb, round(count ? /total::numeric ? 2) as percentage from pglog.pglog, total group by error_severity, total order by percentage desc; count | 2 average | 23530.949500000000 median | 23210.202 -[ RECORD 4 ]---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- query | UPDATE pgbench_branches SET bbalance = bbalance + ? WHERE bid = ? count | 173009 average | 2.5784063256824789 median | 1.666 -[ RECORD 5 ]--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- query | UPDATE pgbench_tellers SET tbalance = tbalance + ? WHERE tid = ? count | 173009 average | 1.9884613054812177 median | 0.593
复制
PostgreSQL 许愿链接
您的愿望将传达给PG kernel hacker、数据库厂商等, 帮助提高数据库产品质量和功能, 说不定下一个PG版本就有您提出的功能点. 针对非常好的提议,奖励限量版PG文化衫、纪念品、贴纸、PG热门书籍等,奖品丰富,快来许愿。开不开森.