Building a PostgreSQL Monitoring System with Grafana + Prometheus

Original article by 贺晓群 (He Xiaoqun), 2021-08-06

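There are many ways to monitor PostgreSQL. This article walks through one visually appealing stack: Grafana + Prometheus + Alertmanager + pgSCV. It does not introduce the individual components, which are well documented elsewhere; the focus is on how to install them and wire them together.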

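The diagram below shows a system monitoring architecture with Prometheus at its core (image from the official Prometheus website).
[Image: architecture diagram]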

Server list:

Node | IP | OS | Installed software | Notes
pg_node1 | 192.168.210.15 | CentOS 7.6 | prometheus 2.28.1, grafana 8.1.0, pgscv 0.7.1, node_exporter 1.2.1, alertmanager 0.22.2 | Hosts the monitoring services; also a monitored node
pg_node2 | 192.168.210.81 | CentOS 7.6 | pgscv 0.7.1, node_exporter 1.2.1 | Monitored node
pg_node3 | 192.168.210.33 | CentOS 7.6 | pgscv 0.7.1, node_exporter 1.2.1 | Monitored node

Installation directories

Software | Host(s) | Install directory
grafana | 192.168.210.15 | /data/monitor/grafana
prometheus | 192.168.210.15 | /data/monitor/prometheus
alertmanager | 192.168.210.15 | /data/monitor/prometheus/alertmanager
node_exporter | 192.168.210.15, 192.168.210.81, 192.168.210.33 | /data/monitor/prometheus/plugin/node_exporter
pgscv | 192.168.210.15, 192.168.210.81, 192.168.210.33 | /data/monitor/prometheus/plugin/postgres_exporter

Host configuration

Disable SELinux (all nodes)

sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config
setenforce 0
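
Before editing the real file, it can help to try the substitution on a scratch copy. The sketch below uses a form of the sed expression anchored at the start of the line, so comment lines that merely mention "SELINUX=" stay untouched. The helper name disable_selinux_in and the sample file contents are assumptions for illustration (the contents are modelled on a typical CentOS 7 /etc/selinux/config), and `sed -i` is used in its GNU form.

```shell
#!/usr/bin/env bash
# Hypothetical helper: apply the SELinux edit to the file given as $1.
disable_selinux_in() {
    # ^ anchors the match, so "# SELINUX= ..." comments and SELINUXTYPE= are left alone.
    sed -i 's/^SELINUX=.*/SELINUX=disabled/' "$1"
}

# Scratch copy with assumed sample content.
sample="$(mktemp)"
cat > "$sample" <<'EOF'
# SELINUX= can take one of these three values:
SELINUX=enforcing
SELINUXTYPE=targeted
EOF

disable_selinux_in "$sample"
cat "$sample"
```

Only the SELINUX= setting line changes; the file-based setting takes effect at the next boot, which is why the article also runs setenforce 0 for the current session.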

Configure the firewall

Open the ports required by the software deployed on each node.

  • grafana: 3000
  • prometheus: 9090
  • node_exporter: 9100
  • pgscv: 9890
  • alertmanager: 9093 (web UI and API), 465 (SMTPS; Alertmanager only connects out to the mail relay on this port)

firewall-cmd --add-port=9090/tcp --permanent
firewall-cmd --add-port=3000/tcp --permanent
firewall-cmd --add-port=9100/tcp --permanent
firewall-cmd --add-port=9093/tcp --permanent
firewall-cmd --add-port=465/tcp --permanent
firewall-cmd --add-port=9890/tcp --permanent
firewall-cmd --reload
firewall-cmd --list-all
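
Since each node only needs the ports of the software it actually runs, the rules can be generated per node rather than copied wholesale. The following is a minimal dry-run sketch: the function name open_ports_cmds is hypothetical, and it only prints the firewall-cmd lines so they can be reviewed first.

```shell
#!/usr/bin/env bash
# Print the firewall-cmd invocations for the TCP ports given as arguments (dry run).
open_ports_cmds() {
    local port
    for port in "$@"; do
        echo "firewall-cmd --add-port=${port}/tcp --permanent"
    done
    # Permanent rules only become active after a reload.
    echo "firewall-cmd --reload"
}

# A monitored-only node (pg_node2, pg_node3) runs node_exporter and pgSCV.
open_ports_cmds 9100 9890
```

Once the printed commands look right, the output can be piped to `sh` as root to apply them.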

Installing Grafana

mkdir -p /data/monitor/
cd /data/monitor/
wget https://dl.grafana.com/oss/release/grafana-8.1.0.linux-amd64.tar.gz
tar xf grafana-8.1.0.linux-amd64.tar.gz
mv grafana-8.1.0 grafana
rm -f grafana-8.1.0.linux-amd64.tar.gz
mkdir -p /data/monitor/grafana/data/log

# Edit the following settings in the [paths] section
vi grafana/conf/defaults.ini
data = /data/monitor/grafana/data
logs = /data/monitor/grafana/data/log
plugins = /var/lib/grafana/plugins
provisioning = /data/monitor/grafana/conf/provisioning

# Create the systemd unit file
cat > /usr/lib/systemd/system/grafana-server.service <<EOF
[Unit]
Description=Grafana instance
Documentation=http://docs.grafana.org
Wants=network-online.target
After=network-online.target
#After=postgresql.service

[Service]
User=root
Group=root
Type=simple
Restart=on-failure
WorkingDirectory=/data/monitor/grafana
RuntimeDirectory=grafana
RuntimeDirectoryMode=0750
ExecStart=/data/monitor/grafana/bin/grafana-server --config=/data/monitor/grafana/conf/defaults.ini
LimitNOFILE=10000
TimeoutStopSec=20

[Install]
WantedBy=multi-user.target
EOF

# Start the service
systemctl daemon-reload
systemctl enable grafana-server
systemctl start grafana-server

# Install plugins (used by the dashboards), then restart Grafana so it loads them
./grafana/bin/grafana-cli plugins install digiapulssi-breadcrumb-panel
./grafana/bin/grafana-cli plugins install grafana-polystat-panel
./grafana/bin/grafana-cli plugins install yesoreyeram-boomtable-panel
systemctl restart grafana-server

# Log in to the web UI (default credentials admin/admin; you are prompted to change the password at first login)
http://192.168.210.15:3000

[Image: Grafana login page]

Installing Prometheus

cd /data/monitor
wget https://github.com/prometheus/prometheus/releases/download/v2.28.1/prometheus-2.28.1.linux-amd64.tar.gz
tar xf prometheus-2.28.1.linux-amd64.tar.gz
mv prometheus-2.28.1.linux-amd64 prometheus
rm -f prometheus-2.28.1.linux-amd64.tar.gz

# Edit the configuration
cd prometheus
vi prometheus.yml
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - 192.168.210.15:9093 # Alertmanager address

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "rules/*.yml" # alerting rules

# Scrape configurations: hosts (node_exporter), PostgreSQL (pgSCV), and Prometheus itself.
scrape_configs:
  - file_sd_configs:
      # Note: when targets are loaded from a file, that file must already exist in the
      # Prometheus working directory before Prometheus starts; otherwise errors may show up later.
      - files:
          - host.yml # created below
    job_name: Host
    metrics_path: /metrics
    relabel_configs:
      - source_labels: [__address__]
        regex: (.*)
        target_label: instance
        replacement: $1
      - source_labels: [__address__]
        regex: (.*)
        target_label: __address__
        replacement: $1:9100
  - file_sd_configs:
      # Same note as above: pgscv.yml must exist before Prometheus starts.
      - files:
          - pgscv.yml # created below
    job_name: pgscv
    metrics_path: /metrics
    relabel_configs:
      - source_labels: [__address__]
        regex: (.*)
        target_label: instance
        replacement: $1
      - source_labels: [__address__]
        regex: (.*)
        target_label: __address__
        replacement: $1:9890
  - job_name: prometheus
    static_configs:
      - targets:
          - localhost:9090

# Host monitoring targets
vi host.yml
- labels:
    node_name: pg_node1
    service_name: pg_node1
  targets:
    - 192.168.210.15
- labels:
    node_name: pg_node2
    service_name: pg_node2
  targets:
    - 192.168.210.81
- labels:
    node_name: pg_node3
    service_name: pg_node3
  targets:
    - 192.168.210.33

# PostgreSQL monitoring targets
vi pgscv.yml
- labels:
    node_name: pg1
    service_name: pg1
  targets:
    - 192.168.210.15
- labels:
    node_name: pg2
    service_name: pg2
  targets:
    - 192.168.210.81
- labels:
    node_name: pg3
    service_name: pg3
  targets:
    - 192.168.210.33

# Create the systemd unit file
cat > /usr/lib/systemd/system/prometheus.service <<EOF
[Unit]
Description=Prometheus instance
Wants=network-online.target
After=network-online.target
#After=postgresql.service

[Service]
User=root
Group=root
Type=simple
Restart=on-failure
WorkingDirectory=/data/monitor/prometheus
RuntimeDirectory=prometheus
RuntimeDirectoryMode=0750
ExecStart=/data/monitor/prometheus/prometheus --storage.tsdb.retention.time=30d --web.enable-lifecycle --web.enable-admin-api --config.file=/data/monitor/prometheus/prometheus.yml
LimitNOFILE=10000
TimeoutStopSec=20

[Install]
WantedBy=multi-user.target
EOF

# Do not start Prometheus yet; start it once the collectors and Alertmanager are deployed
systemctl daemon-reload
systemctl enable prometheus
#systemctl start prometheus
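
The target files list bare IP addresses, and the relabel rules then append the exporter port. The sketch below imitates that in plain shell to make the effect visible: file_sd_entry prints one target entry in the same shape as host.yml, and relabel_address applies the same capture-and-append idea with sed. Both function names are hypothetical, and sed only stands in for Prometheus's own relabeling engine here.

```shell
#!/usr/bin/env bash
# Print one file_sd entry in the same shape as host.yml / pgscv.yml.
file_sd_entry() {
    local name="$1" ip="$2"
    cat <<EOF
- labels:
    node_name: ${name}
    service_name: ${name}
  targets:
    - ${ip}
EOF
}

# Imitate the second relabel rule: regex (.*) captures the whole address and
# the replacement "$1:<port>" appends the exporter port to it.
relabel_address() {
    local address="$1" port="$2"
    printf '%s\n' "$address" | sed -E "s/^(.*)\$/\\1:${port}/"
}

file_sd_entry pg_node1 192.168.210.15
relabel_address 192.168.210.15 9100   # Host job (node_exporter)
relabel_address 192.168.210.15 9890   # pgscv job
```

The first relabel rule copies the unmodified address into the instance label, so dashboards show the plain IP while the scrape itself goes to IP:port.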

Installing Alertmanager

cd /data/monitor/prometheus
wget https://github.com/prometheus/alertmanager/releases/download/v0.22.2/alertmanager-0.22.2.linux-amd64.tar.gz
tar xf alertmanager-0.22.2.linux-amd64.tar.gz
mv alertmanager-0.22.2.linux-amd64 alertmanager
rm -f alertmanager-0.22.2.linux-amd64.tar.gz

# Configure Alertmanager
vi alertmanager/alertmanager.yml
global:
  resolve_timeout: 5m # resolve timeout, default 5m
  smtp_smarthost: 'smtp.163.com:465'
  smtp_from: 'test@163.com' # sender address
  smtp_auth_username: 'test@163.com' # user name of the sender account
  smtp_auth_password: 'HTBPT***********' # authorization code of the sender account
  smtp_require_tls: false
route:
  group_by: ['alertname']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 1h
  receiver: 'default'
receivers:
  - name: 'default'
    email_configs:
      - to: 'joan_he@189.cn'
        send_resolved: true

# Create an alerting rule (picked up by rule_files in prometheus.yml above); this is one sample rule
mkdir -p rules
vi rules/memory_over.yml
groups:
  - name: example
    rules:
      - alert: HostMemoryOverLimit
        expr: (1 - (node_memory_MemAvailable_bytes / (node_memory_MemTotal_bytes))) * 100 > 80
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "{{$labels.instance}}: host memory usage exceeds the alert limit"
          description: "{{$labels.instance}}: memory usage is above 80% (current value: {{ $value }})"

# Start Alertmanager
cd alertmanager
nohup ./alertmanager --config.file=alertmanager.yml &
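
To make the rule expression concrete, the arithmetic can be worked through by hand: usage is (1 - available / total) * 100, and the alert fires once that stays above 80 for one minute. The sketch below repeats the calculation in shell with awk; the function names and the sample byte counts are assumptions chosen for illustration, not values taken from a real host.

```shell
#!/usr/bin/env bash
# Memory usage in percent, same arithmetic as the rule: (1 - available / total) * 100.
mem_used_pct() {
    LC_ALL=C awk -v total="$1" -v avail="$2" \
        'BEGIN { printf "%.1f\n", (1 - avail / total) * 100 }'
}

# Succeed (exit status 0) when the usage is strictly above the limit.
over_threshold() {
    LC_ALL=C awk -v pct="$1" -v limit="$2" 'BEGIN { exit !(pct > limit) }'
}

# Assumed sample values in bytes: 16 GB total, 15% of it still available.
total=16000000000
avail=2400000000

usage="$(mem_used_pct "$total" "$avail")"
if over_threshold "$usage" 80; then
    echo "would fire: memory usage ${usage}%"
else
    echo "ok: memory usage ${usage}%"
fi
```

With these numbers the usage works out to 85.0%, which is above the 80% limit in the sample rule.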

# View in the web UI
http://192.168.210.15:9093
[Image: Alertmanager home page]

Installing the metrics collectors

# Install node_exporter, the host metrics collector (on every monitored node)
mkdir -p /data/monitor/prometheus/plugin
cd /data/monitor/prometheus/plugin
wget https://github.com/prometheus/node_exporter/releases/download/v1.2.1/node_exporter-1.2.1.linux-amd64.tar.gz
tar xf node_exporter-1.2.1.linux-amd64.tar.gz
mv node_exporter-1.2.1.linux-amd64 node_exporter

# Start node_exporter
nohup node_exporter/node_exporter &

# Install pgSCV, the PostgreSQL metrics collector
wget https://github.com/weaponry/pgscv/releases/download/0.7.1/pgscv_0.7.1_linux_amd64.tar.gz
mkdir -p postgres_exporter
tar xf pgscv_0.7.1_linux_amd64.tar.gz -C postgres_exporter/

# Set the environment variables. Alternatively, pass a configuration file at startup with --config-file=, which also lets you define custom metrics
cat >> /etc/profile <<EOF
export PGSCV_LISTEN_ADDRESS="0.0.0.0:9890"
export POSTGRES_DSN="postgresql://db_user:password@127.0.0.1:5432/postgres"
EOF
source /etc/profile

# Start pgSCV
nohup ./postgres_exporter/pgscv &

# View the metrics collected by node_exporter
http://192.168.210.15:9100/metrics
[Image: collected metrics 1]

# View the metrics collected by pgSCV
http://192.168.210.15:9890/metrics
[Image: collected metrics 2]
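
The /metrics pages can also be checked from the command line, which is handy on hosts without a browser. The sketch below counts the sample lines in a response in the Prometheus text format, skipping "# HELP" and "# TYPE" comments. The function name count_samples is hypothetical and the payload is a made-up stand-in for a real response; the curl call against a live exporter is shown only as a comment.

```shell
#!/usr/bin/env bash
# Count sample lines in Prometheus text exposition format read from stdin:
# every line that is neither a "#" comment nor blank.
count_samples() {
    grep -v -e '^#' -e '^[[:space:]]*$' | wc -l | tr -d ' '
}

# Made-up payload standing in for a real /metrics response.
sample_metrics='# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 0.25
# HELP node_memory_MemTotal_bytes Memory information field MemTotal_bytes.
# TYPE node_memory_MemTotal_bytes gauge
node_memory_MemTotal_bytes 1.6777216e+10'

printf '%s\n' "$sample_metrics" | count_samples

# Against a live exporter (not run here):
#   curl -s http://192.168.210.15:9100/metrics | count_samples
```

A count of zero from a live endpoint would suggest the exporter is reachable but not producing metrics, which is worth checking before starting Prometheus.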

Starting Prometheus

systemctl start prometheus

# View Prometheus in the web UI
http://192.168.210.15:9090
p1.png
111.png
p2.png

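Configuring Grafana

  • Configure the data source
    1.png
    2.png
    3.png
    Note: if your dashboards are imported from offline JSON files, the data source Name must match the datasource used in the JSON template.

  • Import dashboards online
    4.png
    Import the host monitoring dashboard: https://grafana.com/grafana/dashboards/8919
    5.png
    6.png
    Follow the same steps to import the PostgreSQL monitoring dashboard (https://grafana.com/grafana/dashboards/14540).
    Many more dashboards are available at https://github.com/percona/grafana-dashboards/releases; import the ones you need and install the corresponding collectors. You can also build your own panels. That is more demanding, so a practical approach is to copy an existing template and modify it.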
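
As an alternative to filling in the data source form by hand, Grafana can load data sources from YAML files in its provisioning directory (set to /data/monitor/grafana/conf/provisioning in defaults.ini earlier). The sketch below writes such a file; the function name, the file name prometheus.yaml, and the data source name "Prometheus" are assumptions, and the demo writes to a scratch directory rather than the real Grafana path.

```shell
#!/usr/bin/env bash
# Write a Grafana data source provisioning file into the directory given as $1,
# pointing at the Prometheus URL given as $2.
write_prometheus_datasource() {
    local dir="$1" url="$2"
    mkdir -p "$dir"
    cat > "$dir/prometheus.yaml" <<EOF
apiVersion: 1
datasources:
  - name: Prometheus   # assumed name; must match the datasource in imported JSON
    type: prometheus
    access: proxy
    url: ${url}
    isDefault: true
EOF
}

# Demo against a scratch directory. On the Grafana host the target would be
# /data/monitor/grafana/conf/provisioning/datasources, followed by a restart
# of grafana-server so the file is read.
demo_dir="$(mktemp -d)"
write_prometheus_datasource "$demo_dir" "http://192.168.210.15:9090"
cat "$demo_dir/prometheus.yaml"
```

Keeping the data source in a file makes the setup repeatable, and it fixes the Name in one place, which matters for dashboards imported from JSON.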

Host monitoring dashboard

10.png
11.png
12.png
13.png

PostgreSQL monitoring dashboard (it covers most of the metrics of interest)

19.png
15.png
16.png
17.png
18.png

Alert email (lower the threshold to trigger an alert)

20.png

# References
https://grafana.com/grafana/dashboards
https://github.com/weaponry/pgscv/wiki
https://github.com/prometheus/prometheus
https://www.cnblogs.com/ilifeilong/p/10543876.html

Last modified: 2021-08-06 17:53:20
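[Copyright notice] This article is original content by a 墨天轮 (modb.pro) user. Reposts must credit the source (墨天轮), the article link, and the author; otherwise the author and 墨天轮 reserve the right to pursue liability. If you find content on 墨天轮 that appears to be plagiarized or infringing, please report it with supporting evidence by email to contact@modb.pro; once verified, 墨天轮 will remove the content promptly.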
