
Monitoring a Vertica Database with Prometheus

Original article by simonchiang, 2022-12-01


Tool Overview

Prometheus

Prometheus is an open-source, time-series-oriented systems monitoring and alerting toolkit. Prometheus scrapes metrics from instrumented jobs, stores all scraped data (samples) locally, and provides a web interface that presents the data in table or graph views. Optional rules can be run over this data to aggregate and record new time series from existing data, or to generate alerts. Grafana or other API consumers can be used to visualize the collected data.
Download: official website

vertica-prometheus-exporter

vertica-prometheus-exporter is a configuration-driven exporter that exposes metrics collected from a Vertica database for consumption by the Prometheus monitoring system, as well as by tools that support Prometheus as a data source.
Download: GitHub

Configuration Process

This article gives only a brief walkthrough of the configuration process, to help first-time users understand the steps involved.

Environment Preparation

  • A Vertica database (single node or multi-node)
  • Prometheus:prometheus-2.40.4.linux-386.tar.gz
  • vertica-prometheus-exporter:vertica-prometheus-exporter-v1.0.2.linux-amd64.tar.gz

vertica-prometheus-exporter Configuration

1. Upload the tar package to the server and extract it

[dbadmin@szxtsp104 tmp]$ pwd
/tmp
[dbadmin@szxtsp104 tmp]$ tar -zxvf vertica-prometheus-exporter-v1.0.2.linux-amd64.tar.gz

2. Edit the vertica-prometheus-exporter.yml configuration file

[dbadmin@szxtsp104 tmp]$ vi vertica-prometheus-exporter-v1.0.2.linux-amd64/metrics/vertica-prometheus-exporter.yml
global:
  # Subtracted from Prometheus' scrape_timeout to give us some headroom and prevent
  # Prometheus from timing out first. Must be strictly positive. The default is 500ms.
  scrape_timeout_offset: 500ms
  # Minimum interval between collector runs: by default (0s) collectors are executed on every scrape.
  min_interval: 10s
  # Maximum number of open connections to any one target. Metric queries will run concurrently
  # on multiple connections, as will concurrent scrapes.
  max_connections: 3
  # Maximum number of idle connections to any one target. Unless you use very long collection
  # intervals, this should always be the same as max_connections.
  max_idle_connections: 3
  # Maximum amount of time a connection may be reused. Expired connections may be closed lazily
  # before reuse. If 0, connections are not closed due to a connection's age.
  max_connection_lifetime: 5m

# The target to monitor and the collectors to execute on it.
target:
  # Data source name always has a URI schema that matches the driver name. In some cases (e.g. vertica)
  # the schema gets dropped or replaced to match the driver expected DSN format.
  # data_source_name: 'vertica://<username>:<userpwd>@<exporterhostip>:5433/<databasename>'
  data_source_name: 'vertica://dbadmin:vertica@10.4.56.104:5433/vmart'
  # Collectors (referenced by name) to execute on the target.
  collectors: [example, example1]

# Collector files specifies a list of globs. One collector definition is read from each matching file.
collector_files:
  # - "*.collector.yml"
  - "*.collector.yml"

Log:
  retention_day: 15      # Any integer value, in days.
  max_log_filesize: 500  # Any integer value, log file size in megabytes.

Database connection parameter:

data_source_name: 'vertica://<username>:<userpwd>@<exporterhostip>:5433/<databasename>'
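
In a deployment script, the DSN can be assembled from its components instead of being hard-coded; a minimal shell sketch (the values below are the example values from this walkthrough, not real production credentials):

```shell
# Assemble the exporter DSN from its parts (example values from this walkthrough).
VERTICA_USER="dbadmin"
VERTICA_PWD="vertica"
VERTICA_HOST="10.4.56.104"
VERTICA_PORT="5433"
VERTICA_DB="vmart"

DSN="vertica://${VERTICA_USER}:${VERTICA_PWD}@${VERTICA_HOST}:${VERTICA_PORT}/${VERTICA_DB}"
echo "$DSN"   # vertica://dbadmin:vertica@10.4.56.104:5433/vmart
```

The resulting string can then be templated into vertica-prometheus-exporter.yml, which avoids committing credentials directly into version-controlled config files.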

Collector file parameter:

collector_files:
  - "*.collector.yml"

The remaining parameters in this configuration file can be adjusted as needed; see the exporter documentation.

3. Collector configuration files

By default, two collector configuration files exist in the /tmp/vertica-prometheus-exporter-v1.0.2.linux-amd64/metrics directory. You can modify these two files, or create new ones following the same template format.

[dbadmin@szxtsp104 metrics]$ cat /tmp/vertica-prometheus-exporter-v1.0.2.linux-amd64/metrics/vertica-example1.collector.yml
collector_name: example1
metrics:
  - metric_name: vertica_connections_per_node
    type: gauge
    help: 'Connections per node'
    key_labels:
      - node_name
    values: [totalconns]
    query: |
      SELECT /*+ LABEL(exporter_vertica_global_status_connections_per_node) */
             node_name, count(*) totalconns
      FROM v_monitor.sessions s
      GROUP BY node_name
      ORDER BY node_name;
  - metric_name: vertica_query_requests_transactions_count_per_node
    type: gauge
    help: 'Running transactions per node'
    key_labels:
      - node_name
    values: [total]
    query: |
      SELECT /*+ LABEL(exporter_vertica_query_requests_transactions_count_per_node) */
             node_name, count(*) total
      FROM transactions
      WHERE start_timestamp between date_trunc('minute',sysdate) - '1 minutes'::interval
                                and date_trunc('minute',sysdate) - '1 milliseconds'::interval
      GROUP BY node_name
      ORDER BY node_name;
  - metric_name: vertica_cpu_usage_pct
    type: gauge
    help: 'vertica cpu usage percentage'
    key_labels:
      - node_name
    values: [avg_cpu_usage_pct]
    query_ref: vertica_system_resources
  - metric_name: vertica_mem_usage_pct
    type: gauge
    help: 'vertica memory usage percentage'
    key_labels:
      - node_name
    values: [avg_mem_usage_pct]
    query_ref: vertica_system_resources
  - metric_name: vertica_net_rx_bytespersec
    type: gauge
    help: 'Vertica Network Receive bps'
    key_labels:
      - node_name
    values: [net_rx_bps]
    query_ref: vertica_system_resources
  - metric_name: vertica_net_tx_bytespersec
    type: gauge
    help: 'Vertica Network Transmit bps'
    key_labels:
      - node_name
    values: [net_tx_bps]
    query_ref: vertica_system_resources
  - metric_name: vertica_io_read_bytespersec
    type: gauge
    help: 'Vertica IO Read bps'
    key_labels:
      - node_name
    values: [io_read_bps]
    query_ref: vertica_system_resources
  - metric_name: vertica_io_write_bytespersec
    type: gauge
    help: 'Vertica IO Writes bps'
    key_labels:
      - node_name
    values: [io_write_bps]
    query_ref: vertica_system_resources
queries:
  - query_name: vertica_system_resources
    query: |
      select node_name,
             ROUND(max(average_cpu_usage_percent)) as avg_cpu_usage_pct,
             ROUND(max(average_memory_usage_percent)) as avg_mem_usage_pct,
             CAST(max(net_rx_kbytes_per_second)*1024 as INTEGER) as net_rx_bps,
             CAST(max(net_tx_kbytes_per_second)*1024 as INTEGER) as net_tx_bps,
             CAST(max(io_read_kbytes_per_second)*1024 as INTEGER) as io_read_bps,
             CAST(max(io_written_kbytes_per_second)*1024 as INTEGER) as io_write_bps
      from system_resource_usage
      group by node_name, end_time
      order by end_time desc
      limit 1;

[dbadmin@szxtsp104 metrics]$ cat /tmp/vertica-prometheus-exporter-v1.0.2.linux-amd64/metrics/vertica-example.collector.yml
collector_name: example
# min_interval: 0s
metrics:
  - metric_name: vertica_license_size
    type: gauge
    help: "Total License size in MB"
    values: [licsz]
    query: |
      select /*+ LABEL(exporter_vertica_license_size MB) */
             (license_size_bytes/1000000)::INTEGER as licsz
      from license_audits
      where audited_data='Total'
      order by audit_end_timestamp desc
      limit 1;
  - metric_name: vertica_database_size
    type: gauge
    help: "Total Database size in MB"
    values: [ttldbsz]
    query: |
      select /*+ LABEL(exporter_vertica_total_database_size) */
             (database_size_bytes/1000000)::INTEGER as ttldbsz
      from license_audits
      where audited_data='Total'
      order by audit_end_timestamp desc
      limit 1;
  - metric_name: vertica_total_database_rows
    type: gauge
    help: "Total Rows in Database from projection_storage table."
    values: [ttlrows]
    query: |
      select /*+ LABEL(exporter_vertica_total_projection_rows) */
             sum(row_count) as ttlrows
      from projection_storage;
  - metric_name: vertica_total_database_connections
    type: gauge
    help: "Total Database Connections from sessions table."
    values: [ttlconns]
    query: |
      select /*+ LABEL(exporter_vertica_total_database_connections) */
             count(*) as ttlconns
      from sessions;
  - metric_name: vertica_state_not_up_or_standby
    type: counter
    help: "Nodes with state of other than UP or STANDBY."
    values: [down]
    query: |
      select count(*) as down
      from nodes
      where node_state!='UP' and node_state!='Standby';
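
Following the same template, a new collector file can be written for any query against Vertica's system tables. A minimal sketch (the file name, collector name, and metric below are illustrative, not part of the shipped examples), counting nodes in the UP state:

```yaml
# metrics/vertica-custom.collector.yml  (hypothetical new file)
collector_name: custom
metrics:
  - metric_name: vertica_nodes_up
    type: gauge
    help: "Number of nodes in UP state."
    values: [up_nodes]
    query: |
      select count(*) as up_nodes
      from nodes
      where node_state = 'UP';
```

For the exporter to run it, the collector name must also be added to the `collectors:` list in vertica-prometheus-exporter.yml, e.g. `collectors: [example, example1, custom]`.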

4. Start vertica-prometheus-exporter

nohup /tmp/vertica-prometheus-exporter-v1.0.2.linux-amd64/vertica-prometheus-exporter --config.file /tmp/vertica-prometheus-exporter-v1.0.2.linux-amd64/metrics/vertica-prometheus-exporter.yml &
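
For production use, running the exporter under systemd is more robust than nohup (automatic restarts, proper logging). A minimal unit sketch, assuming the same paths as this walkthrough; the unit name and user are illustrative:

```
# /etc/systemd/system/vertica-exporter.service  (hypothetical unit)
[Unit]
Description=vertica-prometheus-exporter
After=network-online.target

[Service]
User=dbadmin
ExecStart=/tmp/vertica-prometheus-exporter-v1.0.2.linux-amd64/vertica-prometheus-exporter \
  --config.file /tmp/vertica-prometheus-exporter-v1.0.2.linux-amd64/metrics/vertica-prometheus-exporter.yml
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

With a unit in place, `systemctl enable --now vertica-exporter` replaces the nohup invocation; the nohup approach below is fine for a first test.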

After startup, check the program's listening port in the nohup.out log:

[dbadmin@szxtsp104 vertica-prometheus-exporter-v1.0.2.linux-amd64]$ grep 'Listening on' nohup.out
time="2022-12-01T11:32:10+08:00" level=info msg="Listening on :9968"
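
If a script needs the port (for firewall rules or a health check), it can be parsed out of that same log line; a small sketch using the log format shown above:

```shell
# Parse the listening port out of the exporter's startup log line.
# The sample line below matches the nohup.out format shown above.
line='time="2022-12-01T11:32:10+08:00" level=info msg="Listening on :9968"'
port=$(printf '%s\n' "$line" | grep -o 'Listening on :[0-9]*' | grep -o '[0-9]*$')
echo "$port"   # 9968
```

Once the port is known, `curl http://localhost:9968/metrics` should return the exported metrics in the Prometheus text exposition format.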

Prometheus Configuration

1. Upload the tar package to the server and extract it

[dbadmin@szxtsp104 tmp]$ pwd
/tmp
[dbadmin@szxtsp104 tmp]$ tar -zxvf prometheus-2.40.4.linux-386.tar.gz

2. Edit the prometheus.yml configuration file

[dbadmin@szxtsp104 prometheus-2.40.4.linux-386]$ vi /tmp/prometheus-2.40.4.linux-386/prometheus.yml
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ["10.4.56.104:9968"]
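
Instead of pointing the default prometheus job at the exporter, you can also give the Vertica metrics a dedicated scrape job, which labels them with their own `job` value. A sketch (the `vertica` job name is a choice, not a requirement; add one `host:port` entry per exporter):

```yaml
scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]   # Prometheus scraping itself
  - job_name: "vertica"
    static_configs:
      - targets: ["10.4.56.104:9968"] # vertica-prometheus-exporter endpoint(s)
```

With this layout, the Vertica series can be selected in queries with the label matcher `{job="vertica"}`.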

Configure the Prometheus scrape target:

static_configs:
  - targets: ["10.4.56.104:9968"]

Set this to the IP address and port of the vertica-prometheus-exporter started in the previous step.

The remaining parameters in this configuration file can be adjusted as needed; see the Prometheus documentation.

3. Start Prometheus

[dbadmin@szxtsp104 prometheus-2.40.4.linux-386]$ nohup /tmp/prometheus-2.40.4.linux-386/prometheus --config.file=/tmp/prometheus-2.40.4.linux-386/prometheus.yml &

Accessing Prometheus

Once configuration is complete, open the Prometheus web UI in a browser at http://10.4.56.104:9090/graph, replacing the IP with the address of the server running Prometheus.

Querying Metrics

On the Graph page you can select different metrics to display.
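
For example, with the example collectors from this walkthrough enabled, expressions like the following can be entered in the query box (metric names come from the collector files shown earlier):

```
vertica_connections_per_node          # connections broken down by node
sum(vertica_connections_per_node)     # total connections across the cluster
vertica_cpu_usage_pct                 # per-node CPU usage percentage
vertica_state_not_up_or_standby > 0   # any nodes not in UP/STANDBY state
```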


Last modified: 2024-08-29 10:49:03
[Copyright notice] This article is original content by a Modb (墨天轮) user. Reposts must credit the source (Modb) and include the article link and author information; otherwise the author and Modb reserve the right to pursue liability. To report suspected plagiarism or infringement on Modb, email contact@modb.pro with supporting evidence; confirmed content will be removed immediately.
