
Monitoring a Vertica Database with Prometheus

Original article by simonchiang, 2022-12-01


Tool Overview

Prometheus

Prometheus is an open-source, time-series-oriented systems monitoring and alerting toolkit. Prometheus scrapes metrics from instrumented jobs, stores all scraped data (samples) locally, and provides a web interface that presents the data in table or graph views. Optional rules can be run over this data to aggregate and record new time series from existing data, or to generate alerts. Grafana or other API consumers can be used to visualize the collected data.
Download: official website

vertica-prometheus-exporter

vertica-prometheus-exporter is a configuration-driven exporter that exposes metrics collected from a Vertica database for consumption by the Prometheus monitoring system, as well as by tools that support Prometheus as a data source.
Download: GitHub

Configuration Process

This article gives only a brief walkthrough of the configuration process, to help first-time users understand the steps involved.

Environment Preparation

  • A Vertica database (single node or multi-node)
  • Prometheus:prometheus-2.40.4.linux-386.tar.gz
  • vertica-prometheus-exporter:vertica-prometheus-exporter-v1.0.2.linux-amd64.tar.gz

vertica-prometheus-exporter Configuration

1. Upload the tar package to the server and extract it

[dbadmin@szxtsp104 tmp]$ pwd
/tmp
[dbadmin@szxtsp104 tmp]$ tar -zxvf vertica-prometheus-exporter-v1.0.2.linux-amd64.tar.gz

2. Edit the vertica-prometheus-exporter.yml configuration file

[dbadmin@szxtsp104 tmp]$ vi vertica-prometheus-exporter-v1.0.2.linux-amd64/metrics/vertica-prometheus-exporter.yml
global:
  # Subtracted from Prometheus' scrape_timeout to give us some headroom and prevent
  # Prometheus from timing out first. Must be strictly positive. The default is 500ms.
  scrape_timeout_offset: 500ms
  # Minimum interval between collector runs: by default (0s) collectors are executed on every scrape.
  min_interval: 10s
  # Maximum number of open connections to any one target. Metric queries will run concurrently
  # on multiple connections, as will concurrent scrapes.
  max_connections: 3
  # Maximum number of idle connections to any one target. Unless you use very long collection
  # intervals, this should always be the same as max_connections.
  max_idle_connections: 3
  # Maximum amount of time a connection may be reused. Expired connections may be closed lazily
  # before reuse. If 0, connections are not closed due to a connection's age.
  max_connection_lifetime: 5m

# The target to monitor and the collectors to execute on it.
target:
  # Data source name always has a URI schema that matches the driver name. In some cases (e.g. vertica)
  # the schema gets dropped or replaced to match the driver expected DSN format.
  # data_source_name: 'vertica://<username>:<userpwd>@<exporterhostip>:5433/<databasename>'
  data_source_name: 'vertica://dbadmin:vertica@10.4.56.104:5433/vmart'
  # Collectors (referenced by name) to execute on the target.
  collectors: [example, example1]

# Collector files specifies a list of globs. One collector definition is read from each matching file.
collector_files:
  # - "*.collector.yml"
  - "*.collector.yml"

Log:
  retention_day: 15      # Any integer value, in days.
  max_log_filesize: 500  # Any integer value, log file size in megabytes.

Database connection parameter:

data_source_name: 'vertica://<username>:<userpwd>@<exporterhostip>:5433/<databasename>'
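
In a deployment script, the DSN can be assembled from its components instead of being hard-coded; a minimal shell sketch (the values below are the example values from this walkthrough, not real production credentials):

```shell
# Assemble the exporter DSN from its parts (example values from this walkthrough).
VERTICA_USER="dbadmin"
VERTICA_PWD="vertica"
VERTICA_HOST="10.4.56.104"
VERTICA_PORT="5433"
VERTICA_DB="vmart"

DSN="vertica://${VERTICA_USER}:${VERTICA_PWD}@${VERTICA_HOST}:${VERTICA_PORT}/${VERTICA_DB}"
echo "$DSN"   # vertica://dbadmin:vertica@10.4.56.104:5433/vmart
```

The resulting string can then be templated into vertica-prometheus-exporter.yml, which avoids committing credentials directly into version-controlled config files.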

Collector file parameter:

collector_files:
  - "*.collector.yml"

The remaining parameters in this configuration file can be adjusted as needed; see the exporter documentation.

3. Collector configuration files

By default, two collector configuration files exist in the /tmp/vertica-prometheus-exporter-v1.0.2.linux-amd64/metrics directory. You can modify these two files, or create new ones following the same template format.

[dbadmin@szxtsp104 metrics]$ cat /tmp/vertica-prometheus-exporter-v1.0.2.linux-amd64/metrics/vertica-example1.collector.yml
collector_name: example1
metrics:
  - metric_name: vertica_connections_per_node
    type: gauge
    help: 'Connections per node'
    key_labels:
      - node_name
    values: [totalconns]
    query: |
      SELECT /*+ LABEL(exporter_vertica_global_status_connections_per_node) */
             node_name, count(*) totalconns
      FROM v_monitor.sessions s
      GROUP BY node_name
      ORDER BY node_name;
  - metric_name: vertica_query_requests_transactions_count_per_node
    type: gauge
    help: 'Running transactions per node'
    key_labels:
      - node_name
    values: [total]
    query: |
      SELECT /*+ LABEL(exporter_vertica_query_requests_transactions_count_per_node) */
             node_name, count(*) total
      FROM transactions
      WHERE start_timestamp between date_trunc('minute',sysdate) - '1 minutes'::interval
                                and date_trunc('minute',sysdate) - '1 milliseconds'::interval
      GROUP BY node_name
      ORDER BY node_name;
  - metric_name: vertica_cpu_usage_pct
    type: gauge
    help: 'vertica cpu usage percentage'
    key_labels:
      - node_name
    values: [avg_cpu_usage_pct]
    query_ref: vertica_system_resources
  - metric_name: vertica_mem_usage_pct
    type: gauge
    help: 'vertica memory usage percentage'
    key_labels:
      - node_name
    values: [avg_mem_usage_pct]
    query_ref: vertica_system_resources
  - metric_name: vertica_net_rx_bytespersec
    type: gauge
    help: 'Vertica Network Receive bps'
    key_labels:
      - node_name
    values: [net_rx_bps]
    query_ref: vertica_system_resources
  - metric_name: vertica_net_tx_bytespersec
    type: gauge
    help: 'Vertica Network Transmit bps'
    key_labels:
      - node_name
    values: [net_tx_bps]
    query_ref: vertica_system_resources
  - metric_name: vertica_io_read_bytespersec
    type: gauge
    help: 'Vertica IO Read bps'
    key_labels:
      - node_name
    values: [io_read_bps]
    query_ref: vertica_system_resources
  - metric_name: vertica_io_write_bytespersec
    type: gauge
    help: 'Vertica IO Writes bps'
    key_labels:
      - node_name
    values: [io_write_bps]
    query_ref: vertica_system_resources
queries:
  - query_name: vertica_system_resources
    query: |
      select node_name,
             ROUND(max(average_cpu_usage_percent)) as avg_cpu_usage_pct,
             ROUND(max(average_memory_usage_percent)) as avg_mem_usage_pct,
             CAST(max(net_rx_kbytes_per_second)*1024 as INTEGER) as net_rx_bps,
             CAST(max(net_tx_kbytes_per_second)*1024 as INTEGER) as net_tx_bps,
             CAST(max(io_read_kbytes_per_second)*1024 as INTEGER) as io_read_bps,
             CAST(max(io_written_kbytes_per_second)*1024 as INTEGER) as io_write_bps
      from system_resource_usage
      group by node_name, end_time
      order by end_time desc
      limit 1;

[dbadmin@szxtsp104 metrics]$ cat /tmp/vertica-prometheus-exporter-v1.0.2.linux-amd64/metrics/vertica-example.collector.yml
collector_name: example
# min_interval: 0s
metrics:
  - metric_name: vertica_license_size
    type: gauge
    help: "Total License size in MB"
    values: [licsz]
    query: |
      select /*+ LABEL(exporter_vertica_license_size MB) */
             (license_size_bytes/1000000)::INTEGER as licsz
      from license_audits
      where audited_data='Total'
      order by audit_end_timestamp desc
      limit 1;
  - metric_name: vertica_database_size
    type: gauge
    help: "Total Database size in MB"
    values: [ttldbsz]
    query: |
      select /*+ LABEL(exporter_vertica_total_database_size) */
             (database_size_bytes/1000000)::INTEGER as ttldbsz
      from license_audits
      where audited_data='Total'
      order by audit_end_timestamp desc
      limit 1;
  - metric_name: vertica_total_database_rows
    type: gauge
    help: "Total Rows in Database from projection_storage table."
    values: [ttlrows]
    query: |
      select /*+ LABEL(exporter_vertica_total_projection_rows) */
             sum(row_count) as ttlrows
      from projection_storage;
  - metric_name: vertica_total_database_connections
    type: gauge
    help: "Total Database Connections from sessions table."
    values: [ttlconns]
    query: |
      select /*+ LABEL(exporter_vertica_total_database_connections) */
             count(*) as ttlconns
      from sessions;
  - metric_name: vertica_state_not_up_or_standby
    type: counter
    help: "Nodes with state of other than UP or STANDBY."
    values: [down]
    query: |
      select count(*) as down
      from nodes
      where node_state!='UP' and node_state!='Standby';
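
Following the same template, a new collector file can be written for any query against Vertica's system tables. A minimal sketch (the file name, collector name, and metric below are illustrative, not part of the shipped examples), counting nodes in the UP state:

```yaml
# metrics/vertica-custom.collector.yml  (hypothetical new file)
collector_name: custom
metrics:
  - metric_name: vertica_nodes_up
    type: gauge
    help: "Number of nodes in UP state."
    values: [up_nodes]
    query: |
      select count(*) as up_nodes
      from nodes
      where node_state = 'UP';
```

For the exporter to run it, the collector name must also be added to the `collectors:` list in vertica-prometheus-exporter.yml, e.g. `collectors: [example, example1, custom]`.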

4. Start vertica-prometheus-exporter

nohup /tmp/vertica-prometheus-exporter-v1.0.2.linux-amd64/vertica-prometheus-exporter --config.file /tmp/vertica-prometheus-exporter-v1.0.2.linux-amd64/metrics/vertica-prometheus-exporter.yml &
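
For production use, running the exporter under systemd is more robust than nohup (automatic restarts, proper logging). A minimal unit sketch, assuming the same paths as this walkthrough; the unit name and user are illustrative:

```
# /etc/systemd/system/vertica-exporter.service  (hypothetical unit)
[Unit]
Description=vertica-prometheus-exporter
After=network-online.target

[Service]
User=dbadmin
ExecStart=/tmp/vertica-prometheus-exporter-v1.0.2.linux-amd64/vertica-prometheus-exporter \
  --config.file /tmp/vertica-prometheus-exporter-v1.0.2.linux-amd64/metrics/vertica-prometheus-exporter.yml
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

With a unit in place, `systemctl enable --now vertica-exporter` replaces the nohup invocation; the nohup approach below is fine for a first test.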

After startup, check the program's listening port in the nohup.out log:

[dbadmin@szxtsp104 vertica-prometheus-exporter-v1.0.2.linux-amd64]$ grep 'Listening on' nohup.out
time="2022-12-01T11:32:10+08:00" level=info msg="Listening on :9968"
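
If a script needs the port (for firewall rules or a health check), it can be parsed out of that same log line; a small sketch using the log format shown above:

```shell
# Parse the listening port out of the exporter's startup log line.
# The sample line below matches the nohup.out format shown above.
line='time="2022-12-01T11:32:10+08:00" level=info msg="Listening on :9968"'
port=$(printf '%s\n' "$line" | grep -o 'Listening on :[0-9]*' | grep -o '[0-9]*$')
echo "$port"   # 9968
```

Once the port is known, `curl http://localhost:9968/metrics` should return the exported metrics in the Prometheus text exposition format.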

Prometheus Configuration

1. Upload the tar package to the server and extract it

[dbadmin@szxtsp104 tmp]$ pwd
/tmp
[dbadmin@szxtsp104 tmp]$ tar -zxvf prometheus-2.40.4.linux-386.tar.gz

2. Edit the prometheus.yml configuration file

[dbadmin@szxtsp104 prometheus-2.40.4.linux-386]$ vi /tmp/prometheus-2.40.4.linux-386/prometheus.yml
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ["10.4.56.104:9968"]
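
Instead of pointing the default prometheus job at the exporter, you can also give the Vertica metrics a dedicated scrape job, which labels them with their own `job` value. A sketch (the `vertica` job name is a choice, not a requirement; add one `host:port` entry per exporter):

```yaml
scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]   # Prometheus scraping itself
  - job_name: "vertica"
    static_configs:
      - targets: ["10.4.56.104:9968"] # vertica-prometheus-exporter endpoint(s)
```

With this layout, the Vertica series can be selected in queries with the label matcher `{job="vertica"}`.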

Configure the Prometheus scrape target:

static_configs:
  - targets: ["10.4.56.104:9968"]

Set this to the IP address and port of the vertica-prometheus-exporter started in the previous step.

The remaining parameters in this configuration file can be adjusted as needed; see the Prometheus documentation.

3. Start Prometheus

[dbadmin@szxtsp104 prometheus-2.40.4.linux-386]$ nohup /tmp/prometheus-2.40.4.linux-386/prometheus --config.file=/tmp/prometheus-2.40.4.linux-386/prometheus.yml &

Accessing Prometheus

Once configuration is complete, open the Prometheus web UI in a browser at http://10.4.56.104:9090/graph, replacing the IP with the address of the server running Prometheus.

Querying Metrics

On the Graph page you can select different metrics to display.
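
For example, with the example collectors from this walkthrough enabled, expressions like the following can be entered in the query box (metric names come from the collector files shown earlier):

```
vertica_connections_per_node          # connections broken down by node
sum(vertica_connections_per_node)     # total connections across the cluster
vertica_cpu_usage_pct                 # per-node CPU usage percentage
vertica_state_not_up_or_standby > 0   # any nodes not in UP/STANDBY state
```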


Last modified: 2024-08-29 10:49:03
[Copyright notice] This article is original content by a Modb (墨天轮) user. Reposts must credit the source (Modb) and include the article link and author information; otherwise the author and Modb reserve the right to pursue liability. To report suspected plagiarism or infringement on Modb, email contact@modb.pro with supporting evidence; confirmed content will be removed immediately.
