Monitoring a k8s Cluster with Prometheus: The cAdvisor Chapter

一粒菜鸟 2021-09-27

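Introduction

cAdvisor monitors the resources and containers on a Node in real time and collects performance data, including CPU and memory usage, network throughput, and filesystem usage. Before Kubernetes 1.7.3, cAdvisor's metrics were bundled into the kubelet's metrics; from 1.7.3 onward they are split out of the kubelet's metrics into a separate endpoint. Every Node runs a cAdvisor that monitors that machine.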

Procedure

As the introduction notes, in a k8s cluster every node has a cAdvisor monitoring the host it runs on.

# Prometheus service discovery for k8s
https://prometheus.io/docs/prometheus/latest/configuration/configuration/#kubernetes_sd_config

According to the official Prometheus documentation, Prometheus supports automatic service discovery for k8s through the following roles:

- node

- service

- pod

- endpoints

- ingress

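In the standard configuration of newer versions, the cAdvisor inside the kubelet no longer exposes port 4194, so we fetch the monitoring metrics through the proxy API that the apiserver provides.

cAdvisor metrics path: /api/v1/nodes/[node name]/proxy/metrics/cadvisor

The node name in that path does not have to be filled in by hand: the node role of Prometheus's kubernetes_sd_config automatically discovers every node in the k8s cluster.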
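
The path can be assembled mechanically once a node name is known. The following is a minimal sketch: cadvisor_metrics_path is a hypothetical helper name and node-1 is a placeholder node, neither of which comes from the original setup. The commented-out kubectl get --raw call assumes a kubeconfig that is allowed to proxy to nodes.

```shell
# Hypothetical helper: build the API server proxy path for one node's cAdvisor metrics.
cadvisor_metrics_path() {
  # $1 is the node name as reported by `kubectl get nodes`
  printf '/api/v1/nodes/%s/proxy/metrics/cadvisor\n' "$1"
}

# Example (needs cluster access, so it is left commented out):
# kubectl get --raw "$(cadvisor_metrics_path node-1)" | head
```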

Scrape job configuration
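- job_name: 'kubernetes-cadvisor'
  # Scrape over HTTPS through the API server, authenticating with the pod's service account
  scheme: https
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  kubernetes_sd_configs:
  - role: node
  relabel_configs:
  # Send every scrape to the API server instead of the node itself
  - target_label: __address__
    replacement: kubernetes.default.svc:443
  # Put the discovered node name into the proxy path
  - source_labels: [__meta_kubernetes_node_name]
    regex: (.+)
    replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
    target_label: __metrics_path__
  # Copy every Kubernetes node label onto the target
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)
  # Expose the node name as the label "node"
  - source_labels: [__meta_kubernetes_node_name]
    action: replace
    target_label: node
  # Copy the value of a node label named "node", if one exists, into "node_name"
  - source_labels: [__meta_kubernetes_node_label_node]
    action: replace
    target_label: node_name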
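
To make the relabeling easier to follow, the sketch below imitates the rules with sed. It is only an approximation: Prometheus evaluates these regexes with its own engine and anchors them itself, and the function names (labelmap_node_label, scrape_url) are hypothetical. It illustrates two points: every node ends up with the same __address__ and differs only in __metrics_path__, and labelmap strips the __meta_kubernetes_node_label_ prefix from the label name.

```shell
# Imitates the labelmap rule: regex __meta_kubernetes_node_label_(.+)
# Prints the new label name, or nothing if the input label does not match.
labelmap_node_label() {
  printf '%s\n' "$1" | sed -n 's/^__meta_kubernetes_node_label_\(..*\)$/\1/p'
}

# Imitates the __address__ and __metrics_path__ rules and joins them with the scheme.
scrape_url() {
  address='kubernetes.default.svc:443'   # fixed replacement for __address__
  metrics_path=$(printf '%s\n' "$1" | sed 's|^\(..*\)$|/api/v1/nodes/\1/proxy/metrics/cadvisor|')
  printf 'https://%s%s\n' "$address" "$metrics_path"
}
```

Calling scrape_url with two different node names yields two URLs on the same host, which is why the job needs credentials for the API server rather than for each kubelet.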

Add the configuration above to the prometheus-config configmap (see the article "Setting Up Prometheus in a k8s Environment" for details), then update the configmap and the pod.

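## Update the configmap (kubectl 1.18+ spells the flag --dry-run=client; older versions take a bare --dry-run)
kubectl create configmap prometheus-config --from-file=prometheus.yaml -n monitoring -o yaml --dry-run=client | kubectl replace -f -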


## Update the pod
kubectl apply -f prometheus-deploy.yaml


## Hot-reload the configuration (requires Prometheus to be started with --web.enable-lifecycle)
curl -XPOST http://localhost:30090/-/reload
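
A bare curl gives little feedback when the reload is rejected. As a hedged sketch, the wrapper below checks the HTTP status code of the reload request. reload_prometheus is a hypothetical helper name, and the base URL is whatever address your Prometheus is exposed on (http://localhost:30090 in this article).

```shell
# Hypothetical helper: POST to the reload endpoint and report the outcome.
reload_prometheus() {
  # $1 is the Prometheus base URL, e.g. http://localhost:30090
  # -w '%{http_code}' prints only the status code; a failed connection counts as 000.
  code=$(curl -s -o /dev/null -w '%{http_code}' -X POST "$1/-/reload") || code=000
  if [ "$code" = "200" ]; then
    echo "reloaded"
  else
    echo "reload failed (HTTP $code)" >&2
    return 1
  fi
}

# Example (needs a running Prometheus, so it is left commented out):
# reload_prometheus http://localhost:30090
```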

Open the Prometheus web UI and you can see the corresponding metrics.

Common metrics

# CPU time consumed (cumulative seconds)
container_cpu_usage_seconds_total
# CPU limit assigned (CFS quota)
container_spec_cpu_quota
# Memory usage
container_memory_usage_bytes
# Memory limit assigned
container_spec_memory_limit_bytes
# Network traffic received
container_network_receive_bytes_total
# Network traffic transmitted
container_network_transmit_bytes_total
# Disk usage
container_fs_usage_bytes
# Disk writes
container_fs_writes_bytes_total
# Disk reads
container_fs_reads_bytes_total
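
To check by hand that one of these metrics is present in a node's raw output, the exposition text can be filtered by metric name. This is only a sketch: metric_samples is a hypothetical helper that reads the Prometheus text format from stdin, and node-1 is a placeholder node name.

```shell
# Hypothetical helper: print only the sample lines of one metric from exposition text on stdin.
metric_samples() {
  # Match the exact metric name followed by "{" (labels) or a space (no labels),
  # which skips "# HELP"/"# TYPE" comments and longer metric names sharing the prefix.
  grep "^$1[{ ]"
}

# Example (needs cluster access, so it is left commented out):
# kubectl get --raw /api/v1/nodes/node-1/proxy/metrics/cadvisor | metric_samples container_memory_usage_bytes
```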

Basic queries

## Container CPU utilization (%); on Kubernetes 1.16+ the pod_name label was renamed to pod
sum by(pod_name, namespace) (rate(container_cpu_usage_seconds_total{image!=""}[5m])) / (sum by(pod_name, namespace) (container_spec_cpu_quota{image!=""} / 100000)) * 100
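
The arithmetic behind this query can be checked by hand. The sketch below assumes the default CFS period of 100000 microseconds, which is what the constant 100000 in the query stands for. cpu_usage_percent and the sample numbers are made up for illustration.

```shell
# Hypothetical helper: CPU utilization as a percentage of the container's CPU limit.
cpu_usage_percent() {
  # $1: CPU seconds consumed per second (what rate(...[5m]) returns)
  # $2: container_spec_cpu_quota in microseconds per 100000-microsecond period
  LC_ALL=C awk -v rate="$1" -v quota="$2" 'BEGIN {
    limit_cores = quota / 100000          # e.g. 50000 means half a core
    printf "%.1f\n", rate / limit_cores * 100
  }'
}
```

For example, cpu_usage_percent 0.25 50000 prints 50.0: the container is limited to half a core and is using a quarter of a core.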


## Container memory utilization (%)
sum by(pod_name, namespace) (container_memory_rss{image!=""}) / sum by(pod_name, namespace) (container_spec_memory_limit_bytes{image!=""}) * 100 != +Inf
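
The trailing != +Inf matters for containers without a memory limit. Assuming such containers report a limit of 0, the division yields +Inf and the comparison drops the series. The sketch below mirrors that behaviour outside PromQL; memory_usage_percent is a hypothetical helper and the byte counts are sample values.

```shell
# Hypothetical helper: RSS as a percentage of the memory limit.
memory_usage_percent() {
  # $1: container_memory_rss in bytes, $2: container_spec_memory_limit_bytes
  LC_ALL=C awk -v rss="$1" -v limit="$2" 'BEGIN {
    if (limit == 0) exit   # PromQL would give +Inf here, which != +Inf filters out
    printf "%.1f\n", rss / limit * 100
  }'
}
```

With 256 MiB of RSS against a 512 MiB limit, memory_usage_percent 268435456 536870912 prints 50.0, while a limit of 0 prints nothing.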


## Disk usage (GiB)
sum by(pod_name, namespace) (container_fs_usage_bytes{image!=""}) / 1024 / 1024 / 1024
