This section introduces the last two Thanos components: Sidecar and Ruler.
Sidecar
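Sidecar is a component deployed alongside Prometheus. It exposes Prometheus's data so that Query can read it, and it also stores historical data in object storage.
Sidecar uses Prometheus's Remote Read API to fetch metric data from Prometheus, and every two hours it uploads the newly completed data to object storage.
If Sidecar crashes before the data from the current two-hour window has been persisted, that data is lost. When deploying in containers, mount the data directory on a persistent volume so that a container restart does not lose it.
Let's first look at the sidecar's usual startup arguments: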
    thanos sidecar \
      --tsdb.path=/prometheus \
      --prometheus.url=http://127.0.0.1:9090 \
      --objstore.config-file=/etc/thanos/objectstorage.yaml \
      --web.enable-lifecycle \
      --reloader.config-file=/etc/prometheus/config/prometheus.yaml.tmpl \
      --reloader.config-envsubst-file=/etc/prometheus/config_out/prometheus.yaml \
      --reloader.rule-dir=/etc/prometheus/rules/
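sidecar - run Thanos as the sidecar component
tsdb.path - path to the Prometheus TSDB data directory, which the sidecar reads and uploads from
prometheus.url - address of the Prometheus instance to connect to
objstore.config-file - configuration file with the object storage connection settings
web.enable-lifecycle - enables hot reloading: the files given by the reloader.* flags are watched, and when one changes Prometheus is told to reload its configuration through the /-/reload endpoint (Prometheus itself must also run with --web.enable-lifecycle for that endpoint to be available)
reloader.config-file - the Prometheus configuration file (template)
reloader.config-envsubst-file - the output file, written after environment variables in the template have been substituted
reloader.rule-dir - directory containing the Prometheus rule files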
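The file passed to --objstore.config-file describes the bucket that blocks are uploaded to. As a rough sketch, assuming an S3-compatible store (the bucket name, endpoint, and credentials below are placeholders and not values from this setup), it could look like this:

```yaml
# objectstorage.yaml: hypothetical values, adjust to your own object store
type: S3
config:
  bucket: "thanos-demo"                # assumed bucket name
  endpoint: "minio.example.com:9000"   # assumed S3-compatible endpoint
  access_key: "REPLACE_ME"
  secret_key: "REPLACE_ME"
  insecure: true                       # plain HTTP; set to false when the endpoint uses TLS
```

Because the file contains credentials, it is usually stored in a Kubernetes Secret and mounted into the container rather than kept in a ConfigMap.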
Let's look at a real example of a sidecar integrated with Prometheus. Prometheus and the sidecar normally run in the same pod:
    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      name: prometheus
      namespace: thanos
      labels:
        app.kubernetes.io/name: thanos-prometheus-sidecar
    spec:
      serviceName: prometheus-headless
      podManagementPolicy: Parallel
      replicas: 2
      selector:
        matchLabels:
          app.kubernetes.io/name: prometheus
      template:
        metadata:
          labels:
            app.kubernetes.io/name: prometheus
        spec:
          serviceAccountName: prometheus
          securityContext:
            fsGroup: 2000
            runAsNonRoot: true
            runAsUser: 1000
          affinity:
            podAntiAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
              - labelSelector:
                  matchExpressions:
                  - key: app.kubernetes.io/name
                    operator: In
                    values:
                    - prometheus
                topologyKey: kubernetes.io/hostname
          containers:
          - name: prometheus
            image: quay.io/prometheus/prometheus:v2.15.2
            args:
            - --config.file=/etc/prometheus/config_out/prometheus.yaml
            - --storage.tsdb.path=/prometheus
            - --storage.tsdb.retention.time=10d
            - --web.route-prefix=/
            - --web.enable-lifecycle
            - --storage.tsdb.no-lockfile
            - --storage.tsdb.min-block-duration=2h
            - --storage.tsdb.max-block-duration=2h
            - --log.level=debug
            ports:
            - containerPort: 9090
              name: web
              protocol: TCP
            livenessProbe:
              failureThreshold: 6
              httpGet:
                path: /-/healthy
                port: web
                scheme: HTTP
              periodSeconds: 5
              successThreshold: 1
              timeoutSeconds: 3
            readinessProbe:
              failureThreshold: 120
              httpGet:
                path: /-/ready
                port: web
                scheme: HTTP
              periodSeconds: 5
              successThreshold: 1
              timeoutSeconds: 3
            volumeMounts:
            - mountPath: /etc/prometheus/config_out
              name: prometheus-config-out
              readOnly: true
            - mountPath: /prometheus
              name: prometheus-storage
            - mountPath: /etc/prometheus/rules
              name: prometheus-rules
          - name: thanos
            image: quay.io/thanos/thanos:v0.11.0
            args:
            - sidecar
            - --tsdb.path=/prometheus
            - --prometheus.url=http://127.0.0.1:9090
            - --objstore.config-file=/etc/thanos/objectstorage.yaml
            - --web.enable-lifecycle
            - --reloader.config-file=/etc/prometheus/config/prometheus.yaml.tmpl
            - --reloader.config-envsubst-file=/etc/prometheus/config_out/prometheus.yaml
            - --reloader.rule-dir=/etc/prometheus/rules/
            env:
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            ports:
            - name: http-sidecar
              containerPort: 10902
            - name: grpc
              containerPort: 10901
            livenessProbe:
              httpGet:
                port: 10902
                path: /-/healthy
            readinessProbe:
              httpGet:
                port: 10902
                path: /-/ready
            volumeMounts:
            - name: prometheus-config-tmpl
              mountPath: /etc/prometheus/config
            - name: prometheus-config-out
              mountPath: /etc/prometheus/config_out
            - name: prometheus-rules
              mountPath: /etc/prometheus/rules
            - name: prometheus-storage
              mountPath: /prometheus
            - name: thanos-objectstorage
              subPath: objectstorage.yaml
              mountPath: /etc/thanos/objectstorage.yaml
          volumes:
          - name: prometheus-config-tmpl
            configMap:
              defaultMode: 420
              name: prometheus-config-tmpl
          - name: prometheus-config-out
            emptyDir: {}
          - name: prometheus-rules
            configMap:
              name: prometheus-rules
          - name: thanos-objectstorage
            secret:
              secretName: thanos-objectstorage
      volumeClaimTemplates:
      - metadata:
          name: prometheus-storage
          labels:
            app.kubernetes.io/name: thanos-store
        spec:
          storageClassName: thanos-data-db
          accessModes:
          - ReadWriteOnce
          resources:
            requests:
              storage: 20Gi
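The StatefulSet mounts a ConfigMap named prometheus-config-tmpl, whose contents are not shown here. A minimal sketch of what it might contain follows; the label names, scrape job, and rule file glob are assumptions for illustration. The point is the $(POD_NAME) placeholder: the sidecar's reloader substitutes the POD_NAME environment variable into it when writing config_out/prometheus.yaml, so each replica ends up with a distinct external label.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config-tmpl
  namespace: thanos
data:
  prometheus.yaml.tmpl: |
    global:
      scrape_interval: 15s
      external_labels:
        cluster: demo                     # assumed label, identifies this Prometheus group
        prometheus_replica: $(POD_NAME)   # filled in by the sidecar's reloader
    rule_files:
      - /etc/prometheus/rules/*.yaml      # assumed file naming inside the rules ConfigMap
    scrape_configs:
      - job_name: prometheus
        static_configs:
          - targets: ["127.0.0.1:9090"]
```

Giving every replica its own prometheus_replica value is what later lets Query treat the two pods as replicas of the same data and deduplicate their series.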
Ruler
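Ruler evaluates rules against the metrics collected by Prometheus and produces new metrics, which are served to Query and stored in object storage.
Precomputed metrics reduce query load. For example, a value may be derived from several metrics, such as: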
    (
      node_filesystem_size_bytes{instance=~'$node',fstype=~"ext.*|xfs",mountpoint!~".*pod.*"}
      - node_filesystem_free_bytes{instance=~'$node',fstype=~"ext.*|xfs",mountpoint!~".*pod.*"}
    ) * 100
    / (
      node_filesystem_avail_bytes{instance=~'$node',fstype=~"ext.*|xfs",mountpoint!~".*pod.*"}
      + (
        node_filesystem_size_bytes{instance=~'$node',fstype=~"ext.*|xfs",mountpoint!~".*pod.*"}
        - node_filesystem_free_bytes{instance=~'$node',fstype=~"ext.*|xfs",mountpoint!~".*pod.*"}
      )
    )
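If the result is stored as a new metric, Query no longer has to fetch and combine the underlying metrics on every request; it only reads the precomputed metric. Alerting rules can also be defined on the computed value, which makes alerting through Alertmanager more efficient.
Ruler is also a relatively independent component in Thanos: whether or not it is present does not affect the core functionality of the overall architecture. Its role is to improve efficiency.
Ruler's startup arguments are as follows: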
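As an illustration, the expression above could be turned into a recording rule plus an alert on the recorded value. This is only a sketch: the group name, the recorded metric name, the alert name, and the 90% threshold are made up for the example, and the Grafana-only instance=~'$node' matcher is dropped because dashboard variables are not available inside rule files.

```yaml
groups:
  - name: node-filesystem                           # hypothetical group name
    rules:
      # Precompute the used-space percentage once per evaluation interval.
      - record: node_filesystem:used_space:percent  # hypothetical metric name
        expr: |
          (node_filesystem_size_bytes{fstype=~"ext.*|xfs",mountpoint!~".*pod.*"}
            - node_filesystem_free_bytes{fstype=~"ext.*|xfs",mountpoint!~".*pod.*"})
          * 100
          / (node_filesystem_avail_bytes{fstype=~"ext.*|xfs",mountpoint!~".*pod.*"}
            + (node_filesystem_size_bytes{fstype=~"ext.*|xfs",mountpoint!~".*pod.*"}
              - node_filesystem_free_bytes{fstype=~"ext.*|xfs",mountpoint!~".*pod.*"}))
      # Alert on the recorded metric instead of repeating the long expression.
      - alert: NodeFilesystemAlmostFull             # hypothetical alert name
        expr: node_filesystem:used_space:percent > 90
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Filesystem {{ $labels.mountpoint }} on {{ $labels.instance }} is over 90% used"
```

Dashboards can then query the recorded metric and filter by instance there, which is much cheaper than evaluating the full expression on every refresh.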
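    thanos rule \
      --grpc-address=0.0.0.0:10901 \
      --http-address=0.0.0.0:10902 \
      --rule-file=/etc/thanos/rules/*rules.yaml \
      --objstore.config-file=/etc/thanos/objectstorage.yaml \
      --data-dir=/var/thanos/rule \
      --label=rule_replica="$(NAME)" \
      --alert.label-drop="rule_replica" \
      --query=dnssrv+_http._tcp.thanos-query.thanos.svc.cluster.local
Note that $(NAME) here is the Kubernetes syntax for expanding an environment variable in container args; it is not meant as shell command substitution.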
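rule-file - path (glob) of the rule files, which hold the rules for the newly computed metrics and the alerts defined on them
objstore.config-file - configuration file with the object storage connection settings
data-dir - path of the local data (cache) directory
label - label attached to the metrics this Ruler computes. It distinguishes the replicas when several Rulers compute the same data, and Query can be told to deduplicate on this label
alert.label-drop - label to drop before alerts are sent to Alertmanager
query - address of the Query component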
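On the Query side, deduplicating the Ruler replicas means naming rule_replica as a replica label and registering the Rulers as stores. The fragment below is a sketch of the relevant container args only, not a complete manifest. The flag names are those of the v0.11.0 image used in this article and may differ in other releases, so check thanos query --help for your version. The DNS name assumes a headless Service called thanos-rule in the thanos namespace with a port named grpc.

```yaml
# Fragment of a Thanos Query container spec (sketch, v0.11.0 flag names)
args:
  - query
  # Series that differ only in this label are treated as replicas and merged.
  - --query.replica-label=rule_replica
  # Discover the Ruler pods through the SRV records of the headless Service.
  - --store=dnssrv+_grpc._tcp.thanos-rule.thanos.svc.cluster.local
```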
The YAML template for running Ruler in Kubernetes is as follows:
    apiVersion: v1
    kind: Service
    metadata:
      labels:
        app.kubernetes.io/name: thanos-rule
      name: thanos-rule
      namespace: thanos
    spec:
      clusterIP: None
      ports:
      - name: grpc
        port: 10901
        targetPort: grpc
      - name: http
        port: 10902
        targetPort: http
      selector:
        app.kubernetes.io/name: thanos-rule
    ---
    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      labels:
        app.kubernetes.io/name: thanos-rule
      name: thanos-rule
      namespace: thanos
    spec:
      replicas: 2
      selector:
        matchLabels:
          app.kubernetes.io/name: thanos-rule
      serviceName: thanos-rule
      podManagementPolicy: Parallel
      template:
        metadata:
          labels:
            app.kubernetes.io/name: thanos-rule
        spec:
          containers:
          - args:
            - rule
            - --grpc-address=0.0.0.0:10901
            - --http-address=0.0.0.0:10902
            - --rule-file=/etc/thanos/rules/*rules.yaml
            - --objstore.config-file=/etc/thanos/objectstorage.yaml
            - --data-dir=/var/thanos/rule
            - --label=rule_replica="$(NAME)"
            - --alert.label-drop="rule_replica"
            - --query=dnssrv+_http._tcp.thanos-query.thanos.svc.cluster.local
            env:
            - name: NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            image: thanosio/thanos:v0.11.0
            livenessProbe:
              failureThreshold: 24
              httpGet:
                path: /-/healthy
                port: 10902
                scheme: HTTP
              periodSeconds: 5
            name: thanos-rule
            ports:
            - containerPort: 10901
              name: grpc
            - containerPort: 10902
              name: http
            readinessProbe:
              failureThreshold: 18
              httpGet:
                path: /-/ready
                port: 10902
                scheme: HTTP
              initialDelaySeconds: 10
              periodSeconds: 5
            terminationMessagePolicy: FallbackToLogsOnError
            volumeMounts:
            - mountPath: /var/thanos/rule
              name: data
              readOnly: false
            - name: thanos-objectstorage
              subPath: objectstorage.yaml
              mountPath: /etc/thanos/objectstorage.yaml
            - name: thanos-rules
              mountPath: /etc/thanos/rules
          volumes:
          - name: thanos-objectstorage
            secret:
              secretName: thanos-objectstorage
          - name: thanos-rules
            configMap:
              name: thanos-rules
      volumeClaimTemplates:
      - metadata:
          labels:
            app.kubernetes.io/name: thanos-rule
          name: data
        spec:
          storageClassName: thanos-data-db
          accessModes:
          - ReadWriteOnce
          resources:
            requests:
              storage: 20Gi
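This concludes the introduction to all of the Thanos components and how to run them. Thanos provides a high-availability architecture that does not intrude on Prometheus. Once you are familiar with each component, you can extend an existing Prometheus cluster with it quite easily.
When I used Receive mode to collect metric data from Prometheus, I found that the remote_write endpoint does not yet support authentication, so this mode cannot be used over the public internet. Next I will try to extend the receiver code so that the remote_write endpoint supports authentication, and a later article will cover that work.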




