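The earlier article "Prometheus Series 4: High-Availability Clustering with Thanos" introduced Thanos, a high-availability clustering solution built on Prometheus. Now that you have a basic picture of the Thanos architecture, this part starts to look at each component in detail: what it does and what its startup flags mean. It also provides a set of YAML templates for running the components on k8s.
This installment covers two components: store and receive.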
Minio
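Thanos can persist metric data in object storage, so we first deploy a MinIO service to act as the object store. For deployment, see https://docs.min.io/docs/minio-client-quickstart-guide; it is not covered here.
The Thanos components that talk to object storage directly are:
sidecar - uploads the metrics collected by Prometheus to object storage;
receive - writes the data that Prometheus pushes to it into object storage;
store - lets query read metric data from object storage;
compact - compacts the metric data in object storage;
rule - stores newly generated metric data in object storage.
These components access MinIO through an object storage configuration file, for example: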
# bucket_config.yaml
type: s3
config:
  bucket: thanos
  endpoint: 10.6.110.11:9000
  access_key: admin
  secret_key: 12345678  # MinIO requires a secret key of at least 8 characters
  insecure: true
When deploying on k8s, we can store this configuration file in a Secret that the other components read:
apiVersion: v1
kind: Secret
metadata:
  name: thanos-objectstorage
  namespace: thanos
type: Opaque
stringData:
  objectstorage.yaml: |
    type: s3
    config:
      bucket: thanos
      endpoint: 10.6.110.11:9000
      access_key: admin
      secret_key: 12345678
      insecure: true
Store
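store is the component through which query reads historical metric data from object storage. It connects to the object store using the bucket_config.yaml configuration defined above.
Its startup flags are: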
store
--data-dir=/var/thanos/store
--grpc-address=0.0.0.0:10901
--http-address=0.0.0.0:10902
--objstore.config-file=/etc/thanos/objectstorage.yaml
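store - run as the store component
--data-dir - directory for locally cached data
--grpc-address - listen address of the gRPC service
--http-address - listen address of the HTTP service
--objstore.config-file - path to the object storage configuration file
On k8s, we can define a StorageClass and use dynamic provisioning to create the PVC that backs the cache directory: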
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: thanos-data-db
provisioner: fuseim.pri/ifs
parameters:
  archiveOnDelete: "false"
Run the store replicas with a StatefulSet controller:
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: thanos-store
  namespace: thanos
  labels:
    app.kubernetes.io/name: thanos-store
spec:
  replicas: 2
  selector:
    matchLabels:
      app.kubernetes.io/name: thanos-store
  serviceName: thanos-store
  podManagementPolicy: Parallel
  template:
    metadata:
      labels:
        app.kubernetes.io/name: thanos-store
    spec:
      containers:
      - args:
        - store
        - --log.level=debug
        - --data-dir=/var/thanos/store
        - --grpc-address=0.0.0.0:10901
        - --http-address=0.0.0.0:10902
        - --objstore.config-file=/etc/thanos/objectstorage.yaml
        #- --experimental.enable-index-header
        image: registry-dev.uihcloud.cn/library/thanos/thanos:v0.21.1
        livenessProbe:
          failureThreshold: 8
          httpGet:
            path: /-/healthy
            port: 10902
            scheme: HTTP
          periodSeconds: 30
        name: thanos-store
        ports:
        - containerPort: 10901
          name: grpc
        - containerPort: 10902
          name: http
        readinessProbe:
          failureThreshold: 20
          httpGet:
            path: /-/ready
            port: 10902
            scheme: HTTP
          periodSeconds: 5
        terminationMessagePolicy: FallbackToLogsOnError
        volumeMounts:
        # Must be an absolute path that matches --data-dir.
        - mountPath: /var/thanos/store
          name: data
          readOnly: false
        - name: thanos-objectstorage
          subPath: objectstorage.yaml
          mountPath: /etc/thanos/objectstorage.yaml
      terminationGracePeriodSeconds: 120
      volumes:
      - name: thanos-objectstorage
        secret:
          secretName: thanos-objectstorage
  volumeClaimTemplates:
  - metadata:
      labels:
        app.kubernetes.io/name: thanos-store
      name: data
    spec:
      storageClassName: thanos-data-db
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 20Gi
Also define a Service so that query can reach store from inside the cluster:
apiVersion: v1
kind: Service
metadata:
  name: thanos-store
  namespace: thanos
  labels:
    app.kubernetes.io/name: thanos-store
spec:
  clusterIP: None
  ports:
  - name: grpc
    port: 10901
    targetPort: 10901
  - name: http
    port: 10902
    targetPort: 10902
  selector:
    app.kubernetes.io/name: thanos-store
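As a hedged illustration of how this Service might be consumed, query could discover every store replica through DNS SRV records on the headless Service. The fragment below is only a sketch of the query container's arguments: the `--store` flag with the `dnssrv+` prefix is the discovery mechanism available in Thanos v0.21, while the Deployment around it is assumed and not shown in this article.

```yaml
# Sketch: arguments for a thanos query container (Deployment not shown).
args:
  - query
  - --grpc-address=0.0.0.0:10901
  - --http-address=0.0.0.0:10902
  # The SRV lookup resolves to every pod behind the headless thanos-store
  # Service; "_grpc" must match the port name defined in that Service.
  - --store=dnssrv+_grpc._tcp.thanos-store.thanos.svc.cluster.local
```

Using SRV discovery rather than a fixed list of pod addresses means the query arguments do not need to change when the number of store replicas changes.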
Receive
Before looking at how receive works, we need to understand two concepts: remote_write and tenants.
remote_write
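Through its remote write mechanism, Prometheus pushes the metrics it collects to an external HTTP endpoint, much like a webhook. The endpoint is set in the Prometheus configuration file, and here it is the HTTP API exposed by receive.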
remote_write:
  - url: http://10.6.118.123:32291/api/v1/receive
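Tenants
receive aggregates metrics uploaded by many Prometheus instances (or clusters), and each Prometheus instance (or cluster) is treated as a tenant.
Because receive collects metrics from multiple tenants, it has to be able to scale out as a cluster. The hashring configuration file plays a key role in defining a receive cluster. Let's walk through a sample hashring file: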
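To make the tenant concept concrete, here is a sketch of a remote_write entry that tags every request with a tenant. It assumes receive runs with its default tenant header name, `THANOS-TENANT`, and a Prometheus version whose remote_write supports the `headers` field; `tenant-a` is a placeholder value.

```yaml
# prometheus.yml fragment (sketch). Assumes receive uses its default
# tenant header name, THANOS-TENANT; "tenant-a" is a placeholder.
remote_write:
  - url: http://10.6.118.123:32291/api/v1/receive
    headers:
      THANOS-TENANT: tenant-a
```

Requests that arrive without this header are assigned to receive's default tenant.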
[
    {
        "hashring": "hashring-0",
        "endpoints": [
            "thanos-receive-2.thanos-receive.thanos.svc.cluster.local:10901"
        ],
        "tenants": [
            "tenant-a"
        ]
    },
    {
        "hashring": "hashring-1",
        "endpoints": [
            "thanos-receive-3.thanos-receive.thanos.svc.cluster.local:10901"
        ],
        "tenants": [
            "tenant-b"
        ]
    },
    {
        "hashring": "default",
        "endpoints": [
            "thanos-receive-0.thanos-receive.thanos.svc.cluster.local:10901",
            "thanos-receive-1.thanos-receive.thanos.svc.cluster.local:10901"
        ]
    }
]
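This JSON file describes a receive cluster with four replicas: thanos-receive-0, thanos-receive-1, thanos-receive-2 and thanos-receive-3. Metrics from tenant-a are handled by thanos-receive-2, metrics from tenant-b by thanos-receive-3, and all other tenants by thanos-receive-0 and thanos-receive-1. Hashrings are matched in order and a hashring without a tenants list matches every tenant, so the default hashring has to come after the tenant-specific ones for this routing to take effect.
Before starting receive, load this configuration into Kubernetes as a ConfigMap.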
apiVersion: v1
kind: ConfigMap
metadata:
  name: thanos-receive-hashrings
  namespace: thanos
data:
  thanos-receive-hashrings.json: |
    [
        {
            "hashring": "hashring-0",
            "endpoints": [
                "thanos-receive-2.thanos-receive.thanos.svc.cluster.local:10901"
            ],
            "tenants": [
                "tenant-a"
            ]
        },
        {
            "hashring": "hashring-1",
            "endpoints": [
                "thanos-receive-3.thanos-receive.thanos.svc.cluster.local:10901"
            ],
            "tenants": [
                "tenant-b"
            ]
        },
        {
            "hashring": "default",
            "endpoints": [
                "thanos-receive-0.thanos-receive.thanos.svc.cluster.local:10901",
                "thanos-receive-1.thanos-receive.thanos.svc.cluster.local:10901"
            ]
        }
    ]
Now let's look at the flags needed to start receive:
receive
--receive.replication-factor=1
--grpc-address=0.0.0.0:10901
--http-address=0.0.0.0:10902
--remote-write.address=0.0.0.0:19291
--objstore.config-file=/etc/thanos/objectstorage.yaml
--tsdb.path=/var/thanos/receive
--tsdb.retention=12h
--label=receive_replica="$(NAME)"
--label=receive="true"
--receive.hashrings-file=/etc/thanos/thanos-receive-hashrings.json
--receive.local-endpoint="$(NAME).thanos-receive.thanos.svc.cluster.local:10901"
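What each flag means:
receive - run as the receive component
--receive.replication-factor - number of copies kept of each incoming series; with a value greater than 1, the same data is stored on multiple receive instances
--grpc-address - listen address of the gRPC service
--http-address - listen address of the HTTP service
--remote-write.address - listen address of the remote_write endpoint
--objstore.config-file - path to the object storage configuration file
--tsdb.path - directory of the local TSDB, where received data is kept before it is uploaded to object storage
--tsdb.retention - how long data is retained in the local TSDB
--label=receive_replica - external label added to the data handled by this replica
--receive.hashrings-file - path to the hashring configuration file
--receive.local-endpoint - address of this replica as it appears in the hashring configuration file, used to identify the local node in the hashring.
As before, we run the receive replicas with a StatefulSet controller.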
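As a sketch of how --receive.replication-factor interacts with the other settings: raising it to 2 makes each series land on two replicas of the matching hashring, so that hashring needs at least two endpoints. The query-side flag mentioned in the trailing comment is an assumption about how the copies would be deduplicated at read time; it is not part of this article's manifests.

```yaml
# Sketch: receive arguments with replication enabled.
args:
  - receive
  - --receive.replication-factor=2    # each series is written to 2 replicas
  - --label=receive_replica="$(NAME)" # differs per replica, so copies can be told apart
  - --receive.hashrings-file=/etc/thanos/thanos-receive-hashrings.json
  - --receive.local-endpoint=$(NAME).thanos-receive.thanos.svc.cluster.local:10901
# On the read path, query would typically deduplicate the copies with:
#   --query.replica-label=receive_replica
```

With the sample hashring file in this article, only the default hashring has two endpoints, so writes for tenant-a and tenant-b would likely be rejected under a factor of 2 until their hashrings gain a second endpoint.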
apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    app: thanos-receive
    tenant: default-tenant
    controller.receive.thanos.io: thanos-receive-controller
    controller.receive.thanos.io/hashring: default
    part-of: thanos
  name: thanos-receive
  namespace: thanos
spec:
  replicas: 4
  selector:
    matchLabels:
      app: thanos-receive
      tenant: default-tenant
      controller.receive.thanos.io: thanos-receive-controller
      controller.receive.thanos.io/hashring: default
      part-of: thanos
  serviceName: thanos-receive
  template:
    metadata:
      labels:
        app: thanos-receive
        tenant: default-tenant
        controller.receive.thanos.io: thanos-receive-controller
        controller.receive.thanos.io/hashring: default
        part-of: thanos
    spec:
      affinity: {}
      containers:
      - args:
        - receive
        - --receive.replication-factor=1
        - --objstore.config=$(OBJSTORE_CONFIG)
        - --tsdb.path=/var/thanos/receive
        - --label=receive_replica="$(NAME)"
        - --receive.local-endpoint=$(NAME).thanos-receive.$(NAMESPACE).svc.cluster.local:10901
        - --tsdb.retention=15d
        - --receive.hashrings-file=/etc/thanos/thanos-receive-hashrings.json
        env:
        - name: NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: OBJSTORE_CONFIG
          valueFrom:
            secretKeyRef:
              key: objectstorage.yaml
              name: thanos-objectstorage
        image: registry-dev.uihcloud.cn/library/thanos/thanos:v0.22.0
        livenessProbe:
          failureThreshold: 8
          httpGet:
            path: /-/healthy
            port: 10902
            scheme: HTTP
          periodSeconds: 30
        name: thanos-receive
        ports:
        - containerPort: 10901
          name: grpc
        - containerPort: 10902
          name: http
        - containerPort: 19291
          name: remote-write
        readinessProbe:
          failureThreshold: 20
          httpGet:
            path: /-/ready
            port: 10902
            scheme: HTTP
          periodSeconds: 5
        terminationMessagePolicy: FallbackToLogsOnError
        volumeMounts:
        # Mount paths must be absolute and match --tsdb.path and
        # --receive.hashrings-file.
        - mountPath: /var/thanos/receive
          name: data
          readOnly: false
        - mountPath: /etc/thanos/thanos-receive-hashrings.json
          name: thanos-receive-hashrings
          subPath: thanos-receive-hashrings.json
      terminationGracePeriodSeconds: 900
      volumes:
      # The hashring ConfigMap has to be declared as a volume before it
      # can be referenced in volumeMounts.
      - name: thanos-receive-hashrings
        configMap:
          name: thanos-receive-hashrings
  volumeClaimTemplates:
  - metadata:
      labels:
        app.kubernetes.io/name: thanos-receive
      name: data
    spec:
      storageClassName: thanos-receiver-data-db
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 100Gi
Define a Service for access from inside the cluster:
apiVersion: v1
kind: Service
metadata:
  labels:
    app: thanos-receive
    tenant: default-tenant
    controller.receive.thanos.io/hashring: default
    part-of: thanos
  name: thanos-receive
  namespace: thanos
spec:
  clusterIP: None
  ports:
  - name: grpc
    port: 10901
    protocol: TCP
    targetPort: 10901
  - name: http
    port: 10902
    protocol: TCP
    targetPort: 10902
  - name: remote-write
    port: 19291
    targetPort: 19291
    protocol: TCP
  selector:
    app: thanos-receive
    tenant: default-tenant
    controller.receive.thanos.io: thanos-receive-controller
    controller.receive.thanos.io/hashring: default
    part-of: thanos
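If receive also needs to be reachable from outside the cluster (for example, when the upstream Prometheus runs in a different cluster or on a different LAN), define a Service that exposes it externally: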
apiVersion: v1
kind: Service
metadata:
  labels:
    app: thanos-receive
    tenant: default-tenant
    controller.receive.thanos.io/hashring: default
    part-of: thanos
  name: thanos-receive-node
  namespace: thanos
spec:
  type: NodePort
  ports:
  - name: grpc
    port: 10901
    protocol: TCP
    targetPort: 10901
  - name: http
    port: 10902
    protocol: TCP
    targetPort: 10902
  - name: remote-write
    port: 19291
    targetPort: 19291
    protocol: TCP
    nodePort: 32291
  selector:
    app: thanos-receive
    tenant: default-tenant
    controller.receive.thanos.io: thanos-receive-controller
    controller.receive.thanos.io/hashring: default
    part-of: thanos
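This gives us a receive component that query can reach and that is also accessible from outside the cluster.
receive balances the load internally. When Prometheus uploads metrics, the request reaches an arbitrary replica through the Service. That replica uses the tenant information carried with the request to decide whether it should handle the data itself; if not, it forwards the data to the responsible replica according to the hashring file.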