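In today's fast-moving digital world, Apache Kafka, a distributed system for processing real-time data streams, has become a core component of modern data-driven applications. Whether for log aggregation, event stream processing, or real-time analytics, Kafka plays an essential role. As clusters grow in scale and complexity, however, monitoring them effectively and keeping them running efficiently and stably becomes a major challenge for operations teams. A performance bottleneck or fault that goes unnoticed can lead to data loss or service interruption, and can even disrupt entire business processes. A comprehensive Kafka monitoring setup therefore helps us catch problems early and provides the data needed to tune performance.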

1. On a host with internet access
# Add the chart repository
$ helm repo add prometheus-community https://prometheus-community.github.io/helm-charts --force-update
# Download the kafka-exporter chart package
$ helm pull prometheus-community/prometheus-kafka-exporter --version 2.11.0
# Pull the kafka-exporter image
$ sudo docker pull danielqsj/kafka-exporter:v1.8.0
[sudo] password for ops:
v1.8.0: Pulling from danielqsj/kafka-exporter
9fa9226be034: Pull complete
1617e25568b2: Pull complete
423bf5c8db1e: Pull complete
Digest: sha256:9b67b273d1e54052e7ce77c46ecd1c6619fdee7e2ed26717e6272806fbc94150
Status: Downloaded newer image for danielqsj/kafka-exporter:v1.8.0
docker.io/danielqsj/kafka-exporter:v1.8.0
# Push the chart and the image to the private Harbor registry
$ helm push prometheus-kafka-exporter-2.11.0.tgz oci://core.jiaxzeng.com/plugins
Pushed: core.jiaxzeng.com/plugins/prometheus-kafka-exporter:2.11.0
Digest: sha256:c2c3207989a82f2a14b4f69d9b94d4d566e3a427b394c3ebfdc9319d73aee164
$ sudo docker tag danielqsj/kafka-exporter:v1.8.0 core.jiaxzeng.com/library/kafka-exporter:v1.8.0
$ sudo docker push core.jiaxzeng.com/library/kafka-exporter:v1.8.0
The push refers to repository [core.jiaxzeng.com/library/kafka-exporter]
c82fabbecceb: Pushed
6b83872188a9: Mounted from library/monitor/alertmanager
1e604deea57d: Mounted from library/monitor/alertmanager
v1.8.0: digest: sha256:16bbe1d1647128a7060da21c36ae27b6f052bf5b8dedba0a5cb3460dee2f7b51 size: 949
1. Download the chart package
$ sudo helm pull oci://core.jiaxzeng.com/plugins/prometheus-kafka-exporter --version 2.11.0 --untar --untardir /etc/kubernetes/addons/
Pulled: core.jiaxzeng.com/plugins/prometheus-kafka-exporter:2.11.0
Digest: sha256:c2c3207989a82f2a14b4f69d9b94d4d566e3a427b394c3ebfdc9319d73aee164
2. Write the kafka-exporter values file
fullnameOverride: kafka-exporter
image:
  repository: core.jiaxzeng.com/library/kafka-exporter
  tag: v1.8.0
  pullPolicy: IfNotPresent
kafkaServer:
  - 172.139.20.17:9095
  - 172.139.20.81:9095
  - 172.139.20.177:9095
verbosity: 0
sasl:
  enabled: true
  scram:
    enabled: true
    mechanism: scram-sha512
    username: admin
    password: admin-password
tls:
  enabled: true
  insecureSkipVerify: true
Tip: save the configuration above as /etc/kubernetes/addons/prometheus-kafka-exporter-values.yaml
3. Deploy the service
$ helm -n obs-system install kafka-exporter -f /etc/kubernetes/addons/prometheus-kafka-exporter-values.yaml /etc/kubernetes/addons/prometheus-kafka-exporter
NAME: kafka-exporter
LAST DEPLOYED: Thu Apr 17 15:19:43 2025
NAMESPACE: obs-system
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
1. Get the application URL by running these commands:
export POD_NAME=$(kubectl get pods --namespace obs-system -l "app=prometheus-kafka-exporter,release=kafka-exporter" -o jsonpath="{.items[0].metadata.name}")
echo "Visit http://127.0.0.1:8080 to use your application"
kubectl port-forward $POD_NAME 8080:80
1. Configure Prometheus to scrape kafka-exporter
- job_name: 'kafka'
  kubernetes_sd_configs:
    - role: service
  relabel_configs:
    - action: keep
      source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_service_port_name]
      regex: obs-system;kafka-exporter;exporter-port
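To make the `keep` rule above easier to reason about: Prometheus joins the values of `source_labels` with `;` and keeps a target only when the regex matches the whole joined string. The following is a minimal sketch that imitates that check in shell. The helper name `would_keep` and the non-matching candidates are made up for illustration, and `grep -E` only stands in for Prometheus' own regex engine, which is fine here because the pattern is a plain literal.

```shell
#!/bin/sh
# Sketch: imitate the `keep` relabel rule for one discovered service port.
# would_keep is a hypothetical helper, not part of Prometheus or kubectl.

# Usage: would_keep NAMESPACE SERVICE_NAME PORT_NAME
would_keep() {
  # Prometheus concatenates the source label values with ";" by default.
  joined="$1;$2;$3"
  # -x requires a whole-line match, mirroring the fully anchored regex.
  printf '%s\n' "$joined" | grep -Eqx 'obs-system;kafka-exporter;exporter-port'
}

# Illustrative candidates: only the first one is taken from the configuration.
for candidate in \
  "obs-system kafka-exporter exporter-port" \
  "obs-system kafka-exporter http" \
  "kube-system prometheus web"
do
  # Intentional word splitting: each candidate holds exactly three fields.
  set -- $candidate
  if would_keep "$1" "$2" "$3"; then
    echo "keep: $1/$2:$3"
  else
    echo "drop: $1/$2:$3"
  fi
done
```

If the job shows no targets after a reload, comparing the three label values of the Service against this joined string is usually the quickest check: a different namespace, Service name, or port name is enough for the rule to drop the target.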
2. Verify that scraping works
$ curl -s -u admin $(kubectl -n kube-system get svc prometheus -ojsonpath='{.spec.clusterIP}:{.spec.ports[0].port}')/prometheus/api/v1/query --data-urlencode 'query=up{job=~"kafka"}' | jq '.data.result[] | {job: .metric.job, instance: .metric.instance ,status: .value[1]}'
Enter host password for user 'admin':
{
"job": "kafka",
"instance": "kafka-exporter.obs-system.svc:9308",
"status": "1"
}
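A status of "1" only shows that the scrape succeeds, not what the exporter reports. As a rough sketch, the exporter's text output can be summarized per consumer group using the `kafka_consumergroup_lag` metric that kafka-exporter exposes. The helper name `lag_by_group` and the sample lines are invented for illustration; real input would come from the exporter's `/metrics` endpoint, for example through a port-forward to the `kafka-exporter` Service on port 9308.

```shell
#!/bin/sh
# Sketch: total kafka_consumergroup_lag per consumer group.
# lag_by_group is a hypothetical helper; it reads Prometheus text format on stdin.
lag_by_group() {
  awk '
    /^kafka_consumergroup_lag[{]/ {
      # Extract the value of the consumergroup label from the label set.
      if (match($0, /consumergroup="[^"]*"/)) {
        group = substr($0, RSTART + 15, RLENGTH - 16)
        lag[group] += $NF   # the sample value is the last field on the line
      }
    }
    END { for (g in lag) printf "%s %d\n", g, lag[g] }
  ' | sort
}

# Made-up sample in the exporter's output format (not real cluster data).
sample='kafka_consumergroup_lag{consumergroup="orders",partition="0",topic="orders-v1"} 12
kafka_consumergroup_lag{consumergroup="orders",partition="1",topic="orders-v1"} 30
kafka_consumergroup_lag{consumergroup="billing",partition="0",topic="invoices"} 5
kafka_brokers 3'

printf '%s\n' "$sample" | lag_by_group
```

For the sample above this prints one line per group, `billing 5` and `orders 42`. Against a live exporter, the same function could be fed with `curl -s localhost:9308/metrics | lag_by_group`, assuming the port-forward mentioned above is in place.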
In Grafana, import the dashboard with ID 21078.


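We hope this article has given you a clearer picture of Kafka monitoring and of why an effective monitoring setup matters for system stability and performance. Whether you use open-source tools such as Prometheus and Grafana or an enterprise offering such as Confluent's, what matters is choosing the monitoring strategy that fits your needs. Monitoring is not only about identifying problems; it is also about preventing them, so that the Kafka cluster can keep supporting business goals reliably and efficiently.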





