Kafka快速入门——Kafka监控
Posted
tags:
篇首语:本文由小常识网(cha138.com)小编为大家整理,主要介绍了Kafka快速入门——Kafka监控相关的知识,希望对你有一定的参考价值。
Kafka快速入门(七)——Kafka监控
一、Kafka监控指标
1、Kafka主机监控指标
主机监控是监控Kafka集群Broker所在的节点机器的性能。常见的主机监控指标包括:
(1)机器负载(Load)
(2)CPU使用率
(3)内存使用率,包括空闲内存(Free Memory)和已使用内存(Used Memory)
(4)磁盘I/O使用率,包括读使用率和写使用率网络
(5)I/O使用率
(6)TCP连接数
(7)打开文件数
(8)inode使用情况
2、JVM监控指标
Kafka Broker进程是一个普通的Java进程,因此所有关于JVM的监控方式都可以用于对Kafka Broker进程的监控。
(1)Full GC发生频率和时长,用于评估Full GC对Broker进程的影响。长时间的停顿会令Broker端抛出各种超时异常。
(2)活跃对象大小,是设定堆大小的重要依据,能帮助细粒度地调优JVM各个代的堆大小。
(3)应用线程总数。了解Broker进程对CPU的使用情况。2019-07-30T09:13:03.809+0800: 552.982: [GC cleanup 827M->645M(1024M), 0.0019078 secs]
Broker JVM进程默认使用G1的GC算法,当cleanup步骤结束后,堆上活跃对象大小从827MB缩减成645MB。Kafka 0.9.0.0版本起,默认GC收集器为G1,而G1中的Full GC是由单线程执行的,速度非常慢。因此,需要监控Broker GC日志,即以kafkaServer-gc.log开头的文件。如果发现Broker进程频繁Full GC,可以开启G1的-XX:+PrintAdaptiveSizePolicy开关,让JVM指明是谁引发Full GC。
3、集群监控指标
(1)查看Broker进程是否启动,端口是否建立。在容器化的Kafka环境中,使用Docker启动Kafka Broker时,Docker容器虽然成功启动,但网络设置如果配置有误,就可能会出现进程已经启动但端口未成功建立监听的情形。
(2)查看Broker端关键日志。Broker端服务器日志server.log,控制器日志controller.log以及主题分区状态变更日志state-change.log。
(3)查看Broker端关键线程的运行状态。Kafka Broker进程会启动十几个甚至是几十个线程。在实际生产环境中,Log Compaction线程是以kafka-log-cleaner-thread开头的,负责日志Compaction;副本拉取消息的线程,通常以ReplicaFetcherThread开头,负责执行Follower副本向Leader副本拉取消息的逻辑。
(4)查看Broker端的关键JMX指标。
BytesIn/BytesOut:即Broker端每秒入站和出站字节数,如果值接近网络带宽,很容易出现网络丢包的情形。
NetworkProcessorAvgIdlePercent:即网络线程池线程平均的空闲比例,通常需要确保其值长期大于30%。如果小于30%,表明网络线程池非常繁忙,需要通过增加网络线程数或将负载转移给其它服务器的方式,来给Broker减负。
RequestHandlerAvgIdlePercent:即I/O线程池线程平均的空闲比例。如果值长期小于30%,需要调整I/O线程池的数量或者减少 Broker端的负载。
UnderReplicatedPartitions:即未充分备份的分区数。所谓未充分备份,是指并非所有的Follower副本都和Leader副本保持同步。
ISRShrink/ISRExpand:即ISR收缩和扩容的频次指标。如果生产环境中出现ISR中副本频繁进出的情形,其值一定是很高的。需要诊断下副本频繁进出ISR的原因,并采取适当的措施。
ActiveControllerCount:即当前处于激活状态的控制器的数量。通常,Controller所在Broker上的ActiveControllerCount指标值是1,其它Broker上的值是 0。如果发现存在多台Broker上ActiveControllerCount值都是1,表明Kafka集群出现了脑裂,必须尽快处理,处理方式主要是查看网络连通性。脑裂问题是非常严重的分布式故障,Kafka目前依托ZooKeeper来防止脑裂,一旦出现脑裂,Kafka无法保证正常工作。
(5)监控Kafka客户端。客户端所在的机器与Kafka Broker机器之间的网络往返时延(Round-Trip Time,RTT)。对于生产者,以kafka-producer-network-thread开头的线程负责实际消息发送,一旦挂掉,Producer将无法正常工作,但Producer进程不会自动挂掉。对于消费者,以kafka-coordinator-heartbeat-thread 开头的心跳线程事关Rebalance。
从Producer角度,需要关注的JMX指标是request-latency,即消息生产请求的延时,最直接地表征Producer程序的TPS;从 Consumer角度,records-lag和records-lead是两个重要的JMX 指标。如果使用Consumer Group,需要关注join rate和sync rate指标,其表明Rebalance的频繁程度。
二、JMX监控Kafka
1、JMX简介
JMX(Java Management Extensions)可以管理、监控正在运行中的Java程序,用于管理线程、内存、日志Level、服务重启、系统环境等。
2、Kafka开启JMX
开启JMX端口的方式有两种:
(1)启动Kafka时设置JMX_PORTexport JMX_PORT=9999 kafka-server-start.sh -daemon config/server.properties
(2)修改kafka-run-class.sh
在kafka-run-class.sh文件开始增加下列行:JMX_PORT=9999
修改kafka-run-class.sh文件后重启Kafka集群。
(3)Kafka Docker容器服务的JMX开启
Kafka容器服务的docker-compose.yml文件导入KAFKA_JMX_OPTS和JMX_PORT环境变量。
KAFKA_JMX_OPTS: "-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Djava.rmi.server.hostname=192.168.0.105 -Dcom.sun.management.jmxremote.rmi.port=9999"
JMX_PORT: 9999
将相应的JMX端口对外暴露。
ports:
- "9999:9999" # 对外暴露端口号
3、JMX_PORT占用问题
Kafka需要监控Broker和Topic数据时,需要开启JMX_PORT,通常在脚本kafka-run-class.sh里面定义JMX_PORT变量,但JMX_PORT定义完成后,执行bin目录下脚本工具会报错。原因在于
kafka-run-class.sh是被调用脚本,当被其它脚本调用时,Java会绑定JMX_PORT,导致端口被占用。
解决方法是在执行Kafka启动时指定JMX_PORT。
(1)supervisor启动Kafka,在supervisor服务启动配置文件中加入environment=JMX_PORT=9999。
(2)kafka-server-start.sh脚本启动Kafka,在启动时export JMX_PORT=9999或者在kafka-server-start.sh脚本指定。
(3)修改kafka-run-class.sh脚本
修改Kafka安装目录下的bin/Kafka-run-class.sh文件:
三、Kafka监控工具
1、JMXTool工具
JMXTool是Kafka社区的工具,能够实时查看Kafka JMX指标。kafka-run-class.sh kafka.tools.JmxTool
--attributes:指定要查询的JMX属性名称,是以逗号分隔的CSV格式。
--date-format:指定显示的日志格式
--jmx-url:指定要连接的JMX接口,默认格式是service:jmx:rmi:///jndi/rmi://:JMX端口/jmxrmi
。
--object-name:指定要查询的JMX MBean名称。
--reporting-interval:指定实时查询的时间间隔,默认2s。
每秒查询一次过去1分钟的Broker端每秒入站的流量(BytesInPerSec)命令如下:kafka-run-class.sh kafka.tools.JmxTool --object-name kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec --jmx-url service:jmx:rmi:///jndi/rmi://:9999/jmxrmi --date-format "YYYY-MM-dd HH:mm:ss" --attributes OneMinuteRate --reporting-interval 1000
ActiveController JMX指标查看命令如下:kafka-run-class.sh kafka.tools.JmxTool --object-name kafka.controller:type=KafkaController,name=ActiveControllerCount --jmx-url service:jmx:rmi:///jndi/rmi://:9999/jmxrmi --date-format "YYYY-MM-dd HH:mm:ss" --reporting-interval 1000
2、Kafka Manager
Kafka Manager是雅虎公司于2015年开源的一个Kafka监控框架,使用Scala语言开发,主要用于管理和监控Kafka集群。
Kafka Manager目前已经改名为CMAK (Cluster Manager for Apache Kafka)。
GitHub地址:
https://github.com/yahoo/CMAK
Kafka Manager Docker镜像:kafkamanager/kafka-manager
如果需要设置Kafka Manager基本安全认证,可以为Kafka Manager设置环境变量:
KAFKA_MANAGER_AUTH_ENABLED: "true"
KAFKA_MANAGER_USERNAME: username
KAFKA_MANAGER_PASSWORD: password
Kafka-Manager服务部署Docker-Compose.yml文件如下:
# 定义kafka-manager服务
kafka-manager-test:
image: kafkamanager/kafka-manager # kafka-manager镜像
restart: always
container_name: kafka-manager-test
hostname: kafka-manager-test
ports:
- "9000:9000" # 对外暴露端口,提供web访问
depends_on:
- kafka-test # 依赖
environment:
ZK_HOSTS: zookeeper-test:2181 # 宿主机IP
KAFKA_BROKERS: kafka-test:9090 # kafka
KAFKA_MANAGER_AUTH_ENABLED: "true"
KAFKA_MANAGER_USERNAME: admin
KAFKA_MANAGER_PASSWORD: password
启动Kafka Manager服务,登录Kafka Manager Web。
Web地址:http://127.0.0.1:9000
增加Kafka-Manager管理Kafka Broker节点:
3、JMXTrans + InfluxDB + Grafana
通常,监控框架可以使用JMXTrans + InfluxDB + Grafana组合,由于Grafana支持对JMX指标的监控,因此很容易将Kafka各种 JMX指标集成进来,对于已经采用JMXTrans + InfluxDB + Grafana监控方案的公司来说,可以直接复用已有的监控框架,可以极大地节省运维成本。
4、Confluent Control Center
Control Center能够实时地监控Kafka集群,同时还能够帮助操作和搭建基于Kafka的实时流处理应用。Control Center不是免费的,必须使用Confluent Kafka Platform企业版才能使用。
5、jconsole
Jconsole(Java Monitoring and Management Console)是一种基于JMX的可视化监视、管理工具,提供概述、内存、线程、类、VM概要、MBean的监控。
在Linux Terminal执行jsoncole,在弹出的窗口的远程进程中输入service:jmx:rmi:///jndi/rmi://192.168.0.105:9999/jmxrmi
或192.168.0.105:9999
。
选择MBeans选项卡,
四、JMXTrans
1、JMXTrans简介
JMXTrans是一个通过JMX采集Java应用程序的数据采集器,只要Java应用程序开启JMX端口,就可以进行采集。
JMXTrans以后台deamon形式运行,每隔1分钟采集一次数据。
GitHub地址:https://github.com/jmxtrans/jmxtrans
JMXTrans Docker容器镜像下载:docker pull jmxtrans/jmxtrans
2、JMXTrans配置文件
JMXTrans默认读取/var/lib/jmxtran
s目录下所有数据源配置文件(json格式文件),实时从数据源中获取数据,解析数据后存储到InfluxDB中。
JMXTrans配置JSON文件如下:
{
"servers": [{
"port": "9901",
"host": "192.168.0.105",
"queries": [{
"obj": "kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec",
"attr": ["MeanRate", "OneMinuteRate", "FiveMinuteRate", "FifteenMinuteRate"],
"resultAlias": "kafkaServer",
"outputWriters": [{
"@class": "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory",
"url": "http://192.168.0.105:8086/",
"username": "admin",
"password": "123456",
"database": "jmx",
"tags": {
"application": "kafka_server"
}
}]
}]
}]
}
servers:数组,数据源配置。
port:字符串,接收jmx的json数据的端口。
host:字符串,接收jmx的json数据的IP地址。
queries:数组,具体监控指标项,按JSON格式列出多个指标项,监控指标可以通过jconsole工具(JDK自带的工具)获取。
obj:字符串,监控指标的名称。
attr:数组,需要存储的指标项字段,是数据目标表的字段名。
resultAlias:字符串,InfluxDB中的表名。
outputWriters:数组,数据目的地。
@class:字符串,数据目的地的类。
url:字符串,数据目的地( InfluxDb )的url。
username:字符串,InfluxDB登录名。
password:字符串,InfluxDB登录密码。
database:字符串,InfluxDB数据库名(需要预先创好)。
tags:json,避免指标项在 InfluxDbB表中所对应的字段重名的情况。
3、Kafka JMX监控指标
Kafka的JMX监控指标可以通过jconsole进行获取。
对于BytesInPerSec监控指标,在jconsole的MBeans选项页找到BytesInPerSe。
ObjectName的值是监控指标obj的值。
ObjectName的属性是"attr"对应的指标值,可以选择一个或多个。
metric名称是resultAlias对应的指标值,在InfluxDB中是MEASUREMENTS名。
"tags" 对应InfluxDB的tag功能,用于与存储在同一个MEASUREMENTS里的不同监控指标做区分。
{??????
"obj":"kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec",
??????"attr":[?"Count", "EventType","RateUnit","OneMinuteRate"?],
??????"resultAlias":"BytesInPerSec",
??????"outputWriters": [{
"@class"?: ? "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory",
????????"url"?: ? "http://192.168.0.105:8086/",
????????"username"?: ? "admin",
????????"password"?: ? "123456",
????????"database"?: ? "jmx",
????????"tags"?????: ?{
"application"?: ? "BytesInPerSec"
}
}?]
}
对于全局监控,每一个监控指标对应一个InfluxDB的MEASUREMENTS,所有的Kafka节点的同一个监控指标数据写同一个MEASUREMENTS;对于Topic的监控指标,同一个Topic的所有Kafka节点写到同一个MEASUREMENTS,并且以Topic名称命名。
{
"servers" : [ {
"port" : "9999",
"host" : "192.168.0.105",
"queries" : [ {
"obj" : "java.lang:type=Memory",
"attr" : [ "HeapMemoryUsage", "NonHeapMemoryUsage" ],
"resultAlias":"jvmMemory",
"outputWriters" : [ {
"@class" : "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory",
"url" : "http://192.168.0.105:8086/",
"username" : "admin",
"password" : "123456",
"database" : "jmx",
"tags" : {"application" : "kafka_server"}
} ]
},{
"obj" : "kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec",
"attr" : [ "MeanRate","OneMinuteRate","FiveMinuteRate","FifteenMinuteRate" ],
"resultAlias":"kafkaServer",
"outputWriters" : [ {
"@class" : "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory",
"url" : "http://192.168.0.105:8086/",
"username" : "admin",
"password" : "123456",
"database" : "jmx",
"tags" : {"application" : "kafka_server"}
} ]
},{
"obj" : "kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec",
"attr" : [ "MeanRate","OneMinuteRate","FiveMinuteRate","FifteenMinuteRate" ],
"resultAlias":"kafkaServer",
"outputWriters" : [ {
"@class" : "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory",
"url" : "http://192.168.0.105:8086/",
"username" : "admin",
"password" : "123456",
"database" : "jmx",
"tags" : {"application" : "kafka_server"}
} ]
},{
"obj" : "kafka.server:type=BrokerTopicMetrics,name=BytesOutPerSec",
"attr" : [ "MeanRate","OneMinuteRate","FiveMinuteRate","FifteenMinuteRate" ],
"resultAlias":"kafkaServer",
"outputWriters" : [ {
"@class" : "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory",
"url" : "http://192.168.0.105:8086/",
"username" : "admin",
"password" : "123456",
"database" : "jmx",
"tags" : {"application" : "kafka_server"}
} ]
},{
"obj" : "kafka.server:type=BrokerTopicMetrics,name=FailedProduceRequestsPerSec",
"attr" : [ "MeanRate","OneMinuteRate","FiveMinuteRate","FifteenMinuteRate" ],
"resultAlias":"kafkaServer",
"outputWriters" : [ {
"@class" : "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory",
"url" : "http://192.168.0.105:8086/",
"username" : "admin",
"password" : "123456",
"database" : "jmx",
"tags" : {"application" : "kafka_server"}
} ]
},{
"obj" : "kafka.server:type=BrokerTopicMetrics,name=FailedFetchRequestsPerSec",
"attr" : [ "MeanRate","OneMinuteRate","FiveMinuteRate","FifteenMinuteRate" ],
"resultAlias":"kafkaServer",
"outputWriters" : [ {
"@class" : "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory",
"url" : "http://192.168.0.105:8086/",
"username" : "admin",
"password" : "123456",
"database" : "jmx",
"tags" : {"application" : "kafka_server"}
} ]
},{
"obj" : "kafka.server:type=BrokerTopicMetrics,name=TotalFetchRequestsPerSec",
"attr" : [ "MeanRate","OneMinuteRate","FiveMinuteRate","FifteenMinuteRate" ],
"resultAlias":"kafkaServer",
"outputWriters" : [ {
"@class" : "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory",
"url" : "http://192.168.0.105:8086/",
"username" : "admin",
"password" : "123456",
"database" : "jmx",
"tags" : {"application" : "kafka_server"}
} ]
},{
"obj" : "kafka.server:type=BrokerTopicMetrics,name=TotalProduceRequestsPerSec",
"attr" : [ "MeanRate","OneMinuteRate","FiveMinuteRate","FifteenMinuteRate" ],
"resultAlias":"kafkaServer",
"outputWriters" : [ {
"@class" : "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory",
"url" : "http://192.168.0.105:8086/",
"username" : "admin",
"password" : "123456",
"database" : "jmx",
"tags" : {"application" : "kafka_server"}
} ]
},{
"obj" : "kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions",
"attr" : [ "Value" ],
"resultAlias":"underReplicated",
"outputWriters" : [ {
"@class" : "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory",
"url" : "http://192.168.0.105:8086/",
"username" : "admin",
"password" : "123456",
"database" : "jmx",
"tags" : {"application" : "kafka_server"}
} ]
},{
"obj" : "kafka.controller:type=KafkaController,name=ActiveControllerCount",
"attr" : [ "Value" ],
"resultAlias":"activeController",
"outputWriters" : [ {
"@class" : "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory",
"url" : "http://192.168.0.105:8086/",
"username" : "admin",
"password" : "123456",
"database" : "jmx",
"tags" : {"application" : "kafka_server"}
} ]
},{
"obj" : "java.lang:type=OperatingSystem",
"attr" : [ "FreePhysicalMemorySize","SystemCpuLoad","ProcessCpuLoad","SystemLoadAverage" ],
"resultAlias":"jvmMemory",
"outputWriters" : [ {
"@class" : "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory",
"url" : "http://192.168.0.105:8086/",
"username" : "admin",
"password" : "123456",
"database" : "jmx",
"tags" : {"application" : "kafka_server"}
} ]
} ,{
"obj" : "kafka.network:type=SocketServer,name=NetworkProcessorAvgIdlePercent",
"attr" : [ "Value" ],
"resultAlias":"network",
"outputWriters" : [ {
"@class" : "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory",
"url" : "http://192.168.0.105:8086/",
"username" : "admin",
"password" : "123456",
"database" : "jmx",
"tags" : {"application" : "kafka_server"}
} ]
},{
"obj" : "kafka.server:type=KafkaRequestHandlerPool,name=RequestHandlerAvgIdlePercent",
"attr" : [ "MeanRate","OneMinuteRate","FiveMinuteRate","FifteenMinuteRate" ],
"resultAlias":"network",
"outputWriters" : [ {
"@class" : "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory",
"url" : "http://192.168.0.105:8086/",
"username" : "admin",
"password" : "123456",
"database" : "jmx",
"tags" : {"application" : "kafka_server"}
} ]
},{
"obj" : "java.lang:type=GarbageCollector,name=G1 Young Generation",
"attr" : [ "CollectionCount","CollectionTime" ],
"resultAlias":"gc",
"outputWriters" : [ {
"@class" : "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory",
"url" : "http://192.168.0.105:8086/",
"username" : "admin",
"password" : "123456",
"database" : "jmx",
"tags" : {"application" : "kafka_server"}
} ]
}]
} ]
}
4、JMXTrans部署
JMX通过网络连接,因此JMXtrans有2种部署方案:
(1)集中式。在一台服务器上部署JMXtrans,分别连接所有的Kafka Broker实例,并将数据写入到InfluxDB。为了减少网络传输,通常部署到InfluxDB所在服务器上。
(2)分布式。每个Kafka Broker实例部署一个JMXtrans。
JMXTrans配置文件分全局指标(每个Kafka节点)和Topic指标,全局指标是每个节点一个配置文件,命名规则:kafka-brokerxx.json,Topic指标是每个Topic一个配置文件,命名规则:TopicName.json。
五、Kafka监控方案实例
1、Kafka监控架构方案选择
监控系统架构通常分为三部分:数据采集、分析与转换、数据展示(可视化)。
(1)数据采集
数据采集通常先开发数据采集程序,然后使用Nagios、Zabbix等监控软件来调度执行,并将采集到的数据进行上报。对于Java程序,可以使用JMXTrans采集数据。
(2)分析与转换
Kafka是Java应用程序,所提供的性能指标数据已经非常全面,指标的直方图、次数、最大最小、标准方差都已经计算好,因此不需要再对数据进行分析加工,直接将MBeans数据存储到InfluxDB。
(3)数据可视化
Grafana是一个开源的可视化面板(Dashboard),支持Graphite、Zabbix、InfluxDB、Prometheus和OpenTSDB作为数据源。
2、InfluxDB部署
InfluxDB是一款用Go语言编写的开源分布式时序、事件和指标数据库,无需外部依赖,主要用于存储涉及大量的时间戳数据,如DevOps监控数据、APP metrics、lOT传感器数据和实时分析数据。docker pull influxdb
influxdb.yml文件:
version: ‘2‘
services:
influxdb:
image: influxdb
container_name: influxdb
volumes:
- /data/influxdb/conf:/etc/influxdb
- /data/influxdb/data:/var/lib/influxdb/data
- /data/influxdb/meta:/var/lib/influxdb/meta
- /data/influxdb/wal:/var/lib/influxdb/wal
ports:
- "8086:8086"
restart: always
结果查看:docker exec -it influxdb influx
3、JMXTrans部署
JMXTrans是一个通过JMX采集Java应用程序的数据采集器,只要Java应用程序开启JMX端口,就可以进行采集。docker pull jmxtrans/jmxtrans
JMXTrans默认读取/var/lib/jmxtrans目录下所有数据源配置文件(json格式文件),实时从数据源中获取数据,解析数据后存储到InfluxDB中。
version: ‘2‘
services:
# JMXTrans服务
jmxtrans:
image: jmxtrans/jmxtrans
container_name: jmxtrans
volumes:
- ./jmxtrans:/var/lib/jmxtrans
4、Grafana部署
Grafana是一个可视化面板(Dashboard),有非常漂亮的图表和布局展示,功能齐全的度量仪表盘和图形编辑器,支持Graphite、zabbix、InfluxDB、Prometheus和OpenTSDB作为数据源。
Grafana主要特性如下:
(1)展示方式:快速灵活的客户端图表,面板插件有许多不同方式的可视化指标和日志,官方库中具有丰富的仪表盘插件,比如热图、折线图、图表等多种展示方式。
(2)数据源:Graphite,InfluxDB,OpenTSDB,Prometheus,Elasticsearch,CloudWatch和KairosDB等。
(3)通知提醒:以可视方式定义最重要指标的警报规则,Grafana将不断计算并发送通知,在数据达到阈值时通过Slack、PagerDuty等获得通知。
(4)混合展示:在同一图表中混合使用不同的数据源,可以基于每个查询指定数据源,甚至自定义数据源。
(5)注释:使用来自不同数据源的丰富事件注释图表,将鼠标悬停在事件上会显示完整的事件元数据和标记。
(6)过滤器:Ad-hoc过滤器允许动态创建新的键/值过滤器,这些过滤器会自动应用于使用该数据源的所有查询。
GitHub地址:https://github.com/grafana/grafana
Grafana容器镜像下载:docker pull grafana/grafana:6.5.0
Grafana容器启动:docker run -d --name=grafana -p 3000:3000 grafana/grafana:6.5.0
Web登录:192.168.0.105:3000
初次登录默认使用admin/admin登录,登录后会强制要求修改密码。
增加数据源:
导入DashBoard模板:
DashBoard模板json文件如下:
{
"__inputs": [
{
"name": "DS_KAFKAMONITOR",
"label": "KafkaMonitor",
"description": "",
"type": "datasource",
"pluginId": "influxdb",
"pluginName": "InfluxDB"
}
],
"__requires": [
{
"type": "grafana",
"id": "grafana",
"name": "Grafana",
"version": "6.7.3"
},
{
"type": "panel",
"id": "graph",
"name": "Graph",
"version": ""
},
{
"type": "datasource",
"id": "influxdb",
"name": "InfluxDB",
"version": "1.0.0"
}
],
"annotations": {
"list": [
{
"$$hashKey": "object:318",
"builtIn": 1,
"datasource": "-- Grafana --",
"enable": true,
"hide": true,
"iconColor": "rgba(0, 211, 255, 1)",
"name": "Annotations & Alerts",
"type": "dashboard"
}
]
},
"editable": true,
"gnetId": null,
"graphTooltip": 0,
"id": null,
"links": [],
"panels": [
{
"aliasColors": {},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "${DS_KAFKAMONITOR}",
"description": "java.lang:type=OperatingSystem",
"fill": 1,
"fillGradient": 0,
"gridPos": {
"h": 12,
"w": 8,
"x": 0,
"y": 0
},
"hiddenSeries": false,
"id": 6,
"legend": {
"alignAsTable": true,
"avg": true,
"current": true,
"max": true,
"min": true,
"show": true,
"total": false,
"values": true
},
"lines": true,
"linewidth": 1,
"nullPointMode": "null",
"options": {
"dataLinks": []
},
"percentage": false,
"pointradius": 2,
"points": false,
"renderer": "flot",
"seriesOverrides": [],
"spaceLength": 10,
"stack": false,
"steppedLine": false,
"targets": [
{
"alias": "",
"groupBy": [
{
"params": [
"1m"
],
"type": "time"
},
{
"params": [
"hostname"
],
"type": "tag"
},
{
"params": [
"null"
],
"type": "fill"
}
],
"measurement": "jvmMemory",
"orderByTime": "ASC",
"policy": "default",
"refId": "A",
"resultFormat": "time_series",
"select": [
[
{
"params": [
"ProcessCpuLoad"
],
"type": "field"
},
{
"params": [],
"type": "last"
},
{
"params": [
"进程CPU使用率"
],
"type": "alias"
}
]
],
"tags": []
}
],
"thresholds": [],
"timeFrom": null,
"timeRegions": [],
"timeShift": null,
"title": "Kafka进程CPU使用率",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": []
},
"yaxes": [
{
"$$hashKey": "object:1134",
"format": "percentunit",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"$$hashKey": "object:1135",
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
],
"yaxis": {
"align": false,
"alignLevel": null
}
},
{
"aliasColors": {},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "${DS_KAFKAMONITOR}",
"description": "服务器CPU使用率",
"fill": 1,
"fillGradient": 0,
"gridPos": {
"h": 12,
"w": 8,
"x": 8,
"y": 0
},
"hiddenSeries": false,
"id": 2,
"legend": {
"alignAsTable": true,
"avg": true,
"current": true,
"max": true,
"min": true,
"show": true,
"total": false,
"values": true
},
"lines": true,
"linewidth": 1,
"nullPointMode": "null",
"options": {
"dataLinks": []
},
"percentage": false,
"pointradius": 2,
"points": false,
"renderer": "flot",
"seriesOverrides": [],
"spaceLength": 10,
"stack": false,
"steppedLine": false,
"targets": [
{
"alias": "",
"groupBy": [
{
"params": [
"1m"
],
"type": "time"
},
{
"params": [
"hostname"
],
"type": "tag"
},
{
"params": [
"null"
],
"type": "fill"
}
],
"measurement": "jvmMemory",
"orderByTime": "ASC",
"policy": "default",
"refId": "A",
"resultFormat": "time_series",
"select": [
[
{
"params": [
"SystemCpuLoad"
],
"type": "field"
},
{
"params": [],
"type": "last"
},
{
"params": [
"CPU使用率"
],
"type": "alias"
}
]
],
"tags": []
}
],
"thresholds": [],
"timeFrom": null,
"timeRegions": [],
"timeShift": null,
"title": "CPU使用率",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": []
},
"yaxes": [
{
"$$hashKey": "object:369",
"format": "percentunit",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"$$hashKey": "object:370",
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
],
"yaxis": {
"align": false,
"alignLevel": null
}
},
{
"aliasColors": {},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "${DS_KAFKAMONITOR}",
"description": "java.lang:type=OperatingSystem
Linux系统负载",
"fill": 1,
"fillGradient": 0,
"gridPos": {
"h": 12,
"w": 8,
"x": 16,
"y": 0
},
"hiddenSeries": false,
"id": 4,
"legend": {
"alignAsTable": true,
"avg": false,
"current": true,
"max": true,
"min": false,
"show": true,
"total": false,
"values": true
},
"lines": true,
"linewidth": 1,
"nullPointMode": "null",
"options": {
"dataLinks": []
},
"percentage": false,
"pointradius": 2,
"points": false,
"renderer": "flot",
"seriesOverrides": [],
"spaceLength": 10,
"stack": false,
"steppedLine": false,
"targets": [
{
"alias": "",
"groupBy": [
{
"params": [
"1m"
],
"type": "time"
},
{
"params": [
"hostname"
],
"type": "tag"
},
{
"params": [
"null"
],
"type": "fill"
}
],
"measurement": "jvmMemory",
"orderByTime": "ASC",
"policy": "default",
"refId": "A",
"resultFormat": "time_series",
"select": [
[
{
"params": [
"SystemLoadAverage"
],
"type": "field"
},
{
"params": [],
"type": "last"
},
{
"params": [
"系统负载"
],
"type": "alias"
}
]
],
"tags": []
}
],
"thresholds": [],
"timeFrom": null,
"timeRegions": [],
"timeShift": null,
"title": "系统负载",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": []
},
"yaxes": [
{
"$$hashKey": "object:656",
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"$$hashKey": "object:657",
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
],
"yaxis": {
"align": false,
"alignLevel": null
}
},
{
"aliasColors": {},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "${DS_KAFKAMONITOR}",
"description": "Kafka每个broker每秒中的数据量,包括__consumer_offsets topic",
"fill": 1,
"fillGradient": 0,
"gridPos": {
"h": 12,
"w": 8,
"x": 0,
"y": 12
},
"hiddenSeries": false,
"id": 34,
"legend": {
"alignAsTable": true,
"avg": false,
"current": true,
"max": true,
"min": true,
"show": true,
"total": false,
"values": true
},
"lines": true,
"linewidth": 1,
"nullPointMode": "null",
"options": {
"dataLinks": []
},
"percentage": false,
"pointradius": 2,
"points": false,
"renderer": "flot",
"seriesOverrides": [],
"spaceLength": 10,
"stack": false,
"steppedLine": false,
"targets": [
{
"alias": "",
"groupBy": [
{
"params": [
"1m"
],
"type": "time"
},
{
"params": [
"hostname"
],
"type": "tag"
}
],
"hide": false,
"measurement": "kafkaServer",
"orderByTime": "ASC",
"policy": "default",
"refId": "D",
"resultFormat": "time_series",
"select": [
[
{
"params": [
"OneMinuteRate"
],
"type": "field"
},
{
"params": [],
"type": "last"
},
{
"params": [
"平均每秒"
],
"type": "alias"
}
]
],
"tags": [
{
"key": "typeName",
"operator": "=",
"value": "type=BrokerTopicMetrics,name=MessagesInPerSec"
}
]
},
{
"alias": "",
"groupBy": [
{
"params": [
"1m"
],
"type": "time"
}
],
"hide": false,
"measurement": "kafkaServer",
"orderByTime": "ASC",
"policy": "default",
"refId": "A",
"resultFormat": "time_series",
"select": [
[
{
"params": [
"OneMinuteRate"
],
"type": "field"
},
{
"params": [],
"type": "sum"
},
{
"params": [
"所有broker平均每秒"
],
"type": "alias"
}
]
],
"tags": [
{
"key": "typeName",
"operator": "=",
"value": "type=BrokerTopicMetrics,name=MessagesInPerSec"
}
]
}
],
"thresholds": [],
"timeFrom": null,
"timeRegions": [],
"timeShift": null,
"title": "Kafka Topic 每秒数据量",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": []
},
"yaxes": [
{
"$$hashKey": "object:2118",
"format": "none",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"$$hashKey": "object:2119",
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
],
"yaxis": {
"align": false,
"alignLevel": null
}
},
{
"aliasColors": {},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "${DS_KAFKAMONITOR}",
"description": "java.lang:type=OperatingSystem
服务器可用物理内存",
"fill": 1,
"fillGradient": 0,
"gridPos": {
"h": 12,
"w": 8,
"x": 8,
"y": 12
},
"hiddenSeries": false,
"id": 32,
"legend": {
"alignAsTable": true,
"avg": false,
"current": true,
"max": false,
"min": false,
"show": true,
"total": false,
"values": true
},
"lines": true,
"linewidth": 1,
"nullPointMode": "null",
"options": {
"dataLinks": []
},
"percentage": false,
"pointradius": 2,
"points": false,
"renderer": "flot",
"seriesOverrides": [],
"spaceLength": 10,
"stack": false,
"steppedLine": false,
"targets": [
{
"alias": "",
"groupBy": [
{
"params": [
"1m"
],
"type": "time"
},
{
"params": [
"hostname"
],
"type": "tag"
},
{
"params": [
"null"
],
"type": "fill"
}
],
"measurement": "jvmMemory",
"orderByTime": "ASC",
"policy": "default",
"refId": "A",
"resultFormat": "time_series",
"select": [
[
{
"params": [
"FreePhysicalMemorySize"
],
"type": "field"
},
{
"params": [],
"type": "last"
},
{
"params": [
"系统剩余物理内存"
],
"type": "alias"
}
]
],
"tags": []
}
],
"thresholds": [],
"timeFrom": null,
"timeRegions": [],
"timeShift": null,
"title": "可用物理内存",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": []
},
"yaxes": [
{
"$$hashKey": "object:2324",
"format": "decbytes",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"$$hashKey": "object:2325",
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
],
"yaxis": {
"align": false,
"alignLevel": null
}
},
{
"aliasColors": {},
"bars": false,
"cacheTimeout": null,
"dashLength": 10,
"dashes": false,
"datasource": "${DS_KAFKAMONITOR}",
"description": "kafka.controller:type=KafkaController,name=ActiveControllerCount
Kafka控制器数量,每个集群只有一台机器为1,为1的机器是Kafka控制器Crontroller",
"fill": 1,
"fillGradient": 0,
"gridPos": {
"h": 12,
"w": 8,
"x": 16,
"y": 12
},
"hiddenSeries": false,
"id": 26,
"legend": {
"alignAsTable": true,
"avg": false,
"current": true,
"max": false,
"min": false,
"show": true,
"total": false,
"values": true
},
"lines": true,
"linewidth": 1,
"links": [],
"nullPointMode": "null",
"options": {
"dataLinks": []
},
"percentage": false,
"pluginVersion": "6.7.3",
"pointradius": 2,
"points": false,
"renderer": "flot",
"seriesOverrides": [],
"spaceLength": 10,
"stack": false,
"steppedLine": false,
"targets": [
{
"alias": "",
"groupBy": [
{
"params": [
"1m"
],
"type": "time"
},
{
"params": [
"hostname"
],
"type": "tag"
}
],
"measurement": "activeController",
"orderByTime": "ASC",
"policy": "default",
"query": "SELECT sum("Value") AS "获取控制器数量" FROM "activeController" WHERE $timeFilter GROUP BY time($__interval), "hostname"",
"rawQuery": false,
"refId": "A",
"resultFormat": "time_series",
"select": [
[
{
"params": [
"Value"
],
"type": "field"
},
{
"params": [],
"type": "last"
},
{
"params": [
"获取控制器数量"
],
"type": "alias"
}
]
],
"tags": [],
"tz": ""
}
],
"thresholds": [],
"timeFrom": null,
"timeRegions": [],
"timeShift": null,
"title": "Kafka控制器数量",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": []
},
"yaxes": [
{
"$$hashKey": "object:4446",
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"$$hashKey": "object:4447",
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
],
"yaxis": {
"align": false,
"alignLevel": null
}
},
{
"aliasColors": {},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "${DS_KAFKAMONITOR}",
"description": "监控 kafka.server:type=BrokerTopicMetrics,name=BytesOutPerSec 指标",
"fill": 1,
"fillGradient": 0,
"gridPos": {
"h": 9,
"w": 8,
"x": 0,
"y": 24
},
"hiddenSeries": false,
"id": 16,
"legend": {
"alignAsTable": true,
"avg": true,
"current": true,
"max": true,
"min": true,
"show": true,
"total": false,
"values": true
},
"lines": true,
"linewidth": 1,
"nullPointMode": "null",
"options": {
"dataLinks": []
},
"percentage": false,
"pointradius": 2,
"points": false,
"renderer": "flot",
"seriesOverrides": [],
"spaceLength": 10,
"stack": false,
"steppedLine": false,
"targets": [
{
"alias": "",
"groupBy": [
{
"params": [
"1m"
],
"type": "time"
},
{
"params": [
"hostname"
],
"type": "tag"
},
{
"params": [
"null"
],
"type": "fill"
}
],
"measurement": "kafkaServer",
"orderByTime": "ASC",
"policy": "default",
"refId": "A",
"resultFormat": "time_series",
"select": [
[
{
"params": [
"FiveMinuteRate"
],
"type": "field"
},
{
"params": [],
"type": "mean"
},
{
"params": [
"每秒拉取字节数"
],
"type": "alias"
}
]
],
"tags": [
{
"key": "typeName",
"operator": "=",
"value": "type=BrokerTopicMetrics,name=BytesOutPerSec"
}
]
}
],
"thresholds": [],
"timeFrom": null,
"timeRegions": [],
"timeShift": null,
"title": "Kafka每秒拉取流量",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": []
},
"yaxes": [
{
"$$hashKey": "object:77",
"format": "decbytes",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"$$hashKey": "object:78",
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
],
"yaxis": {
"align": false,
"alignLevel": null
}
},
{
"aliasColors": {},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "${DS_KAFKAMONITOR}",
"description": "监控 kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec 指标",
"fill": 1,
"fillGradient": 0,
"gridPos": {
"h": 9,
"w": 8,
"x": 8,
"y": 24
},
"hiddenSeries": false,
"id": 14,
"legend": {
"alignAsTable": true,
"avg": true,
"current": true,
"max": true,
"min": true,
"show": true,
"total": false,
"values": true
},
"lines": true,
"linewidth": 1,
"nullPointMode": "null",
"options": {
"dataLinks": []
},
"percentage": false,
"pointradius": 2,
"points": false,
"renderer": "flot",
"seriesOverrides": [],
"spaceLength": 10,
"stack": false,
"steppedLine": false,
"targets": [
{
"alias": "",
"groupBy": [
{
"params": [
"1m"
],
"type": "time"
},
{
"params": [
"hostname"
],
"type": "tag"
},
{
"params": [
"null"
],
"type": "fill"
}
],
"measurement": "kafkaServer",
"orderByTime": "ASC",
"policy": "default",
"refId": "F",
"resultFormat": "time_series",
"select": [
[
{
"params": [
"OneMinuteRate"
],
"type": "field"
},
{
"params": [],
"type": "last"
},
{
"params": [
"平均每秒进入字节数"
],
"type": "alias"
}
]
],
"tags": [
{
"key": "typeName",
"operator": "=",
"value": "type=BrokerTopicMetrics,name=BytesInPerSec"
}
]
}
],
"thresholds": [],
"timeFrom": null,
"timeRegions": [],
"timeShift": null,
"title": "Kafka每秒进入流量",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": []
},
"yaxes": [
{
"$$hashKey": "object:77",
"format": "decbytes",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"$$hashKey": "object:78",
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
],
"yaxis": {
"align": false,
"alignLevel": null
}
},
{
"aliasColors": {},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "${DS_KAFKAMONITOR}",
"description": "监控 kafka.server:type=BrokerTopicMetrics,name=TotalFetchRequestsPerSec 和 kafka.server:type=BrokerTopicMetrics,name=TotalProduceRequestsPerSec 指标",
"fill": 1,
"fillGradient": 0,
"gridPos": {
"h": 9,
"w": 8,
"x": 16,
"y": 24
},
"hiddenSeries": false,
"id": 20,
"legend": {
"alignAsTable": true,
"avg": true,
"current": true,
"max": true,
"min": true,
"show": true,
"total": false,
"values": true
},
"lines": true,
"linewidth": 1,
"nullPointMode": "null",
"options": {
"dataLinks": []
},
"percentage": false,
"pointradius": 2,
"points": false,
"renderer": "flot",
"seriesOverrides": [],
"spaceLength": 10,
"stack": false,
"steppedLine": false,
"targets": [
{
"alias": "",
"groupBy": [
{
"params": [
"1m"
],
"type": "time"
},
{
"params": [
"hostname"
],
"type": "tag"
},
{
"params": [
"null"
],
"type": "fill"
}
],
"measurement": "kafkaServer",
"orderByTime": "ASC",
"policy": "default",
"refId": "A",
"resultFormat": "time_series",
"select": [
[
{
"params": [
"OneMinuteRate"
],
"type": "field"
},
{
"params": [],
"type": "last"
},
{
"params": [
"每秒Fetch(获取)的请求数量"
],
"type": "alias"
}
]
],
"tags": [
{
"key": "typeName",
"operator": "=",
"value": "type=BrokerTopicMetrics,name=TotalFetchRequestsPerSec"
}
]
},
{
"alias": "",
"groupBy": [
{
"params": [
"1m"
],
"type": "time"
},
{
"params": [
"hostname"
],
"type": "tag"
},
{
"params": [
"null"
],
"type": "fill"
}
],
"measurement": "kafkaServer",
"orderByTime": "ASC",
"policy": "default",
"refId": "D",
"resultFormat": "time_series",
"select": [
[
{
"params": [
"MeanRate"
],
"type": "field"
},
{
"params": [],
"type": "last"
},
{
"params": [
"每秒Producer发送的请求数量"
],
"type": "alias"
}
]
],
"tags": [
{
"key": "typeName",
"operator": "=",
"value": "type=BrokerTopicMetrics,name=TotalProduceRequestsPerSec"
}
]
}
],
"thresholds": [],
"timeFrom": null,
"timeRegions": [],
"timeShift": null,
"title": "Kafka生产、消费每秒请求数量",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": []
},
"yaxes": [
{
"$$hashKey": "object:77",
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"$$hashKey": "object:78",
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
],
"yaxis": {
"align": false,
"alignLevel": null
}
},
{
"aliasColors": {},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "${DS_KAFKAMONITOR}",
"description": "java.lang:type=Memory",
"fill": 1,
"fillGradient": 0,
"gridPos": {
"h": 13,
"w": 8,
"x": 0,
"y": 33
},
"hiddenSeries": false,
"id": 8,
"legend": {
"alignAsTable": true,
"avg": true,
"current": true,
"max": true,
"min": true,
"show": true,
"total": false,
"values": true
},
"lines": true,
"linewidth": 1,
"nullPointMode": "null",
"options": {
"dataLinks": []
},
"percentage": false,
"pointradius": 2,
"points": false,
"renderer": "flot",
"seriesOverrides": [],
"spaceLength": 10,
"stack": false,
"steppedLine": false,
"targets": [
{
"alias": "",
"groupBy": [
{
"params": [
"1m"
],
"type": "time"
},
{
"params": [
"hostname"
],
"type": "tag"
},
{
"params": [
"null"
],
"type": "fill"
}
],
"measurement": "jvmMemory",
"orderByTime": "ASC",
"policy": "default",
"refId": "E",
"resultFormat": "time_series",
"select": [
[
{
"params": [
"HeapMemoryUsage_used"
],
"type": "field"
},
{
"params": [],
"type": "last"
},
{
"params": [
"堆内存使用"
],
"type": "alias"
}
]
],
"tags": []
}
],
"thresholds": [],
"timeFrom": null,
"timeRegions": [],
"timeShift": null,
"title": "Kafka使用堆内存",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": []
},
"yaxes": [
{
"$$hashKey": "object:1850",
"format": "decbytes",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"$$hashKey": "object:1851",
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
],
"yaxis": {
"align": false,
"alignLevel": null
}
},
{
"aliasColors": {},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "${DS_KAFKAMONITOR}",
"description": "java.lang:type=Memory",
"fill": 1,
"fillGradient": 0,
"gridPos": {
"h": 13,
"w": 8,
"x": 8,
"y": 33
},
"hiddenSeries": false,
"id": 30,
"legend": {
"alignAsTable": true,
"avg": true,
"current": true,
"max": true,
"min": true,
"show": true,
"total": false,
"values": true
},
"lines": true,
"linewidth": 1,
"nullPointMode": "null",
"options": {
"dataLinks": []
},
"percentage": false,
"pointradius": 2,
"points": false,
"renderer": "flot",
"seriesOverrides": [],
"spaceLength": 10,
"stack": false,
"steppedLine": false,
"targets": [
{
"alias": "",
"groupBy": [
{
"params": [
"1m"
],
"type": "time"
},
{
"params": [
"hostname"
],
"type": "tag"
},
{
"params": [
"null"
],
"type": "fill"
}
],
"measurement": "jvmMemory",
"orderByTime": "ASC",
"policy": "default",
"refId": "E",
"resultFormat": "time_series",
"select": [
[
{
"params": [
"NonHeapMemoryUsage_used"
],
"type": "field"
},
{
"params": [],
"type": "last"
},
{
"params": [
"对外内存使用"
],
"type": "alias"
}
]
],
"tags": []
}
],
"thresholds": [],
"timeFrom": null,
"timeRegions": [],
"timeShift": null,
"title": "Kafka使用堆外内存",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": []
},
"yaxes": [
{
"$$hashKey": "object:1850",
"format": "decbytes",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"$$hashKey": "object:1851",
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
],
"yaxis": {
"align": false,
"alignLevel": null
}
},
{
"aliasColors": {},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "${DS_KAFKAMONITOR}",
"description": "kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions
不为0则说明有的副本跟不上leader",
"fill": 1,
"fillGradient": 0,
"gridPos": {
"h": 13,
"w": 8,
"x": 16,
"y": 33
},
"hiddenSeries": false,
"id": 24,
"legend": {
"alignAsTable": true,
"avg": false,
"current": true,
"max": true,
"min": true,
"show": true,
"total": false,
"values": true
},
"lines": true,
"linewidth": 1,
"nullPointMode": "null",
"options": {
"dataLinks": []
},
"percentage": false,
"pluginVersion": "6.7.3",
"pointradius": 2,
"points": false,
"renderer": "flot",
"seriesOverrides": [],
"spaceLength": 10,
"stack": false,
"steppedLine": false,
"targets": [
{
"alias": "",
"groupBy": [
{
"params": [
"1m"
],
"type": "time"
},
{
"params": [
"hostname"
],
"type": "tag"
},
{
"params": [
"null"
],
"type": "fill"
}
],
"measurement": "underReplicated",
"orderByTime": "ASC",
"policy": "default",
"refId": "A",
"resultFormat": "time_series",
"select": [
[
{
"params": [
"Value"
],
"type": "field"
},
{
"params": [],
"type": "last"
},
{
"params": [
"未充分备份的分区数"
],
"type": "alias"
}
]
],
"tags": []
}
],
"thresholds": [],
"timeFrom": null,
"timeRegions": [],
"timeShift": null,
"title": "未充分备份的分区数监控",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": []
},
"yaxes": [
{
"$$hashKey": "object:11235",
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"$$hashKey": "object:11236",
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
],
"yaxis": {
"align": false,
"alignLevel": null
}
},
{
"aliasColors": {},
"bars": false,
"cacheTimeout": null,
"dashLength": 10,
"dashes": false,
"datasource": "${DS_KAFKAMONITOR}",
"description": "",
"fill": 1,
"fillGradient": 0,
"gridPos": {
"h": 13,
"w": 8,
"x": 0,
"y": 46
},
"hiddenSeries": false,
"id": 12,
"legend": {
"alignAsTable": true,
"avg": false,
"current": true,
"max": true,
"min": true,
"show": true,
"total": false,
"values": true
},
"lines": true,
"linewidth": 1,
"links": [],
"nullPointMode": "null",
"options": {
"dataLinks": []
},
"percentage": false,
"pluginVersion": "6.7.3",
"pointradius": 2,
"points": false,
"renderer": "flot",
"seriesOverrides": [],
"spaceLength": 10,
"stack": false,
"steppedLine": false,
"targets": [
{
"alias": "",
"groupBy": [
{
"params": [
"5m"
],
"type": "time"
},
{
"params": [
"hostname"
],
"type": "tag"
},
{
"params": [
"null"
],
"type": "fill"
}
],
"measurement": "network",
"orderByTime": "ASC",
"policy": "default",
"refId": "A",
"resultFormat": "time_series",
"select": [
[
{
"params": [
"Value"
],
"type": "field"
},
{
"params": [],
"type": "mean"
},
{
"params": [
"网络线程池空闲比例"
],
"type": "alias"
}
]
],
"tags": []
}
],
"thresholds": [],
"timeFrom": null,
"timeRegions": [],
"timeShift": null,
"title": "Kafka网络线程池线程平均的空闲比例",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": []
},
"yaxes": [
{
"$$hashKey": "object:13734",
"format": "percentunit",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"$$hashKey": "object:13735",
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
],
"yaxis": {
"align": false,
"alignLevel": null
}
},
{
"aliasColors": {},
"bars": false,
"cacheTimeout": null,
"dashLength": 10,
"dashes": false,
"datasource": "${DS_KAFKAMONITOR}",
"description": "kafka.server:type=KafkaRequestHandlerPool,name=RequestHandlerAvgIdlePercent",
"fill": 1,
"fillGradient": 0,
"gridPos": {
"h": 13,
"w": 8,
"x": 8,
"y": 46
},
"hiddenSeries": false,
"id": 22,
"legend": {
"alignAsTable": true,
"avg": false,
"current": true,
"max": true,
"min": true,
"show": true,
"total": false,
"values": true
},
"lines": true,
"linewidth": 1,
"links": [],
"nullPointMode": "null",
"options": {
"dataLinks": []
},
"percentage": false,
"pluginVersion": "6.7.3",
"pointradius": 2,
"points": false,
"renderer": "flot",
"seriesOverrides": [],
"spaceLength": 10,
"stack": false,
"steppedLine": false,
"targets": [
{
"alias": "",
"groupBy": [
{
"params": [
"1m"
],
"type": "time"
},
{
"params": [
"hostname"
],
"type": "tag"
}
],
"measurement": "network",
"orderByTime": "ASC",
"policy": "default",
"refId": "A",
"resultFormat": "time_series",
"select": [
[
{
"params": [
"OneMinuteRate"
],
"type": "field"
},
{
"params": [],
"type": "last"
},
{
"params": [
"IO空闲比例"
],
"type": "alias"
}
]
],
"tags": [
{
"key": "typeName",
"operator": "=",
"value": "type=KafkaRequestHandlerPool,name=RequestHandlerAvgIdlePercent"
}
]
}
],
"thresholds": [],
"timeFrom": null,
"timeRegions": [],
"timeShift": null,
"title": " I/O 线程池线程平均的空闲比例",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": []
},
"yaxes": [
{
"$$hashKey": "object:13517",
"format": "percentunit",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"$$hashKey": "object:13518",
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
],
"yaxis": {
"align": false,
"alignLevel": null
}
},
{
"aliasColors": {},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "${DS_KAFKAMONITOR}",
"description": "监控 kafka.server:type=BrokerTopicMetrics,name=FailedFetchRequestsPerSec 和 kafka.server:type=BrokerTopicMetrics,name=TotalFetchRequestsPerSec 指标",
"fill": 1,
"fillGradient": 0,
"gridPos": {
"h": 13,
"w": 8,
"x": 16,
"y": 46
},
"hiddenSeries": false,
"id": 18,
"legend": {
"alignAsTable": true,
"avg": true,
"current": true,
"max": true,
"min": true,
"show": true,
"total": false,
"values": true
},
"lines": true,
"linewidth": 1,
"nullPointMode": "null",
"options": {
"dataLinks": []
},
"percentage": false,
"pointradius": 2,
"points": false,
"renderer": "flot",
"seriesOverrides": [],
"spaceLength": 10,
"stack": false,
"steppedLine": false,
"targets": [
{
"alias": "",
"groupBy": [
{
"params": [
"1m"
],
"type": "time"
},
{
"params": [
"hostname"
],
"type": "tag"
},
{
"params": [
"null"
],
"type": "fill"
}
],
"measurement": "kafkaServer",
"orderByTime": "ASC",
"policy": "default",
"refId": "H",
"resultFormat": "time_series",
"select": [
[
{
"params": [
"OneMinuteRate"
],
"type": "field"
},
{
"params": [],
"type": "last"
},
{
"params": [
"每秒Fetch(获取)异常的请求"
],
"type": "alias"
}
]
],
"tags": [
{
"key": "typeName",
"operator": "=",
"value": "type=BrokerTopicMetrics,name=FailedFetchRequestsPerSec"
}
]
},
{
"alias": "",
"groupBy": [
{
"params": [
"1m"
],
"type": "time"
},
{
"params": [
"hostname"
],
"type": "tag"
},
{
"params": [
"null"
],
"type": "fill"
}
],
"measurement": "kafkaServer",
"orderByTime": "ASC",
"policy": "default",
"refId": "J",
"resultFormat": "time_series",
"select": [
[
{
"params": [
"MeanRate"
],
"type": "field"
},
{
"params": [],
"type": "last"
},
{
"params": [
"每秒Producer异常的请求"
],
"type": "alias"
}
]
],
"tags": [
{
"key": "typeName",
"operator": "=",
"value": "type=BrokerTopicMetrics,name=FailedProduceRequestsPerSec"
}
]
}
],
"thresholds": [],
"timeFrom": null,
"timeRegions": [],
"timeShift": null,
"title": "Kafka生产、消费请求失败数量",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": []
},
"yaxes": [
{
"$$hashKey": "object:77",
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"$$hashKey": "object:78",
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
],
"yaxis": {
"align": false,
"alignLevel": null
}
}
],
"refresh": false,
"schemaVersion": 22,
"style": "dark",
"tags": [],
"templating": {
"list": []
},
"time": {
"from": "now-1h",
"to": "now"
},
"timepicker": {
"refresh_intervals": [
"5s",
"10s",
"30s",
"1m",
"5m",
"15m",
"30m",
"1h",
"2h",
"1d"
]
},
"timezone": "",
"title": "Kafka集群监控模板",
"uid": "PkULDneZkALL",
"variables": {
"list": []
},
"version": 27
}
5、docker-compose.yml文件
将InfluxDB、JMXTrans、Grafana部署整合使用Docker-Compose进行部署,创建KafkaMonitor目录,在KafkaMonitor目录内创建influxdb目录和jmxtrans目录以及docker-compose.yml文件,将jmxtrans.json文件放到jmxtrans目录。
docker-compose.yml文件如下:
version: ‘2‘
services:
# JMXTrans服务
jmxtrans:
image: jmxtrans/jmxtrans
container_name: jmxtrans
volumes:
- ./jmxtrans:/var/lib/jmxtrans
# InfluxDB服务
influxdb:
image: influxdb
container_name: influxdb
volumes:
- ./influxdb/conf:/etc/influxdb
- ./influxdb/data:/var/lib/influxdb/data
- ./influxdb/meta:/var/lib/influxdb/meta
- ./influxdb/wal:/var/lib/influxdb/wal
ports:
- "8086:8086" # 对外暴露端口,提供Grafana访问
restart: always
# Grafana服务
grafana:
image: grafana/grafana:6.5.0 #高版本可能存在bug
container_name: grafana
ports:
- "3000:3000" # 对外暴露端口,提供web访问
启动监控框架服务:docker-compose -f docker-compose.yml up -d
需要Web登录Grafana服务,配置相应的数据源和模板。
6、监控查看
以上是关于Kafka快速入门——Kafka监控的主要内容,如果未能解决你的问题,请参考以下文章