Building a RocketMQ Monitoring Dashboard

Posted 21aspnet


1. Official resources

https://github.com/apache/rocketmq-exporter

Official template

Grafana Dashboard ID: 10477, name: RocketMQ Exporter Overview. For details of the dashboard please see RocketMQ Exporter Overview.

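2. Common metrics

| Category | Metric | Description |
| --- | --- | --- |
| Broker | rocketmq_broker_tps | Messages produced per second on a single broker |
| Broker | rocketmq_broker_qps | Messages consumed per second on a single broker |
| Producer | rocketmq_producer_tps | Messages produced per second for a single topic (production TPS) |
| Producer | rocketmq_producer_message_size | Total size of the messages produced per second for a single topic |
| Producer | rocketmq_producer_offset | Production offset of a single topic |
| Consumer Groups | rocketmq_consumer_tps | Messages consumed per second by a single consumer group (consumption TPS) |
| Consumer Groups | rocketmq_consumer_message_size | Total size of the messages consumed per second by a single consumer group |
| Consumer Groups | rocketmq_consumer_offset | Consumption offset of a single consumer group |
| Consumer Groups | rocketmq_group_get_latency_by_storetime | Consumption delay time of a single consumer group |
| Consumer Groups | rocketmq_group_get_latency | Consumer latency on a topic for a single queue |
| Consumer Groups | rocketmq_message_accumulation | Number of messages a single consumer group is lagging behind |
| Consumer | rocketmq_client_consume_fail_msg_count | Messages the consumer failed to consume within one hour |
| Consumer | rocketmq_client_consume_fail_msg_tps | Messages the consumer failed to consume per second |
| Consumer | rocketmq_client_consume_ok_msg_tps | Messages the consumer consumed successfully per second |
| Consumer | rocketmq_client_consume_rt | Average time to consume one message |
| Consumer | rocketmq_client_consumer_pull_rt | Average time to pull one message |
| Consumer | rocketmq_client_consumer_pull_tps | Messages pulled per second by the client |
| Container | container_cpu_usage_seconds_total | Cumulative CPU time used by the container, in seconds (take its rate to get CPU usage) |
| Container | container_memory_usage_bytes | Memory currently in use |
| Container | container_fs_usage_bytes | Disk space used by the container |
| Container | container_fs_writes_bytes_total | Cumulative bytes written to disk (take its rate to get write throughput) |
| Container | container_fs_reads_bytes_total | Cumulative bytes read from disk (take its rate to get read throughput) |
| Broker runtime (rocketmq_brokeruntime) | rocketmq_brokeruntime_commitlog_disk_ratio | Commit log disk usage ratio |
| Broker runtime (rocketmq_brokeruntime) | rocketmq_brokeruntime_consumequeue_disk_ratio | Consume queue disk usage ratio |
| Broker runtime (rocketmq_brokeruntime) | rocketmq_brokeruntime_commitlogdir_capacity_free | Free capacity of the commit log directory |
| Broker runtime (rocketmq_brokeruntime) | rocketmq_brokeruntime_commitlogdir_capacity_total | Total capacity of the commit log directory |
| Broker runtime (rocketmq_brokeruntime) | rocketmq_brokeruntime_commitlog_maxoffset | Commit log maximum offset |
| Broker runtime (rocketmq_brokeruntime) | rocketmq_brokeruntime_commitlog_minoffset | Commit log minimum offset |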
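The container_*_total metrics above are cumulative counters, so a dashboard panel normally shows their per-second rate rather than the raw value. The following is a minimal Python sketch of that calculation from two samples. The `Sample` type, the function name, and the sample values are all hypothetical and only serve as illustration; in practice Prometheus does this for you with `rate()`.

```python
from typing import NamedTuple


class Sample(NamedTuple):
    """One scraped value of a cumulative counter (hypothetical helper type)."""
    timestamp: float  # Unix time in seconds
    value: float      # counter value at that time


def counter_rate(older: Sample, newer: Sample) -> float:
    """Return the average per-second increase between two counter samples.

    A drop in value is treated as a counter reset (for example a container
    restart); in that case the newer value alone is taken as the increase.
    """
    elapsed = newer.timestamp - older.timestamp
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    increase = newer.value - older.value
    if increase < 0:  # counter reset
        increase = newer.value
    return increase / elapsed


# Made-up samples of container_cpu_usage_seconds_total, taken 60 s apart.
cpu_before = Sample(timestamp=1_700_000_000.0, value=1200.0)
cpu_after = Sample(timestamp=1_700_000_060.0, value=1230.0)

# 30 CPU-seconds over 60 s of wall time is half a core on average.
cpu_cores_used = counter_rate(cpu_before, cpu_after)
print(f"average CPU usage: {cpu_cores_used:.2f} cores")
```

The same function applies to container_fs_writes_bytes_total and container_fs_reads_bytes_total, where the result is bytes per second.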

The list below comes built in with the official Alibaba Cloud configuration and is specific to Alibaba Cloud.

| Metric | Category | Description |
| --- | --- | --- |
| rocketmq_broker_tps | Broker | Messages produced per second by the broker |
| rocketmq_broker_qps | Broker | Messages consumed per second by the broker |
| rocketmq_producer_tps | Producer | Messages produced per second per topic |
| rocketmq_producer_message_size | Producer | Size of the messages produced per second per topic (in bytes) |
| rocketmq_producer_offset | Producer | Production progress (offset) of a topic |
| rocketmq_consumer_tps | Consumer Groups | Messages consumed per second by a consumer group |
| rocketmq_consumer_message_size | Consumer Groups | Size of the messages consumed per second by a consumer group (in bytes) |
| rocketmq_consumer_offset | Consumer Groups | Consumption progress (offset) of a consumer group |
| rocketmq_group_get_latency | Consumer Groups | Consumer latency on a topic for one queue |
| rocketmq_group_get_latency_by_storetime | Consumer Groups | Consumption delay time of a consumer group |
| rocketmq_message_accumulation | Consumer Groups | How far the consumer offset lags behind |
| rocketmq_client_consume_fail_msg_count | Consumer | Messages that failed to be consumed in one hour |
| rocketmq_client_consume_fail_msg_tps | Consumer | Messages that failed to be consumed per second |
| rocketmq_client_consume_ok_msg_tps | Consumer | Messages consumed successfully per second |
| rocketmq_client_consume_rt | Consumer | Average time to consume one message |
| rocketmq_client_consumer_pull_rt | Consumer | Average time to pull one message |
| rocketmq_client_consumer_pull_tps | Consumer | Messages pulled by the client per second |
| rocketmq_brokeruntime_pmdt_0to10ms | Broker | Put-message requests the broker answered within 0 to 10 ms |
| rocketmq_brokeruntime_pmdt_10to50ms | Broker | Put-message requests the broker answered within 10 to 50 ms |
| rocketmq_brokeruntime_pmdt_50to100ms | Broker | Put-message requests the broker answered within 50 to 100 ms |
| rocketmq_brokeruntime_pmdt_100to200ms | Broker | Put-message requests the broker answered within 100 to 200 ms |
| rocketmq_brokeruntime_pmdt_200to500ms | Broker | Put-message requests the broker answered within 200 to 500 ms |
| rocketmq_brokeruntime_pmdt_500to1s | Broker | Put-message requests the broker answered within 500 ms to 1 s |
| rocketmq_brokeruntime_pmdt_1to2s | Broker | Put-message requests the broker answered within 1 to 2 s |
| rocketmq_brokeruntime_pmdt_2to3s | Broker | Put-message requests the broker answered within 2 to 3 s |
| rocketmq_brokeruntime_pmdt_3to4s | Broker | Put-message requests the broker answered within 3 to 4 s |
| rocketmq_brokeruntime_pmdt_4to5s | Broker | Put-message requests the broker answered within 4 to 5 s |
| rocketmq_brokeruntime_pmdt_5to10s | Broker | Put-message requests the broker answered within 5 to 10 s |
| rocketmq_brokeruntime_pmdt_10stomore | Broker | Put-message requests the broker took 10 s or more to answer |
| rocketmq_brokeruntime_query_threadpoolqueue_headwait_timemills | Broker | Wait time of the head element in the query thread pool queue |
| rocketmq_brokeruntime_pull_threadpoolqueue_headwait_timemill | Broker | Wait time of the head element in the pull thread pool queue |
| rocketmq_brokeruntime_send_threadpoolqueue_headwait_timemills | Broker | Wait time of the head element in the send thread pool queue |
| rocketmq_brokeruntime_commitlog_disk_ratio | Broker | Commit log disk usage ratio |
| rocketmq_brokeruntime_consumequeue_disk_ratio | Broker | Consume queue disk usage ratio |
| rocketmq_brokeruntime_commitlogdir_capacity_free | Broker | Free capacity of the commit log directory |
| rocketmq_brokeruntime_commitlogdir_capacity_total | Broker | Total capacity of the commit log directory |
| rocketmq_brokeruntime_commitlog_maxoffset | Broker | Commit log maximum offset |
| rocketmq_brokeruntime_msg_put_total_today_now | Broker | Broker message-put total (today, now) |
| rocketmq_brokeruntime_msg_gettotal_today_now | Broker | Broker message-get total (today, now) |
| rocketmq_brokeruntime_dispatch_behind_bytes | Broker | Broker dispatch-behind bytes |
| rocketmq_brokeruntime_put_message_size_total | Broker | Total size of messages put to the broker |
| rocketmq_brokeruntime_put_message_average_size | Broker | Average size of messages put to the broker |
| rocketmq_brokeruntime_msg_gettotal_yesterdaymorning | Broker | Broker message-get total (yesterday morning) |
| rocketmq_brokeruntime_msg_gettotal_todaymorning | Broker | Broker message-get total (today morning) |
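The rocketmq_brokeruntime_pmdt_* series split put-message response times into buckets, which makes it easy to derive a "share of slow writes" figure for a panel. Here is a rough Python sketch, assuming each bucket holds a plain count of put requests. The function name, the 500 ms cut-off, and the sample counts are made up for illustration.

```python
from typing import Mapping

# Bucket suffixes of the rocketmq_brokeruntime_pmdt_* metrics, fastest first.
PMDT_BUCKETS = [
    "0to10ms", "10to50ms", "50to100ms", "100to200ms", "200to500ms",
    "500to1s", "1to2s", "2to3s", "3to4s", "4to5s", "5to10s", "10stomore",
]


def slow_put_share(counts: Mapping[str, float],
                   first_slow_bucket: str = "500to1s") -> float:
    """Return the fraction of put requests that landed in first_slow_bucket
    or in any slower bucket. Missing buckets count as zero."""
    start = PMDT_BUCKETS.index(first_slow_bucket)
    total = sum(counts.get(bucket, 0.0) for bucket in PMDT_BUCKETS)
    if total == 0:
        return 0.0  # no put requests observed, so nothing was slow
    slow = sum(counts.get(bucket, 0.0) for bucket in PMDT_BUCKETS[start:])
    return slow / total


# Made-up bucket counts for one broker.
sample_counts = {
    "0to10ms": 9000,
    "10to50ms": 900,
    "50to100ms": 50,
    "500to1s": 40,
    "1to2s": 10,
}

share = slow_put_share(sample_counts)
print(f"put requests slower than 500 ms: {share:.2%}")
```

A share that keeps rising is one possible signal for the "broker takes too long to write messages" case; the threshold at which it matters depends on the cluster.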

For more metrics, see the official source:

https://github.com/apache/rocketmq-exporter/blob/master/rocketmq_exporter_overview.json
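To see which metrics a dashboard file like this one actually uses, the JSON can be scanned for metric names. The sketch below assumes the usual Grafana layout, where each panel carries a `targets` list whose entries hold the query in `expr`. The sample dashboard is a hypothetical stand-in, not the content of the official file.

```python
import json
import re
from typing import Iterator, Set

# Matches exporter metric names such as rocketmq_producer_tps.
METRIC_NAME = re.compile(r"\brocketmq_[a-z0-9_]+")


def iter_panels(dashboard: dict) -> Iterator[dict]:
    """Yield every panel, including panels nested inside row panels."""
    stack = list(dashboard.get("panels", []))
    # Older dashboard exports keep their panels under a top-level "rows" list.
    for row in dashboard.get("rows", []):
        stack.extend(row.get("panels", []))
    while stack:
        panel = stack.pop()
        yield panel
        stack.extend(panel.get("panels", []))


def metrics_in_dashboard(dashboard: dict) -> Set[str]:
    """Collect the rocketmq_* metric names referenced by panel queries."""
    names: Set[str] = set()
    for panel in iter_panels(dashboard):
        for target in panel.get("targets", []):
            names.update(METRIC_NAME.findall(target.get("expr", "")))
    return names


# Hypothetical miniature dashboard used only to exercise the functions.
SAMPLE = """
{
  "title": "hypothetical sample",
  "panels": [
    {"type": "row", "panels": [
      {"targets": [{"expr": "sum(rocketmq_producer_tps) by (topic)"}]}
    ]},
    {"targets": [{"expr": "rocketmq_message_accumulation > 0"}]}
  ]
}
"""

print(sorted(metrics_in_dashboard(json.loads(SAMPLE))))
```

For a downloaded file, replace `json.loads(SAMPLE)` with `json.load()` on the opened file.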

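Someone has built a more practical dashboard; just download the JSON:

"A ready-to-use set of RocketMQ monitoring dashboards and alerting rules" (一套拿来即用的RocketMQ监控面板和告警规则), from the blog of 不识君的荒漠 on CSDN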

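The alerting rules mainly cover the following:

  • A broker node is down
  • Disk space is running low
  • Broker busy
  • Message backlog
  • The broker takes too long to write messages (it may be under heavy I/O pressure or short of memory)
  • The cluster's send TPS surges (someone may be load-testing the cluster or suddenly writing a large batch of messages)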
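Several of these rules come down to threshold checks on the metrics listed earlier. The following Python sketch covers four of them (broker down, disk space, backlog, TPS surge) against a snapshot of values. The thresholds, the `Snapshot` type, and the broker and group names are all assumptions for illustration, and it also assumes the disk ratio is reported as a fraction between 0 and 1. Real rules would normally be written as Prometheus alerting rules instead.

```python
from dataclasses import dataclass
from typing import Dict, List, Set

# Hypothetical thresholds; tune them to the cluster being monitored.
DISK_RATIO_LIMIT = 0.8        # commit log disk usage ratio
ACCUMULATION_LIMIT = 100_000  # messages a consumer group may lag behind
TPS_SURGE_FACTOR = 3.0        # current send TPS versus a recent baseline


@dataclass
class Snapshot:
    """Latest metric values, keyed by broker or consumer group name."""
    broker_tps: Dict[str, float]            # rocketmq_broker_tps
    commitlog_disk_ratio: Dict[str, float]  # rocketmq_brokeruntime_commitlog_disk_ratio
    accumulation: Dict[str, float]          # rocketmq_message_accumulation


def evaluate_alerts(snapshot: Snapshot, expected_brokers: Set[str],
                    baseline_tps: float) -> List[str]:
    """Return a human-readable message for every rule that fires."""
    alerts: List[str] = []
    # A broker that no longer reports any TPS sample is treated as down.
    for broker in sorted(expected_brokers - set(snapshot.broker_tps)):
        alerts.append(f"broker down: {broker}")
    for broker, ratio in sorted(snapshot.commitlog_disk_ratio.items()):
        if ratio > DISK_RATIO_LIMIT:
            alerts.append(f"disk space low: {broker} at {ratio:.0%}")
    for group, lag in sorted(snapshot.accumulation.items()):
        if lag > ACCUMULATION_LIMIT:
            alerts.append(f"message backlog: {group} is {lag:.0f} messages behind")
    total_tps = sum(snapshot.broker_tps.values())
    if baseline_tps > 0 and total_tps > TPS_SURGE_FACTOR * baseline_tps:
        alerts.append(f"send TPS surge: {total_tps:.0f} vs baseline {baseline_tps:.0f}")
    return alerts


# Made-up values: broker-b is missing, broker-a is nearly full and very busy.
snapshot = Snapshot(
    broker_tps={"broker-a": 4000.0},
    commitlog_disk_ratio={"broker-a": 0.85},
    accumulation={"order-consumers": 250_000.0, "audit-consumers": 12.0},
)

for message in evaluate_alerts(snapshot, {"broker-a", "broker-b"}, baseline_tps=1000.0):
    print(message)
```

The broker busy and slow-write rules are left out here because they depend on signals (client-side errors, write latency) that this snapshot does not model.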

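Below are some practical notes for Alibaba Cloud.

On Alibaba Cloud, the RocketMQ monitoring component has to be added in Prometheus monitoring.

For an Alibaba Cloud private-cloud style deployment, check whether the image has been pushed.

Once configured, a monitoring dashboard is included by default. It is in fact the official template, just using Alibaba Cloud's environment variables.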
