Building a RocketMQ Monitoring Dashboard
Posted 21aspnet
1. Official resources
https://github.com/apache/rocketmq-exporter
Official template
Grafana Dashboard ID: 10477, name: RocketMQ Exporter Overview. For details of the dashboard please see RocketMQ Exporter Overview.
2. Common metrics
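Type | Metric | Description |
Broker | rocketmq_broker_tps | Messages produced per second on a single broker |
Broker | rocketmq_broker_qps | QPS of a single broker (requests handled per second) |
Producer | rocketmq_producer_tps | Messages produced per second for a single topic (producer TPS) |
Producer | rocketmq_producer_message_size | Total size of the messages produced per second for a single topic |
Producer | rocketmq_producer_offset | Producer offset of a single topic |
Consumer Groups | rocketmq_consumer_tps | Messages consumed per second by a single consumer group (consumer TPS) |
Consumer Groups | rocketmq_consumer_message_size | Total size of the messages consumed per second by a single consumer group |
Consumer Groups | rocketmq_consumer_offset | Consumer offset of a single consumer group |
Consumer Groups | rocketmq_group_get_latency_by_storetime | Consumption delay time of a single consumer group |
Consumer Groups | rocketmq_group_get_latency | Consumer latency on a topic for a single queue |
Consumer Groups | rocketmq_message_accumulation | Number of messages a single consumer group is lagging behind |
Consumer | rocketmq_client_consume_fail_msg_count | Messages the consumer failed to consume within one hour |
Consumer | rocketmq_client_consume_fail_msg_tps | Messages the consumer failed to consume per second |
Consumer | rocketmq_client_consume_ok_msg_tps | Messages the consumer consumed successfully per second |
Consumer | rocketmq_client_consume_rt | Average time to consume one message |
Consumer | rocketmq_client_consumer_pull_rt | Average time to pull one message |
Consumer | rocketmq_client_consumer_pull_tps | Messages pulled by the client per second |
Container | container_cpu_usage_seconds_total | Container CPU usage (cumulative CPU seconds; take the rate to get utilization) |
Container | container_memory_usage_bytes | Memory currently in use |
Container | container_fs_usage_bytes | Container disk space in use |
Container | container_fs_writes_bytes_total | Disk write throughput (cumulative bytes written; take the rate) |
Container | container_fs_reads_bytes_total | Disk read throughput (cumulative bytes read; take the rate) |
rocketmq_brokeruntime | rocketmq_brokeruntime_commitlog_disk_ratio | |
rocketmq_brokeruntime | rocketmq_brokeruntime_consumequeue_disk_ratio | |
rocketmq_brokeruntime | rocketmq_brokeruntime_commitlogdir_capacity_free | |
rocketmq_brokeruntime | rocketmq_brokeruntime_commitlogdir_capacity_total | |
rocketmq_brokeruntime | rocketmq_brokeruntime_commitlog_maxoffset | |
rocketmq_brokeruntime | rocketmq_brokeruntime_commitlog_minoffset | |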
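Several of the container metrics in the table are monotonically increasing counters (the ones ending in `_total`), so a dashboard panel normally plots their per-second rate rather than the raw value. Below is a minimal Python sketch of that calculation over two samples, loosely what a `rate()`-style query does. The function names and the sample values are hypothetical, and the free-space ratio assumes the two `commitlogdir_capacity` metrics are reported in the same unit.

```python
from typing import Tuple

Sample = Tuple[float, float]  # (unix timestamp in seconds, counter value)


def per_second_rate(first: Sample, last: Sample) -> float:
    """Average per-second increase of a counter between two samples."""
    (t0, v0), (t1, v1) = first, last
    elapsed = t1 - t0
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    if v1 < v0:
        # The counter went backwards, e.g. after a container restart:
        # assume it restarted from zero and count only the new value.
        return v1 / elapsed
    return (v1 - v0) / elapsed


def free_ratio(capacity_free: float, capacity_total: float) -> float:
    """Free share of the commit log directory, between 0.0 and 1.0."""
    if capacity_total <= 0:
        raise ValueError("capacity_total must be positive")
    return capacity_free / capacity_total


# Made-up samples of container_cpu_usage_seconds_total taken 60 s apart:
# 30 CPU-seconds were used in 60 s, i.e. half a core on average.
cpu_cores_used = per_second_rate((0.0, 100.0), (60.0, 130.0))  # 0.5

# Made-up values for rocketmq_brokeruntime_commitlogdir_capacity_free / _total.
disk_free = free_ratio(25.0, 100.0)  # 0.25
```

The same `per_second_rate` helper applies to `container_fs_writes_bytes_total` and `container_fs_reads_bytes_total`, where the result is bytes per second.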
The following list ships with Alibaba Cloud's official configuration and is specific to Alibaba Cloud.
Metric | Category | Description |
rocketmq_broker_tps | Broker | Number of messages the broker produces per second |
rocketmq_broker_qps | Broker | Number of messages the broker consumes per second |
rocketmq_producer_tps | Producer | The number of messages produced per second per topic |
rocketmq_producer_message_size | Producer | The size of a message produced per second by a topic (in bytes) |
rocketmq_producer_offset | Producer | The progress of a topic's production message |
rocketmq_consumer_tps | Consumer Groups | The number of messages consumed per second by a consumer group |
rocketmq_consumer_message_size | Consumer Groups | The size of the message consumed by the consumer group per second (in bytes) |
rocketmq_consumer_offset | Consumer Groups | Progress of consumption message for a consumer group |
rocketmq_group_get_latency | Consumer Groups | Consumer latency on some topic for one queue |
rocketmq_group_get_latency_by_storetime | Consumer Groups | Consumption delay time of a consumer group |
rocketmq_message_accumulation | Consumer Groups | How far the consumer offset lags behind |
rocketmq_client_consume_fail_msg_count | Consumer | The number of messages that failed to be consumed in one hour |
rocketmq_client_consume_fail_msg_tps | Consumer | The number of messages that failed to be consumed per second |
rocketmq_client_consume_ok_msg_tps | Consumer | The number of messages consumed successfully per second |
rocketmq_client_consume_rt | Consumer | The average time of consuming every message |
rocketmq_client_consumer_pull_rt | Consumer | The average time of pulling every message |
rocketmq_client_consumer_pull_tps | Consumer | The number of messages pulled by client per second |
rocketmq_brokeruntime_pmdt_0to10ms | Broker | The number of put-message requests the broker responds to within 0 to 10 ms |
rocketmq_brokeruntime_pmdt_10to50ms | Broker | The number of put-message requests the broker responds to within 10 to 50 ms |
rocketmq_brokeruntime_pmdt_50to100ms | Broker | The number of put-message requests the broker responds to within 50 to 100 ms |
rocketmq_brokeruntime_pmdt_100to200ms | Broker | The number of put-message requests the broker responds to within 100 to 200 ms |
rocketmq_brokeruntime_pmdt_200to500ms | Broker | The number of put-message requests the broker responds to within 200 to 500 ms |
rocketmq_brokeruntime_pmdt_500to1s | Broker | The number of put-message requests the broker responds to within 500 ms to 1 s |
rocketmq_brokeruntime_pmdt_1to2s | Broker | The number of put-message requests the broker responds to within 1 to 2 s |
rocketmq_brokeruntime_pmdt_2to3s | Broker | The number of put-message requests the broker responds to within 2 to 3 s |
rocketmq_brokeruntime_pmdt_3to4s | Broker | The number of put-message requests the broker responds to within 3 to 4 s |
rocketmq_brokeruntime_pmdt_4to5s | Broker | The number of put-message requests the broker responds to within 4 to 5 s |
rocketmq_brokeruntime_pmdt_5to10s | Broker | The number of put-message requests the broker responds to within 5 to 10 s |
rocketmq_brokeruntime_pmdt_10stomore | Broker | The number of put-message requests the broker takes 10 s or more to respond to |
rocketmq_brokeruntime_query_threadpoolqueue_headwait_timemills | Broker | Query thread pool queue head element wait time |
rocketmq_brokeruntime_pull_threadpoolqueue_headwait_timemill | Broker | Pull thread pool queue head element wait time |
rocketmq_brokeruntime_send_threadpoolqueue_headwait_timemills | Broker | Send thread pool queue head element wait time |
rocketmq_brokeruntime_commitlog_disk_ratio | Broker | Broker commit log disk ratio |
rocketmq_brokeruntime_consumequeue_disk_ratio | Broker | Broker consume queue disk ratio |
rocketmq_brokeruntime_commitlogdir_capacity_free | Broker | Broker commit log dir capacity free |
rocketmq_brokeruntime_commitlogdir_capacity_total | Broker | Broker commit log dir capacity total |
rocketmq_brokeruntime_commitlog_maxoffset | Broker | Broker commit log max offset |
rocketmq_brokeruntime_msg_put_total_today_now | Broker | Broker msg put total today now |
rocketmq_brokeruntime_msg_gettotal_today_now | Broker | Broker msg get total today now |
rocketmq_brokeruntime_dispatch_behind_bytes | Broker | Broker dispatch behind bytes |
rocketmq_brokeruntime_put_message_size_total | Broker | Broker put message size total |
rocketmq_brokeruntime_put_message_average_size | Broker | Broker put message average size |
rocketmq_brokeruntime_msg_gettotal_yesterdaymorning | Broker | Broker msg get total yesterday morning |
rocketmq_brokeruntime_msg_gettotal_todaymorning | Broker | Broker msg get total today morning |
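The `rocketmq_brokeruntime_pmdt_*` metrics in the table form a latency distribution for put-message requests, which a dashboard can reduce to a single "share of slow writes" number. The Python sketch below shows one way to do that. It assumes each bucket holds the count for its own range only (not a cumulative count), which should be checked against the exporter in use; the function name, the threshold, and the sample counts are hypothetical.

```python
from typing import Dict

# Lower bound of each bucket in milliseconds, taken from the metric names.
PMDT_LOWER_BOUND_MS = {
    "rocketmq_brokeruntime_pmdt_0to10ms": 0,
    "rocketmq_brokeruntime_pmdt_10to50ms": 10,
    "rocketmq_brokeruntime_pmdt_50to100ms": 50,
    "rocketmq_brokeruntime_pmdt_100to200ms": 100,
    "rocketmq_brokeruntime_pmdt_200to500ms": 200,
    "rocketmq_brokeruntime_pmdt_500to1s": 500,
    "rocketmq_brokeruntime_pmdt_1to2s": 1000,
    "rocketmq_brokeruntime_pmdt_2to3s": 2000,
    "rocketmq_brokeruntime_pmdt_3to4s": 3000,
    "rocketmq_brokeruntime_pmdt_4to5s": 4000,
    "rocketmq_brokeruntime_pmdt_5to10s": 5000,
    "rocketmq_brokeruntime_pmdt_10stomore": 10000,
}


def slow_put_share(counts: Dict[str, float], threshold_ms: int) -> float:
    """Share of put-message responses in buckets starting at or above threshold_ms.

    Buckets missing from `counts` are treated as zero.
    Assumption: bucket values are per-range counts, not cumulative.
    """
    total = 0.0
    slow = 0.0
    for name, lower_ms in PMDT_LOWER_BOUND_MS.items():
        value = counts.get(name, 0.0)
        total += value
        if lower_ms >= threshold_ms:
            slow += value
    return slow / total if total > 0 else 0.0


# Made-up bucket counts: 2 of 100 responses took 500 ms or longer.
sample_counts = {
    "rocketmq_brokeruntime_pmdt_0to10ms": 90,
    "rocketmq_brokeruntime_pmdt_10to50ms": 8,
    "rocketmq_brokeruntime_pmdt_500to1s": 2,
}
share = slow_put_share(sample_counts, threshold_ms=500)
```

Because the threshold is compared with bucket lower bounds, it only makes sense to pick a value that coincides with a bucket edge such as 100, 500 or 1000.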
For more metrics, see the official source:
https://github.com/apache/rocketmq-exporter/blob/master/rocketmq_exporter_overview.json
Someone has built a more practical dashboard; just download the JSON:
"A ready-to-use set of RocketMQ monitoring dashboards and alert rules" (blog of 不识君的荒漠 on CSDN)
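The main alert rules are:
- A broker node is down
- Disk space is running low
- Broker busy alert
- Message backlog
- The broker takes too long to write messages (the broker may be under heavy IO pressure or short of memory)
- The cluster's send TPS has spiked (someone may be load-testing the cluster or writing a large batch of messages at once)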
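To make the rule list more concrete, here is a minimal Python sketch that evaluates four of these conditions against one snapshot of metric values. It is not the rule set from the linked blog post: every threshold is an illustrative assumption, the alert names are hypothetical, a missing `rocketmq_broker_tps` sample is simply treated as "broker down", and a real deployment would express these as Prometheus alerting rules instead.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Thresholds:
    # All values are illustrative assumptions, not recommendations.
    max_accumulation: float = 10_000    # backlog, in messages
    min_disk_free_ratio: float = 0.2    # commit log dir free / total
    max_tps_growth: float = 3.0         # current TPS / baseline TPS


def check_alerts(
    metrics: Dict[str, float],
    baseline_tps: float,
    limits: Thresholds = Thresholds(),
) -> List[str]:
    """Return the names of the alerts that fire for one metrics snapshot."""
    fired: List[str] = []

    tps = metrics.get("rocketmq_broker_tps")
    if tps is None:
        # Assumption: no sample at all means the broker is unreachable.
        fired.append("broker_down")
    elif baseline_tps > 0 and tps / baseline_tps > limits.max_tps_growth:
        fired.append("tps_spike")

    total = metrics.get("rocketmq_brokeruntime_commitlogdir_capacity_total", 0.0)
    free = metrics.get("rocketmq_brokeruntime_commitlogdir_capacity_free", 0.0)
    if total > 0 and free / total < limits.min_disk_free_ratio:
        fired.append("disk_space_low")

    if metrics.get("rocketmq_message_accumulation", 0.0) > limits.max_accumulation:
        fired.append("message_backlog")

    return fired


# Made-up snapshot: TPS is 5x the baseline, 10% disk free, 50,000 messages behind.
snapshot = {
    "rocketmq_broker_tps": 5000.0,
    "rocketmq_brokeruntime_commitlogdir_capacity_total": 100.0,
    "rocketmq_brokeruntime_commitlogdir_capacity_free": 10.0,
    "rocketmq_message_accumulation": 50_000.0,
}
alerts = check_alerts(snapshot, baseline_tps=1000.0)
```

The "broker busy" and "slow write" rules are left out here because they depend on metrics whose exact semantics vary by exporter version.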
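Below are some practical notes for Alibaba Cloud.
On Alibaba Cloud, the RocketMQ monitoring component has to be added under Prometheus monitoring.
If the deployment is an Alibaba Cloud private cloud or something similar, check whether the image has been pushed.
Once configured, a monitoring dashboard is included by default. It is in effect the official template, except that it uses Alibaba Cloud's environment variables.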