Alertmanager in Prometheus not starting

Posted: 2021-12-24 14:04:24

Question:

I configured Prometheus Alertmanager. The installation completed without errors, but systemctl status alertmanager.service reports a failure:

# systemctl status alertmanager.service
● alertmanager.service - Alertmanager for prometheus
     Loaded: loaded (/etc/systemd/system/alertmanager.service; enabled; vendor preset: enabled)
     Active: failed (Result: exit-code) since Fri 2021-11-12 07:15:08 UTC; 4min 50s ago
    Process: 1791 ExecStart=/opt/alertmanager/alertmanager --config.file=/opt/alertmanager/alertmanager.yml --storage.path=/opt/alertmanager/data (code=exited, status=1/FAILUR>
   Main PID: 1791 (code=exited, status=1/FAILURE)

Nov 12 07:15:08 localhost systemd[1]: alertmanager.service: Scheduled restart job, restart counter is at 5.
Nov 12 07:15:08 localhost systemd[1]: Stopped Alertmanager for prometheus.
Nov 12 07:15:08 localhost systemd[1]: alertmanager.service: Start request repeated too quickly.
Nov 12 07:15:08 localhost systemd[1]: alertmanager.service: Failed with result 'exit-code'.
Nov 12 07:15:08 localhost systemd[1]: Failed to start Alertmanager for prometheus.

My systemd service file for alertmanager.service is:

[Unit]
Description=Alertmanager for prometheus

[Service]
Restart=always
User=prometheus
ExecStart=/opt/alertmanager/alertmanager --config.file=/opt/alertmanager/alertmanager.yml --storage.path=/opt/alertmanager/data
ExecReload=/bin/kill -HUP $MAINPID
TimeoutStopSec=20s
SendSIGKILL=no

[Install]
WantedBy=multi-user.target

The logs say:

Nov 12 13:27:01 localhost alertmanager[1563]: level=warn ts=2021-11-12T13:27:01.483Z caller=cluster.go:177 component=cluster err="couldn't deduce an advertise address: no private IP found, explicit advertise addr not provided"
Nov 12 13:27:01 localhost alertmanager[1563]: level=error ts=2021-11-12T13:27:01.485Z caller=main.go:250 msg="unable to initialize gossip mesh" err="create memberlist: Failed to get final advertise address: No private IP address found, and explicit IP not provided"
Nov 12 13:27:01 localhost systemd[1]: alertmanager.service: Main process exited, code=exited, status=1/FAILURE
Nov 12 13:27:01 localhost systemd[1]: alertmanager.service: Failed with result 'exit-code'.

Any clues on how to solve this?

Answer 1:

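Do you want to run Alertmanager in HA (clustered) mode? It is enabled by default and requires the instance to have a private IP address as defined in RFC 6890, which is why the log reports "no private IP found".

If you do want HA, you can specify the address explicitly with the flag: alertmanager --cluster.advertise-address=<ip>

Otherwise, disable HA by giving the listen-address flag an empty value: alertmanager --cluster.listen-address=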
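As a minimal sketch of the second option, assuming a single Alertmanager instance and the same paths as in the question's unit file, the [Service] section could be changed so that ExecStart passes the empty flag. Only the trailing --cluster.listen-address= is new; everything else is copied from the question.

```ini
[Service]
Restart=always
User=prometheus
# Assumption: a single instance with no peers, so HA clustering is switched off
# by passing an empty value to --cluster.listen-address.
ExecStart=/opt/alertmanager/alertmanager --config.file=/opt/alertmanager/alertmanager.yml --storage.path=/opt/alertmanager/data --cluster.listen-address=
ExecReload=/bin/kill -HUP $MAINPID
TimeoutStopSec=20s
SendSIGKILL=no
```

After editing the unit, systemd needs to re-read it: run systemctl daemon-reload, then systemctl reset-failed alertmanager.service to clear the "Start request repeated too quickly" state, then systemctl start alertmanager.service.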
