Monitoring and Alerting: A Complete Guide to Deploying and Configuring ElastAlert

Posted wzxmt


I. Installing ElastAlert

1. Environment

  • CentOS: 7.4
  • Python: 3.6.9
  • pip: 19.3
  • elastalert: 0.2.1
  • ELK: 7.3.2

2. Setting up the Python 3.6.9 environment

Install the build dependencies:

yum -y install wget openssl openssl-devel gcc gcc-c++

Download the source tarball:

wget https://www.python.org/ftp/python/3.6.9/Python-3.6.9.tgz

Build and install:

tar xf Python-3.6.9.tgz
cd Python-3.6.9
./configure --prefix=/usr/local/python --with-openssl
make && make install

Configure:

mv /usr/bin/python /usr/bin/python_old
ln -s /usr/local/python/bin/python3 /usr/bin/python
ln -s /usr/local/python/bin/pip3 /usr/bin/pip
pip install --upgrade pip

# Note: every script that depends on Python 2 must now be pointed at python2.7, because the default python is 3.6. For example:

sed -i '1s/python/python2.7/g' /usr/bin/yum
sed -i '1s/python/python2.7/g' /usr/libexec/urlgrabber-ext-down

Verify:

$ python -V
    Python 3.6.9
$ pip -V 
    pip 19.3 from /usr/local/python/lib/python3.6/site-packages/pip (python 3.6)

3. Installing ElastAlert

Get the source:

git clone https://github.com/Yelp/elastalert.git
cd elastalert

Install:

pip install "elasticsearch<7,>6"
pip install -r requirements.txt
python setup.py install

After a successful install, four commands are available:

ll /usr/local/python/bin/elastalert*
    /usr/local/python/bin/elastalert
    /usr/local/python/bin/elastalert-create-index
    /usr/local/python/bin/elastalert-rule-from-kibana
    /usr/local/python/bin/elastalert-test-rule
ln -s /usr/local/python/bin/elastalert* /usr/bin
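  • elastalert-create-index creates the index in which ElastAlert stores its execution records; by default it is named elastalert_status. It holds four _types, each with its own @timestamp field, so the records can also be browsed in Kibana.
  • elastalert-rule-from-kibana reads the Filtering settings of a dashboard saved in Kibana 3 and helps generate the configuration in config.yaml. Note that it reads only the filtering, not the queries.
  • elastalert-test-rule tests the rule settings in a custom configuration.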

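II. Usage

Official documentation: https://elastalert.readthedocs.io

Rule documentation: https://elastalert.readthedocs.io/en/latest/ruletypes.html

1. Main configuration file

The template for the main configuration file is config.yaml.example. Create the global configuration from it:
vim config.yaml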

# Directory from which rules are loaded; the default is example_rules
rules_folder: rules
# How often ElastAlert queries Elasticsearch, i.e. how often the rules are evaluated
run_every:
  seconds: 30
# Time window covered by each query
buffer_time:
  seconds: 30
# Elasticsearch host and port
es_host: 10.2.13.3
es_port: 9200
# Index in Elasticsearch where ElastAlert writes its own records
writeback_index: elastalert_status
writeback_alias: elastalert_alerts
# How long to keep retrying alerts that failed to send
alert_time_limit:
  days: 2
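
Before starting ElastAlert it can help to sanity-check these settings. The following is a minimal sketch, assuming the YAML has already been loaded into a plain dict. The `config` literal mirrors the file above, while `REQUIRED_KEYS`, `missing_keys` and `as_timedelta` are hypothetical names chosen for this illustration, not part of ElastAlert.

```python
from datetime import timedelta

# Hypothetical in-memory copy of the config.yaml shown above.
config = {
    "rules_folder": "rules",
    "run_every": {"seconds": 30},
    "buffer_time": {"seconds": 30},
    "es_host": "10.2.13.3",
    "es_port": 9200,
    "writeback_index": "elastalert_status",
    "writeback_alias": "elastalert_alerts",
    "alert_time_limit": {"days": 2},
}

# Keys this sketch treats as mandatory (an assumption for the example).
REQUIRED_KEYS = (
    "rules_folder", "run_every", "buffer_time",
    "es_host", "es_port", "writeback_index",
)


def missing_keys(conf):
    """Return the required keys that are absent from conf."""
    return [key for key in REQUIRED_KEYS if key not in conf]


def as_timedelta(spec):
    """Turn a mapping such as {'seconds': 30} into a timedelta."""
    return timedelta(**spec)


print(missing_keys(config))                    # keys still to be filled in
print(as_timedelta(config["run_every"]))       # query interval
print(as_timedelta(config["alert_time_limit"]))  # retry window
```

The time settings map directly onto `datetime.timedelta` keyword arguments, which is why units such as seconds, minutes, hours and days can be used interchangeably in the file.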

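2. Creating the alert index

Run elastalert-create-index to create the index in ES. This step is optional but strongly recommended: the index is useful for auditing and testing, and it lets ElastAlert be restarted without affecting counting or alert delivery.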

Elastic Version: 7.3.2
Reading Elastic 6 index mappings:
Reading index mapping 'es_mappings/6/silence.json'
Reading index mapping 'es_mappings/6/elastalert_status.json'
Reading index mapping 'es_mappings/6/elastalert.json'
Reading index mapping 'es_mappings/6/past_elastalert.json'
Reading index mapping 'es_mappings/6/elastalert_error.json'
New index elastalert_status created
Done!

If you see this output, the indices were created. You can also confirm with a request:

curl '127.0.0.1:9200/_cat/indices?v'

health status index                     uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   elastalert_status_status  lh8LL4iCQeSn0afyzxBX7w   1   1          0            0       460b           230b
green  open   elastalert_status         i7B7IfCuSb2Sex8U5KoTZg   1   1          0            0       460b           230b
green  open   elastalert_status_past    et2aF44VR4WQnxB8T7zD4Q   1   1          0            0       460b           230b
green  open   elastalert_status_silence lhXHEsuUQeGZaW3cRLp5pQ   1   1          0            0       460b           230b
green  open   elastalert_status_error   zykwk4KtSyyOY7ckxQTrkA   1   1          0            0       460b           230b
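
The same check can be scripted. The sketch below assumes the listing was fetched with `?format=json` instead of `?v`, in which case `_cat/indices` returns a JSON array with one object per index. `writeback_indices` and the sample payload are hypothetical and only illustrate the filtering step.

```python
import json


def writeback_indices(cat_json, prefix="elastalert_status"):
    """Return the sorted index names that start with the writeback prefix."""
    rows = json.loads(cat_json)
    return sorted(row["index"] for row in rows if row["index"].startswith(prefix))


# Sample payload shaped like the output of /_cat/indices?format=json
# (only the fields used here; the nginx index is made up for contrast).
sample = json.dumps([
    {"health": "green", "status": "open", "index": "elastalert_status_status"},
    {"health": "green", "status": "open", "index": "elastalert_status"},
    {"health": "green", "status": "open", "index": "elastalert_status_past"},
    {"health": "green", "status": "open", "index": "elastalert_status_silence"},
    {"health": "green", "status": "open", "index": "elastalert_status_error"},
    {"health": "green", "status": "open", "index": "nginx_info-2020.01.11"},
])

for name in writeback_indices(sample):
    print(name)
```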

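3. Rule configuration

All alert rules are defined by creating configuration files in the rules directory. Here is a simple one as a demonstration.

I have already set up an nginx log collection pipeline in the ELK cluster. In Kibana, use a search query to filter out the content to alert on. For example, to trigger an alert for requests with status code 404, query with: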

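groups: nginx AND response: 404

Here groups is the group defined in Kafka and the second term is the status code. More conditions can be added to narrow the match.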

Then add a rule on the server:

vim nginx_404.yaml

name: Nginx_err
use_strftime_index: true
index: nginx_info*
type: any
aggregation:
 seconds: 10
filter:
- query:
    query_string:
      query: "groups: nginx AND response: 404"
alert:
- "email"
email:
 - "test@qq.com"
smtp_host: smtp.163.com
smtp_port: 25
smtp_auth_file: /opt/elastalert/smtp_auth_file.yaml
from_addr: test01@163.com
email_reply_to: teast02@163.com

Note that the email settings reference an external file, which stores the username and password of the sending mailbox.

vim /opt/elastalert/smtp_auth_file.yaml

user: "test01@163.com"
password: "xxxxxxx"

4. Testing the rule

With the rule in place, test it with ElastAlert's own command.

elastalert-test-rule --config config.yaml nginx_404.yaml

INFO:elastalert:Note: In debug mode, alerts will be logged to console but NOT actually sent.
            To send them but remain verbose, use --verbose instead.
Didn't get any results.
INFO:elastalert:Note: In debug mode, alerts will be logged to console but NOT actually sent.
                To send them but remain verbose, use --verbose instead.
1 rules loaded
.........
elastalert_status - {'rule_name': 'Nginx_err', 'endtime': datetime.datetime(2020, 1, 11, 7, 30, 59, 793352, tzinfo=tzutc()), 'starttime': datetime.datetime(2020, 1, 10, 7, 30, 59, 793352, tzinfo=tzutc()), 'matches': 0, 'hits': 0, '@timestamp': datetime.datetime(2020, 1, 11, 7, 31, 0, 76042, tzinfo=tzutc()), 'time_taken': 0.24003815650939941}

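If no errors are reported, the rule is usable.

5. Starting ElastAlert

There are two ways to start it.

(1) Specify the path of a rule file

python -m elastalert.elastalert --verbose --config config.yaml --rule rules/nginx_404.yaml

(2) Load every rule from the rules directory set by rules_folder in the global config.yaml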

python -m elastalert.elastalert --verbose

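6. Verification

Once the service is running, its log shows the whole process clearly. Now request a few non-existent endpoints on the site that feeds the index to produce some 404 responses. After a short while the log should report that an alert was sent, and the email should arrive as well.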
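
Generating the 404s can be scripted. This is a minimal sketch using only the standard library. `missing_paths` and `probe` are hypothetical helpers, and the base URL passed to `probe` is whatever nginx site feeds the index (not given in this article), so the network call is left for you to invoke.

```python
import urllib.error
import urllib.request
import uuid


def missing_paths(count):
    """Build paths that almost certainly do not exist on the server."""
    return ["/no-such-page-" + uuid.uuid4().hex for _ in range(count)]


def probe(base_url, paths):
    """Request each path and return the HTTP status codes that came back."""
    statuses = []
    for path in paths:
        try:
            with urllib.request.urlopen(base_url + path, timeout=5) as resp:
                statuses.append(resp.status)
        except urllib.error.HTTPError as err:
            # urllib raises for 4xx/5xx; the code is what we want to record.
            statuses.append(err.code)
    return statuses


paths = missing_paths(5)
print(paths)
# Example call (replace the address with your own nginx site):
# print(probe("http://127.0.0.1", paths))
```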

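III. Tuning

1. Startup method

The commands above run ElastAlert in the foreground, which is not suitable for long-running use. You can start it with nohup, or manage it with supervisord, which is more convenient.
Installing supervisord is not covered here.

Create the configuration file:

$ cat /etc/supervisord.d/elastalert1.ini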
[program:elastalert1]
directory=/data/elastalert1/
command=python -m elastalert.elastalert --verbose --config /data/elastalert1/config.yaml
process_name=elastalert1
autorestart=true
startsecs=15
stopsignal=INT
stopasgroup=true
killasgroup=true
redirect_stderr=true
stdout_logfile=/data/log/elastalert1.log
stdout_logfile_maxbytes=5MB

Then start it:

supervisorctl update 
supervisorctl start elastalert1

2. Alert channels

ElastAlert supports many alert channels, such as email, WeChat, DingTalk and HTTP POST. The most common ones are covered below.

(1) Email alerts

alert:
- "email"
email:
 - "test@qq.com"
smtp_host: smtp.163.com
smtp_port: 25
smtp_auth_file: /opt/elastalert/smtp_auth_file.yaml
from_addr: test01@163.com
email_reply_to: teast02@163.com

Update the credentials in /opt/elastalert/smtp_auth_file.yaml.

(2) WeChat robot alerts

WeChat alert plugin:

git clone https://github.com/anjia0532/elastalert-wechat-plugin.git
cp -r elastalert-wechat-plugin/elastalert_modules/* elastalert_modules/

Add the alerter to the rule:

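alert:
- "elastalert_modules.wechat_qiye_alert.WeChatAlerter"
# In the admin console: [Settings] -> [Permission Management] -> [General Admin Group] -> [Create and set contacts and app permissions] -> [CorpID, Secret]
# appid of the WeChat enterprise account
corp_id: xxx
# Secret of the WeChat enterprise account
secret: xxx
# In the admin console: [App Center] -> [select the app] -> [App ID]
# Application id of the WeChat enterprise account
agent_id: xx
# Department id
party_id: xx
# User WeChat id
user_id: xx
# Tag id; separate multiple ids with |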

(3) DingTalk alerts

DingTalk alert plugin:

git clone https://github.com/xuyaoqiang/elastalert-dingtalk-plugin.git
cp elastalert-dingtalk-plugin/elastalert_modules/dingtalk_alert.py elastalert_modules/

Add the alerter to the rule:

alert:
- "elastalert_modules.dingtalk_alert.DingTalkAlerter"
dingtalk_webhook: "https://oapi.dingtalk.com/robot/send?access_token=fb6500f4c85b8cfe66fa9586870f3ce16c848eab1e1cb23110388d6d443f1e"
dingtalk_msgtype: text
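
To check the webhook independently of ElastAlert, a text message can be posted to it directly. The sketch below assumes the DingTalk custom-robot text format (`msgtype` plus a `text.content` field). `build_text_message` and `send` are hypothetical helpers and do not show the plugin's internals.

```python
import json
import urllib.request


def build_text_message(content):
    """Build the JSON body for a DingTalk robot message of type text."""
    return {"msgtype": "text", "text": {"content": content}}


def send(webhook_url, message):
    """POST the message to the webhook and return the decoded JSON reply."""
    data = json.dumps(message).encode("utf-8")
    request = urllib.request.Request(
        webhook_url,
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


message = build_text_message("ElastAlert test: Nginx_err matched")
print(json.dumps(message))
# Example call (use the dingtalk_webhook value from your rule):
# print(send("https://oapi.dingtalk.com/robot/send?access_token=...", message))
```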

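3. Alert frequency

# Number of events that must occur within the time window (for rules of type frequency)
num_events: 3
# Used together with the parameter above: alert when 3 events occur within 2 minutes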
timeframe:
  minutes: 2
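
The num_events/timeframe pair describes a sliding window. The following is a simplified sketch of that idea, assuming events arrive in timestamp order. `FrequencyWindow` is a hypothetical class written for illustration and is not ElastAlert's implementation.

```python
from collections import deque
from datetime import datetime, timedelta


class FrequencyWindow:
    """Report a match once num_events events fall inside timeframe."""

    def __init__(self, num_events, timeframe):
        self.num_events = num_events
        self.timeframe = timeframe
        self.events = deque()

    def add(self, timestamp):
        """Record one event; return True when the threshold is reached."""
        self.events.append(timestamp)
        # Drop events that are older than the window.
        while self.events and timestamp - self.events[0] > self.timeframe:
            self.events.popleft()
        if len(self.events) >= self.num_events:
            self.events.clear()  # start counting again after a match
            return True
        return False


window = FrequencyWindow(num_events=3, timeframe=timedelta(minutes=2))
start = datetime(2020, 1, 11, 7, 30)
for offset in (0, 60, 90):
    fired = window.add(start + timedelta(seconds=offset))
    print(offset, fired)
```

Three events within 90 seconds reach the threshold, whereas the same three events spaced three minutes apart never would.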

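4. Avoiding repeated alerts

To avoid repeated alerts within a given period, configure the two options realert and exponential_realert:

# The same alert is not sent again within 5 minutes
realert:
  minutes: 5
# Grow the realert interval exponentially: if alerts keep firing,
# the interval increases 5 -> 10 -> 20 -> 40 -> 60 minutes, up to the configured maximum;
# if alerts then become less frequent, it gradually returns to the original realert value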
exponential_realert:
  hours: 1
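
The 5 -> 10 -> 20 -> 40 -> 60 progression is a doubling interval capped at exponential_realert. The sketch below reproduces only that progression, assuming alerts keep firing back to back. `silence_schedule` is a hypothetical helper, and the recovery back toward the base interval is not modelled.

```python
from datetime import timedelta


def silence_schedule(realert, exponential_realert, alerts):
    """Return the silence interval applied after each consecutive alert."""
    waits = []
    for exponent in range(alerts):
        wait = realert * (2 ** exponent)               # double each time
        waits.append(min(wait, exponential_realert))   # cap at the maximum
    return waits


schedule = silence_schedule(timedelta(minutes=5), timedelta(hours=1), alerts=6)
print([int(wait.total_seconds() // 60) for wait in schedule])
```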

5. Aggregating identical alerts

# Group alerts that share the same value of name
aggregation_key: name
# In the aggregated alert, show only name and message
summary_table_fields:
  - name
  - message
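
Conceptually, aggregation_key buckets matches by one field and summary_table_fields picks the columns shown for each bucket. The following is a minimal sketch over plain dicts. `aggregate` and `summary_rows` are hypothetical names, and the sample matches are invented for the example.

```python
from collections import defaultdict


def aggregate(matches, key):
    """Group matches by the value of key."""
    groups = defaultdict(list)
    for match in matches:
        groups[match.get(key)].append(match)
    return dict(groups)


def summary_rows(group, fields):
    """Keep only the summary fields of every match in a group."""
    return [tuple(match.get(field) for field in fields) for match in group]


matches = [
    {"name": "api", "message": "timeout", "host": "web01"},
    {"name": "api", "message": "reset", "host": "web02"},
    {"name": "db", "message": "slow query", "host": "db01"},
]

for name, group in aggregate(matches, "name").items():
    print(name, summary_rows(group, ["name", "message"]))
```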

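6. Formatting alert content

Alert content can be customized. Internally this is implemented with Python's str.format, so positional placeholders start at {0}.

alert_subject: "Error {0} @{1}"
alert_subject_args:
  - name
  - "@timestamp"
alert_text_type: alert_text_only
alert_text: |
  > Name: {0}
  > Message: {1}
  > Host: {2} ({3})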
alert_text_args:
  - name
  - message
  - hostname
  - host
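
The substitution can be tried outside ElastAlert. The following is a minimal sketch, assuming a flat match document. `render` is a hypothetical helper, and the placeholder used for absent fields is this example's own choice. Note that `str.format` numbers positional arguments from 0.

```python
def render(template, arg_names, match, missing="<MISSING VALUE>"):
    """Fill template with the match fields listed in arg_names, in order."""
    values = [match.get(name, missing) for name in arg_names]
    return template.format(*values)


match = {
    "name": "api",
    "@timestamp": "2020-01-11T07:30:59Z",
    "message": "upstream timed out",
    "hostname": "web01",
}

print(render("Error {0} @{1}", ["name", "@timestamp"], match))
print(render("> Host: {0} ({1})", ["hostname", "host"], match))
```

A template that refers to an index with no corresponding argument raises IndexError, which is why the number of placeholders has to line up with the args list.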

Finally, here is a fairly complete rule file:

name: test_err
use_strftime_index: true
index: filebeat-7.3.2-*
type: any

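# Combine multiple matches into one alert. Each time a match is found, ElastAlert waits for the aggregation period and sends all matches for this rule that occurred during it together.
aggregation:
  seconds: 10
# Number of events that must occur within the time window
num_events: 3
# Used together with the parameter above: the time window
timeframe:
  minutes: 2

# The same alert is not sent again within 5 minutes
realert:
  minutes: 5
# Grow the realert interval exponentially: if alerts keep firing,
# the interval increases 5 -> 10 -> 20 -> 40 -> 60 minutes, up to the configured maximum;
# if alerts then become less frequent, it gradually returns to the original realert value
exponential_realert:
  hours: 1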

filter:
- query:
    query_string:
      query: "404"

alert:
- "email"
# The alert_text defined here appears in the email body
alert_text: "You have an error message!"

# SMTP server; the auth file needs two properties, user and password
smtp_host: smtp.163.com
smtp_port: 25
smtp_auth_file: /opt/elastalert/smtp_auth_file.yaml

# Sender address
from_addr: test@163.com
# Reply-To address
email_reply_to: test@163.com

# Recipients of the alert email
email:
 - "test04@163.com"

IV. Examples

1. Monitoring logs for web attacks

1.1 Changing the nginx log format

log_format logstash_json '{"time": "$time_local", '
         '"remote_addr": "$remote_addr", '
         '"remote_user": "$remote_user", '
         '"request": "$request", '
         '"status": "$status", '
         '"body_bytes_sent": "$body_bytes_sent", '
         '"http_referer": "$http_referer", '
         '"http_user_agent": "$http_user_agent", '
         '"http_x_forwarded_for": "$http_x_forwarded_for", '
         '"request_time": "$request_time", '
         '"request_length": "$request_length", '
         '"host": "$http_host"}';
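
Each log line written in this format is a JSON object in which every value is a string. The sketch below parses one such line and converts the numeric fields. `parse_line` and the sample line are hypothetical. Values containing double quotes would break the JSON unless nginx escapes them, for example with the `escape=json` parameter of log_format on versions that support it.

```python
import json

# A made-up line in the logstash_json format (only some fields shown).
sample_line = (
    '{"time": "11/Jan/2020:15:30:59 +0800", "remote_addr": "10.0.0.8", '
    '"request": "GET /missing HTTP/1.1", "status": "404", '
    '"body_bytes_sent": "153", "request_time": "0.005", "host": "example.com"}'
)


def parse_line(line):
    """Decode one log line and convert the numeric fields from strings."""
    record = json.loads(line)
    record["status"] = int(record["status"])
    record["body_bytes_sent"] = int(record["body_bytes_sent"])
    record["request_time"] = float(record["request_time"])
    return record


record = parse_line(sample_line)
print(record["status"], record["body_bytes_sent"], record["request_time"])
```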

1.2 Writing the monitoring rule

name: web attack
realert:
  minutes: 5
index: logstash-*
type: frequency
num_events: 10
timeframe:
  minutes: 1

query_key:
  - name
exponential_realert:
  hours: 1

filter:
- query_string:
# sql insert  xss detect
        query: "request: select.+(from|limit) OR request: union(.*?)select OR request: into.+(dump|out)file OR
        request: (base64_decode|sleep|benchmark|and.+1=1|and.+1=2|or%20|exec|information_schema|where%20|union%20|%2ctable_name%20|cmdshell|table_schema) OR
        request: (iframe|script|body|img|layer|div|meta|style|base|object|input|onmouseover|onerror|onload) OR
        request: .+etc.+passwd OR http_user_agent:(HTTrack|harvest|audit|dirbuster|pangolin|nmap|sqln|-scan|hydra|Parser|libwww|BBBike|sqlmap|w3af|owasp|Nikto|fimap|havij|PycURL|zmeu|BabyKrokodil|netsparker|httperf|bench) OR
        status: (400|404|500|501)
        NOT (request:_health.html OR remote_addr:222.222.222.222  )
        "

smtp_host: smtp.qiye.163.com
smtp_port: 25
smtp_auth_file: /opt/elastalert/smtp_auth_file.yaml
email_reply_to: xxx@163.com
from_addr: xxx@163.com

alert:
- "email"
email:
- "shystartree@163.com"

alert_subject: "web attack may be by {0} at @{1}"
alert_subject_args:
  - remote_addr
  - time
alert_text_type: alert_text_only
alert_text: |
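  Hello, the server ({}) may be under a web attack. Please take action to block it!
  ### Requests matched before this email was sent: {}
  > Time of occurrence: {}
  > timestamp: {}
  > attacker's ip: {}
  > request: {}
  > status: {}
  > User-Agent: {}
  >>> Source: {}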

alert_text_args:
  - host
  - num_hits
  - time
  - "@timestamp"
  - remote_addr
  - request
  - status
  - http_user_agent
  - source
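
To get a feel for which requests such a rule is aimed at, a few of its patterns can be tried offline against sample log records. The sketch below uses Python's `re` module with a small subset of the expressions from the rule. Python regex syntax is not the same as Elasticsearch query_string syntax, so this is an approximation for experimenting, and `looks_suspicious` is a hypothetical helper.

```python
import re

# A subset of the request patterns from the rule above.
REQUEST_PATTERNS = [
    re.compile(pattern, re.IGNORECASE)
    for pattern in (
        r"select.+(from|limit)",
        r"union(.*?)select",
        r"into.+(dump|out)file",
        r".+etc.+passwd",
    )
]
# A subset of the scanner user agents from the rule above.
AGENT_PATTERN = re.compile(r"sqlmap|nmap|nikto|dirbuster|hydra", re.IGNORECASE)


def looks_suspicious(record):
    """Return True if the request or user agent matches a known pattern."""
    request = record.get("request", "")
    agent = record.get("http_user_agent", "")
    if any(pattern.search(request) for pattern in REQUEST_PATTERNS):
        return True
    return bool(AGENT_PATTERN.search(agent))


samples = [
    {"request": "GET /index.php?id=1 union select 1,2 HTTP/1.1"},
    {"request": "GET /../../etc/passwd HTTP/1.1"},
    {"request": "GET /index.html HTTP/1.1", "http_user_agent": "Mozilla/5.0"},
    {"request": "GET / HTTP/1.1", "http_user_agent": "sqlmap/1.4"},
]
for sample in samples:
    print(looks_suspicious(sample), sample["request"])
```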

2. Sending an email when total traffic within five minutes exceeds 200 MB

run_every:
  minutes: 5

name: flow
type: metric_aggregation
index: nginx_info
buffer_time:
  minutes: 5

metric_agg_key: body_bytes_sent
metric_agg_type: sum
max_threshold: 209715200
use_run_every_query_size: true

alert_text_type: alert_text_only
alert_subject: "Alert: traffic in the last five minutes exceeded 200 MB!"
alert_text: |
  Total traffic in the last five minutes: {0} B
  kibana url: http://xxxxx

alert_text_args:
  - metric_body_bytes_sent_sum

smtp_host: smtp.qq.com
smtp_port: 25
smtp_auth_file: /opt/elastalert/smtp_auth_file.yaml
from_addr: "xxxx@qq.com"
alert:
- "email"
email:
- "xxxx@qq.com"
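
The threshold 209715200 is 200 MB expressed in bytes (200 × 1024 × 1024), and the rule compares the sum of body_bytes_sent over the window against it. The sketch below shows that arithmetic on a list of records. `total_bytes` and `exceeds` are hypothetical helpers, and in the real rule the sum is computed by an Elasticsearch aggregation.

```python
MAX_THRESHOLD = 200 * 1024 * 1024  # 200 MB in bytes


def total_bytes(records):
    """Sum body_bytes_sent over the records (values may be strings)."""
    return sum(int(record["body_bytes_sent"]) for record in records)


def exceeds(records, max_threshold=MAX_THRESHOLD):
    """Return True when the summed traffic is above the threshold."""
    return total_bytes(records) > max_threshold


window = [
    {"body_bytes_sent": str(150 * 1024 * 1024)},
    {"body_bytes_sent": str(60 * 1024 * 1024)},
]
print(MAX_THRESHOLD, total_bytes(window), exceeds(window))
```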

3. Sending an email when a backend request takes more than 3 seconds

es_host: 192.168.20.6
es_port: 9200
run_every:
  seconds: 30
name: xxx_reponse_time
index: n-xxx-*
type: whitelist
compare_key: "request"
ignore_null: true
whitelist:
  - /index.html
  - /siteapp/ecsAuthentication/hasAuthentication

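# Note: type is set twice in this rule (whitelist above, frequency here); YAML loaders that accept duplicate keys keep the last value
type: frequency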
num_events: 1
timeframe:
    seconds: 30
filter:
- query_string:
   query: "upstream_response_time: >3 "

alert_text_type: alert_text_only
alert_subject: "Alert: {0} backend processing took more than 3 seconds!"
alert_subject_args:
- _index

html_table_title: "<h2>This is a heading</h2>"
alert_text: |
  timestamp: {0}
  request_method: {1}
  request: {2}
  request_body: {3}
  request_time: {4} s
  upstream_response_time: {5} s
  body_bytes_sent: {6} B
  status: {7}
  remote_addr: {8}
  http_x_forwarded_for: {9}
  upstream_addr: {10}
  agent: {11}

alert_text_args:
  - timestamp
  - request_method
  - request
  - request_body
  - request_time
  - upstream_response_time
  - body_bytes_sent
  - status
  - remote_addr
  - http_x_forwarded_for
  - upstream_addr
  - agent

smtp_host: smtp.qq.com
smtp_port: 25
smtp_auth_file: /opt/elastalert/rule_templates/smtp_auth_file.yaml
from_addr: "xxx@qq.com"
alert:
- "email"
email:
- "xxxxx@qq.com"
