ELK Notes 4 -- Grok Regex Parsing

Posted by 昕光xg




1 Grok splitting approach

Grok splitting rules can be worked out along the following lines:
1) Pick a reliable delimiter and extract the fields one by one, working outward from it to the left or right. Regex metacharacters (such as `[`, `(`, or `.`) must be escaped; otherwise, using them as delimiters easily breaks the parse.
2) Alternatively, simply extract the fields one by one from left to right.
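Grok is essentially a layer of named patterns over regular expressions, so the field-by-field approach above can be prototyped with plain Python `re` before moving a pattern into Logstash. The mapping below (`DATA` → lazy `.*?`, `GREEDYDATA` → greedy `.*`, and so on) is a simplified sketch of a few common patterns, not the real grok-patterns library:

```python
import re

# Simplified stand-ins for a few common grok patterns
# (the real grok-patterns file defines many more, and more precisely).
GROK_TYPES = {
    "DATA": r".*?",
    "GREEDYDATA": r".*",
    "NUMBER": r"\d+(?:\.\d+)?",
    "IP": r"\d{1,3}(?:\.\d{1,3}){3}",
    "LOGLEVEL": r"[A-Za-z]+",
}

def grok_to_regex(pattern: str) -> str:
    """Translate %{TYPE:name} tokens into named capture groups."""
    def repl(m):
        gtype, name = m.group(1), m.group(2)
        return "(?P<{}>{})".format(name, GROK_TYPES[gtype])
    return re.sub(r"%\{(\w+):(\w+)\}", repl, pattern)

# Translate a small pattern, then match with the stdlib re module.
regex = grok_to_regex(r"%{IP:client} - - \[%{DATA:time}\] %{NUMBER:status}")
m = re.match(regex, "1.2.3.4 - - [19/Apr/2020:10:40:59 +0800] 404")
print(m.groupdict())
```

Testing a translated pattern this way makes it easy to see where a lazy `DATA` stops and a greedy `GREEDYDATA` keeps going, which is the root cause of most of the pitfalls in the cases below.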

2 Grok splitting examples

  1. Case 1
     Content:
2016/04/27 12:22:50 OSPF: AdjChg: Nbr 220.220.220.220 on g-or2-a0bjt:10.61.61.61: Init -> Deleted (InactivityTimer)

Pattern:

%{DATA:timestamp} OSPF: %{DATA:type}: Nbr %{DATA:neighborip} on %{DATA:interface}:%{DATA:ip}: %{DATA:srcstat} -> %{GREEDYDATA:data}

Note: there must be a space before OSPF, otherwise the space ends up inside timestamp; there must also be a space before on, otherwise the parse fails.
Result:


"data": "Deleted (InactivityTimer)",
"neighborip": "220.220.220.220",
"srcstat": "Init",
"ip": "10.61.61.61",
"type": "AdjChg",
"interface": "g-or2-a0bjt",
"timestamp": "2016/04/27 12:22:50"
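In a Logstash pipeline, this pattern would sit inside a grok filter. A minimal sketch of the filter block only (the input/output plugins and any date handling are omitted):

```conf
filter {
  grok {
    match => {
      "message" => "%{DATA:timestamp} OSPF: %{DATA:type}: Nbr %{DATA:neighborip} on %{DATA:interface}:%{DATA:ip}: %{DATA:srcstat} -> %{GREEDYDATA:data}"
    }
  }
}
```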
  2. Case 2
     Content:
[Jul 11 10:22:59][123.123.123.123]<14>[2016-07-11 10:22:59,591][client.log][INFO]bak found in cache, skip it, test_data_2035_20160711_0500

Pattern 1:

\[%{DATA:head}]\[%{DATA:clientip}]<%{NUMBER:pid}>\[%{GREEDYDATA:ts}]\[%{DATA:logtype}]\[%{LOGLEVEL:level}]%{GREEDYDATA:data}

Note: the `[` characters must be escaped.
Result:


"head": "Jul 11 10:22:59",
"logtype": "client.log",
"data": "bak found in cache, skip it, test_data_2035_20160711_0500",
"level": "INFO",
"clientip": "123.123.123.123",
"pid": "14",
"ts": "2016-07-11 10:22:59,591"

Pattern 2: drop the redundant second timestamp

\[%{DATA:head}]\[%{DATA:clientip}]<%{NUMBER:pid}>\[2016-07-11 10:22:59,591]\[%{DATA:logtype}]\[%{LOGLEVEL:level}]%{GREEDYDATA:data}
or
\[%{DATA:head}]\[%{DATA:clientip}]<%{NUMBER:pid}>\[.*]\[%{DATA:logtype}]\[%{LOGLEVEL:level}]%{GREEDYDATA:data}

Result:


"head": "Jul 11 10:22:59",
"logtype": "client.log",
"data": "bak found in cache, skip it, test_data_2035_20160711_0500",
"level": "INFO",
"clientip": "123.123.123.123",
"pid": "14"
  3. Case 3: parsing syslog logs
     Content:
Apr 19 12:56:07 xg dbus-daemon[1537]: [session uid=1000 pid=1537] Successfully activated service org.freedesktop.Tracker1

Pattern:

%{GREEDYDATA:timestamp} %{DATA:user} %{DATA:app}\[%{NUMBER:pid}]: %{GREEDYDATA:content}

Note: here the `[` and `]` characters can be used to pin down the fields around them, then the remaining fields are taken working backwards one by one; the leading timestamp can simply be matched with GREEDYDATA.
Result:


"app": "dbus-daemon",
"pid": "1537",
"user": "xg",
"content": "[session uid=1000 pid=1537] Successfully activated service org.freedesktop.Tracker1",
"timestamp": "Apr 19 12:56:07"
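The anchoring described in the note can be sanity-checked with an equivalent Python named-group regex. This is a hand-translated sketch of the grok pattern above (`GREEDYDATA` → greedy `.*`, `DATA` → lazy `.*?`, `NUMBER` → `\d+`), not Logstash itself:

```python
import re

# Hand-translated equivalent of:
# %{GREEDYDATA:timestamp} %{DATA:user} %{DATA:app}\[%{NUMBER:pid}]: %{GREEDYDATA:content}
pattern = re.compile(
    r"(?P<timestamp>.*) (?P<user>.*?) (?P<app>.*?)\[(?P<pid>\d+)\]: (?P<content>.*)"
)

line = ("Apr 19 12:56:07 xg dbus-daemon[1537]: "
        "[session uid=1000 pid=1537] Successfully activated service org.freedesktop.Tracker1")
fields = pattern.match(line).groupdict()
print(fields["app"], fields["pid"])
```

Note that only `dbus-daemon[1537]:` is followed by `]: `, so the `\[(?P<pid>\d+)\]: ` anchor is unambiguous even though the message body also contains `pid=1537]`.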
  4. Case 4: parsing nginx logs
     Content:
120.123.123.123 - - [19/Apr/2020:10:40:59 +0800] "GET /hello HTTP/1.1" 404 200 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.92 Safari/537.36"

Pattern:

%{IP:server_name} %{DATA:holder1} %{DATA:remote_user} \[%{DATA:localtime}] "%{DATA:request}" %{NUMBER:req_status} %{NUMBER:upstream_status} "%{DATA:holder2}" %{GREEDYDATA:agent}

Result:


"localtime": "19/Apr/2020:10:40:59 +0800",
"server_name": "120.123.123.123",
"request": "GET /hello HTTP/1.1",
"agent": "\"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.92 Safari/537.36\"",
"req_status": "404",
"remote_user": "-",
"upstream_status": "200",
"holder2": "-",
"holder1": "-"
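As a cross-check, the same split can be reproduced with a Python named-group regex. This is a hand-translated sketch of the grok pattern (`IP` → dotted quad, `DATA` → `\S+` or lazy `.*?` as appropriate, `NUMBER` → `\d+`):

```python
import re

# Hand-translated equivalent of the nginx grok pattern above.
pattern = re.compile(
    r'(?P<server_name>\d{1,3}(?:\.\d{1,3}){3}) (?P<holder1>\S+) (?P<remote_user>\S+) '
    r'\[(?P<localtime>.*?)\] "(?P<request>.*?)" (?P<req_status>\d+) '
    r'(?P<upstream_status>\d+) "(?P<holder2>.*?)" (?P<agent>.*)'
)

line = ('120.123.123.123 - - [19/Apr/2020:10:40:59 +0800] "GET /hello HTTP/1.1" '
        '404 200 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 '
        '(KHTML, like Gecko) Chrome/81.0.4044.92 Safari/537.36"')
fields = pattern.match(line).groupdict()
print(fields["request"], fields["req_status"])
```

Because `agent` is a trailing `GREEDYDATA`, it keeps its surrounding quotes, which matches the result above.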
  5. Case 5
     Content (reconstructed from the result fields below):
[2020-04-29 21:37:54][audio-mgr][production][ INFO][apiserver][search.py:29]keyword: , pageNo: 1

Pattern:

\[%{DATA:ts}]\[%{DATA:ns}]\[%{DATA:env}]\[%{DATA:logstash_level}]\[%{DATA:service}]\[%{DATA:filename}:%{NUMBER:lineno}]%{GREEDYDATA:msg}

Result:


"msg": "keyword: , pageNo: 1",
"filename": "search.py",
"lineno": "29",
"ns": "audio-mgr",
"service": "apiserver",
"env": "production",
"ts": "2020-04-29 21:37:54",
"logstash_level": " INFO"
  6. Case 6
     Content:
2021-01-12T17:38:53.800474Z stdout F 2021-01-12 17:38:53,800 INFO: [Log.py:50] [MainProcess:20 MainThread] - init logger

Pattern:

%{DATA:timestamp} %{DATA:stdtype} F %{DATA:dt2} %{DATA:time2} %{DATA:loglevel}: \[%{DATA:file}] \[%{DATA:function}] - %{GREEDYDATA:msg}

Result:


"msg": "init logger",
"time2": "17:38:53,800",
"dt2": "2021-01-12",
"file": "Log.py:50",
"loglevel": "INFO",
"function": "MainProcess:20 MainThread",
"stdtype": "stdout",
"timestamp": "2021-01-12T17:38:53.800474Z"
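The same sanity check works for this container log. Below is a hand-translated sketch of the grok pattern, keeping the literal `F` separator; the sample line includes the trailing `- init logger` segment implied by the `msg` field in the result:

```python
import re

# Hand-translated equivalent of the case-6 grok pattern above.
pattern = re.compile(
    r"(?P<timestamp>\S+) (?P<stdtype>\S+) F (?P<dt2>\S+) (?P<time2>\S+) "
    r"(?P<loglevel>\S+): \[(?P<file>.*?)\] \[(?P<function>.*?)\] - (?P<msg>.*)"
)

line = ("2021-01-12T17:38:53.800474Z stdout F 2021-01-12 17:38:53,800 "
        "INFO: [Log.py:50] [MainProcess:20 MainThread] - init logger")
fields = pattern.match(line).groupdict()
print(fields["loglevel"], fields["msg"])
```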
  7. Case 7: parsing ingress logs
     This case parses ingress-nginx logs; the field names follow the log-parsing fields used in SLS.
     Content:
192.168.2.12 - - [18/May/2022:12:44:01 +0000] "GET /public/fonts/roboto/vPcynSL0qHq_6dX7lKVByfesZW2xOQ-xsNqO47m55DA.woff2 HTTP/1.1" 304 0 "http://grafana.xg.com:30080/public/build/grafana.dark.b208037f6b1954dc031d.css" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.84 Safari/537.36" 569 0.000 [lens-metrics-grafana-svc-80] [] 10.224.25.187:3000 0 0.000 304 7f2d304f864b63c6cd969cdde507b899

Pattern:

%{IP:upstream_addr} %{DATA:http_referer} %{DATA:remote_user} \[%{DATA:time}] "%{DATA:method} %{DATA:url} %{DATA:version}" %{NUMBER:status} %{NUMBER:request_length} "http://%{DATA:host}/%{DATA:path}" %{GREEDYDATA:agent} %{NUMBER:request_length} %{NUMBER:request_time} \[%{DATA:proxy_upstream_name}] \[] %{DATA:upstream_addr} %{NUMBER:upstream_response_length} %{NUMBER:upstream_response_time} %{NUMBER:upstream_status} %{GREEDYDATA:req_id}

Result:


"agent": "\"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.84 Safari/537.36\"",
"method": "GET",
"upstream_addr": "10.224.25.187:3000",
"upstream_response_length": "0",
"version": "HTTP/1.1",
"url": "/public/fonts/roboto/vPcynSL0qHq_6dX7lKVByfesZW2xOQ-xsNqO47m55DA.woff2",
"remote_user": "-",
"req_id": "7f2d304f864b63c6cd969cdde507b899",
"path": "public/build/grafana.dark.b208037f6b1954dc031d.css",
"upstream_status": "304",
"request_time": "0.000",
"request_length": "0",
"http_referer": "-",
"host": "grafana.xg.com:30080",
"proxy_upstream_name": "lens-metrics-grafana-svc-80",
"upstream_response_time": "0.000",
"time": "18/May/2022:12:44:01 +0000",
"status": "304"

3 Notes

References:
grok-patterns, filter-grok-index

