Logstash Grok filter for Apache access logs

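I have searched around but could not find a working solution. I am trying to use the Grok filter in my Logstash configuration file to parse an Apache access log file. The log message looks like this: {"message":"00.00.0.000 - - [dd/mm/YYYY:hh:mm:ii +0000] \"GET /index.html HTTP/1.1\" 200 00"}.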

At the moment I can only extract the client IP, using grok { match => [ "message", "%{IP:client_ip}" ] }.
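Why only the IP comes back: grok searches the message for the pattern without anchoring it, so a pattern that describes only the IP succeeds and yields just that one field. Every other field needs its own element in the pattern. The Python sketch below mimics that behaviour with `re.search`; the sample line is a hypothetical one shaped like the question's (whose values are placeholders), and the IPv4 regex is a simplified stand-in for grok's `IP` pattern, not its real definition.

```python
import re

# Hypothetical sample line shaped like the one in the question
LINE = '10.0.0.1 - - [01/Feb/2020:10:15:30 +0000] "GET /index.html HTTP/1.1" 200 00'

# Simplified stand-in for %{IP:client_ip}: IPv4 only, no 0-255 range check
IP_ONLY = re.compile(r"(?P<client_ip>\d{1,3}(?:\.\d{1,3}){3})")


def grok_like(pattern: re.Pattern, text: str) -> dict:
    """Unanchored search, like grok's default: succeeds if the pattern
    matches anywhere and returns only the named captures it defines."""
    m = pattern.search(text)
    return m.groupdict() if m else {}


fields = grok_like(IP_ONLY, LINE)
print(fields)  # {'client_ip': '10.0.0.1'}
```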

I want to extract:

- the GET method
- the requested page (index.html)
- the HTTP version (HTTP/1.1)
- the server response code (200)
- the last number, 00, after 200 in the message body

Note that neither of these works for me:

grok { match => { "message" => "%{COMBINEDAPACHELOG}" } } 

or

grok { match => [ "message", "%{COMBINEDAPACHELOG}" ] }
Answer

Use the Grok Debugger to build a pattern that matches your log format exactly. That is the only way to do it.

http://grokdebug.herokuapp.com/
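If the online debugger is unavailable, the same trial-and-error loop can be approximated locally. The sketch below is a hypothetical mini grok expander in Python: it rewrites `%{NAME:field}` into a named regex group using a small, simplified pattern table (an assumption; grok's real definitions are stricter and far more numerous) and reports which fields a pattern captures from a line.

```python
import re
from typing import Optional

# Hypothetical, simplified pattern table. Grok's real definitions
# (IP, HTTPDATE, NUMBER, ...) are much stricter than these.
PATTERNS = {
    "IP": r"\d{1,3}(?:\.\d{1,3}){3}",
    "USER": r"[a-zA-Z0-9._-]+",
    "HTTPDATE": r"[^\]]+",
    "WORD": r"\w+",
    "NOTSPACE": r"\S+",
    "NUMBER": r"\d+(?:\.\d+)?",
}

# Matches %{NAME} or %{NAME:field}
TOKEN = re.compile(r"%\{(\w+)(?::(\w+))?\}")


def expand(grok: str) -> str:
    """Turn a grok-style pattern into a plain regex with named groups."""
    def replace(m: re.Match) -> str:
        body = PATTERNS[m.group(1)]
        field = m.group(2)
        return f"(?P<{field}>{body})" if field else f"(?:{body})"
    return TOKEN.sub(replace, grok)


def debug(grok: str, line: str) -> Optional[dict]:
    """Return the captured fields, or None if the pattern does not match."""
    m = re.search(expand(grok), line)
    return m.groupdict() if m else None


# Hypothetical sample line shaped like the question's log
line = '10.0.0.1 - - [01/Feb/2020:10:15:30 +0000] "GET /index.html HTTP/1.1" 200 00'
print(debug(r'%{IP:client_ip} %{USER:ident} %{USER:auth} \[%{HTTPDATE:ts}\]', line))
print(debug(r'"%{WORD:verb} %{NOTSPACE:page} HTTP/2', line))  # None
```

Growing the pattern one element at a time and re-running `debug` shows exactly where a match stops working, which is the same workflow the web debugger supports.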

Another answer
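grok {
  # The literal [ ] and the quotes around the request must be escaped;
  # the final NUMBER captures the trailing byte count (00 in the question).
  match => [ "message", "%{IP:client_ip} %{USER:ident} %{USER:auth} \[%{HTTPDATE:apache_timestamp}\] \"%{WORD:method} /%{NOTSPACE:request_page} HTTP/%{NUMBER:http_version}\" %{NUMBER:server_response} %{NUMBER:bytes}" ]
}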
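To see which values this kind of pattern extracts, the Python sketch below uses a hand-translated, simplified regex that mirrors the grok pattern and captures the trailing byte count (the `00` the question asks for) as `bytes`. It is an approximation: grok's `IP`, `USER`, `HTTPDATE` and `NUMBER` patterns are stricter, and the sample line is hypothetical because the question's values are placeholders.

```python
import re

# Simplified Python equivalent of the grok pattern (an approximation:
# grok's IP, USER, HTTPDATE and NUMBER patterns are stricter than these).
ACCESS = re.compile(
    r'(?P<client_ip>\d{1,3}(?:\.\d{1,3}){3}) '
    r'(?P<ident>[a-zA-Z0-9._-]+) (?P<auth>[a-zA-Z0-9._-]+) '
    r'\[(?P<apache_timestamp>[^\]]+)\] '          # literal [ ... ]
    r'"(?P<method>\w+) /(?P<request_page>\S+) '    # literal opening quote
    r'HTTP/(?P<http_version>\d+(?:\.\d+)?)" '      # literal closing quote
    r'(?P<server_response>\d+) (?P<bytes>\d+)'
)

# Hypothetical sample line shaped like the one in the question
line = '10.0.0.1 - - [01/Feb/2020:10:15:30 +0000] "GET /index.html HTTP/1.1" 200 00'

match = ACCESS.search(line)
fields = match.groupdict() if match else {}
for name in ("method", "request_page", "http_version", "server_response", "bytes"):
    print(name, "=", fields.get(name))
```

Each printed field corresponds to one item in the question's list.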
Another answer

You can use the COMBINEDAPACHELOG pattern, which expands to:

%{IPORHOST:clientip} %{USER:ident} %{USER:auth} \[%{HTTPDATE:timestamp}\] "(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})" %{NUMBER:response} (?:%{NUMBER:bytes}|-) %{QS:referrer} %{QS:agent}

For example, consider this sample Apache log line:

111.222.333.123 HOME - [01/Feb/1998:01:08:46 -0800] "GET /bannerad/ad.htm HTTP/1.0" 200 28083 "http://www.referrer.com/bannerad/ba_intro.htm" "Mozilla/4.01 (Macintosh; I; PPC)"

The pattern above produces:

{
  "clientip": [
    [
      "111.222.333.123"
    ]
  ],
  "HOSTNAME": [
    [
      "111.222.333.123"
    ]
  ],
  "IP": [
    [
      null
    ]
  ],
  "IPV6": [
    [
      null
    ]
  ],
  "IPV4": [
    [
      null
    ]
  ],
  "ident": [
    [
      "HOME"
    ]
  ],
  "USERNAME": [
    [
      "HOME",
      "-"
    ]
  ],
  "auth": [
    [
      "-"
    ]
  ],
  "timestamp": [
    [
      "01/Feb/1998:01:08:46 -0800"
    ]
  ],
  "MONTHDAY": [
    [
      "01"
    ]
  ],
  "MONTH": [
    [
      "Feb"
    ]
  ],
  "YEAR": [
    [
      "1998"
    ]
  ],
  "TIME": [
    [
      "01:08:46"
    ]
  ],
  "HOUR": [
    [
      "01"
    ]
  ],
  "MINUTE": [
    [
      "08"
    ]
  ],
  "SECOND": [
    [
      "46"
    ]
  ],
  "INT": [
    [
      "-0800"
    ]
  ],
  "verb": [
    [
      "GET"
    ]
  ],
  "request": [
    [
      "/bannerad/ad.htm"
    ]
  ],
  "httpversion": [
    [
      "1.0"
    ]
  ],
  "BASE10NUM": [
    [
      "1.0",
      "200",
      "28083"
    ]
  ],
  "rawrequest": [
    [
      null
    ]
  ],
  "response": [
    [
      "200"
    ]
  ],
  "bytes": [
    [
      "28083"
    ]
  ],
  "referrer": [
    [
      "\"http://www.referrer.com/bannerad/ba_intro.htm\""
    ]
  ],
  "QUOTEDSTRING": [
    [
      "\"http://www.referrer.com/bannerad/ba_intro.htm\"",
      "\"Mozilla/4.01 (Macintosh; I; PPC)\""
    ]
  ],
  "agent": [
    [
      "\"Mozilla/4.01 (Macintosh; I; PPC)\""
    ]
  ]
}
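Two details in this output are worth checking. `IP`, `IPV4` and `IPV6` are null because `111.222.333.123` is not a valid address (333 is above 255), so `IPORHOST` fell back to `HOSTNAME`; and `referrer` and `agent` keep their surrounding quotes because `QS` matches the whole quoted string. The Python sketch below reproduces both points using the standard `ipaddress` module and a simplified, hand-written regex that approximates `COMBINEDAPACHELOG` (it is not the real pattern).

```python
import ipaddress
import re

SAMPLE = ('111.222.333.123 HOME - [01/Feb/1998:01:08:46 -0800] '
          '"GET /bannerad/ad.htm HTTP/1.0" 200 28083 '
          '"http://www.referrer.com/bannerad/ba_intro.htm" '
          '"Mozilla/4.01 (Macintosh; I; PPC)"')


def is_ip(text: str) -> bool:
    """True if text parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(text)
        return True
    except ValueError:
        return False


# Simplified approximation of COMBINEDAPACHELOG; quoted fields keep their quotes
COMBINED = re.compile(
    r'(?P<clientip>\S+) (?P<ident>\S+) (?P<auth>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<verb>\w+) (?P<request>\S+) HTTP/(?P<httpversion>[\d.]+)" '
    r'(?P<response>\d+) (?P<bytes>\d+|-) '
    r'(?P<referrer>"[^"]*") (?P<agent>"[^"]*")'
)

fields = COMBINED.search(SAMPLE).groupdict()
print(fields["clientip"], is_ip(fields["clientip"]))  # 111.222.333.123 False
print(fields["agent"])  # "Mozilla/4.01 (Macintosh; I; PPC)"
```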

You can test it here:

https://grokdebug.herokuapp.com/

Another answer

Use the following:

filter {
  grok {
    match => { "message" => "%{COMMONAPACHELOG}" }
  }
}

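As the pattern definition shows, COMBINEDAPACHELOG fails on your log line because the line lacks components the pattern requires, namely the quoted referrer and user-agent fields that follow the COMMONAPACHELOG part: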

COMBINEDAPACHELOG %{COMMONAPACHELOG} %{QS:referrer} %{QS:agent}

https://github.com/elastic/logstash/blob/v1.4.2/patterns/grok-patterns
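The difference can be checked directly. In the sketch below, `COMMON` and `COMBINED` are simplified Python approximations of the two grok patterns (assumptions, not the library definitions), with `COMBINED` built as `COMMON` followed by two quoted strings, mirroring the definition above. Against a hypothetical line shaped like the question's, only `COMMON` matches.

```python
import re

# Simplified approximation of COMMONAPACHELOG
COMMON = (r'(?P<clientip>\S+) (?P<ident>\S+) (?P<auth>\S+) '
          r'\[(?P<timestamp>[^\]]+)\] '
          r'"(?P<verb>\w+) (?P<request>\S+) HTTP/(?P<httpversion>[\d.]+)" '
          r'(?P<response>\d+) (?P<bytes>\d+|-)')

# COMBINED = COMMON plus referrer and agent quoted strings
COMBINED = COMMON + r' (?P<referrer>"[^"]*") (?P<agent>"[^"]*")'

# Hypothetical line in the question's format: no referrer or agent
line = '10.0.0.1 - - [01/Feb/2020:10:15:30 +0000] "GET /index.html HTTP/1.1" 200 00'

common_match = re.search(COMMON, line)
combined_match = re.search(COMBINED, line)
print(common_match is not None)  # True
print(combined_match is None)    # True
```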
