ELK Logstash Grok: A Beginner's Guide

A Beginner's Guide to Logstash Grok (https://logz.io/blog/logstash-grok/)

The ability to efficiently analyze and query the data shipped into the ELK Stack depends on that data being readable. This means that as unstructured data is ingested into the system, it must be translated into structured message lines.


This thankless but critical task is usually left to Logstash (though other log shippers are available; see our comparison of Fluentd vs. Logstash for one example). Whatever data source you define, pulling the logs and performing some magic to beautify them is necessary to ensure that they are parsed correctly before being sent to Elasticsearch.


Data manipulation in Logstash is performed using filter plugins. This article focuses on one of the most popular and useful filter plugins, the Logstash grok filter, which is used to parse unstructured data into structured data.



What is grok?

The term is actually fairly new: it was coined by Robert A. Heinlein in his 1961 book Stranger in a Strange Land, and it means understanding something so thoroughly that you have immersed yourself in it. That makes it an apt name for the grok language and the Logstash grok plugin, which take information in one format and immerse it in another (JSON, specifically). A couple hundred grok patterns for logs are already available.


Put simply, grok is a way to match a line against a regular expression, map specific parts of the line into dedicated fields, and perform actions based on this mapping.


How does it work?

Logstash ships with over 200 built-in patterns for filtering items such as words, numbers, and dates in AWS, Bacula, Bro, Linux-Syslog, and more. If you cannot find the pattern you need, you can write your own custom pattern. You can also supply multiple match patterns, which simplifies writing expressions to capture log data.


Here is the basic syntax format for a Logstash grok filter:

%{SYNTAX:SEMANTIC}

The SYNTAX is the name of the pattern that matches the text in each log. The SEMANTIC is the identifier you give that matched text in your parsed logs, i.e. the field name. In other words:

%{PATTERN:FieldName}

This will match the predefined pattern and map it to a specific identifying field.
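To make the SYNTAX/SEMANTIC mapping concrete, here is a minimal Python sketch of the idea, not Logstash's actual implementation: each %{SYNTAX:SEMANTIC} reference is expanded into a regex named group. The two entries in PATTERNS are simplified stand-ins for the real built-in definitions, and the sample log line is hypothetical.

```python
import re

# Simplified stand-ins for two built-in grok patterns (an assumption: the real
# definitions shipped with Logstash are more elaborate).
PATTERNS = {
    "WORD": r"\b\w+\b",
    "INT": r"[+-]?\d+",
}

# Matches %{SYNTAX} or %{SYNTAX:SEMANTIC}.
GROK_REF = re.compile(r"%\{(\w+)(?::(\w+))?\}")


def grok_to_regex(pattern):
    """Replace each %{SYNTAX:SEMANTIC} with a named group (?P<SEMANTIC>...)."""

    def expand(m):
        syntax, semantic = m.group(1), m.group(2)
        body = PATTERNS[syntax]
        # Without a SEMANTIC the text is matched but not captured.
        return f"(?P<{semantic}>{body})" if semantic else f"(?:{body})"

    return GROK_REF.sub(expand, pattern)


regex = grok_to_regex("%{WORD:user} failed %{INT:attempts} times")
print(regex)  # (?P<user>\b\w+\b) failed (?P<attempts>[+-]?\d+) times
print(re.search(regex, "alice failed 3 times").groupdict())
# {'user': 'alice', 'attempts': '3'}
```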


For example, a value like 127.0.0.1 will match the grok IP pattern (in this case, its IPv4 variant).


Grok has separate IPV4 and IPV6 patterns, but the IP syntax matches either one.


This standard pattern is as follows:

IPV4 (?<![0-9])(?:(?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2}))(?![0-9])
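One way to sanity-check the IPV4 pattern is to run it through Python's re module, which accepts this regex syntax unchanged. The find_ipv4 helper and the sample strings are illustrative assumptions.

```python
import re

# The IPV4 grok pattern from above, split across lines for readability.
IPV4 = (
    r"(?<![0-9])(?:(?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.]"
    r"(?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.]"
    r"(?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.]"
    r"(?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2}))(?![0-9])"
)
IPV4_RE = re.compile(IPV4)


def find_ipv4(text):
    """Return the first IPv4 address in text, or None."""
    m = IPV4_RE.search(text)
    return m.group(0) if m else None


print(find_ipv4("client 127.0.0.1 connected"))  # 127.0.0.1
print(find_ipv4("256.1.1.1"))                   # None: 256 is out of range
```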


If there were no unifying IP syntax, you would grok both with the same semantic field name:

(?:%{IPV4:client_ip}|%{IPV6:client_ip})

In practice, just use the IP syntax, unless you want IPv4 and IPv6 addresses stored in separate fields.


Because grok is built on regular expressions, you can also create your own custom regex-based grok filter with this pattern:


(?<custom_field>custom pattern)

For example:

(?<custom_field>\d\d-\d\d-\d\d)

This grok pattern matches text such as 22-22-22 (any three pairs of digits separated by hyphens) and stores it in the custom_field field.
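In Python's re module a named group is written (?P<name>...) instead of (?<name>...), so a quick sketch of the custom pattern above looks like this (extract_custom is a hypothetical helper).

```python
import re

# Same custom pattern, in Python's named-group spelling.
CUSTOM = re.compile(r"(?P<custom_field>\d\d-\d\d-\d\d)")


def extract_custom(text):
    """Return the custom_field capture, or None when nothing matches."""
    m = CUSTOM.search(text)
    return m.group("custom_field") if m else None


print(extract_custom("batch 22-22-22 done"))  # 22-22-22
```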



Logstash Grok Pattern Examples


To demonstrate how to get started with grokking, I’m going to use the following application log:

2016-07-11T23:56:42.000+00:00 INFO [MySecretApp.com.Transaction.Manager]:Starting transaction for session -464410bf-37bf-475a-afc0-498e0199f008

The goal I want to accomplish with a grok filter is to break down the log line into the following fields: timestamp, log level, class, and then the rest of the message.

The following grok pattern will do the job:

grok {
  match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:log-level} \[%{DATA:class}\]:%{GREEDYDATA:message}" }
  overwrite => [ "message" ]
}


NOTE: GREEDYDATA is the way Logstash grok expresses the regex .*
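To see how the filter above splits the sample line, here is a Python sketch that expands the same grok expression. The pattern bodies are simplified stand-ins for the real built-ins (an assumption), and because Python group names cannot contain '-', captures get generated names that are mapped back to SEMANTIC names such as log-level.

```python
import re

# Simplified stand-ins for the built-in patterns used above (an assumption:
# the real TIMESTAMP_ISO8601 and LOGLEVEL definitions accept more variants).
PATTERNS = {
    "TIMESTAMP_ISO8601": r"\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+-]\d{2}:?\d{2})?",
    "LOGLEVEL": r"(?:TRACE|DEBUG|INFO|NOTICE|WARN(?:ING)?|ERROR|FATAL)",
    "DATA": r".*?",
    "GREEDYDATA": r".*",
}
GROK_REF = re.compile(r"%\{(\w+)(?::([\w-]+))?\}")


def compile_grok(pattern):
    """Expand %{SYNTAX:SEMANTIC} references into one Python regex.

    Python group names cannot contain '-', so each capture gets a generated
    name (f0, f1, ...) and `fields` maps it back to its SEMANTIC name."""
    fields = {}

    def expand(m):
        body = PATTERNS[m.group(1)]
        if m.group(2) is None:
            return f"(?:{body})"
        name = f"f{len(fields)}"
        fields[name] = m.group(2)
        return f"(?P<{name}>{body})"

    return re.compile(GROK_REF.sub(expand, pattern)), fields


def grok(pattern, line):
    """Return a {SEMANTIC: text} dict, or None if the line does not match."""
    regex, fields = compile_grok(pattern)
    m = regex.search(line)
    if m is None:
        return None
    return {fields[group]: text for group, text in m.groupdict().items()}


LOG = ("2016-07-11T23:56:42.000+00:00 INFO [MySecretApp.com.Transaction.Manager]:"
       "Starting transaction for session -464410bf-37bf-475a-afc0-498e0199f008")
PATTERN = (r"%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:log-level} "
           r"\[%{DATA:class}\]:%{GREEDYDATA:message}")

print(grok(PATTERN, LOG))
```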

Grok Data Type Conversion

By default, all SEMANTIC entries are strings, but you can change the data type by adding a third element to the reference. The following Logstash grok example converts any text matched by the NUMBER syntax and stored in the semantic num into a float:

%{NUMBER:num:float}



This is a handy feature, although grok currently supports conversion only to float and int.
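A Python sketch of the conversion suffix follows; it is an illustration rather than grok's code. Only int and float are accepted, matching the limitation above, and the NUMBER/WORD bodies and the sample line are simplified assumptions.

```python
import re

# Simplified stand-ins for two built-in patterns (an assumption).
PATTERNS = {"NUMBER": r"[+-]?\d+(?:\.\d+)?", "WORD": r"\b\w+\b"}
# int and float are the only conversions accepted, as in grok.
CONVERTERS = {"int": int, "float": float}
# Matches %{SYNTAX:SEMANTIC} with an optional :int or :float suffix.
GROK_REF = re.compile(r"%\{(\w+):(\w+)(?::(int|float))?\}")


def grok_typed(pattern, line):
    """Match line against pattern, casting captures that carry a TYPE suffix."""
    casts = {}

    def expand(m):
        syntax, semantic, type_name = m.groups()
        if type_name:
            casts[semantic] = CONVERTERS[type_name]
        return f"(?P<{semantic}>{PATTERNS[syntax]})"

    m = re.search(GROK_REF.sub(expand, pattern), line)
    if m is None:
        return None
    # Captures without a TYPE suffix stay strings, which is grok's default.
    return {name: casts.get(name, str)(text) for name, text in m.groupdict().items()}


print(grok_typed("%{WORD:op} took %{NUMBER:num:float}", "query took 12.5"))
# {'op': 'query', 'num': 12.5}
```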



The grok filter defined earlier for the application log tries to match each incoming log against the given grok pattern. On a match, the log is broken down into the specified fields, according to the grok patterns defined in the filter. On a mismatch, Logstash adds a tag called _grokparsefailure.


However, in our case, the filter will match and result in the following output:

"message" => "Starting transaction for session -464410bf-37bf-475a-afc0-498e0199f008",
"timestamp" => "2016-07-11T23:56:42.000+00:00",
"log-level" => "INFO",
"class" => "MySecretApp.com.Transaction.Manager"

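The match/mismatch behavior described above can be sketched in Python as follows. apply_grok is a hypothetical helper operating on a plain dict standing in for a Logstash event, the regex is a hand-expanded, simplified version of the application-log pattern, and log_level replaces log-level because Python group names cannot contain a hyphen.

```python
import re

# Hand-expanded, simplified version of the application-log pattern.
APP_LOG = re.compile(
    r"(?P<timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+-]\d{2}:\d{2})?) "
    r"(?P<log_level>[A-Z]+) \[(?P<class>.*?)\]:(?P<message>.*)"
)


def apply_grok(event):
    """Add the captured fields on a match; otherwise tag the event
    with _grokparsefailure and leave its message untouched."""
    m = APP_LOG.search(event["message"])
    if m is None:
        event.setdefault("tags", []).append("_grokparsefailure")
    else:
        # Replaces "message", comparable to setting overwrite => ["message"].
        event.update(m.groupdict())
    return event


print(apply_grok({"message": "garbage"}))
# {'message': 'garbage', 'tags': ['_grokparsefailure']}
```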

The grok debugger

A great way to get started building your grok filters is this grok debug tool: https://grokdebug.herokuapp.com/

This tool lets you paste your log message and build the grok pattern gradually while continuously testing it against the message. As a rule, I recommend starting with the %{GREEDYDATA:message} pattern and adding more patterns as you proceed.

In the case of the example above, I would start with:

%{GREEDYDATA:message}
Then, to verify that the first part is working, proceed with:

%{TIMESTAMP_ISO8601:timestamp} %{GREEDYDATA:message}
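The same incremental approach works outside the debugger. This Python sketch (an illustration, with a rough regex standing in for TIMESTAMP_ISO8601) tries each step against the sample line so you can see which parts already match.

```python
import re

LOG = ("2016-07-11T23:56:42.000+00:00 INFO [MySecretApp.com.Transaction.Manager]:"
       "Starting transaction for session -464410bf-37bf-475a-afc0-498e0199f008")

# Each step puts one more piece in front of the catch-all message group.
# \S+T\S+ is a deliberately rough stand-in for TIMESTAMP_ISO8601.
STEPS = [
    r"(?P<message>.*)",
    r"(?P<timestamp>\S+T\S+) (?P<message>.*)",
    r"(?P<timestamp>\S+T\S+) (?P<level>[A-Z]+) (?P<message>.*)",
]


def try_steps(line):
    """Return the captured fields for each step, or None where a step fails."""
    results = []
    for step in STEPS:
        m = re.search(step, line)
        results.append(m.groupdict() if m else None)
    return results


for step, fields in zip(STEPS, try_steps(LOG)):
    print(step, "->", fields)
```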


Common Logstash grok examples

Here are some examples that will help you to familiarize yourself with how to construct a grok filter:


Syslog

Parsing syslog messages with grok is one of the more common demands of new users. There are also several different syslog formats, so keep in mind that you may need to write your own custom grok patterns. Here is one example of a common syslog parse:

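match => { "message" => "%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: %{GREEDYDATA:syslog_message}" }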

If you are using rsyslog, you can configure it to send logs to Logstash.
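For a quick local check of a typical syslog layout (timestamp, host, program, optional PID, message), here is a simplified Python regex. It approximates the grok pieces rather than reproducing them, and the sample lines are hypothetical.

```python
import re

# Rough Python equivalents of SYSLOGTIMESTAMP, SYSLOGHOST, the program name,
# an optional [PID], and the rest of the line (simplified assumptions).
SYSLOG = re.compile(
    r"(?P<syslog_timestamp>[A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<syslog_hostname>\S+) "
    r"(?P<syslog_program>[^\[:]+)(?:\[(?P<syslog_pid>\d+)\])?: "
    r"(?P<syslog_message>.*)"
)


def parse_syslog(line):
    """Return the syslog fields, or None if the line does not match."""
    m = SYSLOG.search(line)
    return m.groupdict() if m else None


print(parse_syslog("Jul 11 23:56:42 web01 sshd[1234]: Accepted publickey for alice"))
```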

Apache Access logs

match => { "message" => "%{COMBINEDAPACHELOG}" }
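%{COMBINEDAPACHELOG} hides a long expression. This Python sketch approximates it for the common case; the field names follow the classic grok pattern (an assumption), referrer and agent are captured without their surrounding quotes, and the sample line is illustrative.

```python
import re

# Simplified Python take on the combined log format matched by
# %{COMBINEDAPACHELOG}; not the actual grok definition.
COMBINED = re.compile(
    r'(?P<clientip>\S+) (?P<ident>\S+) (?P<auth>\S+) \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<verb>\S+) (?P<request>\S+) HTTP/(?P<httpversion>[\d.]+)" '
    r'(?P<response>\d{3}) (?P<bytes>\d+|-) "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_access_log(line):
    """Return the access-log fields, or None if the line does not match."""
    m = COMBINED.search(line)
    return m.groupdict() if m else None


LINE = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" '
        '200 2326 "http://www.example.com/start.html" "Mozilla/4.08"')
print(parse_access_log(LINE))
```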


Elasticsearch

match => { "message" => [
  "\[%{TIMESTAMP_ISO8601:timestamp}\]\[%{DATA:loglevel}%{SPACE}\]\[%{DATA:source}%{SPACE}\]%{SPACE}\[%{DATA:node}\]%{SPACE}\[%{DATA:index}\] %{NOTSPACE} \[%{DATA:updated-type}\]",
  "\[%{TIMESTAMP_ISO8601:timestamp}\]\[%{DATA:loglevel}%{SPACE}\]\[%{DATA:source}%{SPACE}\]%{SPACE}\[%{DATA:node}\] (\[%{NOTSPACE:index}\]\[%{NUMBER:shards}\])?%{GREEDYDATA}"
] }

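The two patterns above rely on grok trying multiple patterns in order. This Python sketch mimics that with simplified, hypothetical versions of both patterns (updated-type becomes updated_type, since Python group names cannot contain '-') and two made-up Elasticsearch log lines.

```python
import re

# Simplified versions of the two patterns, tried in order. As with grok's
# default break_on_match => true, the first match wins, so the more
# specific pattern must come first.
ES_PATTERNS = [
    re.compile(
        r"\[(?P<timestamp>[^\]]+)\]\[(?P<loglevel>\w+)\s*\]\[(?P<source>[^\]\s]+)\s*\]"
        r"\s*\[(?P<node>[^\]]+)\]\s*\[(?P<index>[^\]]+)\] \S+ \[(?P<updated_type>[^\]]+)\]"
    ),
    re.compile(
        r"\[(?P<timestamp>[^\]]+)\]\[(?P<loglevel>\w+)\s*\]\[(?P<source>[^\]\s]+)\s*\]"
        r"\s*\[(?P<node>[^\]]+)\]"
    ),
]


def grok_first(line):
    """Return the fields from the first pattern that matches, else None."""
    for pattern in ES_PATTERNS:
        m = pattern.search(line)
        if m:
            return m.groupdict()
    return None


MAPPING_LOG = "[2016-07-11T23:56:42,000][INFO ][cluster.metadata ] [node-1] [logs] update_mapping [doc]"
JVM_LOG = "[2016-07-11T23:56:43,000][WARN ][monitor.jvm ] [node-1] gc overhead"
print(grok_first(MAPPING_LOG))
print(grok_first(JVM_LOG))
```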



Redis

REDISTIMESTAMP %{MONTHDAY} %{MONTH} %{TIME}
REDISLOG \[%{POSINT:pid}\] %{REDISTIMESTAMP:timestamp}
REDISMONLOG %{NUMBER:timestamp} \[%{INT:database} %{IP:client}:%{NUMBER:port}\] "%{WORD:command}"\s?%{GREEDYDATA:params}
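As an illustration, here is the redismonlog pattern above translated into a Python regex, with NUMBER and IP reduced to simple stand-ins (an assumption); the sample line is a hypothetical MONITOR entry.

```python
import re

# NUMBER -> digits with an optional fraction, IP -> dotted IPv4 only.
REDISMONLOG = re.compile(
    r'(?P<timestamp>\d+(?:\.\d+)?) \[(?P<database>\d+) (?P<client>[\d.]+):(?P<port>\d+)\] '
    r'"(?P<command>\w+)"\s?(?P<params>.*)'
)


def parse_monitor(line):
    """Return the MONITOR fields, or None if the line does not match."""
    m = REDISMONLOG.search(line)
    return m.groupdict() if m else None


print(parse_monitor('1339518083.107412 [0 127.0.0.1:60866] "keys" "*"'))
```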


MongoDB

MONGO_LOG %{SYSLOGTIMESTAMP:timestamp} \[%{WORD:component}\] %{GREEDYDATA:message}
MONGO_QUERY \{ (?<={ ).*(?= } ntoreturn:) \}
MONGO_SLOWQUERY %{WORD} %{MONGO_WORDDASH:database}\.%{MONGO_WORDDASH:collection} %{WORD}: %{MONGO_QUERY:query} %{WORD}:%{NONNEGINT:ntoreturn} %{WORD}:%{NONNEGINT:ntoskip} %{WORD}:%{NONNEGINT:nscanned}.*nreturned:%{NONNEGINT:nreturned}..+ (?<duration>[0-9]+)ms
MONGO_WORDDASH \b[\w-]+\b
MONGO3_SEVERITY \w
MONGO3_COMPONENT %{WORD}|-
MONGO3_LOG %{TIMESTAMP_ISO8601:timestamp} %{MONGO3_SEVERITY:severity} %{MONGO3_COMPONENT:component}%{SPACE}(?:\[%{DATA:context}\])? %{GREEDYDATA:message}
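For comparison, a Python approximation of MONGO3_LOG follows. The sub-patterns are simplified stand-ins and the sample lines are hypothetical; the sketch shows how the optional [context] group and the %{SPACE} padding interact.

```python
import re

# TIMESTAMP_ISO8601 -> a basic ISO 8601 regex, MONGO3_SEVERITY -> one word
# character, MONGO3_COMPONENT -> a word or "-" (simplified assumptions).
MONGO3_LOG = re.compile(
    r"(?P<timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+-]\d{2}:?\d{2})?) "
    r"(?P<severity>\w) (?P<component>\w+|-)\s*(?:\[(?P<context>.*?)\])? (?P<message>.*)"
)


def parse_mongo3(line):
    """Return the MongoDB 3.x log fields, or None if the line does not match."""
    m = MONGO3_LOG.search(line)
    return m.groupdict() if m else None


MONGO_LINE = ("2016-07-11T23:56:42.000+0000 I NETWORK  [initandlisten] "
              "waiting for connections on port 27017")
print(parse_mongo3(MONGO_LINE))
```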


AWS

ELB_ACCESS_LOG %{TIMESTAMP_ISO8601:timestamp} %{NOTSPACE:elb} %{IP:clientip}:%{INT:clientport:int} (?:(%{IP:backendip}:?:%{INT:backendport:int})|-) %{NUMBER:request_processing_time:float} %{NUMBER:backend_processing_time:float} %{NUMBER:response_processing_time:float} %{INT:response:int} %{INT:backend_response:int} %{INT:received_bytes:int} %{INT:bytes:int} "%{ELB_REQUEST_LINE}"
CLOUDFRONT_ACCESS_LOG (?<timestamp>%{YEAR}-%{MONTHNUM}-%{MONTHDAY}\t%{TIME})\t%{WORD:x_edge_location}\t(?:%{NUMBER:sc_bytes:int}|-)\t%{IPORHOST:clientip}\t%{WORD:cs_method}\t%{HOSTNAME:cs_host}\t%{NOTSPACE:cs_uri_stem}\t%{NUMBER:sc_status:int}\t%{GREEDYDATA:referrer}\t%{GREEDYDATA:agent}\t%{GREEDYDATA:cs_uri_query}\t%{GREEDYDATA:cookies}\t%{WORD:x_edge_result_type}\t%{NOTSPACE:x_edge_request_id}\t%{HOSTNAME:x_host_header}\t%{URIPROTO:cs_protocol}\t%{INT:cs_bytes:int}\t%{GREEDYDATA:time_taken:float}\t%{GREEDYDATA:x_forwarded_for}\t%{GREEDYDATA:ssl_protocol}\t%{GREEDYDATA:ssl_cipher}\t%{GREEDYDATA:x_edge_response_result_type}
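This Python sketch approximates ELB_ACCESS_LOG, including the :int and :float conversions. IP and NUMBER are reduced to simple stand-ins and the sample line uses hypothetical values, so treat it as an illustration rather than a drop-in parser.

```python
import re

# Simplified ELB_ACCESS_LOG: IP -> dotted IPv4, NUMBER -> decimal digits.
ELB = re.compile(
    r"(?P<timestamp>\S+) (?P<elb>\S+) (?P<clientip>[\d.]+):(?P<clientport>\d+) "
    r"(?:(?P<backendip>[\d.]+):(?P<backendport>\d+)|-) "
    r"(?P<request_processing_time>[\d.]+) (?P<backend_processing_time>[\d.]+) "
    r"(?P<response_processing_time>[\d.]+) (?P<response>\d+) (?P<backend_response>\d+) "
    r'(?P<received_bytes>\d+) (?P<bytes>\d+) "(?P<request>[^"]*)"'
)
# These sets mirror the :int and :float suffixes in the grok pattern.
INT_FIELDS = {"clientport", "backendport", "response", "backend_response", "received_bytes", "bytes"}
FLOAT_FIELDS = {"request_processing_time", "backend_processing_time", "response_processing_time"}


def parse_elb(line):
    """Return the ELB fields with numeric conversions, or None on no match."""
    m = ELB.search(line)
    if m is None:
        return None
    event = {}
    for name, value in m.groupdict().items():
        if value is not None and name in INT_FIELDS:
            value = int(value)
        elif value is not None and name in FLOAT_FIELDS:
            value = float(value)
        event[name] = value
    return event


SAMPLE = ('2015-05-13T23:39:43.945958Z my-loadbalancer 192.168.131.39:2817 10.0.0.1:80 '
          '0.000073 0.001048 0.000057 200 200 0 29 "GET http://www.example.com:80/ HTTP/1.1" '
          '"curl/7.38.0" - -')
print(parse_elb(SAMPLE))
```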


Summing it up

Logstash grok is just one type of filter that can be applied to your logs before they are forwarded into Elasticsearch. Because it plays such a crucial part in the logging pipeline, grok is also one of the most commonly used filters.



Happy grokking!


