logstash grok, parse a line with json filter

【Posted】: 2018-04-20 02:54:52

【Question】:

I am using ELK (Elasticsearch, Kibana, Logstash, Filebeat) to collect logs. I have a log file containing lines like the ones below. Each line carries a JSON object, and my goal is to use Logstash grok to pull the key/value pairs out of the JSON and forward them to Elasticsearch.

2018-03-28 13:23:01  charge:{"oldbalance":5000,"managefee":0,"afterbalance":"5001","cardid":"123456789","txamt":1}

2018-03-28 13:23:01  manage:{"cuurentValue":5000,"payment":0,"newbalance":"5001","posid":"123456789","something":"new2","additionalFields":1}

I am using the Grok Debugger to build the pattern and check the result. My current pattern is:

%{TIMESTAMP_ISO8601} %{SPACE} %{WORD:$:data}:{%{QUOTEDSTRING:key1}:%{BASE10NUM:value1}[,]%{QUOTEDSTRING:key2}:%{BASE10NUM:value2}[,]%{QUOTEDSTRING:key3}:%{QUOTEDSTRING:value3}[,]%{QUOTEDSTRING:key4}:%{QUOTEDSTRING:value4}[,]%{QUOTEDSTRING:key5}:%{BASE10NUM:value5}}

As you can see, this is hard-coded: in the real logs the keys inside the JSON can be any word, the values can be integers, doubles, or strings, and the number of keys varies, so this solution is not acceptable. My parsing result is shown below for reference only. I am using the standard Grok patterns.

My questions: is it wise to try to extract the keys inside the JSON at all, given that Elasticsearch itself works with JSON? And if I do extract the key/value pairs from the JSON, is there a correct and concise grok pattern for it?

My current grok pattern gives the following output when parsing the first of the lines above.


  "TIMESTAMP_ISO8601": [
    [
      "2018-03-28 13:23:01"
    ]
  ],
  "YEAR": [
    [
      "2018"
    ]
  ],
  "MONTHNUM": [
    [
      "03"
    ]
  ],
  "MONTHDAY": [
    [
      "28"
    ]
  ],
  "HOUR": [
    [
      "13",
      null
    ]
  ],
  "MINUTE": [
    [
      "23",
      null
    ]
  ],
  "SECOND": [
    [
      "01"
    ]
  ],
  "ISO8601_TIMEZONE": [
    [
      null
    ]
  ],
  "SPACE": [
    [
      ""
    ]
  ],
  "WORD": [
    [
      "charge"
    ]
  ],
  "key1": [
    [
      ""oldbalance""
    ]
  ],
  "value1": [
    [
      "5000"
    ]
  ],
  "key2": [
    [
      ""managefee""
    ]
  ],
  "value2": [
    [
      "0"
    ]
  ],
  "key3": [
    [
      ""afterbalance""
    ]
  ],
  "value3": [
    [
      ""5001""
    ]
  ],
  "key4": [
    [
      ""cardid""
    ]
  ],
  "value4": [
    [
      ""123456789""
    ]
  ],
  "key5": [
    [
      ""txamt""
    ]
  ],
  "value5": [
    [
      "1"
    ]
  ]

Second edit

Is it possible to use Logstash's json filter here? In my case the JSON is only part of the line/event; the whole event is not JSON.

=============================================================

Third edit

The updated solution does not parse the JSON for me. My configuration is as follows:

filter {
  grok {
    match => {
      "message" => [
        "%{TIMESTAMP_ISO8601}%{SPACE}%{GREEDYDATA:json_data}"
      ]
    }
  }
}

filter {
  json {
    source => "json_data"
    target => "parsed_json"
  }
}
It does not give me key/value pairs; instead I get the message name plus the JSON as a single string, and the JSON in parsed_json is never actually parsed.

The test data are as follows:

2018-03-28 13:23:01  manage:{"cuurentValue":5000,"payment":0,"newbalance":"5001","posid":"123456789","something":"new2","additionalFields":1}
2018-03-28 13:23:03  payment:{"cuurentValue":5001,"reload":0,"newbalance":"5002","posid":"987654321","something":"new3","additionalFields":2}
2018-03-28 13:24:07  management:{"cuurentValue":5002,"payment":0,"newbalance":"5001","posid":"123456789","something":"new2","additionalFields":1}

[2018-06-04T15:01:30,017][WARN ][logstash.filters.json    ] Error parsing json {:source=>"json_data", :raw=>"manage:{\"cuurentValue\":5000,\"payment\":0,\"newbalance\":\"5001\",\"posid\":\"123456789\",\"something\":\"new2\",\"additionalFields\":1}", :exception=>#<LogStash::Json::ParserError: Unrecognized token 'manage': was expecting ('true', 'false' or 'null')
 at [Source: (byte[])"manage:{"cuurentValue":5000,"payment":0,"newbalance":"5001","posid":"123456789","something":"new2","additionalFields":1}"; line: 1, column: 8]>}
[2018-06-04T15:01:30,017][WARN ][logstash.filters.json    ] Error parsing json {:source=>"json_data", :raw=>"payment:{\"cuurentValue\":5001,\"reload\":0,\"newbalance\":\"5002\",\"posid\":\"987654321\",\"something\":\"new3\",\"additionalFields\":2}", :exception=>#<LogStash::Json::ParserError: Unrecognized token 'payment': was expecting ('true', 'false' or 'null')
 at [Source: (byte[])"payment:{"cuurentValue":5001,"reload":0,"newbalance":"5002","posid":"987654321","something":"new3","additionalFields":2}"; line: 1, column: 9]>}
[2018-06-04T15:01:34,986][WARN ][logstash.filters.json    ] Error parsing json {:source=>"json_data", :raw=>"management:{\"cuurentValue\":5002,\"payment\":0,\"newbalance\":\"5001\",\"posid\":\"123456789\",\"something\":\"new2\",\"additionalFields\":1}", :exception=>#<LogStash::Json::ParserError: Unrecognized token 'management': was expecting ('true', 'false' or 'null')
 at [Source: (byte[])"management:{"cuurentValue":5002,"payment":0,"newbalance":"5001","posid":"123456789","something":"new2","additionalFields":1}"; line: 1, column: 12]>}

Please check the results above.


【Answer 1】:

You can use GREEDYDATA to assign the whole JSON block to a separate field, like this:

%{TIMESTAMP_ISO8601}%{SPACE}%{GREEDYDATA:json_data}

This will create a separate field for your JSON data:


  "TIMESTAMP_ISO8601": [
    [
      "2018-03-28 13:23:01"
    ]
  ],
  "json_data": [
    [
      "charge:"oldbalance":5000,"managefee":0,"afterbalance":"5001","cardid":"123456789","txamt":1"
    ]
  ]
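In a Logstash pipeline this pattern sits inside a grok filter. A minimal sketch of that surrounding block (the pattern itself is from this answer; the block around it is standard Logstash filter syntax):

filter {
  grok {
    match => { "message" => "%{TIMESTAMP_ISO8601}%{SPACE}%{GREEDYDATA:json_data}" }
  }
}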

Then apply the json filter on the json_data field, like this:

json {
  source => "json_data"
  target => "parsed_json"
}
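On success, the key/value pairs end up nested under parsed_json (for example parsed_json.oldbalance); omitting target would place them at the top level of the event instead. Once parsing works, the raw string can also be dropped. A sketch of that optional cleanup using remove_field, a common option available on every Logstash filter (this cleanup step is my addition, not part of the original answer):

json {
  source => "json_data"
  target => "parsed_json"
  remove_field => ["json_data"]
}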

【Comments】:

Please see the third edit; I have some text to show and the comments here have no formatting.
Please try this: %{TIMESTAMP_ISO8601}%{SPACE}%{WORD}:%{GREEDYDATA:json_data}
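Spelling that last comment out as a full configuration would give roughly the following (an untested sketch: capturing the leading word into a field, here named type, is an assumption on my part; the point is that json_data then starts at the opening brace and becomes valid JSON that the json filter can parse):

filter {
  grok {
    # capture the leading word (manage, payment, ...) so it is not left in json_data
    match => { "message" => "%{TIMESTAMP_ISO8601}%{SPACE}%{WORD:type}:%{GREEDYDATA:json_data}" }
  }
  json {
    source => "json_data"
    target => "parsed_json"
  }
}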
