Multiline fluentd logs in Kubernetes

Posted 2023-02-15

【Title】multiline fluentd logs in kubernetes 【Posted】2020-01-23 22:32:41 【Question】:

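I am new to fluentd. I have configured the basic fluentd setup I need and deployed it to my kubernetes cluster as a daemon set. I can see logs being shipped to my 3rd party logging solution. However, I now want to handle some logs that come in as multiple entries when they really should be one entry. The logs from the node look like JSON and are formatted like

{"log":"2019-09-23 18:54:42,102 [INFO] some message \n","stream":"stderr","time":"2019-09-23T18:54:42.102Z"}
{"log": "another message \n","stream":"stderr","time":"2019-09-23T18:54:42.102Z"}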

I have a ConfigMap that looks like

apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-config-map
  namespace: logging
  labels:
    k8s-app: fluentd-logzio
data:
  fluent.conf: |-
    @include "#{ENV['FLUENTD_SYSTEMD_CONF'] || 'systemd'}.conf"
    @include kubernetes.conf
    @include conf.d/*.conf

    <match fluent.**>
        # this tells fluentd to not output its log on stdout
        @type null
    </match>

    # here we read the logs from Docker's containers and parse them
    <source>
      @id fluentd-containers.log
      @type tail
      path /var/log/containers/*.log
      pos_file /var/log/es-containers.log.pos
      time_format %Y-%m-%dT%H:%M:%S.%NZ
      tag raw.kubernetes.*
      format json
      read_from_head true
    </source>

    # Detect exceptions in the log output and forward them as one log entry.
    <match raw.kubernetes.**>
      @id raw.kubernetes
      @type detect_exceptions
      remove_tag_prefix raw
      message log
      stream stream
      multiline_flush_interval 5
      max_bytes 500000
      max_lines 1000
    </match>

    # Enriches records with Kubernetes metadata
    <filter kubernetes.**>
      @id filter_kubernetes_metadata
      @type kubernetes_metadata
    </filter>

    <match kubernetes.**>
      @type logzio_buffered
      @id out_logzio
      endpoint_url "https://listener-ca.logz.io?token=####"
      output_include_time true
      output_include_tags true
      <buffer>
        # Set the buffer type to file to improve the reliability and reduce the memory consumption
        @type file
        path /var/log/fluentd-buffers/stackdriver.buffer
        # Set queue_full action to block because we want to pause gracefully
        # in case of the off-the-limits load instead of throwing an exception
        overflow_action block
        # Set the chunk limit conservatively to avoid exceeding the GCL limit
        # of 10MiB per write request.
        chunk_limit_size 2M
        # Cap the combined memory usage of this buffer and the one below to
        # 2MiB/chunk * (6 + 2) chunks = 16 MiB
        queue_limit_length 6
        # Never wait more than 5 seconds before flushing logs in the non-error case.
        flush_interval 5s
        # Never wait longer than 30 seconds between retries.
        retry_max_interval 30
        # Disable the limit on the number of retries (retry forever).
        retry_forever true
        # Use multiple threads for processing.
        flush_thread_count 2
      </buffer>
    </match>

My question is: how can I send these log messages as a single entry instead of multiple entries?

【Answer 1】:

There are at least two ways:

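`multiline` plugin

Thanks to @rickerp, who suggested the multiline plugin.

The multiline parser plugin parses multiline logs. It is the multiline version of the regexp parser.

The multiline parser parses logs using the formatN and format_firstline parameters. format_firstline detects the first line of a multiline log entry. formatN, where N ranges over [1..20], is the list of Regexp formats for the multiline log.

Unlike other parser plugins, this plugin needs special code in the input plugin, for example to handle format_firstline. As a result, the in_tail plugin currently works with multiline, but other input plugins do not.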
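
As a rough illustration of the parameters described above, a multiline parser inside an in_tail source might look like the sketch below. It assumes a plain-text log file whose entries start with a timestamp such as `2019-09-23 18:54:42,102`; the path, pos_file, tag, and capture-group names are hypothetical. Container logs under /var/log/containers are wrapped line by line in JSON by Docker, so this sketch probably fits files the application writes directly better than the JSON-wrapped lines from the question.

```
<source>
  @type tail
  # Hypothetical path, pos_file and tag; adjust to your environment.
  path /var/log/myapp/app.log
  pos_file /var/log/myapp-app.log.pos
  tag myapp.multiline
  # Flush a buffered entry if no new first line arrives within 5 seconds.
  multiline_flush_interval 5s
  <parse>
    @type multiline
    # A line starting with "YYYY-MM-DD HH:MM:SS,mmm" begins a new entry.
    format_firstline /^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}/
    # The group is named "timestamp" (not "time") so it is kept as a plain
    # string field; ".*" also captures the continuation lines.
    format1 /^(?<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) \[(?<level>[A-Z]+)\] (?<message>.*)/
  </parse>
</source>
```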

`fluent-plugin-concat` plugin

According to the fluentd documentation, fluent-plugin-concat solves this problem:

Concatenate multiline log messages

Application logs are stored in the "log" field of the record. You can concatenate these logs with the fluent-plugin-concat filter before sending them to the destination.

<filter docker.**>
  @type concat
  key log
  stream_identity_key container_id
  multiline_start_regexp /^-e:2:in `\/'/
  multiline_end_regexp /^-e:4:in/
</filter>

Original events:

2016-04-13 14:45:55 +0900 docker.28cf38e21204: {"container_id":"28cf38e212042225f5f80a56fac08f34c8f0b235e738900c4e0abcf39253a702","container_name":"/romantic_dubinsky","source":"stdout","log":"-e:2:in `/'"}
2016-04-13 14:45:55 +0900 docker.28cf38e21204: {"source":"stdout","log":"-e:2:in `do_division_by_zero'","container_id":"28cf38e212042225f5f80a56fac08f34c8f0b235e738900c4e0abcf39253a702","container_name":"/romantic_dubinsky"}
2016-04-13 14:45:55 +0900 docker.28cf38e21204: {"source":"stdout","log":"-e:4:in `<main>'","container_id":"28cf38e212042225f5f80a56fac08f34c8f0b235e738900c4e0abcf39253a702","container_name":"/romantic_dubinsky"}

Filtered events:

2016-04-13 14:45:55 +0900 docker.28cf38e21204: {"container_id":"28cf38e212042225f5f80a56fac08f34c8f0b235e738900c4e0abcf39253a702","container_name":"/romantic_dubinsky","source":"stdout","log":"-e:2:in `/'\n-e:2:in `do_division_by_zero'\n-e:4:in `<main>'"}

To use this plugin, you will need to adjust the regular expressions to match your own logs.
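
Adapted to the log format in the question, a concat filter might look like the following sketch. It assumes every real entry starts with a timestamp like `2019-09-23 18:54:42,102` and that continuation lines do not; the regular expression is only an illustration and should be checked against your own logs. The filter would need to sit before the `<match raw.kubernetes.**>` block so that it sees the events first.

```
<filter raw.kubernetes.**>
  @type concat
  # Field that holds the application log line in each record.
  key log
  # Assumed pattern: a new entry starts with "YYYY-MM-DD HH:MM:SS,mmm".
  multiline_start_regexp /^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}/
  # Each Docker log line already ends with "\n", so join without a separator.
  separator ""
  # Flush a pending entry after 5 seconds without a new start line.
  flush_interval 5
</filter>
```

Because in_tail with `tag raw.kubernetes.*` gives each container log file its own tag, lines from different containers should not end up merged even without stream_identity_key. As far as I know, entries flushed by the timeout are reported as errors by default; the plugin's timeout_label option can route them to a label instead.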

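【Comments】:

This works great. Actually a little too well: it now combines some logs that should not be combined, but I can tweak my regex a bit and probably get it working. Thanks for the tip.

I think you can now use the multiline @type: docs.fluentd.org/parser/multiline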
