How to enable full query logging on a Cassandra 4.0 Docker container?

Posted: 2021-02-14 12:16:49

Question:

I want to run a Docker container with Cassandra 4 that has Full Query Logging (FQL) enabled. So far, I have tried building the following Dockerfile:

FROM cassandra:4.0
RUN nodetool enablefullquerylog

But this fails with the following error:

nodetool: Failed to connect to '127.0.0.1:7199' - ConnectException: 'Connection refused (Connection refused)'.

I also tried uncommenting full_query_logging_options in the container's cassandra.yaml, located at /etc/cassandra/cassandra.yaml:

# default options for full query logging - these can be overridden from command line when executing
# nodetool enablefullquerylog
full_query_logging_options:
    log_dir: /var/log/cassandra/fql.log
    roll_cycle: HOURLY
    block: true
    max_queue_weight: 268435456 # 256 MiB
    max_log_size: 17179869184 # 16 GiB
    # archive command is "/path/to/script.sh %path" where %path is replaced with the file being rolled:
    archive_command:
    max_archive_retries: 10

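Ideally, I would like to enable FQL in cassandra.yaml without having to use the nodetool command, but that does not seem to be possible (it can only be enabled through nodetool)?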

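I am also unsure how to change cassandra.yaml to allow nodetool to connect. I noticed that the nodetool command works in the cassandra Docker image running Cassandra 3; it only fails in the cassandra:4.0 image. Judging from the question "Cassandra failed to connect", it seems that listen_address and broadcast_address need to be configured in cassandra.yaml. In the Cassandra 3 Docker container, I can see that the default configuration is as follows: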

# Address or interface to bind to and tell other Cassandra nodes to connect to.
# You _must_ change this if you want multiple nodes to be able to communicate!
#
# Set listen_address OR listen_interface, not both.
#
# Leaving it blank leaves it up to InetAddress.getLocalHost(). This
# will always do the Right Thing _if_ the node is properly configured
# (hostname, name resolution, etc), and the Right Thing is to use the
# address associated with the hostname (it might not be).
#
# Setting listen_address to 0.0.0.0 is always wrong.
#
listen_address: 172.17.0.5

# Set listen_address OR listen_interface, not both. Interfaces must correspond
# to a single address, IP aliasing is not supported.
# listen_interface: eth0

# If you choose to specify the interface by name and the interface has an ipv4 and an ipv6 address
# you can specify which should be chosen using listen_interface_prefer_ipv6. If false the first ipv4
# address will be used. If true the first ipv6 address will be used. Defaults to false preferring
# ipv4. If there is only one address it will be selected regardless of ipv4/ipv6.
# listen_interface_prefer_ipv6: false

# Address to broadcast to other Cassandra nodes
# Leaving this blank will set it to the same value as listen_address
broadcast_address: 172.17.0.5

whereas in the Cassandra 4 container it is

# Address or interface to bind to and tell other Cassandra nodes to connect to.
# You _must_ change this if you want multiple nodes to be able to communicate!
#
# Set listen_address OR listen_interface, not both.
#
# Leaving it blank leaves it up to InetAddress.getLocalHost(). This
# will always do the Right Thing _if_ the node is properly configured
# (hostname, name resolution, etc), and the Right Thing is to use the
# address associated with the hostname (it might not be). If unresolvable
# it will fall back to InetAddress.getLoopbackAddress(), which is wrong for production systems.
#
# Setting listen_address to 0.0.0.0 is always wrong.
#
listen_address: localhost

# Set listen_address OR listen_interface, not both. Interfaces must correspond
# to a single address, IP aliasing is not supported.
# listen_interface: eth0

# If you choose to specify the interface by name and the interface has an ipv4 and an ipv6 address
# you can specify which should be chosen using listen_interface_prefer_ipv6. If false the first ipv4
# address will be used. If true the first ipv6 address will be used. Defaults to false preferring
# ipv4. If there is only one address it will be selected regardless of ipv4/ipv6.
# listen_interface_prefer_ipv6: false

# Address to broadcast to other Cassandra nodes
# Leaving this blank will set it to the same value as listen_address
# broadcast_address: 1.2.3.4

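I don't quite understand where 172.17.0.5 comes from, or why setting that value in cassandra.yaml would allow nodetool to work in the container. Any ideas on how to use nodetool in a Cassandra 4 container to enable FQL?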

Answer 1:

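It turns out that, by default, nodetool commands cannot be run in the Dockerfile while the image is being built (no Cassandra process is running at that point, hence the refused connection on port 7199); instead, they must be run "manually" in a running container. So I adapted the Dockerfile to the following: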

FROM cassandra:4.0
RUN mkdir /cassandra-fql && chmod 777 /cassandra-fql
COPY cassandra.yaml /etc/cassandra/cassandra.yaml

where cassandra.yaml is the same as the default, except for the following full_query_logging_options:

# default options for full query logging - these can be overridden from command line when executing
# nodetool enablefullquerylog
full_query_logging_options:
    log_dir: /cassandra-fql
    roll_cycle: HOURLY
    block: true
    max_queue_weight: 268435456 # 256 MiB
    max_log_size: 17179869184 # 16 GiB
    # archive command is "/path/to/script.sh %path" where %path is replaced with the file being rolled:
    # archive_command:
    max_archive_retries: 10

Then, after running the container like this,

docker run --name cassandra-fql -p 127.0.0.1:9042:9042 cassandra-fql

and using docker exec to get into it, running nodetool enablefullquerylog succeeded.
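The manual step above could be scripted. The following is only a minimal sketch, not part of the original answer: the function names wait_for_nodetool and enable_fql, the DOCKER override variable, and the retry counts are all hypothetical, and it assumes that a successful nodetool status means the node is ready to accept nodetool enablefullquerylog.

```shell
#!/bin/sh
# Sketch under assumptions: the helper names and retry defaults are illustrative,
# not part of Cassandra or Docker.
# DOCKER may be overridden (e.g. for testing); it defaults to the docker CLI.
DOCKER="${DOCKER:-docker}"

# wait_for_nodetool CONTAINER [ATTEMPTS] [DELAY_SECONDS]
# Polls "nodetool status" inside the container until it succeeds.
# Returns 0 once nodetool can connect, 1 if every attempt fails.
wait_for_nodetool() {
    container=$1
    attempts=${2:-30}
    delay=${3:-5}
    i=0
    while [ "$i" -lt "$attempts" ]; do
        if "$DOCKER" exec "$container" nodetool status >/dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# enable_fql CONTAINER [ATTEMPTS] [DELAY_SECONDS]
# Waits for nodetool to become reachable, then enables full query logging
# with the defaults taken from full_query_logging_options in cassandra.yaml.
enable_fql() {
    container=$1
    wait_for_nodetool "$container" "${2:-30}" "${3:-5}" || {
        echo "nodetool never became reachable in $container" >&2
        return 1
    }
    "$DOCKER" exec "$container" nodetool enablefullquerylog
}
```

A possible usage, assuming the image was built from the Dockerfile above with docker build -t cassandra-fql . and started detached (docker run -d rather than the foreground run shown above): call enable_fql cassandra-fql once the container has been started. The written logs should then be readable with fqltool dump /cassandra-fql inside the container, though that step was not part of the original answer.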
