How to enable full query logging on a Cassandra 4.0 Docker container?
Posted: 2021-02-14 12:16:49
Question: I would like to run a Docker container running Cassandra 4 with Full Query Logging (FQL) enabled. So far, I have tried building the following Dockerfile:
FROM cassandra:4.0
RUN nodetool enablefullquerylog
But this fails with the following error:
nodetool: Failed to connect to '127.0.0.1:7199' - ConnectException: 'Connection refused (Connection refused)'.
I also tried uncommenting the full_query_logging_options in cassandra.yaml, located at /etc/cassandra/cassandra.yaml on the Docker container:
# default options for full query logging - these can be overridden from command line when executing
# nodetool enablefullquerylog
full_query_logging_options:
    log_dir: /var/log/cassandra/fql.log
    roll_cycle: HOURLY
    block: true
    max_queue_weight: 268435456 # 256 MiB
    max_log_size: 17179869184 # 16 GiB
    # archive command is "/path/to/script.sh %path" where %path is replaced with the file being rolled:
    archive_command:
    max_archive_retries: 10
Ideally, I would like to enable FQL in cassandra.yaml without having to use the nodetool command, but it seems that this is not possible (it can only be enabled with nodetool)?
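I am also not sure how to change cassandra.yaml so that nodetool can connect. I noticed that in the cassandra Docker image running Cassandra 3, the nodetool command works; it only fails in the cassandra:4.0 image. From "Cassandra failed to connect", it would seem that listen_address and broadcast_address need to be configured in cassandra.yaml. In the Cassandra 3 Docker container, I can see that the default configuration is as follows: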
# Address or interface to bind to and tell other Cassandra nodes to connect to.
# You _must_ change this if you want multiple nodes to be able to communicate!
#
# Set listen_address OR listen_interface, not both.
#
# Leaving it blank leaves it up to InetAddress.getLocalHost(). This
# will always do the Right Thing _if_ the node is properly configured
# (hostname, name resolution, etc), and the Right Thing is to use the
# address associated with the hostname (it might not be).
#
# Setting listen_address to 0.0.0.0 is always wrong.
#
listen_address: 172.17.0.5
# Set listen_address OR listen_interface, not both. Interfaces must correspond
# to a single address, IP aliasing is not supported.
# listen_interface: eth0
# If you choose to specify the interface by name and the interface has an ipv4 and an ipv6 address
# you can specify which should be chosen using listen_interface_prefer_ipv6. If false the first ipv4
# address will be used. If true the first ipv6 address will be used. Defaults to false preferring
# ipv4. If there is only one address it will be selected regardless of ipv4/ipv6.
# listen_interface_prefer_ipv6: false
# Address to broadcast to other Cassandra nodes
# Leaving this blank will set it to the same value as listen_address
broadcast_address: 172.17.0.5
whereas in the Cassandra 4 container it is:
# Address or interface to bind to and tell other Cassandra nodes to connect to.
# You _must_ change this if you want multiple nodes to be able to communicate!
#
# Set listen_address OR listen_interface, not both.
#
# Leaving it blank leaves it up to InetAddress.getLocalHost(). This
# will always do the Right Thing _if_ the node is properly configured
# (hostname, name resolution, etc), and the Right Thing is to use the
# address associated with the hostname (it might not be). If unresolvable
# it will fall back to InetAddress.getLoopbackAddress(), which is wrong for production systems.
#
# Setting listen_address to 0.0.0.0 is always wrong.
#
listen_address: localhost
# Set listen_address OR listen_interface, not both. Interfaces must correspond
# to a single address, IP aliasing is not supported.
# listen_interface: eth0
# If you choose to specify the interface by name and the interface has an ipv4 and an ipv6 address
# you can specify which should be chosen using listen_interface_prefer_ipv6. If false the first ipv4
# address will be used. If true the first ipv6 address will be used. Defaults to false preferring
# ipv4. If there is only one address it will be selected regardless of ipv4/ipv6.
# listen_interface_prefer_ipv6: false
# Address to broadcast to other Cassandra nodes
# Leaving this blank will set it to the same value as listen_address
# broadcast_address: 1.2.3.4
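I don't quite understand where 172.17.0.5 comes from, or why setting listen_address and broadcast_address to that value in cassandra.yaml would allow nodetool to work on the container. Any ideas on how to get nodetool working on a Cassandra 4 container in order to enable FQL?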
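Answer 1: It turns out that, by default, nodetool commands cannot be run from the Dockerfile while the image is being built; they have to be run "manually" inside the running container. A RUN step executes at build time, when no Cassandra process is up to accept the JMX connection on 127.0.0.1:7199, which is consistent with the "Connection refused" error in the question. So I adapted the Dockerfile as follows: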
FROM cassandra:4.0
RUN mkdir /cassandra-fql && chmod 777 /cassandra-fql
COPY cassandra.yaml /etc/cassandra/cassandra.yaml
where cassandra.yaml is identical to the default one, except for the following full_query_logging_options:
# default options for full query logging - these can be overridden from command line when executing
# nodetool enablefullquerylog
full_query_logging_options:
    log_dir: /cassandra-fql
    roll_cycle: HOURLY
    block: true
    max_queue_weight: 268435456 # 256 MiB
    max_log_size: 17179869184 # 16 GiB
    # archive command is "/path/to/script.sh %path" where %path is replaced with the file being rolled:
    # archive_command:
    max_archive_retries: 10
Then, after running the container like this,
docker run --name cassandra-fql -p 127.0.0.1:9042:9042 cassandra-fql
and using docker exec to get into it, running nodetool enablefullquerylog succeeds.
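The manual step can also be scripted from the host. The following is only a sketch under stated assumptions, not part of the original answer: the function names wait_for_nodetool and enable_fql are hypothetical, polling nodetool status is just one plausible readiness check, and the retry count and delay defaults are arbitrary. It passes the log directory explicitly with --path; with log_dir already set in cassandra.yaml as above, that option could presumably be left out.

```shell
#!/bin/sh
# Sketch: enable FQL in a running Cassandra container from the host.
# Function names, retry count, and delay are illustrative assumptions.

# Poll `nodetool status` inside the container until it succeeds,
# i.e. until Cassandra accepts JMX connections, or give up.
wait_for_nodetool() {
    container="$1"
    attempts="${2:-30}"   # assumed default: 30 tries
    delay="${3:-5}"       # assumed default: 5 seconds between tries
    i=0
    while [ "$i" -lt "$attempts" ]; do
        if docker exec "$container" nodetool status >/dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Wait for the node, then enable full query logging into the given directory.
enable_fql() {
    container="$1"
    log_dir="$2"
    wait_for_nodetool "$container" || return 1
    docker exec "$container" nodetool enablefullquerylog --path "$log_dir"
}
```

Assuming the image was built from the Dockerfile above with docker build -t cassandra-fql . and started with the docker run command shown (which names the container cassandra-fql), sourcing this file and calling enable_fql cassandra-fql /cassandra-fql would run the same nodetool command as the manual docker exec session. Listing /cassandra-fql inside the container afterwards is one way to check that log files are being written.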