How to run a Cassandra Docker container from Nomad?

[Posted]: 2019-05-08 07:09:55

[Question]:

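I want to run a Cassandra container from a Nomad job. It seems to start, but a few seconds later it dies (it appears to be killed by Nomad itself).

If I run the container from the command line, using: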

docker run --name some-cassandra -p 9042:9042 -d cassandra:3.0

the container starts flawlessly. But if I create a Nomad job like this:

job "cassandra" {
  datacenters = ["dc1"]
  type = "service"

  update {
    max_parallel = 1
    min_healthy_time = "10s"
    healthy_deadline = "5m"
    progress_deadline = "10m"
    auto_revert = false
    canary = 0
  }

  migrate {
    max_parallel = 1
    health_check = "checks"
    min_healthy_time = "10s"
    healthy_deadline = "5m"
  }

  group "cassandra" {
    restart {
      attempts = 2
      interval = "240s"
      delay = "120s"
      mode = "delay"
    }

    task "cassandra" {
      driver = "docker"

      config {
        image = "cassandra:3.0"
        network_mode = "bridge"
        port_map {
          cql = 9042
        }
      }

      resources {
        memory = 2048
        cpu = 800
        network {
          port "cql" {}
        }
      }

      env {
        CASSANDRA_LISTEN_ADDRESS = "${NOMAD_IP_cql}"
      }

      service {
        name = "cassandra"
        tags = ["global", "cassandra"]
        port = "cql"
      }
    }
  }
}

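then it never starts. Nomad's web interface shows nothing in the stdout log of the created allocation, and the stdin stream shows only Killed.

I know that while this is happening, Docker containers are created and then removed a few seconds later. I cannot read the logs of these containers, because when I try (with docker logs <container_id>), all I get is: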

Error response from daemon: configured logging driver does not support reading

The allocation overview shows this message:

12/06/18 14:16:04   Terminated  Exit Code: 137, Exit Message: "Docker container exited with non-zero exit code: 137"
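
Exit code 137 follows the common convention of 128 plus a signal number, which here gives signal 9 (SIGKILL): the process was killed from outside rather than exiting by itself. The code alone does not say who sent the signal. A minimal sketch of the arithmetic, assuming a shell whose `kill -l` accepts a signal number:

```shell
# Decode a container exit code above 128 into the signal that ended the process.
exit_code=137                      # value reported in the allocation overview
signal=$((exit_code - 128))        # codes above 128 encode 128 + signal number
echo "signal number: ${signal}"
echo "signal name: $(kill -l "${signal}")"
```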

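According to Docker:

If there is no database initialized when the container starts, then a default database will be created. While this is the expected behavior, this means that it will not accept incoming connections until such initialization completes. This may cause issues when using automation tools, such as docker-compose, which start several containers simultaneously.

But I doubt this is the source of my problem, because I increased the values in the restart stanza without any effect, and the task still fails after just a few seconds.

Not long ago I had a similar problem with a kafka container, which turned out to be unhappy because it needed more memory. In this case, however, I have given generous values for memory and CPU in the resources stanza, and it does not seem to make any difference.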

My host OS is Arch Linux, with kernel 4.19.4-arch1-1-ARCH. I am running consul as a systemd service, and the Nomad agent with this command line:

sudo nomad agent -dev

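What could I be missing? Any help and/or pointers would be appreciated.

Update (2018-12-06 16:26 GMT): by reading the output of the Nomad agent in detail, I learned that some valuable information can be read in the host's /tmp directory. A snippet of that output: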

    2018/12/06 16:03:03 [DEBUG] memberlist: TCP connection from=127.0.0.1:45792
    2018/12/06 16:03:03.180586 [DEBUG] driver.docker: docker pull cassandra:latest succeeded
2018-12-06T16:03:03.184Z [DEBUG] plugin: starting plugin: path=/usr/bin/nomad args="[/usr/bin/nomad executor {"LogFile":"/tmp/NomadClient073551030/1c315bf2-688c-2c7b-8d6f-f71fec1254f3/cassandra/executor.out","LogLevel":"DEBUG"}]"
2018-12-06T16:03:03.185Z [DEBUG] plugin: waiting for RPC address: path=/usr/bin/nomad
2018-12-06T16:03:03.235Z [DEBUG] plugin.nomad: plugin address: timestamp=2018-12-06T16:03:03.235Z address=/tmp/plugin681788273 network=unix
    2018/12/06 16:03:03.253166 [DEBUG] driver.docker: Setting default logging options to syslog and unix:///tmp/plugin559865372
    2018/12/06 16:03:03.253196 [DEBUG] driver.docker: Using config for logging: Type:syslog ConfigRaw:[] Config:map[syslog-address:unix:///tmp/plugin559865372]
    2018/12/06 16:03:03.253206 [DEBUG] driver.docker: using 2147483648 bytes memory for cassandra
    2018/12/06 16:03:03.253217 [DEBUG] driver.docker: using 800 cpu shares for cassandra
    2018/12/06 16:03:03.253237 [DEBUG] driver.docker: binding directories []string{"/tmp/NomadClient073551030/1c315bf2-688c-2c7b-8d6f-f71fec1254f3/alloc:/alloc", "/tmp/NomadClient073551030/1c315bf2-688c-2c7b-8d6f-f71fec1254f3/cassandra/local:/local", "/tmp/NomadClient073551030/1c315bf2-688c-2c7b-8d6f-f71fec1254f3/cassandra/secrets:/secrets"} for cassandra
    2018/12/06 16:03:03.253282 [DEBUG] driver.docker: allocated port 127.0.0.1:29073 -> 9042 (mapped)
    2018/12/06 16:03:03.253296 [DEBUG] driver.docker: exposed port 9042
    2018/12/06 16:03:03.253320 [DEBUG] driver.docker: setting container name to: cassandra-1c315bf2-688c-2c7b-8d6f-f71fec1254f3
    2018/12/06 16:03:03.361162 [INFO] driver.docker: created container 29b0764bd2de69bda6450ebb1a55ffd2cbb4dc3002f961cb5db71b323d611199
    2018/12/06 16:03:03.754476 [INFO] driver.docker: started container 29b0764bd2de69bda6450ebb1a55ffd2cbb4dc3002f961cb5db71b323d611199
    2018/12/06 16:03:03.757642 [DEBUG] consul.sync: registered 1 services, 0 checks; deregistered 0 services, 0 checks
    2018/12/06 16:03:03.765001 [DEBUG] client: error fetching stats of task cassandra: stats collection hasn't started yet
    2018/12/06 16:03:03.894514 [DEBUG] client: updated allocations at index 371 (total 2) (pulled 0) (filtered 2)
    2018/12/06 16:03:03.894584 [DEBUG] client: allocs: (added 0) (removed 0) (updated 0) (ignore 2)
    2018/12/06 16:03:05.190647 [DEBUG] driver.docker: error collecting stats from container 29b0764bd2de69bda6450ebb1a55ffd2cbb4dc3002f961cb5db71b323d611199: io: read/write on closed pipe
2018-12-06T16:03:09.191Z [DEBUG] plugin.nomad: 2018/12/06 16:03:09 [ERR] plugin: plugin server: accept unix /tmp/plugin681788273: use of closed network connection
2018-12-06T16:03:09.194Z [DEBUG] plugin: plugin process exited: path=/usr/bin/nomad
    2018/12/06 16:03:09.223734 [INFO] client: task "cassandra" for alloc "1c315bf2-688c-2c7b-8d6f-f71fec1254f3" failed: Wait returned exit code 137, signal 0, and error Docker container exited with non-zero exit code: 137
    2018/12/06 16:03:09.223802 [INFO] client: Restarting task "cassandra" for alloc "1c315bf2-688c-2c7b-8d6f-f71fec1254f3" in 2m7.683274502s
    2018/12/06 16:03:09.230053 [DEBUG] consul.sync: registered 0 services, 0 checks; deregistered 1 services, 0 checks
    2018/12/06 16:03:09.233507 [DEBUG] consul.sync: registered 0 services, 0 checks; deregistered 0 services, 0 checks
    2018/12/06 16:03:09.296185 [DEBUG] client: updated allocations at index 372 (total 2) (pulled 0) (filtered 2)
    2018/12/06 16:03:09.296313 [DEBUG] client: allocs: (added 0) (removed 0) (updated 0) (ignore 2)
    2018/12/06 16:03:11.541901 [DEBUG] http: Request GET /v1/agent/health?type=client (452.678µs)

However, the contents of /tmp/NomadClient.../<alloc_id>/... are deceptively simple:

[root@singularity 1c315bf2-688c-2c7b-8d6f-f71fec1254f3]# ls -lR
.:
total 0
drwxrwxrwx 5 nobody nobody 100 Dec  6 15:52 alloc
drwxrwxrwx 5 nobody nobody 120 Dec  6 15:53 cassandra

./alloc:
total 0
drwxrwxrwx 2 nobody nobody 40 Dec  6 15:52 data
drwxrwxrwx 2 nobody nobody 80 Dec  6 15:53 logs
drwxrwxrwx 2 nobody nobody 40 Dec  6 15:52 tmp

./alloc/data:
total 0

./alloc/logs:
total 0
-rw-r--r-- 1 root root 0 Dec  6 15:53 cassandra.stderr.0
-rw-r--r-- 1 root root 0 Dec  6 15:53 cassandra.stdout.0

./alloc/tmp:
total 0

./cassandra:
total 4
-rw-r--r-- 1 root   root   1248 Dec  6 16:19 executor.out
drwxrwxrwx 2 nobody nobody   40 Dec  6 15:52 local
drwxrwxrwx 2 nobody nobody   60 Dec  6 15:52 secrets
drwxrwxrwt 2 nobody nobody   40 Dec  6 15:52 tmp

./cassandra/local:
total 0

./cassandra/secrets:
total 0

./cassandra/tmp:
total 0

Both cassandra.stdout.0 and cassandra.stderr.0 are empty, and the full contents of the executor.out file are:

2018/12/06 15:53:22.822072 [DEBUG] syslog-server: launching syslog server on addr: /tmp/plugin278120866
2018/12/06 15:55:53.009611 [DEBUG] syslog-server: launching syslog server on addr: /tmp/plugin242312234
2018/12/06 15:58:29.135309 [DEBUG] syslog-server: launching syslog server on addr: /tmp/plugin226242288
2018/12/06 16:00:53.942271 [DEBUG] syslog-server: launching syslog server on addr: /tmp/plugin373025133
2018/12/06 16:03:03.252389 [DEBUG] syslog-server: launching syslog server on addr: /tmp/plugin559865372
2018/12/06 16:05:19.656317 [DEBUG] syslog-server: launching syslog server on addr: /tmp/plugin090082811
2018/12/06 16:07:28.468809 [DEBUG] syslog-server: launching syslog server on addr: /tmp/plugin383954837
2018/12/06 16:09:54.068604 [DEBUG] syslog-server: launching syslog server on addr: /tmp/plugin412544225
2018/12/06 16:12:10.085157 [DEBUG] syslog-server: launching syslog server on addr: /tmp/plugin279043152
2018/12/06 16:14:48.255653 [DEBUG] syslog-server: launching syslog server on addr: /tmp/plugin209533710
2018/12/06 16:17:23.735550 [DEBUG] syslog-server: launching syslog server on addr: /tmp/plugin168184243
2018/12/06 16:19:40.232181 [DEBUG] syslog-server: launching syslog server on addr: /tmp/plugin839254781
2018/12/06 16:22:13.485457 [DEBUG] syslog-server: launching syslog server on addr: /tmp/plugin406142133
2018/12/06 16:24:24.869274 [DEBUG] syslog-server: launching syslog server on addr: /tmp/plugin964077792

Update (2018-12-06 16:40 GMT): since it is apparent that the agent needs to log to syslog, I set up and started a local syslog server, but to no avail. The syslog server receives no messages.

[Answer 1]:

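Problem solved. Its nature is twofold:

1. Nomad's docker driver is (very effectively) encapsulating the behaviour of the containers, which at times makes them very silent.

2. Cassandra is very demanding of resources, much more than I originally thought. I was convinced that 4 GB of RAM would be enough for it to run comfortably, but as it turns out it needs (at least in my environment) 6 GB.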
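
Nomad's `memory` value is expressed in MB, so the figures in this thread can be checked with simple arithmetic. The sketch below assumes the driver converts with 1 MB = 1024 × 1024 bytes, which is consistent with the `using 2147483648 bytes memory` line logged for `memory = 2048`. The variable names are illustrative only.

```shell
# memory = 2048 from the original job, converted to bytes.
original_mb=2048
echo "original limit in bytes: $((original_mb * 1024 * 1024))"

# 6 GB, the amount that turned out to be needed, in the MB unit Nomad expects.
needed_gb=6
needed_mb=$((needed_gb * 1024))
echo "value for memory: ${needed_mb}"
```

Under that assumption, the first line reproduces the byte count from the agent log, and the second gives the value that would go in the `memory` field of the `resources` stanza.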

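Disclaimer: I am actually using bitnami/cassandra now instead of cassandra, because I believe their images are of very high quality, secure, and configurable through environment variables. I made this discovery using bitnami's image, and I have not tested how the original image reacts to having this much memory.

As for why it does not fail when the container is run directly from Docker's CLI, I think it is because no limits are specified when it is run that way. Docker simply takes as much memory as its containers need, so if the host's memory eventually becomes insufficient for all the containers, the realization comes much later (and possibly painfully). This early failure should therefore be a welcome benefit of an orchestration platform such as Nomad. If I have any complaint, it is that finding the problem took so long because of the containers' lack of visibility!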
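
One way to test this explanation outside Nomad is to give Docker the same memory cap and then ask whether the kernel's OOM killer ended the container. The following is a sketch, not something from the original post: `check_oom` and the container name `oom-probe` are hypothetical, the 30-second default wait is a guess, and only the standard `docker run --memory`, `docker inspect --format` and `docker rm -f` commands are assumed. Docker's default swap handling may differ from what Nomad configures, so results on a host with swap enabled may not match.

```shell
# Hypothetical helper: start an image under a memory cap, wait, then report
# whether Docker recorded an OOM kill and which exit code the container had.
check_oom() {
  image="$1"          # e.g. cassandra:3.0
  limit="$2"          # e.g. 2g, mirroring memory = 2048 in the job
  wait_s="${3:-30}"   # seconds to wait before inspecting (assumed default)
  name="oom-probe"    # illustrative container name

  docker run --name "$name" --memory "$limit" -d "$image" >/dev/null || return 1
  sleep "$wait_s"
  # First field is "true" if Docker recorded an OOM kill; second is the exit code.
  docker inspect --format '{{.State.OOMKilled}} {{.State.ExitCode}}' "$name"
  docker rm -f "$name" >/dev/null
}
```

It could be called as `check_oom cassandra:3.0 2g`, then again with a larger limit such as `6g` to compare.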
