Start SQS celery worker on Elastic Beanstalk

Posted 2018-12-13 15:27:54

I am trying to start a celery worker on EB, but I get an error I cannot explain.

The command in a config file in the .ebextensions dir:

03_celery_worker:
  command: "celery worker --app=config --loglevel=info -E --workdir=/opt/python/current/app/my_project/"

The listed command runs fine on my local machine (with only the workdir argument changed).

The error from EB:

Activity execution failed, because: /opt/python/run/venv/local/lib/python3.6/site-packages/celery/platforms.py:796: RuntimeWarning: You're running the worker with superuser privileges: this is absolutely not recommended!

Starting new HTTPS connection (1): eu-west-1.queue.amazonaws.com (ElasticBeanstalk::ExternalInvocationError)

I updated the celery worker command with the argument --uid=2, and the permission error disappeared, but the command execution still fails with an

ExternalInvocationError

Any suggestions as to what I am doing wrong?


Answer 1:

ExternalInvocationError

As I understand it, this means the listed command cannot be run from an EB container command. You need to create a script on the server and run celery from that script. This post describes how to do it.

Update: You need to create a config file in the .ebextensions directory; I called it celery.config. The post linked above provides a script that almost works, but it needs a few small additions to work 100% correctly. I had problems with scheduling periodic tasks (celery beat). Here are the steps to fix that:

    1. Install django-celery-beat (pip install django-celery-beat, and add it to your requirements), add it to INSTALLED_APPS, and use the --scheduler argument when starting celery beat (see the settings sketch after this list). Instructions are here.

    2. Specify in the script which user runs each program. For the celery worker it is the celery user, which is created at the beginning of the script (if it doesn't exist). When I tried to start celery beat I got a PermissionDenied error, meaning the celery user does not have all the necessary permissions. I logged in to EB over ssh, looked through the list of all users (cat /etc/passwd), and decided to run celery beat as the daemon user.
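
For reference, here is a minimal sketch of the Django settings side of step 1. The app label django_celery_beat is the actual package name; the CELERY_BEAT_SCHEDULER setting is an optional settings-file equivalent of the --scheduler command-line flag used in the script below:

# settings.py (sketch)
INSTALLED_APPS = [
    # ... your other apps ...
    'django_celery_beat',
]

# Equivalent to passing --scheduler django_celery_beat.schedulers:DatabaseScheduler
# when starting celery beat:
CELERY_BEAT_SCHEDULER = 'django_celery_beat.schedulers:DatabaseScheduler'

django-celery-beat keeps its schedule in the database, so remember to run python manage.py migrate after installing it.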

These steps fixed the celery beat error. The updated config file with the script (celery.config):

files:
  "/opt/elasticbeanstalk/hooks/appdeploy/post/run_supervised_celeryd.sh":
    mode: "000755"
    owner: root
    group: root
    content: |
      #!/usr/bin/env bash

      # Create required directories
      sudo mkdir -p /var/log/celery/
      sudo mkdir -p /var/run/celery/

      # Create group called 'celery'
      sudo groupadd -f celery
      # add the user 'celery' if it doesn't exist and add it to the group with same name
      id -u celery &>/dev/null || sudo useradd -g celery celery
      # add permissions to the celery user for r+w to the folders just created
      sudo chown -R celery:celery /var/log/celery/
      sudo chown -R celery:celery /var/run/celery/

      # Get django environment variables
      celeryenv=`cat /opt/python/current/env | tr '\n' ',' | sed 's/%/%%/g' | sed 's/export //g' | sed 's/$PATH/%(ENV_PATH)s/g' | sed 's/$PYTHONPATH//g' | sed 's/$LD_LIBRARY_PATH//g'`
      # Strip the trailing comma left over from the tr join above
      celeryenv=${celeryenv%?}

      # Create CELERY configuration script
      celeryconf="[program:celeryd]
      directory=/opt/python/current/app
      ; Set full path to celery program if using virtualenv
      command=/opt/python/run/venv/bin/celery worker -A config.celery:app --loglevel=INFO --logfile=\"/var/log/celery/%%n%%I.log\" --pidfile=\"/var/run/celery/%%n.pid\"

      user=celery
      numprocs=1
      stdout_logfile=/var/log/celery-worker.log
      stderr_logfile=/var/log/celery-worker.log
      autostart=true
      autorestart=true
      startsecs=10

      ; Need to wait for currently executing tasks to finish at shutdown.
      ; Increase this if you have very long running tasks.
      stopwaitsecs = 60

      ; When resorting to send SIGKILL to the program to terminate it
      ; send SIGKILL to its whole process group instead,
      ; taking care of its children as well.
      killasgroup=true

      ; if rabbitmq is supervised, set its priority higher
      ; so it starts first
      priority=998

      environment=$celeryenv"


      # Create CELERY BEAT configuration script
      celerybeatconf="[program:celerybeat]
      ; Set full path to celery program if using virtualenv
      command=/opt/python/run/venv/bin/celery beat -A config.celery:app --loglevel=INFO --scheduler django_celery_beat.schedulers:DatabaseScheduler --logfile=\"/var/log/celery/celery-beat.log\" --pidfile=\"/var/run/celery/celery-beat.pid\"

      directory=/opt/python/current/app
      user=daemon
      numprocs=1
      stdout_logfile=/var/log/celerybeat.log
      stderr_logfile=/var/log/celerybeat.log
      autostart=true
      autorestart=true
      startsecs=10

      ; Need to wait for currently executing tasks to finish at shutdown.
      ; Increase this if you have very long running tasks.
      stopwaitsecs = 60

      ; When resorting to send SIGKILL to the program to terminate it
      ; send SIGKILL to its whole process group instead,
      ; taking care of its children as well.
      killasgroup=true

      ; if rabbitmq is supervised, set its priority higher
      ; so it starts first
      priority=999

      environment=$celeryenv"

      # Create the celery supervisord conf script
      echo "$celeryconf" | tee /opt/python/etc/celery.conf
      echo "$celerybeatconf" | tee /opt/python/etc/celerybeat.conf

      # Add configuration script to supervisord conf (if not there already)
      if ! grep -Fq "celery.conf" /opt/python/etc/supervisord.conf
        then
          echo "[include]" | tee -a /opt/python/etc/supervisord.conf
          echo "files: uwsgi.conf celery.conf celerybeat.conf" | tee -a /opt/python/etc/supervisord.conf
      fi

      # Enable supervisor to listen for HTTP/XML-RPC requests.
      # supervisorctl will use XML-RPC to communicate with supervisord over port 9001.
      # Source: https://askubuntu.com/questions/911994/supervisorctl-3-3-1-http-localhost9001-refused-connection
      if ! grep -Fxq "[inet_http_server]" /opt/python/etc/supervisord.conf
        then
          echo "[inet_http_server]" | tee -a /opt/python/etc/supervisord.conf
          echo "port = 127.0.0.1:9001" | tee -a /opt/python/etc/supervisord.conf
      fi

      # Reread the supervisord config
      supervisorctl -c /opt/python/etc/supervisord.conf reread

      # Update supervisord in cache without restarting all services
      supervisorctl -c /opt/python/etc/supervisord.conf update

      # Start/Restart celeryd through supervisord
      supervisorctl -c /opt/python/etc/supervisord.conf restart celeryd
      supervisorctl -c /opt/python/etc/supervisord.conf restart celerybeat

commands:
  01_killotherbeats:
    command: "ps auxww | grep 'celery beat' | awk '{print $2}' | sudo xargs kill -9 || true"
    ignoreErrors: true
  02_restartbeat:
    command: "supervisorctl -c /opt/python/etc/supervisord.conf restart celerybeat"
    leader_only: true

One thing to note: in my project the celery.py file lives in the config directory, which is why I pass -A config.celery:app when starting the celery worker and celery beat.
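
For reference, a minimal sketch of what such a config/celery.py typically looks like (the settings module path config.settings is an assumption; adjust it to your project):

# config/celery.py (sketch)
import os

from celery import Celery

# Make sure Django settings are configured before the Celery app is created
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'config.settings')

app = Celery('config')

# Read all CELERY_-prefixed settings from Django's settings.py
app.config_from_object('django.conf:settings', namespace='CELERY')

# Discover tasks.py modules in every app listed in INSTALLED_APPS
app.autodiscover_tasks()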

