Permission problems with Django/Celery on Elastic Beanstalk


Posted 2018-11-01 21:26:13

Question:

My application (clads) runs on Django and uses Celery for scheduled and asynchronous tasks. Unfortunately, I can't figure out some permission problems that prevent the Celery processes from writing to the Django application logs or manipulating files created by the Django application. The Django app runs under the wsgi process, and I have some config files that set up the application log directory so the wsgi process can write to it (see below).

However, the Celery processes appear to run as a different user, one that doesn't have permission to write to those files (which it automatically tries to do as soon as it sees the logging configuration - also below; note I tried changing this to run as wsgi, but that didn't work). The same permission problem seems to prevent the Celery processes from manipulating temporary files created by the Django app - a requirement of the project.

Admittedly, I'm very rusty with Unix-type operating systems, so I'm sure I'm missing something simple. I've been searching this site and others for days, and while I've found many posts that get me close to the problem, I still can't seem to solve it. I suspect my configuration may need some additional commands to set permissions or to run Celery under a different user. Any help would be greatly appreciated. The project configuration and relevant code excerpts are below. Most of the config files were pieced together from information found on this site and others - apologies for not citing sources, but I didn't keep close enough track to know exactly where they came from.

Logging and Celery sections of settings.py
#log settings
LOGGING = {
    'version': 1,
    'disable_existing_loggers': False,
    'formatters': {
        'verbose': {
            'format': '%(asctime)s - %(levelname)s - %(module)s.%(fileName)s.%(funcName)s %(processName)d %(threadName)d: %(message)s',
        },
        'simple': {
            'format': '%(asctime)s - %(levelname)s: %(message)s'
        },
    },
    'handlers': {
        'django_log_file': {
            'level': os.getenv('DJANGO_LOG_LEVEL', 'INFO'),
            'class': 'logging.FileHandler',
            'filename': os.environ.get('DJANGO_LOG_FILE'),
            'formatter': 'verbose',
        },
        'app_log_file': {
            'level': os.getenv('CLADS_LOG_LEVEL', 'INFO'),
            'class': 'logging.FileHandler',
            'filename': os.environ.get('CLADS_LOG_FILE'),
            'formatter': 'verbose',
        },
    },
    'loggers': {
        'django': {
            'handlers': ['django_log_file'],
            'level': os.getenv('DJANGO_LOG_LEVEL', 'INFO'),
            'propagate': True,
        },
        'clads': {
            'handlers': ['app_log_file'],
            'level': os.getenv('CLADS_LOG_LEVEL', 'INFO'),
            'propagate': True,
        },
    },
}

WSGI_APPLICATION = 'clads.wsgi.application'

# celery settings
CELERY_ACCEPT_CONTENT = ['json']
CELERY_TASK_SERIALIZER = 'json'
CELERY_RESULT_SERIALIZER = 'json'

CELERY_RESULT_BACKEND = 'djcelery.backends.database:DatabaseBackend'
CELERYBEAT_SCHEDULER = 'djcelery.schedulers.DatabaseScheduler'
CELERY_SEND_EVENTS = False

CELERY_BROKER_URL = os.environ.get('BROKER_URL')
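The FileHandler entries above create each log file as whichever user opens it first, which is what makes a second user's writes fail. One possible direction (a sketch, not from the original post) is a FileHandler subclass that relaxes the file mode when it opens the file, so that a second process in the same group could append - this assumes the wsgi and celery users can be placed in a common group:

```python
import logging
import os

class GroupWriteFileHandler(logging.FileHandler):
    """Sketch of a FileHandler that makes its log file group-writable
    when it opens it. Hypothetical workaround: only helps if the two
    processes actually share a group."""

    def _open(self):
        stream = super()._open()
        try:
            # rw for owner and group, read-only for others
            os.chmod(self.baseFilename, 0o664)
        except OSError:
            # the file may be owned by another user; leave its mode alone
            pass
        return stream
```

It would be referenced via a dotted path in the LOGGING dict (e.g. 'class': 'clads.logutil.GroupWriteFileHandler' - hypothetical module path) in place of 'logging.FileHandler'.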

Excerpt from tasks.py

LOGGER = logging.getLogger('clads.pit')

@shared_task(name="archive_pit_file")
def archive_pit_file(tfile_name):
    LOGGER.debug('archive_date_file called for ' + tfile_name)

    LOGGER.debug('connecting to S3 ...')
    s3 = boto3.client('s3')

    file_fname = os.path.join(settings.TEMP_FOLDER, tfile_name)
    LOGGER.debug('reading temp file from ' + file_fname)
    s3.upload_file(file_fname, settings.S3_ARCHIVE, tfile_name)

    LOGGER.debug('cleaning up temp files ...')

    #THIS LINE CAUSES PROBLEMS BECAUSE THE CELERY PROCESS DOESN'T HAVE
    #PERMISSION TO REMOVE THE WSGI-OWNED FILE
    os.remove(file_fname)
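On POSIX systems, unlinking a file actually requires write permission on the containing directory, not ownership of the file itself, so granting the celery user group write access on TEMP_FOLDER is one direction to investigate. Failing that, here is a sketch of a cleanup helper (hypothetical, not in the original tasks.py) that downgrades the permission error to a log message instead of failing the task:

```python
import logging
import os

LOGGER = logging.getLogger('clads.pit')

def remove_if_permitted(path):
    """Best-effort cleanup: try to unlink the file, but log and carry on
    if this process lacks permission (illustrative helper)."""
    try:
        os.remove(path)
        return True
    except FileNotFoundError:
        # already gone; nothing left to clean up
        return True
    except PermissionError as exc:
        LOGGER.warning('could not remove %s: %s', path, exc)
        return False
```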

logging.config

commands:
  01_change_permissions:
      command: chmod g+s /opt/python/log
  02_change_owner:
      command: chown root:wsgi /opt/python/log
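The chown hands the log directory to the wsgi group, and chmod g+s sets the setgid bit so that files created inside inherit that group. A small illustrative check (hypothetical helper, not part of the config) for verifying those two bits from Python on the instance:

```python
import os
import stat

def dir_mode_report(path):
    """Report the two permission bits this config relies on:
    the setgid bit (group inheritance) and group write access."""
    mode = os.stat(path).st_mode
    return {
        'setgid': bool(mode & stat.S_ISGID),          # chmod g+s
        'group_writable': bool(mode & stat.S_IWGRP),  # chmod g+w
    }
```

For example, dir_mode_report('/opt/python/log') run on the instance would show whether the commands took effect.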

99_celery.config

container_commands:
  04_celery_tasks:
    command: "cat .ebextensions/files/celery_configuration.txt > /opt/elasticbeanstalk/hooks/appdeploy/post/run_supervised_celeryd.sh && chmod 744 /opt/elasticbeanstalk/hooks/appdeploy/post/run_supervised_celeryd.sh"
    leader_only: true
  05_celery_tasks_run:
    command: "/opt/elasticbeanstalk/hooks/appdeploy/post/run_supervised_celeryd.sh"
    leader_only: true

celery_configuration.txt

#!/usr/bin/env bash

# Get django environment variables
celeryenv=`cat /opt/python/current/env | tr '\n' ',' | sed 's/%/%%/g' | sed 's/export //g' | sed 's/$PATH/%(ENV_PATH)s/g' | sed 's/$PYTHONPATH//g' | sed 's/$LD_LIBRARY_PATH//g'`
celeryenv=${celeryenv%?}  # strip the trailing comma

# Create celery configuration script
celeryconf="[program:celeryd-worker]  
; Set full path to celery program if using virtualenv
command=/opt/python/run/venv/bin/celery worker -A clads -b <broker_url> --loglevel=INFO --without-gossip --without-mingle --without-heartbeat

directory=/opt/python/current/app  
user=nobody  
numprocs=1  
stdout_logfile=/var/log/celery-worker.log  
stderr_logfile=/var/log/celery-worker.log  
autostart=true  
autorestart=true  
startsecs=10

; Need to wait for currently executing tasks to finish at shutdown.
; Increase this if you have very long running tasks.
stopwaitsecs = 600

; When resorting to send SIGKILL to the program to terminate it
; send SIGKILL to its whole process group instead,
; taking care of its children as well.
killasgroup=true

; if rabbitmq is supervised, set its priority higher
; so it starts first
priority=998

environment=$celeryenv

[program:celeryd-beat]
; Set full path to celery program if using virtualenv
command=/opt/python/run/venv/bin/celery beat -A clads -b <broker_url> --loglevel=INFO --workdir=/tmp

directory=/opt/python/current/app  
user=nobody  
numprocs=1  
stdout_logfile=/var/log/celery-beat.log  
stderr_logfile=/var/log/celery-beat.log  
autostart=true  
autorestart=true  
startsecs=10

; Need to wait for currently executing tasks to finish at shutdown.
; Increase this if you have very long running tasks.
stopwaitsecs = 600

; When resorting to send SIGKILL to the program to terminate it
; send SIGKILL to its whole process group instead,
; taking care of its children as well.
killasgroup=true

; if rabbitmq is supervised, set its priority higher
; so it starts first
priority=998

environment=$celeryenv"

# Create the celery supervisord conf script
echo "$celeryconf" | tee /opt/python/etc/celery.conf

# Add configuration script to supervisord conf (if not there already)
if ! grep -Fxq "[include]" /opt/python/etc/supervisord.conf  
  then
  echo "[include]" | tee -a /opt/python/etc/supervisord.conf
  echo "files: celery.conf" | tee -a /opt/python/etc/supervisord.conf
fi

# Reread the supervisord config
supervisorctl -c /opt/python/etc/supervisord.conf reread

# Update supervisord in cache without restarting all services
supervisorctl -c /opt/python/etc/supervisord.conf update

# Start/Restart celeryd through supervisord
supervisorctl -c /opt/python/etc/supervisord.conf restart celeryd-worker  
supervisorctl -c /opt/python/etc/supervisord.conf restart celeryd-beat  

Comments:

Answer 1:

I couldn't pin down the exact permission problem, but I found a workaround that may help others. I removed the FileHandler configurations from the logging settings and replaced them with StreamHandlers. That resolved the permission problem, because the Celery processes no longer have to try to access log files owned by the wsgi user.

Log messages from the web application end up in the httpd error log - not ideal, but at least I can find them, and they're also accessible through the Elastic Beanstalk console - and the Celery logs get written to celery-worker.log and celery-beat.log in /var/log. I can't reach those through the console, but I can get to them by logging in to the instance directly. That's also not ideal, since these logs aren't rotated and will be lost when the instance is decommissioned, but at least it lets me move forward for now.

Here are the modified logging settings that made it work this way:

#log settings
LOGGING = {
    'version': 1,
    'disable_existing_loggers': False,
    'formatters': {
        'verbose': {
            'format': '%(asctime)s - %(levelname)s - %(module)s.%(filename)s.%(funcName)s %(processName)s %(threadName)s: %(message)s',
        },
        'simple': {
            'format': '%(asctime)s - %(levelname)s: %(message)s'
        },
    },
    'handlers': {
        'console': {
            'class': 'logging.StreamHandler',
            'formatter': 'verbose',
        },
    },
    'loggers': {
        'django': {
            'handlers': ['console'],
            'level': os.getenv('DJANGO_LOG_LEVEL', 'INFO'),
            'propagate': True,
        },
        'clads': {
            'handlers': ['console'],
            'level': os.getenv('CLADS_LOG_LEVEL', 'INFO'),
            'propagate': True,
        },
    },
}
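For what it's worth, a trimmed, self-contained stand-in for this config (env-dependent levels defaulted, only the 'clads' logger shown) loads cleanly with dictConfig and sends everything to stderr, which is what lets supervisord and httpd capture it without any file permissions:

```python
import logging
import logging.config
import os

# Trimmed stand-in for the settings above; only the 'clads' logger is shown.
LOGGING = {
    'version': 1,
    'disable_existing_loggers': False,
    'formatters': {
        'verbose': {
            'format': '%(asctime)s - %(levelname)s - %(module)s.%(filename)s.%(funcName)s %(processName)s %(threadName)s: %(message)s',
        },
    },
    'handlers': {
        'console': {
            'class': 'logging.StreamHandler',  # writes to stderr by default
            'formatter': 'verbose',
        },
    },
    'loggers': {
        'clads': {
            'handlers': ['console'],
            'level': os.getenv('CLADS_LOG_LEVEL', 'INFO'),
            'propagate': True,
        },
    },
}

logging.config.dictConfig(LOGGING)
logging.getLogger('clads').info('no file permissions needed for this handler')
```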

Comments:

@sammyhats Did anyone ever find a way to keep the logging.FileHandler approach?
