Certain Celery tasks start but hang and never execute

Posted: 2021-04-12 09:44:50

Problem description:

I'm having an issue with Django and Celery where some registered tasks are never executed.

My tasks.py file has three tasks; two of them, schedule_notification() and schedule_archive(), work without issue and execute at their predefined ETAs.

With the schedule_monitoring() function, I can see the job start in Celery Flower, but it never actually executes. It just sits there.

I've confirmed that I can run the commands locally from the worker, so I'm not sure where the problem could be.

tasks.py (the failing function):

@task
def schedule_monitoring(job_id: str, action: str) -> str:
    salt = OSApi() # This is a wrapper around a REST API.
    job = Job.objects.get(pk=job_id)
    target = ('compound', f"G@hostname:{job.network.gateway.host_name} and G@serial:{job.network.gateway.serial_number}")

    policies = [
        'foo',
        'bar',
        'foobar',
        'barfoo'
    ]

    if action == 'start':
        salt.run(target, 'spectrum.add_to_collection', fun_args=['foo'])  
        for policy in policies:
            salt.run(target, 'spectrum.refresh_policy', fun_args=[policy])

        create_activity("Informational", "MONITORING", "Started proactive monitoring for job.", job)
    elif action == 'stop':
        salt.run(target, 'spectrum.remove_from_collection', fun_args=['bar'])
        for policy in policies:
            salt.run(target, 'spectrum.refresh_policy', fun_args=[policy])

        create_activity("Informational", "MONITORING", "Stopped proactive monitoring for job.", job)
    else:
        raise NotImplementedError

    return f"Applying monitoring action: {action.upper()} to Job: {job.job_code}"
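As an aside, the f-string on the target line interpolates the gateway grains into a Salt compound match expression; that part can be sanity-checked in isolation (with hypothetical gateway values standing in for the job fields):

```python
# Illustrative only: hypothetical values standing in for
# job.network.gateway.host_name / .serial_number from the task above.
host_name = "gw-01"
serial_number = "ABC123"

# The braces interpolate the values into the Salt compound match expression.
target = ('compound', f"G@hostname:{host_name} and G@serial:{serial_number}")
print(target[1])  # → G@hostname:gw-01 and G@serial:ABC123
```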

Celery configuration:

# Async
CELERY_BROKER_URL = os.environ.get('BROKER_URL', 'redis://localhost:6379')
CELERY_RESULT_BACKEND = os.environ.get('RESULT_BACKEND', 'redis://localhost:6379')
CELERY_ACCEPT_CONTENT = ['application/json']
CELERY_TASK_SERIALIZER = 'json'
CELERY_RESULT_SERIALIZER = 'json'
CELERY_TIMEZONE = 'UTC'
CELERY_ENABLE_UTC = True

And below is a successful execution of the command on the worker that is supposed to run it:

>>> schedule_monitoring(job.pk, 'start')
'Applying monitoring action: START to Job: Test 1'
>>> schedule_monitoring(job.pk, 'stop')
'Applying monitoring action: STOP to Job: Test 1'
>>> exit()
Waiting up to 5 seconds.
Sent all pending logs.
root@9d045ff7dfc1:/app#

From debugging the worker, the following is all I see when the job starts, and nothing interesting after that:

[2021-01-06 17:08:00,001: DEBUG/MainProcess] TaskPool: Apply <function _trace_task_ret at 0x7f6adbc29680> (args:('Operations.tasks.schedule_monitoring', '407e8a87-b3bf-4e8f-8a17-776a33ae5fea', {'lang': 'py', 'task': 'Operations.tasks.schedule_monitoring', 'id': '407e8a87-b3bf-4e8f-8a17-776a33ae5fea', 'shadow': None, 'eta': '2021-01-06T17:08:00+00:00', 'expires': None, 'group': None, 'group_index': None, 'retries': 0, 'timelimit': [None, None], 'root_id': '407e8a87-b3bf-4e8f-8a17-776a33ae5fea', 'parent_id': None, 'argsrepr': "(UUID('11118a85-20f2-488d-9a12-b8d200ea7a74'), 'start')", 'kwargsrepr': '{}', 'origin': 'gen442@31a9de56d061', 'reply_to': '24a8dc4c-2e5c-32ce-aa3d-84392d7cbf41', 'correlation_id': '407e8a87-b3bf-4e8f-8a17-776a33ae5fea', 'hostname': 'celery@bc4bb7af894f', 'delivery_info': {'exchange': '', 'routing_key': 'celery', 'priority': 0, 'redelivered': None}, 'args': ['11118a85-20f2-488d-9a12-b8d200ea7a74', 'start'], 'kwargs': {}}, b'[["11118a85-20f2-488d-9a12-b8d200ea7a74", "start"], {}, {"callbacks": null, "errbacks": null, "chain": null, "chord": null}]', 'application/json', 'utf-8') kwargs:{})
[2021-01-06 17:08:00,303: DEBUG/MainProcess] basic.qos: prefetch_count->32
[2021-01-06 17:08:00,305: DEBUG/MainProcess] Task accepted: Operations.tasks.schedule_monitoring[407e8a87-b3bf-4e8f-8a17-776a33ae5fea] pid:44
[2021-01-06 17:08:00,311: DEBUG/ForkPoolWorker-3] Resetting dropped connection: storage.googleapis.com
[2021-01-06 17:08:00,383: DEBUG/ForkPoolWorker-3] https://storage.googleapis.com:443 "GET /download/storage/v1/b/foo/o/bar?alt=media HTTP/1.1" 200 96
[2021-01-06 17:08:01,228: DEBUG/MainProcess] pidbox received method enable_events() [reply_to:None ticket:None]
[2021-01-06 17:08:06,228: DEBUG/MainProcess] pidbox received method enable_events() [reply_to:None ticket:None]
[2021-01-06 17:08:11,227: DEBUG/MainProcess] pidbox received method enable_events() [reply_to:None ticket:None]
[2021-01-06 17:08:16,228: DEBUG/MainProcess] pidbox received method enable_events() [reply_to:None ticket:None]
[2021-01-06 17:08:21,227: DEBUG/MainProcess] pidbox received method enable_events() [reply_to:None ticket:None]
[2021-01-06 17:08:26,229: DEBUG/MainProcess] pidbox received method enable_events() [reply_to:None ticket:None]
[2021-01-06 17:08:31,231: DEBUG/MainProcess] pidbox received method enable_events() [reply_to:None ticket:None]


Answer 1:

The solution I found was to create two queues in Celery: one to manage the scheduled tasks coming through Celery Beat, and another with higher priority for everything else.

After creating the separate queues, tasks started flowing and completing correctly; my guess is that the bus or the worker was congested.
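The congested-worker theory fits the log above: with `basic.qos: prefetch_count->32`, a pool process can pre-reserve many messages, and one long-running or blocked task then starves everything queued behind it. Two settings commonly tuned for this kind of workload (a sketch with illustrative values, not from the original post, using the same old-style setting names as the rest of this config):

```python
# settings.py — reduce prefetching so idle pool processes can pick up waiting tasks
CELERYD_PREFETCH_MULTIPLIER = 1  # reserve one message per worker process at a time
CELERY_ACKS_LATE = True          # ack only after the task finishes, so a stuck
                                 # worker's reserved tasks can be redelivered
```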

To create the additional queues, do the following in settings.py:

from kombu import Queue, Exchange

CELERYD_MAX_TASKS_PER_CHILD = 4

CELERY_DEFAULT_QUEUE = 'scheduled'
CELERY_QUEUES = (
    Queue('scheduled', Exchange('scheduled'), routing_key='sched'),
    Queue('proactive_monitoring', Exchange('proactive_monitoring'), routing_key='prmon'),
)

Then, when registering your task functions, pass the queue you want them assigned to:

tasks.py:

@task(queue='proactive_monitoring')
def schedule_monitoring(job_id: str, action: str) -> str:
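Alternatively (a sketch, not from the original answer), the routing can be centralized in settings so the task code does not hard-code its queue; the task path below assumes the `Operations.tasks` module layout shown in the worker log:

```python
# settings.py — route by task name instead of a per-task queue= kwarg
# (old-style setting name, to match the rest of this configuration)
CELERY_ROUTES = {
    'Operations.tasks.schedule_monitoring': {'queue': 'proactive_monitoring'},
}
```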

Finally, make sure to start at least one worker per queue. You can do this by passing the queue when starting the worker:

celery -A proj worker -l INFO -Q proactive_monitoring

If you start multiple workers on localhost, you should differentiate them by specifying the name attribute:

celery -A proj worker -l INFO -Q proactive_monitoring -n prmon_first_worker

