Issue #2954
closed
Ensure that queued tasks are not lost by enabling task_reject_on_worker_lost for Celery 4
Start date:
Due date:
Estimated time:
Severity:
2. Medium
Version:
Platform Release:
2.14.1
OS:
Triaged:
Yes
Groomed:
No
Sprint Candidate:
No
Tags:
Pulp 2
Sprint:
Sprint 23
Quarter:
Description
In Celery 3, the resource_manager queue loses a currently running _queue_reserved_task if the resource manager is restarted with sudo systemctl restart pulp_resource_manager.
The task is lost from the queue, but its TaskStatus record still incorrectly shows it as waiting, and it will never run.
Note that if you run sudo pkill -9 -f resource_manager and then sudo systemctl start pulp_resource_manager, the task is not lost.
sudo systemctl stop pulp_workers
pulp-admin rpm repo sync run --repo-id zoo
qpid-stat -q <<-- observe that the queue depth of the resource_manager queue is 1
sudo systemctl restart pulp_resource_manager
qpid-stat -q <<-- observe that the queue depth of the resource_manager queue is 0
pulp-admin tasks list -s waiting <<-- observe that the task which is gone is listed as 'waiting', but it will never run because it is gone
We need to make sure that this doesn't happen in Celery 4. There's a configuration setting that should prevent this:
http://docs.celeryproject.org/en/latest/userguide/configuration.html#task-reject-on-worker-lost
Also, we need to apply this fix for Pulp 2 AND 3.
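A minimal sketch of what enabling the setting could look like, for illustration only. The helper name build_celery_settings and its version argument are hypothetical and are not taken from Pulp's code. It also assumes, based on the Celery documentation, that task_reject_on_worker_lost only takes effect together with late acknowledgement (task_acks_late), so both are set.

```python
def build_celery_settings(celery_major_version):
    """Return a dict of Celery settings (hypothetical helper, not Pulp's code)."""
    settings = {}
    if celery_major_version >= 4:
        # Assumption: the message is acknowledged only after the task finishes,
        # which is the precondition for rejecting it when the worker is lost.
        settings["task_acks_late"] = True
        # Ask the broker to requeue the message if the worker process
        # executing the task exits abruptly.
        settings["task_reject_on_worker_lost"] = True
    return settings


if __name__ == "__main__":
    print(build_celery_settings(4))
```

With a real Celery application object, the resulting dict could be applied with app.conf.update(build_celery_settings(4)). On Celery 3 the helper returns an empty dict, since the option is only available in Celery 4+.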
Turn on task_reject_on_worker_lost to prevent lost tasks
Turn on task_reject_on_worker_lost (aka CELERY_REJECT_ON_WORKER_LOST) to prevent the loss of tasks when a worker dies. This option is only available in Celery 4+.
fixes #2954 https://pulp.plan.io/issues/2954