Issue #2958
Ensure that queued tasks are not lost by enabling task_reject_on_worker_lost
Status: closed
Severity: 2. Medium
Triaged: No
Groomed: No
Sprint Candidate: No
Description
This is a clone of #2954 for pulp 3.
From #2954:
The resource_manager queue loses a currently running _queue_reserved_task if the resource manager is restarted with sudo systemctl restart pulp_resource_manager.
The task is lost from the queue, but its TaskStatus record still incorrectly shows it as waiting, even though it will never run.
Note that if you run sudo pkill -9 -f resource_manager and then sudo systemctl start pulp_resource_manager, the task is not lost.
sudo systemctl stop pulp_workers
pulp-admin rpm repo sync run --repo-id zoo
qpid-stat -q <<-- observe that the queue depth of the resource_manager queue is 1
sudo systemctl restart pulp_resource_manager
qpid-stat -q <<-- observe that the queue depth of the resource_manager queue is 0
pulp-admin tasks list -s waiting <<-- observe that the task which is gone is listed as 'waiting', but it will never run because it is gone
We need to make sure that this doesn't happen in Celery 4. There's a configuration setting that should prevent this:
http://docs.celeryproject.org/en/latest/userguide/configuration.html#task-reject-on-worker-lost
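For illustration, a minimal sketch of what enabling the setting could look like in a Celery 4 style configuration module. The module name celeryconfig.py is hypothetical, and where Pulp 3 actually keeps its Celery settings is not stated in this issue. Per the Celery documentation linked above, task_reject_on_worker_lost applies to tasks that use late acknowledgement, so the sketch assumes task_acks_late is enabled as well.

```python
# celeryconfig.py (hypothetical module name; Pulp's real settings location may differ)

# Assumption: acknowledge a message only after the task has finished, rather
# than when the worker receives it. The setting below builds on this behaviour.
task_acks_late = True

# Ask the broker to requeue the message when the worker process running the
# task exits abruptly or is killed, instead of acknowledging (and losing) it.
task_reject_on_worker_lost = True
```

A Celery app would typically load such a module with app.config_from_object("celeryconfig"). The Celery documentation warns that this setting can cause message loops if a task reliably kills its worker, so it may be worth confirming that the affected tasks are safe to run again.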
Turn on task_reject_on_worker_lost to prevent lost tasks
Turn on task_reject_on_worker_lost (aka CELERY_REJECT_ON_WORKER_LOST) to prevent the loss of tasks when a worker dies.
fixes #2958 https://pulp.plan.io/issues/2958