Issue #2958

Ensure that queued tasks are not lost by enabling task_reject_on_worker_lost

Added by daviddavis about 2 years ago. Updated 6 months ago.

Status: MODIFIED
Priority: Normal
Assignee:
Category: -
Sprint/Milestone:
Start date:
Due date:
Severity: 2. Medium
Version:
Platform Release:
Blocks Release:
OS:
Backwards Incompatible: No
Triaged: No
Groomed: No
Sprint Candidate: No
Tags:
QA Contact:
Complexity:
Smash Test:
Verified: No
Verification Required: No
Sprint:

Description

This is a clone of #2954 for pulp 3.

From #2954:

The resource_manager queue loses a currently running _queue_reserved_task if the resource manager is restarted with sudo systemctl restart pulp_resource_manager.

The task is lost from the queue, but its TaskStatus record still incorrectly shows it as waiting, even though it will never run.

Note that if you run sudo pkill -9 -f resource_manager and then sudo systemctl start pulp_resource_manager, the task is not lost.

sudo systemctl stop pulp_workers
pulp-admin rpm repo sync run --repo-id zoo
qpid-stat -q                        <<-- observe that the queue depth of the resource_manager queue is 1
sudo systemctl restart pulp_resource_manager
qpid-stat -q                        <<-- observe that the queue depth of the resource_manager queue is 0
pulp-admin tasks list -s waiting    <<-- observe that the task which is gone is listed as 'waiting', but it will never run because it is gone

We need to make sure that this doesn't happen in Celery 4. There's a config setting that should prevent this:

http://docs.celeryproject.org/en/latest/userguide/configuration.html#task-reject-on-worker-lost
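
A minimal sketch of how the setting might be enabled, assuming a Celery 4 application that uses the lowercase setting names. The dict name CELERY_SETTINGS and the helper reject_on_worker_lost_effective are hypothetical and not taken from the Pulp codebase. As far as the Celery documentation describes it, task_reject_on_worker_lost only matters when task_acks_late is also enabled, so the sketch sets both.

```python
# Hypothetical settings dict; the two keys are real Celery 4 setting names,
# but the dict itself is only an illustration, not Pulp's actual config.
CELERY_SETTINGS = {
    # Acknowledge a message only after the task has finished running.
    "task_acks_late": True,
    # If the worker child process dies mid-task, reject (requeue) the
    # message instead of acknowledging it.
    "task_reject_on_worker_lost": True,
}


def reject_on_worker_lost_effective(settings):
    """Return True if both settings needed for requeue-on-loss are on.

    Hypothetical sanity check: assumes the reject setting has no effect
    unless late acknowledgement is enabled as well.
    """
    return bool(settings.get("task_acks_late")) and bool(
        settings.get("task_reject_on_worker_lost")
    )


# With a real application object this would be applied roughly as:
#     app = celery.Celery("tasks")
#     app.conf.update(CELERY_SETTINGS)
```

A check like this could run at startup to catch a configuration where the reject setting is on but late acknowledgement is off.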


Related issues

Related to Pulp - Issue #2954: Ensure that queued tasks are not lost by enabling task_reject_on_worker_lost for Celery 4 (CLOSED - CURRENTRELEASE)

Associated revisions

Revision f61019e7
Added by daviddavis about 2 years ago

Turn on task_reject_on_worker_lost to prevent lost tasks

Turn on task_reject_on_worker_lost (aka CELERY_REJECT_ON_WORKER_LOST) to
prevent the loss of tasks when a worker dies.

fixes #2958
https://pulp.plan.io/issues/2958

History

#1 Updated by daviddavis about 2 years ago

  • Related to Issue #2954: Ensure that queued tasks are not lost by enabling task_reject_on_worker_lost for Celery 4 added

#2 Updated by daviddavis about 2 years ago

  • Description updated (diff)

#3 Updated by daviddavis about 2 years ago

  • Status changed from ASSIGNED to POST

#4 Updated by daviddavis about 2 years ago

I would recommend the following workflow for testing, as it is more precise: it kills only the child worker process. Using sudo systemctl restart pulp_resource_manager kills both the child and the parent, which can leave the message in the queue and thus produce a false positive.

sudo systemctl stop pulp_workers # may need to wait 30 seconds for this to die
pulp-admin rpm repo sync run --repo-id zoo --bg
qpid-stat -q # observe that the queue depth of the resource_manager queue is 1
ps auxf | grep resource_manager # grab the child process id (e.g. 12345) 
sudo kill 12345
qpid-stat -q # observe that the queue depth of the resource_manager queue is still 1
sudo systemctl restart pulp_resource_manager
sudo systemctl start pulp_workers # may need to wait 30 seconds for this to start and pick up task
pulp-admin tasks list -s waiting # should be empty
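
The expected outcome of the workflow above (queue depth still 1 after the child process is killed) can be illustrated with a toy model. This is a simplified sketch, not Celery's or Qpid's actual implementation: ToyBroker, child_killed, and run_scenario are hypothetical names, and the model assumes that queue depth counts both ready and unacknowledged messages.

```python
from collections import deque


class ToyBroker:
    """Tiny stand-in for a message queue with ack/reject semantics."""

    def __init__(self):
        self.ready = deque()   # messages waiting for delivery
        self.unacked = {}      # delivery tag -> message held by a consumer
        self._next_tag = 0

    def publish(self, message):
        self.ready.append(message)

    def deliver(self):
        """Hand the next ready message to a consumer and return its tag."""
        tag = self._next_tag
        self._next_tag += 1
        self.unacked[tag] = self.ready.popleft()
        return tag

    def ack(self, tag):
        """Acknowledge: the broker forgets the message."""
        del self.unacked[tag]

    def reject(self, tag, requeue=True):
        """Reject: optionally put the message back at the front of the queue."""
        message = self.unacked.pop(tag)
        if requeue:
            self.ready.appendleft(message)

    def depth(self):
        # Assumption: depth counts ready plus unacknowledged messages.
        return len(self.ready) + len(self.unacked)


def child_killed(broker, tag, reject_on_worker_lost):
    """Model what the parent does when the child running a task dies."""
    if reject_on_worker_lost:
        broker.reject(tag, requeue=True)   # message survives
    else:
        broker.ack(tag)                    # message is gone


def run_scenario(reject_on_worker_lost):
    broker = ToyBroker()
    broker.publish("_queue_reserved_task")
    tag = broker.deliver()
    child_killed(broker, tag, reject_on_worker_lost)
    return broker.depth()
```

In this model, run_scenario(False) leaves the queue empty, which corresponds to the lost task, while run_scenario(True) leaves the message in the queue for the next worker.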

#5 Updated by daviddavis about 2 years ago

  • Status changed from POST to MODIFIED

#6 Updated by daviddavis 6 months ago

  • Sprint/Milestone set to 3.0

#7 Updated by bmbouter 6 months ago

  • Tags deleted (Pulp 3)
