Issue #1113

If an instance of pulp_celerybeat dies unexpectedly, Pulp incorrectly tries to "cancel all tasks in its queue"

Added by bmbouter almost 6 years ago. Updated about 2 years ago.

Status: CLOSED - CURRENTRELEASE
Priority: Normal
Assignee:
Category: -
Sprint/Milestone: -
Start date:
Due date:
Estimated time:
Severity: 1. Low
Version: Master
Platform Release: 2.8.5
OS:
Triaged: Yes
Groomed: No
Sprint Candidate: No
Tags: Easy Fix, Pulp 2
Sprint:
Quarter:

Description

This bug was introduced by https://github.com/pulp/pulp/pull/1940/, because before that change it was not possible to run multiple pulp_celerybeat processes concurrently.

1. Start a clustered Pulp installation with two machines in the cluster. Call the hosts boxA and boxB. Start all pulp_* and httpd services on boxA.
2. Start a second pulp_celerybeat instance on boxB.
3. Use the /status/ API to verify that both entries are visible. They should show as 'scheduler@boxA' and 'scheduler@boxB'.
4. kill -9 the pulp_celerybeat process on boxB.
5. Wait for 6 or 7 minutes.
6. Observe log messages similar to the following in the logs of boxA.

pulp.server.async.scheduler:ERROR: Workers 'scheduler@boxB' has gone missing, removing from list of workers
pulp.server.async.tasks:ERROR: The worker named scheduler@boxB is missing. Canceling the tasks in its queue.

Two things are wrong with this, and both of them are located in this section of code.

(1) It should never call _delete_worker(worker.name) which attempts to cancel tasks, log, and clean up reservations, none of which make sense to do for pulp_celerybeat. Instead it should delete the worker record synchronously and continue.

(2) The error message is misleading. I suggest it read something like:

pulp_celerybeat 'scheduler@boxB' has gone missing.
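
As a rough illustration of points (1) and (2), the handling could branch on whether the missing entry is a scheduler. This is only a sketch under stated assumptions, not the actual Pulp code: handle_missing_worker, the records dict, the cancel_and_clean_up callable (standing in for _delete_worker), and the 'scheduler@' prefix check are all hypothetical names chosen for the example.

```python
import logging

_logger = logging.getLogger(__name__)

# Assumption: scheduler entries can be recognised by this name prefix, as in
# the 'scheduler@boxA' and 'scheduler@boxB' entries shown by the /status/ API.
SCHEDULER_PREFIX = "scheduler@"


def handle_missing_worker(name, records, cancel_and_clean_up):
    """Handle a worker entry that has gone missing.

    name: the worker name, e.g. 'scheduler@boxB'.
    records: a dict of name -> record, a hypothetical stand-in for the
        stored worker records.
    cancel_and_clean_up: a callable standing in for _delete_worker, which
        cancels tasks and cleans up reservations.
    """
    if name.startswith(SCHEDULER_PREFIX):
        # A pulp_celerybeat instance has no task queue or reservations, so
        # only log the clearer message and drop the record synchronously.
        _logger.error("pulp_celerybeat '%s' has gone missing.", name)
        records.pop(name, None)
        return "record_deleted"

    # Any other worker keeps the existing cancel-and-clean-up path.
    cancel_and_clean_up(name)
    return "tasks_cancelled"
```

With this shape, a missing 'scheduler@boxB' never reaches the task-cancelling path, while ordinary workers are handled as before.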

Related issues

Related to Pulp - Issue #1114: If an instance of pulp_resource_manager dies unexpectedly, Pulp incorrectly tries to "cancel all tasks in its queue" (CLOSED - WONTFIX)
Blocked by Pulp - Story #1060: As a user I can run Pulp with multiple celeryBeat instances (CLOSED - CURRENTRELEASE)

Associated revisions

Revision a63536ac View on GitHub
Added by dalley almost 5 years ago

Fixes a misleading error message when a duplicate celerybeat worker is killed. Moved and renamed scheduler constants for consistency and clarity. closes #1113

History

#1 Updated by bmbouter almost 6 years ago

  • Blocked by Story #1060: As a user I can run Pulp with multiple celeryBeat instances added

#2 Updated by bmbouter almost 6 years ago

  • Related to Issue #1114: If an instance of pulp_resource_manager dies unexpectedly, Pulp incorrectly tries to "cancel all tasks in its queue" added

#3 Updated by bmbouter almost 6 years ago

  • Triaged changed from No to Yes

#4 Updated by dalley almost 5 years ago

  • Status changed from NEW to ASSIGNED
  • Assignee set to dalley

#5 Updated by dalley almost 5 years ago

  • Status changed from ASSIGNED to POST

#7 Updated by dalley almost 5 years ago

  • Status changed from POST to MODIFIED
  • % Done changed from 0 to 100

#8 Updated by semyers almost 5 years ago

  • Platform Release set to 2.8.5

#9 Updated by semyers almost 5 years ago

  • Status changed from MODIFIED to 5

#10 Updated by semyers almost 5 years ago

  • Status changed from 5 to CLOSED - CURRENTRELEASE

#11 Updated by bmbouter about 2 years ago

  • Tags Pulp 2 added
