Backport #9116

Issue #8912: [EPIC] Issues with the traditional tasking system

Backport #8779 "Task started on removed worker" to 3.14.z

Added by mdellweg about 2 months ago. Updated about 2 months ago.

Status: CLOSED - CURRENTRELEASE
Priority: Normal
Assignee:
Category: -
Sprint/Milestone:
Start date:
Due date:
% Done: 100%
Estimated time:
Triaged: Yes
Sprint Candidate: No
Tags: Katello
Sprint: Sprint 101
Quarter:

Description

After a Postgres outage, a couple of tasks were started on workers that, according to the logs, had already been removed. The tasks then got stuck in the 'waiting' state, and I had to cancel them to make them go away.

Logs showing worker being removed:

May 19 10:56:34 lxserv2285 rq[589125]: pulp [None]: pulpcore.tasking.worker_watcher:ERROR: Worker '2961917@lxserv2285' has gone missing, removing from list of workers
May 19 10:56:34 lxserv2285 rq[589125]: pulp [None]: pulpcore.tasking.worker_watcher:ERROR: The worker named 2961917@lxserv2285 is missing. Canceling the tasks in its queue.

Task being started after removal of workers (snippet):

{
  "pulp_created": "2021-05-19T10:57:17.819927Z",
  "state": "waiting",
  "worker": "/pulp/api/v3/workers/4d159eb5-01e4-4750-a921-c5b28c411e4a/",
}

The worker above is the worker that had been removed.

Any idea why the task was started on a worker that should have been removed from the list of workers?

On RHEL8, with python3-pulpcore-3.11.0-1.el8.noarch
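
To spot tasks in this situation, one option is to compare each waiting task's `worker` field against the set of workers currently known to be online. The following is a minimal sketch, not part of pulpcore: the function name is hypothetical, and it assumes the task records are plain dictionaries carrying the `state` and `worker` fields shown in the snippet above, with the list of online worker hrefs gathered separately.

```python
def find_tasks_on_missing_workers(tasks, online_worker_hrefs):
    """Return waiting tasks whose assigned worker is not in the online set.

    `tasks` is an iterable of dicts with "state" and "worker" keys
    (an assumption based on the snippet in this report).
    """
    online = set(online_worker_hrefs)
    return [
        task
        for task in tasks
        if task.get("state") == "waiting"
        and task.get("worker") is not None
        and task["worker"] not in online
    ]


# Example data modelled on the task shown in this report.
removed_worker = "/pulp/api/v3/workers/4d159eb5-01e4-4750-a921-c5b28c411e4a/"
tasks = [
    {"state": "waiting", "worker": removed_worker},
    {"state": "waiting", "worker": None},
    {"state": "completed", "worker": removed_worker},
]
stuck = find_tasks_on_missing_workers(tasks, online_worker_hrefs=[])
print(len(stuck))
```

Tasks returned by such a check would be the candidates for manual cancellation, as was done here.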


Related issues

Copied from Pulp - Issue #8779: Task started on removed worker (CLOSED - CURRENTRELEASE)

Associated revisions

Revision 0e562768
Added by mdellweg about 2 months ago

Prevent tasks being assigned to missing workers

backports #8779

fixes #9116

(cherry picked from commit 0cfaa8e7433b7cd272631a6f51b9f4a7b10224a7)
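
The commit itself is not reproduced on this page. As a generic illustration of the idea in its title, a scheduler can refuse to assign work to any worker whose last heartbeat is older than a timeout. The sketch below is an assumption-laden example, not the pulpcore implementation: the `Worker` class, the `pick_live_worker` function, and the 30-second `ttl` default are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, Optional


@dataclass
class Worker:
    """Hypothetical worker record with the time of its last heartbeat."""
    name: str
    last_heartbeat: datetime


def pick_live_worker(
    workers: Iterable[Worker],
    now: datetime,
    ttl: timedelta = timedelta(seconds=30),  # assumed timeout, for illustration
) -> Optional[Worker]:
    """Return the first worker whose heartbeat is within `ttl`, else None."""
    for worker in workers:
        if now - worker.last_heartbeat <= ttl:
            return worker
    return None


now = datetime(2021, 5, 19, 10, 57, 17, tzinfo=timezone.utc)
stale = Worker("2961917@lxserv2285", now - timedelta(minutes=5))
fresh = Worker("other@host", now - timedelta(seconds=5))
chosen = pick_live_worker([stale, fresh], now)
print(chosen.name if chosen else "no live worker")
```

With a guard of this kind, a task is left unassigned when no live worker is available instead of being queued on a worker that has already gone missing.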

History

#1 Updated by mdellweg about 2 months ago

  • Copied from Issue #8779: Task started on removed worker added

#2 Updated by mdellweg about 2 months ago

  • Subject changed from Backport "Task started on removed worker" to 3.14.z to Backport #8779 "Task started on removed worker" to 3.14.z

#3 Updated by bmbouter about 2 months ago

  • Sprint/Milestone set to 3.14.3

#4 Updated by pulpbot about 2 months ago

  • Status changed from ASSIGNED to POST

#5 Updated by mdellweg about 2 months ago

  • Status changed from POST to MODIFIED
  • % Done changed from 0 to 100

#6 Updated by pulpbot about 2 months ago

  • Status changed from MODIFIED to CLOSED - CURRENTRELEASE
