Issue #8779
Issue #8912 (closed): [EPIC] Issues with the traditional tasking system
Task started on removed worker
Severity: 2. Medium
Triaged: Yes
Groomed: No
Sprint Candidate: No
Tags: Katello
Sprint: Sprint 101
Description
After a Postgres outage, a couple of tasks were started on workers that the logs claim had already been removed. The tasks then got stuck in the 'waiting' state, and I had to cancel them to clear them.
Logs showing the worker being removed:
May 19 10:56:34 lxserv2285 rq[589125]: pulp [None]: pulpcore.tasking.worker_watcher:ERROR: Worker '2961917@lxserv2285' has gone missing, removing from list of workers
May 19 10:56:34 lxserv2285 rq[589125]: pulp [None]: pulpcore.tasking.worker_watcher:ERROR: The worker named 2961917@lxserv2285 is missing. Canceling the tasks in its queue.
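As a rough illustration of what these two messages describe, here is a minimal sketch of a heartbeat-based cleanup: a worker whose last heartbeat is older than a time-to-live is dropped from the list, and the waiting tasks already in its queue are canceled. This is not pulpcore's code; `Worker`, `Task`, `WORKER_TTL`, and `reap_missing_workers` are hypothetical names, and the 30-second TTL is an assumed value. Note that only tasks present in the queue at cleanup time are canceled; nothing in this step prevents a task from being assigned to the same worker afterwards.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Dict, List

# Assumed heartbeat time-to-live; pulpcore's real setting and default may differ.
WORKER_TTL = timedelta(seconds=30)


@dataclass
class Task:
    name: str
    state: str = "waiting"


@dataclass
class Worker:
    name: str
    last_heartbeat: datetime
    queue: List[Task] = field(default_factory=list)


def reap_missing_workers(workers: Dict[str, Worker], now: datetime) -> List[str]:
    """Drop workers whose last heartbeat is older than WORKER_TTL.

    Waiting tasks already in a dropped worker's queue are canceled.
    `workers` is modified in place; the removed names are returned.
    """
    removed = []
    for name, worker in list(workers.items()):
        if now - worker.last_heartbeat > WORKER_TTL:
            # Corresponds to "has gone missing, removing from list of workers".
            del workers[name]
            # Corresponds to "Canceling the tasks in its queue": only tasks queued right now.
            for task in worker.queue:
                if task.state == "waiting":
                    task.state = "canceled"
            removed.append(name)
    return removed


if __name__ == "__main__":
    # Illustrative timestamps only; "hypothetical-worker" is not from the report.
    now = datetime(2021, 5, 19, 10, 56, 34, tzinfo=timezone.utc)
    stale = Worker("2961917@lxserv2285", now - timedelta(minutes=5), [Task("sync")])
    fresh = Worker("hypothetical-worker", now)
    workers = {w.name: w for w in (stale, fresh)}
    print(reap_missing_workers(workers, now))  # ['2961917@lxserv2285']
    print(stale.queue[0].state)  # canceled
```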
Task started after the worker had been removed (snippet):
{ "pulp_created": "2021-05-19T10:57:17.819927Z", "state": "waiting",
"worker": "/pulp/api/v3/workers/4d159eb5-01e4-4750-a921-c5b28c411e4a/",
}
The worker referenced above is the one that had been removed.
Any idea why the task was started on a worker that should have been removed from the list of workers?
Running on RHEL 8 with python3-pulpcore-3.11.0-1.el8.noarch.
Prevent tasks being assigned to missing workers
fixes #8779
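The commit message only names the fix. One way to express the idea, assuming the dispatcher re-checks liveness at the moment it assigns a task, is sketched below; it is not the actual pulpcore patch. `Worker.removed`, `WORKER_TTL`, `is_online`, and `pick_worker` are hypothetical names, and the 30-second TTL is an assumed value.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, Optional

# Assumed heartbeat time-to-live; not necessarily pulpcore's value.
WORKER_TTL = timedelta(seconds=30)


@dataclass
class Worker:
    name: str
    last_heartbeat: datetime
    # Hypothetical flag a watcher would set once it declares the worker missing.
    removed: bool = False


def is_online(worker: Worker, now: datetime) -> bool:
    """Usable only if not flagged as removed and the heartbeat is within WORKER_TTL."""
    return not worker.removed and now - worker.last_heartbeat <= WORKER_TTL


def pick_worker(workers: Iterable[Worker], now: datetime) -> Optional[Worker]:
    """Return the online worker with the most recent heartbeat, or None.

    Returning None lets the caller leave the task unassigned and retry later
    instead of recording it against a worker that is already gone.
    """
    candidates = [w for w in workers if is_online(w, now)]
    return max(candidates, key=lambda w: w.last_heartbeat, default=None)
```

Checking both the watcher's flag and the heartbeat age narrows the window in which a task can be recorded against a worker that was just declared missing. A real implementation would likely also need the check and the assignment to happen atomically, so that the worker cannot be removed between the two steps.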