Issue #7119
closed
Tasks stay in waiting state if the worker that had the resource reservation is gone
Description
The problem appears in the pulp_ansible plugin, but the root cause is related to how Pulp schedules tasks.
Steps to reproduce:
- Spawn a worker named worker-1.
- Trigger an import task that uses a resource reservation.
- Delete the worker.
- Spawn a worker named worker-2.
- Trigger another import task for the same Pulp repository.
Expected behavior:
The task is assigned to worker-2.
Actual behavior:
Pulp tries to assign the task to worker-1, which is gone, so the task stays in the waiting state forever.
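To make the failure mode concrete, here is a minimal toy model of reservation-based task routing. It is not Pulp's code: `ToyScheduler`, `assign`, `reproduce`, and the `check_liveness` flag are hypothetical names invented for this sketch, and it assumes the reservation created by the first import still exists when the worker is deleted. It only illustrates why reusing a reservation without checking that its holder is still online leaves the second task bound to a worker that no longer exists.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class ToyScheduler:
    """Hypothetical stand-in for a reservation-based task router (not Pulp code)."""

    online_workers: Set[str] = field(default_factory=set)
    # Maps a reserved resource (e.g. a repository) to the worker holding it.
    reservations: Dict[str, str] = field(default_factory=dict)

    def assign(self, resource: str, check_liveness: bool) -> Optional[str]:
        """Return the name of the worker chosen for a task that needs `resource`."""
        holder = self.reservations.get(resource)
        if holder is not None and check_liveness and holder not in self.online_workers:
            # The holder is gone: drop the stale reservation before routing.
            del self.reservations[resource]
            holder = None
        if holder is None:
            if not self.online_workers:
                return None  # no worker is available to take the task
            holder = sorted(self.online_workers)[0]
            self.reservations[resource] = holder
        return holder


def reproduce(check_liveness: bool) -> Optional[str]:
    """Replay the steps from the report and return the worker picked last."""
    scheduler = ToyScheduler(online_workers={"worker-1"})
    scheduler.assign("repo-a", check_liveness)  # first import reserves repo-a
    scheduler.online_workers = {"worker-2"}     # worker-1 deleted, worker-2 spawned
    return scheduler.assign("repo-a", check_liveness)


print(reproduce(check_liveness=False))  # worker-1 (the reported behavior)
print(reproduce(check_liveness=True))   # worker-2 (the expected behavior)
```

In this sketch the only difference between the two runs is whether the stale reservation is discarded when its holder is no longer online. Whether Pulp should clean up at assignment time or when a worker disappears is a design question this model does not settle.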
Note 1: This behavior is critical when running Pulp in a containerized environment such as Kubernetes, where containers are created and destroyed periodically. Worker instance names are based on the container hostname, which is randomly generated and unique for each container.
Workaround: To avoid this situation, a worker can be run with a predictable name. However, this prevents Pulp workers from scaling: it is only possible to run a single worker at a time, or a limited set of workers with hardcoded names.
Note 2: Pulp does not seem to have a mechanism to cancel jobs in the waiting state after a timeout.
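For illustration only, a timeout of the kind Note 2 refers to could be as simple as selecting waiting tasks older than a threshold and cancelling them. The sketch below is an assumption about what such a check might look like, not an existing Pulp feature: `WaitingTask`, `expired_waiting_tasks`, and the one-hour timeout are all hypothetical.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, NamedTuple


class WaitingTask(NamedTuple):
    """Hypothetical record of a task that has not been picked up yet."""

    task_id: str
    created_at: datetime


def expired_waiting_tasks(
    tasks: Iterable[WaitingTask], now: datetime, timeout: timedelta
) -> List[str]:
    """Return the ids of tasks that have been waiting longer than `timeout`."""
    return [task.task_id for task in tasks if now - task.created_at > timeout]


now = datetime(2020, 1, 1, 12, 0, 0)
waiting = [
    WaitingTask("stuck-import", now - timedelta(hours=3)),
    WaitingTask("fresh-import", now - timedelta(minutes=5)),
]
# Only the task that has waited longer than the (assumed) one-hour limit is selected.
print(expired_waiting_tasks(waiting, now, timedelta(hours=1)))
```

A periodic job could cancel whatever this returns. That would bound how long a task stays stuck, but it would not fix the stale reservation itself.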
Related issues