Issue #7119

closed

Tasks stay in waiting state if the worker that held the resource reservation is gone

Added by osapryki over 4 years ago. Updated over 4 years ago.

Status: CLOSED - NOTABUG
Priority: High
Assignee: -
Category: -
Sprint/Milestone: -
Start date:
Due date:
Estimated time:
Severity: 3. High
Version:
Platform Release:
OS:
Triaged: Yes
Groomed: No
Sprint Candidate: No
Tags:
Sprint: Sprint 77
Quarter:

Description

The problem appears in the pulp_ansible plugin, but the root cause is related to how Pulp schedules tasks.

Steps to reproduce:

  1. Spawn a worker named worker-1.
  2. Trigger an import task that uses a resource reservation.
  3. Delete the worker.
  4. Spawn a worker named worker-2.
  5. Trigger another import task for the same Pulp repository.

Expected behavior:

The task is assigned to worker-2.

Actual behavior:

Pulp tries to assign the task to worker-1, which is gone, so the task stays in the waiting state forever.
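The reproduction steps above can be modeled with a small sketch. This is a toy model of reservation-based routing written only to illustrate the reported behavior; it is not Pulp's actual scheduling code, and `ToyScheduler`, `dispatch`, and the resource name `repo-a` are hypothetical names invented for the example. It assumes the reservation made for worker-1 is still recorded when worker-1 disappears.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class ToyScheduler:
    """Toy model of reservation-based routing; NOT Pulp's actual code."""

    online_workers: Set[str] = field(default_factory=set)
    # Maps a resource to the name of the worker holding its reservation.
    reservations: Dict[str, str] = field(default_factory=dict)

    def dispatch(self, resource: str) -> Optional[str]:
        """Return the worker that runs the task, or None if it stays waiting."""
        holder = self.reservations.get(resource)
        if holder is None:
            if not self.online_workers:
                return None
            # No reservation yet: pick an online worker and reserve the resource.
            holder = sorted(self.online_workers)[0]
            self.reservations[resource] = holder
        # The task is routed to the reservation holder, even if it is gone.
        return holder if holder in self.online_workers else None


scheduler = ToyScheduler()
scheduler.online_workers.add("worker-1")        # step 1
first = scheduler.dispatch("repo-a")            # step 2: runs on worker-1
scheduler.online_workers.discard("worker-1")    # step 3: reservation is not released
scheduler.online_workers.add("worker-2")        # step 4
second = scheduler.dispatch("repo-a")           # step 5: still routed to worker-1

print(first, second)
```

In this model the second task never reaches worker-2 because the stale reservation still points at worker-1, which matches the reported symptom.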

Note 1: This behavior is critical when running Pulp in a containerized environment such as Kubernetes, where containers are created and destroyed periodically. Worker instance names are based on the container hostname, which is randomly generated and unique for each container.

Workaround: To avoid this situation, a worker can be run with a predictable name. However, this prevents Pulp workers from scaling: only a single worker, or a limited set of workers with hardcoded names, can run at a time.

Note 2: Pulp does not seem to have a mechanism to cancel tasks in the waiting state after a timeout.
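For illustration, a timeout-based cancellation of waiting tasks could look roughly like the sketch below. It is a minimal standalone example, not an existing Pulp feature or API: `ToyTask`, `cancel_stale_waiting`, the state strings, and the 30-minute limit are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class ToyTask:
    """Hypothetical task record; not a Pulp model."""

    name: str
    state: str
    created_at: datetime


def cancel_stale_waiting(
    tasks: List[ToyTask], max_wait: timedelta, now: datetime
) -> List[ToyTask]:
    """Mark tasks that have waited longer than max_wait as canceled."""
    canceled = []
    for task in tasks:
        if task.state == "waiting" and now - task.created_at > max_wait:
            task.state = "canceled"
            canceled.append(task)
    return canceled


now = datetime(2020, 1, 1, 12, 0)
tasks = [
    ToyTask("import-1", "waiting", datetime(2020, 1, 1, 11, 0)),   # waited 60 min
    ToyTask("import-2", "waiting", datetime(2020, 1, 1, 11, 55)),  # waited 5 min
    ToyTask("sync-1", "running", datetime(2020, 1, 1, 10, 0)),     # not waiting
]
canceled = cancel_stale_waiting(tasks, max_wait=timedelta(minutes=30), now=now)

print([task.name for task in canceled])
```

Only tasks that are still waiting and older than the limit are touched; running tasks are left alone regardless of age.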


Related issues

Related to Pulp - Issue #6449: Tasks stuck in Waiting state (CLOSED - CURRENTRELEASE, dalley)
