Task #2430
closed
Keep Worker model records around instead of deleting them
Status:
CLOSED - CURRENTRELEASE
Description
Currently, Pulp worker DB records are deleted when workers go missing or are shut down, and created when they are first observed or updated. It would be beneficial to keep these records so that someone can read them post-mortem. A side benefit is that Tasks would no longer be cascade-deleted along with their Worker objects.
To resolve this, introduce a boolean named 'online' on the Worker model. The code in pulp_celerybeat that manages these Worker records would be updated to set the 'online' value when workers start and stop. The field's default should correspond with its usage.
Currently, on 3.0-dev, the pulp_celerybeat code that manages these records all lives here[0].
[0]: https://github.com/pulp/pulp/blob/3.0-dev/platform/pulp/tasking/services/worker_watcher.py
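To illustrate the cascade point in the description, here is a minimal sketch using Python's built-in sqlite3 module rather than Pulp's actual models. The table layout, the helper names, and the choice of `online` defaulting to true (on the assumption that a record is created when a worker is first observed) are all hypothetical.

```python
import sqlite3


def make_db():
    """Create an in-memory DB with hypothetical worker and task tables."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys (and cascades) when this is enabled.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute(
        "CREATE TABLE worker ("
        " id INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL UNIQUE,"
        " online INTEGER NOT NULL DEFAULT 1)"  # assumed default: online
    )
    conn.execute(
        "CREATE TABLE task ("
        " id INTEGER PRIMARY KEY,"
        " worker_id INTEGER NOT NULL"
        " REFERENCES worker(id) ON DELETE CASCADE)"
    )
    return conn


def add_worker_with_tasks(conn, name, n_tasks):
    """Insert a worker and n_tasks tasks pointing at it; return the worker id."""
    worker_id = conn.execute(
        "INSERT INTO worker (name) VALUES (?)", (name,)
    ).lastrowid
    conn.executemany(
        "INSERT INTO task (worker_id) VALUES (?)", [(worker_id,)] * n_tasks
    )
    return worker_id


def task_count(conn):
    return conn.execute("SELECT COUNT(*) FROM task").fetchone()[0]


def mark_offline(conn, name):
    """Proposed behaviour: keep the record, only flip the flag."""
    conn.execute("UPDATE worker SET online = 0 WHERE name = ?", (name,))


def delete_worker(conn, name):
    """Current behaviour: deleting the worker cascades to its tasks."""
    conn.execute("DELETE FROM worker WHERE name = ?", (name,))
```

With this layout, `mark_offline` leaves the task rows in place, while `delete_worker` removes them through the cascade.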
This may require us to re-think the uniqueness constraint. I suspect the tasking system will want to continue having an easy way to guarantee that there's only one active worker at a time with a given name.
Perhaps we could add a field that's a timestamp for when the worker was "archived", let that field be null, and make that field plus the "name" be unique together? That would allow just one non-archived worker, and any number of archived ones.
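One caveat worth checking against the target database: in SQLite and PostgreSQL, NULL values are treated as distinct in a unique constraint by default, so a plain unique-together on ("name", archived timestamp) would not by itself block two non-archived workers with the same name. A partial unique index is one way to get the intended behaviour. The sketch below uses Python's sqlite3 module; the table, column, and index names are hypothetical and not taken from Pulp.

```python
import sqlite3


def make_db():
    """Create a hypothetical worker table with an 'archived_at' timestamp."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE worker ("
        " id INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL,"
        " archived_at TEXT)"  # NULL means the worker is not archived
    )
    # Uniqueness applies only to rows that are not archived.
    conn.execute(
        "CREATE UNIQUE INDEX one_active_worker_per_name"
        " ON worker(name) WHERE archived_at IS NULL"
    )
    return conn


def try_add_worker(conn, name, archived_at=None):
    """Insert a worker; return False if the uniqueness rule rejects it."""
    try:
        conn.execute(
            "INSERT INTO worker (name, archived_at) VALUES (?, ?)",
            (name, archived_at),
        )
        return True
    except sqlite3.IntegrityError:
        return False
```

Here any number of archived rows can share a name, but a second non-archived row with the same name is rejected.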
I was imagining we would keep the existing uniqueness constraint[0], and when pulp_celerybeat brings a worker online and there is an existing record in the db, it would update(online=True, last_heartbeat=<now>).
I do think we want to "reuse" worker records because we want to aggregate all Task foreign keys onto the same Worker instance by name. Otherwise, when we use worker.tasks we will only get the tasks for the most recent instance, which is generally not what we want when we write that code. To get the full list, you would then have to aggregate all Tasks that have a foreign key to a worker whose name=<someworkername>. Given the above, just reusing the same worker instance would let us sidestep this aggregation issue completely.
[0]: https://github.com/pulp/pulp/blob/e29b9e796e5f02699558fc7be85d5f94f85000e4/app/pulp/app/models/task.py#L47
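A rough sketch of the record-reuse idea, again with Python's sqlite3 module instead of the real Pulp models. The schema and the helpers `bring_online`, `take_offline`, and `tasks_for` are hypothetical names used only for illustration.

```python
import sqlite3
import time


def make_db():
    """Hypothetical schema: worker names are unique, tasks point at workers."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE worker ("
        " id INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL UNIQUE,"
        " online INTEGER NOT NULL,"
        " last_heartbeat REAL NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE task ("
        " id INTEGER PRIMARY KEY,"
        " worker_id INTEGER NOT NULL REFERENCES worker(id))"
    )
    return conn


def bring_online(conn, name, now=None):
    """Reuse the existing record for this name if there is one, else create it."""
    now = time.time() if now is None else now
    cur = conn.execute(
        "UPDATE worker SET online = 1, last_heartbeat = ? WHERE name = ?",
        (now, name),
    )
    if cur.rowcount == 0:
        # No record yet for this name: first time the worker is observed.
        cur = conn.execute(
            "INSERT INTO worker (name, online, last_heartbeat) VALUES (?, 1, ?)",
            (name, now),
        )
        return cur.lastrowid
    row = conn.execute("SELECT id FROM worker WHERE name = ?", (name,)).fetchone()
    return row[0]


def take_offline(conn, name):
    """Keep the record but mark the worker as offline."""
    conn.execute("UPDATE worker SET online = 0 WHERE name = ?", (name,))


def tasks_for(conn, worker_id):
    """Rough equivalent of worker.tasks: every task pointing at this record."""
    rows = conn.execute(
        "SELECT id FROM task WHERE worker_id = ? ORDER BY id", (worker_id,)
    )
    return [row[0] for row in rows]
```

Because a restart updates the same row, tasks from before and after the restart stay attached to one worker id, and no aggregation by name is needed.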
Very nice idea. I like it.
- Description updated (diff)
- Groomed changed from No to Yes
- Description updated (diff)
- Status changed from NEW to ASSIGNED
- Assignee set to fdobrovo
- Status changed from ASSIGNED to POST
- Status changed from POST to MODIFIED
- % Done changed from 0 to 100
- Sprint Candidate deleted (Yes)
- Sprint/Milestone set to 3.0.0
- Status changed from MODIFIED to CLOSED - CURRENTRELEASE
Keep worker records when they go offline
closes #2430 https://pulp.plan.io/issues/2430