Pulp celerybeat can be removed from the Pulp3 architecture entirely if we can transition its two remaining software responsibilities to the workers themselves. Specifically:
- Celerybeat looks for missing workers and calls "mark_worker_offline()" on them. That call effectively cancels any reserved tasks assigned to the worker and removes the worker's records from the database.
- Celerybeat checks that at least one resource_manager process and at least one worker process are running, and logs loudly if they aren't.
These responsibilities should move to the worker heartbeat code. All workers should run this cleanup whenever they see it needs to be done, which makes it a shared responsibility across all workers.
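To make the proposal concrete, here is a minimal sketch of what a heartbeat-driven cleanup might look like. It is not the Pulp implementation: the in-memory `registry` dict, the `WorkerRecord` class, the `heartbeat()` function, and the convention that resource manager names start with "resource_manager" are all assumptions made for illustration, and `mark_worker_offline()` here is only a stand-in for the real function of that name.

```python
import logging
from dataclasses import dataclass, field
from datetime import datetime, timedelta

log = logging.getLogger(__name__)

# Assumption: a worker counts as missing after 30 seconds without a heartbeat.
WORKER_TTL = timedelta(seconds=30)


@dataclass
class WorkerRecord:
    """Hypothetical stand-in for a worker's database record."""
    name: str
    last_heartbeat: datetime
    reserved_tasks: list = field(default_factory=list)


def mark_worker_offline(registry, name):
    """Stand-in: cancel the worker's reserved tasks and drop its record."""
    record = registry.pop(name, None)
    if record is None:
        # Another worker already cleaned this up, so there is nothing to do.
        return []
    canceled = list(record.reserved_tasks)
    log.error("Worker %s is offline; canceled %d reserved task(s)", name, len(canceled))
    return canceled


def heartbeat(registry, my_name, now):
    """Record this worker's heartbeat, then run the shared cleanup and checks."""
    if my_name in registry:
        registry[my_name].last_heartbeat = now
    else:
        registry[my_name] = WorkerRecord(my_name, now)

    # Responsibility 1: find missing workers and mark them offline.
    missing = [n for n, r in registry.items() if now - r.last_heartbeat > WORKER_TTL]
    for name in missing:
        mark_worker_offline(registry, name)

    # Responsibility 2: log loudly if a required process type is absent.
    # Assumption: resource manager names start with "resource_manager".
    names = list(registry)
    if not any(n.startswith("resource_manager") for n in names):
        log.error("No resource_manager process appears to be running")
    if not any(not n.startswith("resource_manager") for n in names):
        log.error("No worker process appears to be running")
    return missing
```

Because every worker runs the same cleanup, the stand-in `mark_worker_offline()` tolerates being called for a record that another worker has already removed. A real implementation would need the equivalent guarantee from the database.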
The scheduler.py should be deleted along with any orphaned code that was exclusively used by scheduler.py.
This will require a few other updates too:
- searching and updating the documentation to remove celerybeat references
- updating the devel environment to not deal with celerybeat or its units
- updating the galaxy playbooks to not deal with celerybeat or its units
Also, there are two points to verify to ensure that old records won't cause correctness problems:
- Verify the status API filters out records older than 30 seconds
- Verify the resource manager filters out workers who haven't checked in within 30 seconds
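Both checks come down to the same filter: ignore any worker whose last heartbeat is older than 30 seconds. A minimal sketch of that filter, assuming heartbeats are available as a plain mapping of worker name to timestamp (the `online_workers()` helper and `ONLINE_WINDOW` constant are hypothetical names, not Pulp code):

```python
from datetime import datetime, timedelta

# The 30-second window named in the two verification points above.
ONLINE_WINDOW = timedelta(seconds=30)


def online_workers(last_heartbeats, now):
    """Return the sorted names of workers seen within the online window."""
    cutoff = now - ONLINE_WINDOW
    return sorted(name for name, seen in last_heartbeats.items() if seen >= cutoff)
```

If the status API and the resource manager both apply a filter like this, a stale record left behind by a crashed worker would not be reported as online or be assigned work, even before any cleanup removes it.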
#1 Updated by dalley over 2 years ago
- Description updated
- Groomed changed from No to Yes
This looks good to me. My one question would be: is it acceptable for all workers to be logging "XYZ offline" messages individually? If not, we should find a way to avoid flooding the logs with those messages.
#3 Updated by bmbouter over 2 years ago
@dalley, the worst-case logging situation with this change would be many workers and 0 resource managers, or many resource managers and 0 workers. I think it's ok for all of them to log loudly then, since these situations should be relatively rare. I don't think coordinating the logging across distributed workers would be worth the implementation effort and risk.
@ipanova, I think we should wait until Pulp3 is GA and then close the celerybeat bugs. I think we can close any celerybeat issues that have the Pulp3 tag once this work is done, but I don't know of any of those.