Update the failure and recovery scenarios in our user docs.
The user docs have a section on failure and recovery that is somewhat vague. Recently some new notes were written (below). These new notes should be added into this section or incorporated somehow.
======== NOTES ==========
For a recap of the Pulp components, read here.
If a pulp_worker dies, the task it is currently working on, and possibly a small number of related tasks, will not be processed. They will be marked as cancelled after 5 minutes or when the worker restarts, whichever comes first. Until the tasks are marked as cancelled, they will show the state they were in when the failure occurred. Cancellation after 5 minutes depends on pulp_celerybeat running.
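The "whichever comes first" rule above can be illustrated with a minimal sketch. This is not Pulp code: the function name, its parameters, and the constant are hypothetical, and counting the 5 minutes from the moment the worker died is a simplifying assumption.

```python
from datetime import datetime, timedelta

# Five-minute timeout taken from the notes; the constant name is hypothetical.
WORKER_TIMEOUT = timedelta(minutes=5)


def cancellation_time(worker_died_at, worker_restarted_at=None,
                      celerybeat_running=True):
    """Estimate when a dead worker's tasks would be marked as cancelled.

    Returns None when neither trigger can fire, i.e. pulp_celerybeat is
    down and the worker has not restarted.
    """
    candidates = []
    if celerybeat_running:
        # The timeout-based cancellation only happens while pulp_celerybeat runs.
        candidates.append(worker_died_at + WORKER_TIMEOUT)
    if worker_restarted_at is not None:
        # A worker restart also triggers cancellation.
        candidates.append(worker_restarted_at)
    # Whichever trigger comes first wins.
    return min(candidates) if candidates else None
```

For example, a worker that dies at 12:00 and restarts at 12:02 would have its tasks cancelled at 12:02 rather than 12:05. With pulp_celerybeat down and no restart, the sketch returns None because the tasks would keep showing their old state.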
If pulp_celerybeat dies, workers that start afterwards will not be given work, and Pulp will continue assigning work to existing workers even if they stop. Once restarted, pulp_celerybeat will synchronize with the current state of all workers. Scheduled tasks will not run while pulp_celerybeat is down; they will run once pulp_celerybeat is restarted.
If pulp_resource_manager dies, the Pulp tasking system will halt. Once restarted, it will resume.
If the webserver dies, the API will be unavailable until it is restored.
==Important New Related Features==
- In Pulp 2.6.0, the /status/ URL shows the health of all Pulp components. Read more about it here; the linked page includes sample response output.
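
As a rough illustration of how the docs could tie the /status/ output back to the failure scenarios, here is a minimal sketch that summarizes an already-parsed status response. The field names (`known_workers`, `database_connection`, `messaging_connection`) and the worker name prefixes (`scheduler@`, `resource_manager@`, `reserved_resource_worker`) are assumptions about the response format and should be checked against the linked sample output. The function name is hypothetical.

```python
def summarize_status(status):
    """Summarize component health from a parsed /status/ response (a dict).

    All keys and name prefixes used here are assumptions about the
    response format, not confirmed by the notes.
    """
    names = [w.get("name", "") for w in status.get("known_workers", [])]
    return {
        # Assumed shape: {"database_connection": {"connected": true}}
        "database": bool(status.get("database_connection", {}).get("connected")),
        "messaging": bool(status.get("messaging_connection", {}).get("connected")),
        # Assumed: pulp_celerybeat reports itself as "scheduler@<host>".
        "celerybeat": any(n.startswith("scheduler@") for n in names),
        # Assumed: pulp_resource_manager reports as "resource_manager@<host>".
        "resource_manager": any(n.startswith("resource_manager@") for n in names),
        # Assumed: pulp_workers report as "reserved_resource_worker-<n>@<host>".
        "workers": [n for n in names if n.startswith("reserved_resource_worker")],
    }


# Hand-written example input, not real Pulp output.
example = {
    "database_connection": {"connected": True},
    "messaging_connection": {"connected": True},
    "known_workers": [
        {"name": "scheduler@pulp.example.com"},
        {"name": "reserved_resource_worker-0@pulp.example.com"},
    ],
}
summary = summarize_status(example)
```

In this hand-written example no resource manager entry is present, so `summary["resource_manager"]` is false, which would correspond to the halted tasking system described in the notes.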
=========== END NOTES =============