Story #3707
closed
As a user, I can run multiple resource_managers for high availability
Status:
CLOSED - CURRENTRELEASE
Description
With the switch to RQ, the resource_manager is no longer highly available. If you run two of them, bad things will happen.
We should make the resource_manager highly available again. There are three implementation options:
1. Use TaskLock again
Before the RQ transition we were using TaskLock: https://github.com/pulp/pulp/blob/968766e975c2eca00169470b8a028ec28c2a9274/pulpcore/pulpcore/app/models/task.py#L250-L280
This would be a PostgreSQL-based table that serves only to elect a single master by having one process write the unique lock record.
2. Build something Redis-based
This would be more of something to contribute upstream to RQ.
3. Find something that we can wrap around the resource manager to provide a singleton function.
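To make option 1 concrete, here is a minimal sketch of electing a single master through a unique lock record. It is not the original TaskLock code: it uses sqlite3 from the standard library as a stand-in for PostgreSQL, and the table name task_lock and the helpers try_acquire_lock and release_lock are hypothetical names chosen for illustration.

```python
import sqlite3


def try_acquire_lock(conn, lock_name, owner):
    """Return True if `owner` wrote the unique lock record, False if it already exists."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO task_lock (name, owner) VALUES (?, ?)",
                (lock_name, owner),
            )
        return True
    except sqlite3.IntegrityError:
        # The primary key on `name` rejected a second record: someone else is master.
        return False


def release_lock(conn, lock_name, owner):
    """Delete the lock record, but only if `owner` is the one holding it."""
    with conn:
        conn.execute(
            "DELETE FROM task_lock WHERE name = ? AND owner = ?",
            (lock_name, owner),
        )


# sqlite3 in-memory database used here as a stand-in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE task_lock (name TEXT PRIMARY KEY, owner TEXT NOT NULL)")

first = try_acquire_lock(conn, "resource-manager", "host-a")   # becomes master
second = try_acquire_lock(conn, "resource-manager", "host-b")  # rejected
release_lock(conn, "resource-manager", "host-a")
third = try_acquire_lock(conn, "resource-manager", "host-b")   # takes over
```

The sketch leaves out how a lock is reclaimed when its holder dies without releasing it, which any real implementation of this option would need to handle.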
- Sprint Candidate changed from No to Yes
This will fix our high availability narrative so Pulp will have 0 single points of failure. It's a small amount of work for a big benefit. As such, I'm nominating it for the next sprint.
- Sprint Candidate changed from Yes to No
After talking with dalley, we're going to do this work later relative to other higher priority work. I'm removing the sprint candidate flag.
- Description updated (diff)
Updating to reflect that TaskLock is being removed from the codebase.
- Description updated (diff)
- Sprint/Milestone set to 3.0.0
This was expressed as desirable for Katello's usage for the project itself (not Katello end users). So it's not a P1-P4, but it is important for them to adopt Pulp3 (and probably many others).
- Sprint/Milestone deleted (3.0.0)
I just thought of an even easier option. We could require the worker name to be exactly "resource-manager". RQ has a behavior that if another worker already has that same name, the worker cannot start. systemd could then be configured to keep trying to restart it anyway, and this would allow you to have N other hot spares with automatic failover.
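The failover behavior described here can be sketched as a small simulation. This does not call RQ or systemd: the Registry class is a hypothetical in-memory stand-in for the worker names RQ tracks in Redis, and start_with_retries is a hypothetical stand-in for a supervisor that keeps restarting a worker that failed to start.

```python
class NameTaken(Exception):
    """Raised when another live worker already holds the requested name."""


class Registry:
    """Hypothetical in-memory stand-in for the worker names tracked in Redis."""

    def __init__(self):
        self._names = set()

    def register(self, name):
        if name in self._names:
            raise NameTaken(name)
        self._names.add(name)

    def unregister(self, name):
        self._names.discard(name)


def start_with_retries(registry, name, attempts, on_retry=lambda: None):
    """Try to claim `name` up to `attempts` times.

    Returns the attempt number that succeeded, or None if every attempt failed.
    `on_retry` runs after each failed attempt (used below to simulate time passing).
    """
    for attempt in range(1, attempts + 1):
        try:
            registry.register(name)
            return attempt
        except NameTaken:
            on_retry()
    return None


registry = Registry()

# The first worker claims the fixed name immediately.
primary = start_with_retries(registry, "resource-manager", attempts=1)

# A hot spare cannot start while the primary holds the name.
spare_blocked = start_with_retries(registry, "resource-manager", attempts=3)

# Simulate the primary dying after the spare's second failed attempt.
failed_attempts = []


def primary_dies_after_two():
    failed_attempts.append(1)
    if len(failed_attempts) == 2:
        registry.unregister("resource-manager")


spare = start_with_retries(
    registry, "resource-manager", attempts=5, on_retry=primary_dies_after_two
)
```

In this run the spare fails twice, then takes over on its third attempt once the name is free, which is the automatic failover the comment describes.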
- Sprint/Milestone set to 3.0.0
We can accomplish this very easily. I will post a POC on Monday.
- Status changed from NEW to ASSIGNED
- Assignee set to bmbouter
- Status changed from ASSIGNED to POST
- Status changed from POST to MODIFIED
- % Done changed from 0 to 100
- Status changed from MODIFIED to CLOSED - CURRENTRELEASE
Change resource manager name to not include hostname
This is a backwards compatible change, but it will be required once pulpcore requires the name to be exactly "resource-manager".
https://pulp.plan.io/issues/3707
re #3707