Story #3707

As a user, I can run multiple resource_managers for high availability

Added by bmbouter over 1 year ago. Updated about 2 months ago.

Status: CLOSED - CURRENTRELEASE
Priority: Normal
Assignee:
Category: -
Sprint/Milestone:
Start date:
Due date:
% Done: 100%
Platform Release:
Blocks Release:
Backwards Incompatible: No
Groomed: No
Sprint Candidate: No
Tags:
QA Contact:
Complexity:
Smash Test:
Verified: No
Verification Required: No
Sprint:

Description

With the switch to RQ, the resource_manager is no longer highly available. If you run two of them, bad things will happen.

We should make the resource_manager highly available again. There are three implementation options:

1. Use TaskLock again

Before the RQ transition we were using TaskLock: https://github.com/pulp/pulp/blob/968766e975c2eca00169470b8a028ec28c2a9274/pulpcore/pulpcore/app/models/task.py#L250-L280

This would be a PostgreSQL-based table whose only purpose is to elect a single master: whichever process writes the unique lock record first wins.

2. Build something Redis-based

This would be better suited as an upstream contribution to RQ.

3. Find something that we can wrap around the resource manager to provide a singleton function.
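
Option 1 amounts to letting a unique constraint pick the winner. A minimal sketch of that idea follows. It uses Python's built-in sqlite3 as a stand-in for PostgreSQL, and the table name `task_lock`, its columns, and the `try_acquire` helper are all hypothetical (this is not the removed TaskLock model).

```python
import sqlite3


def try_acquire(conn, lock_name, owner):
    """Return True if `owner` wrote the unique lock row, False if it already exists."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO task_lock (name, owner) VALUES (?, ?)",
                (lock_name, owner),
            )
        return True
    except sqlite3.IntegrityError:
        # The primary key on `name` rejected a second row: someone else is master.
        return False


# In-memory database used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE task_lock (name TEXT PRIMARY KEY, owner TEXT NOT NULL)")

first = try_acquire(conn, "resource_manager", "host-a")   # wins the election
second = try_acquire(conn, "resource_manager", "host-b")  # loses, row already exists
holder = conn.execute(
    "SELECT owner FROM task_lock WHERE name = ?", ("resource_manager",)
).fetchone()[0]
```

A real implementation would presumably also need some way to release or expire the record when the holder dies, which this sketch does not attempt.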

Associated revisions

Revision 2047c4e7 View on GitHub
Added by bmbouter about 2 months ago

Change resource manager name disinclude hostname

This is a backwards compatible change, but it will be required once
pulpcore requires the name to be exactly `resource-manager`.

https://pulp.plan.io/issues/3707
re #3707

Revision d56a6175 View on GitHub
Added by bmbouter about 2 months ago

Allow multiple resource-managers to be started

The resource manager name must exactly equal `resource-manager` and
multiple of them can be started at once safely.

https://pulp.plan.io/issues/3707
closes #3707

Revision 97c26e40 View on GitHub
Added by bmbouter about 2 months ago

Allow multiple resource-managers to be started

The resource manager name must exactly equal `resource-manager` and
multiple of them can be started at once safely.

https://pulp.plan.io/issues/3707
closes #3707

(cherry picked from commit d56a61755350f3eeb32975f2d361ebb78f71c667)

History

#1 Updated by bmbouter over 1 year ago

  • Sprint Candidate changed from No to Yes

This will fix our high availability narrative so Pulp will have zero single points of failure. It's a small amount of work for a big benefit. As such, I'm nominating it for the next sprint.

#2 Updated by bmbouter over 1 year ago

  • Sprint Candidate changed from Yes to No

After talking with @dalley, we're going to do this work later, after other higher-priority work. I'm removing the sprint candidate flag.

#3 Updated by bmbouter over 1 year ago

  • Description updated (diff)

Updating to reflect that TaskLock is being removed from the codebase.

#4 Updated by bmbouter 9 months ago

  • Tags deleted (Pulp 3)

#5 Updated by bmbouter 9 months ago

  • Description updated (diff)
  • Sprint/Milestone set to 3.0.0

Katello expressed this as desirable for the project's own use (not for Katello end users). So it's not a P1-P4, but it is important for their adoption of Pulp 3 (and probably for many others).

#6 Updated by bmbouter 5 months ago

  • Sprint/Milestone deleted (3.0.0)

#7 Updated by bmbouter 2 months ago

I just thought of an even easier option. We could require the worker name to be exactly "resource-manager". RQ refuses to start a worker if another worker already has the same name. systemd could then be configured to keep trying to restart it, which would give you N hot spares with automatic failover.
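
The name-based approach in this comment can be illustrated without Redis. The sketch below assumes the behavior described above (a second worker with the same name fails to start). `NameRegistry` and `try_start` are hypothetical stand-ins, not RQ APIs, and the retry that systemd would perform is represented by simply calling `try_start` again.

```python
class NameRegistry:
    """Hypothetical in-memory stand-in for the worker names RQ tracks in Redis."""

    def __init__(self):
        self._active = set()

    def register(self, name):
        # Mirrors the assumed rule: one active worker per name.
        if name in self._active:
            raise ValueError(f"an active worker named {name!r} already exists")
        self._active.add(name)

    def unregister(self, name):
        self._active.discard(name)


def try_start(registry, name="resource-manager"):
    """One start attempt; True means this process became the active manager."""
    try:
        registry.register(name)
        return True
    except ValueError:
        # A hot spare exits here; a supervisor would start it again later.
        return False


registry = NameRegistry()
primary_started = try_start(registry)      # first process takes the name
spare_started = try_start(registry)        # spare is refused while primary lives
registry.unregister("resource-manager")    # primary goes away
spare_after_failover = try_start(registry) # the spare's next retry succeeds
```

How quickly a crashed worker's name becomes free again in a real deployment depends on RQ and Redis, and is not modeled here.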

#8 Updated by bmbouter 2 months ago

  • Sprint/Milestone set to 3.0.0

We can accomplish this very easily. I will post a POC on Monday.

#9 Updated by bmbouter about 2 months ago

  • Status changed from NEW to ASSIGNED
  • Assignee set to bmbouter

#10 Updated by bmbouter about 2 months ago

  • Checklist item deleted (make the resource_manager highly available)
  • Checklist item deleted (Add a "Resource Manager" section to this doc that talks about high availability. https://docs.pulpproject.org/en/3.0/nightly/overview/components/index.html)
  • Status changed from ASSIGNED to POST

#11 Updated by bmbouter about 2 months ago

  • Status changed from POST to MODIFIED
  • % Done changed from 0 to 100

#12 Updated by bmbouter about 2 months ago

  • Status changed from MODIFIED to CLOSED - CURRENTRELEASE
