Issue #3045
Running orphan cleanup tasks simultaneously leads to high mongod CPU usage
Description
This is often seen on Katello setups with smart proxies.
Even though this can be improved on the Katello side by not running orphan cleanup after each sync on smart proxies, there is no need for two or more orphan cleanup tasks to run in parallel. The suggestion here is to prevent that.
Making this change would also avoid the race condition reported in #3043.
As per mhrivnak's comment:
I don't see any harm in us making it a resource-reserving task. We aren't gaining much by running multiple in parallel. It should be a simple 1-line change to call "apply_async_with_reservation(...)" instead of "apply_async()"
Check the related BZ for more details.
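For illustration only, a minimal sketch of the one-line change mhrivnak describes, assuming Pulp 2's celery-based task dispatch; the import path, resource type, and resource id below are hypothetical, not verbatim Pulp 2 code:

```python
# Sketch of the proposed change; identifiers are illustrative only.
from pulp.server.tasks import content as content_tasks  # hypothetical import path

# Current dispatch: no reservation, so several orphan cleanups can run in parallel.
content_tasks.delete_all_orphans.apply_async()

# Proposed dispatch: reserve a shared resource so the resource manager
# queues these tasks one at a time instead of running them concurrently.
content_tasks.delete_all_orphans.apply_async_with_reservation(
    'orphan_cleanup',  # resource_type (hypothetical)
    'all_orphans',     # resource_id (hypothetical)
)
```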
Related issues
Updated by ttereshc about 7 years ago
- Related to Issue #3043: Race condition during orphan cleanup added
Updated by mhrivnak about 7 years ago
- Description updated (diff)
- Tags Easy Fix added
Updated by amacdona@redhat.com about 7 years ago
- Priority changed from Normal to High
- Sprint/Milestone set to 45
- Triaged changed from No to Yes
Updated by bmbouter about 7 years ago
I don't think we should make any changes to restrict the concurrency of orphan cleanup. A change here helps this usage pattern, but it would harm other usage patterns. Consider this use case:
1) A user performs an operation like a sync
2) The user wants to ensure that any orphans are deleted
Assuming users are doing the above workflow concurrently, if orphan cleanups are linearized then the following would happen:
1) a user starts a sync on repo A
2) a user starts a sync on repo B
3) repo sync A completes and the user dispatches an orphan cleanup which begins immediately
4) repo sync B completes and the user dispatches an orphan cleanup which does not begin immediately
5) orphan cleanup from step (3) finishes
6) orphan cleanup from step (4) starts
7) orphan cleanup from step (4) finishes
So with the above pattern, we are introducing additional delay between step (4) and step (7). My concern is that we are trying to be smarter than our users. If users load Pulp up with a bunch of tasks, they may just want Pulp to do them as fast as possible.
Updated by mhrivnak about 7 years ago
How about if we just make it optional? We could preserve the existing behavior, and let the API user optionally request that the operation use a reservation. That would give the user the most flexibility to push their deployment if it can handle parallel orphan cleanups, or restrict them to one at a time if not.
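As a hedged illustration of that proposal: Pulp 2 deletes all orphans via DELETE /pulp/api/v2/content/orphans/, so an opt-in flag might look like the sketch below. The use_reservation field is hypothetical and never existed in the API:

```python
# Hypothetical opt-in flag on the orphan cleanup API call.
# Only the endpoint is real Pulp 2 API; the body field is illustrative.
import json

import requests  # third-party HTTP client

response = requests.delete(
    'https://pulp.example.com/pulp/api/v2/content/orphans/',
    data=json.dumps({'use_reservation': True}),  # hypothetical option
    auth=('admin', 'admin'),
    verify=False,  # demo only; validate certificates in real deployments
)
print(response.status_code)  # 202 means the cleanup task was accepted
```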
Updated by bmbouter about 7 years ago
I have two concerns with doing this optionally. (1) I don't think implementing that option will create much value: users can already issue the cancels that make sense for their given workflows, so I don't see much value in an option that also cancels tasks. (2) That option could have very unintended consequences on a multi-tenant Pulp system, which we probably can't risk.
One way that we can help (maybe-ish) is to put a tip or note section in the orphan cleanup docs about that situation. Even that, though, I don't think makes perfect sense; we could probably put a similar note reading "maybe cancel tasks if you are dispatching a bunch of redundant work to Pulp" all over Pulp's docs.
So with the multi-tenancy concerns, and given that users can already manage task cancellation on their own, I think closing as NOTABUG would be best. What do others think about these concerns?
Updated by ttereshc about 7 years ago
I agree in general that we should take into account more than one scenario and give users more flexibility to control different parts of Pulp.
In the case of orphan cleanup, I'm not sure what users gain by running it in parallel, and what is the issue with having orphans in the db for a while?
In what case does the delay between orphan tasks matter? For sync/publish/some other tasks that makes sense, but I'm not sure it has value for the orphan one. Having orphans has no impact on the operations a user performs in Pulp; moreover, it can help avoid re-downloading content.
As seen in the BZ, mongo uses a lot of CPU and everything slows down. We can run some tests, but I think running orphan tasks sequentially may even speed up the process, especially when there are many more than two orphan tasks running in parallel. Also, on the second run there will potentially be fewer orphans to go through and clean up.
Updated by bmbouter about 7 years ago
Imagine a user who follows a workflow where they sync, clean up orphans, and then do something else after the orphan cleanup has completed. Users who do that benefit from these tasks running in parallel because the overall task wait time is lower.
Maybe there are some installations that would benefit from a serialized runtime for this task type. That would be a feature, not an issue. It also seems relatively low priority, since users can resolve their CPU load issue themselves by cancelling or simply not submitting orphan cleanup tasks, without us making a change.
So after thinking more about this, maybe we should leave this open but switch it to a feature and send it through feature planning. What do others think?
Updated by bmbouter about 7 years ago
To recap some IRC discussion: I believe we decided that if we were to make an adjustment, it would be a Pulp installation-wide setting. Since that is a feature, it needs more planning, whereas this was being treated as a bugfix. I think we should take it off the sprint, but I want to hear from others before I change that.
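For illustration, a minimal sketch of how such an installation-wide setting might gate the dispatch, assuming Pulp 2's server config module; the [tasks] section, option name, and helper function below are entirely hypothetical:

```python
# Hypothetical sketch only: neither this config option nor this helper
# existed in Pulp 2; it just illustrates the installation-wide setting idea.
from pulp.server import config  # Pulp 2 reads /etc/pulp/server.conf through this module


def dispatch_orphan_cleanup(task):
    """Dispatch an orphan cleanup task, serialized if the setting is enabled."""
    # The 'tasks' section and 'serialize_orphan_cleanup' option are made up here.
    if config.config.getboolean('tasks', 'serialize_orphan_cleanup'):
        # Reserving a shared resource makes the resource manager run these
        # tasks one at a time instead of in parallel.
        return task.apply_async_with_reservation('orphan_cleanup', 'all_orphans')
    return task.apply_async()
```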
Updated by ttereshc about 7 years ago
- Priority changed from High to Normal
- Sprint/Milestone deleted (45)
Updated by mhrivnak about 7 years ago
This is being worked around in a different way, and we don't have agreement on how to proceed with this issue, so we're taking it off the sprint. We can revisit in the future if necessary.
Updated by amacdona@redhat.com over 6 years ago
- Sprint Candidate changed from Yes to No
Updated by ttereshc almost 5 years ago
- Status changed from NEW to CLOSED - WONTFIX
- Tags deleted (Easy Fix)
It's not on the Pulp 2 roadmap.
The workaround is not to run orphan cleanup in parallel.