Story #7659: [EPIC] As a user, orphan cleanup does not block all other tasks

## Background 

 When orphan cleanup runs, it blocks all tasks submitted after it until all workers become idle, and then the cleanup executes on the resource manager itself. Orphan cleanup is also a non-trivial operation in its own right: it takes on the order of minutes.

 ## Problem 

 On a large Pulp system with many workers running long operations, a user will experience very significant stalling of all newly submitted work. This occurs even when the user only wants to remove a single piece of content. For stakeholders concerned with tasking throughput on large installations, e.g. galaxy_ng, this is not viable.

 ## A simple solution 

 Have the orphan cleanup run asynchronously and without any locks. This will allow any worker to work on it in parallel with other Pulp task types. 

 #### Handling failures 

 Without locks, there is a race between another task associating an orphan with a repository version and orphan cleanup deleting that orphan.

 In the case orphan_cleanup wins, plugin writers will need to be told that content which was there a moment ago is no longer guaranteed to be there now. I expect the exception can just bubble up to the user, and the user can restart the job, at which point code like sync will get it right the second time.
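
The "fail, then restart" behavior can be sketched as follows. This is only an illustration that uses Python's standard `sqlite3` module as a stand-in for the real database; the table names, `get_or_create_content`, `sync_job`, and the `cleanup_interferes` flag are all hypothetical and are not pulpcore code.

```python
import sqlite3

# Hypothetical, simplified schema; the real pulpcore tables differ.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.execute("CREATE TABLE content (id INTEGER PRIMARY KEY, digest TEXT UNIQUE)")
conn.execute(
    "CREATE TABLE repository_content ("
    " content_id INTEGER NOT NULL REFERENCES content(id))"
)


def get_or_create_content(digest):
    """Return the id of the content row for digest, creating it if missing."""
    row = conn.execute(
        "SELECT id FROM content WHERE digest = ?", (digest,)
    ).fetchone()
    if row:
        return row[0]
    with conn:  # commits on success, rolls back on error
        return conn.execute(
            "INSERT INTO content (digest) VALUES (?)", (digest,)
        ).lastrowid


def sync_job(digest, cleanup_interferes):
    """Hypothetical sync-like job: find or create content, then associate it."""
    content_id = get_or_create_content(digest)
    if cleanup_interferes:
        # Simulate orphan cleanup winning the race and deleting the content.
        with conn:
            conn.execute("DELETE FROM content WHERE id = ?", (content_id,))
    with conn:
        conn.execute(
            "INSERT INTO repository_content (content_id) VALUES (?)",
            (content_id,),
        )
    return content_id


# First attempt: the content vanishes mid-job, so the error bubbles up.
try:
    sync_job("abc", cleanup_interferes=True)
    first_attempt = "ok"
except sqlite3.IntegrityError:
    first_attempt = "failed"

# The user restarts the job; the content is recreated and associated.
sync_job("abc", cleanup_interferes=False)
associated = conn.execute("SELECT COUNT(*) FROM repository_content").fetchone()[0]
```

The first run fails because the association points at a row that no longer exists; the second run recreates the content and succeeds without any special recovery code.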

 In the case where the other task associates an orphan, making it a non-orphan, after orphan_cleanup has already identified it as an orphan, we need to ensure the db will stop orphan_cleanup from deleting it via `on_delete=PROTECT`.
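
A minimal sketch of this protection, assuming a simplified two-table schema and using Python's standard `sqlite3` module with `ON DELETE RESTRICT` as a stand-in: in Django, deleting an object that is still referenced through a `ForeignKey` declared with `on_delete=models.PROTECT` raises `django.db.models.ProtectedError`, which plays the role of `sqlite3.IntegrityError` here. The table names and the skip-on-error handling are assumptions for illustration, not the pulpcore implementation.

```python
import sqlite3

# Hypothetical, simplified schema; the real pulpcore tables differ.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.execute("CREATE TABLE content (id INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE repository_content ("
    " id INTEGER PRIMARY KEY,"
    " content_id INTEGER NOT NULL"
    " REFERENCES content(id) ON DELETE RESTRICT)"
)
conn.executemany("INSERT INTO content (id) VALUES (?)", [(1,), (2,)])
conn.commit()

# Step 1: orphan cleanup identifies content with no repository association.
orphans = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM content"
        " WHERE id NOT IN (SELECT content_id FROM repository_content)"
        " ORDER BY id"
    )
]

# Step 2: meanwhile, another task associates content 2 with a repository,
# so it is no longer an orphan.
conn.execute("INSERT INTO repository_content (content_id) VALUES (2)")
conn.commit()

# Step 3: cleanup deletes its stale candidate list; the database refuses
# to delete rows that are now referenced, and cleanup skips them.
deleted, skipped = [], []
for content_id in orphans:
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("DELETE FROM content WHERE id = ?", (content_id,))
        deleted.append(content_id)
    except sqlite3.IntegrityError:
        skipped.append(content_id)

remaining = [row[0] for row in conn.execute("SELECT id FROM content ORDER BY id")]
```

Here content 1 is removed as a true orphan, while content 2 survives because the constraint rejects the delete even though cleanup had already listed it as an orphan.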

 ## A more complex solution 

 Basically do the simple solution, but build recovery workflows into the various places that pulpcore provides to plugin writers, e.g. in the stages pipeline itself. I propose we do the simple solution first and then, based on the severity of impact, implement the more complex recovery workflows later.