Story #7659

As a user, orphan cleanup does not block all other tasks

Added by bmbouter about 2 months ago. Updated 3 days ago.

Status: NEW
Priority: Normal
Assignee: -
Category: -
Sprint/Milestone: -
Start date:
Due date:
% Done: 0%
Estimated time:
Platform Release:
Groomed: No
Sprint Candidate: No
Tags:
Sprint:
Quarter:

Description

Background

When orphan cleanup runs, it blocks all tasks submitted after it until every worker becomes idle, and then the orphan cleanup executes on the resource manager itself. The orphan cleanup operation itself also takes on the order of minutes, which is a non-trivial amount of time.

Problem

On a large Pulp system with many workers running long operations, a user will experience very significant stalling of all newly submitted work. This occurs even when the user is interested in removing a single piece of content. For stakeholders concerned with tasking throughput on large installations, e.g. galaxy_ng, this is not viable.

A simple solution

Have the orphan cleanup run asynchronously and without any locks. This will allow any worker to work on it in parallel with other Pulp task types.

Handling failures

It's a race between another task associating an orphan with a repository version and orphan cleanup deleting an orphan.

In the case orphan_cleanup wins, plugin writers will need to be told they can no longer guarantee that just because content was there a moment ago, it is there now. I expect the exception can just bubble up to the user, and the user can restart the job, at which point code like sync will get it right the second time.

In the case the other task associates an orphan (making it a non-orphan) after orphan_cleanup has identified it as an orphan, we need to ensure the database stops orphan_cleanup from deleting it via on_delete=PROTECT.
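
A minimal sketch of the PROTECT mechanism, using illustrative Django models rather than the actual pulpcore schema (the RepositoryContent relation and the delete_orphan helper here are assumptions made for the example):

    from django.db import models, transaction
    from django.db.models import ProtectedError

    class Content(models.Model):
        """Stand-in for a content unit."""

    class RepositoryContent(models.Model):
        # PROTECT: the ORM refuses to delete a Content row that a repository
        # version still references and raises ProtectedError instead.
        content = models.ForeignKey(Content, on_delete=models.PROTECT)

    def delete_orphan(content):
        try:
            with transaction.atomic():
                content.delete()
        except ProtectedError:
            # Another task associated this content after we identified it as
            # an orphan; leave it alone and let that task win the race.
            pass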

Proposal for a more complex solution

Add a new field timestamp_of_interest on the artifact and content models (a rough model sketch follows the list below):

  • This field should be set at the artifact/content creation time
  • This field should be updated whenever we work with the existing artifact/content
  • Add a configurable option preserve_orphaned_content = X seconds/minutes. It can be either a global setting or a parameter to the orphan cleanup call.
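
A rough sketch of the proposed field and setting, extending the illustrative models above (field defaults and the setting name are placeholders, not a final migration):

    from django.db import models
    from django.utils import timezone

    class Artifact(models.Model):
        # Set at creation time and refreshed whenever a task works with the
        # existing artifact, so cleanup can tell fresh units from true orphans.
        timestamp_of_interest = models.DateTimeField(default=timezone.now)

    class Content(models.Model):
        timestamp_of_interest = models.DateTimeField(default=timezone.now)

    # Either a global setting or a parameter to the orphan cleanup call,
    # e.g. in settings.py; name and default are placeholders.
    PRESERVE_ORPHANED_CONTENT = 1800  # seconds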

It is expected that the artifact will be added to the content (unorphaning the artifact) sooner than X seconds/minutes (TBD) after the timestamp_of_interest. Similarly, content will be added to a repo version (unorphaning the content) within X seconds/minutes (TBD).

Sync Pipeline

The timestamp_of_interest will be set in the sync pipeline in the QueryExistingArtifacts and QueryExistingContents stages.

When querying for existing artifacts/content, set the timestamp on any that are found. If committing this transaction fails because orphan cleanup committed a transaction that removed the objects whose timestamp_of_interest was being set, retry the exact same transaction; the second time the removed objects will not be found, and the pipeline will proceed normally to re-download/re-create the artifact/content in the later stages.

For newly created artifacts and content we need to set this timestamp as well; this way they are marked as planned-to-be-used (i.e. "I plan to add this artifact to the content" or "I plan to add this content to the repo"). A rough sketch of such a stage follows.
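
A rough sketch of how a query stage could refresh the timestamp and retry the transaction once when orphan cleanup wins the race; the helper name and the sha256 lookup are illustrative, not the Stages API:

    from django.db import DatabaseError, transaction
    from django.utils import timezone

    def refresh_timestamp_of_interest(model, digests):
        """Mark already-present units as 'of interest' and return them."""
        def attempt():
            with transaction.atomic():
                found = list(model.objects.filter(sha256__in=digests))
                model.objects.filter(pk__in=[unit.pk for unit in found]).update(
                    timestamp_of_interest=timezone.now()
                )
            return found

        try:
            return attempt()
        except DatabaseError:
            # Orphan cleanup removed some of these rows before our commit; on
            # the second attempt they are simply not found, and later stages
            # re-download/re-create them as usual.
            return attempt()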

Upload

  • For one-shot upload, handle artifact and content creation in a single transaction (see the sketch below).
  • For non one-shot upload, set the timestamp during content/artifact creation, or, if the artifact/content already exists, update the timestamp. It is expected that users will make their second call to associate the orphaned artifact within X seconds/minutes (TBD) of the timestamp_of_interest.
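
A minimal sketch of the one-shot case, reusing the illustrative Artifact model above; the helper and its arguments are hypothetical:

    from django.db import transaction
    from django.utils import timezone

    def one_shot_upload(file, build_content):
        # Create the artifact and the content that owns it inside one
        # transaction, so orphan cleanup never sees the artifact without
        # its content.
        with transaction.atomic():
            artifact = Artifact.objects.create(
                file=file, timestamp_of_interest=timezone.now()
            )
            return build_content(artifact)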

Modify

  • Set the timestamp on the content specified in add_content_hrefs to prevent orphan cleanup from removing it in the middle of the running task (see the sketch below).
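
A small sketch of that timestamp bump, assuming the illustrative Content model above (content_pks would come from resolving add_content_hrefs):

    from django.utils import timezone

    def mark_content_of_interest(content_pks):
        # Bump timestamp_of_interest before the modify task runs so that a
        # concurrent orphan cleanup skips these units.
        Content.objects.filter(pk__in=content_pks).update(
            timestamp_of_interest=timezone.now()
        )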

Orphan cleanup logic (orphan clean up task)

It will be able to run in parallel, without locks; a query sketch follows the removal criteria below.

Remove Content that:

  • has no membership (does not belong to any repo version)
  • timestamp_of_interest is older than X seconds (it was marked to be used X seconds ago but still does not belong to any repo version)

Remove Artifact that:

  • has no membership (does not belong to any content)
  • timestamp_of_interest is older than X seconds (it was marked to be used X seconds ago but still does not belong to any content)
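
A sketch of these removal criteria as queries; the reverse-relation names (repositorycontent, contentartifact) are assumptions, not the actual pulpcore schema:

    from datetime import timedelta
    from django.utils import timezone

    def orphan_cleanup(preserve_seconds):
        cutoff = timezone.now() - timedelta(seconds=preserve_seconds)

        # Content with no repository-version membership that has not been
        # "of interest" within the preserve window.
        Content.objects.filter(
            repositorycontent__isnull=True,
            timestamp_of_interest__lt=cutoff,
        ).delete()

        # Artifacts referenced by no content and not recently "of interest".
        Artifact.objects.filter(
            contentartifact__isnull=True,
            timestamp_of_interest__lt=cutoff,
        ).delete()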

History

#1 Updated by bmbouter about 2 months ago

  • Description updated (diff)

#2 Updated by bmbouter about 2 months ago

  • Description updated (diff)

#3 Updated by iballou about 2 months ago

I can't think of any reason why the proposed solution would cause trouble in Katello. We'll just want to make sure there is an obvious path for keeping Pulp data consistent.

#4 Updated by ipanova@redhat.com about 2 months ago

In the case orphan_cleanup wins, plugin writers will need to be told they can no longer guarantee that just because content was there a moment ago, it is there now. I expect the exception can just bubble up to the user, and the user can restart the job, at which point code like sync will get it right the second time.

In this case, my preference would be to extend the Stages API to handle this exception behind the scenes and re-download the content, rather than asking users to re-run the sync.

However, this does not solve the upload use case: how do we solve the case where content was initially brought into Pulp not by a sync operation but by upload? I don't think telling users that we no longer guarantee that content which was there a moment ago is still there now will bring a lot of confidence. Our value proposition is to reliably manage content; this statement contradicts it, and it worries me, especially when no good recovery path is provided.

#5 Updated by bmbouter about 2 months ago

ipanova@redhat.com wrote:

In this case, my preference would be to extend the Stages API to handle this exception behind the scenes and re-download the content, rather than asking users to re-run the sync.

I agree with you on this in the long term, but I don't think the complexity it brings is worth it right now. Consider these reasons and let me know if you still think it is.

  1. I think this would be a plugin-api breaking change, requiring any plugin with a custom pipeline (and there are several) to take additional action to port onto it. It would be breaking because I believe sending content from later stages back to earlier stages would require the stages to reference each other.

  2. The level of effort goes up significantly with this requirement when we could deliver this improvement later and keep the changes smaller.

However, this does not solve the upload use case: how do we solve the case where content was initially brought into Pulp not by a sync operation but by upload? I don't think telling users that we no longer guarantee that content which was there a moment ago is still there now will bring a lot of confidence. Our value proposition is to reliably manage content; this statement contradicts it, and it worries me, especially when no good recovery path is provided.

I acknowledge that this is a step backwards in reliability, but I want to make two points about this.

  1. If we leave the situation as-is, any Pulp system will block all new tasks for a really long time every time; really long as in hours, because it has to wait for every task on every worker to stop. Some syncs themselves can take hours, not to mention the many operations running on all workers. This is a huge cost for users to pay, and I don't think we can justify that user experience.

  2. The upload error case is a race condition, one that I believe would occur rarely and is easily recoverable.

My perspective is that, when I weigh the user experience pain of deleting content today (hours, unscalable, and almost unworkable) against the downsides to users (brief and recoverable), this brings overall huge benefits.

#6 Updated by ipanova@redhat.com about 2 months ago

bmbouter wrote:

ipanova@redhat.com wrote:

In this case, my preference would be to extend the Stages API to handle this exception behind the scenes and re-download the content, rather than asking users to re-run the sync.

I agree with you on this in the long term, but I don't think the complexity it brings is worth it right now. Consider these reasons and let me know if you still think it is.

  1. I think this would be a plugin-api breaking change, requiring any plugin with a custom pipeline (and there are several) to take additional action to port onto it. It would be breaking because I believe sending content from later stages back to earlier stages would require the stages to reference each other.

  2. The level of effort goes up significantly with this requirement when we could deliver this improvement later and keep the changes smaller.

I agree that the current situation with the orphan clean-up is barely acceptable and we need to do something about it. However, regardless of how much effort is invested now or later, this change will remain a breaking one.

However, this does not solve the upload use case: how do we solve the case where content was initially brought into Pulp not by a sync operation but by upload? I don't think telling users that we no longer guarantee that content which was there a moment ago is still there now will bring a lot of confidence. Our value proposition is to reliably manage content; this statement contradicts it, and it worries me, especially when no good recovery path is provided.

I acknowledge that this is a step backwards in reliability, but I want to make two points about this.

  1. If we leave the situation as-is, any Pulp system will block all new tasks for a really long time every time; really long as in hours, because it has to wait for every task on every worker to stop. Some syncs themselves can take hours, not to mention the many operations running on all workers. This is a huge cost for users to pay, and I don't think we can justify that user experience.

  2. The upload error case is a race condition, one that I believe would occur rarely and is easily recoverable.

My perspective is that, when I weigh the user experience pain of deleting content today (hours, unscalable, and almost unworkable) against the downsides to users (brief and recoverable), this brings overall huge benefits.

It is hard to argue against the above-mentioned points, but I still feel uneasy about taking a step back on reliability. I would love to hear what other people think.

#7 Updated by ipanova@redhat.com 5 days ago

  • Description updated (diff)

#8 Updated by ipanova@redhat.com 3 days ago

  • Description updated (diff)
