Story #2567

As a user, I can quickly propagate one package change from one Pulp to another Pulp

Added by dkliban@redhat.com almost 3 years ago. Updated 8 months ago.

Status:
NEW
Priority:
Normal
Assignee:
-
Category:
-
Sprint/Milestone:
-
Start date:
Due date:
% Done:

0%

Platform Release:
Blocks Release:
Backwards Incompatible:
No
Groomed:
No
Sprint Candidate:
No
Tags:
Pulp 2, Sync Performance
QA Contact:
Complexity:
Smash Test:
Verified:
No
Verification Required:
No
Sprint:

Description

With nodes having been deprecated, it is necessary to improve the performance of syncs from one Pulp to another. Each repository sync requires the downstream Pulp to parse metadata, a step that can take minutes depending on repository size. When hundreds of repositories with the same content need to be synced from one Pulp to another, the process can take many hours, even though all the data is already present in the downstream Pulp after the first repository finishes syncing.
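
To illustrate why the repeated parsing hurts, here is a minimal sketch, assuming repositories with the same content publish byte-identical metadata. `MetadataParser` and its methods are hypothetical, not Pulp 2 code, and real RPM metadata is XML rather than one package per line. Keying parsed results on a SHA-256 digest of the metadata means the expensive step runs once for the first repository and is reused for every identical one after it.

```python
import hashlib


class MetadataParser:
    """Parse repository metadata at most once per distinct metadata file.

    Hypothetical illustration: real RPM metadata is XML, but here each
    non-empty line of the raw bytes stands for one package entry.
    """

    def __init__(self) -> None:
        self.parse_count = 0  # how many times the expensive step actually ran
        self._cache: dict[str, tuple[str, ...]] = {}

    def _parse(self, raw: bytes) -> tuple[str, ...]:
        # Stand-in for the slow step that can take minutes on a large repo.
        self.parse_count += 1
        return tuple(line for line in raw.decode("utf-8").splitlines() if line)

    def parse(self, raw: bytes) -> tuple[str, ...]:
        # Key the cache on a digest of the metadata, so repositories that
        # publish byte-identical metadata share a single parse result.
        digest = hashlib.sha256(raw).hexdigest()
        if digest not in self._cache:
            self._cache[digest] = self._parse(raw)
        return self._cache[digest]


if __name__ == "__main__":
    parser = MetadataParser()
    metadata = b"bash-5.0\nzsh-5.8\n"
    for _ in range(100):  # hundreds of repositories with the same content
        parser.parse(metadata)
    print(parser.parse_count)  # 1
```

Returning tuples keeps the shared cached result immutable, so one repository's sync cannot alter what another repository sees.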


Related issues

Related to RPM Support - Refactor #1013: Reduce sync time spent processing metadata up-front CLOSED - WONTFIX

History

#1 Updated by mhrivnak almost 3 years ago

Pulp 3 opens up a lot of possibilities to leverage the REST API for Pulp-to-Pulp sync. Broadly, one Pulp can interrogate another and get full representations of each unit in each repo. It can easily retrieve the files as well.

Versioned repos could make this efficient. Child Pulp could watch the version of a repo on Parent Pulp, and when the version increases, request only the changes for the missing versions.
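
The version-watching idea can be sketched roughly as follows, assuming each repository version on the parent records the units it added and removed. `ParentRepo`, `ChildRepo`, `publish`, and `changes_since` are hypothetical in-memory stand-ins, not the Pulp 3 REST API; a real child would fetch the same information over HTTP. The child remembers the last version it synced and, when the parent's version is higher, applies only the changes from the versions it is missing.

```python
from dataclasses import dataclass, field


@dataclass
class RepoVersion:
    number: int
    added: set[str] = field(default_factory=set)
    removed: set[str] = field(default_factory=set)


class ParentRepo:
    """Hypothetical in-memory stand-in for a repository on the parent Pulp."""

    def __init__(self) -> None:
        self.versions: list[RepoVersion] = [RepoVersion(0)]

    def latest_version(self) -> int:
        return self.versions[-1].number

    def publish(self, added=(), removed=()) -> int:
        # Each change to the repo creates a new, higher-numbered version.
        version = RepoVersion(self.latest_version() + 1, set(added), set(removed))
        self.versions.append(version)
        return version.number

    def changes_since(self, version: int) -> list[RepoVersion]:
        # Only the versions the child has not seen yet, oldest first.
        return [v for v in self.versions if v.number > version]


class ChildRepo:
    """Hypothetical child that mirrors a parent repo incrementally."""

    def __init__(self) -> None:
        self.synced_version = 0
        self.content: set[str] = set()

    def sync(self, parent: ParentRepo) -> int:
        """Apply missing versions; return how many unit changes were applied."""
        if parent.latest_version() == self.synced_version:
            return 0  # nothing changed upstream, so no work at all
        applied = 0
        for version in parent.changes_since(self.synced_version):
            self.content -= version.removed
            self.content |= version.added
            applied += len(version.added) + len(version.removed)
        self.synced_version = parent.latest_version()
        return applied


if __name__ == "__main__":
    parent, child = ParentRepo(), ChildRepo()
    parent.publish(added={"bash-5.0", "zsh-5.8"})
    print(child.sync(parent))  # 2
    parent.publish(added={"bash-5.1"}, removed={"bash-5.0"})
    print(child.sync(parent))  # 2: one package change, not a full resync
    print(child.sync(parent))  # 0
```

The cost of a sync here scales with the number of changed units rather than the size of the repository, which is what makes propagating a single package change cheap.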

For ideas around implementing something like this, a "Pulp Importer" seems like a natural place to start.

Dennis, what do you think about moving this to the "Pulp" project instead of it being RPM-specific? If you were hoping to make some improvement on the Pulp 2 line, then we would probably be better off keeping it here, although that could be difficult to achieve in a low-risk way.

#2 Updated by bmbouter almost 3 years ago

Sync performance is a very important goal for Pulp, but there are many ways to accomplish that goal. I want to make a case that we should optimize the normal sync codepaths rather than create separate codepaths that optimize Pulp-to-Pulp sync. We had the nodes feature, and it was effectively this: an optimized Pulp-to-Pulp sync. We got rid of it for all of these reasons: http://pulpproject.org/2016/12/07/deprecating-nodes/

So if we don't make an optimized Pulp-to-Pulp sync, how can we accomplish this? I'll suggest a two-pronged approach:

First, we should file and fix stories and bugs for specific improvements to the normal sync workflow, for instance Issue #1013. These types of improvements will help us reach this goal without the downsides we experienced with nodes.

Second, performance measurement. We don't track how our code's performance changes over time. We also don't track which portions of the sync workflow would benefit most from improvement. As we find out where we spend the most time, we can identify specific fixes to be handled by the first approach above.
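
As a starting point for that measurement, here is a minimal sketch of per-phase timing, assuming the sync workflow can be wrapped in named phases. `PhaseTimer` and the phase names are hypothetical, and `time.sleep` stands in for real work. Totals accumulate across repeated entries of the same phase, and the report lists the slowest phases first, so they stand out as candidates for specific fixes.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class PhaseTimer:
    """Accumulate wall-clock time per named sync phase (hypothetical helper)."""

    def __init__(self) -> None:
        self.totals: dict[str, float] = defaultdict(float)

    @contextmanager
    def phase(self, name: str):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the elapsed time even if the phase raised an exception.
            self.totals[name] += time.perf_counter() - start

    def report(self) -> list[tuple[str, float]]:
        # Slowest phases first, so they are the obvious candidates for fixes.
        return sorted(self.totals.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    timer = PhaseTimer()
    with timer.phase("parse_metadata"):  # hypothetical phase name
        time.sleep(0.05)
    with timer.phase("download_units"):  # hypothetical phase name
        time.sleep(0.01)
    for name, seconds in timer.report():
        print(f"{name}: {seconds:.3f}s")
```

Logging these totals for every sync run would also give the over-time tracking mentioned above, since regressions would show up as a phase whose share grows between releases.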

#3 Updated by dkliban@redhat.com almost 3 years ago

@bmbouter I like what you are proposing here. I see the value in not introducing any new code paths for us to support. The outcome of issue #1013 needs to be a reduction in time by at least a factor of 2 for it to truly make a difference.

#4 Updated by dkliban@redhat.com almost 3 years ago

  • Subject changed from As a user, I can optimize Pulp to Pulp syncs to As a user, I can quickly propagate one package change from one Pulp to another Pulp

#5 Updated by dkliban@redhat.com almost 3 years ago

  • Related to Refactor #1013: Reduce sync time spent processing metadata up-front added

#6 Updated by dkliban@redhat.com almost 3 years ago

  • Tags Sync Performance added

#7 Updated by bmbouter 8 months ago

  • Tags Pulp 2 added