Story #2567
closed
As a user, I can quickly propagate one package change from one Pulp to another Pulp
Description
With nodes having been deprecated, it is necessary to improve the performance of syncs from one Pulp to another Pulp. Each repository sync requires the downstream Pulp to parse metadata. This step can take minutes, depending on repository size. When hundreds of repositories with the same content need to be synced from one Pulp to another Pulp, the process can take many hours, even though all the data is already present in the downstream Pulp after the first repository has finished syncing.
Pulp 3 opens up a lot of possibilities to leverage the REST API for pulp-pulp sync. Broadly, one Pulp can interrogate another, and get full representations of each unit in each repo. And it can easily retrieve the files as well.
Versioned repos could make this efficient. Child Pulp could watch the version of a repo on Parent Pulp, and when the version increases, request only the changes for the missing versions.
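The version-watching idea above can be sketched roughly as follows. This is a minimal in-memory illustration, not Pulp 3's actual API: `ParentRepo`, `ChildRepo`, `publish`, `changes_since`, and `sync` are all hypothetical names, and a real implementation would fetch the deltas over the REST API rather than from a local object.

```python
from dataclasses import dataclass, field


@dataclass
class ParentRepo:
    """Hypothetical stand-in for a versioned repository on the parent Pulp."""
    # changes[i] holds the (added, removed) unit sets that produced version i + 1.
    changes: list = field(default_factory=list)

    @property
    def latest_version(self) -> int:
        return len(self.changes)

    def publish(self, added=(), removed=()):
        """Record a new repository version as a delta against the previous one."""
        self.changes.append((frozenset(added), frozenset(removed)))

    def changes_since(self, version: int):
        """Return the deltas for every version after `version`, oldest first."""
        return self.changes[version:]


@dataclass
class ChildRepo:
    """Hypothetical downstream copy that remembers the last version it synced."""
    synced_version: int = 0
    units: set = field(default_factory=set)

    def sync(self, parent: ParentRepo) -> int:
        """Apply only the missing versions; return how many deltas were applied."""
        if parent.latest_version <= self.synced_version:
            return 0  # nothing new upstream, so no metadata needs to be parsed
        deltas = parent.changes_since(self.synced_version)
        for added, removed in deltas:
            self.units -= removed
            self.units |= added
        self.synced_version = parent.latest_version
        return len(deltas)


parent = ParentRepo()
parent.publish(added={"foo-1.0.rpm", "bar-2.1.rpm"})
child = ChildRepo()
child.sync(parent)  # initial sync applies version 1
parent.publish(added={"foo-1.1.rpm"}, removed={"foo-1.0.rpm"})
applied = child.sync(parent)  # only the delta for version 2 is applied
```

In this sketch, a one-package change upstream costs the child a single small delta instead of a full metadata parse, and a sync against an unchanged parent returns immediately.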
For ideas around implementing something like this, a "Pulp Importer" seems like a natural place to start.
Dennis, what do you think about moving this to the "Pulp" project instead of it being RPM-specific? If you were hoping to make some improvement on the 2 line, then we would probably be better off keeping it here. Although that could be difficult to achieve in a low-risk way.
Sync performance is a very important goal for Pulp, but there are many ways to accomplish that goal. I want to make a case that we should optimize the normal sync codepaths and not create separate codepaths that optimize Pulp to Pulp sync. We had the nodes feature and it was effectively this, an optimized Pulp to Pulp sync. We got rid of it for all of these reasons: http://pulpproject.org/2016/12/07/deprecating-nodes/
So if we don't make an optimized Pulp-to-Pulp sync, how can we accomplish this? I'll suggest a two-pronged approach:
First, we should file and fix stories and bugs for specific improvements to the normal sync workflow, for instance issue #1013. These types of improvements will help us reach this goal without the downsides we experienced with nodes.
Second, performance measurement. We don't track how our code performance is doing over time. We also don't track which portions of the sync workflow would benefit the most from improvement. As we investigate where we are spending the most time we can inform specific fixes to be handled by the first approach above.
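As one possible starting point for that kind of measurement, a sync could be wrapped in per-stage timers. The sketch below is only an illustration: the stage names `parse_metadata` and `download_units` and the helpers `timed` and `slowest_stages` are hypothetical, and the `time.sleep` calls merely stand in for real work.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Accumulated wall-clock seconds per (hypothetical) sync stage.
stage_seconds = defaultdict(float)


@contextmanager
def timed(stage):
    """Add the wall-clock time spent inside the block to `stage`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        stage_seconds[stage] += time.perf_counter() - start


def slowest_stages():
    """Return (stage, seconds) pairs, slowest stage first."""
    return sorted(stage_seconds.items(), key=lambda kv: kv[1], reverse=True)


# Placeholder work standing in for real sync steps.
with timed("parse_metadata"):
    time.sleep(0.05)
with timed("download_units"):
    time.sleep(0.001)

for stage, seconds in slowest_stages():
    print(f"{stage}: {seconds:.3f}s")
```

Recording these numbers for each release would show both how sync time changes over time and which stage is the best candidate for the next targeted fix.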
bmbouter, I like what you are proposing here. I see the value in not introducing any new code paths for us to support. The outcome of issue #1013 needs to be a reduction in sync time by at least a factor of 2 for it to truly make a difference.
- Subject changed from As a user, I can optimize Pulp to Pulp syncs to As a user, I can quickly propagate one package change from one Pulp to another Pulp
- Related to Refactor #1013: Reduce sync time spent processing metadata up-front added
- Tags Sync Performance added
- Tags Performance added
- Tags deleted (Sync Performance)
- Status changed from NEW to CLOSED - WONTFIX