As a user, I can quickly propagate one package change from one Pulp to another Pulp
Now that nodes have been deprecated, syncs from one Pulp to another Pulp need to perform better. Each repository sync requires the downstream Pulp to parse metadata, a step that can take minutes depending on repository size. When hundreds of repositories with the same content need to be synced from one Pulp to another Pulp, the process can take many hours, even though all the data is already present in the downstream Pulp once the first repository has finished syncing.
Updated by mhrivnak almost 6 years ago
Pulp 3 opens up a lot of possibilities to leverage the REST API for Pulp-to-Pulp sync. Broadly, one Pulp can interrogate another and get full representations of each unit in each repo, and it can easily retrieve the files as well.
Versioned repos could make this efficient. Child Pulp could watch the version of a repo on Parent Pulp, and when the version increases, request only the changes for the missing versions.
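To make the idea concrete, here is a minimal in-memory sketch of version-based incremental sync. It does not use any real Pulp API: `ParentRepo`, `ChildRepo`, `publish`, and `changes_since` are hypothetical names invented for illustration, and units are plain strings. It only shows the bookkeeping, where the child remembers the last version it saw and applies just the deltas for the versions it is missing.

```python
from dataclasses import dataclass, field


@dataclass
class ParentRepo:
    """Hypothetical stand-in for a versioned repo on the parent Pulp."""
    # versions[i] holds the (added, removed) unit sets that produced version i + 1
    versions: list = field(default_factory=list)

    @property
    def latest_version(self) -> int:
        return len(self.versions)

    def publish(self, added=(), removed=()):
        """Record a new repo version as a delta against the previous one."""
        self.versions.append((frozenset(added), frozenset(removed)))

    def changes_since(self, version: int):
        """Return the per-version deltas published after `version`."""
        return self.versions[version:]


@dataclass
class ChildRepo:
    """Hypothetical stand-in for the same repo on the child Pulp."""
    version: int = 0
    units: set = field(default_factory=set)

    def sync(self, parent: ParentRepo) -> int:
        """Apply only the missing versions; return how many were applied."""
        if parent.latest_version <= self.version:
            return 0  # nothing new, so no metadata needs to be parsed
        deltas = parent.changes_since(self.version)
        for added, removed in deltas:
            self.units -= removed
            self.units |= added
        self.version = parent.latest_version
        return len(deltas)
```

In this sketch, a child that is already at the parent's latest version returns immediately, which is the case that matters when hundreds of repositories share the same content and only one package changed.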
For ideas around implementing something like this, a "Pulp Importer" seems like a natural place to start.
Dennis, what do you think about moving this to the "Pulp" project instead of it being RPM-specific? If you were hoping to make some improvement on the Pulp 2 line, then we would probably be better off keeping it here, although that could be difficult to achieve in a low-risk way.
Updated by bmbouter almost 6 years ago
Sync performance is a very important goal for Pulp, but there are many ways to accomplish that goal. I want to make a case that we should optimize the normal sync codepaths and not create separate codepaths that optimize Pulp-to-Pulp sync. The nodes feature was effectively this: an optimized Pulp-to-Pulp sync. We got rid of it for all of these reasons: http://pulpproject.org/2016/12/07/deprecating-nodes/
So if we don't make an optimized Pulp-to-Pulp sync, how can we accomplish this? I'll suggest a two-pronged approach:
First, we should file and fix stories and bugs for specific improvements to the normal sync workflow. For instance, Issue 1013. These types of improvements will help us reach this goal without the downsides we experienced with nodes.
Second, performance measurement. We don't track how our code performs over time. We also don't track which portions of the sync workflow would benefit the most from improvement. As we investigate where we are spending the most time, the findings can inform specific fixes to be handled by the first approach above.
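As a rough illustration of the measurement idea, the sketch below accumulates wall-clock time per named phase so the slowest portion of a sync can be identified. It is not Pulp code: `PhaseTimer` and the phase names are hypothetical, and a real sync would wrap its own steps.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class PhaseTimer:
    """Hypothetical helper that sums wall-clock seconds per named phase."""

    def __init__(self):
        self.totals = defaultdict(float)

    @contextmanager
    def phase(self, name):
        """Time the enclosed block and add it to the total for `name`."""
        start = time.perf_counter()
        try:
            yield
        finally:
            # Recorded even if the phase raises, so failed syncs are still measured
            self.totals[name] += time.perf_counter() - start

    def slowest(self):
        """Return the phase with the largest total, or None if nothing was timed."""
        return max(self.totals, key=self.totals.get, default=None)
```

Wrapping each step, for example `with timer.phase("parse_metadata"): ...`, and storing `timer.totals` per run would give the over-time record that is currently missing.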