Story #2567


As a user, I can quickly propagate one package change from one Pulp to another Pulp

Added by almost 6 years ago. Updated almost 2 years ago.

Start date:
Due date:
% Done:


Estimated time:
Platform Release:
Sprint Candidate:
Performance, Pulp 2


With nodes having been deprecated, it is necessary to improve the performance of syncs from one Pulp to another Pulp. Each repository sync requires the downstream Pulp to parse metadata, a step that can take minutes depending on repository size. When hundreds of repositories with the same content need to be synced from one Pulp to another, the process can take many hours, even though all the data is already present in the downstream Pulp once the first repository has finished syncing.

Related issues

Related to RPM Support - Refactor #1013: Reduce sync time spent processing metadata up-front (CLOSED - WONTFIX)

Actions #1

Updated by mhrivnak almost 6 years ago

Pulp 3 opens up a lot of possibilities to leverage the REST API for Pulp-to-Pulp sync. Broadly, one Pulp can interrogate another and get full representations of each unit in each repo, and it can easily retrieve the files as well.

Versioned repos could make this efficient. Child Pulp could watch the version of a repo on Parent Pulp, and when the version increases, request only the changes for the missing versions.
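To make the versioned-repo idea concrete, here is a minimal in-memory sketch. It does not use Pulp's REST API; `RepoVersion`, `ParentRepo`, `ChildRepo`, and `changes_since` are hypothetical names invented for illustration, and it assumes each version records only the units added and removed relative to the previous one.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class RepoVersion:
    """Hypothetical repo version: the delta relative to the previous version."""
    number: int
    added: frozenset = frozenset()
    removed: frozenset = frozenset()


@dataclass
class ParentRepo:
    """Hypothetical stand-in for a repository on the Parent Pulp."""
    versions: list = field(default_factory=list)

    def publish(self, added=(), removed=()):
        # Each publish creates the next version number, starting at 1.
        version = RepoVersion(len(self.versions) + 1,
                              frozenset(added), frozenset(removed))
        self.versions.append(version)
        return version

    @property
    def latest(self):
        return len(self.versions)

    def changes_since(self, number):
        # Only the versions the child has not seen yet.
        return [v for v in self.versions if v.number > number]


@dataclass
class ChildRepo:
    """Hypothetical stand-in for the same repository on the Child Pulp."""
    synced_version: int = 0
    units: set = field(default_factory=set)

    def sync(self, parent):
        # Cheap check first: nothing to do if the version has not increased.
        if parent.latest == self.synced_version:
            return 0
        applied = 0
        for version in parent.changes_since(self.synced_version):
            self.units -= version.removed
            self.units |= version.added
            applied += 1
        self.synced_version = parent.latest
        return applied
```

In this sketch a child that is already up to date does no work beyond comparing two integers, and a child that is one version behind applies only that version's delta rather than re-reading the whole repository.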

For ideas around implementing something like this, a "Pulp Importer" seems like a natural place to start.

Dennis, what do you think about moving this to the "Pulp" project instead of it being RPM-specific? If you were hoping to make some improvement on the Pulp 2 line, then we would probably be better off keeping it here, although that could be difficult to achieve in a low-risk way.

Actions #2

Updated by bmbouter almost 6 years ago

Sync performance is a very important goal for Pulp, but there are many ways to accomplish that goal. I want to make a case that we should optimize the normal sync codepaths and not create separate codepaths that optimize Pulp to Pulp sync. We had the nodes feature and it was effectively this, an optimized Pulp to Pulp sync. We got rid of it for all of these reasons:

So if we don't make an optimized Pulp-to-Pulp sync, how can we accomplish this? I'll suggest a two-pronged approach:

First, we should file and fix stories and bugs for specific improvements to the normal sync workflow, for instance issue #1013. These types of improvements will help us reach this goal without the downsides we experienced with nodes.

Second, performance measurement. We don't track how our code performance is doing over time. We also don't track which portions of the sync workflow would benefit the most from improvement. As we investigate where we are spending the most time we can inform specific fixes to be handled by the first approach above.
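As a rough illustration of the kind of measurement meant here, the sketch below accumulates wall-clock time per named phase of a sync. `PhaseTimer` and the phase names are hypothetical and not part of Pulp; the clock is injectable only so the helper can be exercised deterministically.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class PhaseTimer:
    """Hypothetical helper that totals elapsed time per named phase."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock
        self.totals = defaultdict(float)

    @contextmanager
    def phase(self, name):
        start = self._clock()
        try:
            yield
        finally:
            # Record the time even if the phase raised an exception.
            self.totals[name] += self._clock() - start

    def slowest(self):
        # Name of the phase with the largest accumulated time.
        return max(self.totals, key=self.totals.get)
```

Wrapping each step of a sync in `with timer.phase("parse_metadata"):` (the phase name is an assumption) and recording `timer.totals` per release would give a simple way to see which portion dominates and whether it changes over time.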

Actions #3

Updated by almost 6 years ago

bmbouter, I like what you are proposing here. I see the value in not introducing any new code paths for us to support. For it to truly make a difference, the outcome of issue #1013 needs to reduce sync time by at least a factor of 2.

Actions #4

Updated by over 5 years ago

  • Subject changed from As a user, I can optimize Pulp to Pulp syncs to As a user, I can quickly propagate one package change from one Pulp to another Pulp
Actions #5

Updated by over 5 years ago

  • Related to Refactor #1013: Reduce sync time spent processing metadata up-front added
Actions #6

Updated by over 5 years ago

  • Tags Sync Performance added
Actions #7

Updated by bmbouter over 3 years ago

  • Tags Pulp 2 added
Actions #8

Updated by bmbouter over 2 years ago

  • Tags Performance added
  • Tags deleted (Sync Performance)
Actions #9

Updated by dalley almost 2 years ago

  • Status changed from NEW to CLOSED - WONTFIX
