Story #2567: As a user, I can quickly propagate one package change from one Pulp to another Pulp - RPM Support - Pulp



Added by dkliban@redhat.com almost 8 years ago. Updated about 4 years ago.

Status: CLOSED - WONTFIX

Priority: Normal

Tags: Performance, Pulp 2

Description

Now that nodes have been deprecated, the performance of syncs from one Pulp to another Pulp needs to improve. Each repository sync requires the downstream Pulp to parse metadata, a step that can take minutes depending on repository size. When hundreds of repositories with the same content need to be synced from one Pulp to another, the process can take many hours, even though all the data is already present in the downstream Pulp once the first repository has finished syncing.


Updated by mhrivnak almost 8 years ago

Pulp 3 opens up a lot of possibilities to leverage the REST API for Pulp-to-Pulp sync. Broadly, one Pulp can interrogate another and get full representations of each unit in each repo. It can easily retrieve the files as well.

Versioned repos could make this efficient. Child Pulp could watch the version of a repo on Parent Pulp, and when the version increases, request only the changes for the missing versions.
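To make the versioned-repo idea concrete, here is a minimal sketch of how a child could collapse the versions it has missed into one net change set. Everything in it is hypothetical: `RepoVersion`, `changes_since`, and the per-version `added`/`removed` sets are illustrative names and an assumed data model, not part of any Pulp API.

```python
from dataclasses import dataclass


@dataclass
class RepoVersion:
    """Hypothetical repo version: the unit IDs it added and removed."""
    number: int
    added: frozenset = frozenset()
    removed: frozenset = frozenset()


def changes_since(versions, last_synced):
    """Collapse every version newer than last_synced into one net change.

    Returns (to_add, to_remove), both sets of unit IDs, relative to the
    state the child had after syncing version last_synced.
    """
    to_add, to_remove = set(), set()
    for version in sorted(versions, key=lambda v: v.number):
        if version.number <= last_synced:
            continue  # the child already has this version
        for unit in version.added:
            if unit in to_remove:
                to_remove.discard(unit)  # removed earlier, now back: no-op
            else:
                to_add.add(unit)
        for unit in version.removed:
            if unit in to_add:
                to_add.discard(unit)  # added earlier, now gone: no-op
            else:
                to_remove.add(unit)
    return to_add, to_remove
```

With this shape, a child that is already at the parent's latest version gets two empty sets back and has nothing to parse or download.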

For ideas around implementing something like this, a "Pulp Importer" seems like a natural place to start.

Dennis, what do you think about moving this to the "Pulp" project instead of it being RPM-specific? If you were hoping to make some improvement on the Pulp 2 line, then we would probably be better off keeping it here, although that could be difficult to achieve in a low-risk way.


Updated by bmbouter almost 8 years ago

Sync performance is a very important goal for Pulp, but there are many ways to accomplish that goal. I want to make a case that we should optimize the normal sync codepaths and not create separate codepaths that optimize Pulp to Pulp sync. We had the nodes feature and it was effectively this, an optimized Pulp to Pulp sync. We got rid of it for all of these reasons: http://pulpproject.org/2016/12/07/deprecating-nodes/

So if we don't make an optimized Pulp-to-Pulp sync, how can we accomplish this? I'll suggest a two-pronged approach:

First, we should file and fix stories and bugs for specific improvements to the normal sync workflow, for instance issue #1013. These types of improvements will help us reach this goal without the downsides we experienced with nodes.

Second, performance measurement. We don't track how our code performance is doing over time. We also don't track which portions of the sync workflow would benefit the most from improvement. As we investigate where we are spending the most time we can inform specific fixes to be handled by the first approach above.
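As one possible starting point for that measurement, the sketch below accumulates wall-clock time per sync phase so the slowest phases can be ranked. `PhaseTimer` and the phase names in the usage note are hypothetical and do not correspond to existing Pulp code.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class PhaseTimer:
    """Accumulate wall-clock seconds spent in named phases (hypothetical helper)."""

    def __init__(self):
        self.totals = defaultdict(float)

    @contextmanager
    def phase(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the time even if the phase raised an exception.
            self.totals[name] += time.perf_counter() - start

    def slowest(self):
        """Return (phase, seconds) pairs, most expensive phase first."""
        return sorted(self.totals.items(), key=lambda kv: kv[1], reverse=True)
```

A sync step could then be wrapped as `with timer.phase("parse_metadata"): ...`, and the output of `timer.slowest()` recorded per run to track changes over time.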


Updated by dkliban@redhat.com almost 8 years ago

bmbouter, I like what you are proposing here. I see the value in not introducing any new code paths for us to support. The outcome of issue #1013 needs to be a reduction in sync time by at least a factor of 2 for it to truly make a difference.


Updated by dkliban@redhat.com almost 8 years ago

Subject changed from "As a user, I can optimize Pulp to Pulp syncs" to "As a user, I can quickly propagate one package change from one Pulp to another Pulp"
