Issue #2457

closed

When syncing do not associate units that are already associated to the repo

Added by ipanova@redhat.com over 7 years ago. Updated about 5 years ago.

Status: CLOSED - CURRENTRELEASE
Priority: High
Assignee:
Sprint/Milestone: -
Start date:
Due date:
Estimated time:
Severity: 2. Medium
Version: 2.8.7
Platform Release: 2.11.1
OS:
Triaged: Yes
Groomed: No
Sprint Candidate: No
Tags: Pulp 2
Sprint: Sprint 12
Quarter:

Description

I synced an el6 repo; the first sync took 1h 15 mins:

To download metadata 1 min
To generate db file 4 mins
To determine what to download 4 mins
To actually download the content 66 mins
To download additional units 2 mins

I re-synced the same repo after I removed a couple of rpms:

To download metadata 1 min
To generate db file 4 mins
To determine what to download 7 mins
To actually download the content 4 mins (the ones I removed)
To download additional units 2 mins

After some investigation it was clear that, on re-sync, the step "determine what to download" takes the most time: 7 mins.
Half of this time is spent on metadata file handling, and we cannot do anything about that.
The other half is spent here, where we check whether each unit that we want is already present on the filesystem:
https://github.com/pulp/pulp_rpm/blob/2.8-dev/plugins/pulp_rpm/plugins/importers/yum/existing.py#L92

We could optimize this part of the code and, at a minimum, not re-associate units that are already associated with the repo,
and not add them to the catalog again, because they are already there.
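
As a rough illustration of the idea (not the actual pulp_rpm code), the sketch below classifies the wanted units so that each one gets only the work it needs. All names here (`split_wanted_units`, `UnitKey`, the `associated` and `on_disk` sets) are hypothetical, and it assumes each unit can be identified by a hashable NEVRA-like tuple and that the set of units already associated with the repo can be loaded once up front.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical unit key: (name, epoch, version, release, arch).
UnitKey = Tuple[str, ...]


def split_wanted_units(
    wanted: Iterable[UnitKey],
    associated: Set[UnitKey],
    on_disk: Set[UnitKey],
) -> Dict[str, List[UnitKey]]:
    """Classify wanted units into skip / associate / download buckets."""
    plan: Dict[str, List[UnitKey]] = {"skip": [], "associate": [], "download": []}
    for key in wanted:
        if key in associated:
            # Already in the repo: no association, no catalog entry needed.
            plan["skip"].append(key)
        elif key in on_disk:
            # Content exists but is not in this repo yet: associate only.
            plan["associate"].append(key)
        else:
            # Not available locally: must be downloaded.
            plan["download"].append(key)
    return plan


bash = ("bash", "0", "4.1.2", "48.el6", "x86_64")
zlib = ("zlib", "0", "1.2.3", "29.el6", "x86_64")
curl = ("curl", "0", "7.19.7", "53.el6", "x86_64")

plan = split_wanted_units(
    wanted=[bash, zlib, curl],
    associated={bash},          # still in the repo from the first sync
    on_disk={bash, zlib},       # zlib content is present, but not associated
)
```

With set lookups, the check per unit is constant time, so on a re-sync where almost everything is already associated, nearly all units would fall into the `skip` bucket and only the removed rpms would need further work.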

Another place where we could make the same improvement is the step "download additional units" (errata, comps, yumrepometadata files).


Related issues

Related to RPM Support - Task #2466: Remove unnecessary `deepcopy` calls for sync - CLOSED - CURRENTRELEASE (ttereshc)
