Issue #2457
closed
When syncing do not associate units that are already associated to the repo
Description
I synced an el6 repo, where the first sync took 1h 15 mins:
To download metadata: 1 min
To generate db file: 4 mins
To determine what to download: 4 mins
To actually download the content: 66 mins
To download additional units: 2 mins
I re-synced the same repo after I removed a couple of RPMs:
To download metadata: 1 min
To generate db file: 4 mins
To determine what to download: 7 mins
To actually download the content: 4 mins (the ones I removed)
To download additional units: 2 mins
After some investigation it was clear that on re-sync the step "determine what to download" takes the most time: 7 mins.
Half of this time is spent on metadata file handling, and we cannot do anything about that.
The other half of the time is spent here, where we check whether the unit that we want is present on the filesystem already.
https://github.com/pulp/pulp_rpm/blob/2.8-dev/plugins/pulp_rpm/plugins/importers/yum/existing.py#L92
We could optimize this part of the code: at the very least, do not associate units that are already associated to the repo,
and do not add them to the catalog, because they are already there.
Another place where we could make the same improvements is the step "download additional units" (errata, comps, yumrepometadata file).
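To illustrate the idea of skipping units that are already associated, here is a minimal sketch in plain Python. It is not the pulp_rpm code: the function name `units_needing_association` and the tuple-shaped unit keys are assumptions made for the example only.

```python
from typing import Hashable, Iterable, List, Set, Tuple

# Hypothetical unit key: any hashable tuple, e.g. (name, version, arch).
UnitKey = Tuple[Hashable, ...]


def units_needing_association(
    wanted: Iterable[UnitKey],
    already_associated: Set[UnitKey],
) -> List[UnitKey]:
    """Return the wanted unit keys that are not yet associated to the repo.

    Membership checks against a set are O(1) on average, so the whole
    filter is linear in the number of wanted units.
    """
    return [key for key in wanted if key not in already_associated]


# Example data (made up for illustration).
wanted = [
    ("bash", "4.1.2", "x86_64"),
    ("zsh", "4.3.11", "x86_64"),
    ("vim", "7.4", "x86_64"),
]
associated = {
    ("bash", "4.1.2", "x86_64"),
    ("vim", "7.4", "x86_64"),
}

# Only units missing from the repo would be passed on for association.
to_associate = units_needing_association(wanted, associated)
print(to_associate)
```

With this kind of filter, a re-sync of an unchanged repo would issue no association writes at all, since every wanted unit is already in the associated set.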
Updated by ipanova@redhat.com over 7 years ago
- Project changed from Pulp to RPM Support
- Sprint/Milestone set to 29
- Version set to 2.8.7
Updated by bizhang over 7 years ago
- Priority changed from Normal to High
- Triaged changed from No to Yes
Updated by ttereshc over 7 years ago
- Status changed from NEW to POST
- Assignee set to ttereshc
Updated by ttereshc over 7 years ago
- Related to Task #2466: Remove unnecessary `deepcopy` calls for sync added
Added by ttereshc over 7 years ago
Updated by ttereshc over 7 years ago
- Status changed from POST to MODIFIED
Applied in changeset 0a487d33b0695fec1a87f4faae35ac86dd99706a.
Updated by ttereshc over 7 years ago
- Sprint/Milestone changed from 29 to 30
- Platform Release set to 2.10.4
Updated by semyers over 7 years ago
- Platform Release changed from 2.10.4 to 2.11.1
Updated by semyers over 7 years ago
- Status changed from 5 to CLOSED - CURRENTRELEASE
Reduce number of writes to db during sync
This commit eliminates the following unnecessary operations:
- save() to errata model even when no new collections were added

closes #2457
https://pulp.plan.io/issues/2457
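The "save only when something changed" pattern named in the commit message can be sketched as follows. This is an illustration only, not the code from the changeset: `FakeErratum` and `merge_collections` are hypothetical names, and the in-memory `save()` just counts calls in place of a real database write.

```python
class FakeErratum:
    """Hypothetical stand-in for an errata model; save() only counts calls."""

    def __init__(self, collections):
        self.collections = list(collections)
        self.save_calls = 0

    def save(self):
        # A real model would write to the database here.
        self.save_calls += 1


def merge_collections(erratum, incoming):
    """Append collections not already present; save only if any were added.

    Returns True when the erratum was modified (and therefore saved).
    """
    added = False
    for collection in incoming:
        if collection not in erratum.collections:
            erratum.collections.append(collection)
            added = True
    if added:
        erratum.save()
    return added


erratum = FakeErratum(["collection-a"])

# Nothing new: no write is issued.
unchanged = merge_collections(erratum, ["collection-a"])

# One new collection: exactly one write is issued.
changed = merge_collections(erratum, ["collection-a", "collection-b"])
```

Guarding the write with a flag keeps a re-sync of unchanged errata free of database writes, which is the kind of saving the commit describes.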