Issue #2457

closed

When syncing do not associate units that are already associated to the repo

Added by ipanova@redhat.com over 7 years ago. Updated about 5 years ago.

Status: CLOSED - CURRENTRELEASE
Priority: High
Assignee:
Sprint/Milestone: -
Start date:
Due date:
Estimated time:
Severity: 2. Medium
Version: 2.8.7
Platform Release: 2.11.1
OS:
Triaged: Yes
Groomed: No
Sprint Candidate: No
Tags: Pulp 2
Sprint: Sprint 12
Quarter:

Description

I synced an el6 repo, where the first sync took 1h 15 mins:

To download metadata 1 min
To generate db file 4 mins
To determine what to download 4 mins
To actually download the content 66 mins
To download additional units 2 mins

I re-synced the same repo after I removed a couple of RPMs:

To download metadata 1 min
To generate db file 4 mins
To determine what to download 7 mins
To actually download the content 4 mins (the ones I removed)
To download additional units 2 mins

After some investigation it was clear that, on re-sync, the step "determine what to download" takes the most time: 7 mins.
Half of this time is spent on metadata file handling; we cannot do anything about that.
The other half is spent here, where we check whether the unit that we want is already present on the filesystem:
https://github.com/pulp/pulp_rpm/blob/2.8-dev/plugins/pulp_rpm/plugins/importers/yum/existing.py#L92

We could optimize this part of the code and, at the very least, not associate units that are already associated with the repo
and not add them to the catalog, because they are already there.
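
The idea above can be illustrated with a minimal sketch. It is not the pulp_rpm implementation: `UnitKey`, `split_wanted_units` and the three-way split are hypothetical names chosen for illustration, assuming that the keys of units already associated with the repo can be loaded into a set up front.

```python
from collections import namedtuple

# Hypothetical stand-in for a package's unit key; not Pulp's real model.
UnitKey = namedtuple("UnitKey", ["name", "version", "release", "arch"])


def split_wanted_units(wanted, associated_keys, stored_keys):
    """Classify wanted units so that each one triggers only the work it needs.

    wanted          -- iterable of UnitKey parsed from the repo metadata
    associated_keys -- set of UnitKey already associated with this repo
    stored_keys     -- set of UnitKey present in storage (any repo)
    """
    to_download, to_associate, to_skip = [], [], []
    for key in wanted:
        if key in associated_keys:
            # Already in the repo: no association write, no catalog entry.
            to_skip.append(key)
        elif key in stored_keys:
            # Present in storage but not in this repo: associate only.
            to_associate.append(key)
        else:
            # Unknown unit: it has to be downloaded.
            to_download.append(key)
    return to_download, to_associate, to_skip


a = UnitKey("bash", "4.1.2", "48.el6", "x86_64")
b = UnitKey("zsh", "4.3.11", "4.el6", "x86_64")
c = UnitKey("vim", "7.4.629", "5.el6", "x86_64")
download, associate, skip = split_wanted_units([a, b, c], {a}, {a, b})
```

In this sketch only `c` would be downloaded and only `b` would be associated; `a` causes no database write at all, which is the saving proposed here.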

Another place where we could make the same improvements is the step "download additional units" (such as errata, comps, and yumrepometadata files).


Related issues

Related to RPM Support - Task #2466: Remove unnecessary `deepcopy` calls for sync | CLOSED - CURRENTRELEASE | ttereshc

Actions #1

Updated by ipanova@redhat.com over 7 years ago

  • Project changed from Pulp to RPM Support
  • Sprint/Milestone set to 29
  • Version set to 2.8.7
Actions #2

Updated by bizhang over 7 years ago

  • Priority changed from Normal to High
  • Triaged changed from No to Yes
Actions #3

Updated by ttereshc over 7 years ago

  • Status changed from NEW to POST
  • Assignee set to ttereshc
Actions #4

Updated by ttereshc over 7 years ago

  • Related to Task #2466: Remove unnecessary `deepcopy` calls for sync added

Added by ttereshc over 7 years ago

Revision 0a487d33 | View on GitHub

Reduce number of writes to db during sync

This commit eliminates the following unnecessary operations:

  • addition of RPM/SRPM/DRPM to PackageCatalog when unit is already associated with the repository
  • re-association of RPM/SRPM/DRPM with repository when such association already exists
  • additional save() to errata model even when no new collections were added

closes #2457 https://pulp.plan.io/issues/2457
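
The third item in the commit message (skipping the extra `save()` on errata) can be sketched as follows. This is a hedged illustration only: the `Erratum` class, its `pkglist` attribute and `merge_collections` are hypothetical and do not reproduce the models or the code in revision 0a487d33.

```python
class Erratum:
    """Hypothetical, in-memory stand-in for an errata model."""

    def __init__(self, errata_id, pkglist=None):
        self.errata_id = errata_id
        self.pkglist = list(pkglist or [])
        self.save_calls = 0  # counts simulated database writes

    def save(self):
        # A real model would write to the database here.
        self.save_calls += 1


def merge_collections(existing, incoming_collections):
    """Add only collections the erratum lacks; save only if something changed."""
    new = [c for c in incoming_collections if c not in existing.pkglist]
    if not new:
        return False  # nothing new: skip the write entirely
    existing.pkglist.extend(new)
    existing.save()
    return True


erratum = Erratum("RHBA-0000:0000", pkglist=["collection-a"])
unchanged = merge_collections(erratum, ["collection-a"])
changed = merge_collections(erratum, ["collection-a", "collection-b"])
```

Under these assumptions, a re-sync that brings no new collections performs no write for that erratum, and a sync that does bring one performs exactly one.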

Actions #5

Updated by ttereshc over 7 years ago

  • Status changed from POST to MODIFIED
Actions #6

Updated by ttereshc over 7 years ago

  • Sprint/Milestone changed from 29 to 30
  • Platform Release set to 2.10.4
Actions #7

Updated by semyers over 7 years ago

  • Platform Release changed from 2.10.4 to 2.11.1
Actions #8

Updated by semyers over 7 years ago

  • Status changed from MODIFIED to 5
Actions #10

Updated by semyers over 7 years ago

  • Status changed from 5 to CLOSED - CURRENTRELEASE
Actions #12

Updated by bmbouter about 6 years ago

  • Sprint set to Sprint 12
Actions #13

Updated by bmbouter about 6 years ago

  • Sprint/Milestone deleted (30)
Actions #14

Updated by bmbouter about 5 years ago

  • Tags Pulp 2 added
