Story #4527

Improve performance of rpm duplicate nevra check

Added by rmcgover 8 months ago. Updated 6 months ago.

Status: CLOSED - CURRENTRELEASE
Priority: Normal
Assignee:
Category: -
Sprint/Milestone:
Start date:
Due date:
% Done: 100%
Platform Release: 2.19.0
Blocks Release:
Backwards Incompatible: No
Groomed: Yes
Sprint Candidate: No
Tags: Pulp 2
QA Contact:
Complexity:
Smash Test:
Verified: No
Verification Required: Yes
Sprint: Sprint 50

Description

In current versions of Pulp 2.x, uploading an RPM to a repo removes any other RPMs in that repo with the same NEVRA (name, epoch, version, release, architecture).

We are currently upgrading from an old Pulp 2.7 release, and I've found that the performance of import_uploaded_unit tasks for RPMs has regressed significantly. In Pulp 2.7, imports usually took around 0.5 seconds. On Pulp 2-master, imports to the same repos take from 8 to 130 seconds, depending on the size of the repo.

Debugging shows that most of the time is spent in this duplicate check (remove_unit_duplicate_nevra).

This issue tracks improving the performance of remove_unit_duplicate_nevra to reduce the severity of the regression.


Related issues

Copied to RPM Support - Test #4566: Improve performance of rpm duplicate nevra check (CLOSED - COMPLETE)

Associated revisions

Revision 3bfdbd84
Added by rmcgover 8 months ago

Improve performance of remove_unit_duplicate_nevra

This function, which is used whenever a new RPM is uploaded, was
slower than necessary.

The old implementation would first use find_repo_content_units,
which queries for all unit IDs of the required type in the repo
and then performs a unit query combining the relevant NEVRA with
the IDs.

In fact, finding all those unit IDs is measurably slow for a large
repo, and it's much faster (while still correct) to simply search
for the RPMs to remove directly. It's harmless if this also finds
some RPMs with the same NEVRA which are not in the repo.

On our installation, for a repo with ~24000 RPMs, this reduced
the runtime of this method from ~8 seconds to <0.1 seconds.

fixes #4527
https://pulp.plan.io/issues/4527
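
The difference between the two strategies described in the commit message can be illustrated with a small in-memory sketch. This is not Pulp code: FakeStore, Nevra, Unit and every method name below are hypothetical stand-ins, and a dict plays the role of a database index. It only aims to show why looking up by NEVRA directly avoids work proportional to repo size, and why extra matches outside the repo are harmless.

```python
from collections import namedtuple

# Hypothetical in-memory stand-ins; none of these names are Pulp APIs.
Nevra = namedtuple("Nevra", "name epoch version release arch")
Unit = namedtuple("Unit", "unit_id nevra")


class FakeStore:
    """Toy model: a global unit table plus per-repo association sets."""

    def __init__(self, units, repo_unit_ids):
        self.units = {u.unit_id: u for u in units}
        # NEVRA -> unit IDs; stands in for an indexed database lookup.
        self.by_nevra = {}
        for u in units:
            self.by_nevra.setdefault(u.nevra, set()).add(u.unit_id)
        self.repo_unit_ids = {r: set(ids) for r, ids in repo_unit_ids.items()}

    def duplicates_via_repo_scan(self, repo, new_unit):
        """Old shape: gather every unit ID in the repo, then filter by NEVRA."""
        all_ids = list(self.repo_unit_ids[repo])  # cost grows with repo size
        return {
            uid for uid in all_ids
            if self.units[uid].nevra == new_unit.nevra
            and uid != new_unit.unit_id
        }

    def duplicates_via_nevra_lookup(self, new_unit):
        """New shape: search by NEVRA directly, ignoring repo membership."""
        return self.by_nevra.get(new_unit.nevra, set()) - {new_unit.unit_id}

    def remove_from_repo(self, repo, unit_ids):
        """Disassociate units; IDs that are not in the repo are ignored."""
        self.repo_unit_ids[repo] -= set(unit_ids)


nevra = Nevra("bash", "0", "4.4", "1.el8", "x86_64")
old = Unit("u1", nevra)        # existing duplicate in repo-a
elsewhere = Unit("u2", nevra)  # same NEVRA, but only present in repo-b
new = Unit("u3", nevra)        # the freshly uploaded RPM
other = Unit("u4", Nevra("zsh", "0", "5.5", "1.el8", "x86_64"))

store = FakeStore(
    [old, elsewhere, new, other],
    {"repo-a": ["u1", "u3", "u4"], "repo-b": ["u2"]},
)

scan_result = store.duplicates_via_repo_scan("repo-a", new)
lookup_result = store.duplicates_via_nevra_lookup(new)

# The direct lookup also returns u2, which is not in repo-a; removing it
# from repo-a is a no-op, so the repo ends up in the same state either way.
store.remove_from_repo("repo-a", lookup_result)
```

In this toy model the direct lookup returns a superset of the repo scan's result, and the removal step leaves repo-a holding only the new upload and the unrelated package, while repo-b is untouched.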

History

#2 Updated by rmcgover 8 months ago

  • Status changed from ASSIGNED to POST

#3 Updated by ttereshc 7 months ago

  • Groomed changed from No to Yes
  • Sprint set to Sprint 50

#4 Updated by rmcgover 7 months ago

  • Status changed from POST to MODIFIED
  • % Done changed from 0 to 100

#5 Updated by ttereshc 7 months ago

  • Platform Release set to 2.19.0

#6 Updated by ttereshc 7 months ago

  • Sprint/Milestone set to 2.19.0

#7 Updated by ttereshc 7 months ago

  • Verification Required changed from No to Yes

#8 Updated by ttereshc 7 months ago

  • Status changed from MODIFIED to ON_QA

#9 Updated by bherring 7 months ago

  • Copied to Test #4566: Improve performance of rpm duplicate nevra check added

#10 Updated by ttereshc 7 months ago

  • Status changed from ON_QA to CLOSED - CURRENTRELEASE

#11 Updated by bmbouter 6 months ago

  • Tags Pulp 2 added
