Story #4527
closed
Improve performance of rpm duplicate nevra check
100%
Description
In current versions of Pulp 2.x, uploading an RPM to a repo will remove other RPMs with the same NEVRA.
We are currently upgrading from an old Pulp 2.7 release, and I've found that the performance of import_uploaded_unit tasks for RPMs has regressed significantly. In Pulp 2.7, imports usually took around 0.5s. In Pulp 2-master, imports to the same repos take between 8 and 130 seconds, depending on the size of the repo.
By debugging, I've found that most of the time is spent in the duplicate check (remove_unit_duplicate_nevra).
This issue is for improving the performance of remove_unit_duplicate_nevra to reduce the severity of the performance regression.
Added by rmcgover almost 6 years ago
Updated by rmcgover almost 6 years ago
Pull request: https://github.com/pulp/pulp_rpm/pull/1297
Updated by ttereshc almost 6 years ago
- Groomed changed from No to Yes
- Sprint set to Sprint 50
Updated by rmcgover almost 6 years ago
- Status changed from POST to MODIFIED
- % Done changed from 0 to 100
Applied in changeset 3bfdbd849aaad46cbe98bb9ba7d1dd16d63b5726.
Updated by bherring almost 6 years ago
- Copied to Test #4566: Improve performance of rpm duplicate nevra check added
Updated by ttereshc almost 6 years ago
- Status changed from 5 to CLOSED - CURRENTRELEASE
Improve performance of remove_unit_duplicate_nevra
This function, which is used whenever a new RPM is uploaded, was slower than necessary.
The old implementation would first use find_repo_content_units, which queries for all unit IDs of the required type in the repo and then performs a unit query combining the relevant NEVRA with the IDs.
In fact, finding all those unit IDs is measurably slow for a large repo, and it's much faster (while still correct) to simply search directly for the RPMs to remove. It's harmless if this search also finds some RPMs with the same NEVRA which are not in the repo, since removing them from the repo is then a no-op.
On our installation, for a repo with ~24000 RPMs, this reduced the runtime of this method from ~8 seconds to <0.1 seconds.
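The difference between the two lookups described above can be sketched with a toy in-memory model. This is not the Pulp code: the Rpm tuple, the association set and every function name below are hypothetical stand-ins for the unit and repo-association collections, chosen only to show why the direct NEVRA lookup gives the same end result.

```python
from collections import namedtuple

# Hypothetical stand-in for an RPM unit; not Pulp's real model.
Rpm = namedtuple("Rpm", "unit_id name epoch version release arch")


def nevra(unit):
    """Return the (name, epoch, version, release, arch) key of a unit."""
    return (unit.name, unit.epoch, unit.version, unit.release, unit.arch)


def duplicates_via_repo_ids(units, associations, repo_id, new_unit):
    """Old shape: collect every unit ID in the repo, then filter by NEVRA."""
    # This step touches every association of the repo, so it grows with repo size.
    repo_unit_ids = {uid for (rid, uid) in associations if rid == repo_id}
    return {
        u.unit_id
        for u in units
        if u.unit_id in repo_unit_ids
        and nevra(u) == nevra(new_unit)
        and u.unit_id != new_unit.unit_id
    }


def duplicates_direct(units_by_nevra, new_unit):
    """New shape: look up by NEVRA only, ignoring repo membership."""
    matches = units_by_nevra.get(nevra(new_unit), [])
    return {u.unit_id for u in matches if u.unit_id != new_unit.unit_id}


def disassociate(associations, repo_id, unit_ids):
    """Drop (repo, unit) pairs; units that are not in the repo are simply ignored."""
    return {
        (rid, uid)
        for (rid, uid) in associations
        if not (rid == repo_id and uid in unit_ids)
    }


# Sample data (all values made up for illustration).
new = Rpm("a", "foo", "0", "1.0", "1", "x86_64")
same_repo_dup = Rpm("b", "foo", "0", "1.0", "1", "x86_64")
other_repo_dup = Rpm("c", "foo", "0", "1.0", "1", "x86_64")
unrelated = Rpm("d", "bar", "0", "2.0", "1", "x86_64")
units = [new, same_repo_dup, other_repo_dup, unrelated]
associations = {("repo1", "a"), ("repo1", "b"), ("repo1", "d"), ("repo2", "c")}

# Index by NEVRA, standing in for an indexed database query.
units_by_nevra = {}
for u in units:
    units_by_nevra.setdefault(nevra(u), []).append(u)

old_ids = duplicates_via_repo_ids(units, associations, "repo1", new)
new_ids = duplicates_direct(units_by_nevra, new)

after_old = disassociate(associations, "repo1", old_ids)
after_new = disassociate(associations, "repo1", new_ids)
```

In this model the direct lookup also returns unit "c", which lives only in repo2, but disassociating it from repo1 changes nothing, so both approaches leave repo1 in the same state.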
fixes #4527 https://pulp.plan.io/issues/4527