Issue #8306
closed
Improve the speed of syncing repository
Description
Cloned from bugzilla https://bugzilla.redhat.com/show_bug.cgi?id=1932735
Description of problem: Since we have been facing more and more slow capsule sync issues, I decided to take some time to fix the Pulp 2 code to improve this, as we still have some time before migrating to Pulp 3.
Below are the changes I made. With these changes, the RHEL 7 Server RPM repository syncs 40-50% faster. Small repositories such as Satellite Tools and RHEL 7 Extras, which currently take a few minutes to sync, will take 1 minute or less.
The code changes address the following issues:
- Avoid reading unwanted repo metadata fields (such as Primary.xml, Updateinfo.xml) while determining units to download
- Avoid reading unwanted repo metadata fields (such as Primary.xml, Updateinfo.xml) when removing missing units (Mirror on Sync)
- Improve the query to purge duplicate units. Previously, it read the whole units_rpm collection (the largest collection). Once the collection reached millions of records, it became very slow.
- Skip repository publishing if Errata, Yum repo metadata and Comps have not changed. Previously, publishing was triggered on every full sync.
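As a rough illustration of the last point (skipping publish when nothing relevant changed), here is a minimal sketch that compares a fingerprint of the publish-relevant inputs against the one stored from the previous sync. The function names, the argument shapes, and the idea of using a SHA-256 fingerprint are all assumptions made for illustration. They are not taken from the patch, which may detect changes differently.

```python
import hashlib
import json


def metadata_fingerprint(errata_ids, repo_metadata, comps_ids):
    """Return a stable SHA-256 digest over the publish-relevant inputs.

    Hypothetical helper: errata_ids and comps_ids are lists of string IDs,
    repo_metadata maps a metadata file name to its checksum.
    """
    payload = json.dumps(
        {
            "errata": sorted(errata_ids),
            "repo_metadata": repo_metadata,
            "comps": sorted(comps_ids),
        },
        sort_keys=True,  # make the serialization independent of dict order
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def should_publish(previous_fingerprint, errata_ids, repo_metadata, comps_ids):
    """Return (publish_needed, current_fingerprint)."""
    current = metadata_fingerprint(errata_ids, repo_metadata, comps_ids)
    return current != previous_fingerprint, current
```

A caller would store the returned fingerprint after each sync and trigger a publish only when the first element of the result is true.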
Added by hyu about 3 years ago
Updated by pulpbot about 3 years ago
- Status changed from NEW to POST
Updated by hyu about 3 years ago
- Status changed from POST to MODIFIED
Applied in changeset commit:pulp|89310783da47ac03d398f92ee667a5ec655363a4.
Updated by ipanova@redhat.com about 3 years ago
- Status changed from MODIFIED to POST
Updated by dalley about 3 years ago
- Status changed from POST to CLOSED - WONTFIX
Pulp 2 is in maintenance mode upstream, so we will not be merging this patch. However, any downstream that would like to include it is free to do so. The patch looks reasonable to us.
Allow find_units method to include only certain fields
closes: #8306 https://pulp.plan.io/issues/8306
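The idea in the commit title, returning only selected fields from a unit lookup, can be sketched with plain dictionaries. This is an in-memory illustration only: the `find_units` signature, the `criteria` and `fields` parameters, and the sample unit layout below are hypothetical and do not reproduce Pulp's actual method. Against MongoDB, the same effect would typically come from passing a projection to the query (for example the second argument of pymongo's `Collection.find`), so that large embedded metadata never leaves the database.

```python
def find_units(units, criteria=None, fields=None):
    """Yield units matching `criteria`, optionally keeping only `fields`.

    units    -- iterable of dicts standing in for stored unit documents
    criteria -- dict of key/value pairs that must all match (equality only)
    fields   -- list of keys to keep; None returns a copy of the whole unit
    """
    criteria = criteria or {}
    for unit in units:
        if all(unit.get(key) == value for key, value in criteria.items()):
            if fields is None:
                yield dict(unit)
            else:
                # Drop everything not requested, e.g. bulky repodata blobs.
                yield {key: unit[key] for key in fields if key in unit}


# Hypothetical sample data: the "repodata" value stands in for large XML.
sample_units = [
    {"name": "bash", "version": "4.2", "repodata": {"primary": "<package/>"}},
    {"name": "zsh", "version": "5.0", "repodata": {"primary": "<package/>"}},
]
slim = list(find_units(sample_units, {"name": "bash"}, fields=["name", "version"]))
```

Here `slim` holds only the `name` and `version` keys of the matching unit, which is all a sync step needs when it is only deciding which units to download or remove.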