Improve the speed of syncing repositories
Cloned from bugzilla https://bugzilla.redhat.com/show_bug.cgi?id=1932735
Description of problem: Since we have been facing more and more slow capsule sync issues, I decided to take some time to fix the Pulp 2 code to improve this, as we still have some time before migrating to Pulp 3.
Below are the changes I made. With these changes, the RHEL 7 Server rpm repository syncs 40% - 50% quicker. Some small repositories, like the Satellite Tools and RHEL 7 Extras repositories, which currently take a few minutes to sync, will take 1 minute or less to finish.
The code fixes the following issues:
- Avoid reading unwanted repo metadata fields (such as Primary.xml, Updateinfo.xml) while determining units to download
- Avoid reading unwanted repo metadata fields (such as Primary.xml, Updateinfo.xml) when removing missing units (Mirror on Sync)
- Improve the query to purge duplicate units. Previously, it read the whole units_rpm collection (the largest collection). Once the collection reached millions of records, it became very slow.
- Skip repository publishing if Errata, Yum repo metadata and Comps are not changed. Previously, publishing was triggered on every full sync.
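The last item (skipping the publish when nothing relevant changed) can be illustrated with a minimal sketch. This is not the actual Pulp 2 patch: the function names, the fingerprint approach and the shape of the input data are all assumptions made for illustration. The idea is to compute a digest over the Errata, Yum repo metadata and Comps after a sync and publish only when it differs from the digest stored after the previous sync.

```python
import hashlib
import json


def metadata_fingerprint(errata, repo_metadata, comps):
    """Return a stable digest of the three inputs named in the issue.

    Hypothetical helper: inputs are assumed to be JSON-serializable.
    Dictionary keys are sorted, but list order is significant, so callers
    should pass lists in a consistent order.
    """
    payload = json.dumps(
        {"errata": errata, "repo_metadata": repo_metadata, "comps": comps},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def needs_publish(previous_fingerprint, errata, repo_metadata, comps):
    """Return (publish_needed, current_fingerprint).

    previous_fingerprint is None when the repository was never published,
    in which case a publish is always needed.
    """
    current = metadata_fingerprint(errata, repo_metadata, comps)
    return current != previous_fingerprint, current


# Example: the second sync finds the same data, so the publish is skipped.
errata = [{"id": "RHSA-0000:0001", "version": "1"}]  # made-up sample data
repo_metadata = {"productid": "abc123"}
comps = {"groups": ["core"]}

publish, stored = needs_publish(None, errata, repo_metadata, comps)
publish_again, _ = needs_publish(stored, errata, repo_metadata, comps)
```

In this sketch the caller would persist the returned fingerprint alongside the repository and pass it back on the next sync.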