Issue #8890
closed
Publishing a repository can take a long time to finish if the same errata exist in many synced repositories
Description
Description of problem: Pulp can take more than an hour to publish a repository when a large number of repositories have been synced from upstream and the same errata exist in many of them, for example across RHEL 7.x, RHEL 7 EUS, and different architectures.
The more repositories that contain the same erratum, the more "erratum pkglist" entries are created in MongoDB, which degrades performance.
For example:
db.erratum_pkglists.find({errata_id: "RHSA-2018:2557"}).count()
387
db.erratum_pkglists.find({errata_id: "RHBA-2019:2180"}).count()
217
When publishing errata, Pulp uses a query like the ones above to fetch all package lists for each erratum. This takes a long time when the query returns many package lists and each package list consists of many packages.
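To make the cost concrete, here is a minimal in-memory sketch in Python of why a lookup by errata_id alone gets more expensive as more repositories carry the same erratum. It is not Pulp code: the document shape (errata_id, repo_id, packages), the helper names, and the figure of 50 packages per list are assumptions for illustration only. The repo_id filter is shown purely for comparison and is not necessarily how the issue was fixed.

```python
def build_pkglists(errata_ids, repo_ids, packages_per_list):
    """Create one hypothetical pkglist document per (erratum, repository) pair."""
    return [
        {
            "errata_id": errata_id,
            "repo_id": repo_id,
            "packages": [f"pkg-{i}" for i in range(packages_per_list)],
        }
        for errata_id in errata_ids
        for repo_id in repo_ids
    ]


def packages_scanned(pkglists, errata_id, repo_id=None):
    """Count package entries read while collecting pkglists for one erratum.

    With repo_id=None this mimics a lookup by errata_id only, which
    returns the pkglists of every repository that contains the erratum.
    """
    total = 0
    for doc in pkglists:
        if doc["errata_id"] != errata_id:
            continue
        if repo_id is not None and doc["repo_id"] != repo_id:
            continue
        total += len(doc["packages"])
    return total


# 387 pkglists for one erratum, matching the count reported above.
# 50 packages per list is an assumed figure, not taken from the issue.
repos = [f"repo-{n}" for n in range(387)]
pkglists = build_pkglists(["RHSA-2018:2557"], repos, packages_per_list=50)

all_repos = packages_scanned(pkglists, "RHSA-2018:2557")
one_repo = packages_scanned(pkglists, "RHSA-2018:2557", repo_id="repo-0")
print(all_repos, one_repo)
```

In this model the work per erratum grows linearly with the number of repositories that share it, and that cost is paid again for every erratum in the repository being published.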
As shown below, the "Publishing Errata" step is very slow. After 53 minutes it has processed only 2073 of 4789 errata, so it will take more than an hour to finish.
...
{
    "description": "Publishing Errata",
    "details": "",
    "error_details": [],
    "items_total": 4789,
    "num_failures": 0,
    "num_processed": 2073,
    "num_success": 2073,
    "state": "IN_PROGRESS",
    "step_id": "2f09190d-013a-4300-9445-eccb52ad94fe",
    "step_type": "errata"
},
...
"start_time": "2021-06-09T12:41:13Z",
date
Wed Jun 9 13:32:41 UTC 2021
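The "more than an hour" estimate can be checked with a rough extrapolation from the progress report above. This sketch assumes the step keeps processing errata at the same average rate, which may not hold in practice; the function name is hypothetical.

```python
def estimate_remaining_minutes(elapsed_minutes, processed, total):
    """Extrapolate remaining time, assuming a constant processing rate."""
    if processed <= 0 or elapsed_minutes <= 0:
        raise ValueError("need some progress and elapsed time to extrapolate")
    rate = processed / elapsed_minutes  # errata per minute so far
    return (total - processed) / rate


# Figures from the report: 2073 of 4789 errata processed after 53 minutes.
remaining = estimate_remaining_minutes(53, 2073, 4789)
print(f"about {remaining:.0f} more minutes")
```

Under that assumption the step still needs more than an hour on top of the time already spent.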
Fix slow publish when errata are associated to many repos
closes: #8890 https://pulp.plan.io/issues/8890