Issue #4591
closed
Publish of a mid-size repo is slow
Description
Sync and then publish EPEL7 repo https://dl.fedoraproject.org/pub/epel/7/x86_64/
Publish itself takes ~2 mins.
Potentially it's the sqlitedb generation. (Do we need the sqlite DBs at all? They are listed in the roadmap doc among the functionality which doesn't make it into Pulp3.)
Investigation/profiling is required to figure out if anything can be optimized.
If not, no-op publish functionality should be considered.
Updated by bmbouter over 5 years ago
+1 to collecting a cProfile run as a next step. It would be great to see where all that time is going.
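For reference, a minimal sketch of how such a profile could be collected with the standard-library cProfile and pstats modules. The `publish_stub` function and the `profile_call` helper are hypothetical stand-ins for illustration; they are not Pulp code, and the real publish task would be profiled in its place.

```python
import cProfile
import io
import pstats


def publish_stub(n_packages):
    """Hypothetical stand-in for the publish task (not Pulp code)."""
    total = 0
    for i in range(n_packages):
        total += len(str(i))
    return total


def profile_call(func, *args, top=10):
    """Run func under cProfile; return (result, text report).

    The report lists the `top` entries sorted by cumulative time,
    which is usually the quickest way to spot where time is going.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    stats.sort_stats("cumulative").print_stats(top)
    return result, buf.getvalue()


if __name__ == "__main__":
    _, report = profile_call(publish_stub, 10_000)
    print(report)
```

Sorting by cumulative time surfaces the call paths that dominate the run; sorting by call count instead (`sort_stats("ncalls")`) helps when many cheap calls add up.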
Updated by ttereshc over 5 years ago
- Triaged changed from No to Yes
- Sprint set to Sprint 50
Updated by daviddavis over 5 years ago
- Sprint changed from Sprint 51 to Sprint 52
Updated by daviddavis over 5 years ago
- Status changed from NEW to ASSIGNED
- Assignee set to daviddavis
Updated by daviddavis over 5 years ago
Looking at the cProfile results, it seems that the bulk of the time is going towards making calls to the database (200 seconds in total), although each call takes only about 0.00-0.02 seconds. I am going to take a look at reducing the number of SQL calls we're making. I'm hoping to prefetch some of the data.
As for sqlite generation, removing that code saved about 46 seconds (441.148s vs 395.415s). I don't see a huge performance benefit in removing it, although I am not opposed to doing so.
Updated by daviddavis over 5 years ago
Using prefetching and bulk create I was able to speed up the publish task code from about 440 to 120 seconds.
I left the sqlite generation code in place but, again, I would be OK with removing it.
I double checked with Katello and publishing epel7 (13,190 packages) in <= 2 min would meet their requirements.
Updated by daviddavis over 5 years ago
- Status changed from ASSIGNED to POST
Added by daviddavis over 5 years ago
Updated by daviddavis over 5 years ago
I monitored the memory while running a publish on epel7. Here's a representative snapshot I saw:
  PID USER    PR NI    VIRT    RES  SHR S %CPU %MEM    TIME COMMAND
29923 vagrant 20  0 1061332 888876 7872 R 82.0 22.0 1:08.23 rq
Looks like it consumed about 900 MB (was 500 MB before). We could add batching later if memory becomes a problem.
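If batching is added later, one possible shape for it is a small helper that cuts any iterable into fixed-size chunks, so that only one chunk of objects is held in memory at a time. The `batched` helper below is a hypothetical sketch, not part of the applied change, and the batch size would need tuning against real memory measurements.

```python
from itertools import islice


def batched(iterable, size):
    """Yield lists of at most `size` items from `iterable`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


if __name__ == "__main__":
    # Each batch would be written with a single bulk insert, then discarded.
    for batch in batched(range(10), 4):
        print(len(batch), batch)
```

Smaller batches lower peak memory at the cost of more database calls, so this is a trade-off against the speedup gained from bulk creation.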
Updated by daviddavis over 5 years ago
- Status changed from POST to MODIFIED
Applied in changeset d107a1945c2943d26a8f2c5a521ba3aaa4a7c428.
Updated by ttereshc almost 5 years ago
- Status changed from MODIFIED to CLOSED - CURRENTRELEASE
Speed up publishing using prefetching and bulk create
fixes #4591 https://pulp.plan.io/issues/4591