Issue #4591
closed
Publish of a mid-size repo is slow
Status:
CLOSED - CURRENTRELEASE
Description
Sync and then publish EPEL7 repo https://dl.fedoraproject.org/pub/epel/7/x86_64/
Publish itself takes ~2 mins.
Potentially the time goes to sqlite db generation (do we need these files at all? They are listed in the roadmap doc among functionality that doesn't make it into Pulp 3).
Investigation/profiling is required to figure out if anything can be optimized.
If not, no-op publish functionality should be considered.
+1 to a cprofile collection as a next step. It would be great to see where all that time is going.
- Triaged changed from No to Yes
- Sprint set to Sprint 50
- Sprint changed from Sprint 50 to Sprint 51
- Sprint changed from Sprint 51 to Sprint 52
- Sprint changed from Sprint 52 to Sprint 53
- Status changed from NEW to ASSIGNED
- Assignee set to daviddavis
Looking at the cprofile results, it seems that the bulk of the time is going towards making calls to the database (200 seconds), but each call takes only about 0.00-0.02 seconds. I am going to take a look at reducing the number of sql calls we're making. I'm hoping to prefetch some of the data.
As for sqlite generation, removing that code saved about 46 seconds (441.148s with it vs 395.415s without). I don't see a huge performance benefit in removing it, although I am not opposed to doing so.
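For anyone wanting to reproduce this kind of measurement, here is a minimal sketch of collecting a cProfile report sorted by cumulative time. It is not the profiling setup used for this issue: `profile_call`, `fake_publish` and `fake_query` are hypothetical names, and the fake functions only stand in for a publish task that makes many cheap calls.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, **kwargs):
    """Run func under cProfile and return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time so the most expensive call chains come first,
    # and print only the top 10 entries.
    stats.sort_stats("cumulative").print_stats(10)
    return result, buffer.getvalue()


def fake_query(i):
    # Hypothetical stand-in for one small database call.
    return i % 7


def fake_publish(n):
    # Hypothetical stand-in for a publish task: many cheap calls in a loop.
    return sum(fake_query(i) for i in range(n))


if __name__ == "__main__":
    _, report = profile_call(fake_publish, 100_000)
    print(report)
```

In the report, the `ncalls` column next to a small per-call time is the pattern described above: each call is cheap, but the call count dominates the total.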
Using prefetching and bulk create I was able to speed up the publish task code from about 440 to 120 seconds.
I left the sqlite generation code in place, but again, I would be OK with removing it.
I double checked with Katello and publishing epel7 (13,190 packages) in <= 2 min would meet their requirements.
- Status changed from ASSIGNED to POST
I monitored the memory while running a publish on epel7. Here's a representative snapshot I saw:
  PID USER     PR NI    VIRT    RES  SHR S %CPU %MEM    TIME COMMAND
29923 vagrant  20  0 1061332 888876 7872 R 82.0 22.0 1:08.23 rq
Looks like it consumed about 900 MB (was 500 MB before). We could add batching later if memory becomes a problem.
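If batching is added later, one possible shape is a small generator that yields fixed-size chunks, so that only one chunk of objects is held in memory at a time. This is a sketch only: `batched` is a hypothetical helper, not something in the current code, and the batch size would need tuning against real memory measurements.

```python
from itertools import islice


def batched(iterable, size):
    """Yield successive lists of at most `size` items from `iterable`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(iterable)
    while True:
        # islice pulls up to `size` items without materialising the rest.
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch
```

Each yielded batch could then be handed to a bulk insert, trading a few more database calls for a lower peak memory footprint.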
- Status changed from POST to MODIFIED
- Status changed from MODIFIED to CLOSED - CURRENTRELEASE
Speed up publishing using prefetching and bulk create
fixes #4591 https://pulp.plan.io/issues/4591