Publish of a mid-size repo is slow
To reproduce, sync and then publish the EPEL7 repo: https://dl.fedoraproject.org/pub/epel/7/x86_64/
The publish step alone takes ~2 min.
The bottleneck may be sqlite DB generation. Do we need those databases at all? The roadmap doc lists them among the functionality that doesn't make it into Pulp3.
Investigation/profiling is required to figure out if anything can be optimized.
If not, no-op publish functionality should be considered.
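One way to picture a no-op publish: remember which repository version was last published and skip the expensive metadata generation when nothing has changed since then. The sketch below is purely illustrative; `Repository`, `PublicationStore`, `latest_version` and `publish()` are hypothetical names, not Pulp APIs, and a real implementation would persist this state in the database.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Repository:
    name: str
    latest_version: int  # hypothetical: bumped whenever a sync changes content


@dataclass
class PublicationStore:
    # hypothetical: repository name -> version number that was last published
    published: Dict[str, int] = field(default_factory=dict)


def publish(repo: Repository, store: PublicationStore) -> bool:
    """Return True if a real publish ran, False if it was a no-op."""
    if store.published.get(repo.name) == repo.latest_version:
        # Content is unchanged since the last publish: skip metadata generation.
        return False
    # ... the expensive metadata generation would happen here ...
    store.published[repo.name] = repo.latest_version
    return True


if __name__ == "__main__":
    store = PublicationStore()
    epel7 = Repository("epel7", latest_version=1)
    print(publish(epel7, store))  # True: first publish does the work
    print(publish(epel7, store))  # False: nothing changed, no-op
```

The design choice here is to key the no-op decision on a version number rather than diffing content, which keeps the check itself cheap.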
#8 Updated by daviddavis over 2 years ago
Looking at the cProfile results, the bulk of the time (200 seconds) goes to database calls, although each individual call takes only about 0.00-0.02 seconds. The cost comes from the sheer number of calls, so I'm going to look at reducing the number of SQL queries we make, hopefully by prefetching some of the data.
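The pattern described above (a large total spread over many cheap calls) is what cProfile's `ncalls`, `percall` and `cumtime` columns expose. A minimal sketch using the standard-library `cProfile` and `pstats` modules; `fake_db_query` is a hypothetical stand-in that sleeps briefly instead of hitting a real database.

```python
import cProfile
import io
import pstats
import time


def fake_db_query(pk: int) -> int:
    # Hypothetical stand-in for one cheap SQL round trip (~0.1 ms).
    time.sleep(0.0001)
    return pk


def publish_packages(n: int) -> int:
    # One query per package: many tiny calls that add up to a large total.
    return sum(fake_db_query(i) for i in range(n))


def profile_publish(n: int = 2000) -> str:
    profiler = cProfile.Profile()
    profiler.enable()
    publish_packages(n)
    profiler.disable()
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    # Sort by cumulative time; compare ncalls against percall for each entry.
    stats.sort_stats(pstats.SortKey.CUMULATIVE).print_stats(10)
    return out.getvalue()


if __name__ == "__main__":
    print(profile_publish())
```

In the report, `fake_db_query` shows a high `ncalls` and a tiny `percall`, which points at reducing the number of calls rather than speeding up each one.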
As for sqlite generation, removing that code saved about 46 seconds (441.148 s vs 395.415 s), roughly 10% of the total. I don't see a big enough performance benefit to justify removing it, although I'm not opposed to doing so.
#9 Updated by daviddavis over 2 years ago
Using prefetching and bulk create I was able to speed up the publish task code from about 440 to 120 seconds.
I left the sqlite generation code in place, but again, I'm OK with removing it.
I double checked with Katello and publishing epel7 (13,190 packages) in <= 2 min would meet their requirements.
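To put the Katello requirement in per-package terms, the arithmetic below uses only the figures quoted in this issue (13,190 packages, about 440 s before and about 120 s after the change, a 2 min budget). Since the measured ~120 s sits right at the budget, there is little headroom.

```python
PACKAGES = 13_190             # EPEL7 package count quoted above
BUDGET_S = 120                # Katello target: publish in <= 2 min
BEFORE_S, AFTER_S = 440, 120  # approximate publish times before/after the change


def per_package_ms(total_s: float, packages: int = PACKAGES) -> float:
    """Average wall-clock time per package, in milliseconds."""
    return 1000 * total_s / packages


if __name__ == "__main__":
    print(f"budget: {per_package_ms(BUDGET_S):.2f} ms/package")  # 9.10
    print(f"before: {per_package_ms(BEFORE_S):.2f} ms/package")  # 33.36
    print(f"after:  {per_package_ms(AFTER_S):.2f} ms/package")   # 9.10
    print(f"speedup: {BEFORE_S / AFTER_S:.2f}x")                  # 3.67
```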
#11 Updated by daviddavis over 2 years ago
I monitored the memory while running a publish on epel7. Here's a representative snapshot I saw:
  PID USER      PR  NI     VIRT     RES   SHR S  %CPU  %MEM     TIME COMMAND
29923 vagrant   20   0  1061332  888876  7872 R  82.0  22.0  1:08.23 rq
It looks like it consumed about 900 MB of resident memory (it was 500 MB before). We could add batching later if memory becomes a problem.
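Batching would cap memory by prefetching and bulk-inserting one chunk of packages at a time instead of the whole repository. A minimal sketch; `batched` is a small helper written here (Python 3.12's `itertools.batched` is similar but yields tuples), and `publish_in_batches` is hypothetical, with the per-chunk database work left as a comment.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def publish_in_batches(package_ids: Iterable[int],
                       batch_size: int = 1000) -> Tuple[int, int, int]:
    """Return (packages written, largest batch held in memory, batch count)."""
    written = peak = batches = 0
    for chunk in batched(package_ids, batch_size):
        # Hypothetical: prefetch related rows for this chunk only, then
        # bulk-insert its records, so at most `batch_size` items are held.
        peak = max(peak, len(chunk))
        written += len(chunk)
        batches += 1
    return written, peak, batches


if __name__ == "__main__":
    print(publish_in_batches(range(13_190), 1000))  # (13190, 1000, 14)
```

The cost is a few extra queries (14 batches instead of one for the EPEL7 package count), in exchange for memory that no longer grows with repository size.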