Issue #4591

closed

Publish of a mid-size repo is slow

Added by ttereshc over 5 years ago. Updated almost 5 years ago.

Status: CLOSED - CURRENTRELEASE
Priority: Normal
Assignee:
Sprint/Milestone: -
Start date:
Due date:
Estimated time:
Severity: 2. Medium
Version:
Platform Release:
OS:
Triaged: Yes
Groomed: No
Sprint Candidate: No
Tags:
Sprint: Sprint 53
Quarter:

Description

Sync and then publish the EPEL7 repo: https://dl.fedoraproject.org/pub/epel/7/x86_64/
The publish itself takes ~2 mins.

Potentially the time is going to sqlite db generation. Do we need those files at all? They are listed in the roadmap doc among the functionality that won't make it into Pulp 3.

Investigation/profiling is required to figure out if anything can be optimized.
If not, no-op publish functionality should be considered.

Actions #1

Updated by bmbouter over 5 years ago

+1 to collecting a cProfile as a next step. It would be great to see where all that time is going.
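For reference, a minimal sketch of how such a profile could be collected with the standard library. The `publish()` function here is a hypothetical stand-in for the real publish task, not Pulp code; only `cProfile` and `pstats` are real.

```python
import cProfile
import io
import pstats


def publish():
    """Hypothetical stand-in for the real publish task (assumption)."""
    return sum(i * i for i in range(100_000))


def profile_call(func, top=10):
    """Run func under cProfile; return its result and a text report.

    The report lists the `top` entries sorted by cumulative time, which is
    usually the quickest way to see where a long task spends its time.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(top)
    return result, stream.getvalue()


result, report = profile_call(publish)
print(report)
```

Sorting by cumulative time surfaces the high-level calls first; sorting by `"tottime"` instead would highlight the individual functions doing the most work themselves.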

Actions #2

Updated by ttereshc over 5 years ago

  • Triaged changed from No to Yes
  • Sprint set to Sprint 50
Actions #3

Updated by rchan over 5 years ago

  • Sprint changed from Sprint 50 to Sprint 51
Actions #4

Updated by daviddavis over 5 years ago

  • Sprint changed from Sprint 51 to Sprint 52
Actions #5

Updated by bmbouter over 5 years ago

  • Tags deleted (Pulp 3)
Actions #6

Updated by rchan over 5 years ago

  • Sprint changed from Sprint 52 to Sprint 53
Actions #7

Updated by daviddavis over 5 years ago

  • Status changed from NEW to ASSIGNED
  • Assignee set to daviddavis
Actions #8

Updated by daviddavis over 5 years ago

Looking at the cProfile results, it seems that the bulk of the time goes to database calls (200 seconds in total), even though each call takes only about 0.00-0.02 seconds. I am going to look at reducing the number of SQL calls we make, hopefully by prefetching some of the data.
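To illustrate the pattern (many cheap queries adding up), here is a rough sketch using an in-memory sqlite3 database. The `package` and `artifact` tables and their columns are made up for the example and are not the Pulp schema; the point is only the query count of a per-row lookup versus loading the related rows up front.

```python
import sqlite3


def make_db(n_packages=100):
    """Build a toy database; table and column names are hypothetical."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE package (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE artifact (package_id INTEGER, path TEXT)")
    conn.executemany(
        "INSERT INTO package VALUES (?, ?)",
        [(i, f"pkg-{i}") for i in range(n_packages)],
    )
    conn.executemany(
        "INSERT INTO artifact VALUES (?, ?)",
        [(i, f"/a/{i}.rpm") for i in range(n_packages)],
    )
    return conn


def per_row(conn):
    """Look up each package's artifact separately: N + 1 queries."""
    queries = 1
    out = {}
    for pkg_id, name in conn.execute("SELECT id, name FROM package").fetchall():
        row = conn.execute(
            "SELECT path FROM artifact WHERE package_id = ?", (pkg_id,)
        ).fetchone()
        queries += 1
        out[name] = row[0]
    return out, queries


def prefetched(conn):
    """Load all artifacts once, then join in memory: 2 queries."""
    paths = dict(conn.execute("SELECT package_id, path FROM artifact"))
    queries = 1
    rows = conn.execute("SELECT id, name FROM package")
    queries += 1
    out = {name: paths[pkg_id] for pkg_id, name in rows}
    return out, queries


conn = make_db(100)
slow_result, slow_queries = per_row(conn)
fast_result, fast_queries = prefetched(conn)
print(slow_queries, fast_queries)
```

Both functions return the same mapping; the prefetched version trades a larger in-memory dictionary for far fewer round trips to the database.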

As for sqlite generation, removing that code saved about 46 seconds (441.148s vs 395.415s). I don't see a huge performance benefit in removing it although I am not opposed to doing so.

Actions #9

Updated by daviddavis over 5 years ago

Using prefetching and bulk create I was able to speed up the publish task code from about 440 to 120 seconds.
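As a rough illustration of the bulk create half, the sketch below inserts the same rows one statement at a time and then in a single batched call. It uses plain sqlite3 rather than the ORM, and the `published` table and its columns are hypothetical; the actual change is in the linked revision.

```python
import sqlite3

# Hypothetical rows: (relative_path, artifact_id), sized like epel7.
ROWS = [(f"Packages/pkg-{i}.rpm", i) for i in range(13_190)]

INSERT_SQL = "INSERT INTO published (relative_path, artifact_id) VALUES (?, ?)"


def new_conn():
    """Create an empty in-memory database with a made-up table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE published (relative_path TEXT, artifact_id INTEGER)")
    return conn


def create_one_by_one(conn, rows):
    """Issue one INSERT call per row."""
    for row in rows:
        conn.execute(INSERT_SQL, row)
    conn.commit()


def create_bulk(conn, rows):
    """Hand all rows to the driver in a single call."""
    conn.executemany(INSERT_SQL, rows)
    conn.commit()


def contents(conn):
    """Return all rows in a stable order for comparison."""
    return conn.execute(
        "SELECT relative_path, artifact_id FROM published ORDER BY artifact_id"
    ).fetchall()


slow_conn, fast_conn = new_conn(), new_conn()
create_one_by_one(slow_conn, ROWS)
create_bulk(fast_conn, ROWS)
print(len(contents(fast_conn)))
```

The end state is identical either way; the batched form just avoids paying the per-call overhead thousands of times, which matters much more against a networked database than against in-memory sqlite.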

I left the sqlite generation code in place but, again, I would be OK with removing it.

I double checked with Katello and publishing epel7 (13,190 packages) in <= 2 min would meet their requirements.

Actions #10

Updated by daviddavis over 5 years ago

  • Status changed from ASSIGNED to POST

Added by daviddavis over 5 years ago

Revision d107a194

Speed up publishing using prefetching and bulk create

fixes #4591 https://pulp.plan.io/issues/4591

Actions #11

Updated by daviddavis over 5 years ago

I monitored the memory while running a publish on epel7. Here's a representative snapshot I saw:

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME COMMAND
29923 vagrant   20   0 1061332 888876   7872 R  82.0  22.0  1:08.23 rq

Looks like it consumed about 900 MB (was 500 MB before). We could add batching later if memory becomes a problem.
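If batching is needed later, one possible shape for it is sketched below: process the items in fixed-size chunks so that only one chunk is held in memory at a time. The `batched` helper and the batch size of 1000 are assumptions for illustration, not part of the change above.

```python
from itertools import islice


def batched(iterable, size):
    """Yield successive lists of at most `size` items from `iterable`.

    Only one batch is materialized at a time, so peak memory depends on
    `size` rather than on the total number of items.
    """
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


# Example: 13,190 items (the epel7 package count) in batches of 1000.
batch_sizes = [len(batch) for batch in batched(range(13_190), 1000)]
print(len(batch_sizes), batch_sizes[-1])
```

Each batch could then be passed to a bulk insert, keeping most of the speedup while capping memory use.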

Actions #12

Updated by daviddavis over 5 years ago

  • Status changed from POST to MODIFIED
Actions #13

Updated by ttereshc almost 5 years ago

  • Status changed from MODIFIED to CLOSED - CURRENTRELEASE
