Project

Profile

Help

Issue #3770

Pulp 3 is about 2x slower than pulp 2 in syncing a large file repo

Added by jsherril@redhat.com over 1 year ago. Updated 3 months ago.

Status:
CLOSED - NOTABUG
Priority:
Normal
Assignee:
Sprint/Milestone:
-
Start date:
Due date:
Severity:
2. Medium
Platform Release:
Blocks Release:
OS:
Backwards Incompatible:
No
Triaged:
Yes
Groomed:
No
Sprint Candidate:
No
Tags:
Katello-P1
QA Contact:
Complexity:
Smash Test:
Verified:
No
Verification Required:
No
Target Release - File:
Sprint:

Description

Syncing a large file repo is slower in pulp 3 than pulp 2, with 70,000 files:

pulp 3: 25 minutes
pulp 2: 15 minutes

I've attached the repo i used to sync, and then simply ran:

http --auth admin:admin http://127.0.0.1:8000/pulp/api/v3/repositories/ name=foo

http --auth admin:admin http://127.0.0.1:8000/pulp/api/v3/remotes/file/  name=foo url="http://localhost/pub/repos/large_file/"

http --auth admin:admin http://127.0.0.1:8000/pulp/api/v3/remotes/file/d8db8df9-8ac9-4e61-9070-fa8e0a57e1a4/sync/ repository=http://127.0.0.1:8000/pulp/api/v3/repositories/cc907a15-5313-447a-9d92-ec9213d81e05/
large_file.tar.gz (1.08 MB) jsherril@redhat.com, 06/18/2018 08:22 PM large_file.tar.gz
large_file2.tar.tz (3.09 MB) jsherril@redhat.com, 06/22/2018 05:32 PM large_file2.tar.tz
pulp-code-hotspots-only-cprofile.png (44.8 KB) A cprofile graph using gprof2dot and filtered for only Pulp code bmbouter, 12/05/2018 09:37 PM pulp-code-hotspots-only-cprofile.png
250

Subtasks

Pulp - Issue #3812: Pulp3 Content models are not compatible with bulk_saveCLOSED - WONTFIXdalleyActions
Pulp - Issue #3813: Pulp3 Artifacts are not compatible with bulk_saveCLOSED - CURRENTRELEASEbmbouterActions
Pulp - Issue #3814: RemositoryVersion's add_content and remove_content does not perform bulk operationsCLOSED - CURRENTRELEASEdaviddavisActions

History

#1 Updated by CodeHeeler over 1 year ago

  • Triaged changed from No to Yes

#2 Updated by bmbouter over 1 year ago

  • Status changed from NEW to ASSIGNED
  • Assignee set to bmbouter

I ran this test against my DeclarativeVersion asyncio code. I haven't run a formal benchmark, but I did cprofile the code. I found some interesting things.

The cProfile shows:

67.9% of the time is spent in the base save() method: /home/vagrant/.virtualenvs/pulp/lib64/python3.6/site-packages/django/db/models/base.py:655(save)

Also of interest, it shows 5000 calls to add_content for a total of 38% of the time is spent in that call cumulatively /home/vagrant/devel/pulp/pulpcore/pulpcore/app/models/repository.py:337(add_content)

One of the issues is that we can't use the bulk_create option in Django due to our usage of the Artifact.save() method which bulk_create won't call. This prevents us from having lower runtimes when saving N artifacts.

We should also determine if Content, ContentArtifact, and RemoteArtifact are also safe to use with bulk_create.

#3 Updated by bmbouter over 1 year ago

Content can not be used with bulk_create because it uses multi-table inheritance. An exception is raised when this is tried.

RemoteArtifact and ContentArtifact both can be used with bulk_create. My code does that successfully.

#4 Updated by jsherril@redhat.com over 1 year ago

Uploading a 2nd repository where artifacts are not shared. each file is unique.

(sorry it should be large_file2.tar.xz getting my compression formats confused :)

#5 Updated by jsherril@redhat.com over 1 year ago

  • Subject changed from Pulp 3 is about 40% slower than pulp 2 in syncing a large file repo to Pulp 3 is about 2x slower than pulp 2 in syncing a large file repo

My results with the file_large2.tar.tz repo:

Pulp 2: 18 minutes
Pulp 3: 44 minutes

#6 Updated by mhrivnak over 1 year ago

I encourage you to put performance requirements into context. This experiment revealed that when writing data to postgresql vs mongodb, there is a real cost to using a database with ACID guarantees and to writing more individual records per content unit. (I don't know if one or both of those is responsible, but generally doing more work with more features takes more time.) But: are you optimizing Pulp 3 for the ability to sync 70k 0-byte files from localhost?

If we started making those files larger, as the amount of time it takes to download the files approaches the amount of time it takes to write records to the database, the user's wait time attributable to the database approaches zero. An example will help.

At 70k units in 44 minutes, that's 26.5 units per second. What if each unit were 1MB in size on average? As long as downloading happens no faster than 26.5MBps (212Mbps), database writes would no longer be the limiting factor. Of course this assumes that database writes and downloading are happening concurrently, but I think Pulp has that problem mostly solved. In other words, if you made these files 1MB on average and put them on a remote machine across the internet, I bet the performance difference between Pulp 2 and 3 would look very different. Pulp 3 might even win.

Of course there are more variables. There could be unrelated database activity slowing our database writes, or unrelated network activity slowing our downloads. Some files might already exist and wouldn't need to be downloaded again.

So the question is: how much faster does your database write performance need to be than your download performance in order to be comfortable that most (you pick a percentage) users won't be waiting on their database? Pick your margin of comfort. What if, in a "normal case", the database is 1.5x faster than the fastest expected download speed? 2x? 3x?

You may need to choose:

  • max download speed you are willing to optimize for. "If your connection is faster than X, wow, lucky you! You can wait a few minutes on the database."
  • max portion of already-downloaded content you will optimize for. Again, if you happen to have more than X% of files already present on disk, what good fortune! This time you'll need to wait on the DB.
  • expected average size of a content unit's files
  • how much margin you want to build in for worse-than-normal database performance. 1.5x? 2x? 3x? 5x?

The tests in above comments are fine scenarios to use for improving your database write performance. That's a great thing to do, and it sounds like it revealed some opportunities. But before you set any hard performance requirements for Pulp 3 GA, I suggest picking those benchmarks carefully.

#7 Updated by bmbouter over 1 year ago

I also thought that any increases in runtime due to database operations being slower in Pulp3 would be offset by the huge amount of time that would be spent downloading. Then @jsherrill on irc pointed out to me that lazy downloading is the default for Katello, which is why they made the test with 0-byte files. This test is effectively the common case, which is my motivation to make it a Pulp3 requirement. The good news is I believe this goal is very achievable.

Here is a revised goal statement specifically highlighting that it's about lazy sync:

Pulp3 lazy sync of pulp_file should be roughly as fast on Pulp3 as Pulp2.

I believe this would make sense as a P1 item for the gap fit on https://projects.theforeman.org/projects/katello/wiki/PulpV3GapAnalysis#Core-Problem-Statements

#8 Updated by jsherril@redhat.com over 1 year ago

Yes, i agree that we may end up seeing some sort of performance degregation with inserts using an ACID database, and filed this for some investigation. Initial investigation by brian indicates that this was not due solely to inherent slowness with the database, but instead with design decisions that had been made around how to use the database and django.

It seems important to address these structural changes before release if ever at all. In addition, what Brian said is correct, the default configuration of Katello and Satellite 6 is to use on_demand for yum repositories, so the zero byte test was somewhat of a good test for that scenario.

For this reason I think it is a P1 issue, until we either address the structural changes or determine that addressing those sub-optimial design decisions is either not possible or not worthwhile.

#9 Updated by daviddavis over 1 year ago

+1 to making this a P1.

#11 Updated by jsherril@redhat.com over 1 year ago

  • Tags Katello-P1, Pulp 3 added

#12 Updated by dalley over 1 year ago

On my PC (with a fast SSD), I get an 18 minute sync for Pulp 2, and also an 18 minute sync for Pulp 3 with a minor modification to increase the batch size. I plan to add that modification to core this afternoon.

So it looks like we have performance parity for the file plugin, for "fake" lazy sync. Once the real lazy sync work is complete we can repeat this test a few times to see where we stand.

I will also test the single-table content branch but it's a good sign that even master branch is getting decent performance without having had much in the way of focused optimization yet. I may also try a few other low-hanging fruit optimizations on master branch to get a clearer picture when comparing against the single-table content branch, since that represents a fairly significant change.

#13 Updated by bmbouter over 1 year ago

I think this has caused so many good changes in our API and python API that as long as the performance is on-par I'm +1 on accepting it.

#14 Updated by dalley over 1 year ago

The performance gap has been seemingly rectified as of the latest master branch.

Last night I did a lazy sync of EPEL with Pulp 2 and Pulp 3, with the result that Pulp 3 only took 40% as long to perform the lazy sync (2 minutes 15 seconds vs 5 minutes 16 seconds). Actual sync performance as mentioned also seems to be comparable when using many very small files (specifically, 70k files each containing only a UUID).

One thing I'd like to test is downloading performance, so I may try to construct a file benchmark with larger files to make sure there's no regression there. If not, I think we should close this issue as-is and treat the single-table content work as a separate issue, presuming we still go forwards with it.

#15 Updated by dalley over 1 year ago

Here's the results of my performance testing

Tests with File sync, 70k small files served locally
----------------------------------------------------

Pulp 2 (master):     ~18 minutes (I didn't record the exact number)
Pulp 3 (master):     17m 47s

Pulp 3 (lazy sync):  3m 46s

Tests with RPM lazy sync against EPEL (https://dl.fedoraproject.org/pub/epel/7/x86_64/)
---------------------------------------------------------------------------------------

# Note: these numbers are from my laptop and not directly comparable to the ones above

Pulp 2 (master):  5m 16s
Pulp 3 (master):  2m 16s

Tests with RPM immediate sync against EPEL (https://dl.fedoraproject.org/pub/epel/7/x86_64/)
--------------------------------------------------------------------------------------------

Pulp 2 (master):  24m 56s
Pulp 3 (master):  21m 52s

Tests with RPM immediate sync against RPMFusion (https://download1.rpmfusion.org/free/fedora/updates/28/x86_64/)
----------------------------------------------------------------------------------------------------------------

Pulp 2 (master):  1m 03s
Pulp 3 (master):  0m 52s

Since Pulp 3 is now faster than Pulp 2 across the board (even before in-depth performance tuning), I'm going to close out this issue.

#16 Updated by dalley over 1 year ago

  • Status changed from ASSIGNED to CLOSED - NOTABUG

#17 Updated by bmbouter about 1 year ago

250

I cprofiled the lazy case with the single-content branches and 70k pulp_file units being synced. In that testing and code reading, I don't see a way for us to make it faster than master.

Foremost, if you take the gprof2dot and filter by the Pulp code only (see attached) there are only a few places in Pulp methods that are themselves slow. Even though they are slow enough for gprof2dot to include, there isn't much opportunity. If you add up the self% time (the time spent exclusively in these Pulp functions and not in a subroutine) it only adds to 1.6% of the runtime. That means we can only make the total system go 1.6% faster overall due to the execution time of Pulp code being improved. That's not enough to make up the 30-40% different between the single-content branches and master.

I reviewed these hotspot areas for algorithmic improvements, but I did not see any. It's already been optimized in many ways, for example if only the id is needed from the query, only the id is returned to limit the amount of data. I don't see any way to algorithmically do "less work".

I've confirmed that batching is working as expected with a batch size of 100. Changing this would be an opportunity for both master and single-content alike, so I don't think it's a consideration when contemplating using single-content instead of master.

The single-content branch calls the cursor significantly fewer times, but still has a much higher runtime. This is unexpected, but verified in my testing. Thinking about it from a high level though, it makes sense. The single-content has to perform much more complex queries across more tables so overall the SQL workload is higher even with fewer queries. To think of this another way, it makes sense that for many content unit query operations, to interact with only 1 table can be very efficient.

#18 Updated by bmbouter 10 months ago

  • Tags deleted (Pulp 3)

Please register to edit this issue

Also available in: Atom PDF