Issue #2118
Reduce runtime of file path migration (closed)
Description
When using a high-latency filesystem, such as NFS, users are finding that the file path migration can take a long time to run; some have reported over 24 hours. After investigating and building some quick-and-dirty PoCs, I found two options for improving performance.
I tested with 2 repos, about 5k RPMs each, and 20 published copies of one of them. The system had an NFS share mounted from a desktop machine on a 100Mbps link with latency <1ms.
On my setup, these two changes reduced migration time from 54 minutes to 14 minutes.
Remove Pruning
About half the time was spent searching for and deleting empty directories after the files themselves had been moved. There do not seem to be any opportunities to make the pruning go faster in Python, but the same operation can be done about 20% faster with a "find" command in a shell. Removing it from the migrations would let users do this at their leisure, although there is a small risk of the operation interfering with other Pulp operations: if Pulp is otherwise manipulating files and directories in /var/lib/pulp/content/, this command could remove a directory out from under something that was about to use it. We may be able to limit the scope of the command to avoid parts of the filesystem used by Pulp 2.8.
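For reference, the pruning step amounts to a bottom-up walk that removes every directory left empty. The Python sketch below is illustrative only: `prune_empty_dirs` and its `exclude` parameter are hypothetical names, not Pulp's code, and `exclude` stands in for the idea of limiting the scope to avoid paths Pulp 2.8 still uses. The issue does not give the exact "find" invocation; with GNU find it would presumably be something like `find /var/lib/pulp/content/ -mindepth 1 -type d -empty -delete`.

```python
import os


def prune_empty_dirs(root, exclude=()):
    """Remove empty directories under root, deepest first.

    Hypothetical helper for illustration; not Pulp's migration code.
    The root itself is never removed, and any directory at or below a
    path listed in `exclude` is left alone.
    """
    root = os.path.abspath(root)
    excluded = tuple(os.path.abspath(p) for p in exclude)
    removed = []
    # topdown=False yields children before their parents, so a parent
    # that becomes empty once its children are gone is removed too.
    for dirpath, _dirnames, _filenames in os.walk(root, topdown=False):
        if dirpath == root:
            continue
        if any(dirpath == e or dirpath.startswith(e + os.sep) for e in excluded):
            continue
        try:
            os.rmdir(dirpath)  # only succeeds if the directory is empty
            removed.append(dirpath)
        except OSError:
            # Not empty, or changed concurrently; leave it in place.
            pass
    return removed
```

Because `os.rmdir` only succeeds on an empty directory, a directory that something else has already populated is skipped rather than deleted. That narrows, but does not remove, the race described above: a directory can still disappear just before another process tries to create a file in it.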
Introduce Concurrency
For the operations that move files around and fix symlinks, introducing a small number of threads to do the work concurrently gives a big speedup. These operations run roughly twice as fast with 4 threads as with a single thread.
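A minimal sketch of the concurrency idea, assuming Python 3's `concurrent.futures` and a precomputed list of (source, destination) pairs. `move_one`, `relink_one`, and `run_concurrently` are hypothetical helpers, not the functions in the actual PR. Threads can help here despite the GIL because each call spends most of its time blocked on filesystem round trips, and CPython releases the GIL during blocking I/O.

```python
import os
import shutil
from concurrent.futures import ThreadPoolExecutor


def move_one(src, dst):
    """Move one file, creating the destination directory if needed."""
    # exist_ok=True tolerates another thread creating the same directory.
    os.makedirs(os.path.dirname(dst), exist_ok=True)
    shutil.move(src, dst)
    return dst


def relink_one(link_path, new_target):
    """Point an existing symlink at a new target."""
    tmp = link_path + ".tmp"
    os.symlink(new_target, tmp)
    # os.replace renames over the old link name, so readers never see
    # the link missing on POSIX filesystems.
    os.replace(tmp, link_path)


def run_concurrently(func, arg_tuples, workers=4):
    """Call func(*args) for each tuple using a small thread pool."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(func, *args) for args in arg_tuples]
        # result() preserves input order and re-raises worker exceptions.
        return [f.result() for f in futures]
```

The default of four workers mirrors the thread count measured above; a larger pool might help further on a high-latency mount, but that would need measuring.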
Both of these can be accomplished in a fairly short amount of time. I have working PoCs, and the code change is small. The changes can be made entirely in the platform, without touching the plugins.
Updated by mhrivnak over 7 years ago
- Status changed from NEW to ASSIGNED
- Sprint/Milestone set to 24
- Sprint Candidate changed from No to Yes
Adding to the sprint per request from @jalberts
Updated by mhrivnak over 7 years ago
- Status changed from ASSIGNED to POST
Updated by jortel@redhat.com over 7 years ago
Team decided it would be more appropriate to include this in a Y release. Align to master.
Added by mhrivnak over 7 years ago
Revision caddb227
Speeds up the unit file path migrations
Removes the empty directory purge phase of the 2.8 migrations, which was taking some users many hours when done over NFS.
Introduces multi-threaded concurrency for the bulk of the migration's work.
Updated by mhrivnak over 7 years ago
New PR on master: https://github.com/pulp/pulp/pull/2675
Updated by mhrivnak over 7 years ago
- Status changed from POST to MODIFIED
- % Done changed from 0 to 100
Applied in changeset pulp|caddb227d217f7f7640d4c1933aa08b06e70603b.
Updated by dkliban@redhat.com over 7 years ago
- Tracker changed from Refactor to Issue
- Severity set to 2. Medium
- Triaged set to No
Updated by semyers over 7 years ago
- Status changed from MODIFIED to CLOSED - CURRENTRELEASE
https://pulp.plan.io/issues/2118 fixes #2118