Issue #2118

closed

Reduce runtime of file path migration

Added by mhrivnak almost 8 years ago. Updated about 5 years ago.

Status: CLOSED - CURRENTRELEASE
Priority: Normal
Assignee:
Category: -
Sprint/Milestone: -
Start date:
Due date:
Estimated time:
Severity: 2. Medium
Version:
Platform Release: 2.10.0
OS:
Triaged: Yes
Groomed: No
Sprint Candidate: Yes
Tags: Pulp 2
Sprint: Sprint 6
Quarter:

Description

On a high-latency filesystem such as NFS, users are finding that the file path migration can take a long time to run; some have reported over 24 hours. After investigating and building some quick-and-dirty PoCs, I found two options for improving performance.

I tested with 2 repos of about 5k RPMs each, plus 20 published copies of one of them. The system had an NFS share mounted from a desktop machine over a 100 Mbps link with latency under 1 ms.

On my setup, the two changes described below together reduced migration time from 54 minutes to 14 minutes.

Remove Pruning

About half the time was spent searching for and deleting empty directories after the files themselves had been moved. There do not seem to be any opportunities to make the pruning go faster in Python, but the same operation can be accomplished with a "find" command in a shell, about 20% faster. Removing pruning from the migrations would let users run it at their leisure, although there is a small risk of it interfering with other Pulp operations: if Pulp is otherwise manipulating files and directories in /var/lib/pulp/content/, the command could inadvertently remove a directory out from under something that was about to use it. We may be able to limit the scope of the command to avoid the parts of the filesystem used by Pulp 2.8.
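For reference, the kind of pruning pass discussed above can be sketched in Python as a bottom-up walk that tries to remove each directory and skips any that are not empty. This is only an illustration, not the migration's actual code and not the "find" command mentioned above; the function name prune_empty_dirs is hypothetical, and it assumes Python 3.

```python
import os


def prune_empty_dirs(root):
    """Remove empty directories below root, deepest first.

    Hypothetical helper for illustration; root itself is never removed.
    Returns the list of directories that were removed.
    """
    removed = []
    # topdown=False yields children before their parents, so a directory
    # that only contained empty directories is itself empty by the time
    # it is visited.
    for dirpath, _dirnames, _filenames in os.walk(root, topdown=False):
        if dirpath == root:
            continue
        try:
            # os.rmdir only succeeds on an empty directory.
            os.rmdir(dirpath)
        except OSError:
            # Not empty (or otherwise not removable); leave it in place.
            continue
        removed.append(dirpath)
    return removed
```

Every directory visit and every rmdir attempt is a separate filesystem call, which is why a pass like this is sensitive to latency. The same race described above applies: a directory that is empty when visited may be about to receive a file from another process.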

Introduce Concurrency

For the operations that move files around and fix symlinks, having a small number of threads do the work concurrently yields a large speedup: these operations run roughly twice as fast with 4 threads as with a single thread.
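A minimal sketch of the idea, assuming Python 3 and a precomputed list of (source, destination) path pairs: a small thread pool performs the moves concurrently, so that several high-latency filesystem calls are in flight at once. The names move_one and move_all are hypothetical and this is not the PoC code; the default of 4 workers simply mirrors the thread count mentioned above.

```python
import os
import shutil
from concurrent.futures import ThreadPoolExecutor


def move_one(pair):
    """Move a single file, creating the destination directory if needed."""
    src, dst = pair
    # Assumes dst includes a directory component (e.g. an absolute path).
    # exist_ok=True tolerates another thread creating the same directory.
    os.makedirs(os.path.dirname(dst), exist_ok=True)
    shutil.move(src, dst)
    return dst


def move_all(pairs, workers=4):
    """Move every (src, dst) pair using a small pool of threads.

    Threads help here because the work is I/O-bound: each thread spends
    most of its time waiting on the filesystem rather than running Python.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() drains the iterator so any exception raised in a worker
        # is re-raised here instead of being silently dropped.
        return list(pool.map(move_one, pairs))
```

The pairs must be independent of one another (no two moves targeting the same destination) for the result to be the same as a sequential run.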

Both of these can be accomplished in a fairly short amount of time. I have working PoCs, and the code changes are small. The changes can be made in the platform alone, without touching the plugins.
