Issue #8839

Updated by pulpbot about 2 years ago

 

 **Ticket moved to GitHub**: "pulp/pulpcore/2007":https://github.com/pulp/pulpcore/issues/2007 




 ---- 


 Following up from IRC #pulp: I recently started using pulp 3.12.2 and tried uploading a 1 GiB file. 

 I created it with: 

 ~~~ 
 dd if=/dev/urandom of=./testfile bs=1048576 count=1024 
 ~~~ 

 That took about 5 seconds (suggesting my local spinning-disk filesystem can handle about 200 MB/s, assuming no bottleneck from /dev/urandom). 

 Then I did a: 

 ~~~ 
 pulp file content upload ... 
 ~~~ 

 It took about 3 minutes, which is quite a long time. This was on the same machine where I created the file, uploading to a pulp instance running locally. 

 I did an rsync over ssh of the same file to a second machine and it took a little over 9s. 
 On the second machine, I set up a replica repo with a remote set to the first pulp instance. 
 The sync took a little over 9s -- i.e. matched rsync. 

 On the first pulp instance, based on suggestions from IRC, I updated nginx.conf to set client_max_body_size to 1024m for /pulp/api/v3 (and /pulp/content -- though I don't know if the latter was needed). 

 I then used --chunk-size 1000000000 (1 billion bytes) with the pulp file content upload and got down to 43s. That's still 4.8x slower than rsync. 
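 For reference, the timings above can be turned into rough throughput figures. This is only a sketch: it assumes the file is exactly 1024 MiB (as produced by the dd command) and uses the rounded wall-clock times quoted in this report. `mib_per_s` is a made-up helper name, not part of pulp or pulp-cli.

```shell
# Convert the rounded timings from this report into approximate throughput.
# mib_per_s is a hypothetical helper, not part of pulp or pulp-cli.
mib_per_s() {
    # $1 = size in MiB, $2 = elapsed seconds
    awk -v size="$1" -v secs="$2" 'BEGIN { printf "%.1f\n", size / secs }'
}

mib_per_s 1024 180   # default upload, about 3 minutes
mib_per_s 1024 43    # upload with --chunk-size 1000000000
mib_per_s 1024 9     # rsync over ssh, a little over 9 s
```

 With these inputs the three calls print roughly 5.7, 23.8 and 113.8 MiB/s, so even the chunked upload runs at about a fifth of what rsync achieved.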

 I realize there are a few seconds of overhead for database operations and checksums (I measured the latter at about 5-6 seconds by running sha256/384/512sum and totaling them). But still, it seems quite slow. 
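 The checksum measurement can be reproduced with something like the following. It is a minimal sketch that assumes GNU coreutils (`sha256sum`, `sha384sum`, `sha512sum`, and `date` with `%N` support); `time_checksums` is a hypothetical helper name. It only times the command-line tools one after another and says nothing about how pulp itself computes digests.

```shell
# Time each checksum tool on one file and print "<tool> <seconds>" per line.
# time_checksums is a hypothetical helper; it assumes GNU coreutils.
time_checksums() {
    file="$1"
    for tool in sha256sum sha384sum sha512sum; do
        start=$(date +%s.%N)
        "$tool" "$file" > /dev/null || return 1
        end=$(date +%s.%N)
        # Subtract the two timestamps in awk to get elapsed seconds.
        awk -v t="$tool" -v s="$start" -v e="$end" \
            'BEGIN { printf "%s %.2f\n", t, e - s }'
    done
}
```

 Calling `time_checksums ./testfile` and adding up the three values gives the total I refer to above.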

 At the moment, this is just a prototype setup. My goal is to have pulp instances around the world acting as replicas for our custom file artifacts. We need to keep the total time from uploading an artifact to its availability to clients worldwide as low as possible -- i.e. exploit the best that the underlying infrastructure is capable of. I expect most of the files to be under 100 MB, in which case other background operations will take up a greater share of the total time. I haven't even counted the time taken to update the Publication and Distribution that clients will connect to. 

 What can be done to improve this? 
