Issue #2783

Updated by rmcgover almost 7 years ago

In the Pulp platform there is a CopyDirectoryStep used by various publishers. Notably, pulp_rpm uses it to copy an existing yum repo and incrementally update it.

 Currently, it will copy file contents and discard attributes, including mtime. 
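
 To illustrate the difference, here is a minimal sketch using only the Python standard library. It is not CopyDirectoryStep's actual code; the helper names `copy_contents_only` and `copy_with_mtime` are hypothetical and exist only for this example.

```python
import os
import shutil
import tempfile


def copy_contents_only(src, dst):
    """Copy file data only; dst ends up with a fresh mtime."""
    shutil.copyfile(src, dst)


def copy_with_mtime(src, dst):
    """Copy file data, then copy permission bits and timestamps from src."""
    shutil.copy2(src, dst)


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    src = os.path.join(workdir, "primary.sqlite.bz2")
    with open(src, "wb") as f:
        f.write(b"example repodata")

    # Pretend the file was published long ago.
    old_mtime = 1000000000
    os.utime(src, (old_mtime, old_mtime))

    plain = os.path.join(workdir, "plain-copy")
    kept = os.path.join(workdir, "mtime-copy")
    copy_contents_only(src, plain)
    copy_with_mtime(src, kept)

    print("source mtime:       ", int(os.stat(src).st_mtime))
    print("contents-only copy: ", int(os.stat(plain).st_mtime))
    print("mtime-preserving:   ", int(os.stat(kept).st_mtime))

    shutil.rmtree(workdir)
```

 `shutil.copy2` is `shutil.copy` followed by `shutil.copystat`, so it also carries over permission bits and atime. Whether that is acceptable for every publisher is an open question.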

 It would be better if this copy retained the mtime on files for the following reasons: 

 h2. Improved rsync performance 

 When yum and rsync distributors are used together, an rsync publish may reference many files which have already been published with the same content (e.g. sqlite files which are retained over publishes). 

 Because these files were copied without their timestamps, the rsync "quick check" algorithm (which checks whether size and last-modified time are equal between local and remote) fails. rsync must then read the entire file on both sides to calculate its checksum, which can be much slower if the repo has many files.

 I see this in publishes where the rsync itemized output contains many lines like this: 

 <pre> 
 <f..t...... 0258070a02e25a56bb6c05130c1ae275164a58fd4f07585bb43a8476d3cf1cc9-filelists.sqlite.bz2 
 <f..t...... 0259762a377462d92d93277b10b70b05b2356fdc474c0ac427db4f282df50dc1-primary.sqlite.bz2 
 <f..t...... 0264321f868112ee7bc735f7f25e47c87485da7847389047a5b0e3f076d3c11d-other.sqlite.bz2 
 <f..t...... 02751717b265bd5f4644da81e8f1129fcb972867caed8998daff4709cd4cb9ec-primary.sqlite.bz2 
 </pre> 

 ...indicating that file content didn't change, but timestamp changed. 
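
 As a rough model of why the timestamp alone matters, the quick check can be sketched as below. This is a simplification for illustration, not rsync's implementation; the function name `quick_check_matches` is hypothetical, and comparing mtime at whole-second granularity is an assumption of this sketch.

```python
import os


def quick_check_matches(local_path, remote_size, remote_mtime):
    """Return True if size and mtime both match, i.e. the file could be skipped.

    Simplified model: mtimes are compared at whole-second granularity.
    """
    st = os.stat(local_path)
    same_size = st.st_size == remote_size
    same_mtime = int(st.st_mtime) == int(remote_mtime)
    return same_size and same_mtime
```

 In this model, a file whose content and size are unchanged still fails the check when only its mtime differs, which matches the `t` flag in the output above.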

 (Side note: for yum repos retaining files between publishes, see related issues https://pulp.plan.io/issues/1684 and https://pulp.plan.io/issues/2788 )

 h2. More user-friendly 

 mtime is also useful for humans. For example, when looking at a directory listing on a Pulp-hosted yum repo, I would prefer to see the real mtime of each file (so I can tell old and new repodata apart) rather than the time when the CopyDirectoryStep ran.
