Story #6737

Story #6134: [EPIC] Pulp import/export

As a user, I can import a split export

Added by daviddavis 5 months ago. Updated about 2 months ago.

Status:
CLOSED - CURRENTRELEASE
Priority:
Normal
Assignee:
Category:
-
Sprint/Milestone:
Start date:
Due date:
% Done:

100%

Estimated time:
Platform Release:
Groomed:
No
Sprint Candidate:
No
Tags:
Sprint:
Sprint 77
Quarter:

Description

Export files can be split. Import needs to handle this case.

This is basically the import version of https://pulp.plan.io/issues/6736


Related issues

Related to Pulp - Story #6736: As a user, I can export into a series of files of a particular size (CLOSED - CURRENTRELEASE)


Associated revisions

Revision 40f7cc3b View on GitHub
Added by ggainey 2 months ago

Taught export to produce, and import to understand, a table-of-contents (toc) file.

Emitted 'next to' the export file or files, named -toc.json.

Consists of keys "meta" and "files". "files" is a dictionary of export-file/checksums. "meta" contains the "file", "chunk_size", and "global_hash" of the export.

Added toc= to import. Import will find and validate the checksums of any chunk_files, reassemble them into a single .tar.gz, and pass that along to the rest of the import process. Deletes chunks as it goes, to minimize disk-space.

Updated import-export docs to describe TOC file and its use.

closes #6737
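The reassembly flow the commit message describes can be sketched as follows. This is an illustrative outline, not the actual Pulp code: the function name, the SHA-256 choice, and the exact TOC key layout here are assumptions based on the "meta"/"files" description above.

```python
import hashlib
import json
import os


def reassemble_from_toc(toc_path):
    """Validate each chunk against the TOC, append it to a single
    output file, and delete it as we go to minimize disk usage."""
    with open(toc_path) as fp:
        toc = json.load(fp)

    base_dir = os.path.dirname(toc_path)
    # "meta" holds the final file name; "files" maps chunk name -> checksum.
    result = os.path.join(base_dir, toc["meta"]["file"])
    with open(result, "wb") as out:
        for chunk_name, expected in toc["files"].items():
            chunk_path = os.path.join(base_dir, chunk_name)
            hasher = hashlib.sha256()  # assumed algorithm, for illustration
            with open(chunk_path, "rb") as chunk:
                for block in iter(lambda: chunk.read(1024 * 1024), b""):
                    hasher.update(block)
                    out.write(block)
            if hasher.hexdigest() != expected:
                raise ValueError(f"checksum mismatch for {chunk_name}")
            os.remove(chunk_path)  # reclaim space before the next chunk
    return result
```

This relies on the "files" dictionary preserving chunk order, which JSON objects do in Python 3.7+ via `json.load`.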

History

#1 Updated by daviddavis 5 months ago

  • Related to Story #6736: As a user, I can export into a series of files of a particular size added

#2 Updated by ggainey 4 months ago

  • Status changed from NEW to ASSIGNED
  • Assignee set to ggainey
  • Sprint set to Sprint 73

#3 Updated by ggainey 4 months ago

We won't be able to do the subprocess-streaming trick we did for PulpExport here with a chunked-export - import needs random-access 'into' the export-tarfile, and you can't have that and stream from a subprocess. We could:

  1. take the import-filename and look for chunks of the form .dddd. If found, recreate the tarfile and process as normal
  2. require the import-caller to recreate the tarfile before calling import
  3. add a param to import "chunk_list", which would be the output of the export.output_file_info field. This would let us do a), above, while also checking checksums of each chunk for integrity

There is (iirc) a 'clever' trick using 'dd' to recreate the .tar.gz that would never use more disk than <tar.gz full size> + 1 chunk's size. Regardless of whether pulp or its caller is responsible for recreating the tar.gz, we should investigate this approach to minimize disk requirements.
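The same disk bound can be achieved without dd by appending each chunk to the output and deleting it immediately, so at most the full tarball plus one chunk exists at any moment. A hypothetical helper (not Pulp code; it assumes numeric chunk suffixes that sort lexicographically):

```python
import glob
import os


def concat_chunks(prefix):
    """Rebuild <prefix> from <prefix>.0000, <prefix>.0001, ... while
    never holding more than the full file plus one chunk on disk."""
    with open(prefix, "wb") as out:
        # Zero-padded suffixes (.0000, .0001, ...) sort correctly as strings.
        for chunk_path in sorted(glob.glob(prefix + ".*")):
            with open(chunk_path, "rb") as chunk:
                for block in iter(lambda: chunk.read(1024 * 1024), b""):
                    out.write(block)
            os.remove(chunk_path)  # reclaim the chunk's space immediately
    return prefix
```

Note this sketch does no checksum validation, which is exactly why option 3 below (passing the chunk/checksum list to import) is preferable.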

#4 Updated by rchan 4 months ago

  • Sprint changed from Sprint 73 to Sprint 74

#5 Updated by rchan 4 months ago

  • Sprint changed from Sprint 74 to Sprint 75

#6 Updated by rchan 3 months ago

  • Sprint changed from Sprint 75 to Sprint 76

#7 Updated by ggainey 3 months ago

ggainey wrote:

We won't be able to do the subprocess-streaming trick we did for PulpExport here with a chunked-export - import needs random-access 'into' the export-tarfile, and you can't have that and stream from a subprocess. We could:

  1. take the import-filename and look for chunks of the form .dddd. If found, recreate the tarfile and process as normal

Requires way too much implied-magic, and doesn't let us vet the 'chunks' we're trying to recombine.

  2. require the import-caller to recreate the tarfile before calling import

This is what we get if we do nothing and is available right now. Implementing the functionality described in 3. will not break this option.

  3. add a param to import "chunk_list", which would be the output of the export.output_file_info field. This would let us do a), above, while also checking checksums of each chunk for integrity

This is the best option, as it requires an implicit request from the user, allows us to find the chunks and check their validity (via checksum), and lets us do the 'minimal extra filespace used' trick without requiring the pulp-user to know how to do Magic With DD.

Implementing a chunks= on import is mutually exclusive with filename= - either you have the whole file, or you have a file with the output of export.output_file_info, which is expected to be in the 'same place' as the chunks it describes.

#8 Updated by rchan 3 months ago

  • Sprint changed from Sprint 76 to Sprint 77

#9 Updated by pulpbot 2 months ago

  • Status changed from ASSIGNED to POST

#10 Updated by ggainey 2 months ago

  • Status changed from POST to MODIFIED
  • % Done changed from 0 to 100

#11 Updated by dkliban@redhat.com about 2 months ago

  • Sprint/Milestone set to 3.6.0

#12 Updated by pulpbot about 2 months ago

  • Status changed from MODIFIED to CLOSED - CURRENTRELEASE
