Story #6737
Story #6134 (closed): [EPIC] Pulp import/export
As a user, I can import a split export
Added by daviddavis over 4 years ago. Updated over 4 years ago.
% Done: 100%
Description
Export files can be split into a series of smaller files (chunks). Import needs to handle this case.
This is basically the import version of https://pulp.plan.io/issues/6736
Related issues
Updated by daviddavis over 4 years ago
- Related to Story #6736: As a user, I can export into a series of files of a particular size added
Updated by ggainey over 4 years ago
- Status changed from NEW to ASSIGNED
- Assignee set to ggainey
- Sprint set to Sprint 73
Updated by ggainey over 4 years ago
We won't be able to use the subprocess-streaming trick we used for PulpExport here with a chunked export: import needs random access into the export tarfile, and you can't have that while streaming from a subprocess. We could:
1. take the import filename and look for chunks of the form .dddd. If found, recreate the tarfile and process it as normal
2. require the import caller to recreate the tarfile before calling import
3. add a "chunk_list" param to import, which would take the output of the export.output_file_info field. This would let us do option 1, above, while also checking the checksum of each chunk for integrity
There is (iirc) a 'clever' trick that uses 'dd' to recreate the .tar.gz without ever using more disk than <tar.gz full size> + 1 'chunk' size. Regardless of whether Pulp or its caller is responsible for recreating the .tar.gz, we should investigate this approach to minimize disk requirements.
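The comment doesn't record the exact `dd` invocation. The sketch below is a plain-Python equivalent of the same idea (append each chunk to the output, then delete it), not the `dd` command itself; the function name `reassemble_chunks` and the `.0000`-style chunk names are hypothetical. Because each chunk is removed as soon as it has been copied, the output plus the remaining chunks stays at about the full archive size, with at most one chunk counted twice while it is being copied.

```python
import os
import shutil


def reassemble_chunks(chunk_paths, output_path, block_size=1024 * 1024):
    """Concatenate chunk files, in the given order, into output_path.

    Each chunk is deleted right after it has been appended, so peak disk use
    is roughly the full archive size plus one chunk.
    """
    with open(output_path, "wb") as out:
        for chunk_path in chunk_paths:
            with open(chunk_path, "rb") as chunk:
                # Copy in fixed-size blocks rather than loading the chunk into memory.
                shutil.copyfileobj(chunk, out, block_size)
            # This chunk's bytes now live in the output; free its space.
            os.remove(chunk_path)
    return output_path
```

The caller is responsible for passing the chunks in archive order; sorting zero-padded suffixes lexically is one simple way to get that order.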
Updated by ggainey over 4 years ago
ggainey wrote:
> We won't be able to use the subprocess-streaming trick we used for PulpExport here with a chunked export: import needs random access into the export tarfile, and you can't have that while streaming from a subprocess. We could:
> 1. take the import filename and look for chunks of the form .dddd. If found, recreate the tarfile and process it as normal
Requires way too much implied magic, and doesn't let us vet the 'chunks' we're trying to recombine.
> 2. require the import caller to recreate the tarfile before calling import
This is what we get if we do nothing, and it is available right now. Implementing option 3 will not break this option.
> 3. add a "chunk_list" param to import, which would take the output of the export.output_file_info field. This would let us do option 1, above, while also checking the checksum of each chunk for integrity
This is the best option: it requires an explicit request from the user, lets us find the chunks and check their validity (via checksum), and lets us do the 'minimal extra filespace used' trick without requiring the Pulp user to know how to do Magic With DD.
A chunks= parameter on import would be mutually exclusive with filename=: either you have the whole file, or you have a file containing the output of export.output_file_info, which is expected to be in the 'same place' as the chunks it describes.
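A minimal sketch of that mutual-exclusivity check, written against the toc= name the final change adopted (this comment still calls it chunks=). The function name `resolve_import_source` and its return shape are hypothetical, not pulpcore's API; the chunk directory is taken to be the TOC file's own directory, following the "same place" expectation above.

```python
import os


def resolve_import_source(filename=None, toc=None):
    """Accept exactly one of filename= (whole archive) or toc= (chunk index)."""
    if (filename is None) == (toc is None):
        raise ValueError("Specify exactly one of filename= or toc=, not both or neither.")
    if filename is not None:
        return {"archive": filename}
    # Chunks are expected to sit next to the TOC file that describes them.
    return {"toc": toc, "chunk_dir": os.path.dirname(os.path.abspath(toc))}
```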
Updated by pulpbot over 4 years ago
- Status changed from ASSIGNED to POST
Added by ggainey over 4 years ago
Updated by ggainey over 4 years ago
- Status changed from POST to MODIFIED
- % Done changed from 0 to 100
Applied in changeset pulpcore|40f7cc3bb28830d9944a9908f971ea7002702b16.
Updated by pulpbot over 4 years ago
- Status changed from MODIFIED to CLOSED - CURRENTRELEASE
Taught export to produce, and import to understand, a table-of-contents (toc) file.
The TOC is written 'next to' the export file or files, with a name ending in -toc.json.
It consists of the keys "meta" and "files". "files" is a dictionary mapping each export file to its checksum; "meta" contains the "file", "chunk_size", and "global_hash" of the export.
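A minimal sketch of producing a TOC with the keys described above. The key names come from this change; the use of SHA-256, the function name `write_toc`, and storing bare file names (rather than full paths) are assumptions. Feeding the chunks, in order, through one running hash yields the same digest as hashing the unsplit archive, so "global_hash" can be computed in the same pass as the per-chunk checksums.

```python
import hashlib
import json
import os

BLOCK = 1024 * 1024  # read size; arbitrary


def write_toc(export_name, chunk_paths, chunk_size, toc_path):
    """Write a TOC describing chunk_paths (in archive order) and return it.

    Assumes SHA-256 for both the per-chunk and the global checksum.
    """
    files = {}
    global_sha = hashlib.sha256()
    for path in chunk_paths:
        chunk_sha = hashlib.sha256()
        with open(path, "rb") as f:
            for block in iter(lambda: f.read(BLOCK), b""):
                chunk_sha.update(block)
                # Hashing chunks in order equals hashing the whole archive.
                global_sha.update(block)
        files[os.path.basename(path)] = chunk_sha.hexdigest()
    toc = {
        "meta": {
            "file": export_name,
            "chunk_size": chunk_size,
            "global_hash": global_sha.hexdigest(),
        },
        "files": files,
    }
    with open(toc_path, "w") as f:
        json.dump(toc, f, indent=2)
    return toc
```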
Added toc= to import. Import finds the chunk files, validates their checksums, reassembles them into a single .tar.gz, and passes that along to the rest of the import process. It deletes the chunks as it goes, to minimize disk-space use.
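A minimal sketch of that import-side flow, assuming the TOC layout described above, SHA-256 checksums, and chunk names whose lexical order matches their order in the archive (e.g. zero-padded numeric suffixes). The function name `reassemble_from_toc` is hypothetical and this is not pulpcore's code; in particular, validating every chunk before deleting any is a design choice of this sketch.

```python
import hashlib
import json
import os

BLOCK = 1024 * 1024  # read size; arbitrary


def _sha256(path):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(BLOCK), b""):
            h.update(block)
    return h.hexdigest()


def reassemble_from_toc(toc_path):
    """Validate the chunks a TOC lists, join them, and return the archive path."""
    base_dir = os.path.dirname(os.path.abspath(toc_path))
    with open(toc_path) as f:
        toc = json.load(f)

    # Assumption: zero-padded suffixes make sorted order equal archive order.
    chunk_names = sorted(toc["files"])

    # Check every chunk first, so a bad chunk leaves all inputs untouched.
    for name in chunk_names:
        if _sha256(os.path.join(base_dir, name)) != toc["files"][name]:
            raise ValueError(f"checksum mismatch for chunk {name}")

    # Assumption: meta["file"] names the archive; place it next to the TOC.
    output_path = os.path.join(base_dir, os.path.basename(toc["meta"]["file"]))
    whole = hashlib.sha256()
    with open(output_path, "wb") as out:
        for name in chunk_names:
            chunk_path = os.path.join(base_dir, name)
            with open(chunk_path, "rb") as chunk:
                for block in iter(lambda: chunk.read(BLOCK), b""):
                    out.write(block)
                    whole.update(block)
            os.remove(chunk_path)  # free this chunk's space as we go

    if whole.hexdigest() != toc["meta"]["global_hash"]:
        raise ValueError("reassembled archive does not match global_hash")
    return output_path
```

Checking every chunk up front means a corrupt chunk is reported before anything is deleted (a missing one surfaces as `FileNotFoundError`); the final `global_hash` comparison catches problems the per-chunk checks cannot, such as chunks joined in the wrong order.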
Updated import-export docs to describe TOC file and its use.
closes #6737