Story #6737

Story #6134: [EPIC] Pulp import/export

As a user, I can import a split export

Added by daviddavis 5 months ago. Updated about 2 months ago.

Status:
CLOSED - CURRENTRELEASE
Priority:
Normal
Assignee:
Category:
-
Sprint/Milestone:
Start date:
Due date:
% Done:

100%

Estimated time:
Platform Release:
Groomed:
No
Sprint Candidate:
No
Tags:
Sprint:
Sprint 77
Quarter:

Description

Export files can be split. Import needs to handle this case.

This is basically the import version of https://pulp.plan.io/issues/6736


Related issues

Related to Pulp - Story #6736: As a user, I can export into a series of files of a particular size (CLOSED - CURRENTRELEASE)


Associated revisions

Revision 40f7cc3b View on GitHub
Added by ggainey 2 months ago

Taught export to produce, and import to understand, a table-of-contents (toc) file.

Emitted 'next to' the export file or files, named -toc.json.

Consists of keys "meta" and "files". "files" is a dictionary of export-file/checksums. "meta" contains the "file", "chunk_size", and "global_hash" of the export.

Added toc= to import. Import will find and validate the checksums of any chunk_files, reassemble them into a single .tar.gz, and pass that along to the rest of the import process. Deletes chunks as it goes, to minimize disk-space.

Updated import-export docs to describe TOC file and its use.

closes #6737
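The reassembly flow the commit message describes can be sketched as follows. This is an illustrative outline, not the actual Pulp code: the function name, the SHA-256 choice, and the exact TOC key layout here are assumptions based on the "meta"/"files" description above.

```python
import hashlib
import json
import os


def reassemble_from_toc(toc_path):
    """Validate each chunk against the TOC, append it to a single
    output file, and delete it as we go to minimize disk usage."""
    with open(toc_path) as fp:
        toc = json.load(fp)

    base_dir = os.path.dirname(toc_path)
    # "meta" holds the final file name; "files" maps chunk name -> checksum.
    result = os.path.join(base_dir, toc["meta"]["file"])
    with open(result, "wb") as out:
        for chunk_name, expected in toc["files"].items():
            chunk_path = os.path.join(base_dir, chunk_name)
            hasher = hashlib.sha256()  # assumed algorithm, for illustration
            with open(chunk_path, "rb") as chunk:
                for block in iter(lambda: chunk.read(1024 * 1024), b""):
                    hasher.update(block)
                    out.write(block)
            if hasher.hexdigest() != expected:
                raise ValueError(f"checksum mismatch for {chunk_name}")
            os.remove(chunk_path)  # reclaim space before the next chunk
    return result
```

This relies on the "files" dictionary preserving chunk order, which JSON objects do in Python 3.7+ via `json.load`.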

History

#1 Updated by daviddavis 5 months ago

  • Related to Story #6736: As a user, I can export into a series of files of a particular size added

#2 Updated by ggainey 4 months ago

  • Status changed from NEW to ASSIGNED
  • Assignee set to ggainey
  • Sprint set to Sprint 73

#3 Updated by ggainey 4 months ago

We won't be able to do the subprocess-streaming trick we did for PulpExport here with a chunked-export - import needs random-access 'into' the export-tarfile, and you can't have that and stream from a subprocess. We could:

  1. take the import-filename and look for chunks of the form .dddd. If found, recreate the tarfile and process as normal
  2. require the import-caller to recreate the tarfile before calling import
  3. add a param to import "chunk_list", which would be the output of the export.output_file_info field. This would let us do a), above, while also checking checksums of each chunk for integrity

There is (iirc) a 'clever' trick using 'dd' to recreate the .tar.gz that would never use more disk than <tar.gz full size> + 1 chunk's size. Regardless of whether pulp or its caller is responsible for recreating the tar.gz, we should investigate this approach to minimize disk requirements.
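The same disk bound can be achieved without dd by appending each chunk to the output and deleting it immediately, so at most the full tarball plus one chunk exists at any moment. A hypothetical helper (not Pulp code; it assumes numeric chunk suffixes that sort lexicographically):

```python
import glob
import os


def concat_chunks(prefix):
    """Rebuild <prefix> from <prefix>.0000, <prefix>.0001, ... while
    never holding more than the full file plus one chunk on disk."""
    with open(prefix, "wb") as out:
        # Zero-padded suffixes (.0000, .0001, ...) sort correctly as strings.
        for chunk_path in sorted(glob.glob(prefix + ".*")):
            with open(chunk_path, "rb") as chunk:
                for block in iter(lambda: chunk.read(1024 * 1024), b""):
                    out.write(block)
            os.remove(chunk_path)  # reclaim the chunk's space immediately
    return prefix
```

Note this sketch does no checksum validation, which is exactly why option 3 below (passing the chunk/checksum list to import) is preferable.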

#4 Updated by rchan 4 months ago

  • Sprint changed from Sprint 73 to Sprint 74

#5 Updated by rchan 4 months ago

  • Sprint changed from Sprint 74 to Sprint 75

#6 Updated by rchan 3 months ago

  • Sprint changed from Sprint 75 to Sprint 76

#7 Updated by ggainey 3 months ago

ggainey wrote:

We won't be able to do the subprocess-streaming trick we did for PulpExport here with a chunked-export - import needs random-access 'into' the export-tarfile, and you can't have that and stream from a subprocess. We could:

  1. take the import-filename and look for chunks of the form .dddd. If found, recreate the tarfile and process as normal

Requires way too much implied-magic, and doesn't let us vet the 'chunks' we're trying to recombine.

  2. require the import-caller to recreate the tarfile before calling import

This is what we get if we do nothing and is available right now. Implementing the functionality described in 3. will not break this option.

  3. add a param to import "chunk_list", which would be the output of the export.output_file_info field. This would let us do a), above, while also checking checksums of each chunk for integrity

This is the best option, as it requires an implicit request from the user, allows us to find the chunks and check their validity (via checksum), and lets us do the 'minimal extra filespace used' trick without requiring the pulp-user to know how to do Magic With DD.

Implementing a chunks= on import is mutually exclusive with filename= - either you have the whole file, or you have a file with the output of export.output_file_info, which is expected to be in the 'same place' as the chunks it describes.

#8 Updated by rchan 3 months ago

  • Sprint changed from Sprint 76 to Sprint 77

#9 Updated by pulpbot 2 months ago

  • Status changed from ASSIGNED to POST

#10 Updated by ggainey 2 months ago

  • Status changed from POST to MODIFIED
  • % Done changed from 0 to 100

#11 Updated by dkliban@redhat.com about 2 months ago

  • Sprint/Milestone set to 3.6.0

#12 Updated by pulpbot about 2 months ago

  • Status changed from MODIFIED to CLOSED - CURRENTRELEASE
