
Story #6737

Story #6134: [EPIC] Pulp import/export

As a user, I can import a split export

Added by daviddavis over 4 years ago. Updated over 4 years ago.

Status:
CLOSED - CURRENTRELEASE
Priority:
Normal
Assignee:
Category:
-
Sprint/Milestone:
Start date:
Due date:
% Done:

100%

Estimated time:
Platform Release:
Groomed:
No
Sprint Candidate:
No
Tags:
Sprint:
Sprint 77
Quarter:

Description

Export files can be split. Import needs to handle this case.

This is basically the import version of https://pulp.plan.io/issues/6736


Related issues

Related to Pulp - Story #6736: As a user, I can export into a series of files of a particular size (CLOSED - CURRENTRELEASE, assignee: ggainey)

Actions #1

Updated by daviddavis over 4 years ago

  • Related to Story #6736: As a user, I can export into a series of files of a particular size added
Actions #2

Updated by ggainey over 4 years ago

  • Status changed from NEW to ASSIGNED
  • Assignee set to ggainey
  • Sprint set to Sprint 73
Actions #3

Updated by ggainey over 4 years ago

We won't be able to do the subprocess-streaming trick we did for PulpExport here with a chunked-export - import needs random-access 'into' the export-tarfile, and you can't have that and stream from a subprocess. We could:

  1. take the import-filename and look for chunks of the form .dddd. If found, recreate the tarfile and process as normal
  2. require the import-caller to recreate the tarfile before calling import
  3. add a param to import, "chunk_list", which would be the output of the export.output_file_info field. This would let us do option 1, above, while also checking the checksum of each chunk for integrity

There is (iirc) a 'clever' trick using 'dd' to recreate the .tar.gz that would never use more disk than <tar.gz full size> + 1 chunk's size. Regardless of whether pulp or its caller is responsible for recreating the tar.gz, we should investigate using this approach to minimize disk requirements.
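The disk-saving idea above can be sketched in Python (no `dd` needed there): append each chunk to the growing archive and delete it immediately, so peak usage stays near full archive size plus one chunk. This is a hypothetical sketch, not Pulp's implementation; the function name and signature are made up for illustration.

```python
import os


def reassemble_chunks(chunk_paths, output_path, bufsize=1024 * 1024):
    """Concatenate chunk files (in order) into output_path.

    Each chunk is removed right after it is appended, so peak disk
    usage stays close to <full archive size> + <one chunk size>.
    """
    with open(output_path, "wb") as out:
        for chunk in chunk_paths:
            with open(chunk, "rb") as src:
                while True:
                    buf = src.read(bufsize)
                    if not buf:
                        break
                    out.write(buf)
            os.remove(chunk)  # free this chunk's space before reading the next
    return output_path
```

The same effect is what the `dd` trick buys at the shell: copy, then truncate/delete the source chunk before touching the next one.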

Actions #4

Updated by rchan over 4 years ago

  • Sprint changed from Sprint 73 to Sprint 74
Actions #5

Updated by rchan over 4 years ago

  • Sprint changed from Sprint 74 to Sprint 75
Actions #6

Updated by rchan over 4 years ago

  • Sprint changed from Sprint 75 to Sprint 76
Actions #7

Updated by ggainey over 4 years ago

ggainey wrote:

We won't be able to do the subprocess-streaming trick we did for PulpExport here with a chunked-export - import needs random-access 'into' the export-tarfile, and you can't have that and stream from a subprocess. We could:

  1. take the import-filename and look for chunks of the form .dddd. If found, recreate the tarfile and process as normal

Requires way too much implied-magic, and doesn't let us vet the 'chunks' we're trying to recombine.

  2. require the import-caller to recreate the tarfile before calling import

This is what we get if we do nothing and is available right now. Implementing the functionality described in 3. will not break this option.

  3. add a param to import, "chunk_list", which would be the output of the export.output_file_info field. This would let us do option 1, above, while also checking the checksum of each chunk for integrity

This is the best option, as it requires an implicit request from the user, allows us to find the chunks and check their validity (via checksum), and lets us do the 'minimal extra filespace used' trick without requiring the pulp-user to know how to do Magic With DD.

Implementing a chunks= on import is mutually exclusive with filename= - either you have the whole file, or you have a file with the output of export.output_file_info, which is expected to be in the 'same place' as the chunks it describes.
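The checksum-vetting part of option 3 could look roughly like the following. This is a sketch only; the `(path, sha256)` pair shape is an assumption about what export.output_file_info provides, and the function name is hypothetical.

```python
import hashlib


def validate_chunks(chunk_info):
    """Verify each chunk file against its expected SHA-256 digest.

    chunk_info: ordered list of (path, expected_sha256_hex) pairs,
    e.g. derived from an export's output_file_info (assumed shape).
    Returns the ordered list of verified paths, or raises ValueError.
    """
    verified = []
    for path, expected in chunk_info:
        digest = hashlib.sha256()
        with open(path, "rb") as f:
            # Hash in 1 MiB blocks so large chunks never load fully into memory.
            for block in iter(lambda: f.read(1024 * 1024), b""):
                digest.update(block)
        if digest.hexdigest() != expected:
            raise ValueError("checksum mismatch for %s" % path)
        verified.append(path)
    return verified
```

Only after every chunk passes would import reassemble them into the single .tar.gz.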

Actions #8

Updated by rchan over 4 years ago

  • Sprint changed from Sprint 76 to Sprint 77
Actions #9

Updated by pulpbot over 4 years ago

  • Status changed from ASSIGNED to POST

Added by ggainey over 4 years ago

Revision 40f7cc3b | View on GitHub

Taught export to produce, and import to understand, a table-of-contents (toc) file.

Emitted 'next to' the export file or files, named -toc.json.

Consists of keys "meta" and "files". "files" is a dictionary of export-file/checksums. "meta" contains the "file", "chunk_size", and "global_hash" of the export.

Added toc= to import. Import will find and validate the checksums of any chunk_files, reassemble them into a single .tar.gz, and pass that along to the rest of the import process. Deletes chunks as it goes, to minimize disk-space.

Updated import-export docs to describe TOC file and its use.

closes #6737
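The TOC layout described in the commit above ("meta" with "file", "chunk_size", and "global_hash"; "files" mapping export files to checksums) could be consumed roughly like this. The on-disk JSON shape is assumed from the commit text, not taken from Pulp's code, and the function name is hypothetical.

```python
import json
import os


def read_toc(toc_path):
    """Load a -toc.json file and return (meta, files).

    Per the commit message above, the TOC is emitted 'next to' the
    export file(s), so chunk paths are resolved relative to the TOC's
    own directory.
    """
    with open(toc_path) as f:
        toc = json.load(f)
    base = os.path.dirname(toc_path)
    # Map each listed export file to its checksum, using absolute paths.
    files = {
        os.path.join(base, name): checksum
        for name, checksum in toc["files"].items()
    }
    return toc["meta"], files
```

With the TOC loaded, import can locate every chunk, verify its checksum, and reassemble the archive without the caller doing any manual recombination.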

Actions #10

Updated by ggainey over 4 years ago

  • Status changed from POST to MODIFIED
  • % Done changed from 0 to 100
Actions #11

Updated by dkliban@redhat.com over 4 years ago

  • Sprint/Milestone set to 3.6.0
Actions #12

Updated by pulpbot over 4 years ago

  • Status changed from MODIFIED to CLOSED - CURRENTRELEASE
