Story #892

closed

Redesign of Uploads API

Added by bmbouter over 9 years ago. Updated over 4 years ago.

Status: CLOSED - WONTFIX
Priority: Normal
Assignee: -
Category: -
Sprint/Milestone: -
Start date:
Due date:
% Done: 0%
Estimated time:
Platform Release:
Groomed: No
Sprint Candidate: No
Tags: Pulp 2
Sprint:
Quarter:

Description

Motivation

Here are some proposed adjustments to make the upload API simpler. The current upload API is documented here: http://pulp.readthedocs.org/en/latest/dev-guide/integration/rest-api/content/upload.html

These are mostly small changes, but the design is adapted from the Dropbox API, whose designers have likely thought carefully about the right way to do uploads. This story covers only the API part; #923 is written to update the CLI/bindings to match.

Proposed Usage

  1. Send a PUT request to /upload with the first chunk of the file without creating an upload request. An upload_id will be automatically created and returned.
  2. Repeatedly PUT subsequent chunks using the upload_id parameter to identify the upload in progress and an offset representing the number of bytes transferred so far. Both upload_id and offset are GET style parameters.
  3. After each chunk has been committed to disk, the server returns a new offset representing the total amount transferred.
  4. After the last chunk, POST to /import_upload to import the entire file into a repo.

Chunks can be any size up to 150 MB. A typical chunk is 4 MB. Using large chunks will mean fewer calls to /upload and faster overall throughput. However, whenever a transfer is interrupted, you will have to resume at the beginning of the last chunk, so it is often safer to use smaller chunks.
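A minimal sketch of how a client might split a file into chunks and track the offset of each one. The helper name iter_chunks is hypothetical, and the constants assume that "MB" means 1024 * 1024 bytes, which the proposal does not specify.

```python
import io

# Assumption: "MB" is read as 1024 * 1024 bytes; the proposal does not say.
DEFAULT_CHUNK_SIZE = 4 * 1024 * 1024   # the "typical" chunk size
MAX_CHUNK_SIZE = 150 * 1024 * 1024     # the upper bound on chunk size


def iter_chunks(fileobj, chunk_size=DEFAULT_CHUNK_SIZE, start_offset=0):
    """Yield (offset, data) pairs; offset is the byte position where data starts."""
    if not 0 < chunk_size <= MAX_CHUNK_SIZE:
        raise ValueError("chunk_size must be between 1 byte and 150 MB")
    fileobj.seek(start_offset)
    offset = start_offset
    while True:
        data = fileobj.read(chunk_size)
        if not data:
            break
        yield offset, data
        offset += len(data)


demo = io.BytesIO(b"abcdefghij")
print(list(iter_chunks(demo, chunk_size=4)))
# prints [(0, b'abcd'), (4, b'efgh'), (8, b'ij')]
```

Passing start_offset lets the same helper pick up from wherever an interrupted transfer left off.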

If the offset you submit does not match the offset the server expects, the server will ignore the request and respond with a 400 error that includes the current offset. To resume an upload, seek to that offset (in bytes) within the file and continue uploading from there. This allows the client to be stateless: it can attempt to resume any upload by upload_id starting from offset 0 and rely on the server to report the correct offset to resume from.
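The offset handshake can be illustrated with an in-memory stand-in for the server. Everything here is hypothetical (the FakeUploadServer class, the generated upload_id format, and the shape of the 400 body); it only models the rule that a mismatched offset is rejected and the expected offset is reported back.

```python
class FakeUploadServer:
    """In-memory stand-in for the proposed /upload endpoint (illustrative only)."""

    def __init__(self):
        self.uploads = {}  # upload_id -> bytearray of committed bytes

    def put_chunk(self, data, upload_id=None, offset=0):
        if upload_id is None:
            # No upload_id supplied: create a new upload session.
            upload_id = "upload-%d" % (len(self.uploads) + 1)
            self.uploads[upload_id] = bytearray()
        committed = self.uploads[upload_id]
        if offset != len(committed):
            # Offset mismatch: ignore the chunk and report the current offset.
            return 400, {"upload_id": upload_id, "offset": len(committed)}
        committed.extend(data)
        return 200, {"upload_id": upload_id, "offset": len(committed)}


def resume_upload(server, upload_id, content, chunk_size):
    """Stateless resume: start at offset 0 and let the server correct the offset."""
    offset = 0
    while offset < len(content):
        chunk = content[offset:offset + chunk_size]
        _status, body = server.put_chunk(chunk, upload_id, offset)
        # On 200 this is the new total; on 400 it is the offset to resume from.
        offset = body["offset"]
    return offset
```

In this sketch the client never stores progress locally. Its first request is rejected with the server's current offset, and it continues from there.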

Chunks support optional checksums through an additional GET style parameter named sha1sum, which is a SHA-1 checksum of the chunk computed by the client. Upon receiving a chunk that specifies sha1sum, the server verifies the checksum before committing the chunk to disk and returning the 200 OK. If a chunk checksum fails to verify, the server responds with a 400 error indicating that the checksum failed to verify.

The /import_upload API call also supports optional checksum verification at the file level, using sha1sum as a POST parameter. If sha1sum is specified, the server verifies the checksum of the whole file before proceeding with the import. If the checksum fails to verify, the server responds with a 400 error indicating that the checksum failed to verify.

A chunked upload can take a maximum of 48 hours before expiring. This limit will be configurable in server.conf; the exact setting is still to be decided.

Differences from today

  • You can start uploading right away, and an upload session is created in case you need chunking, but you don't have to chunk if you don't need to. If you do need chunking, you repeat the same operation against the same URL, only with upload_id and offset as GET style params. This saves a URL, because there is no separate endpoint for creating an upload request that differs from where the content is uploaded.
  • Pulp won't have a DELETE API endpoint anymore. Instead, a reaper task would use timestamps to clean up uploads automatically after the expiration time.
  • Pulp won't support the listing of uploads anymore. It's not that useful, especially since a new upload can be started and the old one will be cleaned up automatically.
  • Checksums at the chunk level are also a new feature, which should be useful for large uploads like ISOs.
  • This implementation allows a single file to be uploaded once and imported into multiple repos without re-uploading. The current design allows for this, but the implementation does not, because several upload importers move the uploaded file, which prevents a second import call against the current API. This implementation should leave files in place during all calls to /import_upload and let the auto-delete handle any cleanup later.
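The timestamp-based cleanup could look roughly like the sketch below. It is only an illustration: the find_expired_uploads helper and the started_at mapping are hypothetical, and it assumes the 48 hours are counted from when the upload began.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the lifetime is counted from the start of the upload.
DEFAULT_LIFETIME = timedelta(hours=48)


def find_expired_uploads(started_at, now, lifetime=DEFAULT_LIFETIME):
    """Return the sorted upload_ids whose lifetime has elapsed.

    started_at maps upload_id -> timezone-aware datetime of the first chunk.
    """
    return sorted(uid for uid, began in started_at.items() if now - began >= lifetime)


now = datetime(2011, 7, 19, 21, 55, 38, tzinfo=timezone.utc)
uploads = {
    "old": now - timedelta(hours=49),
    "fresh": now - timedelta(hours=1),
}
print(find_expired_uploads(uploads, now))
# prints ['old']
```

A reaper would then delete the files for the returned IDs. Because imports leave the uploaded file in place, this is the only point at which an upload is removed.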

API for /upload

Method: PUT

GET style Parameters:

upload_id -- The unique ID of the in-progress upload on the server. If left blank, the server will create a new upload session.

offset -- The byte offset of this chunk, relative to the beginning of the full file. The server will verify that this matches the offset it expects. If it does not, the server will return an error with the expected offset.

sha1sum -- (optional) The SHA-1 checksum for the chunk. The server will verify the chunk against this checksum. If it does not match, the server will return an error indicating that the chunk failed to verify.

The body is reserved for the upload content binary data so POST style params are not supported.
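Since all three parameters travel in the query string, a client mostly needs to build the right URL for each PUT. A small sketch using the standard library; the base URL is a placeholder and build_upload_url is a hypothetical helper, not part of any existing Pulp binding.

```python
from urllib.parse import urlencode


def build_upload_url(base_url, upload_id=None, offset=None, sha1sum=None):
    """Build the PUT URL for one chunk; parameters left as None are omitted."""
    params = {}
    if upload_id is not None:
        params["upload_id"] = upload_id
    if offset is not None:
        params["offset"] = offset
    if sha1sum is not None:
        params["sha1sum"] = sha1sum
    query = urlencode(params)
    return base_url + ("?" + query if query else "")


# First chunk: no upload_id yet, so the server creates a session.
print(build_upload_url("https://pulp.example.com/upload"))
# prints https://pulp.example.com/upload

# Subsequent chunk: identify the session and the bytes sent so far.
print(build_upload_url("https://pulp.example.com/upload", upload_id="abc", offset=4096))
# prints https://pulp.example.com/upload?upload_id=abc&offset=4096
```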

Example Response:

{
    "upload_id": "16fd2706-8baf-433b-82eb-8c7fada847da",
    "offset": 31337,
    "expires": "Tue, 19 Jul 2011 21:55:38 +0000"
}
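A client could decode a response of this shape as follows. The sketch assumes the body is JSON and that expires is an RFC 2822 style date, as the example suggests; parse_upload_response is a hypothetical helper.

```python
import json
from email.utils import parsedate_to_datetime


def parse_upload_response(body):
    """Return (upload_id, offset, expires) from a JSON response body."""
    payload = json.loads(body)
    expires = parsedate_to_datetime(payload["expires"])  # timezone-aware datetime
    return payload["upload_id"], int(payload["offset"]), expires


example = """{
    "upload_id": "16fd2706-8baf-433b-82eb-8c7fada847da",
    "offset": 31337,
    "expires": "Tue, 19 Jul 2011 21:55:38 +0000"
}"""
upload_id, offset, expires = parse_upload_response(example)
print(upload_id, offset, expires.isoformat())
# prints 16fd2706-8baf-433b-82eb-8c7fada847da 31337 2011-07-19T21:55:38+00:00
```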

API for /import_upload

Method: POST

POST parameters are divided into two types: platform parameters and plugin-specific parameters.

Platform POST style Parameters:

upload_id (string) - identifies the upload request being imported
sha1sum (string) - (optional) the SHA-1 checksum for the file. The server will verify the file against this checksum. If it does not match, the server will return an error indicating that the file failed to verify.

Plugin POST style parameters:

These are optional at the platform level because, by definition, not every plugin requires them. The ones below are examples; each plugin will document and specify its own set of parameters.

unit_type_id (string) - identifies the type of unit the upload represents
unit_key (object) - unique identifier for the new unit; the contents are contingent on the type of unit being uploaded
unit_metadata (object) - (optional) extra metadata describing the unit; the contents will vary based on the importer handling the import
override_config (object) - (optional) importer configuration values that override the importer’s default configuration

A 202 Accepted or an error will be returned, because the upload is imported asynchronously. Importing will leave the uploaded file in place in case the user wants to import it again into other repos using the upload interface. If not, auto cleanup will take care of the leftover upload.
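To make the split between platform and plugin parameters concrete, here is a sketch of how a client might assemble the request body. It assumes the POST parameters are sent as a JSON document, which the proposal does not state; build_import_body is a hypothetical helper and the example values are made up.

```python
import json


def build_import_body(upload_id, sha1sum=None, **plugin_params):
    """Combine platform parameters with plugin-specific ones into a JSON body."""
    body = {"upload_id": upload_id}
    if sha1sum is not None:
        body["sha1sum"] = sha1sum  # optional file-level checksum
    # Plugin parameters (unit_type_id, unit_key, ...) pass through untouched.
    body.update(plugin_params)
    return json.dumps(body, sort_keys=True)


print(build_import_body(
    "16fd2706-8baf-433b-82eb-8c7fada847da",
    unit_type_id="rpm",
    unit_key={"name": "foo"},
))
# prints {"unit_key": {"name": "foo"}, "unit_type_id": "rpm", "upload_id": "16fd2706-8baf-433b-82eb-8c7fada847da"}
```

Because the uploaded file stays in place, the same upload_id could be sent in several such requests to import it into different repos.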


Related issues

Blocks Pulp - Story #894: Consolidate upload functionality of CLI and bindings (CLOSED - WONTFIX)
Blocks Pulp - Task #923: Update CLI and bindings to use new-style upload API in story #892 (CLOSED - WONTFIX)
Blocks Pulp - Issue #636: API delete nonexisting upload_id returns 200 (CLOSED - WONTFIX)
