Story #892

Updated by bmbouter about 9 years ago

h3. Motivation 

 Here are some proposed adjustments to make the upload API simpler. The current upload API is documented here: 

 These are mostly minor changes, but the design is adapted from the "Dropbox API":, whose designers have likely thought carefully about the right way to do uploads. This story covers only the API; a separate story will be written to extend the CLI/bindings to match. 

 h3. Typical usage 

 # Send a PUT request to /upload with the first chunk of the file without setting upload_id, and receive an upload_id in return. 
 # Repeatedly PUT subsequent chunks using the upload_id to identify the upload in progress and an offset representing the number of bytes transferred so far. 
 # After each chunk has been uploaded, the server returns a new offset representing the total amount transferred. 
 # After the last chunk, POST to /import_upload to complete the upload. 
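The typical usage steps above can be sketched with a small in-memory stand-in for the server. This is an illustration of the protocol only; @FakeUploadServer@ and its method names are hypothetical, not Pulp code.

```python
import uuid


class FakeUploadServer:
    """In-memory stand-in for the proposed PUT /upload endpoint (illustration only)."""

    def __init__(self):
        self.sessions = {}  # upload_id -> bytearray of received data

    def put_upload(self, chunk, upload_id=None, offset=0):
        # Step 1: no upload_id given, so the server creates a new session.
        if upload_id is None:
            upload_id = str(uuid.uuid4())
            self.sessions[upload_id] = bytearray()
            offset = 0
        buf = self.sessions[upload_id]
        # The server verifies the submitted offset against what it has received.
        if offset != len(buf):
            return {"error": 400, "offset": len(buf)}
        buf.extend(chunk)
        # Step 3: respond with the new total number of bytes transferred.
        return {"upload_id": upload_id, "offset": len(buf)}


def upload(server, data, chunk_size):
    """Drive steps 1-3: first chunk creates the session, later chunks reuse it."""
    upload_id, offset = None, 0
    for start in range(0, len(data), chunk_size):
        resp = server.put_upload(data[start:start + chunk_size], upload_id, offset)
        upload_id, offset = resp["upload_id"], resp["offset"]
    return upload_id, offset


server = FakeUploadServer()
uid, total = upload(server, b"x" * 10, chunk_size=4)
print(total)  # 10
```

After the loop completes, the client would POST @upload_id@ to /import_upload (step 4) to finish.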

 Chunks can be any size up to 150 MB. A typical chunk is 4 MB. Using large chunks will mean fewer calls to /upload and faster overall throughput. However, whenever a transfer is interrupted, you will have to resume at the beginning of the last chunk, so it is often safer to use smaller chunks. 

 If the offset you submit does not match the expected offset on the server, the server will ignore the request and respond with a 400 error that includes the current offset. To resume upload, seek to the correct offset (in bytes) within the file and then resume uploading from that point. 
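The recovery step above (seek to the server's reported offset, then resume) might look like this sketch; the generator and its name are illustrative, not part of the proposed API.

```python
import io


def resume_upload(fileobj, server_offset, chunk_size):
    """After a 400 offset-mismatch, seek to the offset the server reported
    and yield (offset, chunk) pairs for the remaining data."""
    fileobj.seek(server_offset)
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        yield server_offset, chunk
        server_offset += len(chunk)


# Suppose the 400 error said the server's current offset is 4:
f = io.BytesIO(b"abcdefghij")
chunks = list(resume_upload(f, 4, 3))
# -> [(4, b"efg"), (7, b"hij")]
```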

 A chunked upload expires after a maximum of 48 hours. This will be configurable, but what the setting should be called still needs input. 

 h3. Differences from today 

 * You can start uploading immediately; an upload session is created in case you need chunking, but you don't have to chunk if you don't actually need it. To send additional chunks, you repeat the same operation against the same URL, only with upload_id and offset as GET style params. This saves a URL: there is no separate endpoint for creating an upload request distinct from the one where content is uploaded. 

 * Pulp will no longer have a DELETE API endpoint for uploads. Instead, Pulp will auto-clean with a reaper that uses timestamps to remove uploads after the expiration time. 

 * Pulp will no longer support listing uploads. It's not that useful, especially since a new upload can be started and the old one will be auto-cleaned. 

 h2. API for /upload 

 Method: PUT 

 GET style Parameters: 
 upload_id -- The unique ID of the in-progress upload on the server. If left blank, the server will create a new upload session. 

 offset -- The byte offset of this chunk, relative to the beginning of the full file. The server will verify that this matches the offset it expects. If it does not, the server will return an error with the expected offset. 

 The body is reserved for the binary data of the upload content, so POST style params are not supported. 

 Example Response: 

     {
         "upload_id": "16fd2706-8baf-433b-82eb-8c7fada847da",
         "offset": 31337,
         "expires": "Tue, 19 Jul 2011 21:55:38 +0000"
     }

 h2. API for /import_upload 

 Method: POST 

 POST style Parameters: 
 upload_id (string) - identifies the upload request being imported 
 unit_type_id (string) - identifies the type of unit the upload represents 
 unit_key (object) - unique identifier for the new unit; the contents are contingent on the type of unit being uploaded 
 unit_metadata (object) - (optional) extra metadata describing the unit; the contents will vary based on the importer handling the import 
 override_config (object) - (optional) importer configuration values that override the importer’s default configuration 

 A 202 Accepted or an error will be returned, since the import is handled asynchronously. Importing leaves the uploaded file in place in case the user wants to import it again into other repos using the upload interface. If not, auto-cleanup will take care of the vestigial upload_id.
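An /import_upload request body might look like the sketch below. The field names come from this story; the unit_key contents for an RPM are illustrative, since they vary by unit type.

```python
import json

# Illustrative POST body for /import_upload. unit_metadata and
# override_config are optional; unit_key contents depend on the unit type.
body = {
    "upload_id": "16fd2706-8baf-433b-82eb-8c7fada847da",
    "unit_type_id": "rpm",
    "unit_key": {"name": "foo", "version": "1.0", "release": "1"},  # hypothetical key
    "unit_metadata": {},
    "override_config": {},
}
payload = json.dumps(body)
# POST payload to /import_upload; expect a 202 since the import runs asynchronously.
print(payload)
```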