Story #4488

closed

As a user, I can upload chunks in parallel

Added by daviddavis about 5 years ago. Updated almost 4 years ago.

Status: CLOSED - CURRENTRELEASE
Priority: Normal
Assignee:
Category: -
Sprint/Milestone:
Start date:
Due date:
% Done: 100%
Estimated time:
Platform Release:
Groomed: Yes
Sprint Candidate: Yes
Tags: Katello
Sprint: Sprint 55
Quarter:

Description

We're currently using drf-chunked-upload[0], but the library seems to have become unmaintained[1] since we adopted it. It also has some quirks and missing features. So I think we should move off of it and roll our own code as part of this story.

Solution

Adopt a design that supports sha256 checksums and parallel uploads of chunks.

Models

Upload

id = UUID
file = File
size = BigIntegerField
user = FK
created_at = DateTimeField
completed_at = DateTimeField

UploadChunk

id = UUID
upload = FK
offset = BigIntegerField
size = BigIntegerField
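
To make the relationship between the two models concrete, here is a minimal sketch using plain Python dataclasses rather than Django models. It omits the file, user, and created_at fields, and `missing_ranges` is a hypothetical helper (not part of the proposal) showing how the offset and size of each chunk could be used to tell whether an upload is fully covered.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional, Tuple


@dataclass
class UploadChunk:
    offset: int  # byte offset of the chunk within the upload
    size: int    # number of bytes in the chunk
    id: uuid.UUID = field(default_factory=uuid.uuid4)


@dataclass
class Upload:
    size: int  # total size of the file in bytes
    chunks: List[UploadChunk] = field(default_factory=list)
    id: uuid.UUID = field(default_factory=uuid.uuid4)
    completed_at: Optional[datetime] = None

    def missing_ranges(self) -> List[Tuple[int, int]]:
        """Return (start, end) byte ranges, end exclusive, not covered by any chunk."""
        gaps = []
        position = 0
        # Chunks may arrive in any order, so sort by offset before scanning.
        for chunk in sorted(self.chunks, key=lambda c: c.offset):
            if chunk.offset > position:
                gaps.append((position, chunk.offset))
            position = max(position, chunk.offset + chunk.size)
        if position < self.size:
            gaps.append((position, self.size))
        return gaps
```

An upload would only be safe to commit once `missing_ranges()` returns an empty list.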

Workflow

# create the upload session
http POST :24817/pulp/api/v3/uploads/ size=10485760 # returns a UUID (e.g. 345b7d58-f1f8-45d9-d354-82a31eb879bf)
export UPLOAD='/pulp/api/v3/uploads/345b7d58-f1f8-45d9-d354-82a31eb879bf/'

# upload the chunks; the order doesn't matter
http --form PUT :24817$UPLOAD file@./chunkab 'Content-Range:bytes 6291456-10485759/10485760'
http --form PUT :24817$UPLOAD file@./chunkaa 'Content-Range:bytes 0-6291455/10485760'

# view the upload and its chunks
http :24817${UPLOAD}

# complete the upload
http PUT :24817${UPLOAD}commit sha256=037a47d93670e64f2b1038e6f90e4cfd

# create the artifact from the upload
http POST :24817/pulp/api/v3/artifacts/ upload=$UPLOAD
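
The Content-Range values in the workflow above can be derived mechanically from the file size and the chunk size. A minimal Python sketch, assuming a 10485760-byte file split into 6291456-byte chunks (the sizes implied by the byte ranges above); `content_ranges` and `sha256_of` are hypothetical helper names, not part of Pulp or httpie:

```python
import hashlib


def content_ranges(total_size, chunk_size):
    """Yield (start, end, header) per chunk; `end` is inclusive, as in Content-Range."""
    for start in range(0, total_size, chunk_size):
        end = min(start + chunk_size, total_size) - 1
        yield start, end, f"bytes {start}-{end}/{total_size}"


def sha256_of(data: bytes) -> str:
    """Hex digest of the complete file contents, as sent with the commit request."""
    return hashlib.sha256(data).hexdigest()


# Print the header value to send with each chunk.
for start, end, header in content_ranges(10485760, 6291456):
    print(header)
```

Because each header carries its own byte range, the chunks can be sent in any order or concurrently.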

Additional references

https://github.com/douglasmiranda/django-fine-uploader
https://medium.com/box-developer-blog/introducing-the-chunked-upload-api-f82c820ccfcb

[0] https://github.com/jkeifer/drf-chunked-upload
[1] https://github.com/jkeifer/drf-chunked-upload/pull/8


Related issues

Related to Pulp - Story #4196: As a user, I can upload files in chunks. (CLOSED - CURRENTRELEASE, daviddavis)
Related to Pulp - Test #5263: Test - As a user, I can upload chunks in parallel (CLOSED - DUPLICATE)
Blocks Pulp - Story #4988: As a user, I can remove uploads (CLOSED - CURRENTRELEASE, daviddavis)

Actions #1

Updated by daviddavis about 5 years ago

  • Related to Story #4196: As a user, I can upload files in chunks. added
Actions #2

Updated by bmbouter almost 5 years ago

  • Tags deleted (Pulp 3)
Actions #3

Updated by daviddavis almost 5 years ago

  • Description updated (diff)
Actions #4

Updated by bmbouter almost 5 years ago

I really like this API. It's legit. I had a few questions I wanted to ask.

What if we didn't have the 'create the upload session' at all? Couldn't the client generate a uuid and start using it?

How do chunks that were never part of an artifact get removed?

Should we send a digest value for each chunk? If you have a large file, e.g. many gigs, one incorrect chunk would cause you to upload everything again.

Actions #5

Updated by daviddavis almost 5 years ago

What if we didn't have the 'create the upload session' at all? Couldn't the client generate a uuid and start using it?

I see a number of downsides to doing this. First, it's less RESTful. Second, we need to have the total file size before the upload to create the initial file. So we'd have to either pass in the TOTAL file size with the first request (may be hard with parallel uploads) or with every request (kind of awkward).

How do chunks that were never part of an artifact get removed?

I'm not totally sure what you're asking, but if it's how to remove incomplete uploads, drf-chunked-upload supports this (see https://github.com/jkeifer/drf-chunked-upload#settings), but we have yet to leverage the feature. This problem already exists today, however, and solving it is not required for this story.

Should we send a digest value for each chunk? If you have a large file, e.g. many gigs, one incorrect chunk would cause you to upload everything again.

We could definitely add this but I think that's outside the scope of this story. Maybe file another story?

Actions #6

Updated by dkliban@redhat.com almost 5 years ago

The user should have to start a session so Pulp has an opportunity to allocate space for the entire upload. Each uploaded chunk can then be written to its specific place in the file created at session creation. This avoids having to write out the whole file when the upload is complete.
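
A minimal sketch of this idea in Python, assuming the server keeps one file per upload on local disk. The `allocate` and `write_chunk` names are hypothetical, and `truncate` may produce a sparse file on some filesystems rather than physically reserving the space:

```python
import os
import tempfile


def allocate(path, size):
    """Create a file of `size` bytes so chunks can later be written in place."""
    with open(path, "wb") as f:
        f.truncate(size)


def write_chunk(path, offset, data):
    """Write one chunk at its byte offset without touching the rest of the file."""
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(data)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "upload.bin")
    allocate(path, 10)                # session creation
    write_chunk(path, 6, b"WXYZ")     # the second chunk arrives first
    write_chunk(path, 0, b"abcdef")   # the first chunk arrives later
    with open(path, "rb") as f:
        assembled = f.read()
```

Since every chunk lands at its final position, no reassembly step is needed when the upload is committed.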

Accepting checksums with each uploaded chunk would be helpful.
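
As a rough illustration of what per-chunk checksums would allow, the sketch below assumes the client sends a SHA-256 hex digest alongside each chunk. Both function names are hypothetical, and nothing here reflects an existing Pulp API:

```python
import hashlib


def verify_chunk(data: bytes, expected_sha256: str) -> bool:
    """Return True if the chunk's SHA-256 hex digest matches the one the client sent."""
    return hashlib.sha256(data).hexdigest() == expected_sha256.lower()


def chunks_to_retry(chunks, digests):
    """Return the indexes of chunks whose contents do not match their digest."""
    return [
        index
        for index, (data, digest) in enumerate(zip(chunks, digests))
        if not verify_chunk(data, digest)
    ]
```

With this in place, only the corrupted chunks would need to be sent again rather than the whole file.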

Actions #7

Updated by bmbouter almost 5 years ago

We can keep the session creation; it does allow you to create the large file and write into it. If you don't want chunk checksums initially, that is OK too. We can also worry later about cleaning up uploads that never became Artifacts.

Actions #9

Updated by daviddavis almost 5 years ago

  • Blocks Story #4988: As a user, I can remove uploads added
Actions #10

Updated by ttereshc almost 5 years ago

  • Groomed changed from No to Yes
  • Sprint Candidate changed from No to Yes
Actions #11

Updated by daviddavis almost 5 years ago

  • Status changed from NEW to ASSIGNED
  • Assignee set to daviddavis
  • Sprint set to Sprint 54
  • Tags Katello-P1 added

The changes to the API are blocking Katello, which is trying to integrate chunked uploads. Setting the P1 tag and adding to the sprint.

Actions #12

Updated by daviddavis over 4 years ago

  • Description updated (diff)
Actions #13

Updated by daviddavis over 4 years ago

  • Description updated (diff)
Actions #14

Updated by ttereshc over 4 years ago

  • Sprint changed from Sprint 54 to Sprint 55
Actions #15

Updated by daviddavis over 4 years ago

  • Status changed from ASSIGNED to POST

Added by daviddavis over 4 years ago

Revision 24b50710

Add support for parallel chunks and sha256

Also removed drf-chunked-upload.

fixes #4488,#4486

Actions #16

Updated by daviddavis over 4 years ago

  • Status changed from POST to MODIFIED
  • % Done changed from 0 to 100
Actions #17

Updated by kersom over 4 years ago

  • Related to Test #5263: Test - As a user, I can upload chunks in parallel added
Actions #18

Updated by bmbouter over 4 years ago

  • Status changed from MODIFIED to CLOSED - CURRENTRELEASE
Actions #19

Updated by ggainey almost 4 years ago

  • Tags Katello added
  • Tags deleted (Katello-P1)
