Story #4196

As a user, I can upload files in chunks.

Added by akofink 11 months ago. Updated 6 months ago.

Status:
MODIFIED
Priority:
Normal
Assignee:
Category:
API
Sprint/Milestone:
Start date:
Due date:
% Done:

0%

Platform Release:
Blocks Release:
Backwards Incompatible:
No
Groomed:
Yes
Sprint Candidate:
Yes
Tags:
Katello-P1
QA Contact:
Complexity:
Smash Test:
Verified:
No
Verification Required:
No
Sprint:
Sprint 49

Description

Pulp needs to allow users to upload large files in chunks. There is already a Django project [0] out there that helps solve this problem. Pulp should integrate this into its REST API.

django-chunked-upload does not currently work with S3, but there is a relatively small PR [1] to make it possible.

It would also be nice if the chunks could be uploaded in parallel, but that is not currently possible [2] with django-chunked-upload.

[0] https://github.com/juliomalegria/django-chunked-upload
[1] https://github.com/juliomalegria/django-chunked-upload/pull/39
[2] https://github.com/juliomalegria/django-chunked-upload/issues/45
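Chunked-upload protocols of this kind generally identify each piece of the file by a byte range (django-chunked-upload, for instance, reads a Content-Range-style offset from each request). As a rough, non-authoritative illustration — the helper name and the 2 MB default are invented here, not part of any Pulp or django-chunked-upload API — a client could compute the per-chunk range headers like this:

```python
# Hypothetical helper: compute the Content-Range header values a chunked-upload
# client would send for each piece of a file. Byte ranges are inclusive, per
# the "bytes start-end/total" convention.
def content_ranges(total_size, chunk_size=2 * 1024 * 1024):
    """Yield 'bytes start-end/total' strings, one per chunk."""
    for start in range(0, total_size, chunk_size):
        end = min(start + chunk_size, total_size) - 1  # inclusive end offset
        yield f"bytes {start}-{end}/{total_size}"

# A 10-byte file split into 4-byte chunks yields:
# bytes 0-3/10, bytes 4-7/10, bytes 8-9/10
```

Note the last chunk is simply shorter; the server can tell the upload is complete when the final end offset reaches total - 1.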


Related issues

Related to Pulp - Test #4197: Test upload of large artifacts NEW Actions
Related to Pulp - Task #4486: Uploading requires use of md5 MODIFIED Actions
Related to Pulp - Story #4488: As a user, I can upload chunks in parallel MODIFIED Actions
Related to Pulp - Story #4498: As a user, I can use chunked uploading with S3 NEW Actions
Related to Pulp - Issue #4896: [Ruby client] Chunked Uploads API doesn't recognize file parameter MODIFIED Actions
Related to Pulp - Story #4982: As a user, I can set a checksum with each upload chunk to have the system verify the upload MODIFIED Actions
Related to Pulp - Story #4981: Remove incomplete chunked uploads after a set amount of time NEW Actions
Related to Pulp - Story #4988: As a user, I can remove uploads MODIFIED Actions

Associated revisions

History

#1 Updated by akofink 11 months ago

Pulp 3 should accept very large files (~10GB or more) via chunked uploads like Pulp 2 does, and the API documentation should detail how to do this.

#2 Updated by kersom 11 months ago

  • Related to Test #4197: Test upload of large artifacts added

#3 Updated by dkliban@redhat.com 11 months ago

I just tested the same command directly against the Django web server and was able to upload a 14 MB file.

This seems to be a problem with the nginx config provided by the installer.

#4 Updated by amacdona@redhat.com 11 months ago

  • Duplicated by Issue #4214: Request body exceeded settings.DATA_UPLOAD_MAX_MEMORY_SIZE added

#5 Updated by amacdona@redhat.com 11 months ago

  • Duplicated by deleted (Issue #4214: Request body exceeded settings.DATA_UPLOAD_MAX_MEMORY_SIZE)

#6 Updated by dkliban@redhat.com 11 months ago

  • Tracker changed from Issue to Story
  • Subject changed from Cannot upload large artifacts to As a user, I can upload files in chunks.
  • % Done set to 0

#7 Updated by daviddavis 8 months ago

  • Tags Pulp 3 RC Blocker added

#8 Updated by dkliban@redhat.com 8 months ago

  • Description updated (diff)

#9 Updated by daviddavis 8 months ago

  • Groomed changed from No to Yes
  • Sprint Candidate changed from No to Yes

#10 Updated by ttereshc 8 months ago

  • Sprint set to Sprint 49

#11 Updated by daviddavis 8 months ago

  • Status changed from NEW to ASSIGNED
  • Assignee set to daviddavis

#12 Updated by daviddavis 8 months ago

I emailed the author of django-chunked-upload since it's been a year since any code was merged into the repository. Waiting to hear back.

#13 Updated by bmbouter 8 months ago

Here's another dead uploader project: https://github.com/douglasmiranda/django-fine-uploader

#14 Updated by daviddavis 8 months ago

I found another package that seems to be (more?) active. It fits our needs better since it integrates with DRF (as opposed to django). It's also based on django-chunked-upload. I'm going to try to use it.

https://github.com/jkeifer/drf-chunked-upload

#15 Updated by daviddavis 8 months ago

Looking at the django-chunked-upload package, the workflow to create an artifact from a chunked upload would be something like the following. Assume we have 3 file chunks.

$ http --form PUT https://pulp3:8000/pulp/api/v3/uploads/ file@./chunk1
# {"url": "https://pulp3:8000/pulp/api/v3/uploads/5230ec1f59d1485d9d7974b853802e31", "offset": 10000, "expires": "2019-03-18T17:56:22.186Z"}
$ http --form PUT https://pulp3:8000/pulp/api/v3/uploads/5230ec1f59d1485d9d7974b853802e31 file@./chunk2
$ http --form PUT https://pulp3:8000/pulp/api/v3/uploads/5230ec1f59d1485d9d7974b853802e31 file@./chunk3
$ http POST https://pulp3:8000/pulp/api/v3/uploads/5230ec1f59d1485d9d7974b853802e31 md5=0d599f0ec05c3bda8c3b8a68c32a1b47
# POSTing the md5 creates the file
$ http POST https://pulp3:8000/pulp/api/v3/artifacts/ upload_id=/pulp/api/v3/uploads/5230ec1f59d1485d9d7974b853802e31
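The httpie workflow above can be sketched as a small Python client. This is a non-authoritative sketch using the requests library: the base URL and the "url" response field mirror the example commands, but the exact parameter names and responses of the real API may differ.

```python
import hashlib
import requests

BASE = "https://pulp3:8000"  # base URL from the example above; adjust as needed


def iter_chunks(data, size):
    """Split a bytes object into fixed-size pieces (the last may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def upload_in_chunks(data, chunk_size=2 * 1024 * 1024):
    """Serially upload chunks, then finalize with the file's md5 (sketch only).

    Mirrors the httpie workflow above: the first PUT creates the upload and
    returns its URL, later PUTs append chunks, and POSTing the md5 of the
    whole file completes the upload.
    """
    chunks = iter_chunks(data, chunk_size)
    # First chunk creates the upload; the response carries its URL.
    r = requests.put(f"{BASE}/pulp/api/v3/uploads/", files={"file": chunks[0]})
    upload_url = r.json()["url"]
    # Remaining chunks are appended serially to the same upload.
    for chunk in chunks[1:]:
        requests.put(upload_url, files={"file": chunk})
    # POSTing the md5 of the complete file finalizes the upload.
    requests.post(upload_url, data={"md5": hashlib.md5(data).hexdigest()})
    return upload_url
```

The serial loop matches the discussion below: each chunk is appended in order, so the client never has to track offsets itself.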

#16 Updated by dkliban@redhat.com 8 months ago

This looks good to me.

It would be nice if there was a solution that could support parallel chunk uploads.

#17 Updated by daviddavis 8 months ago

The django-fine-uploader package does, but it's the most out-of-date of the packages (it hasn't been updated since April 2017) and has some gaps, like Python 3 support. Also, it isn't made to work with DRF, so getting it working there will probably require some effort.

#18 Updated by jsherril@redhat.com 8 months ago

Looks good to me too! I guess you don't have to worry about offsets and whatnot; you just upload chunks serially?

#19 Updated by daviddavis 8 months ago

That's correct. Does Katello require parallel chunk uploads?

I'm thinking of using drf-chunked-upload and (eventually) opening a PR against it with support for parallel chunk uploads.

#20 Updated by jsherril@redhat.com 8 months ago

No, I would not say it's in our requirements to upload multiple chunks of the same file at once.

#21 Updated by bmbouter 8 months ago

@daviddavis your plan sounds good to me. There are some users who do value parallel chunk support for very large files, e.g. isos, but that can definitely come later. Thank you for looking into this upload feature.

#22 Updated by daviddavis 8 months ago

  • Related to Task #4486: Uploading requires use of md5 added

#23 Updated by daviddavis 8 months ago

  • Related to Story #4488: As a user, I can upload chunks in parallel added

#24 Updated by daviddavis 8 months ago

  • Status changed from ASSIGNED to POST

#25 Updated by daviddavis 8 months ago

  • Related to Story #4498: As a user, I can use chunked uploading with S3 added

#26 Updated by daviddavis 8 months ago

  • Status changed from POST to MODIFIED

#27 Updated by daviddavis 6 months ago

  • Sprint/Milestone set to 3.0

#28 Updated by bmbouter 6 months ago

  • Tags deleted (Pulp 3, Pulp 3 RC Blocker)

#29 Updated by kersom 5 months ago

  • Related to Issue #4896: [Ruby client] Chunked Uploads API doesn't recognize file parameter added

#31 Updated by daviddavis 4 months ago

  • Related to Story #4982: As a user, I can set a checksum with each upload chunk to have the system verify the upload added

#32 Updated by daviddavis 4 months ago

  • Related to Story #4981: Remove incomplete chunked uploads after a set amount of time added

#33 Updated by daviddavis 4 months ago

  • Related to Story #4988: As a user, I can remove uploads added
