Story #4196

As a user, I can upload files in chunks.

Added by akofink almost 2 years ago. Updated 4 months ago.

Status:
CLOSED - CURRENTRELEASE
Priority:
Normal
Assignee:
Category:
-
Sprint/Milestone:
Start date:
Due date:
% Done:

0%

Estimated time:
Platform Release:
Groomed:
Yes
Sprint Candidate:
Yes
Tags:
Katello
Sprint:
Sprint 49
Quarter:

Description

Pulp needs to allow users to upload large files in chunks. There is already a Django project[0] that helps solve this problem. Pulp should integrate it into its REST API.

django-chunked-upload does not currently work with S3, but there is a relatively small PR[1] to make it possible.

It would also be nice if the chunks could be uploaded in parallel, but that is not currently possible[2] with django-chunked-upload.

[0] https://github.com/juliomalegria/django-chunked-upload
[1] https://github.com/juliomalegria/django-chunked-upload/pull/39
[2] https://github.com/juliomalegria/django-chunked-upload/issues/45


Related issues

Related to Pulp - Test #4197: Test upload of large artifacts (CLOSED - WONTFIX)
Related to Pulp - Task #4486: Uploading requires use of md5 (CLOSED - CURRENTRELEASE)
Related to Pulp - Story #4488: As a user, I can upload chunks in parallel (CLOSED - CURRENTRELEASE)
Related to Pulp - Story #4498: As a user, I can use chunked uploading with S3 (POST)
Related to Pulp - Issue #4896: [Ruby client] Chunked Uploads API doesn't recognize file parameter (CLOSED - CURRENTRELEASE)
Related to Pulp - Story #4982: As a user, I can set a checksum with each upload chunk to have the system verify the upload (CLOSED - CURRENTRELEASE)
Related to Pulp - Story #4981: Remove incomplete chunked uploads after a set amount of time (CLOSED - WONTFIX)
Related to Pulp - Story #4988: As a user, I can remove uploads (CLOSED - CURRENTRELEASE)

Associated revisions

History

#1 Updated by akofink almost 2 years ago

Pulp 3 should accept very large files (~10GB or more) via chunked uploads like Pulp 2 does, and the API documentation should detail how to do this.

#2 Updated by kersom almost 2 years ago

  • Related to Test #4197: Test upload of large artifacts added

#3 Updated by dkliban@redhat.com almost 2 years ago

I just tested the same command directly on the Django web server and I was able to upload a 14 MB file.

This seems to be a problem with the nginx config provided by the installer.

#4 Updated by amacdona@redhat.com almost 2 years ago

  • Has duplicate Issue #4214: Request body exceeded settings.DATA_UPLOAD_MAX_MEMORY_SIZE added

#5 Updated by amacdona@redhat.com almost 2 years ago

  • Has duplicate deleted (Issue #4214: Request body exceeded settings.DATA_UPLOAD_MAX_MEMORY_SIZE)

#6 Updated by dkliban@redhat.com almost 2 years ago

  • Tracker changed from Issue to Story
  • Subject changed from Cannot upload large artifacts to As a user, I can upload files in chunks.
  • % Done set to 0

#7 Updated by daviddavis over 1 year ago

  • Tags Pulp 3 RC Blocker added

#8 Updated by dkliban@redhat.com over 1 year ago

  • Description updated (diff)

#9 Updated by daviddavis over 1 year ago

  • Groomed changed from No to Yes
  • Sprint Candidate changed from No to Yes

#10 Updated by ttereshc over 1 year ago

  • Sprint set to Sprint 49

#11 Updated by daviddavis over 1 year ago

  • Status changed from NEW to ASSIGNED
  • Assignee set to daviddavis

#12 Updated by daviddavis over 1 year ago

I emailed the author of django-chunked-upload since it's been a year since code has been merged in the repository. Waiting to hear back.

#13 Updated by bmbouter over 1 year ago

Here's another dead uploader project: https://github.com/douglasmiranda/django-fine-uploader

#14 Updated by daviddavis over 1 year ago

I found another package that seems to be (more?) active. It fits our needs better since it integrates with DRF (as opposed to django). It's also based on django-chunked-upload. I'm going to try to use it.

https://github.com/jkeifer/drf-chunked-upload

#15 Updated by daviddavis over 1 year ago

Looking at the django-chunked-uploader package, the workflow to create an artifact from a chunked upload would be something like the following. Assume we have 3 file chunks.

$ http --form PUT https://pulp3:8000/pulp/api/v3/uploads/ file@./chunk1
# {"url": "https://pulp3:8000/pulp/api/v3/uploads/5230ec1f59d1485d9d7974b853802e31", "offset": 10000, "expires": "2019-03-18T17:56:22.186Z"}
$ http --form PUT https://pulp3:8000/pulp/api/v3/uploads/5230ec1f59d1485d9d7974b853802e31 file@./chunk2
$ http --form PUT https://pulp3:8000/pulp/api/v3/uploads/5230ec1f59d1485d9d7974b853802e31 file@./chunk3 
$ http POST https://pulp3:8000/pulp/api/v3/uploads/5230ec1f59d1485d9d7974b853802e31 md5=0d599f0ec05c3bda8c3b8a68c32a1b47
# POSTing md5 creates the file
$ http POST https://pulp3:8000/pulp/api/v3/artifacts/ upload_id=/pulp/api/v3/uploads/5230ec1f59d1485d9d7974b853802e31
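
The same sequence can be sketched in Python as a dry run that makes no network calls. This is only an illustration of the order of requests proposed above, not the final Pulp API. The helper names (split_into_chunks, plan_requests) and the chunk size are hypothetical, and the URLs are copied from the example commands.

```python
import hashlib

# URLs copied from the example commands above; the final API may differ.
UPLOADS_URL = "https://pulp3:8000/pulp/api/v3/uploads/"
ARTIFACTS_URL = "https://pulp3:8000/pulp/api/v3/artifacts/"


def split_into_chunks(data: bytes, chunk_size: int) -> list:
    """Split data into consecutive chunks of at most chunk_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def plan_requests(data: bytes, chunk_size: int, upload_id: str) -> list:
    """Return the (method, url, payload description) tuples for one upload.

    upload_id stands in for the id the server would return after the
    first PUT; here it is supplied by the caller (hypothetical helper).
    """
    upload_url = UPLOADS_URL + upload_id
    plan = []
    for index, chunk in enumerate(split_into_chunks(data, chunk_size)):
        # The first chunk goes to the collection URL, which creates the
        # upload; later chunks go to the upload's own URL, in order.
        url = UPLOADS_URL if index == 0 else upload_url
        plan.append(("PUT", url, f"chunk {index + 1} ({len(chunk)} bytes)"))
    # POSTing the md5 of the whole file finalizes the upload.
    md5 = hashlib.md5(data).hexdigest()
    plan.append(("POST", upload_url, f"md5={md5}"))
    # Finally, create the artifact from the completed upload.
    plan.append(("POST", ARTIFACTS_URL,
                 f"upload_id=/pulp/api/v3/uploads/{upload_id}"))
    return plan


if __name__ == "__main__":
    payload = b"x" * 25
    for method, url, body in plan_requests(payload, 10, "5230ec1f59d1485d9d7974b853802e31"):
        print(method, url, body)
```

Because the chunks are sent serially, the client does not need to track offsets itself; it only has to send the chunks in order and supply the checksum at the end.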

#16 Updated by dkliban@redhat.com over 1 year ago

This looks good to me.

It would be nice if there was a solution that could support parallel chunk uploads.

#17 Updated by daviddavis over 1 year ago

The django-fine-uploader package does, but it's the most out-of-date package (it hasn't been updated since April 2017) and has some gaps, like Python 3 support. Also, it's not made to work with DRF, so integrating it will probably require some work.

#18 Updated by jsherril@redhat.com over 1 year ago

Looks good to me too! I guess you don't have to worry about offsets and what not, you just upload chunks in a serial manner?

#19 Updated by daviddavis over 1 year ago

That's correct. Does Katello require parallel chunk uploads?

I'm thinking of using drf-chunked-upload and (eventually) opening a PR against it with support for parallel chunk uploads.

#20 Updated by jsherril@redhat.com over 1 year ago

No, I would not say it's in our requirements to upload multiple chunks of the same file at once.

#21 Updated by bmbouter over 1 year ago

daviddavis your plan sounds good to me. There are some users who do value parallel chunk support for very large files, e.g. isos, but that can definitely come later. Thank you for looking into this upload feature.

#22 Updated by daviddavis over 1 year ago

  • Related to Task #4486: Uploading requires use of md5 added

#23 Updated by daviddavis over 1 year ago

  • Related to Story #4488: As a user, I can upload chunks in parallel added

#24 Updated by daviddavis over 1 year ago

  • Status changed from ASSIGNED to POST

#25 Updated by daviddavis over 1 year ago

  • Related to Story #4498: As a user, I can use chunked uploading with S3 added

#26 Updated by daviddavis over 1 year ago

  • Status changed from POST to MODIFIED

#27 Updated by daviddavis over 1 year ago

  • Sprint/Milestone set to 3.0.0

#28 Updated by bmbouter over 1 year ago

  • Tags deleted (Pulp 3, Pulp 3 RC Blocker)

#29 Updated by kersom over 1 year ago

  • Related to Issue #4896: [Ruby client] Chunked Uploads API doesn't recognize file parameter added

#31 Updated by daviddavis over 1 year ago

  • Related to Story #4982: As a user, I can set a checksum with each upload chunk to have the system verify the upload added

#32 Updated by daviddavis over 1 year ago

  • Related to Story #4981: Remove incomplete chunked uploads after a set amount of time added

#33 Updated by daviddavis over 1 year ago

  • Related to Story #4988: As a user, I can remove uploads added

#34 Updated by bmbouter 11 months ago

  • Status changed from MODIFIED to CLOSED - CURRENTRELEASE

#35 Updated by ggainey 6 months ago

  • Tags Katello added
  • Tags deleted (Katello-P1)

#36 Updated by bmbouter 4 months ago

  • Category deleted (14)

We are removing the 'API' category per open floor discussion June 16, 2020.
