Story #4196

As a user, I can upload files in chunks.

Added by akofink about 6 years ago. Updated over 4 years ago.

Status: CLOSED - CURRENTRELEASE
Priority: Normal
Assignee:
Category: -
Sprint/Milestone:
Start date:
Due date:
% Done: 0%
Estimated time:
Platform Release:
Groomed: Yes
Sprint Candidate: Yes
Tags: Katello
Sprint: Sprint 49
Quarter:

Description

Pulp needs to allow users to upload large files in chunks. There is already a Django project[0] that helps solve this problem; Pulp should integrate it into its REST API.

django-chunked-upload does not currently work with S3, but there is a relatively small PR[1] to make it possible.

It would also be nice if the chunks could be uploaded in parallel, but that is not currently possible[2] with django-chunked-upload.

[0] https://github.com/juliomalegria/django-chunked-upload
[1] https://github.com/juliomalegria/django-chunked-upload/pull/39
[2] https://github.com/juliomalegria/django-chunked-upload/issues/45
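The client-side half of this feature is splitting a file into ordered chunks before sending them. A minimal sketch of that step (the function name and chunk size are illustrative, not part of any Pulp or django-chunked-upload API):

```python
def iter_chunks(path, chunk_size=10_000_000):
    """Yield (offset, data) pairs for a file, read in fixed-size chunks.

    Each chunk would then be PUT to the upload endpoint in order; the
    offset lets the server (or a future parallel uploader) place the
    chunk correctly.
    """
    with open(path, "rb") as f:
        offset = 0
        while True:
            data = f.read(chunk_size)
            if not data:
                break
            yield offset, data
            offset += len(data)
```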


Related issues

Related to Pulp - Test #4197: Test upload of large artifacts (CLOSED - WONTFIX, kersom)
Related to Pulp - Task #4486: Uploading requires use of md5 (CLOSED - CURRENTRELEASE, daviddavis)
Related to Pulp - Story #4488: As a user, I can upload chunks in parallel (CLOSED - CURRENTRELEASE, daviddavis)
Related to Pulp - Story #4498: As a user, I can use chunked uploading with S3 (CLOSED - CURRENTRELEASE, lmjachky)
Related to Pulp - Issue #4896: [Ruby client] Chunked Uploads API doesn't recognize file parameter (CLOSED - CURRENTRELEASE, dkliban@redhat.com)
Related to Pulp - Story #4982: As a user, I can set a checksum with each upload chunk to have the system verify the upload (CLOSED - CURRENTRELEASE, fao89)
Related to Pulp - Story #4981: Remove incomplete chunked uploads after a set amount of time (CLOSED - WONTFIX)
Related to Pulp - Story #4988: As a user, I can remove uploads (CLOSED - CURRENTRELEASE, daviddavis)
Actions #1

Updated by akofink about 6 years ago

Pulp 3 should accept very large files (~10GB or more) via chunked uploads like Pulp 2 does, and the API documentation should detail how to do this.

Actions #2

Updated by kersom about 6 years ago

  • Related to Test #4197: Test upload of large artifacts added
Actions #3

Updated by dkliban@redhat.com about 6 years ago

I just tested the same command directly on the Django web server and I was able to upload a 14 MB file.

This seems to be a problem with the nginx config provided by the installer.
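The nginx limit most likely involved here is `client_max_body_size`, whose default of 1m would reject a 14 MB request body with a 413 error. A hedged sketch of raising it (the value and placement are illustrative, not the installer's actual config):

```nginx
# in the http, server, or location block serving Pulp
http {
    # default is 1m; anything larger is rejected with 413 Request Entity Too Large
    client_max_body_size 50m;
}
```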

Actions #4

Updated by amacdona@redhat.com about 6 years ago

  • Has duplicate Issue #4214: Request body exceeded settings.DATA_UPLOAD_MAX_MEMORY_SIZE added
Actions #5

Updated by amacdona@redhat.com about 6 years ago

  • Has duplicate deleted (Issue #4214: Request body exceeded settings.DATA_UPLOAD_MAX_MEMORY_SIZE)
Actions #6

Updated by dkliban@redhat.com about 6 years ago

  • Tracker changed from Issue to Story
  • Subject changed from Cannot upload large artifacts to As a user, I can upload files in chunks.
  • % Done set to 0
Actions #7

Updated by daviddavis almost 6 years ago

  • Tags Pulp 3 RC Blocker added
Actions #8

Updated by dkliban@redhat.com almost 6 years ago

  • Description updated (diff)
Actions #9

Updated by daviddavis almost 6 years ago

  • Groomed changed from No to Yes
  • Sprint Candidate changed from No to Yes
Actions #10

Updated by ttereshc almost 6 years ago

  • Sprint set to Sprint 49
Actions #11

Updated by daviddavis almost 6 years ago

  • Status changed from NEW to ASSIGNED
  • Assignee set to daviddavis
Actions #12

Updated by daviddavis almost 6 years ago

I emailed the author of django-chunked-upload since it's been a year since any code has been merged into the repository. Waiting to hear back.

Actions #13

Updated by bmbouter almost 6 years ago

Here's another dead uploader project: https://github.com/douglasmiranda/django-fine-uploader

Actions #14

Updated by daviddavis almost 6 years ago

I found another package that seems to be (more?) active. It fits our needs better since it integrates with DRF (as opposed to django). It's also based on django-chunked-upload. I'm going to try to use it.

https://github.com/jkeifer/drf-chunked-upload

Actions #15

Updated by daviddavis almost 6 years ago

Looking at the drf-chunked-upload package, the workflow to create an artifact from a chunked upload would be something like the following. Assume we have 3 file chunks.

$ http --form PUT https://pulp3:8000/pulp/api/v3/uploads/ file@./chunk1
# {"url": "https://pulp3:8000/pulp/api/v3/uploads/5230ec1f59d1485d9d7974b853802e31", "offset": 10000, "expires": "2019-03-18T17:56:22.186Z"}
$ http --form PUT https://pulp3:8000/pulp/api/v3/uploads/5230ec1f59d1485d9d7974b853802e31 file@./chunk2
$ http --form PUT https://pulp3:8000/pulp/api/v3/uploads/5230ec1f59d1485d9d7974b853802e31 file@./chunk3
$ http POST https://pulp3:8000/pulp/api/v3/uploads/5230ec1f59d1485d9d7974b853802e31 md5=0d599f0ec05c3bda8c3b8a68c32a1b47
# POSTing the md5 finalizes the upload and creates the file
$ http POST https://pulp3:8000/pulp/api/v3/artifacts/ upload_id=/pulp/api/v3/uploads/5230ec1f59d1485d9d7974b853802e31
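The finalizing POST in the workflow above sends an md5 digest of the whole file so the server can verify the assembled upload. A minimal sketch of computing that digest on the client without loading the file into memory (the function name and chunk size are illustrative):

```python
import hashlib

def file_md5(path, chunk_size=10_000):
    """Compute the md5 hex digest of a file chunk-by-chunk."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            h.update(block)
    return h.hexdigest()
```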
Actions #16

Updated by dkliban@redhat.com almost 6 years ago

This looks good to me.

It would be nice if there was a solution that could support parallel chunk uploads.

Actions #17

Updated by daviddavis almost 6 years ago

The django-fine-uploader package does, but it's the most out-of-date of these packages (not updated since April 2017) and has gaps such as Python 3 support. Also, it's not built for DRF, so getting it to work with DRF will probably require some work.

Actions #18

Updated by jsherril@redhat.com almost 6 years ago

Looks good to me too! I guess you don't have to worry about offsets and what not, you just upload chunks in a serial manner?

Actions #19

Updated by daviddavis almost 6 years ago

That's correct. Does Katello require parallel chunk uploads?

I'm thinking of using drf-chunked-upload and (eventually) opening a PR against it with support for parallel chunk uploads.

Actions #20

Updated by jsherril@redhat.com almost 6 years ago

No, I would not say it's in our requirements to upload multiple chunks of the same file at once.

Actions #21

Updated by bmbouter almost 6 years ago

@daviddavis your plan sounds good to me. There are some users who do value parallel chunk support for very large files, e.g. isos, but that can definitely come later. Thank you for looking into this upload feature.

Actions #22

Updated by daviddavis almost 6 years ago

  • Related to Task #4486: Uploading requires use of md5 added
Actions #23

Updated by daviddavis almost 6 years ago

  • Related to Story #4488: As a user, I can upload chunks in parallel added
Actions #24

Updated by daviddavis almost 6 years ago

  • Status changed from ASSIGNED to POST
Actions #25

Updated by daviddavis almost 6 years ago

  • Related to Story #4498: As a user, I can use chunked uploading with S3 added
Actions #26

Updated by daviddavis almost 6 years ago

  • Status changed from POST to MODIFIED
Actions #27

Updated by daviddavis over 5 years ago

  • Sprint/Milestone set to 3.0.0
Actions #28

Updated by bmbouter over 5 years ago

  • Tags deleted (Pulp 3, Pulp 3 RC Blocker)
Actions #29

Updated by kersom over 5 years ago

  • Related to Issue #4896: [Ruby client] Chunked Uploads API doesn't recognize file parameter added
Actions #31

Updated by daviddavis over 5 years ago

  • Related to Story #4982: As a user, I can set a checksum with each upload chunk to have the system verify the upload added
Actions #32

Updated by daviddavis over 5 years ago

  • Related to Story #4981: Remove incomplete chunked uploads after a set amount of time added
Actions #33

Updated by daviddavis over 5 years ago

  • Related to Story #4988: As a user, I can remove uploads added
Actions #34

Updated by bmbouter about 5 years ago

  • Status changed from MODIFIED to CLOSED - CURRENTRELEASE
Actions #35

Updated by ggainey over 4 years ago

  • Tags Katello added
  • Tags deleted (Katello-P1)
Actions #36

Updated by bmbouter over 4 years ago

  • Category deleted (14)

We are removing the 'API' category per open floor discussion June 16, 2020.