Story #4196
closed
As a user, I can upload files in chunks.
0%
Description
Pulp needs to allow users to upload large files in chunks. There is already a Django project[0] out there that helps solve this problem. Pulp should integrate this into its REST API.
django-chunked-upload does not currently work with S3, but there is a relatively small PR[1] to make it possible.
It would also be nice if the chunks could be uploaded in parallel, but that is not currently possible[2] with django-chunked-upload.
[0] https://github.com/juliomalegria/django-chunked-upload
[1] https://github.com/juliomalegria/django-chunked-upload/pull/39
[2] https://github.com/juliomalegria/django-chunked-upload/issues/45
Related issues
Updated by akofink almost 6 years ago
Pulp 3 should accept very large files (~10GB or more) via chunked uploads like Pulp 2 does, and the API documentation should detail how to do this.
Updated by kersom almost 6 years ago
- Related to Test #4197: Test upload of large artifacts added
Updated by dkliban@redhat.com almost 6 years ago
I just tested the same command directly against the Django web server and I was able to upload a 14 MB file.
This seems to be a problem with the nginx config provided by the installer.
Updated by amacdona@redhat.com almost 6 years ago
- Has duplicate Issue #4214: Request body exceeded settings.DATA_UPLOAD_MAX_MEMORY_SIZE added
Updated by amacdona@redhat.com almost 6 years ago
- Has duplicate deleted (Issue #4214: Request body exceeded settings.DATA_UPLOAD_MAX_MEMORY_SIZE)
Updated by dkliban@redhat.com almost 6 years ago
- Tracker changed from Issue to Story
- Subject changed from Cannot upload large artifacts to As a user, I can upload files in chunks.
- % Done set to 0
Updated by daviddavis almost 6 years ago
- Groomed changed from No to Yes
- Sprint Candidate changed from No to Yes
Updated by daviddavis almost 6 years ago
- Status changed from NEW to ASSIGNED
- Assignee set to daviddavis
Updated by daviddavis almost 6 years ago
I emailed the author of django-chunked-upload since it's been a year since code has been merged in the repository. Waiting to hear back.
Updated by bmbouter almost 6 years ago
Here's another dead uploader project: https://github.com/douglasmiranda/django-fine-uploader
Updated by daviddavis over 5 years ago
I found another package that seems to be (more?) active. It fits our needs better since it integrates with DRF (as opposed to plain Django). It's also based on django-chunked-upload. I'm going to try to use it.
Updated by daviddavis over 5 years ago
Looking at the django-chunked-uploader package, the workflow to create an artifact from a chunked upload would be something like the following. Assume we have 3 file chunks.
$ http --form PUT https://pulp3:8000/pulp/api/v3/uploads/ file@./chunk1
# {"url": "https://pulp3:8000/pulp/api/v3/uploads/5230ec1f59d1485d9d7974b853802e31", "offset": 10000, "expires": "2019-03-18T17:56:22.186Z"}
$ http --form PUT https://pulp3:8000/pulp/api/v3/uploads/5230ec1f59d1485d9d7974b853802e31 file@./chunk2
$ http --form PUT https://pulp3:8000/pulp/api/v3/uploads/5230ec1f59d1485d9d7974b853802e31 file@./chunk3
$ http POST https://pulp3:8000/pulp/api/v3/uploads/5230ec1f59d1485d9d7974b853802e31 md5=0d599f0ec05c3bda8c3b8a68c32a1b47
# POSTing md5 creates the file
$ http POST https://pulp3:8000/pulp/api/v3/artifacts/ upload_id=/pulp/api/v3/uploads/5230ec1f59d1485d9d7974b853802e31
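The client-side bookkeeping for the workflow above could be sketched roughly as follows. This is only an illustration and makes a few assumptions: that the "offset" in each response is the total number of bytes received so far, and that the md5 POSTed at the end is the digest of the whole reassembled file. `iter_chunks` and `plan_upload` are hypothetical helper names, not part of Pulp or of any upload package, and no HTTP requests are made here.

```python
import hashlib
from pathlib import Path
from typing import Iterator, List, Tuple


def iter_chunks(path: Path, chunk_size: int) -> Iterator[bytes]:
    """Yield the file at `path` in pieces of at most `chunk_size` bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    with path.open("rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            yield chunk


def plan_upload(path: Path, chunk_size: int) -> Tuple[List[int], str]:
    """Return the expected offset after each chunk and the whole-file md5.

    Assumption: the server reports the cumulative byte count as "offset"
    after each serially uploaded chunk.
    """
    digest = hashlib.md5()
    offsets: List[int] = []
    offset = 0
    for chunk in iter_chunks(path, chunk_size):
        digest.update(chunk)   # md5 is built incrementally, chunk by chunk
        offset += len(chunk)
        offsets.append(offset)
    return offsets, digest.hexdigest()
```

A client would PUT each chunk in order, compare the returned offset with the expected one, and then POST the hex digest to finish the upload.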
Updated by dkliban@redhat.com over 5 years ago
This looks good to me.
It would be nice if there was a solution that could support parallel chunk uploads.
Updated by daviddavis over 5 years ago
The django-fine-uploader package does, but it's the most out-of-date package (it hasn't been updated since April 2017) and has some gaps, such as Python 3 support. Also, it's not made to work with DRF, so getting it to work with DRF will probably require some work.
Updated by jsherril@redhat.com over 5 years ago
Looks good to me too! I guess you don't have to worry about offsets and whatnot, you just upload chunks in a serial manner?
Updated by daviddavis over 5 years ago
That's correct. Does Katello require parallel chunk uploads?
I'm thinking of using drf-chunked-upload and (eventually) opening a PR against it with support for parallel chunk uploads.
Updated by jsherril@redhat.com over 5 years ago
No, I would not say it's in our requirements to upload multiple chunks of the same file at once.
Updated by bmbouter over 5 years ago
@daviddavis your plan sounds good to me. There are some users who do value parallel chunk support for very large files, e.g. ISOs, but that can definitely come later. Thank you for looking into this upload feature.
Updated by daviddavis over 5 years ago
- Related to Task #4486: Uploading requires use of md5 added
Updated by daviddavis over 5 years ago
- Related to Story #4488: As a user, I can upload chunks in parallel added
Updated by daviddavis over 5 years ago
- Status changed from ASSIGNED to POST
Updated by daviddavis over 5 years ago
- Related to Story #4498: As a user, I can use chunked uploading with S3 added
Added by daviddavis over 5 years ago
Updated by kersom over 5 years ago
- Related to Issue #4896: [Ruby client] Chunked Uploads API doesn't recognize file parameter added
Updated by daviddavis over 5 years ago
- Related to Story #4982: As a user, I can set a checksum with each upload chunk to have the system verify the upload added
Updated by daviddavis over 5 years ago
- Related to Story #4981: Remove incomplete chunked uploads after a set amount of time added
Updated by daviddavis over 5 years ago
- Related to Story #4988: As a user, I can remove uploads added
Updated by bmbouter almost 5 years ago
- Status changed from MODIFIED to CLOSED - CURRENTRELEASE
Updated by ggainey over 4 years ago
- Tags Katello added
- Tags deleted (Katello-P1)
Updated by bmbouter over 4 years ago
- Category deleted (14)
We are removing the 'API' category per open floor discussion June 16, 2020.
Add support for chunked uploads
fixes #4196 https://pulp.plan.io/issues/4196