Issue #5087
closed
Creating artifact in pulp3 fails for big files
Description
Used Version
pulp_source_dir: "git+https://github.com/pulp/pulpcore.git@3.0.0rc3"
pulp_plugin_source_dir: "git+https://github.com/pulp/pulpcore-plugin.git@0.1.0rc3"
pulp_install_plugins:
  pulp-rpm:
    app_label: "rpm"
    source_dir: "git+https://github.com/pulp/pulp_rpm.git@3.0.0b4"
Steps to reproduce
I created files of different sizes:
dd if=/dev/zero of=500m.bin bs=256M count=2
dd if=/dev/zero of=750m.bin bs=256M count=3
dd if=/dev/zero of=1g.bin bs=256M count=4
dd if=/dev/zero of=1.5g.bin bs=256M count=6
dd if=/dev/zero of=5.5g.bin bs=256M count=22
Using the script test-chunk.sh attached to this ticket, I run:
./test-chunk.sh 500m.bin # OK
./test-chunk.sh 750m.bin # OK
./test-chunk.sh 1g.bin # OK
./test-chunk.sh 1.5g.bin # Fails with error
Creating artifact
http: error: Request timed out (30s).
After changing the script to use a longer timeout:
http --timeout=120 POST $PORT/pulp/api/v3/artifacts/ upload=$UPLOAD
I get the error:
Creating artifact
http: error: ConnectionError: ('Connection aborted.', BadStatusLine("''",)) while doing POST request to URL: http://dev-pulp-server.ptci.dev:24817/pulp/api/v3/artifacts/
Trying the biggest file, 5.5g.bin, I get the error:
./test-chunk.sh 5.5g.bin
...
...
Creating artifact
HTTP/1.1 500 Internal Server Error
Connection: close
Content-Length: 27
Content-Type: text/html
Date: Fri, 05 Jul 2019 09:59:10 GMT
Server: gunicorn/19.9.0
Vary: Cookie
X-Frame-Options: SAMEORIGIN
<h1>Server Error (500)</h1>
On the server, the uploaded files seem OK:
[root@dev-pulp-server upload]# pwd
/var/lib/pulp/upload
[root@dev-pulp-server upload]# ls -lhs
total 9.5G
1.5G -rw-r--r--. 1 pulp pulp 1.5G Jul 5 11:44 3259c600-29ad-4629-a7f4-fa56add68b7d
5.5G -rw-r--r--. 1 pulp pulp 5.5G Jul 5 11:58 5bbe89e6-2f86-4738-a196-b3ed4c88d8de
1.0G -rw-r--r--. 1 pulp pulp 1.0G Jul 5 11:35 66d19833-0eea-4bfb-af8d-54bb6840d9cb
1.5G -rw-r--r--. 1 pulp pulp 1.5G Jul 5 11:38 90af4a0d-6f1a-4f14-9b47-67f7327fe067
[root@dev-pulp-server upload]# sha256sum 5bbe89e6-2f86-4738-a196-b3ed4c88d8de
4da89f41df88aa946bee824842471f89ac378b337dcf5cef2dafa53bb1e82cc6 5bbe89e6-2f86-4738-a196-b3ed4c88d8de
On the client:
[vagrant@dev-pulp-client scripts]$ sha256sum 5.5g.bin
4da89f41df88aa946bee824842471f89ac378b337dcf5cef2dafa53bb1e82cc6 5.5g.bin
Updated by daviddavis over 5 years ago
- Subject changed from Creating artifact in pulp3 fails for big uploaded files in chunks to Creating artifact in pulp3 fails for big files
Thanks for the excellent bug report. It makes investigating these issues easy.
I looked into why artifact creation is failing for files < 2GB. The reason is that it takes too long to calculate the checksums. There are 6 checksum types, and each one takes about 4-8 seconds from the command line in my test environment. Calculating the digests in Python seems to add about 1-2 seconds. The default timeout in gunicorn is 30 seconds, after which you get:
Jul 05 14:21:56 pulp3 gunicorn[13691]: [2019-07-05 14:21:56 +0000] [13691] [CRITICAL] WORKER TIMEOUT (pid:29843)
Jul 05 14:21:57 pulp3 gunicorn[13691]: [2019-07-05 14:21:57 +0000] [30031] [INFO] Booting worker with pid: 30031
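To get a rough feel for the checksum cost described above, here is a minimal sketch that times each digest over a local file. The algorithm list is an assumption (the comment only says there are 6 checksum types), and `time_digests` is a hypothetical helper, not pulpcore code.

```python
import hashlib
import time

# Assumption: the thread does not name the six checksum types;
# these hashlib algorithm names are used purely for illustration.
ALGORITHMS = ("md5", "sha1", "sha224", "sha256", "sha384", "sha512")


def time_digests(path, chunk_size=1024 * 1024):
    """Return {algorithm: (hexdigest, seconds)}, reading the file once per algorithm."""
    results = {}
    for name in ALGORITHMS:
        hasher = hashlib.new(name)
        start = time.perf_counter()
        with open(path, "rb") as f:
            # Read in fixed-size chunks so large files never sit fully in memory.
            for chunk in iter(lambda: f.read(chunk_size), b""):
                hasher.update(chunk)
        results[name] = (hasher.hexdigest(), time.perf_counter() - start)
    return results
```

Running it against one of the test files (for example `time_digests("1.5g.bin")`) and summing the second element of each tuple gives a figure to compare with the 30-second worker timeout.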
You can raise this timeout, or you can pass in the checksums when creating the artifact[0]. I think the best solution, though, might be to make artifact creation a background task.
[0] http POST :24817/pulp/api/v3/artifacts/ upload=$UPLOAD sha256=abc...
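As a sketch of the workaround in [0], the sha256 can be computed on the client before the request is made. The field names `upload` and `sha256` come from the command above; `sha256_of` and `artifact_payload` are hypothetical helpers, and the sketch only builds the form fields without sending anything.

```python
import hashlib


def sha256_of(path, chunk_size=1024 * 1024):
    """Compute the sha256 hex digest of a file without loading it into memory."""
    hasher = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def artifact_payload(upload_href, path):
    """Build the form fields mirroring `upload=$UPLOAD sha256=...` (hypothetical helper)."""
    return {"upload": upload_href, "sha256": sha256_of(path)}
```

The resulting dictionary would be sent as the body of the POST to the artifacts endpoint; the digest should match what `sha256sum` prints for the same file.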
Updated by bmbouter over 5 years ago
+1 to moving this to a task. It's there to allow for long-running workloads like this one.
Updated by daviddavis over 5 years ago
- Related to Issue #4998: Artifact size is limited to 2 GB added
Updated by dkliban@redhat.com over 5 years ago
We should calculate the checksums of each chunk and then simply add them up at the end. That way the final request can be performed quickly.
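One way to read this suggestion: digests of separate chunks cannot literally be summed, but a hash object can be fed the chunks in order as they arrive, which yields the same result as hashing the whole file at the end. The sketch below shows that with `hashlib`; `RunningDigests` is a hypothetical name, and keeping such state alive across separate HTTP requests is a separate problem it does not address.

```python
import hashlib


class RunningDigests:
    """Keep one hash object per algorithm and update them chunk by chunk."""

    def __init__(self, algorithms=("sha256",)):
        self._hashers = {name: hashlib.new(name) for name in algorithms}

    def update(self, chunk):
        # Chunks must be supplied in file order; the digest depends on it.
        for hasher in self._hashers.values():
            hasher.update(chunk)

    def hexdigests(self):
        return {name: h.hexdigest() for name, h in self._hashers.items()}
```

With this approach the expensive work is spread across the chunk uploads, so the final step only has to read out the digests.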
Updated by amacdona@redhat.com over 5 years ago
- Triaged changed from No to Yes
- Sprint set to Sprint 55
Updated by daviddavis over 5 years ago
- Status changed from NEW to ASSIGNED
- Assignee set to daviddavis
Updated by dkliban@redhat.com over 5 years ago
- Sprint changed from Sprint 55 to Sprint 56
Updated by dkliban@redhat.com over 5 years ago
The artifact creation API calculates the checksums of the upload as it is being received, so this call can stay synchronous. However, we should make the 'uploads_commit' operation[0] asynchronous. The checksums calculated during that task should then be saved to the db so they can be used for creating an artifact from the upload.
[0] https://docs.pulpproject.org/en/3.0/nightly/restapi.html#operation/uploads_commit
Updated by daviddavis over 5 years ago
The upload commit action only calculates the sha256 checksum. We'd have to duplicate the logic that calculates checksums from artifact creation to upload commit. Why avoid having a background task for artifact creation?
Updated by dkliban@redhat.com over 5 years ago
@daviddavis and I discussed this some more on IRC and here is the plan we came up with:
Make the 'uploads_commit'[0] return a 202 and calculate the checksum of a file in a task. The created_resource of that task will be an Artifact.
Remove the ability of the user to submit an upload href when creating an Artifact with 'artifacts_create'[1].
[0] https://docs.pulpproject.org/en/3.0/nightly/restapi.html#operation/uploads_commit
[1] https://docs.pulpproject.org/en/3.0/nightly/restapi.html#operation/artifacts_create
Updated by daviddavis over 5 years ago
- Assignee changed from daviddavis to fao89
Updated by daviddavis over 5 years ago
Regarding the design in https://pulp.plan.io/issues/5087#note-11, we have a PUT /uploads/<uuid>/commit/ endpoint that dispatches a task that (among other things) creates an artifact. This artifact is set as a created_resource in the task.
The problem is that pulp-smash is not currently set up to handle such a case, as it expects an endpoint that creates a resource to use POST[0]. I lean towards keeping it PUT since the main action is to commit the upload and the artifact creation is a side effect.
Looking for feedback.
Updated by dkliban@redhat.com over 5 years ago
pulp-smash should not drive our design. However, I always associate PUT requests with specific resources. In this case the user is making a request on an action URL for the resource, so doing a POST to '/pulp/api/v3/uploads/<id>/commit/' seems most appropriate.
Updated by daviddavis over 5 years ago
- Status changed from ASSIGNED to POST
Added by Fabricio Aguiar over 5 years ago
Revision 95e51304 | View on GitHub
async artifact creation
closes #5087
Updated by Anonymous over 5 years ago
- Status changed from POST to MODIFIED
Applied in changeset pulpcore|95e513047cfc8a432a6faf9e1ebe868ff5a46091.
Updated by bmbouter almost 5 years ago
- Status changed from MODIFIED to CLOSED - CURRENTRELEASE
change UploadViewSet.commit to POST?
ref #5087