Creating artifact in pulp3 fails for big files
Steps to reproduce
I created files of different sizes:
dd if=/dev/zero of=500m.bin bs=256M count=2
dd if=/dev/zero of=750m.bin bs=256M count=3
dd if=/dev/zero of=1g.bin bs=256M count=4
dd if=/dev/zero of=1.5g.bin bs=256M count=6
dd if=/dev/zero of=5.5g.bin bs=256M count=22
Using the script test-chunk.sh attached to this ticket, I run:
./test-chunk.sh 500m.bin # OK
./test-chunk.sh 750m.bin # OK
./test-chunk.sh 1g.bin # OK
./test-chunk.sh 1.5g.bin # Fails with error
Creating artifact
http: error: Request timed out (30s).
After changing the script to use a longer timeout:
http --timeout=120 POST $PORT/pulp/api/v3/artifacts/ upload=$UPLOAD
I get the error:
Creating artifact
http: error: ConnectionError: ('Connection aborted.', BadStatusLine("''",)) while doing POST request to URL: http://dev-pulp-server.ptci.dev:24817/pulp/api/v3/artifacts/
Trying the biggest file, 5.5g.bin, I get the error:
./test-chunk.sh 5.5g.bin
...
...
Creating artifact
HTTP/1.1 500 Internal Server Error
Connection: close
Content-Length: 27
Content-Type: text/html
Date: Fri, 05 Jul 2019 09:59:10 GMT
Server: gunicorn/19.9.0
Vary: Cookie
X-Frame-Options: SAMEORIGIN

<h1>Server Error (500)</h1>
On the server, the uploaded files seem OK:
[root@dev-pulp-server upload]# pwd
/var/lib/pulp/upload
[root@dev-pulp-server upload]# ls -lhs
total 9.5G
1.5G -rw-r--r--. 1 pulp pulp 1.5G Jul 5 11:44 3259c600-29ad-4629-a7f4-fa56add68b7d
5.5G -rw-r--r--. 1 pulp pulp 5.5G Jul 5 11:58 5bbe89e6-2f86-4738-a196-b3ed4c88d8de
1.0G -rw-r--r--. 1 pulp pulp 1.0G Jul 5 11:35 66d19833-0eea-4bfb-af8d-54bb6840d9cb
1.5G -rw-r--r--. 1 pulp pulp 1.5G Jul 5 11:38 90af4a0d-6f1a-4f14-9b47-67f7327fe067
[root@dev-pulp-server upload]# sha256sum 5bbe89e6-2f86-4738-a196-b3ed4c88d8de
4da89f41df88aa946bee824842471f89ac378b337dcf5cef2dafa53bb1e82cc6 5bbe89e6-2f86-4738-a196-b3ed4c88d8de
On the client:
[vagrant@dev-pulp-client scripts]$ sha256sum 5.5g.bin
4da89f41df88aa946bee824842471f89ac378b337dcf5cef2dafa53bb1e82cc6 5.5g.bin
#2 Updated by daviddavis 9 months ago
- Subject changed from Creating artifact in pulp3 fails for big uploaded files in chunks to Creating artifact in pulp3 fails for big files
Thanks for the excellent bug report. It makes investigating these issues easy.
I looked into why artifact creation is failing for files < 2GB. The reason is that it's taking too long to calculate the checksums. There are 6 checksum types and each one takes about 4-8 seconds from the command line in my test environment. Calculating the digests in Python seems to add about 1-2 seconds. The default timeout in gunicorn is 30 seconds after which you get:
Jul 05 14:21:56 pulp3 gunicorn: [2019-07-05 14:21:56 +0000]  [CRITICAL] WORKER TIMEOUT (pid:29843)
Jul 05 14:21:57 pulp3 gunicorn: [2019-07-05 14:21:57 +0000]  [INFO] Booting worker with pid: 30031
You can raise this timeout, or you can pass in the checksums when creating the artifact. I think the best solution, though, might be to make artifact creation a background task.
 http POST :24817/pulp/api/v3/artifacts/ upload=$UPLOAD sha256=abc...
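A minimal sketch of computing the digests on the client so they can be passed along with the request as above. It reads the file once in chunks and updates every hash object per chunk, so memory use stays flat for multi-gigabyte files. The function name, the chunk size, and the list of six algorithm names are assumptions for illustration; the comment above only says there are six checksum types.

```python
import hashlib

# Assumed set of six digest types; the thread does not name them.
ALGORITHMS = ("md5", "sha1", "sha224", "sha256", "sha384", "sha512")


def compute_digests(path, algorithms=ALGORITHMS, chunk_size=1024 * 1024):
    """Return {algorithm: hexdigest} for the file at path, reading it once."""
    hashers = {name: hashlib.new(name) for name in algorithms}
    with open(path, "rb") as f:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}
```

For example, compute_digests("1.5g.bin")["sha256"] would give the value to send as sha256=... in the request.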
#9 Updated by email@example.com 9 months ago
The artifact creation API calculates the checksums of the upload as it is being received, so this call can stay synchronous. However, we should make the 'upload_commit' operation asynchronous. The checksums calculated during that task should then be saved to the db so they can be used for creating an artifact from the upload.
#11 Updated by firstname.lastname@example.org 9 months ago
@daviddavis and I discussed this some more on IRC and here is the plan we came up with:
Make the 'uploads_commit' return a 202 and calculate the checksum of a file in a task. The created_resource of that task will be an Artifact.
Remove the ability of the user to submit an upload href when creating an Artifact with 'artifacts_create'.
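From the client's point of view, the plan above means the commit call returns immediately and the caller polls the task until the Artifact shows up among its created resources. A rough sketch of that polling loop follows. The JSON field names ("state", "created_resources"), the state values, and the function name are assumptions for illustration, not taken from this thread; get_json is a hypothetical callable that fetches an href and returns the decoded JSON body.

```python
import time


def wait_for_created_resources(get_json, task_href, poll_interval=1.0, max_polls=600):
    """Poll a task until it finishes and return its created resources.

    get_json: callable taking an href and returning a dict (hypothetical helper).
    The "state" and "created_resources" keys are assumed field names.
    """
    for _ in range(max_polls):
        task = get_json(task_href)
        state = task.get("state")
        if state == "completed":
            return task.get("created_resources", [])
        if state in ("failed", "canceled"):
            raise RuntimeError(f"task {task_href} ended in state {state!r}")
        # Task is still waiting or running; back off before asking again.
        time.sleep(poll_interval)
    raise TimeoutError(f"task {task_href} did not finish after {max_polls} polls")
```

Passing the fetch function in keeps the loop independent of any particular HTTP client and makes it easy to exercise with canned responses.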
#13 Updated by daviddavis 8 months ago
Regarding the design in https://pulp.plan.io/issues/5087#note-11, we have a PUT /uploads/<uuid>/commit/ endpoint that dispatches a task that (among other things) creates an artifact. This artifact is set as a created_resource in the task.
The problem is that pulp-smash is not currently set up to handle such a case, as it expects an endpoint that creates a resource to use POST. I lean towards keeping it PUT since the main action is to commit the upload and the artifact creation is a side effect.
Looking for feedback.
#14 Updated by email@example.com 8 months ago
pulp-smash should not drive our design. However, I always associate PUT requests with specific resources. In this case the user is making a request on an action URL for the resource, so doing a POST to '/pulp/api/v3/uploads/<id>/commit/' seems most appropriate.