Issue #5087

Creating artifact in pulp3 fails for big files

Added by jcabrera 9 months ago. Updated 4 months ago.

2. Medium
Sprint 56


Used Version

pulp_source_dir: "git+"
app_label: "rpm"
source_dir: "git+"

Steps to reproduce

I created files of different sizes:

dd if=/dev/zero of=500m.bin bs=256M count=2
dd if=/dev/zero of=750m.bin bs=256M count=3
dd if=/dev/zero of=1g.bin bs=256M count=4
dd if=/dev/zero of=1.5g.bin bs=256M count=6
dd if=/dev/zero of=5.5g.bin bs=256M count=22

Using the script attached to this ticket, I ran:

./ 500m.bin # OK
./ 750m.bin  # OK
./ 1g.bin  # OK
./ 1.5g.bin # Fails with error

Creating artifact

http: error: Request timed out (30s).

After changing the script to use a longer timeout:

http --timeout=120 POST $PORT/pulp/api/v3/artifacts/ upload=$UPLOAD

I get the error:

Creating artifact

http: error: ConnectionError: ('Connection aborted.', BadStatusLine("''",)) while doing POST request to URL:

Trying the biggest file, 5.5g.bin, I get this error:

./ 5.5g.bin
Creating artifact
HTTP/1.1 500 Internal Server Error
Connection: close
Content-Length: 27
Content-Type: text/html
Date: Fri, 05 Jul 2019 09:59:10 GMT
Server: gunicorn/19.9.0
Vary: Cookie
X-Frame-Options: SAMEORIGIN

<h1>Server Error (500)</h1>

On the server, the uploaded files seem OK:

[root@dev-pulp-server upload]# pwd
[root@dev-pulp-server upload]# ls -lhs
total 9.5G
1.5G -rw-r--r--. 1 pulp pulp 1.5G Jul  5 11:44 3259c600-29ad-4629-a7f4-fa56add68b7d
5.5G -rw-r--r--. 1 pulp pulp 5.5G Jul  5 11:58 5bbe89e6-2f86-4738-a196-b3ed4c88d8de
1.0G -rw-r--r--. 1 pulp pulp 1.0G Jul  5 11:35 66d19833-0eea-4bfb-af8d-54bb6840d9cb
1.5G -rw-r--r--. 1 pulp pulp 1.5G Jul  5 11:38 90af4a0d-6f1a-4f14-9b47-67f7327fe067
[root@dev-pulp-server upload]# sha256sum 5bbe89e6-2f86-4738-a196-b3ed4c88d8de
4da89f41df88aa946bee824842471f89ac378b337dcf5cef2dafa53bb1e82cc6  5bbe89e6-2f86-4738-a196-b3ed4c88d8de

On the client:

[vagrant@dev-pulp-client scripts]$ sha256sum 5.5g.bin
4da89f41df88aa946bee824842471f89ac378b337dcf5cef2dafa53bb1e82cc6  5.5g.bin

Attached script (1017 Bytes), added by jcabrera, 07/05/2019 11:48 AM

Related issues

Related to Pulp - Issue #4998: Artifact size is limited to 2 GB (CLOSED - CURRENTRELEASE)

Associated revisions

Revision 28b80238
Added by Fabricio Aguiar 8 months ago

change UploadViewSet.commit to POST?

ref #5087

Revision 95e51304
Added by Fabricio Aguiar 8 months ago

async artifact creation

closes #5087


#1 Updated by daviddavis 9 months ago

  • Project changed from RPM Support to Pulp

#2 Updated by daviddavis 9 months ago

  • Subject changed from Creating artifact in pulp3 fails for big uploaded files in chunks to Creating artifact in pulp3 fails for big files

Thanks for the excellent bug report. It makes investigating these issues easy.

I looked into why artifact creation is failing for files < 2GB. The reason is that it's taking too long to calculate the checksums. There are 6 checksum types and each one takes about 4-8 seconds from the command line in my test environment. Calculating the digests in Python seems to add about 1-2 seconds. The default timeout in gunicorn is 30 seconds after which you get:

Jul 05 14:21:56 pulp3 gunicorn[13691]: [2019-07-05 14:21:56 +0000] [13691] [CRITICAL] WORKER TIMEOUT (pid:29843)
Jul 05 14:21:57 pulp3 gunicorn[13691]: [2019-07-05 14:21:57 +0000] [30031] [INFO] Booting worker with pid: 30031
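
The WORKER TIMEOUT in the log above corresponds to gunicorn's `timeout` setting. As a minimal sketch, assuming gunicorn is started with a Python config file (the file name and the value of 120 are illustrative, not taken from this issue), the limit could be raised like this:

```python
# gunicorn.conf.py (hypothetical file name and value, not taken from this issue)
# Workers that stay silent for longer than `timeout` seconds are killed and
# restarted; gunicorn's default is 30.
timeout = 120
```

This only moves the limit; a large enough file would still exceed it.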

You can either raise this timeout or pass in the checksums when creating the artifact[0]. I think the best solution, though, might be to make artifact creation a background task.

[0] http POST :24817/pulp/api/v3/artifacts/ upload=$UPLOAD sha256=abc...
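
To pass checksums in as in [0], the client has to compute them first. The following is a rough sketch of doing that in one read of the file, using Python's `hashlib`. The list of six algorithms is an assumption (the comment above does not name them), and `compute_digests` is a hypothetical helper, not part of Pulp.

```python
import hashlib

# Assumed list of the six checksum types; the comment above does not name them.
ALGORITHMS = ("md5", "sha1", "sha224", "sha256", "sha384", "sha512")


def compute_digests(path, algorithms=ALGORITHMS, block_size=1024 * 1024):
    """Read the file once and feed every block to all hashers."""
    hashers = {name: hashlib.new(name) for name in algorithms}
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for block in iter(lambda: handle.read(block_size), b""):
            for hasher in hashers.values():
                hasher.update(block)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}
```

For example, `compute_digests("1.5g.bin")["sha256"]` would give the value to send as the `sha256=` parameter.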

#3 Updated by bmbouter 9 months ago

+1 to moving this to a task. It's there to allow for long-running workloads like this one.

#4 Updated by daviddavis 9 months ago

  • Related to Issue #4998: Artifact size is limited to 2 GB added

#5 Updated by 9 months ago

We should calculate the checksums of each chunk and then simply add them up at the end. That way the final request can be performed quickly.
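
One caveat: for hashes such as sha256, the digest of the whole file cannot be derived by summing the digests of its chunks. One possible reading of the suggestion is to keep a single running hasher and update it as each chunk arrives. The sketch below assumes the chunks are processed in upload order. It does not cover how the hasher state would be kept between requests. `digest_from_chunks` is a hypothetical helper.

```python
import hashlib


def digest_from_chunks(chunks, algorithm="sha256"):
    """Feed chunks, in upload order, into a single running hasher."""
    hasher = hashlib.new(algorithm)
    for chunk in chunks:
        hasher.update(chunk)
    return hasher.hexdigest()


data = b"\0" * (3 * 1024)  # small stand-in for an uploaded file
chunks = [data[i:i + 1024] for i in range(0, len(data), 1024)]

# The running digest over the chunks matches the digest of the whole payload.
assert digest_from_chunks(chunks) == hashlib.sha256(data).hexdigest()
```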

#6 Updated by 9 months ago

  • Triaged changed from No to Yes
  • Sprint set to Sprint 55

#7 Updated by daviddavis 9 months ago

  • Status changed from NEW to ASSIGNED
  • Assignee set to daviddavis

#8 Updated by 9 months ago

  • Sprint changed from Sprint 55 to Sprint 56

#9 Updated by 9 months ago

The artifact creation API calculates the checksums of the upload as it is being received, so this call can stay synchronous. However, we should make the 'upload_commit' operation[0] asynchronous. The checksums calculated during that task should then be saved to the db so they can be used for creating an artifact from the upload.


#10 Updated by daviddavis 9 months ago

The upload commit action only calculates the sha256 checksum. We'd have to duplicate the logic that calculates checksums from artifact creation to upload commit. Why avoid having a background task for artifact creation?

#11 Updated by 9 months ago

@daviddavis and I discussed this some more on IRC and here is the plan we came up with:

Make the 'uploads_commit'[0] return a 202 and calculate the checksum of a file in a task. The created_resource of that task will be an Artifact.

Remove the ability of the user to submit an upload href when creating an Artifact with 'artifacts_create'[1].
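
From the client's side, this plan means the commit call returns a task to poll rather than the artifact itself. A minimal sketch of that polling loop follows. `get_json` is a hypothetical callable that fetches an href and returns the decoded JSON. The `state` and `created_resources` keys and the state names are assumptions about the task payload, not confirmed in this thread.

```python
import time


def wait_for_artifact(get_json, task_href, poll_interval=1.0, max_polls=60):
    """Poll a task until it finishes and return the created resource hrefs.

    `get_json` is a hypothetical callable that takes an href and returns the
    decoded JSON body as a dict. The "state" and "created_resources" keys and
    the state names are assumptions about the task payload.
    """
    for _ in range(max_polls):
        task = get_json(task_href)
        if task["state"] == "completed":
            return task["created_resources"]
        if task["state"] in ("failed", "canceled"):
            raise RuntimeError(f"task {task_href} ended as {task['state']}")
        time.sleep(poll_interval)
    raise TimeoutError(f"task {task_href} did not finish after {max_polls} polls")
```

Passing the HTTP call in as `get_json` keeps the loop independent of any particular HTTP client.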


#12 Updated by daviddavis 9 months ago

  • Assignee changed from daviddavis to fabricio.aguiar

#13 Updated by daviddavis 8 months ago

Regarding the design, we have a PUT /uploads/<uuid>/commit/ endpoint that dispatches a task that (among other things) creates an artifact. This artifact is set as a created_resource in the task.

The problem is that pulp-smash is not set up to handle such a case currently as it expects an endpoint that creates a resource to use POST[0]. I lean towards keeping it PUT since the main action is to commit the upload and the artifact creation is a side effect.

Looking for feedback.


#14 Updated by 8 months ago

pulp-smash should not drive our design. However, I always associate PUT requests with specific resources. In this case the user is making a request on an action URL for the resource, so doing a POST to '/pulp/api/v3/uploads/<id>/commit/' seems most appropriate.

#15 Updated by daviddavis 8 months ago

  • Status changed from ASSIGNED to POST

#16 Updated by Anonymous 8 months ago

  • Status changed from POST to MODIFIED

#17 Updated by bmbouter 4 months ago

  • Sprint/Milestone set to 3.0.0

#18 Updated by bmbouter 4 months ago

  • Status changed from MODIFIED to CLOSED - CURRENTRELEASE
