Issue #5087

closed

Creating artifact in pulp3 fails for big files

Added by jcabrera almost 5 years ago. Updated over 4 years ago.

Status: CLOSED - CURRENTRELEASE
Priority: Normal
Assignee:
Category:
-
Sprint/Milestone:
Start date:
Due date:
Estimated time:
Severity: 2. Medium
Version:
Platform Release:
OS:
Triaged: Yes
Groomed: No
Sprint Candidate: No
Tags:
Sprint: Sprint 56
Quarter:

Description

Used Version

pulp_source_dir: "git+https://github.com/pulp/pulpcore.git@3.0.0rc3"
pulp_plugin_source_dir: "git+https://github.com/pulp/pulpcore-plugin.git@0.1.0rc3"
pulp_install_plugins:
  pulp-rpm:
    app_label: "rpm"
    source_dir: "git+https://github.com/pulp/pulp_rpm.git@3.0.0b4"

Steps to reproduce

I created files of different sizes:

dd if=/dev/zero of=500m.bin bs=256M count=2
dd if=/dev/zero of=750m.bin bs=256M count=3
dd if=/dev/zero of=1g.bin bs=256M count=4
dd if=/dev/zero of=1.5g.bin bs=256M count=6
dd if=/dev/zero of=5.5g.bin bs=256M count=22

Using the script test-chunk.sh attached to this ticket, I ran:

./test-chunk.sh 500m.bin # OK
./test-chunk.sh 750m.bin  # OK
./test-chunk.sh 1g.bin  # OK
./test-chunk.sh 1.5g.bin # Fails with error

Creating artifact

http: error: Request timed out (30s).

After changing the script to use a larger timeout:

http --timeout=120 POST $PORT/pulp/api/v3/artifacts/ upload=$UPLOAD

I get the error:

Creating artifact

http: error: ConnectionError: ('Connection aborted.', BadStatusLine("''",)) while doing POST request to URL: http://dev-pulp-server.ptci.dev:24817/pulp/api/v3/artifacts/

Trying the biggest file, 5.5g.bin, I get the error:

./test-chunk.sh 5.5g.bin
...
...
Creating artifact
HTTP/1.1 500 Internal Server Error
Connection: close
Content-Length: 27
Content-Type: text/html
Date: Fri, 05 Jul 2019 09:59:10 GMT
Server: gunicorn/19.9.0
Vary: Cookie
X-Frame-Options: SAMEORIGIN

<h1>Server Error (500)</h1>

On the server, the uploaded files seem OK:

[root@dev-pulp-server upload]# pwd
/var/lib/pulp/upload
[root@dev-pulp-server upload]# ls -lhs
total 9.5G
1.5G -rw-r--r--. 1 pulp pulp 1.5G Jul  5 11:44 3259c600-29ad-4629-a7f4-fa56add68b7d
5.5G -rw-r--r--. 1 pulp pulp 5.5G Jul  5 11:58 5bbe89e6-2f86-4738-a196-b3ed4c88d8de
1.0G -rw-r--r--. 1 pulp pulp 1.0G Jul  5 11:35 66d19833-0eea-4bfb-af8d-54bb6840d9cb
1.5G -rw-r--r--. 1 pulp pulp 1.5G Jul  5 11:38 90af4a0d-6f1a-4f14-9b47-67f7327fe067
[root@dev-pulp-server upload]# sha256sum 5bbe89e6-2f86-4738-a196-b3ed4c88d8de
4da89f41df88aa946bee824842471f89ac378b337dcf5cef2dafa53bb1e82cc6  5bbe89e6-2f86-4738-a196-b3ed4c88d8de

On the client:

[vagrant@dev-pulp-client scripts]$ sha256sum 5.5g.bin
4da89f41df88aa946bee824842471f89ac378b337dcf5cef2dafa53bb1e82cc6  5.5g.bin

Files

test-chunk.sh (1017 Bytes), jcabrera, 07/05/2019 11:48 AM

Related issues

Related to Pulp - Issue #4998: Artifact size is limited to 2 GB (CLOSED - CURRENTRELEASE, daviddavis)
Actions #1

Updated by daviddavis almost 5 years ago

  • Project changed from RPM Support to Pulp
Actions #2

Updated by daviddavis almost 5 years ago

  • Subject changed from Creating artifact in pulp3 fails for big uploaded files in chunks to Creating artifact in pulp3 fails for big files

Thanks for the excellent bug report. It makes investigating these issues easy.

I looked into why artifact creation is failing for files < 2GB. The reason is that it's taking too long to calculate the checksums. There are 6 checksum types and each one takes about 4-8 seconds from the command line in my test environment. Calculating the digests in Python seems to add about 1-2 seconds. The default timeout in gunicorn is 30 seconds after which you get:

Jul 05 14:21:56 pulp3 gunicorn[13691]: [2019-07-05 14:21:56 +0000] [13691] [CRITICAL] WORKER TIMEOUT (pid:29843)
Jul 05 14:21:57 pulp3 gunicorn[13691]: [2019-07-05 14:21:57 +0000] [30031] [INFO] Booting worker with pid: 30031

You can raise this timeout, or you can pass in the checksums when creating the artifact[0]. I think the best solution, though, might be to make artifact creation a background task.

[0] http POST :24817/pulp/api/v3/artifacts/ upload=$UPLOAD sha256=abc...
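
As a rough illustration of the second workaround, a client could compute the digests itself before calling the API. The sketch below reads the file once in fixed-size chunks and updates every hash object on each pass. The list of algorithm names is an assumption about which six checksum types are meant, and compute_digests is a hypothetical helper, not part of Pulp.

```python
import hashlib

# Assumed set of six checksum types; adjust to whatever the server expects.
ALGORITHMS = ("md5", "sha1", "sha224", "sha256", "sha384", "sha512")


def compute_digests(path, algorithms=ALGORITHMS, chunk_size=1024 * 1024):
    """Return {algorithm: hexdigest} for the file at `path`, reading it once."""
    hashers = {name: hashlib.new(name) for name in algorithms}
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            # Feed the same chunk to every hash object so the file is read only once.
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}
```

The sha256 value from the returned dict could then be passed as the sha256 field in the request shown in [0].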

Actions #3

Updated by bmbouter almost 5 years ago

+1 to moving this to a task. It's there to allow for long-running workloads like this one.

Actions #4

Updated by daviddavis almost 5 years ago

  • Related to Issue #4998: Artifact size is limited to 2 GB added
Actions #5

Updated by dkliban@redhat.com almost 5 years ago

We should calculate the checksums of each chunk and then simply add them up at the end. That way the final request can be performed quickly.
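
One way to read this suggestion: digests cannot literally be added together, but a hash object can be updated chunk by chunk as the upload arrives, so that only a cheap finalization step remains at the end. Below is a minimal sketch using Python's hashlib, assuming the chunks arrive in file order. ChunkedDigest is a hypothetical name, not Pulp code.

```python
import hashlib


class ChunkedDigest:
    """Accumulate a digest across sequential upload chunks."""

    def __init__(self, algorithm="sha256"):
        self._hasher = hashlib.new(algorithm)
        self.size = 0

    def add_chunk(self, chunk: bytes) -> None:
        # Chunks must be fed in file order; hashing is order-sensitive.
        self._hasher.update(chunk)
        self.size += len(chunk)

    def hexdigest(self) -> str:
        return self._hasher.hexdigest()


data = b"\0" * 1000
digest = ChunkedDigest()
for start in range(0, len(data), 256):
    digest.add_chunk(data[start:start + 256])

# The incremental result matches hashing the whole payload at once.
assert digest.hexdigest() == hashlib.sha256(data).hexdigest()
```

Since each chunk arrives in a separate HTTP request, a server would also need some way to keep or rebuild this state between requests. The sketch does not address that.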

Actions #6

Updated by amacdona@redhat.com almost 5 years ago

  • Triaged changed from No to Yes
  • Sprint set to Sprint 55
Actions #7

Updated by daviddavis almost 5 years ago

  • Status changed from NEW to ASSIGNED
  • Assignee set to daviddavis
Actions #8

Updated by dkliban@redhat.com almost 5 years ago

  • Sprint changed from Sprint 55 to Sprint 56
Actions #9

Updated by dkliban@redhat.com almost 5 years ago

The artifact creation API calculates the checksums of the upload as it is being received, so this call can stay synchronous. However, we should make the 'uploads_commit' operation[0] asynchronous. The checksums calculated during that task should then be saved to the db so they can be used for creating an artifact from the upload.

[0] https://docs.pulpproject.org/en/3.0/nightly/restapi.html#operation/uploads_commit

Actions #10

Updated by daviddavis almost 5 years ago

The upload commit action only calculates the sha256 checksum. We'd have to duplicate the logic that calculates checksums from artifact creation to upload commit. Why avoid having a background task for artifact creation?

Actions #11

Updated by dkliban@redhat.com almost 5 years ago

@daviddavis and I discussed this some more on IRC and here is the plan we came up with:

Make the 'uploads_commit'[0] return a 202 and calculate the checksum of a file in a task. The created_resource of that task will be an Artifact.

Remove the ability of the user to submit an upload href when creating an Artifact with 'artifacts_create'[1].

[0] https://docs.pulpproject.org/en/3.0/nightly/restapi.html#operation/uploads_commit
[1] https://docs.pulpproject.org/en/3.0/nightly/restapi.html#operation/artifacts_create
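
From a client's point of view, the plan above might look like the following sketch: the commit call returns a 202 with a reference to a task, and the client polls that task until it finishes, then reads the Artifact from the task's created resources. The fetch_task callable, the "state" and "created_resources" field names, and the state values are assumptions made for illustration and are not taken from this ticket.

```python
import time


def wait_for_task(fetch_task, task_href, poll_interval=1.0, timeout=600.0):
    """Poll a task until it reaches a final state and return its created resources.

    `fetch_task` is a hypothetical callable that takes a task href and returns
    the task as a dict, e.g. a thin wrapper around an HTTP GET.
    """
    deadline = time.monotonic() + timeout
    while True:
        task = fetch_task(task_href)
        state = task["state"]  # assumed field name
        if state == "completed":  # assumed state value
            return task.get("created_resources", [])
        if state in ("failed", "canceled"):  # assumed state values
            raise RuntimeError(f"task {task_href} ended in state {state!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_href} did not finish in {timeout}s")
        time.sleep(poll_interval)
```

Passing the fetch function in as a parameter keeps the polling logic independent of any particular HTTP client.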

Actions #12

Updated by daviddavis almost 5 years ago

  • Assignee changed from daviddavis to fao89
Actions #13

Updated by daviddavis almost 5 years ago

Regarding the design in https://pulp.plan.io/issues/5087#note-11, we have a PUT /uploads/<uuid>/commit/ endpoint that dispatches a task that (among other things) creates an artifact. This artifact is set as a created_resource in the task.

The problem is that pulp-smash is not set up to handle such a case currently as it expects an endpoint that creates a resource to use POST[0]. I lean towards keeping it PUT since the main action is to commit the upload and the artifact creation is a side effect.

Looking for feedback.

[0] https://git.io/fjMjP

Actions #14

Updated by dkliban@redhat.com almost 5 years ago

pulp-smash should not drive our design. However, I always associate PUT requests with specific resources. In this case the user is making a request on an action URL for the resource. So doing a POST to '/pulp/api/v3/uploads/<id>/commit/' seems most appropriate.

Actions #15

Updated by daviddavis almost 5 years ago

  • Status changed from ASSIGNED to POST

Added by Fabricio Aguiar almost 5 years ago

Revision 28b80238 | View on GitHub

change UploadViewSet.commit to POST?

ref #5087

Added by Fabricio Aguiar almost 5 years ago

Revision 95e51304 | View on GitHub

async artifact creation

closes #5087

Actions #16

Updated by Anonymous almost 5 years ago

  • Status changed from POST to MODIFIED
Actions #17

Updated by bmbouter over 4 years ago

  • Sprint/Milestone set to 3.0.0
Actions #18

Updated by bmbouter over 4 years ago

  • Status changed from MODIFIED to CLOSED - CURRENTRELEASE
