Story #4456

closed

As a user I can use S3 as alternative storage

Added by daviddavis about 5 years ago. Updated almost 4 years ago.

Status: CLOSED - CURRENTRELEASE
Priority: Normal
Assignee:
Sprint/Milestone:
Start date:
Due date:
% Done: 100%
Estimated time:
Platform Release:
Groomed: No
Sprint Candidate: No
Tags:
Sprint: Sprint 73
Quarter:

Description

Make sure that the pulp docker plugin works when using Pulp with S3. I see no reason why it shouldn't but it's worth testing to confirm.


Related issues

Related to Pulp - Story #3900: As a user, I can use Pulp3 on S3 (CLOSED - CURRENTRELEASE, daviddavis)
Blocks Container Support - Task #6733: Enable S3 tests on CI (CLOSED - CURRENTRELEASE, mdellweg)

Actions #1

Updated by daviddavis about 5 years ago

  • Related to Story #3900: As a user, I can use Pulp3 on S3 added
Actions #2

Updated by bmbouter almost 5 years ago

  • Tags deleted (Pulp 3)
Actions #3

Updated by ipanova@redhat.com over 4 years ago

  • Tags Pulp 3 docker blocker added
Actions #4

Updated by fao89 over 4 years ago

  • Status changed from NEW to ASSIGNED
  • Assignee set to fao89
Actions #5

Updated by ipanova@redhat.com over 4 years ago

  • Sprint set to Sprint 60
Actions #6

Updated by fao89 over 4 years ago

I followed the docs until I reached this step: https://pulp-docker.readthedocs.io/en/latest/workflows/host.html#pull-and-run-an-image-from-pulp
for docker:

(pulp) [vagrant@pulp3-source-fedora30 pulp_docker]$ sudo docker pull localhost:24816/test
Using default tag: latest
Trying to pull repository localhost:24816/test ... 
error parsing HTTP 404 response body: invalid character ':' after top-level value: "404: Not Found"

On the server:

Oct 15 19:37:12 pulp3-source-fedora30.localhost.example.com gunicorn[12571]: [2019-10-15 19:37:12 +0000] [12574] [ERROR] Error handling request
Oct 15 19:37:12 pulp3-source-fedora30.localhost.example.com gunicorn[12571]: Traceback (most recent call last):
Oct 15 19:37:12 pulp3-source-fedora30.localhost.example.com gunicorn[12571]:   File "/usr/local/lib/pulp/lib64/python3.7/site-packages/aiohttp/web_protocol.py", line 275, in data_received
Oct 15 19:37:12 pulp3-source-fedora30.localhost.example.com gunicorn[12571]:     messages, upgraded, tail = self._request_parser.feed_data(data)
Oct 15 19:37:12 pulp3-source-fedora30.localhost.example.com gunicorn[12571]:   File "aiohttp/_http_parser.pyx", line 523, in aiohttp._http_parser.HttpParser.feed_data
Oct 15 19:37:12 pulp3-source-fedora30.localhost.example.com gunicorn[12571]: aiohttp.http_exceptions.BadStatusLine: invalid HTTP method
Oct 15 19:37:12 pulp3-source-fedora30.localhost.example.com gunicorn[12571]: 127.0.0.1 [15/Oct/2019:19:37:12 +0000] "GET /v2/ HTTP/1.1" 200 224 "-" "docker/1.13.1 go/go1.12.7 kernel/5.0.9-301.fc30.x86_64 os/linux arch/amd64 UpstreamClient(Docker-Client/1.13.1 \(linux\))"
Oct 15 19:37:12 pulp3-source-fedora30.localhost.example.com gunicorn[12571]: [2019-10-15 19:37:12 +0000] [12574] [ERROR] Error handling request
Oct 15 19:37:12 pulp3-source-fedora30.localhost.example.com gunicorn[12571]: Traceback (most recent call last):
Oct 15 19:37:12 pulp3-source-fedora30.localhost.example.com gunicorn[12571]:   File "/usr/local/lib/pulp/lib64/python3.7/site-packages/aiohttp/web_protocol.py", line 275, in data_received
Oct 15 19:37:12 pulp3-source-fedora30.localhost.example.com gunicorn[12571]:     messages, upgraded, tail = self._request_parser.feed_data(data)
Oct 15 19:37:12 pulp3-source-fedora30.localhost.example.com gunicorn[12571]:   File "aiohttp/_http_parser.pyx", line 523, in aiohttp._http_parser.HttpParser.feed_data
Oct 15 19:37:12 pulp3-source-fedora30.localhost.example.com gunicorn[12571]: aiohttp.http_exceptions.BadStatusLine: invalid HTTP method
Oct 15 19:37:12 pulp3-source-fedora30.localhost.example.com gunicorn[12571]: 127.0.0.1 [15/Oct/2019:19:37:12 +0000] "GET /v2/ HTTP/1.1" 200 224 "-" "docker/1.13.1 go/go1.12.7 kernel/5.0.9-301.fc30.x86_64 os/linux arch/amd64 UpstreamClient(Docker-Client/1.13.1 \(linux\))"
Oct 15 19:37:12 pulp3-source-fedora30.localhost.example.com gunicorn[12571]: 127.0.0.1 [15/Oct/2019:19:37:12 +0000] "GET /v2/test/manifests/latest HTTP/1.1" 404 191 "-" "docker/1.13.1 go/go1.12.7 kernel/5.0.9-301.fc30.x86_64 os/linux arch/amd64 UpstreamClient(Docker-Client/1.13.1 \(linux\))"

for podman:

(pulp) [vagrant@pulp3-source-fedora30 pulp_docker]$ podman pull localhost:24816/test
Trying to pull localhost:24816/test...
  error parsing HTTP 404 response body: invalid character ':' after top-level value: "404: Not Found"
Error: error pulling image "localhost:24816/test": unable to pull localhost:24816/test: unable to pull image: Error initializing source docker://localhost:24816/test:latest: Error reading manifest latest in localhost:24816/test: error parsing HTTP 404 response body: invalid character ':' after top-level value: "404: Not Found"

On the server:

Oct 15 19:36:37 pulp3-source-fedora30.localhost.example.com gunicorn[12571]: [2019-10-15 19:36:37 +0000] [12575] [ERROR] Error handling request
Oct 15 19:36:37 pulp3-source-fedora30.localhost.example.com gunicorn[12571]: Traceback (most recent call last):
Oct 15 19:36:37 pulp3-source-fedora30.localhost.example.com gunicorn[12571]:   File "/usr/local/lib/pulp/lib64/python3.7/site-packages/aiohttp/web_protocol.py", line 275, in data_received
Oct 15 19:36:37 pulp3-source-fedora30.localhost.example.com gunicorn[12571]:     messages, upgraded, tail = self._request_parser.feed_data(data)
Oct 15 19:36:37 pulp3-source-fedora30.localhost.example.com gunicorn[12571]:   File "aiohttp/_http_parser.pyx", line 523, in aiohttp._http_parser.HttpParser.feed_data
Oct 15 19:36:37 pulp3-source-fedora30.localhost.example.com gunicorn[12571]: aiohttp.http_exceptions.BadStatusLine: invalid HTTP method
Oct 15 19:36:37 pulp3-source-fedora30.localhost.example.com gunicorn[12571]: 127.0.0.1 [15/Oct/2019:19:36:37 +0000] "GET /v2/ HTTP/1.1" 200 224 "-" "libpod/1.6.1"
Oct 15 19:36:37 pulp3-source-fedora30.localhost.example.com gunicorn[12571]: 127.0.0.1 [15/Oct/2019:19:36:37 +0000] "GET /v2/test/manifests/latest HTTP/1.1" 404 191 "-" "libpod/1.6.1"
Actions #7

Updated by fao89 over 4 years ago

I destroyed the Vagrant box and started again, but I am still having problems:

(pulp) [vagrant@pulp3-source-fedora30 pulp_docker]$ podman pull localhost:24816/test
Trying to pull localhost:24816/test...
  Get https://localhost:24816/v2/: http: server gave HTTP response to HTTPS client
Error: error pulling image "localhost:24816/test": unable to pull localhost:24816/test: unable to pull image: Error initializing source docker://localhost:24816/test:latest: pinging docker registry returned: Get https://localhost:24816/v2/: http: server gave HTTP response to HTTPS client

Following the docs, I edited /etc/containers/registries.conf, and then:

(pulp) [vagrant@pulp3-source-fedora30 pulp_docker]$ podman pull localhost:24816/test
Trying to pull localhost:24816/test...
  error parsing HTTP 404 response body: invalid character ':' after top-level value: "404: Not Found"
Error: error pulling image "localhost:24816/test": unable to pull localhost:24816/test: unable to pull image: Error initializing source docker://localhost:24816/test:latest: Error reading manifest latest in localhost:24816/test: error parsing HTTP 404 response body: invalid character ':' after top-level value: "404: Not Found"
Actions #8

Updated by fao89 over 4 years ago

PS: I've been pinning the host when self.context['request'] is None:

from django.conf import settings
from rest_framework import serializers


class RegistryPathField(serializers.CharField):
    """
    Serializer Field for the registry_path field of the DockerDistribution.
    """

    def to_representation(self, value):
        """
        Converts a base_path into a registry path.
        """
        if settings.CONTENT_HOST:
            host = settings.CONTENT_HOST
        else:
            try:
                host = self.context['request'].get_host()
            except (KeyError, AttributeError):
                # No usable request in the serializer context: pin the host.
                host = "http://localhost:24817"
        return ''.join([host, '/', value])
Actions #9

Updated by fao89 over 4 years ago

I moved on to the following script from the docs:

#!/usr/bin/env bash

DOCKER_TAG='manifest_a'

echo "Setting REGISTRY_PATH, which can be used directly with the Docker Client."
export REGISTRY_PATH=$(http $BASE_ADDR$DISTRIBUTION_HREF | jq -r '.registry_path')

echo "Next we pull the image from pulp and run it."
echo "$REGISTRY_PATH:$DOCKER_TAG"
sudo docker run $REGISTRY_PATH:$DOCKER_TAG

but I ran it with podman:

(pulp) [vagrant@pulp3-source-fedora30 pulp_docker]$ echo "$REGISTRY_PATH:$DOCKER_TAG"
localhost:24817/test:manifest_a
(pulp) [vagrant@pulp3-source-fedora30 pulp_docker]$ podman run $REGISTRY_PATH:$DOCKER_TAG
Trying to pull localhost:24817/test:manifest_a...
  Get https://localhost:24817/v2/: net/http: TLS handshake timeout
Error: unable to pull localhost:24817/test:manifest_a: unable to pull image: Error initializing source docker://localhost:24817/test:manifest_a: pinging docker registry returned: Get https://localhost:24817/v2/: net/http: TLS handshake timeout

changed the port:

(pulp) [vagrant@pulp3-source-fedora30 pulp_docker]$ podman run localhost:24816/test:manifest_a
Trying to pull localhost:24816/test:manifest_a...
  received unexpected HTTP status: 500 Internal Server Error
Error: unable to pull localhost:24816/test:manifest_a: unable to pull image: Error initializing source docker://localhost:24816/test:manifest_a: Error reading manifest manifest_a in localhost:24816/test: received unexpected HTTP status: 500 Internal Server Error

On the server:

Oct 15 21:15:12 pulp3-source-fedora30.localhost.example.com gunicorn[1870]: [2019-10-15 21:15:12 +0000] [1883] [ERROR] Error handling request
Oct 15 21:15:12 pulp3-source-fedora30.localhost.example.com gunicorn[1870]: Traceback (most recent call last):
Oct 15 21:15:12 pulp3-source-fedora30.localhost.example.com gunicorn[1870]:   File "/usr/local/lib/pulp/lib64/python3.7/site-packages/aiohttp/web_protocol.py", line 275, in data_received
Oct 15 21:15:12 pulp3-source-fedora30.localhost.example.com gunicorn[1870]:     messages, upgraded, tail = self._request_parser.feed_data(data)
Oct 15 21:15:12 pulp3-source-fedora30.localhost.example.com gunicorn[1870]:   File "aiohttp/_http_parser.pyx", line 523, in aiohttp._http_parser.HttpParser.feed_data
Oct 15 21:15:12 pulp3-source-fedora30.localhost.example.com gunicorn[1870]: aiohttp.http_exceptions.BadStatusLine: invalid HTTP method
Oct 15 21:15:12 pulp3-source-fedora30.localhost.example.com gunicorn[1870]: 127.0.0.1 [15/Oct/2019:21:15:12 +0000] "GET /v2/ HTTP/1.1" 200 224 "-" "libpod/1.6.1"
Oct 15 21:15:12 pulp3-source-fedora30.localhost.example.com gunicorn[1870]: [2019-10-15 21:15:12 +0000] [1883] [ERROR] Error handling request
Oct 15 21:15:12 pulp3-source-fedora30.localhost.example.com gunicorn[1870]: Traceback (most recent call last):
Oct 15 21:15:12 pulp3-source-fedora30.localhost.example.com gunicorn[1870]:   File "/usr/local/lib/pulp/lib64/python3.7/site-packages/aiohttp/web_protocol.py", line 418, in start
Oct 15 21:15:12 pulp3-source-fedora30.localhost.example.com gunicorn[1870]:     resp = await task
Oct 15 21:15:12 pulp3-source-fedora30.localhost.example.com gunicorn[1870]:   File "/usr/local/lib/pulp/lib64/python3.7/site-packages/aiohttp/web_app.py", line 458, in _handle
Oct 15 21:15:12 pulp3-source-fedora30.localhost.example.com gunicorn[1870]:     resp = await handler(request)
Oct 15 21:15:12 pulp3-source-fedora30.localhost.example.com gunicorn[1870]:   File "/home/vagrant/devel/pulp_docker/pulp_docker/app/registry.py", line 168, in get_tag
Oct 15 21:15:12 pulp3-source-fedora30.localhost.example.com gunicorn[1870]:     return await Registry.dispatch_tag(tag, response_headers)
Oct 15 21:15:12 pulp3-source-fedora30.localhost.example.com gunicorn[1870]:   File "/home/vagrant/devel/pulp_docker/pulp_docker/app/registry.py", line 191, in dispatch_tag
Oct 15 21:15:12 pulp3-source-fedora30.localhost.example.com gunicorn[1870]:     response_headers)
Oct 15 21:15:12 pulp3-source-fedora30.localhost.example.com gunicorn[1870]:   File "/home/vagrant/devel/pulp_docker/pulp_docker/app/registry.py", line 89, in _dispatch
Oct 15 21:15:12 pulp3-source-fedora30.localhost.example.com gunicorn[1870]:     full_headers['Content-Length'] = os.path.getsize(path)
Oct 15 21:15:12 pulp3-source-fedora30.localhost.example.com gunicorn[1870]:   File "/usr/lib64/python3.7/genericpath.py", line 50, in getsize
Oct 15 21:15:12 pulp3-source-fedora30.localhost.example.com gunicorn[1870]:     return os.stat(filename).st_size
Oct 15 21:15:12 pulp3-source-fedora30.localhost.example.com gunicorn[1870]: FileNotFoundError: [Errno 2] No such file or directory: 'artifact/21/e3caae28758329318c8a868a80daa37ad8851705155fc28767852c73d36af5'
Oct 15 21:15:12 pulp3-source-fedora30.localhost.example.com gunicorn[1870]: 127.0.0.1 [15/Oct/2019:21:15:12 +0000] "GET /v2/test/manifests/manifest_a HTTP/1.1" 500 244 "-" "libpod/1.6.1"

But the artifact is there:
https://test-pulp3.s3.us-east-2.amazonaws.com/artifact/21/e3caae28758329318c8a868a80daa37ad8851705155fc28767852c73d36af5

I believe it is related to MEDIA_ROOT, which I set to MEDIA_ROOT = '' as described in the docs: https://docs.pulpproject.org/en/3.0/nightly/installation/storage.html#configuring-pulp
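
To illustrate why the traceback ends in FileNotFoundError: os.path.getsize treats the artifact's storage key as a path on the local disk, which cannot work once the bytes live in a bucket. The sketch below is not the pulp_docker code. RemoteFile is a hypothetical stand-in for a Django file object, and the key and size are illustrative values. It only contrasts a local-path lookup with asking the file object for its size.

```python
import os


class RemoteFile:
    """Hypothetical stand-in for a Django file object backed by remote storage.

    Only the two attributes used below are modelled: `name` (the storage key)
    and `size` (which a real storage backend would look up in the bucket).
    """

    def __init__(self, name, size):
        self.name = name
        self.size = size


def content_length_local(file_obj):
    # Mirrors the failing call in the traceback: the storage key is treated
    # as a local filesystem path.
    return os.path.getsize(file_obj.name)


def content_length_any_storage(file_obj):
    # Asks the file object itself, so the storage backend supplies the answer.
    return file_obj.size


manifest = RemoteFile("artifact/21/example-digest", 524)  # illustrative values
print(content_length_any_storage(manifest))
try:
    content_length_local(manifest)
except FileNotFoundError as exc:
    print("local lookup failed:", exc)
```

With a real Django file field, the size attribute delegates to the configured storage backend, so the same call should work for both local and S3 storage.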

Actions #10

Updated by fao89 over 4 years ago

with these changes:
https://github.com/pulp/pulp_docker/pull/433

(pulp) [vagrant@pulp3-source-fedora30 pulp_docker]$ http GET $CONTENT_ADDR/v2/test/manifests/manifest_a "Accept:application/vnd.docker.distribution.manifest.v2+json"
HTTP/1.1 302 Found
Content-Disposition: attachment; filename=e3caae28758329318c8a868a80daa37ad8851705155fc28767852c73d36af5
Content-Length: 524
Content-Type: application/vnd.docker.distribution.manifest.v2+json; charset=utf-8
Date: Wed, 16 Oct 2019 15:20:20 GMT
Docker-Content-Digest: sha256:21e3caae28758329318c8a868a80daa37ad8851705155fc28767852c73d36af5
Docker-Distribution-API-Version: registry/2.0
Location: https://s3.us-east-2.amazonaws.com/test-pulp3/artifact/21/e3caae28758329318c8a868a80daa37ad8851705155fc28767852c73d36af5?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=********/20191016/us-east-2/s3/aws4_request&X-Amz-Date=20191016T152020Z&X-Amz-Expires=3600&X-Amz-SignedHeaders=host&X-Amz-Signature=2ece0668eff58a64d2f8abbeb3bbe0be58bda69bb8bad3d7a26c0da0191d81d8
Server: Python/3.7 aiohttp/3.6.2

302: Found
(pulp) [vagrant@pulp3-source-fedora30 pulp_docker]$ sudo docker pull localhost:24816/test:manifest_a
Trying to pull repository localhost:24816/test ... 
unsupported schema version 2
(pulp) [vagrant@pulp3-source-fedora30 pulp_docker]$ sudo docker run $REGISTRY_PATH:$DOCKER_TAG
Unable to find image 'localhost:24816/test:manifest_a' locally
Trying to pull repository localhost:24816/test ...
/usr/bin/docker-current: unsupported schema version 2.
Actions #11

Updated by ipanova@redhat.com over 4 years ago

We figured out that the problem is that the headers are not set on S3. Docker clients check for Content-Type, digest, and other headers; without them, the pull fails.
There is a way to set headers in S3 (https://docs.aws.amazon.com/AmazonS3/latest/dev/cors.html), but in our case we need to somehow figure out the content type and digest for each served file.

Actions #12

Updated by dkliban@redhat.com over 4 years ago

When Pulp is using S3, it needs to set these headers on the file when the Artifact is being created. That way S3 knows right away how to serve that file. Not sure how to achieve this though.

Actions #13

Updated by fao89 over 4 years ago

  • Status changed from ASSIGNED to NEW
  • Assignee deleted (fao89)
Actions #14

Updated by daviddavis over 4 years ago

When Pulp is using S3, it needs to set these headers on the file when the Artifact is being created. That way S3 knows right away how to serve that file. Not sure how to achieve this though.

Does docker allow artifacts to be shared between content units? If so, what will it do in that case where an artifact could have different header info?

Another option might be to have the content app stream the artifact (although this is not really optimal).

Actions #15

Updated by bmbouter over 4 years ago

Isn't there a way to have AWS hand specific headers to the client by setting them upon the redirect?

We could manipulate those headers here maybe? https://github.com/pulp/pulpcore/blob/master/pulpcore/content/handler.py#L429

Actions #16

Updated by daviddavis over 4 years ago

Looks like maybe. Googling a bit led me to this page: https://docs.aws.amazon.com/AmazonS3/latest/API/API_GetObject.html. See the section "Overriding Response Header Values". It looks like only a small subset of headers can be specified.
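
As a rough sketch of what that override mechanism could look like from Python: boto3's generate_presigned_url accepts the GetObject Response* parameters, so a signed redirect URL can carry a Content-Type override. The helper names below (split_headers, presigned_url) are hypothetical, the mapping omits ResponseExpires for brevity, and nothing here is taken from the Pulp code base. It also shows that Docker-Content-Digest falls outside the overridable subset.

```python
# Response headers that a GetObject request can override, mapped to the
# boto3 parameter names (ResponseExpires omitted for brevity).
OVERRIDABLE = {
    "Content-Type": "ResponseContentType",
    "Content-Disposition": "ResponseContentDisposition",
    "Content-Language": "ResponseContentLanguage",
    "Content-Encoding": "ResponseContentEncoding",
    "Cache-Control": "ResponseCacheControl",
}


def split_headers(headers):
    """Split desired headers into GetObject overrides and unsupported ones."""
    params, unsupported = {}, {}
    for name, value in headers.items():
        if name in OVERRIDABLE:
            params[OVERRIDABLE[name]] = value
        else:
            unsupported[name] = value
    return params, unsupported


def presigned_url(bucket, key, headers, expires_in=3600):
    """Build a signed GET URL that carries the supported overrides."""
    import boto3  # imported lazily so the helper above runs without boto3

    params, _ = split_headers(headers)
    client = boto3.client("s3")
    return client.generate_presigned_url(
        "get_object",
        Params={"Bucket": bucket, "Key": key, **params},
        ExpiresIn=expires_in,
    )


wanted = {
    "Content-Type": "application/vnd.docker.distribution.manifest.v2+json",
    "Docker-Content-Digest": "sha256:example",  # placeholder digest
}
params, unsupported = split_headers(wanted)
print(params)
print(unsupported)
```

Under these assumptions the content type can travel with the redirect, while the digest header would still need another route.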

Actions #17

Updated by ipanova@redhat.com over 4 years ago

bmbouter wrote:

Isn't there a way to have AWS hand specific headers to the client by setting them upon the redirect?

We could manipulate those headers here maybe? https://github.com/pulp/pulpcore/blob/master/pulpcore/content/handler.py#L429

This will only ensure that the 302 redirect contains the proper headers. [0]
The problem is that when the client follows the redirect through the "Location" header, it gets the file stored on S3, and that response has no headers set. These are docker registry specific headers, and without them the docker/podman client will refuse to pull.

[0] https://github.com/pulp/pulp_docker/pull/433/files#diff-1f37e1bf95e24a173326983f481027a7R103

Actions #18

Updated by ipanova@redhat.com over 4 years ago

dkliban@redhat.com wrote:

When Pulp is using S3, it needs to set these headers on the file when the Artifact is being created. That way S3 knows right away how to serve that file. Not sure how to achieve this though.

I still don't understand how that will help S3 know how to serve the file. Once we trigger the 302 redirect, we lose control by directing the client to S3 to fetch the file from there. Unless we can somehow set those headers directly in the response from S3, we won't be able to make docker pull work.

Actions #19

Updated by ipanova@redhat.com over 4 years ago

There is a way to specify headers when uploading a file to S3 directly via the S3 API: https://docs.aws.amazon.com/AmazonS3/latest/API/API_PutObject.html#API_PutObject_RequestSyntax

We need to look into how to do it via boto3 or django-storages.
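
A minimal boto3 sketch of that idea, assuming the content type and digest are known at upload time. The helper names (upload_kwargs, upload) and all argument values are hypothetical. put_object accepts ContentType and a Metadata mapping, but S3 returns user metadata as x-amz-meta-* headers, so the digest stored this way would not appear as a literal Docker-Content-Digest header.

```python
def upload_kwargs(bucket, key, body, content_type, digest):
    """Build put_object arguments that attach headers to the stored object."""
    return {
        "Bucket": bucket,
        "Key": key,
        "Body": body,
        # Served back by S3 as the Content-Type response header.
        "ContentType": content_type,
        # Served back as "x-amz-meta-docker-content-digest", not as
        # "Docker-Content-Digest".
        "Metadata": {"docker-content-digest": digest},
    }


def upload(bucket, key, body, content_type, digest):
    """Upload the object with its headers (requires boto3 and credentials)."""
    import boto3  # imported lazily so upload_kwargs runs without boto3

    client = boto3.client("s3")
    return client.put_object(
        **upload_kwargs(bucket, key, body, content_type, digest)
    )


kwargs = upload_kwargs(
    "example-bucket",              # placeholder bucket name
    "artifact/21/example-digest",  # placeholder key
    b"{}",
    "application/vnd.docker.distribution.manifest.v2+json",
    "sha256:example",              # placeholder digest
)
print(kwargs["ContentType"], kwargs["Metadata"])
```

Whether django-storages exposes an equivalent per-object hook would still need to be checked against its documentation.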

Actions #20

Updated by rchan over 4 years ago

  • Sprint changed from Sprint 60 to Sprint 61
Actions #21

Updated by ipanova@redhat.com over 4 years ago

  • Sprint deleted (Sprint 61)
  • Tags deleted (Pulp 3 docker blocker)
Actions #23

Updated by ipanova@redhat.com over 4 years ago

  • Project changed from Docker Support to Container Support
Actions #24

Updated by ipanova@redhat.com almost 4 years ago

  • Tracker changed from Task to Story
  • Subject changed from Test out docker with S3 to As a user I can use S3 as alternative storage
Actions #25

Updated by ipanova@redhat.com almost 4 years ago

Actions #26

Updated by ipanova@redhat.com almost 4 years ago

  • Sprint set to Sprint 73
Actions #27

Updated by mdellweg almost 4 years ago

  • Status changed from NEW to ASSIGNED
  • Assignee set to mdellweg
Actions #28

Updated by pulpbot almost 4 years ago

  • Status changed from ASSIGNED to POST
Actions #29

Updated by ipanova@redhat.com almost 4 years ago

  • Status changed from POST to MODIFIED
  • % Done changed from 0 to 100
Actions #30

Updated by ipanova@redhat.com almost 4 years ago

  • Sprint/Milestone set to 1.4.0
Actions #31

Updated by ipanova@redhat.com almost 4 years ago

  • Status changed from MODIFIED to CLOSED - CURRENTRELEASE
