Issue #2618

closed

"blob" files are delivered with incorrect content headers

Added by Ichimonji10 about 7 years ago. Updated almost 5 years ago.

Status: CLOSED - WONTFIX
Priority: Normal
Assignee: -
Start date:
Due date:
Estimated time:
Severity: 2. Medium
Version - Docker:
Platform Release:
Target Release - Docker:
OS:
Triaged: Yes
Groomed: No
Sprint Candidate: No
Tags: Pulp 2
Sprint:
Quarter:

Description

Related to https://pulp.plan.io/issues/1781.

Let's say you create, populate and publish a docker repository. That done, "blob" files will be available at paths in the form /pulp/docker/v2/{repo_id}/blobs/{blob_sum}. As a concrete example, one repository I worked with made the following URLs available:

  • https://rhel-6-8-pulp-2-12/pulp/docker/v2/8f12187a-8f95-488c-8f1b-c627f404f809/blobs/sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4
  • https://rhel-6-8-pulp-2-12/pulp/docker/v2/8f12187a-8f95-488c-8f1b-c627f404f809/blobs/sha256:ffc8a12d3678ba8f82b54c3a9ca8260f56ce4be47748743658d89d8f39e80a04

These "blob" files are gzip-encoded binary files. When a client requests a blob, they expect to receive a gzip-encoded file. A client can verify that they've received a valid file by calculating the checksum of the downloaded file and asserting that it matches the checksum embedded in the file name:

$ wget --server-response --no-check-certificate 'https://rhel-7-3-pulp-2-12/pulp/docker/v2/ff380cd9-f931-4bc7-9198-eec900f19610/blobs/sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4'
--2017-03-01 18:18:46--  https://rhel-7-3-pulp-2-12/pulp/docker/v2/ff380cd9-f931-4bc7-9198-eec900f19610/blobs/sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4
Resolving rhel-7-3-pulp-2-12... 192.168.100.177
Connecting to rhel-7-3-pulp-2-12|192.168.100.177|:443... connected.
WARNING: cannot verify rhel-7-3-pulp-2-12's certificate, issued by ‘CN=PulpCA,OU=Development,O=Pulp,L=Raleigh,ST=North Carolina,C=US’:
  Unable to locally verify the issuer's authority.
HTTP request sent, awaiting response... 
  HTTP/1.1 200 OK
  Date: Wed, 01 Mar 2017 18:18:21 GMT
  Server: Apache/2.4.6 (Red Hat Enterprise Linux) OpenSSL/1.0.1e-fips mod_wsgi/3.4 Python/2.7.5
  Last-Modified: Wed, 01 Mar 2017 18:12:54 GMT
  ETag: "20-549af42ef90f3"
  Accept-Ranges: bytes
  Content-Length: 32
  Docker-Distribution-API-Version: registry/2.0
  Keep-Alive: timeout=5, max=10000
  Connection: Keep-Alive
Length: 32
Saving to: ‘sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4’

sha256:a3ed95caeb02ffe68cdd9fd844 100%[===========================================================>]      32  --.-KB/s    in 0s      

2017-03-01 18:18:46 (4.23 MB/s) - ‘sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4’ saved [32/32]

$ ls -1
sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4
$ file 'sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4' 
sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4: gzip compressed data
$ sha256sum 'sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4' 
a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4  sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4
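
The same check can be done programmatically. Here's a minimal sketch, assuming the blob's bytes are already in memory and that the digest is the last path segment of the blob URL, in the form algorithm:hex. The function name blob_matches_digest is made up for illustration and isn't part of Pulp or Docker:

```python
import hashlib


def blob_matches_digest(data: bytes, digest: str) -> bool:
    """Return True if `data` hashes to `digest`, given as '<algorithm>:<hex>'."""
    # Split 'sha256:a3ed95...' into the algorithm name and the expected hex digest.
    algorithm, _, expected_hex = digest.partition(":")
    actual_hex = hashlib.new(algorithm, data).hexdigest()
    return actual_hex == expected_hex.lower()
```

Passing the contents of the downloaded file and the name it was saved under should return True whenever sha256sum prints a matching checksum, as it does above.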

So far, so good. The trouble is with how RHEL 6 handles these requests. Some background information:

  • The 'Content-Type' header states the media type of the file being delivered. In this case, the client requested a gzip archive, so either the Content-Type: application/x-gzip header should be set, or the 'Content-Type' header should be omitted entirely.
  • The 'Content-Encoding' header states which additional encoding, if any, has been applied on top of what the client requested, and which the client must therefore undo to get the file back. In this case, any encoding supported by both wget and the server may be applied, but given that the file is already gzip-encoded, it doesn't make sense to further encode the file, and the 'Content-Encoding' header should be omitted.
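
To make the distinction concrete, here's a rough model of what a header-respecting client does with a response body. It's a sketch rather than any real library's implementation: decode_body is a hypothetical helper, the headers are a plain dict (real header lookups are case-insensitive), and the "blob" is just some gzip-compressed bytes standing in for a real layer:

```python
import gzip
import hashlib

# Content codings that mean "gunzip the body before handing it to the caller".
GZIP_CODINGS = {"gzip", "x-gzip"}


def decode_body(body: bytes, headers: dict) -> bytes:
    """Undo a gzip content coding if the response declares one."""
    coding = headers.get("Content-Encoding", "").strip().lower()
    if coding in GZIP_CODINGS:
        return gzip.decompress(body)
    return body


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


# Stand-in for a blob: gzip-compressed bytes, named by the digest of those bytes.
blob = gzip.compress(b"layer contents")
expected = sha256_hex(blob)

# No Content-Encoding: the client keeps the gzip bytes and the digest matches.
assert sha256_hex(decode_body(blob, {})) == expected

# Content-Encoding: x-gzip: the client unwraps the blob, so the digest differs.
mislabelled = {"Content-Encoding": "x-gzip"}
assert sha256_hex(decode_body(blob, mislabelled)) != expected
```

The point is that the blob's name is the checksum of the compressed bytes, so any client that strips a content coding ends up with a file that no longer matches its name.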

Here's what RHEL 6 actually does:

$ wget --server-response --no-check-certificate 'https://rhel-6-8-pulp-2-12/pulp/docker/v2/d8d948e9-e87d-4fa9-be83-f62ba91210b8/blobs/sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4'
--2017-03-01 18:24:32--  https://rhel-6-8-pulp-2-12/pulp/docker/v2/d8d948e9-e87d-4fa9-be83-f62ba91210b8/blobs/sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4
Resolving rhel-6-8-pulp-2-12... 192.168.100.79
Connecting to rhel-6-8-pulp-2-12|192.168.100.79|:443... connected.
WARNING: cannot verify rhel-6-8-pulp-2-12's certificate, issued by ‘CN=PulpCA,OU=Development,O=Pulp,L=Raleigh,ST=North Carolina,C=US’:
  Unable to locally verify the issuer's authority.
HTTP request sent, awaiting response... 
  HTTP/1.1 200 OK
  Date: Wed, 01 Mar 2017 18:24:31 GMT
  Server: Apache/2.2.15 (Red Hat)
  Last-Modified: Wed, 01 Mar 2017 18:13:11 GMT
  ETag: "9ff30-20-549af43f32d1f"
  Accept-Ranges: bytes
  Content-Length: 32
  Docker-Distribution-API-Version: registry/2.0
  Connection: close
  Content-Type: text/plain; charset=UTF-8
  Content-Encoding: x-gzip
Length: 32 [text/plain]
Saving to: ‘sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4’

sha256:a3ed95caeb02ffe68cdd9fd844 100%[===========================================================>]      32  --.-KB/s    in 0s      

2017-03-01 18:24:32 (3.40 MB/s) - ‘sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4’ saved [32/32]

$ ls -1
sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4
$ file 'sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4' 
sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4: gzip compressed data
$ sha256sum 'sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4' 
a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4  sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4
$ gunzip --to-stdout 'sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4' | sha256sum
5f70bf18a086007016e948b04aed3b82103a36bea41755b6cddfaf10ace3c6ef  -

Notice how the 'Content-Encoding' header is set? That's wrong. In this case, wget ignores the header and doesn't gunzip the file before saving it to disk. That behaviour isn't standards-compliant, so I've gunzipped and checksummed the file by hand to show what will happen with other, more compliant libraries. One such example is Python's "requests", which honours the 'Content-Encoding' header. For example, check out this simple script:

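#!/usr/bin/env python3
"""Fetch a blob with requests and save the body as requests returns it."""
import sys

import requests


def main():
    url = (
        'https://rhel-6-8-pulp-2-12/pulp/docker/v2/d8d948e9-e87d-4fa9-be83-f6'
        '2ba91210b8/blobs/sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb164'
        '22d00e8a7c22955b46d4'
    )
    # verify=False skips certificate checks, like wget's --no-check-certificate.
    response = requests.get(url, verify=False)
    # .content has any declared Content-Encoding (here x-gzip) already undone.
    with open('blob-decoded', 'wb') as handle:
        handle.write(response.content)


if __name__ == '__main__':
    sys.exit(main())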

The blob-decoded file written to disk is gunzipped, as suggested by the 'Content-Encoding' header:

$ ls -1
get_decoded.py
$ ./get_decoded.py 
/usr/lib/python3.6/site-packages/urllib3/connectionpool.py:852: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
  InsecureRequestWarning)
$ ls -1
blob-decoded
get_decoded.py
$ file blob-decoded 
blob-decoded: data
$ sha256sum blob-decoded 
5f70bf18a086007016e948b04aed3b82103a36bea41755b6cddfaf10ace3c6ef  blob-decoded

Notice the checksum of the file? It's the same as in the case where the file is fetched with wget and then manually gunzipped.
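
For comparison, a client that never applies content codings gets the bytes exactly as sent, like wget does. Here's a hedged sketch using the standard library's urllib.request instead of "requests", since urllib.request doesn't decode 'Content-Encoding' on its own. The function name and the verify_tls parameter are made up for illustration, and I haven't run this against the RHEL 6 system above:

```python
import hashlib
import ssl
import urllib.request


def fetch_blob_as_sent(url: str, verify_tls: bool = True) -> bytes:
    """Download `url` and return the body without undoing any Content-Encoding."""
    context = ssl.create_default_context()
    if not verify_tls:
        # Equivalent in spirit to wget's --no-check-certificate.
        # Only sensible against a throwaway test system.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(url, context=context) as response:
        return response.read()


def digest_of(data: bytes) -> str:
    """Format a checksum the way blob names are written: 'sha256:<hex>'."""
    return "sha256:" + hashlib.sha256(data).hexdigest()
```

Comparing digest_of(fetch_blob_as_sent(url, verify_tls=False)) with the last path segment of the blob URL should then give the same answer as the wget and sha256sum check, regardless of what 'Content-Encoding' claims.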

The long and short of it is that RHEL 6 is incorrectly adding a Content-Encoding: x-gzip header to docker blobs. No other platforms do this. This is true for the current Pulp 2.12 nightlies. Here are the packages installed on my current RHEL 6 test system:

[root@rhel-6-8-pulp-2-12 ~]# rpm -qa | grep -i httpd
httpd-tools-2.2.15-56.el6_8.3.x86_64
httpd-2.2.15-56.el6_8.3.x86_64
[root@rhel-6-8-pulp-2-12 ~]# rpm -qa | grep -i pulp | sort
mod_wsgi-3.4-2.pulp.el6.x86_64
pulp-admin-client-2.12.2-0.1.alpha.git.17.b101ff0.el6.noarch
pulp-docker-admin-extensions-2.3.1-0.1.alpha.git.5.052c506.el6.noarch
pulp-docker-plugins-2.3.1-0.1.alpha.git.5.052c506.el6.noarch
pulp-puppet-admin-extensions-2.12.2-0.1.alpha.git.2.f338f5d.el6.noarch
pulp-puppet-plugins-2.12.2-0.1.alpha.git.2.f338f5d.el6.noarch
pulp-python-admin-extensions-2.0.1-0.1.alpha.git.6.8c46f3f.el6.noarch
pulp-python-plugins-2.0.1-0.1.alpha.git.6.8c46f3f.el6.noarch
pulp-rpm-admin-extensions-2.12.2-0.1.alpha.git.19.da51b5f.el6.noarch
pulp-rpm-plugins-2.12.2-0.1.alpha.git.19.da51b5f.el6.noarch
pulp-selinux-2.12.2-0.1.alpha.git.17.b101ff0.el6.noarch
pulp-server-2.12.2-0.1.alpha.git.17.b101ff0.el6.noarch
python-isodate-0.5.0-4.pulp.el6.noarch
python-kombu-3.0.33-6.pulp.el6.noarch
python-pulp-bindings-2.12.2-0.1.alpha.git.17.b101ff0.el6.noarch
python-pulp-client-lib-2.12.2-0.1.alpha.git.17.b101ff0.el6.noarch
python-pulp-common-2.12.2-0.1.alpha.git.17.b101ff0.el6.noarch
python-pulp-docker-common-2.3.1-0.1.alpha.git.5.052c506.el6.noarch
python-pulp-oid_validation-2.12.2-0.1.alpha.git.17.b101ff0.el6.noarch
python-pulp-puppet-common-2.12.2-0.1.alpha.git.2.f338f5d.el6.noarch
python-pulp-python-common-2.0.1-0.1.alpha.git.6.8c46f3f.el6.noarch
python-pulp-repoauth-2.12.2-0.1.alpha.git.17.b101ff0.el6.noarch
python-pulp-rpm-common-2.12.2-0.1.alpha.git.19.da51b5f.el6.noarch
python-pulp-streamer-2.12.2-0.1.alpha.git.17.b101ff0.el6.noarch
