Issue #2618
Updated by Ichimonji10 over 7 years ago
Related to https://pulp.plan.io/issues/1781.

Let's say you create, populate and publish a docker repository. That done, "blob" files will be available at paths in the form <code>/pulp/docker/v2/{repo_id}/blobs/{blob_sum}</code>. As a concrete example, one repository I worked with made the following URLs available:

* <code>https://rhel-6-8-pulp-2-12/pulp/docker/v2/8f12187a-8f95-488c-8f1b-c627f404f809/blobs/sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4</code>
* <code>https://rhel-6-8-pulp-2-12/pulp/docker/v2/8f12187a-8f95-488c-8f1b-c627f404f809/blobs/sha256:ffc8a12d3678ba8f82b54c3a9ca8260f56ce4be47748743658d89d8f39e80a04</code>

These "blob" files are gzip-encoded binary files. **When a client requests a blob, they expect to receive a gzip-encoded file**. A client can verify that they've received a valid file by calculating the checksum of the downloaded file and asserting that it matches the checksum embedded in the file name:

<pre>$ wget --server-response --no-check-certificate 'https://rhel-7-3-pulp-2-12/pulp/docker/v2/ff380cd9-f931-4bc7-9198-eec900f19610/blobs/sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4'
--2017-03-01 18:18:46--  https://rhel-7-3-pulp-2-12/pulp/docker/v2/ff380cd9-f931-4bc7-9198-eec900f19610/blobs/sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4
Resolving rhel-7-3-pulp-2-12... 192.168.100.177
Connecting to rhel-7-3-pulp-2-12|192.168.100.177|:443... connected.
WARNING: cannot verify rhel-7-3-pulp-2-12's certificate, issued by ‘CN=PulpCA,OU=Development,O=Pulp,L=Raleigh,ST=North Carolina,C=US’:
  Unable to locally verify the issuer's authority.
HTTP request sent, awaiting response... 
  HTTP/1.1 200 OK
  Date: Wed, 01 Mar 2017 18:18:21 GMT
  Server: Apache/2.4.6 (Red Hat Enterprise Linux) OpenSSL/1.0.1e-fips mod_wsgi/3.4 Python/2.7.5
  Last-Modified: Wed, 01 Mar 2017 18:12:54 GMT
  ETag: "20-549af42ef90f3"
  Accept-Ranges: bytes
  Content-Length: 32
  Docker-Distribution-API-Version: registry/2.0
  Keep-Alive: timeout=5, max=10000
  Connection: Keep-Alive
Length: 32
Saving to: ‘sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4’

sha256:a3ed95caeb02ffe68cdd9fd844 100%[===========================================================>]      32  --.-KB/s    in 0s

2017-03-01 18:18:46 (4.23 MB/s) - ‘sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4’ saved [32/32]

$ ls -1
sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4
$ file 'sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4'
sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4: gzip compressed data
$ sha256sum 'sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4'
a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4  sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4
</pre>

So far, so good. The trouble is with how RHEL 6 handles these requests. Some background information:

* The 'Content-Type' header states the type of the file as requested by the client application. In this case, a client requested a gzip archive. As a result, the <code>Content-Type: application/x-gzip</code> header should be set, or the 'Content-Type' header should be omitted entirely.
* The 'Content-Encoding' header states which additional encoding, if any, has been applied **on top of what the client requested**. In this case, any encoding supported by both wget and the server may be applied, but given that the file is already gzip-encoded, it doesn't make sense to further encode the file, and the 'Content-Encoding' header should be omitted.
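The two header rules above can be expressed as a small check. This is a minimal sketch, assuming the response headers are available as a plain mapping of names to values. `blob_header_problems` is a hypothetical helper, not part of Pulp, wget or requests. It treats `application/x-gzip`, the type named above, as the only acceptable 'Content-Type'. The sample values come from the RHEL 7 transcript above and from the RHEL 6 transcript that follows.

```python
from typing import List, Mapping

# The media type named in this issue; any other 'Content-Type' is flagged.
ACCEPTABLE_BLOB_TYPES = {"application/x-gzip"}


def blob_header_problems(headers: Mapping[str, str]) -> List[str]:
    """Return the problems found in the response headers for a docker blob.

    Rules (hypothetical helper, following the issue text):
    * 'Content-Type' must be application/x-gzip or absent.
    * 'Content-Encoding' must be absent, because the blob is already gzipped.
    """
    # HTTP header names are case-insensitive, so normalise them first.
    normalised = {name.lower(): value for name, value in headers.items()}
    problems = []

    content_type = normalised.get("content-type")
    if content_type is not None:
        # Compare only the media type, ignoring parameters like "; charset=UTF-8".
        media_type = content_type.split(";", 1)[0].strip().lower()
        if media_type not in ACCEPTABLE_BLOB_TYPES:
            problems.append(f"unexpected Content-Type: {content_type}")

    content_encoding = normalised.get("content-encoding")
    if content_encoding is not None:
        problems.append(f"unexpected Content-Encoding: {content_encoding}")

    return problems


# Subsets of the headers shown in the two wget transcripts.
rhel7_headers = {
    "Content-Length": "32",
    "Docker-Distribution-API-Version": "registry/2.0",
}
rhel6_headers = {
    "Content-Length": "32",
    "Docker-Distribution-API-Version": "registry/2.0",
    "Content-Type": "text/plain; charset=UTF-8",
    "Content-Encoding": "x-gzip",
}

print(blob_header_problems(rhel7_headers))  # []
print(blob_header_problems(rhel6_headers))
# ['unexpected Content-Type: text/plain; charset=UTF-8', 'unexpected Content-Encoding: x-gzip']
```

Comparing only the media type before any `;` parameters is a choice made for this sketch. A stricter check could reject parameters as well.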
Here's what RHEL 6 actually does:

<pre>$ wget --server-response --no-check-certificate 'https://rhel-6-8-pulp-2-12/pulp/docker/v2/d8d948e9-e87d-4fa9-be83-f62ba91210b8/blobs/sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4'
--2017-03-01 18:24:32--  https://rhel-6-8-pulp-2-12/pulp/docker/v2/d8d948e9-e87d-4fa9-be83-f62ba91210b8/blobs/sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4
Resolving rhel-6-8-pulp-2-12... 192.168.100.79
Connecting to rhel-6-8-pulp-2-12|192.168.100.79|:443... connected.
WARNING: cannot verify rhel-6-8-pulp-2-12's certificate, issued by ‘CN=PulpCA,OU=Development,O=Pulp,L=Raleigh,ST=North Carolina,C=US’:
  Unable to locally verify the issuer's authority.
HTTP request sent, awaiting response... 
  HTTP/1.1 200 OK
  Date: Wed, 01 Mar 2017 18:24:31 GMT
  Server: Apache/2.2.15 (Red Hat)
  Last-Modified: Wed, 01 Mar 2017 18:13:11 GMT
  ETag: "9ff30-20-549af43f32d1f"
  Accept-Ranges: bytes
  Content-Length: 32
  Docker-Distribution-API-Version: registry/2.0
  Connection: close
  Content-Type: text/plain; charset=UTF-8
  Content-Encoding: x-gzip
Length: 32 [text/plain]
Saving to: ‘sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4’

sha256:a3ed95caeb02ffe68cdd9fd844 100%[===========================================================>]      32  --.-KB/s    in 0s

2017-03-01 18:24:32 (3.40 MB/s) - ‘sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4’ saved [32/32]

$ ls -1
sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4
$ file 'sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4'
sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4: gzip compressed data
$ sha256sum 'sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4'
a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4  sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4
$ gunzip --to-stdout 'sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4' | sha256sum
5f70bf18a086007016e948b04aed3b82103a36bea41755b6cddfaf10ace3c6ef  -
</pre>

Notice how the 'Content-Encoding' header is set? That's wrong. In this case, wget ignores the header and saves the file to disk without gunzipping it, which hides the problem. To show what will happen with other, more compliant libraries, I've also gunzipped and checksummed the file. One such library is Python's "requests", which honours the 'Content-Encoding' header. For example, check out this simple script:

<pre><code class="python">#!/usr/bin/env python3
from contextlib import closing

import requests


def main():
    url = (
        'https://rhel-6-8-pulp-2-12/pulp/docker/v2/d8d948e9-e87d-4fa9-be83-f6'
        '2ba91210b8/blobs/sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb164'
        '22d00e8a7c22955b46d4'
    )
    # Fetch the blob once and write the body to disk. requests decodes the
    # body according to the 'Content-Encoding' header, so response.content
    # holds the gunzipped bytes here.
    with open('blob-decoded', 'wb') as handle:
        with closing(requests.get(url, verify=False)) as response:
            handle.write(response.content)


if __name__ == '__main__':
    exit(main())
</code></pre>

The <code>blob-decoded</code> file written to disk is gunzipped, as the 'Content-Encoding' header instructs:

<pre>$ ls -1
get_decoded.py
$ ./get_decoded.py
/usr/lib/python3.6/site-packages/urllib3/connectionpool.py:852: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
  InsecureRequestWarning)
$ ls -1
blob-decoded
get_decoded.py
$ file blob-decoded
blob-decoded: data
$ sha256sum blob-decoded
5f70bf18a086007016e948b04aed3b82103a36bea41755b6cddfaf10ace3c6ef  blob-decoded
</pre>

Notice the checksum of the file? It's the same as in the case where the file is fetched with <code>wget</code> and then manually gunzipped, so it no longer matches the checksum embedded in the blob's URL.
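The check done above with `sha256sum` can be sketched with only the Python standard library. This shows why a client that honours the bogus header ends up with a file that fails verification: the digest in the URL covers the gzip-encoded bytes, not the decompressed ones. `digest_matches` is a hypothetical helper, and it handles only `sha256:` digests. The blob bytes are a stand-in generated with `gzip.compress`, not the real blob from the transcripts.

```python
import gzip
import hashlib


def digest_matches(blob_sum: str, body: bytes) -> bool:
    """Check that ``body`` hashes to the digest embedded in a blob URL.

    ``blob_sum`` is the last path segment of the URL, e.g. "sha256:a3ed95...".
    Only sha256 is handled in this sketch.
    """
    algorithm, _, expected = blob_sum.partition(":")
    if algorithm != "sha256":
        raise ValueError(f"unsupported digest algorithm: {algorithm}")
    return hashlib.sha256(body).hexdigest() == expected


# Stand-in blob: gzip-encoded bytes whose digest is used as the URL segment.
raw_blob = gzip.compress(b"example layer contents")
blob_sum = "sha256:" + hashlib.sha256(raw_blob).hexdigest()

# What wget saves: the bytes exactly as served, so the digest matches.
print(digest_matches(blob_sum, raw_blob))  # True

# What a client honouring "Content-Encoding: x-gzip" hands back: the body
# after gunzipping, which no longer matches the digest in the URL.
decoded = gzip.decompress(raw_blob)
print(digest_matches(blob_sum, decoded))  # False
```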
The long and short of it is that RHEL 6 is incorrectly adding a <code>Content-Encoding: x-gzip</code> header to docker blobs. No other platforms do this. This is true for the current Pulp 2.12 nightlies. Here are the packages installed on my current RHEL 6 test system:

<pre>[root@rhel-6-8-pulp-2-12 ~]# rpm -qa | grep -i httpd
httpd-tools-2.2.15-56.el6_8.3.x86_64
httpd-2.2.15-56.el6_8.3.x86_64
[root@rhel-6-8-pulp-2-12 ~]# rpm -qa | grep -i pulp | sort
mod_wsgi-3.4-2.pulp.el6.x86_64
pulp-admin-client-2.12.2-0.1.alpha.git.17.b101ff0.el6.noarch
pulp-docker-admin-extensions-2.3.1-0.1.alpha.git.5.052c506.el6.noarch
pulp-docker-plugins-2.3.1-0.1.alpha.git.5.052c506.el6.noarch
pulp-puppet-admin-extensions-2.12.2-0.1.alpha.git.2.f338f5d.el6.noarch
pulp-puppet-plugins-2.12.2-0.1.alpha.git.2.f338f5d.el6.noarch
pulp-python-admin-extensions-2.0.1-0.1.alpha.git.6.8c46f3f.el6.noarch
pulp-python-plugins-2.0.1-0.1.alpha.git.6.8c46f3f.el6.noarch
pulp-rpm-admin-extensions-2.12.2-0.1.alpha.git.19.da51b5f.el6.noarch
pulp-rpm-plugins-2.12.2-0.1.alpha.git.19.da51b5f.el6.noarch
pulp-selinux-2.12.2-0.1.alpha.git.17.b101ff0.el6.noarch
pulp-server-2.12.2-0.1.alpha.git.17.b101ff0.el6.noarch
python-isodate-0.5.0-4.pulp.el6.noarch
python-kombu-3.0.33-6.pulp.el6.noarch
python-pulp-bindings-2.12.2-0.1.alpha.git.17.b101ff0.el6.noarch
python-pulp-client-lib-2.12.2-0.1.alpha.git.17.b101ff0.el6.noarch
python-pulp-common-2.12.2-0.1.alpha.git.17.b101ff0.el6.noarch
python-pulp-docker-common-2.3.1-0.1.alpha.git.5.052c506.el6.noarch
python-pulp-oid_validation-2.12.2-0.1.alpha.git.17.b101ff0.el6.noarch
python-pulp-puppet-common-2.12.2-0.1.alpha.git.2.f338f5d.el6.noarch
python-pulp-python-common-2.0.1-0.1.alpha.git.6.8c46f3f.el6.noarch
python-pulp-repoauth-2.12.2-0.1.alpha.git.17.b101ff0.el6.noarch
python-pulp-rpm-common-2.12.2-0.1.alpha.git.19.da51b5f.el6.noarch
python-pulp-streamer-2.12.2-0.1.alpha.git.17.b101ff0.el6.noarch
</pre>