Issue #2618

Updated by Ichimonji10 over 7 years ago

Related to https://pulp.plan.io/issues/1781. 

 Let's say you create, populate and publish a Docker repository. Once that's done, "blob" files are available at paths of the form <code>/pulp/docker/v2/{repo_id}/blobs/{blob_sum}</code>. As a concrete example, one repository I worked with made the following URLs available: 

 * <code>https://rhel-6-8-pulp-2-12/pulp/docker/v2/8f12187a-8f95-488c-8f1b-c627f404f809/blobs/sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4</code> 
 * <code>https://rhel-6-8-pulp-2-12/pulp/docker/v2/8f12187a-8f95-488c-8f1b-c627f404f809/blobs/sha256:ffc8a12d3678ba8f82b54c3a9ca8260f56ce4be47748743658d89d8f39e80a04</code> 

 These "blob" files are gzip-encoded binary files. **When a client requests a blob, they expect to receive a gzip-encoded file**. A client can verify that they've received a valid file by calculating the checksum of the downloaded file and asserting that it matches the checksum embedded in the file name: 

 <pre>$ wget --server-response --no-check-certificate 'https://rhel-7-3-pulp-2-12/pulp/docker/v2/ff380cd9-f931-4bc7-9198-eec900f19610/blobs/sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4' 
 --2017-03-01 18:18:46--    https://rhel-7-3-pulp-2-12/pulp/docker/v2/ff380cd9-f931-4bc7-9198-eec900f19610/blobs/sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4 
 Resolving rhel-7-3-pulp-2-12... 192.168.100.177 
 Connecting to rhel-7-3-pulp-2-12|192.168.100.177|:443... connected. 
 WARNING: cannot verify rhel-7-3-pulp-2-12's certificate, issued by ‘CN=PulpCA,OU=Development,O=Pulp,L=Raleigh,ST=North Carolina,C=US’: 
   Unable to locally verify the issuer's authority. 
 HTTP request sent, awaiting response...  
   HTTP/1.1 200 OK 
   Date: Wed, 01 Mar 2017 18:18:21 GMT 
   Server: Apache/2.4.6 (Red Hat Enterprise Linux) OpenSSL/1.0.1e-fips mod_wsgi/3.4 Python/2.7.5 
   Last-Modified: Wed, 01 Mar 2017 18:12:54 GMT 
   ETag: "20-549af42ef90f3" 
   Accept-Ranges: bytes 
   Content-Length: 32 
   Docker-Distribution-API-Version: registry/2.0 
   Keep-Alive: timeout=5, max=10000 
   Connection: Keep-Alive 
 Length: 32 
 Saving to: ‘sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4’ 

 sha256:a3ed95caeb02ffe68cdd9fd844 100%[===========================================================>]        32    --.-KB/s      in 0s       

 2017-03-01 18:18:46 (4.23 MB/s) - ‘sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4’ saved [32/32] 

 $ ls -1 
 sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4 
 $ file 'sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4'  
 sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4: gzip compressed data 
 $ sha256sum 'sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4'  
 a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4    sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4 
 </pre> 
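 The same check can be scripted. Here's a minimal sketch, assuming the blob's name always has the form <code>{algorithm}:{hex digest}</code> as in the URLs above. The function name <code>blob_is_valid</code> is my own invention, not something Pulp or docker provides:

```python
import hashlib


def blob_is_valid(blob_name, blob_bytes):
    """Return True if the bytes hash to the digest embedded in the name.

    Assumes ``blob_name`` looks like 'sha256:<hex digest>'.
    """
    # Split 'sha256:a3ed…' into the algorithm and the expected digest.
    algorithm, _, expected = blob_name.partition(':')
    actual = hashlib.new(algorithm, blob_bytes).hexdigest()
    return actual == expected
```

 To use it, read the downloaded file in binary mode and pass its name and contents to the function.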

 So far, so good. The trouble is with how RHEL 6 handles these requests. Some background information: 

 * The 'Content-Type' header states the type of the file as requested by the client application. In this case, the client requested a gzip archive, so the response should either carry a <code>Content-Type: application/x-gzip</code> header or omit the 'Content-Type' header entirely. 
 * The 'Content-Encoding' header states which additional encoding, if any, has been applied **on top of what the client requested**. In this case, any encoding supported by both wget and the server may be applied, but given that the file is already gzip-encoded, it doesn't make sense to further encode the file, and the 'Content-Encoding' header should be omitted. 
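
 Those two expectations can be written down as a small check. This is only a sketch of the rules as I've stated them above, and <code>blob_header_problems</code> is a hypothetical helper rather than part of any library:

```python
def blob_header_problems(headers):
    """Return a list of complaints about a blob response's headers.

    ``headers`` is any mapping of header names to values.
    """
    # HTTP header names are case-insensitive, so normalise them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    problems = []

    content_type = lowered.get('content-type')
    if content_type is not None:
        # Drop parameters such as '; charset=UTF-8' before comparing.
        media_type = content_type.split(';')[0].strip().lower()
        if media_type != 'application/x-gzip':
            problems.append('unexpected Content-Type: ' + content_type)

    if 'content-encoding' in lowered:
        problems.append(
            'unexpected Content-Encoding: ' + lowered['content-encoding']
        )
    return problems
```

 An empty list means the headers match the expectations above. A response with <code>Content-Type: text/plain</code> and <code>Content-Encoding: x-gzip</code> produces two complaints.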

 Here's what RHEL 6 actually does: 

 <pre>$ wget --server-response --no-check-certificate 'https://rhel-6-8-pulp-2-12/pulp/docker/v2/d8d948e9-e87d-4fa9-be83-f62ba91210b8/blobs/sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4' 
 --2017-03-01 18:24:32--    https://rhel-6-8-pulp-2-12/pulp/docker/v2/d8d948e9-e87d-4fa9-be83-f62ba91210b8/blobs/sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4 
 Resolving rhel-6-8-pulp-2-12... 192.168.100.79 
 Connecting to rhel-6-8-pulp-2-12|192.168.100.79|:443... connected. 
 WARNING: cannot verify rhel-6-8-pulp-2-12's certificate, issued by ‘CN=PulpCA,OU=Development,O=Pulp,L=Raleigh,ST=North Carolina,C=US’: 
   Unable to locally verify the issuer's authority. 
 HTTP request sent, awaiting response...  
   HTTP/1.1 200 OK 
   Date: Wed, 01 Mar 2017 18:24:31 GMT 
   Server: Apache/2.2.15 (Red Hat) 
   Last-Modified: Wed, 01 Mar 2017 18:13:11 GMT 
   ETag: "9ff30-20-549af43f32d1f" 
   Accept-Ranges: bytes 
   Content-Length: 32 
   Docker-Distribution-API-Version: registry/2.0 
   Connection: close 
   Content-Type: text/plain; charset=UTF-8 
   Content-Encoding: x-gzip 
 Length: 32 [text/plain] 
 Saving to: ‘sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4’ 

 sha256:a3ed95caeb02ffe68cdd9fd844 100%[===========================================================>]        32    --.-KB/s      in 0s       

 2017-03-01 18:24:32 (3.40 MB/s) - ‘sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4’ saved [32/32] 

 $ ls -1 
 sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4 
 $ file 'sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4'  
 sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4: gzip compressed data 
 $ sha256sum 'sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4'  
 a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4    sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4 
 $ gunzip --to-stdout 'sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4' | sha256sum 
 5f70bf18a086007016e948b04aed3b82103a36bea41755b6cddfaf10ace3c6ef    - 
 </pre> 

 Notice how the 'Content-Encoding' header is set? That's wrong. In this case, wget ignores the header and doesn't gunzip the file before saving it to disk, which masks the problem. I've gunzipped and checksummed the file by hand to show what will happen with other, more compliant libraries. One such example is Python's "requests", which honours the 'Content-Encoding' header. For example, check out this simple script: 

 <pre><code class="python">#!/usr/bin/env python3 
 from contextlib import closing 
 import requests 


 def main(): 
     url = ( 
         'https://rhel-6-8-pulp-2-12/pulp/docker/v2/d8d948e9-e87d-4fa9-be83-f6' 
         '2ba91210b8/blobs/sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb164' 
         '22d00e8a7c22955b46d4' 
     ) 
     with open('blob-decoded', 'wb') as handle: 
         with closing(requests.get(url, verify=False)) as response: 
             handle.write(response.content) 


 if __name__ == '__main__': 
     exit(main()) 
 </code></pre> 

 The <code>blob-decoded</code> file written to disk is gunzipped, as suggested by the 'Content-Encoding' header: 

 <pre>$ ls -1 
 get_decoded.py 
 $ ./get_decoded.py  
 /usr/lib/python3.6/site-packages/urllib3/connectionpool.py:852: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings 
   InsecureRequestWarning) 
 $ ls -1 
 blob-decoded 
 get_decoded.py 
 $ file blob-decoded  
 blob-decoded: data 
 $ sha256sum blob-decoded  
 5f70bf18a086007016e948b04aed3b82103a36bea41755b6cddfaf10ace3c6ef    blob-decoded 
 </pre> 

 Notice the checksum of the file? It's the same as in the case where the file is fetched with <code>wget</code> and then manually gunzipped. 
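
 To make the failure mode concrete without a server, here's a network-free sketch. It assumes a made-up blob (not the real one from above), and <code>simulate_download</code> is a hypothetical stand-in for a client that either honours or ignores 'Content-Encoding':

```python
import gzip
import hashlib


def blob_sum(data):
    """Return a digest in the same 'sha256:<hex>' form used in blob URLs."""
    return 'sha256:' + hashlib.sha256(data).hexdigest()


def simulate_download(blob, honour_content_encoding):
    """Return the bytes a client would write to disk."""
    if honour_content_encoding:
        # A compliant client strips the gzip layer announced by the header.
        return gzip.decompress(blob)
    # A client that ignores the header saves the bytes as sent.
    return blob


# Hypothetical stand-in for a published blob.
blob = gzip.compress(b'example layer content')
expected = blob_sum(blob)

as_sent = simulate_download(blob, honour_content_encoding=False)
decoded = simulate_download(blob, honour_content_encoding=True)

print(blob_sum(as_sent) == expected)  # the wget-style client can verify
print(blob_sum(decoded) == expected)  # the decoding client cannot
```

 The decoded bytes no longer hash to the digest in the blob's name, so any client that verifies its download will reject the file.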

 The long and short of it is that RHEL 6 is incorrectly adding a <code>Content-Encoding: x-gzip</code> header to docker blobs. No other platform does this. This is true for the current Pulp 2.12 nightlies. Here are the packages installed on my current RHEL 6 test system: 

 <pre>[root@rhel-6-8-pulp-2-12 ~]# rpm -qa | grep -i httpd 
 httpd-tools-2.2.15-56.el6_8.3.x86_64 
 httpd-2.2.15-56.el6_8.3.x86_64 
 [root@rhel-6-8-pulp-2-12 ~]# rpm -qa | grep -i pulp | sort 
 mod_wsgi-3.4-2.pulp.el6.x86_64 
 pulp-admin-client-2.12.2-0.1.alpha.git.17.b101ff0.el6.noarch 
 pulp-docker-admin-extensions-2.3.1-0.1.alpha.git.5.052c506.el6.noarch 
 pulp-docker-plugins-2.3.1-0.1.alpha.git.5.052c506.el6.noarch 
 pulp-puppet-admin-extensions-2.12.2-0.1.alpha.git.2.f338f5d.el6.noarch 
 pulp-puppet-plugins-2.12.2-0.1.alpha.git.2.f338f5d.el6.noarch 
 pulp-python-admin-extensions-2.0.1-0.1.alpha.git.6.8c46f3f.el6.noarch 
 pulp-python-plugins-2.0.1-0.1.alpha.git.6.8c46f3f.el6.noarch 
 pulp-rpm-admin-extensions-2.12.2-0.1.alpha.git.19.da51b5f.el6.noarch 
 pulp-rpm-plugins-2.12.2-0.1.alpha.git.19.da51b5f.el6.noarch 
 pulp-selinux-2.12.2-0.1.alpha.git.17.b101ff0.el6.noarch 
 pulp-server-2.12.2-0.1.alpha.git.17.b101ff0.el6.noarch 
 python-isodate-0.5.0-4.pulp.el6.noarch 
 python-kombu-3.0.33-6.pulp.el6.noarch 
 python-pulp-bindings-2.12.2-0.1.alpha.git.17.b101ff0.el6.noarch 
 python-pulp-client-lib-2.12.2-0.1.alpha.git.17.b101ff0.el6.noarch 
 python-pulp-common-2.12.2-0.1.alpha.git.17.b101ff0.el6.noarch 
 python-pulp-docker-common-2.3.1-0.1.alpha.git.5.052c506.el6.noarch 
 python-pulp-oid_validation-2.12.2-0.1.alpha.git.17.b101ff0.el6.noarch 
 python-pulp-puppet-common-2.12.2-0.1.alpha.git.2.f338f5d.el6.noarch 
 python-pulp-python-common-2.0.1-0.1.alpha.git.6.8c46f3f.el6.noarch 
 python-pulp-repoauth-2.12.2-0.1.alpha.git.17.b101ff0.el6.noarch 
 python-pulp-rpm-common-2.12.2-0.1.alpha.git.19.da51b5f.el6.noarch 
 python-pulp-streamer-2.12.2-0.1.alpha.git.17.b101ff0.el6.noarch 
 </pre>
