Issue #2618
closed"blob" files are delivered with incorrect content headers
Description
Related to https://pulp.plan.io/issues/1781.
Let's say you create, populate and publish a docker repository. That done, "blob" files will be available at paths in the form /pulp/docker/v2/{repo_id}/blobs/{blob_sum}
. As a concrete example, one repository I worked with made the following URLs available:
https://rhel-6-8-pulp-2-12/pulp/docker/v2/8f12187a-8f95-488c-8f1b-c627f404f809/blobs/sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4
https://rhel-6-8-pulp-2-12/pulp/docker/v2/8f12187a-8f95-488c-8f1b-c627f404f809/blobs/sha256:ffc8a12d3678ba8f82b54c3a9ca8260f56ce4be47748743658d89d8f39e80a04
These "blob" files are gzip-encoded binary files. When a client requests a blob, they expect to receive a gzip-encoded file. A client can verify that they've received a valid file by calculating the checksum of the downloaded file and asserting that it matches the checksum embedded in the file name:
$ wget --server-response --no-check-certificate 'https://rhel-7-3-pulp-2-12/pulp/docker/v2/ff380cd9-f931-4bc7-9198-eec900f19610/blobs/sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4'
--2017-03-01 18:18:46-- https://rhel-7-3-pulp-2-12/pulp/docker/v2/ff380cd9-f931-4bc7-9198-eec900f19610/blobs/sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4
Resolving rhel-7-3-pulp-2-12... 192.168.100.177
Connecting to rhel-7-3-pulp-2-12|192.168.100.177|:443... connected.
WARNING: cannot verify rhel-7-3-pulp-2-12's certificate, issued by ‘CN=PulpCA,OU=Development,O=Pulp,L=Raleigh,ST=North Carolina,C=US’:
Unable to locally verify the issuer's authority.
HTTP request sent, awaiting response...
HTTP/1.1 200 OK
Date: Wed, 01 Mar 2017 18:18:21 GMT
Server: Apache/2.4.6 (Red Hat Enterprise Linux) OpenSSL/1.0.1e-fips mod_wsgi/3.4 Python/2.7.5
Last-Modified: Wed, 01 Mar 2017 18:12:54 GMT
ETag: "20-549af42ef90f3"
Accept-Ranges: bytes
Content-Length: 32
Docker-Distribution-API-Version: registry/2.0
Keep-Alive: timeout=5, max=10000
Connection: Keep-Alive
Length: 32
Saving to: ‘sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4’
sha256:a3ed95caeb02ffe68cdd9fd844 100%[===========================================================>] 32 --.-KB/s in 0s
2017-03-01 18:18:46 (4.23 MB/s) - ‘sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4’ saved [32/32]
$ ls -1
sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4
$ file 'sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4'
sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4: gzip compressed data
$ sha256sum 'sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4'
a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4 sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4
So far, so good. The trouble is with how RHEL 6 handles these requests. Some background information:
- The 'Content-Type' header states the type of the file as requested by the client application. In this case, a client requested a gzip archive. As a result, the
Content-Type: application/x-gzip
header should be set, or the 'Content-Type' header should be omitted entirely. - The 'Content-Encoding' header states which additional encoding, if any, has been applied on top of what the client requested. In this case, any encoding supported by both wget and the server may be applied, but given that the file is already gzip-encoded, it doesn't make sense to further encode the file, and the 'Content-Encoding' header should be omitted.
Here's what RHEL 6 actually does:
$ wget --server-response --no-check-certificate 'https://rhel-6-8-pulp-2-12/pulp/docker/v2/d8d948e9-e87d-4fa9-be83-f62ba91210b8/blobs/sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4'
--2017-03-01 18:24:32-- https://rhel-6-8-pulp-2-12/pulp/docker/v2/d8d948e9-e87d-4fa9-be83-f62ba91210b8/blobs/sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4
Resolving rhel-6-8-pulp-2-12... 192.168.100.79
Connecting to rhel-6-8-pulp-2-12|192.168.100.79|:443... connected.
WARNING: cannot verify rhel-6-8-pulp-2-12's certificate, issued by ‘CN=PulpCA,OU=Development,O=Pulp,L=Raleigh,ST=North Carolina,C=US’:
Unable to locally verify the issuer's authority.
HTTP request sent, awaiting response...
HTTP/1.1 200 OK
Date: Wed, 01 Mar 2017 18:24:31 GMT
Server: Apache/2.2.15 (Red Hat)
Last-Modified: Wed, 01 Mar 2017 18:13:11 GMT
ETag: "9ff30-20-549af43f32d1f"
Accept-Ranges: bytes
Content-Length: 32
Docker-Distribution-API-Version: registry/2.0
Connection: close
Content-Type: text/plain; charset=UTF-8
Content-Encoding: x-gzip
Length: 32 [text/plain]
Saving to: ‘sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4’
sha256:a3ed95caeb02ffe68cdd9fd844 100%[===========================================================>] 32 --.-KB/s in 0s
2017-03-01 18:24:32 (3.40 MB/s) - ‘sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4’ saved [32/32]
$ ls -1
sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4
$ file 'sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4'
sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4: gzip compressed data
$ sha256sum 'sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4'
a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4 sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4
$ gunzip --to-stdout 'sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4' | sha256sum
5f70bf18a086007016e948b04aed3b82103a36bea41755b6cddfaf10ace3c6ef -
Notice how the 'Content-Encoding' header is set? That's wrong. In this case, wget ignores the header and doesn't gunzip the file before saving it to disk. This behaviour is regressive, and I've gunzipped and checksummed the file to show what will happen with other, more compliant libraries. One such example is Python's "requests", which complies with the 'Content-Encoding' header. For example, check out this simple script:
#!/usr/bin/env python3
import requests
def main():
url = (
'https://rhel-6-8-pulp-2-12/pulp/docker/v2/d8d948e9-e87d-4fa9-be83-f6'
'2ba91210b8/blobs/sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb164'
'22d00e8a7c22955b46d4'
)
with open('blob-decoded', 'wb') as handle:
handle.write(requests.get(url, verify=False).content)
if __name__ == '__main__':
exit(main())
The blob-decoded
file written to disk is gunzipped, as suggested by the 'Content-Encoding' header:
$ ls -1
get_decoded.py
$ ./get_decoded.py
/usr/lib/python3.6/site-packages/urllib3/connectionpool.py:852: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
InsecureRequestWarning)
$ ls -1
blob-decoded
get_decoded.py
$ file blob-decoded
blob-decoded: data
$ sha256sum blob-decoded
5f70bf18a086007016e948b04aed3b82103a36bea41755b6cddfaf10ace3c6ef blob-decoded
Notice the checksum of the file? It's the same as in the case where the file is fetched with wget
and then manually gunzipped.
The long and short of it is that RHEL 6 is incorrectly adding a Content-Encoding: x-gzip
header to docker blobs. No other platforms do this. This is true for the current Pulp 2.12 nightlies. Here's the packages installed on my current RHEL 6 test system:
[root@rhel-6-8-pulp-2-12 ~]# rpm -qa | grep -i httpd
httpd-tools-2.2.15-56.el6_8.3.x86_64
httpd-2.2.15-56.el6_8.3.x86_64
[root@rhel-6-8-pulp-2-12 ~]# rpm -qa | grep -i pulp | sort
mod_wsgi-3.4-2.pulp.el6.x86_64
pulp-admin-client-2.12.2-0.1.alpha.git.17.b101ff0.el6.noarch
pulp-docker-admin-extensions-2.3.1-0.1.alpha.git.5.052c506.el6.noarch
pulp-docker-plugins-2.3.1-0.1.alpha.git.5.052c506.el6.noarch
pulp-puppet-admin-extensions-2.12.2-0.1.alpha.git.2.f338f5d.el6.noarch
pulp-puppet-plugins-2.12.2-0.1.alpha.git.2.f338f5d.el6.noarch
pulp-python-admin-extensions-2.0.1-0.1.alpha.git.6.8c46f3f.el6.noarch
pulp-python-plugins-2.0.1-0.1.alpha.git.6.8c46f3f.el6.noarch
pulp-rpm-admin-extensions-2.12.2-0.1.alpha.git.19.da51b5f.el6.noarch
pulp-rpm-plugins-2.12.2-0.1.alpha.git.19.da51b5f.el6.noarch
pulp-selinux-2.12.2-0.1.alpha.git.17.b101ff0.el6.noarch
pulp-server-2.12.2-0.1.alpha.git.17.b101ff0.el6.noarch
python-isodate-0.5.0-4.pulp.el6.noarch
python-kombu-3.0.33-6.pulp.el6.noarch
python-pulp-bindings-2.12.2-0.1.alpha.git.17.b101ff0.el6.noarch
python-pulp-client-lib-2.12.2-0.1.alpha.git.17.b101ff0.el6.noarch
python-pulp-common-2.12.2-0.1.alpha.git.17.b101ff0.el6.noarch
python-pulp-docker-common-2.3.1-0.1.alpha.git.5.052c506.el6.noarch
python-pulp-oid_validation-2.12.2-0.1.alpha.git.17.b101ff0.el6.noarch
python-pulp-puppet-common-2.12.2-0.1.alpha.git.2.f338f5d.el6.noarch
python-pulp-python-common-2.0.1-0.1.alpha.git.6.8c46f3f.el6.noarch
python-pulp-repoauth-2.12.2-0.1.alpha.git.17.b101ff0.el6.noarch
python-pulp-rpm-common-2.12.2-0.1.alpha.git.19.da51b5f.el6.noarch
python-pulp-streamer-2.12.2-0.1.alpha.git.17.b101ff0.el6.noarch
Updated by pthomas@redhat.com over 7 years ago
Adding the smash test from
https://pulp.plan.io/issues/1781 to cover this issue as well.
Updated by bmbouter over 5 years ago
- Status changed from NEW to CLOSED - WONTFIX
Updated by bmbouter over 5 years ago
Pulp 2 is approaching maintenance mode, and this Pulp 2 ticket is not being actively worked on. As such, it is being closed as WONTFIX. Pulp 2 is still accepting contributions though, so if you want to contribute a fix for this ticket, please reopen or comment on it. If you don't have permissions to reopen this ticket, or you want to discuss an issue, please reach out via the developer mailing list.