Issue #1868
Pulp on RHEL 6 serves wrong files (closed)
Description
Summary of issues:
- On RHEL 6, binary blobs are incorrectly being `gzip` decompressed before being served to the user.
- On at least Fedora 22 and Fedora 23, the "content-encoding" header is missing. (It should be set to "gzip".)
Suppose you create a Docker repository with an ID and a feed, then sync and publish it, using a script like this:
```bash
#!/usr/bin/env bash
set -euo pipefail
pulp-admin docker repo create --repo-id ichi10 \
    --feed https://registry-1.docker.io \
    --upstream-name 'library/busybox'
pulp-admin docker repo sync run --repo-id ichi10
```
Once this is done, the published files can be fetched from Pulp. They include:
- Tags: `/pulp/docker/v2/ichi10/tags/list`
- Manifest files: `/pulp/docker/v2/ichi10/manifests/{tag_name}`
- Binary blobs: `/pulp/docker/v2/ichi10/blobs/{blob_name}`
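The three URL patterns above can be captured in a small helper. This is only a sketch: `published_url` and its parameters are hypothetical names, and the function simply joins the paths listed above onto a base URL such as `https://localhost`.

```python
def published_url(base, repo_id, kind, name=None):
    """Build the URL of a file published for a Docker repository in Pulp.

    Hypothetical helper: ``kind`` is 'tags', 'manifests' or 'blobs',
    mirroring the three path patterns listed above.
    """
    prefix = '{0}/pulp/docker/v2/{1}'.format(base.rstrip('/'), repo_id)
    if kind == 'tags':
        return prefix + '/tags/list'
    if kind in ('manifests', 'blobs'):
        if not name:
            raise ValueError('a tag or blob name is required for ' + kind)
        return '{0}/{1}/{2}'.format(prefix, kind, name)
    raise ValueError('unknown kind: ' + kind)
```

For example, `published_url('https://localhost', 'ichi10', 'manifests', 'latest')` yields the manifest URL for the `latest` tag.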
Here's a Python script that will walk through all of the manifests in the published repository and fetch the blobs listed in each manifest. It depends only on Requests.
```python
#!/usr/bin/env python
from __future__ import print_function, unicode_literals

import hashlib

import requests

KWARGS = {'auth': ('admin', 'admin'), 'verify': False}
PULP = 'https://localhost'

# get tags
path = PULP + '/pulp/docker/v2/ichi10/tags/list'
response = requests.get(path, **KWARGS)
response.raise_for_status()
tags = response.json()['tags']
tags.sort()
print('tags: {0}'.format(tags))

for tag in tags:
    # get manifest
    path = PULP + '/pulp/docker/v2/ichi10/manifests/{0}'.format(tag)
    response = requests.get(path, **KWARGS)
    response.raise_for_status()
    manifest = response.json()
    print()
    print('tag: ' + tag)

    # fetch each blob listed in the manifest and checksum it
    for i, fs_layer in enumerate(manifest['fsLayers']):
        path = PULP + '/pulp/docker/v2/ichi10/blobs/{0}'.format(fs_layer['blobSum'])
        response = requests.get(path, **KWARGS)
        response.raise_for_status()
        checksum = hashlib.sha256(response.content).hexdigest()
        print('fs layer: {0}'.format(i))
        print('advertised checksum: ' + fs_layer['blobSum'])
        print('calculated checksum: ' + checksum)
        # The following is extremely useful for debugging, but not necessary for
        # initial illustration of the issue.
        # pprint(dict(response.headers))  # add `from pprint import pprint`
        # with open(fs_layer['blobSum'], 'wb') as handle:
        #     handle.write(response.content)
```
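The script only prints the two checksums side by side. A minimal sketch of turning that comparison into a pass/fail check follows. `blob_matches` is a hypothetical helper; it assumes every `blobSum` has the `algorithm:hexdigest` form seen in the output and passes the algorithm prefix to `hashlib.new`.

```python
import hashlib


def blob_matches(blob_sum, data):
    """Return True if ``data`` hashes to the digest advertised in ``blob_sum``.

    ``blob_sum`` is assumed to look like 'sha256:<hex digest>', as in the
    manifests' fsLayers entries; the prefix selects the hashlib algorithm.
    """
    algorithm, _, expected = blob_sum.partition(':')
    if not expected:
        raise ValueError('expected "<algorithm>:<digest>", got ' + blob_sum)
    return hashlib.new(algorithm, data).hexdigest() == expected.lower()
```

Inside the inner loop, `blob_matches(fs_layer['blobSum'], response.content)` would return False for every mismatched layer in the RHEL 6 output.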
The script above produces output like the following on RHEL 6:
```
tags: [u'1', u'1-glibc', u'1-musl', u'1-ubuntu', u'1-uclibc', u'1.21-ubuntu', u'1.21.0-ubuntu', u'1.23', u'1.23.2', u'1.24', u'1.24-glibc', u'1.24-musl', u'1.24-uclibc', u'1.24.0', u'1.24.1', u'1.24.1-glibc', u'1.24.1-musl', u'1.24.1-uclibc', u'1.24.2', u'1.24.2-glibc', u'1.24.2-musl', u'1.24.2-uclibc', u'buildroot-2013.08.1', u'buildroot-2014.02', u'glibc', u'latest', u'musl', u'ubuntu', u'ubuntu-12.04', u'ubuntu-14.04', u'uclibc']
[…]
tag: ubuntu-14.04
fs layer: 0
advertised checksum: sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4
calculated checksum: 5f70bf18a086007016e948b04aed3b82103a36bea41755b6cddfaf10ace3c6ef
fs layer: 1
advertised checksum: sha256:300273678d063c0a817349518a059c2635fc72f159dd25112ccb92ed5a22ca05
calculated checksum: 5dbcf0efe4f2d6851aed9becc810370b6c7ebf62857dcc2046561bedf59f125a
fs layer: 2
advertised checksum: sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4
calculated checksum: 5f70bf18a086007016e948b04aed3b82103a36bea41755b6cddfaf10ace3c6ef

tag: uclibc
fs layer: 0
advertised checksum: sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4
calculated checksum: 5f70bf18a086007016e948b04aed3b82103a36bea41755b6cddfaf10ace3c6ef
fs layer: 1
advertised checksum: sha256:385e281300cc6d88bdd155e0931fbdfbb1801c2b0265340a40481ee2b733ae66
calculated checksum: 1834950e52ce4d5a88a1bbd131c537f4d0e56d10ff0dd69e66be3b7dfa9df7e6
```
As you can see, the advertised and calculated checksums differ. That's not good! Let's take a look at uclibc layer 0 across the three platforms.
Fedora 22:
```
advertised checksum: sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4
calculated checksum: a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4
```
Fedora 23:
```
advertised checksum: sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4
calculated checksum: a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4
```
RHEL 6.7:
```
advertised checksum: sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4
calculated checksum: 5f70bf18a086007016e948b04aed3b82103a36bea41755b6cddfaf10ace3c6ef
```
So the files served up by RHEL 6.7 are wrong. Is this because the on-disk files are corrupt? No. The on-disk files are fine:
```
$ sha256sum $(find / -name 'sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4' 2>/dev/null | sort)
a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4 /var/lib/pulp/content/units/docker_blob/12/86843087e8774c31f670616e3c7e693a725d5615fe02d1043030e936d0e4f9/sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4
a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4 /var/lib/pulp/published/docker/v2/master/ichi10/1461686630.64/blobs/sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4
```
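The same on-disk check can be done from Python without loading a whole layer into memory. This is a sketch; `file_sha256` is a hypothetical name, and either path from the `find` output above could be passed to it.

```python
import hashlib


def file_sha256(path, chunk_size=1024 * 1024):
    """Return the hex SHA-256 digest of the file at ``path``, read in chunks."""
    digest = hashlib.sha256()
    with open(path, 'rb') as handle:
        # iter() with a sentinel stops at EOF, where read() returns b''
        for chunk in iter(lambda: handle.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()
```

Reading in fixed-size chunks keeps memory use flat even for large image layers, which matters on a Pulp server holding many blobs.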
Because the on-disk checksums are correct, the corruption must occur somewhere after the file is read off disk and before it is delivered to the client. What do the served files look like? I downloaded the blob from each platform and placed the copies into separate directories:
```
$ tree
.
├── f22
│   └── sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4
├── f23
│   └── sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4
└── r67
    └── sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4

3 directories, 3 files
$ file */*
f22/sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4: gzip compressed data
f23/sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4: gzip compressed data
r67/sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4: data
```
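`file` recognises gzip data by its two-byte magic number, `1f 8b` (RFC 1952). A rough Python equivalent for checking downloaded blobs might look like this. `looks_gzipped` and `file_looks_gzipped` are hypothetical helpers and, like this particular `file` check, they inspect only the header, not whether the whole stream is valid.

```python
GZIP_MAGIC = b'\x1f\x8b'  # ID1 and ID2 bytes of a gzip member header (RFC 1952)


def looks_gzipped(data):
    """Return True if ``data`` begins with the gzip magic number."""
    return data[:2] == GZIP_MAGIC


def file_looks_gzipped(path):
    """Read only the first two bytes of ``path`` and test them for gzip magic."""
    with open(path, 'rb') as handle:
        return looks_gzipped(handle.read(2))
```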
Aha! Now we're getting somewhere: the Fedora copies are gzip data, and the RHEL 6.7 copy is not. What happens if I decompress the Fedora copies and then calculate their checksums?
```
$ gzip --decompress < f22/sha256\:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4 > f22/decompressed
$ gzip --decompress < f23/sha256\:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4 > f23/decompressed
$ tree
.
├── f22
│   ├── decompressed
│   └── sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4
├── f23
│   ├── decompressed
│   └── sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4
└── r67
    └── sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4

3 directories, 5 files
$ sha256sum */* | sort
5f70bf18a086007016e948b04aed3b82103a36bea41755b6cddfaf10ace3c6ef f22/decompressed
5f70bf18a086007016e948b04aed3b82103a36bea41755b6cddfaf10ace3c6ef f23/decompressed
5f70bf18a086007016e948b04aed3b82103a36bea41755b6cddfaf10ace3c6ef r67/sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4
a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4 f22/sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4
a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4 f23/sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4
```
The RHEL 6.7 download has the same checksum as the decompressed Fedora downloads. In other words, RHEL 6.7 decompresses files read off disk before serving them to clients. This is problematic because:
- I expect Pulp's behaviour to be consistent across platforms. If a gzip file is served up by Fedora 22, Fedora 23 and RHEL 7, then RHEL 6 should also serve up a gzip file.
- The file name "sha256:some-checksum" indicates that taking the sha256 checksum of the file produces "some-checksum", but that isn't so on RHEL 6.
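The decompress-and-checksum experiment above can be reproduced in Python. In this sketch, `checksum_table` is a hypothetical helper: for each downloaded copy it records the SHA-256 of the bytes as received and, when the bytes are gzip data, the SHA-256 after decompression, which mirrors the `sha256sum */*` listing.

```python
import gzip
import hashlib
import zlib


def sha256_hex(data):
    return hashlib.sha256(data).hexdigest()


def checksum_table(blobs):
    """Map each label to (sha256 of the bytes, sha256 after gunzip or None).

    ``blobs`` maps a label such as 'f22' or 'r67' to the downloaded bytes.
    The second digest is None when the bytes are not valid gzip data.
    """
    table = {}
    for name, data in blobs.items():
        inflated = None
        if data[:2] == b'\x1f\x8b':  # gzip magic number
            try:
                inflated = gzip.decompress(data)
            except (OSError, EOFError, zlib.error):
                inflated = None  # gzip header present but the stream is damaged
        table[name] = (sha256_hex(data),
                       sha256_hex(inflated) if inflated is not None else None)
    return table
```

The pattern in the listing corresponds to `table['f22'][1] == table['r67'][0]`: the decompressed Fedora copy and the raw RHEL 6.7 copy hash the same.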
All this might be excusable if the HTTP headers correctly indicated what's going on. They don't. In fact, all three platforms illustrated here send incorrect content-encoding headers. Check it out:
```
# Fedora 22
# Should have 'content-encoding': 'gzip'
{'Accept-Ranges': 'bytes',
 'Connection': 'Keep-Alive',
 'Content-Length': '32',
 'Date': 'Tue, 26 Apr 2016 18:55:08 GMT',
 'Docker-Distribution-API-Version': 'registry/2.0',
 'ETag': '"20-53162a9c8674b"',
 'Keep-Alive': 'timeout=5, max=10000',
 'Last-Modified': 'Tue, 26 Apr 2016 12:45:08 GMT',
 'Server': 'Apache/2.4.18 (Fedora) OpenSSL/1.0.1k-fips mod_wsgi/4.4.8 Python/2.7.10'}

# Fedora 23
# Should have 'content-encoding': 'gzip'
{'Accept-Ranges': 'bytes',
 'Connection': 'Keep-Alive',
 'Content-Length': '32',
 'Date': 'Tue, 26 Apr 2016 18:55:06 GMT',
 'Docker-Distribution-API-Version': 'registry/2.0',
 'ETag': '"20-53162aa268f5b"',
 'Keep-Alive': 'timeout=5, max=10000',
 'Last-Modified': 'Tue, 26 Apr 2016 12:45:14 GMT',
 'Server': 'Apache/2.4.18 (Fedora) OpenSSL/1.0.2g-fips mod_wsgi/4.4.8 Python/2.7.11'}

# RHEL 6.7
# Should not have 'content-encoding' header
{'accept-ranges': 'bytes',
 'connection': 'close',
 'content-encoding': 'gzip',
 'content-length': '32',
 'content-type': 'text/plain; charset=UTF-8',
 'date': 'Tue, 26 Apr 2016 18:55:10 GMT',
 'docker-distribution-api-version': 'registry/2.0',
 'etag': '"41422-20-5316386c0159f"',
 'last-modified': 'Tue, 26 Apr 2016 13:46:55 GMT',
 'server': 'Apache/2.2.15 (Red Hat)'}
```
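When comparing platforms it can help to tabulate what each response declares against what it actually carries. Requests transparently decodes a gzip `Content-Encoding` when it builds `response.content`, and the script above computes its checksums from `response.content`, so this check needs the undecoded bytes. With Requests, one way to get them is `requests.get(url, stream=True, **KWARGS)` followed by `response.raw.read(decode_content=False)`. The helper below, `describe_response`, is a hypothetical sketch that takes any header mapping (matched case-insensitively, since the Fedora and RHEL dumps above differ in case) plus those raw bytes.

```python
GZIP_MAGIC = b'\x1f\x8b'


def describe_response(headers, raw_body):
    """Summarise what a response declares versus what it contains.

    ``headers`` is any mapping of header names to values; the lookup is
    case-insensitive. ``raw_body`` must be the undecoded bytes from the wire.
    Returns (declared Content-Encoding or None, body starts with gzip magic,
    declared Content-Length or None, actual body length).
    """
    lowered = {name.lower(): value for name, value in headers.items()}
    encoding = lowered.get('content-encoding')
    length = lowered.get('content-length')
    return (
        encoding.strip().lower() if encoding is not None else None,
        raw_body[:2] == GZIP_MAGIC,
        int(length) if length is not None else None,
        len(raw_body),
    )
```

Comparing the declared `Content-Length` with the length of the bytes in hand is a quick way to tell whether a client library has already decoded the body, because `Content-Length` counts the encoded bytes.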
Finally, for debugging purposes, here are the installed pulp-docker packages on each platform:
```
# rpm -qa | grep docker # fedora 22
python-pulp-docker-common-2.0.1-0.1.beta.fc22.noarch
pulp-docker-plugins-2.0.1-0.1.beta.fc22.noarch
pulp-docker-admin-extensions-2.0.1-0.1.beta.fc22.noarch

# rpm -qa | grep docker # fedora 23
pulp-docker-plugins-2.0.1-0.1.beta.fc23.noarch
python-pulp-docker-common-2.0.1-0.1.beta.fc23.noarch
pulp-docker-admin-extensions-2.0.1-0.1.beta.fc23.noarch

$ rpm -qa | grep docker # rhel 6
python-pulp-docker-common-2.0.1-0.1.beta.el6.noarch
pulp-docker-plugins-2.0.1-0.1.beta.el6.noarch
pulp-docker-admin-extensions-2.0.1-0.1.beta.el6.noarch
```