Issue #1868

closed

Pulp on RHEL 6 serves wrong files

Added by Ichimonji10 over 8 years ago. Updated over 5 years ago.

Status: CLOSED - DUPLICATE
Priority: Normal
Assignee: -
Start date:
Due date:
Estimated time:
Severity: 2. Medium
Version - Docker: master
Platform Release:
Target Release - Docker:
OS: RHEL 6
Triaged: Yes
Groomed: No
Sprint Candidate: No
Tags: Pulp 2
Sprint:
Quarter:

Description

Summary of issues:

  • On RHEL 6, binary blobs are incorrectly being `gzip` decompressed before being served to the user.
  • On at least Fedora 22 and Fedora 23, the "content-encoding" header is missing. (It should be set to "gzip".)

Let's say you create a Docker repository with an ID and feed, sync it and publish it, with a script like this:

#!/usr/bin/env bash
set -euo pipefail

pulp-admin docker repo create --repo-id ichi10 \
    --feed https://registry-1.docker.io \
    --upstream-name 'library/busybox'
pulp-admin docker repo sync run --repo-id ichi10

This done, it's possible to fetch the published files from Pulp. The published files will include:

  • Tags: '/pulp/docker/v2/ichi10/tags/list'
  • Manifest files: '/pulp/docker/v2/ichi10/manifests/{tag_name}'
  • Binary blobs: '/pulp/docker/v2/ichi10/blobs/{blob_name}'
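
The three URL patterns above can be captured in a few small helpers. This is a minimal sketch: the function names and the `base` parameter are illustrative and are not part of Pulp or its client tooling.

```python
def tags_url(base, repo_id):
    """URL of the tag list for a published Docker v2 repository."""
    return '{0}/pulp/docker/v2/{1}/tags/list'.format(base, repo_id)


def manifest_url(base, repo_id, tag_name):
    """URL of the manifest for one tag."""
    return '{0}/pulp/docker/v2/{1}/manifests/{2}'.format(
        base, repo_id, tag_name)


def blob_url(base, repo_id, blob_name):
    """URL of one binary blob, e.g. blob_name='sha256:<hex digest>'."""
    return '{0}/pulp/docker/v2/{1}/blobs/{2}'.format(
        base, repo_id, blob_name)


if __name__ == '__main__':
    print(tags_url('https://localhost', 'ichi10'))
    print(manifest_url('https://localhost', 'ichi10', 'latest'))
```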

Here's a Python script that will walk through all of the manifests in the published repository and fetch the blobs listed in each manifest. It depends only on Requests.

#!/usr/bin/env python
from __future__ import print_function, unicode_literals

import hashlib
import requests

KWARGS = {'auth': ('admin', 'admin'), 'verify': False}
PULP = 'https://localhost'

# get tags
path = PULP + '/pulp/docker/v2/ichi10/tags/list'
response = requests.get(path, **KWARGS)
response.raise_for_status()
tags = response.json()['tags']
tags.sort()
print('tags: {0}'.format(tags))

for tag in tags:
    # get manifest
    path = PULP + '/pulp/docker/v2/ichi10/manifests/{0}'.format(tag)
    response = requests.get(path, **KWARGS)
    response.raise_for_status()
    manifest = response.json()

    print()
    print('tag: ' + tag)
    for i, fs_layer in enumerate(manifest['fsLayers']):
        path = PULP + '/pulp/docker/v2/ichi10/blobs/{0}'.format(fs_layer['blobSum'])
        response = requests.get(path, **KWARGS)
        response.raise_for_status()
        checksum = hashlib.sha256(response.content).hexdigest()
        print('fs layer: {0}'.format(i))
        print('advertised checksum: ' + fs_layer['blobSum'])
        print('calculated checksum:        ' + checksum)

        # The following is extremely useful for debugging, but not necessary for
        # initial illustration of the issue.
        # pprint(dict(response.headers))  # add `from pprint import pprint`
        # with open(fs_layer['blobSum'], 'wb') as handle:
        #     handle.write(response.content)

This produces output like the following on RHEL 6:

tags: [u'1', u'1-glibc', u'1-musl', u'1-ubuntu', u'1-uclibc', u'1.21-ubuntu', u'1.21.0-ubuntu', u'1.23', u'1.23.2', u'1.24', u'1.24-glibc', u'1.24-musl', u'1.24-uclibc', u'1.24.0', u'1.24.1', u'1.24.1-glibc', u'1.24.1-musl', u'1.24.1-uclibc', u'1.24.2', u'1.24.2-glibc', u'1.24.2-musl', u'1.24.2-uclibc', u'buildroot-2013.08.1', u'buildroot-2014.02', u'glibc', u'latest', u'musl', u'ubuntu', u'ubuntu-12.04', u'ubuntu-14.04', u'uclibc']

[…]

tag: ubuntu-14.04
fs layer: 0
advertised checksum: sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4
calculated checksum:        5f70bf18a086007016e948b04aed3b82103a36bea41755b6cddfaf10ace3c6ef
fs layer: 1
advertised checksum: sha256:300273678d063c0a817349518a059c2635fc72f159dd25112ccb92ed5a22ca05
calculated checksum:        5dbcf0efe4f2d6851aed9becc810370b6c7ebf62857dcc2046561bedf59f125a
fs layer: 2
advertised checksum: sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4
calculated checksum:        5f70bf18a086007016e948b04aed3b82103a36bea41755b6cddfaf10ace3c6ef

tag: uclibc
fs layer: 0
advertised checksum: sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4
calculated checksum:        5f70bf18a086007016e948b04aed3b82103a36bea41755b6cddfaf10ace3c6ef
fs layer: 1
advertised checksum: sha256:385e281300cc6d88bdd155e0931fbdfbb1801c2b0265340a40481ee2b733ae66
calculated checksum:        1834950e52ce4d5a88a1bbd131c537f4d0e56d10ff0dd69e66be3b7dfa9df7e6

As you can see, the advertised and calculated checksums differ. That's not good! Let's take a look at uclibc layer 0 across the three platforms.

Fedora 22
advertised checksum: sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4
calculated checksum:        a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4

Fedora 23
advertised checksum: sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4
calculated checksum:        a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4

RHEL 6.7
advertised checksum: sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4
calculated checksum:        5f70bf18a086007016e948b04aed3b82103a36bea41755b6cddfaf10ace3c6ef

As you can see, the actual files served up by RHEL 6.7 are wrong. Now, is this because the on-disk files are corrupt? No. The on-disk files are OK.

$ sha256sum $(find / -name 'sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4' 2>/dev/null | sort)
a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4  /var/lib/pulp/content/units/docker_blob/12/86843087e8774c31f670616e3c7e693a725d5615fe02d1043030e936d0e4f9/sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4
a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4  /var/lib/pulp/published/docker/v2/master/ichi10/1461686630.64/blobs/sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4
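
The on-disk check above can also be sketched in Python: hash a file in chunks and compare the result against the digest embedded in its file name. The helper names here are hypothetical, and the sketch assumes the file name has the form `sha256:<hex digest>`, as the blobs above do.

```python
import hashlib
import os


def file_sha256(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, 'rb') as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def name_matches_content(path):
    """True if a file named 'sha256:<hex>' actually hashes to <hex>."""
    algorithm, _, expected = os.path.basename(path).partition(':')
    if algorithm != 'sha256':
        raise ValueError('unsupported digest algorithm: ' + algorithm)
    return file_sha256(path) == expected
```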

This indicates that corruption occurs somewhere after the file is read off disk and before it's delivered to the client. What do the files look like? I downloaded each of these files and placed them into directories:

$ tree
.
├── f22
│   └── sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4
├── f23
│   └── sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4
└── r67
    └── sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4

3 directories, 3 files
$ file */*
f22/sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4: gzip compressed data
f23/sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4: gzip compressed data
r67/sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4: data

Aha! Now we're getting somewhere. What happens if I decompress them and calculate their checksums then?

$ gzip --decompress < f22/sha256\:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4 > f22/decompressed
$ gzip --decompress < f23/sha256\:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4 > f23/decompressed
$ tree
.
├── f22
│   ├── decompressed
│   └── sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4
├── f23
│   ├── decompressed
│   └── sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4
└── r67
    └── sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4

3 directories, 5 files
$ sha256sum */* | sort
5f70bf18a086007016e948b04aed3b82103a36bea41755b6cddfaf10ace3c6ef  f22/decompressed
5f70bf18a086007016e948b04aed3b82103a36bea41755b6cddfaf10ace3c6ef  f23/decompressed
5f70bf18a086007016e948b04aed3b82103a36bea41755b6cddfaf10ace3c6ef  r67/sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4
a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4  f22/sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4
a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4  f23/sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4

As you can see, RHEL 6.7 decompresses files off disk before serving them to clients. This is problematic because:

  • I expect Pulp's behaviour to be consistent across platforms. If a gzip file is served up by Fedora 22, Fedora 23 and RHEL 7, then RHEL 6 should also serve up a gzip file.
  • The file name of "sha256:some-checksum" indicates that one can take a sha256 checksum of the file and produce "some-checksum", but that isn't so on RHEL 6.
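
The manual comparison above (run `file`, decompress, re-hash) can be approximated in one function. This is a minimal sketch written for Python 3; `classify_served_blob` and its return values are hypothetical names chosen for illustration.

```python
import gzip
import hashlib
import zlib

GZIP_MAGIC = b'\x1f\x8b'  # first two bytes of every gzip stream


def sha256_hex(data):
    """Return the hex SHA-256 digest of a bytes object."""
    return hashlib.sha256(data).hexdigest()


def classify_served_blob(on_disk, served):
    """Compare the bytes a client received with the on-disk blob.

    Returns 'identical' if they match, 'decompressed' if the served
    bytes equal the gunzipped on-disk blob, and 'different' otherwise.
    """
    if served == on_disk:
        return 'identical'
    if on_disk[:2] == GZIP_MAGIC:
        try:
            if gzip.decompress(on_disk) == served:
                return 'decompressed'
        except (OSError, EOFError, zlib.error):
            # The on-disk bytes only looked like gzip; fall through.
            pass
    return 'different'
```

Given the on-disk blob and the three downloaded copies, the f22 and f23 files would be expected to classify as 'identical' and the r67 file as 'decompressed', matching the checksums listed above.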

Now, all this might be excusable if the HTTP headers correctly indicated what's going on. That's not the case - and in fact, all three platforms illustrated here have incorrect content-encoding headers! Check it out:

# Fedora 22
# should have 'content-encoding': 'gzip'
{'Accept-Ranges': 'bytes',
 'Connection': 'Keep-Alive',
 'Content-Length': '32',
 'Date': 'Tue, 26 Apr 2016 18:55:08 GMT',
 'Docker-Distribution-API-Version': 'registry/2.0',
 'ETag': '"20-53162a9c8674b"',
 'Keep-Alive': 'timeout=5, max=10000',
 'Last-Modified': 'Tue, 26 Apr 2016 12:45:08 GMT',
 'Server': 'Apache/2.4.18 (Fedora) OpenSSL/1.0.1k-fips mod_wsgi/4.4.8 Python/2.7.10'}

# Fedora 23
# should have 'content-encoding': 'gzip'
{'Accept-Ranges': 'bytes',
 'Connection': 'Keep-Alive',
 'Content-Length': '32',
 'Date': 'Tue, 26 Apr 2016 18:55:06 GMT',
 'Docker-Distribution-API-Version': 'registry/2.0',
 'ETag': '"20-53162aa268f5b"',
 'Keep-Alive': 'timeout=5, max=10000',
 'Last-Modified': 'Tue, 26 Apr 2016 12:45:14 GMT',
 'Server': 'Apache/2.4.18 (Fedora) OpenSSL/1.0.2g-fips mod_wsgi/4.4.8 Python/2.7.11'}

# RHEL 6.7
# should not have 'content-encoding' header
{'accept-ranges': 'bytes',
 'connection': 'close',
 'content-encoding': 'gzip',
 'content-length': '32',
 'content-type': 'text/plain; charset=UTF-8',
 'date': 'Tue, 26 Apr 2016 18:55:10 GMT',
 'docker-distribution-api-version': 'registry/2.0',
 'etag': '"41422-20-5316386c0159f"',
 'last-modified': 'Tue, 26 Apr 2016 13:46:55 GMT',
 'server': 'Apache/2.2.15 (Red Hat)'}
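
When reading these headers, it may help to separate the bytes on the wire from the bytes the client hands back. Requests decodes a body transparently when a response carries `content-encoding: gzip`, so `response.content` is not necessarily what the server sent; note that all three responses above report a content length of 32. The sketch below, written for Python 3, reads the undecoded body and summarizes both views. `fetch_raw` and `summarize` are hypothetical helper names, and `fetch_raw` assumes Requests is installed.

```python
import gzip
import hashlib


def fetch_raw(url, **kwargs):
    """Return (headers, body) with the body exactly as sent on the wire.

    Hypothetical helper. Requests is imported lazily so that summarize()
    below stays usable without it.
    """
    import requests

    response = requests.get(url, stream=True, **kwargs)
    response.raise_for_status()
    # decode_content=False asks urllib3 not to undo any content-encoding.
    body = response.raw.read(decode_content=False)
    return dict(response.headers), body


def summarize(headers, raw_body):
    """Describe a response body before and after content decoding."""
    lowered = dict((key.lower(), value) for key, value in headers.items())
    encoding = lowered.get('content-encoding', '').lower()
    decoded = gzip.decompress(raw_body) if encoding == 'gzip' else raw_body
    return {
        'content_encoding': encoding or None,
        'wire_length': len(raw_body),
        'wire_sha256': hashlib.sha256(raw_body).hexdigest(),
        'decoded_sha256': hashlib.sha256(decoded).hexdigest(),
    }
```

For example, `summarize(*fetch_raw(url, auth=('admin', 'admin'), verify=False))` for a blob URL would show whether the wire checksum or the decoded checksum matches the advertised one on each platform.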

Finally, for debugging purposes:

# rpm -qa | grep docker  # fedora 22
python-pulp-docker-common-2.0.1-0.1.beta.fc22.noarch
pulp-docker-plugins-2.0.1-0.1.beta.fc22.noarch
pulp-docker-admin-extensions-2.0.1-0.1.beta.fc22.noarch

# rpm -qa | grep docker  # fedora 23
pulp-docker-plugins-2.0.1-0.1.beta.fc23.noarch
python-pulp-docker-common-2.0.1-0.1.beta.fc23.noarch
pulp-docker-admin-extensions-2.0.1-0.1.beta.fc23.noarch

$ rpm -qa | grep docker  # rhel 6
python-pulp-docker-common-2.0.1-0.1.beta.el6.noarch
pulp-docker-plugins-2.0.1-0.1.beta.el6.noarch
pulp-docker-admin-extensions-2.0.1-0.1.beta.el6.noarch

Related issues

Is duplicate of Pulp - Issue #1781: Files ending in .gz are delivered with incorrect content headers (CLOSED - CURRENTRELEASE, semyers)
