Issue #1868

closed

Pulp on RHEL 6 serves wrong files

Added by Ichimonji10 over 8 years ago. Updated over 5 years ago.

Status: CLOSED - DUPLICATE
Priority: Normal
Assignee: -
Start date:
Due date:
Estimated time:
Severity: 2. Medium
Version - Docker: master
Platform Release:
Target Release - Docker:
OS: RHEL 6
Triaged: Yes
Groomed: No
Sprint Candidate: No
Tags: Pulp 2
Sprint:
Quarter:

Description

Summary of issues:

  • On RHEL 6, binary blobs are incorrectly being `gzip` decompressed before being served to the user.
  • On at least Fedora 22 and Fedora 23, the "content-encoding" header is missing. (It should be set to "gzip".)

Let's say you create a Docker repository with an ID and feed, sync it and publish it, with a script like this:

#!/usr/bin/env bash
set -euo pipefail

pulp-admin docker repo create --repo-id ichi10 \
    --feed https://registry-1.docker.io \
    --upstream-name 'library/busybox'
pulp-admin docker repo sync run --repo-id ichi10

This done, it's possible to fetch the published files from Pulp. The published files will include:

  • Tags: '/pulp/docker/v2/ichi10/tags/list'
  • Manifest files: '/pulp/docker/v2/ichi10/manifests/{tag_name}'
  • Binary blobs: '/pulp/docker/v2/ichi10/blobs/{blob_name}'

Here's a Python script that will walk through all of the manifests in the published repository and fetch the blobs listed in each manifest. It depends only on Requests.

#!/usr/bin/env python
from __future__ import print_function, unicode_literals

import hashlib
import requests

KWARGS = {'auth': ('admin', 'admin'), 'verify': False}
PULP = 'https://localhost'

# get tags
path = PULP + '/pulp/docker/v2/ichi10/tags/list'
response = requests.get(path, **KWARGS)
response.raise_for_status()
tags = response.json()['tags']
tags.sort()
print('tags: {0}'.format(tags))

for tag in tags:
    # get manifest
    path = PULP + '/pulp/docker/v2/ichi10/manifests/{0}'.format(tag)
    response = requests.get(path, **KWARGS)
    response.raise_for_status()
    manifest = response.json()

    print()
    print('tag: ' + tag)
    for i, fs_layer in enumerate(manifest['fsLayers']):
        path = PULP + '/pulp/docker/v2/ichi10/blobs/{0}'.format(fs_layer['blobSum'])
        response = requests.get(path, **KWARGS)
        response.raise_for_status()
        checksum = hashlib.sha256(response.content).hexdigest()
        print('fs layer: {0}'.format(i))
        print('advertised checksum: ' + fs_layer['blobSum'])
        print('calculated checksum:        ' + checksum)

        # The following is extremely useful for debugging, but not necessary for
        # initial illustration of the issue.
        # pprint(dict(response.headers))  # add `from pprint import pprint`
        # with open(fs_layer['blobSum'], 'wb') as handle:
        #     handle.write(response.content)

This produces output like the following on RHEL 6:

tags: [u'1', u'1-glibc', u'1-musl', u'1-ubuntu', u'1-uclibc', u'1.21-ubuntu', u'1.21.0-ubuntu', u'1.23', u'1.23.2', u'1.24', u'1.24-glibc', u'1.24-musl', u'1.24-uclibc', u'1.24.0', u'1.24.1', u'1.24.1-glibc', u'1.24.1-musl', u'1.24.1-uclibc', u'1.24.2', u'1.24.2-glibc', u'1.24.2-musl', u'1.24.2-uclibc', u'buildroot-2013.08.1', u'buildroot-2014.02', u'glibc', u'latest', u'musl', u'ubuntu', u'ubuntu-12.04', u'ubuntu-14.04', u'uclibc']

[…]

tag: ubuntu-14.04
fs layer: 0
advertised checksum: sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4
calculated checksum:        5f70bf18a086007016e948b04aed3b82103a36bea41755b6cddfaf10ace3c6ef
fs layer: 1
advertised checksum: sha256:300273678d063c0a817349518a059c2635fc72f159dd25112ccb92ed5a22ca05
calculated checksum:        5dbcf0efe4f2d6851aed9becc810370b6c7ebf62857dcc2046561bedf59f125a
fs layer: 2
advertised checksum: sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4
calculated checksum:        5f70bf18a086007016e948b04aed3b82103a36bea41755b6cddfaf10ace3c6ef

tag: uclibc
fs layer: 0
advertised checksum: sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4
calculated checksum:        5f70bf18a086007016e948b04aed3b82103a36bea41755b6cddfaf10ace3c6ef
fs layer: 1
advertised checksum: sha256:385e281300cc6d88bdd155e0931fbdfbb1801c2b0265340a40481ee2b733ae66
calculated checksum:        1834950e52ce4d5a88a1bbd131c537f4d0e56d10ff0dd69e66be3b7dfa9df7e6

As you can see, the advertised and calculated checksums differ. That's not good! Let's take a look at uclibc layer 0 across the three platforms.

Fedora 22
advertised checksum: sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4
calculated checksum:        a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4

Fedora 23
advertised checksum: sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4
calculated checksum:        a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4

RHEL 6.7
advertised checksum: sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4
calculated checksum:        5f70bf18a086007016e948b04aed3b82103a36bea41755b6cddfaf10ace3c6ef

As you can see, the actual files served up by RHEL 6.7 are wrong. Now, is this because the on-disk files are corrupt? No. The on-disk files are OK.

$ sha256sum $(find / -name 'sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4' 2>/dev/null | sort)
a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4  /var/lib/pulp/content/units/docker_blob/12/86843087e8774c31f670616e3c7e693a725d5615fe02d1043030e936d0e4f9/sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4
a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4  /var/lib/pulp/published/docker/v2/master/ichi10/1461686630.64/blobs/sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4

This indicates that corruption occurs somewhere after the file is read off disk and before it's delivered to the client. What do the files look like? I downloaded each of these files and placed them into directories:

$ tree
.
├── f22
│   └── sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4
├── f23
│   └── sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4
└── r67
    └── sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4

3 directories, 3 files
$ file */*
f22/sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4: gzip compressed data
f23/sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4: gzip compressed data
r67/sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4: data

Aha! Now we're getting somewhere. What happens if I decompress them and calculate their checksums then?

$ gzip --decompress < f22/sha256\:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4 > f22/decompressed
$ gzip --decompress < f23/sha256\:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4 > f23/decompressed
$ tree
.
├── f22
│   ├── decompressed
│   └── sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4
├── f23
│   ├── decompressed
│   └── sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4
└── r67
    └── sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4

3 directories, 5 files
$ sha256sum */* | sort
5f70bf18a086007016e948b04aed3b82103a36bea41755b6cddfaf10ace3c6ef  f22/decompressed
5f70bf18a086007016e948b04aed3b82103a36bea41755b6cddfaf10ace3c6ef  f23/decompressed
5f70bf18a086007016e948b04aed3b82103a36bea41755b6cddfaf10ace3c6ef  r67/sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4
a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4  f22/sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4
a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4  f23/sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4
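
The same comparison can be sketched in Python using only the standard library. This is a rough illustration rather than part of the original investigation: the function name `inspect_blob` is hypothetical, the payload is a stand-in for a real layer, and gzip data is detected by its two-byte magic number instead of by calling `file`. It assumes Python 3, unlike the script above.

```python
import gzip
import hashlib

GZIP_MAGIC = b'\x1f\x8b'  # first two bytes of every gzip stream


def inspect_blob(data):
    """Return (is_gzip, sha256 of data, sha256 of decompressed data or None)."""
    is_gzip = data[:2] == GZIP_MAGIC
    as_served = hashlib.sha256(data).hexdigest()
    decompressed = None
    if is_gzip:
        decompressed = hashlib.sha256(gzip.decompress(data)).hexdigest()
    return is_gzip, as_served, decompressed


if __name__ == '__main__':
    # Stand-in payload; a real run would read one of the downloaded files.
    payload = b'example layer contents'
    blob = gzip.compress(payload)
    print(inspect_blob(blob))     # gzip data: two different checksums
    print(inspect_blob(payload))  # plain data: nothing to decompress
```

If the checksum of the decompressed Fedora file equals the checksum of the file as served by RHEL 6.7, as in the `sha256sum` output above, the two differ only by one layer of gzip.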

As you can see, RHEL 6.7 decompresses files off disk before serving them to clients. This is problematic because:

  • I expect Pulp's behaviour to be consistent across platforms. If a gzip file is served up by Fedora 22, Fedora 23 and RHEL 7, then RHEL 6 should also serve up a gzip file.
  • The file name of "sha256:some-checksum" indicates that one can take a sha256 checksum of the file and produce "some-checksum", but that isn't so on RHEL 6.
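
The second expectation can be written down as a small check. This is a minimal sketch assuming names of the form "algorithm:hex-digest", as seen in the blob names above; the helper name `matches_digest` is hypothetical and not part of Pulp.

```python
import hashlib


def matches_digest(name, data):
    """Check data against a name of the form '<algorithm>:<hex digest>'."""
    algorithm, _, expected = name.partition(':')
    if not expected:
        raise ValueError('expected "<algorithm>:<hex digest>", got ' + repr(name))
    # hashlib.new() accepts the algorithm name as a string, e.g. 'sha256'.
    actual = hashlib.new(algorithm, data).hexdigest()
    return actual == expected.lower()


if __name__ == '__main__':
    data = b'hello'
    name = 'sha256:' + hashlib.sha256(data).hexdigest()
    print(matches_digest(name, data))         # True
    print(matches_digest(name, data + b'!'))  # False
```

A client could run a check like this on every downloaded blob and reject any body that does not match its name.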

Now, all this might be excusable if the HTTP headers correctly indicated what's going on. That's not the case - and in fact, all three platforms illustrated here have incorrect content-encoding headers! Check it out:

# Fedora 22
# should have 'content-encoding': 'gzip'
{'Accept-Ranges': 'bytes',
 'Connection': 'Keep-Alive',
 'Content-Length': '32',
 'Date': 'Tue, 26 Apr 2016 18:55:08 GMT',
 'Docker-Distribution-API-Version': 'registry/2.0',
 'ETag': '"20-53162a9c8674b"',
 'Keep-Alive': 'timeout=5, max=10000',
 'Last-Modified': 'Tue, 26 Apr 2016 12:45:08 GMT',
 'Server': 'Apache/2.4.18 (Fedora) OpenSSL/1.0.1k-fips mod_wsgi/4.4.8 Python/2.7.10'}

# Fedora 23
# should have 'content-encoding': 'gzip'
{'Accept-Ranges': 'bytes',
 'Connection': 'Keep-Alive',
 'Content-Length': '32',
 'Date': 'Tue, 26 Apr 2016 18:55:06 GMT',
 'Docker-Distribution-API-Version': 'registry/2.0',
 'ETag': '"20-53162aa268f5b"',
 'Keep-Alive': 'timeout=5, max=10000',
 'Last-Modified': 'Tue, 26 Apr 2016 12:45:14 GMT',
 'Server': 'Apache/2.4.18 (Fedora) OpenSSL/1.0.2g-fips mod_wsgi/4.4.8 Python/2.7.11'}

# RHEL 6.7
# should not have 'content-encoding' header
{'accept-ranges': 'bytes',
 'connection': 'close',
 'content-encoding': 'gzip',
 'content-length': '32',
 'content-type': 'text/plain; charset=UTF-8',
 'date': 'Tue, 26 Apr 2016 18:55:10 GMT',
 'docker-distribution-api-version': 'registry/2.0',
 'etag': '"41422-20-5316386c0159f"',
 'last-modified': 'Tue, 26 Apr 2016 13:46:55 GMT',
 'server': 'Apache/2.2.15 (Red Hat)'}
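
Note that the dumps differ in header-name case (Title-Case on Fedora, lowercase on RHEL 6.7). HTTP header names are case-insensitive, and these dumps were passed through `dict()`, so a comparison across platforms should normalize the names first. A minimal sketch follows; the helper name `content_encoding` is hypothetical and the dictionaries are trimmed copies of the dumps above.

```python
def content_encoding(headers):
    """Return the Content-Encoding value, ignoring header-name case, or None."""
    for name, value in headers.items():
        if name.lower() == 'content-encoding':
            return value.strip().lower()
    return None


# Trimmed copies of the header dumps above.
fedora_22 = {'Accept-Ranges': 'bytes', 'Content-Length': '32'}
rhel_67 = {
    'accept-ranges': 'bytes',
    'content-encoding': 'gzip',
    'content-length': '32',
}

if __name__ == '__main__':
    print(content_encoding(fedora_22))  # None
    print(content_encoding(rhel_67))    # gzip
```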

Finally, for debugging purposes:

# rpm -qa | grep docker  # fedora 22
python-pulp-docker-common-2.0.1-0.1.beta.fc22.noarch
pulp-docker-plugins-2.0.1-0.1.beta.fc22.noarch
pulp-docker-admin-extensions-2.0.1-0.1.beta.fc22.noarch

# rpm -qa | grep docker  # fedora 23
pulp-docker-plugins-2.0.1-0.1.beta.fc23.noarch
python-pulp-docker-common-2.0.1-0.1.beta.fc23.noarch
pulp-docker-admin-extensions-2.0.1-0.1.beta.fc23.noarch

$ rpm -qa | grep docker  # rhel 6
python-pulp-docker-common-2.0.1-0.1.beta.el6.noarch
pulp-docker-plugins-2.0.1-0.1.beta.el6.noarch
pulp-docker-admin-extensions-2.0.1-0.1.beta.el6.noarch

Related issues

Is duplicate of Pulp - Issue #1781: Files ending in .gz are delivered with incorrect content headers | CLOSED - CURRENTRELEASE | semyers

Actions #1

Updated by Ichimonji10 over 8 years ago

The title should read "Pulp Docker on RHEL 6 serves wrong files". I don't have permissions to edit the issue title.

Actions #2

Updated by Ichimonji10 over 8 years ago

That "content-encoding" comment at the top of the bug report might be a little obscure. To state things a little differently, RHEL 6 does set a "content-encoding: gzip" header even though the served-up content is not compressed, which causes the receiving client to gunzip the response.

Actions #3

Updated by dkliban@redhat.com over 8 years ago

  • Is duplicate of Issue #1781: Files ending in .gz are delivered with incorrect content headers added

Actions #4

Updated by dkliban@redhat.com over 8 years ago

  • Status changed from NEW to CLOSED - DUPLICATE
  • Triaged changed from No to Yes

Actions #5

Updated by mhrivnak over 8 years ago

FWIW it was actually the "requests" library on the client side doing the decompressing. This was the correct action for it to take, based on the server incorrectly setting the Content-Encoding header.
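
A minimal, network-free sketch of that client-side behaviour, using only the standard library. The function `body_seen_by_client` is a hypothetical stand-in for what an HTTP client does when it honours the header, not Requests' actual code, and the layer contents are made up. It assumes Python 3.

```python
import gzip
import hashlib


def body_seen_by_client(wire_body, headers):
    """Mimic a client that transparently undoes 'Content-Encoding: gzip'."""
    encoding = headers.get('content-encoding', '').lower()
    if encoding == 'gzip':
        return gzip.decompress(wire_body)
    return wire_body


# The blob as stored on disk: already gzip data, named after its own checksum.
layer = gzip.compress(b'stand-in for a tar archive')
advertised = hashlib.sha256(layer).hexdigest()

without_header = body_seen_by_client(layer, {})
with_header = body_seen_by_client(layer, {'content-encoding': 'gzip'})

if __name__ == '__main__':
    print(hashlib.sha256(without_header).hexdigest() == advertised)  # True
    print(hashlib.sha256(with_header).hexdigest() == advertised)     # False
```

With Requests itself, the undecoded bytes should be obtainable by passing `stream=True` to `requests.get` and then calling `response.raw.read(decode_content=False)`, which would let the script in the description hash what was actually sent over the wire.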

Actions #6

Updated by Ichimonji10 over 8 years ago

Understood. Makes sense.

Actions #7

Updated by bmbouter over 5 years ago

  • Tags Pulp 2 added
