Issue #6831

closed

Content App incorrectly sets "content-encoding" headers

Added by dalley almost 4 years ago. Updated almost 4 years ago.

Status: CLOSED - CURRENTRELEASE
Priority: Normal
Assignee: -
Category: -
Sprint/Milestone:
Start date:
Due date:
Estimated time:
Severity: 3. High
Version:
Platform Release:
OS:
Triaged: No
Groomed: No
Sprint Candidate: No
Tags:
Sprint:
Quarter:

Description

Some background: https://docs.microsoft.com/en-us/archive/blogs/wndp/content-encoding-content-type

Python package source distributions use the .tar.gz file extension. When the Pulp content app sees this extension, it automatically [0] sets the Content-Encoding header to "gzip", which is inappropriate. As described in the article:

Content-Encoding is used solely to specify any additional encoding done by the server before the content was transmitted to the client. Although the HTTP RFC outlines these rules pretty clearly, some web sites respond with "gzip" as the Content-Encoding even though the server has not gzipped the content.

Our testing has shown this problem to be limited to some sites that serve Unix/Linux style "tarball" files. Tarballs are gzip compressed archives files. By setting the Content-Encoding header to "gzip" on a tarball, the server is specifying that it has additionally gzipped the gzipped file. This, of course, is unlikely but not impossible or non-compliant.

Unlike the article's example, in which the file really would be gzipped twice, here we tell the client that we have gzipped a .tar.gz file that we have not gzipped. The client therefore tries to decode the response automatically by un-gzipping it, rather than accepting it unmodified.
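A small illustration of the client-side effect, using only the Python standard library. `make_sdist_bytes` and `client_decode` are hypothetical helpers: the first builds a tiny .tar.gz in memory as a stand-in for an sdist, and the second mimics a client that strips a gzip Content-Encoding before saving the body. Many HTTP clients behave this way, though exact behaviour varies by client.

```python
import gzip
import hashlib
import io
import tarfile
from typing import Optional


def make_sdist_bytes() -> bytes:
    """Build a tiny .tar.gz archive in memory (hypothetical stand-in for an sdist)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        data = b"print('hello')\n"
        info = tarfile.TarInfo(name="example-1.0/setup.py")
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def client_decode(body: bytes, content_encoding: Optional[str]) -> bytes:
    """Mimic a client that undoes a gzip Content-Encoding before saving the body."""
    if content_encoding == "gzip":
        return gzip.decompress(body)
    return body


GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any gzip stream

served = make_sdist_bytes()                 # bytes stored and served unmodified
saved_wrong = client_decode(served, "gzip")  # header set, as described in this issue
saved_right = client_decode(served, None)    # no header: bytes kept as-is

print(served[:2] == GZIP_MAGIC)       # True: the served file is gzip data
print(saved_wrong[:2] == GZIP_MAGIC)  # False: the client stripped the gzip layer
print(hashlib.sha256(saved_wrong).digest() == hashlib.sha256(served).digest())  # False
print(saved_right == served)          # True
```

The client ends up with a plain tar archive saved under a .tar.gz name, so any checksum comparison against the original sdist fails.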

Because we serve the files unmodified, we should not set the Content-Encoding header.
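A minimal sketch of the intended behaviour, assuming the handler derives its headers from Python's standard `mimetypes.guess_type`. For a name ending in .tar.gz, that function returns the encoding "gzip" alongside the inner file's type, which is one plausible source of the value set in [0]. `build_headers` and `COMPRESSED_TYPES` are hypothetical names, not pulpcore's API. Reporting the compressed format as the Content-Type is this sketch's own design choice; the issue itself only asks that Content-Encoding not be set.

```python
import mimetypes
from typing import Dict

# Hypothetical mapping from the "encoding" reported by mimetypes to a media
# type describing the compressed bytes themselves (an assumption of this sketch).
COMPRESSED_TYPES: Dict[str, str] = {
    "gzip": "application/gzip",
    "bzip2": "application/x-bzip2",
    "xz": "application/x-xz",
}


def build_headers(path: str) -> Dict[str, str]:
    """Return headers for a file served byte-for-byte from storage.

    Content-Encoding is never set, because the server applies no extra
    encoding to the stored file before sending it.
    """
    content_type, encoding = mimetypes.guess_type(path)
    if encoding is not None:
        # The stored file *is* compressed data: describe it as such instead of
        # claiming a transfer encoding that the client would try to undo.
        content_type = COMPRESSED_TYPES.get(encoding, "application/octet-stream")
    return {"Content-Type": content_type or "application/octet-stream"}


print(mimetypes.guess_type("example-1.0.tar.gz")[1])  # gzip
print(build_headers("example-1.0.tar.gz"))  # {'Content-Type': 'application/gzip'}
```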

[0] https://github.com/pulp/pulpcore/blob/44bb0623a01b7578e2ec442844c7f5849754b237/pulpcore/content/handler.py#L218
