Issue #8875
closed
pulp fails downloading packages with special symbols from Amazon Linux repositories
Description
While syncing an Amazon Linux repository, pulp 2 (katello 3.15) and pulp 3 (katello 4) fail to download packages whose names contain special symbols, such as "+":
Jun 6 11:57:44 foreman pulpcore-worker-1[19215]: aiohttp.client_exceptions.ClientResponseError: 403, message='Forbidden', url=URL('http://amazonlinux.us-east-1.amazonaws.com/blobstore/844a030e99d8e563fb9e83fa59b7c9dd76eea3f2a6e9ef71ed02e9420fec4f0c/libstdc++-7.2.1-2.amzn2.0.1.x86_64.rpm')
I opened a case with AWS, and they told me that the S3 backend requires special symbols to be encoded: https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-keys.html#object-key-guidelines-special-handling
Characters that might require special handling
The following characters in a key name might require additional code handling and likely need to be URL encoded or referenced as HEX. Some of these are non-printable characters that your browser might not handle, which also requires special handling:
Ampersand ("&")
Dollar ("$")
ASCII character ranges 00–1F hex (0–31 decimal) and 7F (127 decimal)
'At' symbol ("@")
Equals ("=")
Semicolon (";")
Colon (":")
Plus ("+")
Space – Significant sequences of spaces might be lost in some uses (especially multiple spaces)
Comma (",")
Question mark ("?")
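For illustration, here is a minimal sketch of percent-encoding a package file name with Python's standard `urllib.parse.quote`. The helper name `encode_key_segment` is hypothetical and exists only for this example; it is not part of pulp or of any AWS tooling.

```python
from urllib.parse import quote


def encode_key_segment(segment: str) -> str:
    """Percent-encode a single path segment (hypothetical helper).

    safe="" asks quote() to encode every reserved character,
    including "+" and "/". Letters, digits and "_.-~" are left as-is.
    """
    return quote(segment, safe="")


name = "libstdc++-7.2.1-2.amzn2.0.1.x86_64.rpm"
print(encode_key_segment(name))
# libstdc%2B%2B-7.2.1-2.amzn2.0.1.x86_64.rpm
```

Each "+" becomes "%2B", while the rest of the name is left unchanged.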
How to reproduce: Add Amazon Core 2 repository: http://amazonlinux.us-east-1.amazonaws.com/2/core/latest/x86_64/mirror.list
Updated by vchepkov over 3 years ago
Related to https://pulp.plan.io/issues/8873 and https://pulp.plan.io/issues/7995
Updated by ggainey over 3 years ago
Fun.
The URL specified above does fail with a 403. Urlencoding the RPM name does work:
~/Downloads $ wget http://amazonlinux.us-east-1.amazonaws.com/blobstore/844a030e99d8e563fb9e83fa59b7c9dd76eea3f2a6e9ef71ed02e9420fec4f0c/libstdc%2B%2B-7.2.1-2.amzn2.0.1.x86_64.rpm
--2021-06-10 14:54:03-- http://amazonlinux.us-east-1.amazonaws.com/blobstore/844a030e99d8e563fb9e83fa59b7c9dd76eea3f2a6e9ef71ed02e9420fec4f0c/libstdc%2B%2B-7.2.1-2.amzn2.0.1.x86_64.rpm
Resolving amazonlinux.us-east-1.amazonaws.com (amazonlinux.us-east-1.amazonaws.com)... 2600:1fa0:8041:4251:34d8:8c8e::, 52.217.105.46
Connecting to amazonlinux.us-east-1.amazonaws.com (amazonlinux.us-east-1.amazonaws.com)|2600:1fa0:8041:4251:34d8:8c8e::|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 447944 (437K) [binary/octet-stream]
Saving to: ‘libstdc++-7.2.1-2.amzn2.0.1.x86_64.rpm.1’
libstdc++-7.2.1-2.amzn2.0.1.x86_64.rpm.1 100%[====================================================================================================================================================================================================>] 437.45K --.-KB/s in 0.04s
2021-06-10 14:54:03 (9.75 MB/s) - ‘libstdc++-7.2.1-2.amzn2.0.1.x86_64.rpm.1’ saved [447944/447944]
~/Downloads $
No other RPM repository has this issue (Fedora, RHEL, CentOS, SUSE) - although a quick test suggests that Fedora, at least, would continue to work with URL-encoded RPM names.
Updated by vchepkov over 3 years ago
The code that yum uses seems to handle the URL properly as well. I tried this from CentOS 7.9:
# yum install --downloadonly --skip-broken http://amazonlinux.us-east-1.amazonaws.com/blobstore/844a030e99d8e563fb9e83fa59b7c9dd76eea3f2a6e9ef71ed02e9420fec4f0c/libstdc++-7.2.1-2.amzn2.0.1.x86_64.rpm
libstdc++-7.2.1-2.amzn2.0.1.x86_64.rpm | 437 kB 00:00:00
Examining /var/tmp/yum-root-tNxpzd/libstdc++-7.2.1-2.amzn2.0.1.x86_64.rpm: libstdc++-7.2.1-2.amzn2.0.1.x86_64
Marking /var/tmp/yum-root-tNxpzd/libstdc++-7.2.1-2.amzn2.0.1.x86_64.rpm as an update to libstdc++-4.8.5-44.el7.x86_64
Marking /var/tmp/yum-root-tNxpzd/libstdc++-7.2.1-2.amzn2.0.1.x86_64.rpm as an update to libstdc++-4.8.5-44.el7.i686
# ls -lAtr /var/tmp/yum-root-tNxpzd/
total 444
-rw-r--r-- 1 root root 447944 Nov 28 2017 libstdc++-7.2.1-2.amzn2.0.1.x86_64.rpm
Updated by vchepkov over 3 years ago
I suppose yum uses urlgrabber?
# urlgrabber -v http://amazonlinux.us-east-1.amazonaws.com/blobstore/844a030e99d8e563fb9e83fa59b7c9dd76eea3f2a6e9ef71ed02e9420fec4f0c/libstdc++-7.2.1-2.amzn2.0.1.x86_64.rpm
grabbing: http://amazonlinux.us-east-1.amazonaws.com/blobstore/844a030e99d8e563fb9e83fa59b7c9dd76eea3f2a6e9ef71ed02e9420fec4f0c/libstdc++-7.2.1-2.amzn2.0.1.x86_64.rpm
# ls -l libstdc++-7.2.1-2.amzn2.0.1.x86_64.rpm
-rw-r--r-- 1 root root 447944 Nov 28 2017 libstdc++-7.2.1-2.amzn2.0.1.x86_64.rpm
Updated by dalley over 3 years ago
- Category deleted (Operator - Moved to Github Issues)
Updated by dkliban@redhat.com over 3 years ago
- Triaged changed from No to Yes
- Sprint set to Sprint 100
Updated by ipanova@redhat.com over 3 years ago
- Sprint changed from Sprint 101 to Sprint 102
Updated by pulpbot over 3 years ago
- Status changed from NEW to POST
Updated by ggainey over 3 years ago
- Project changed from Pulp to RPM Support
- Assignee set to ggainey
Updated by ggainey over 3 years ago
The PR I just created fails in CI on regular-test but passes on S3. The failure mode is "something results in a double-encode of the URL" (which of course fails).
Reset dev-env to master/core and master/pulp_rpm (i.e., without the associated PR) and ran the following on my dev-machine:
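A small sketch of what a double-encode looks like, using only the Python standard library. The helper `encode_once` is hypothetical and is not taken from the PR; it decodes before encoding so that an already-encoded name is not encoded again. It assumes a literal "%" followed by two hex digits never appears in a real file name, which may not hold for every repository.

```python
from urllib.parse import quote, unquote

name = "libstdc++-7.2.1-2.amzn2.0.1.x86_64.rpm"

once = quote(name, safe="")
twice = quote(once, safe="")  # the "%" from the first pass becomes "%25"

print(once)   # libstdc%2B%2B-7.2.1-2.amzn2.0.1.x86_64.rpm
print(twice)  # libstdc%252B%252B-7.2.1-2.amzn2.0.1.x86_64.rpm


def encode_once(segment: str) -> str:
    """Hypothetical helper: decode first, then encode.

    Gives the same result for a raw name and for a name that was
    already percent-encoded once.
    """
    return quote(unquote(segment), safe="")


print(encode_once(name) == encode_once(once))  # True
```

A server that receives the `twice` form decodes it to the literal text "libstdc%2B%2B-...", which is not the name of any stored object.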
pulp rpm remote create --name az --url "http://amazonlinux.us-east-1.amazonaws.com/2/core/latest/x86_64/mirror.list" --policy on_demand
pulp rpm repository create --name az --remote az --autopublish
pulp rpm repository sync --name az
On core/master and rpm/master, running on CentOS7, this sync works. The actual mirror selected can be seen here:
Aug 02 21:44:22 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com pulpcore-worker[8400]: pulp [035bb16793044aa4824d7a23d8978648]: pulp_rpm.app.tasks.synchronizing:INFO: Using url 'http://amazonlinux.us-east-1.amazonaws.com/2/core/2.0/x86_64/0c4b5094bba8d46b07c60e3d85cd8baac5f75d07af6a33086b6d0cd9eb2e13f1' from mirrorlist in place of the provided url http://amazonlinux.us-east-1.amazonaws.com/2/core/latest/x86_64/mirror.list
Without the PR submitted, Amazon-2 syncs...fine?
Updated by vchepkov over 3 years ago
Most likely the on_demand policy is hiding the problem. It downloads only metadata, right?
Updated by ggainey over 3 years ago
vchepkov wrote:
Most likely the on_demand policy is hiding the problem. It downloads only metadata, right?
Exactly, I was just coming here to smack myself in the forehead. Continuing investigation into why this PR passes CI for S3 and fails non-S3.
Updated by dalley over 3 years ago
- Copied to Backport #9198: Backport #8875 "pulp fails downloading packages with special symbols from Amazon Linux repositories" to 3.14.z added
Added by ggainey over 3 years ago
Updated by ggainey over 3 years ago
- Status changed from POST to MODIFIED
Applied in changeset db55c27a9e78a18b64849915bc33628c8a9456fd.
Updated by pulpbot over 3 years ago
- Status changed from MODIFIED to CLOSED - CURRENTRELEASE
Taught downloader to url-encode so odd backends can find content.
Amazon (for example) will reject RPMs with '+' in the filename if it's not url-encoded.
fixes #8875.
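As a rough illustration of the idea in the commit message, and not the actual change in the changeset, the sketch below percent-encodes only the path of a URL using the Python standard library. The function name `encode_url_path` is hypothetical. It decodes the path before encoding it, on the assumption that the input may or may not already be encoded.

```python
from urllib.parse import quote, unquote, urlsplit, urlunsplit


def encode_url_path(url: str) -> str:
    """Percent-encode the path component of a URL (hypothetical helper)."""
    parts = urlsplit(url)
    # Keep "/" so the directory structure survives; decode first so an
    # already-encoded path is not encoded a second time.
    path = quote(unquote(parts.path), safe="/")
    return urlunsplit(
        (parts.scheme, parts.netloc, path, parts.query, parts.fragment)
    )


url = (
    "http://amazonlinux.us-east-1.amazonaws.com/blobstore/"
    "844a030e99d8e563fb9e83fa59b7c9dd76eea3f2a6e9ef71ed02e9420fec4f0c/"
    "libstdc++-7.2.1-2.amzn2.0.1.x86_64.rpm"
)
print(encode_url_path(url))
```

The scheme, host, query and fragment are passed through untouched; only the "+" characters in the path are rewritten to "%2B".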