Issue #8875 (closed)

pulp fails downloading packages with special symbols from Amazon Linux repositories

Added by vchepkov over 3 years ago. Updated about 3 years ago.

Status: CLOSED - CURRENTRELEASE
Priority: Normal
Severity: 2. Medium
Triaged: Yes
Groomed: No
Sprint Candidate: No
Sprint: Sprint 102

Description

While syncing the Amazon Linux repository, Pulp 2 (Katello 3.15) and Pulp 3 (Katello 4) fail to download packages whose file names contain special characters, such as "+":

Jun  6 11:57:44 foreman pulpcore-worker-1[19215]: aiohttp.client_exceptions.ClientResponseError: 403, message='Forbidden', url=URL('http://amazonlinux.us-east-1.amazonaws.com/blobstore/844a030e99d8e563fb9e83fa59b7c9dd76eea3f2a6e9ef71ed02e9420fec4f0c/libstdc++-7.2.1-2.amzn2.0.1.x86_64.rpm')

I opened a case with AWS, and they told me that the S3 backend requires special characters in object keys to be URL-encoded: https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-keys.html#object-key-guidelines-special-handling

Characters that might require special handling

The following characters in a key name might require additional code handling and likely need to be URL encoded or referenced as HEX. Some of these are non-printable characters that your browser might not handle, which also requires special handling:

  • Ampersand ("&")
  • Dollar ("$")
  • ASCII character ranges 00–1F hex (0–31 decimal) and 7F (127 decimal)
  • 'At' symbol ("@")
  • Equals ("=")
  • Semicolon (";")
  • Colon (":")
  • Plus ("+")
  • Space – Significant sequences of spaces might be lost in some uses (especially multiple spaces)
  • Comma (",")
  • Question mark ("?")
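For reference, here is a minimal Python sketch of what "URL encoded" means for the printable characters in the list above, using the standard library's `urllib.parse.quote`. It is an illustration only, not Pulp code; the helper name `encode_key_segment` is hypothetical, and the non-printable ranges are left out for readability.

```python
from urllib.parse import quote

# Printable characters from the AWS "special handling" list above.
# The non-printable ranges (00-1F and 7F) are omitted for readability.
SPECIAL_CHARS = ["&", "$", "@", "=", ";", ":", "+", " ", ",", "?"]


def encode_key_segment(segment: str) -> str:
    """Percent-encode a single path segment, e.g. an RPM file name.

    safe="" means even "/" would be encoded, so this is only meant for
    one segment, never for a whole path.
    """
    return quote(segment, safe="")


for ch in SPECIAL_CHARS:
    # Each reserved character becomes %XX with uppercase hex digits.
    print(repr(ch), "->", encode_key_segment(ch))

# The package from this issue: each "+" becomes "%2B".
print(encode_key_segment("libstdc++-7.2.1-2.amzn2.0.1.x86_64.rpm"))
```

Letters, digits and `-._~` are left alone, which is why only the two `+` signs change in the RPM name.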

How to reproduce: add the Amazon Linux 2 core repository and sync it: http://amazonlinux.us-east-1.amazonaws.com/2/core/latest/x86_64/mirror.list


Related issues

Copied to RPM Support - Backport #9198: Backport #8875 "pulp fails downloading packages with special symbols from Amazon Linux repositories" to 3.14.z (CLOSED - CURRENTRELEASE)

#2

Updated by ggainey over 3 years ago

Fun.

The URL above does fail with a 403. URL-encoding the RPM name works:

~/Downloads $ wget http://amazonlinux.us-east-1.amazonaws.com/blobstore/844a030e99d8e563fb9e83fa59b7c9dd76eea3f2a6e9ef71ed02e9420fec4f0c/libstdc%2B%2B-7.2.1-2.amzn2.0.1.x86_64.rpm
--2021-06-10 14:54:03--  http://amazonlinux.us-east-1.amazonaws.com/blobstore/844a030e99d8e563fb9e83fa59b7c9dd76eea3f2a6e9ef71ed02e9420fec4f0c/libstdc%2B%2B-7.2.1-2.amzn2.0.1.x86_64.rpm
Resolving amazonlinux.us-east-1.amazonaws.com (amazonlinux.us-east-1.amazonaws.com)... 2600:1fa0:8041:4251:34d8:8c8e::, 52.217.105.46
Connecting to amazonlinux.us-east-1.amazonaws.com (amazonlinux.us-east-1.amazonaws.com)|2600:1fa0:8041:4251:34d8:8c8e::|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 447944 (437K) [binary/octet-stream]
Saving to: ‘libstdc++-7.2.1-2.amzn2.0.1.x86_64.rpm.1’

libstdc++-7.2.1-2.amzn2.0.1.x86_64.rpm.1                                        100%[====================================================================================================================================================================================================>] 437.45K  --.-KB/s    in 0.04s   

2021-06-10 14:54:03 (9.75 MB/s) - ‘libstdc++-7.2.1-2.amzn2.0.1.x86_64.rpm.1’ saved [447944/447944]

~/Downloads $ 

No other RPM repository (Fedora, RHEL, CentOS, SUSE) has this issue, although a quick test suggests that Fedora, at least, would continue to work with URL-encoded RPM names.
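A minimal sketch of encoding only the path of a download URL, so that the `+` in the RPM name becomes `%2B` while the `/` separators, scheme and host stay intact. This assumes Python's standard `urllib.parse`; `encode_url_path` is a hypothetical helper, not the change that eventually landed in Pulp.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def encode_url_path(url: str) -> str:
    """Percent-encode the path component of ``url``, keeping "/" separators.

    Scheme, host, query and fragment are passed through untouched.
    """
    parts = urlsplit(url)
    # safe="/" keeps the path separators; "+" and the other reserved
    # characters inside each segment are percent-encoded.
    return urlunsplit(parts._replace(path=quote(parts.path, safe="/")))


raw = (
    "http://amazonlinux.us-east-1.amazonaws.com/blobstore/"
    "844a030e99d8e563fb9e83fa59b7c9dd76eea3f2a6e9ef71ed02e9420fec4f0c/"
    "libstdc++-7.2.1-2.amzn2.0.1.x86_64.rpm"
)
print(encode_url_path(raw))
```

The result is the same URL that wget fetched successfully above. URLs without special characters come back unchanged, which fits the observation that other repositories keep working.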

#3

Updated by vchepkov over 3 years ago

The code that yum uses seems to handle the URL properly as well. I tried this from CentOS 7.9:

# yum install --downloadonly --skip-broken http://amazonlinux.us-east-1.amazonaws.com/blobstore/844a030e99d8e563fb9e83fa59b7c9dd76eea3f2a6e9ef71ed02e9420fec4f0c/libstdc++-7.2.1-2.amzn2.0.1.x86_64.rpm

libstdc++-7.2.1-2.amzn2.0.1.x86_64.rpm                                                                                                                                                   | 437 kB  00:00:00     
Examining /var/tmp/yum-root-tNxpzd/libstdc++-7.2.1-2.amzn2.0.1.x86_64.rpm: libstdc++-7.2.1-2.amzn2.0.1.x86_64
Marking /var/tmp/yum-root-tNxpzd/libstdc++-7.2.1-2.amzn2.0.1.x86_64.rpm as an update to libstdc++-4.8.5-44.el7.x86_64
Marking /var/tmp/yum-root-tNxpzd/libstdc++-7.2.1-2.amzn2.0.1.x86_64.rpm as an update to libstdc++-4.8.5-44.el7.i686

# ls -lAtr /var/tmp/yum-root-tNxpzd/
total 444
-rw-r--r-- 1 root root 447944 Nov 28  2017 libstdc++-7.2.1-2.amzn2.0.1.x86_64.rpm

#4

Updated by vchepkov over 3 years ago

I suppose yum uses urlgrabber?

# urlgrabber -v http://amazonlinux.us-east-1.amazonaws.com/blobstore/844a030e99d8e563fb9e83fa59b7c9dd76eea3f2a6e9ef71ed02e9420fec4f0c/libstdc++-7.2.1-2.amzn2.0.1.x86_64.rpm
grabbing: http://amazonlinux.us-east-1.amazonaws.com/blobstore/844a030e99d8e563fb9e83fa59b7c9dd76eea3f2a6e9ef71ed02e9420fec4f0c/libstdc++-7.2.1-2.amzn2.0.1.x86_64.rpm

# ls -l libstdc++-7.2.1-2.amzn2.0.1.x86_64.rpm 
-rw-r--r-- 1 root root 447944 Nov 28  2017 libstdc++-7.2.1-2.amzn2.0.1.x86_64.rpm

#5

Updated by dalley over 3 years ago

  • Category deleted (Operator - Moved to Github Issues)
#6

Updated by dkliban@redhat.com over 3 years ago

  • Triaged changed from No to Yes
  • Sprint set to Sprint 100
#7

Updated by rchan over 3 years ago

  • Sprint changed from Sprint 100 to Sprint 101
#8

Updated by ipanova@redhat.com over 3 years ago

  • Sprint changed from Sprint 101 to Sprint 102
#9

Updated by pulpbot over 3 years ago

  • Status changed from NEW to POST
#10

Updated by ggainey over 3 years ago

  • Project changed from Pulp to RPM Support
  • Assignee set to ggainey
#11

Updated by ggainey over 3 years ago

The PR I just created fails in CI on the regular tests but passes on S3. The failure mode is "something results in a double-encode of the URL" (which, of course, fails).
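A double-encode is easy to reproduce with the standard library: encoding an already-encoded name turns `%2B` into `%252B`, which the server will not find. The sketch below shows that, plus one possible guard (decode, then encode) that makes the operation idempotent. `encode_path_once` is a hypothetical helper; it does not explain why only the non-S3 CI job double-encoded.

```python
from urllib.parse import quote, unquote

name = "libstdc++-7.2.1-2.amzn2.0.1.x86_64.rpm"

once = quote(name, safe="/")   # 'libstdc%2B%2B-7.2.1-2.amzn2.0.1.x86_64.rpm'
twice = quote(once, safe="/")  # '%' itself gets encoded: 'libstdc%252B%252B-...'


def encode_path_once(path: str) -> str:
    """Encode ``path`` so that applying this function again is a no-op.

    Decoding first means an already-encoded path is not encoded a second
    time. Note: unquote (not unquote_plus) is used, so a literal "+" is not
    turned into a space. Assumption: a literal "%" in a file name would be
    misread as an escape, which seems acceptable for RPM file names.
    """
    return quote(unquote(path), safe="/")


print(twice)
print(encode_path_once(name) == encode_path_once(encode_path_once(name)))
```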

I reset my dev environment to core/master and pulp_rpm/master (i.e., without the associated PR) and ran the following on my dev machine:

pulp rpm remote create --name az --url "http://amazonlinux.us-east-1.amazonaws.com/2/core/latest/x86_64/mirror.list" --policy on_demand
pulp rpm repository create --name az --remote az --autopublish
pulp rpm repository sync --name az

On core/master and rpm/master, running on CentOS 7, this sync works. The selected mirror can be seen here:

Aug 02 21:44:22 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com pulpcore-worker[8400]: pulp [035bb16793044aa4824d7a23d8978648]: pulp_rpm.app.tasks.synchronizing:INFO: Using url 'http://amazonlinux.us-east-1.amazonaws.com/2/core/2.0/x86_64/0c4b5094bba8d46b07c60e3d85cd8baac5f75d07af6a33086b6d0cd9eb2e13f1' from mirrorlist in place of the provided url http://amazonlinux.us-east-1.amazonaws.com/2/core/latest/x86_64/mirror.list

Without the PR, Amazon Linux 2 syncs... fine?
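The log line above shows the sync replacing the `mirror.list` URL with one mirror taken from it. As a rough sketch, assuming a yum-style mirror list with one URL per line and `#` comments, picking the first entry looks like this. `first_mirror` and the sample input are hypothetical; pulp_rpm's actual mirror selection may differ (for example, by trying several entries in turn).

```python
from typing import Optional


def first_mirror(mirrorlist_text: str) -> Optional[str]:
    """Return the first usable URL from a yum-style mirror list.

    Assumes one URL per line; blank lines and "#" comments are skipped.
    Returns None if the list contains no URLs.
    """
    for line in mirrorlist_text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            return line
    return None


# Hypothetical sample input, not the real Amazon mirror list.
sample = "# mirrors\n\nhttp://mirror-a.example/2/core/x86_64/\nhttp://mirror-b.example/2/core/x86_64/\n"
print(first_mirror(sample))
```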

#12

Updated by vchepkov over 3 years ago

Most likely the on_demand policy is hiding the problem. It downloads only metadata, right?

#13

Updated by ggainey over 3 years ago

vchepkov wrote:

Most likely the on_demand policy is hiding the problem. It downloads only metadata, right?

Exactly; I was just coming here to smack myself in the forehead. Continuing to investigate why this PR passes CI for S3 and fails for non-S3.

#14

Updated by dalley over 3 years ago

  • Sprint/Milestone set to 3.15.0
#15

Updated by dalley over 3 years ago

  • Copied to Backport #9198: Backport #8875 "pulp fails downloading packages with special symbols from Amazon Linux repositories" to 3.14.z added

Added by ggainey over 3 years ago

Revision db55c27a

Taught downloader to url-encode so odd backends can find content.

Amazon (for example) will reject RPMs with '+' in the filename if it's not url-encoded.

fixes #8875.

#16

Updated by ggainey over 3 years ago

  • Status changed from POST to MODIFIED
#17

Updated by pulpbot about 3 years ago

  • Status changed from MODIFIED to CLOSED - CURRENTRELEASE
