Issue #8981
Closed: Syncing mirrorlist-based remote fails
Added by cityofships over 3 years ago. Updated over 3 years ago.
Description
Steps to reproduce: create a remote with a mirrorlist URL and try to sync it. The sync fails; see this paste: https://paste.centos.org/view/eeff7112
Related issues
Updated by ggainey over 3 years ago
Moving reproducer here before the pastebin expires:
pulp rpm remote create --name epel-mirror --url "https://mirrors.fedoraproject.org/mirrorlist?repo=epel-8&arch=x86_64&infra=stock&content=centos"
pulp rpm repository create --name epel-mirror --remote epel-mirror --autopublish
pulp rpm repository sync --name epel-mirror --mirror
failure log:
pulp [375c8cbc96ae40cebff0b113b206dd70]: pulp_rpm.app.tasks.synchronizing:INFO: Synchronizing: repository=epel-mirror remote=epel-mirror
pulp [f79a9dcd7b53489c85d3bd3296cb82dd]: 127.0.0.1 - admin [28/Jun/2021:18:16:10 +0000] "GET /pulp/api/v3/tasks/2fa8ea54-d7f3-4a70-a8fb-ec8574c0d89d/ HTTP/1.1" 200 655 "-" "python-requests/2.25.1"
cr_xml_parser_generic: parsing error '/var/lib/pulp/tmp/17561@pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com/2fa8ea54-d7f3-4a70-a8fb-ec8574c0d89d/tmp6qtbrxs1': Document is empty
pulp [375c8cbc96ae40cebff0b113b206dd70]: rq.worker:ERROR: Traceback (most recent call last):
  File "/usr/local/lib/pulp/lib64/python3.6/site-packages/rq/worker.py", line 1013, in perform_job
    rv = job.perform()
  File "/usr/local/lib/pulp/lib64/python3.6/site-packages/rq/job.py", line 709, in perform
    self._result = self._execute()
  File "/usr/local/lib/pulp/lib64/python3.6/site-packages/rq/job.py", line 732, in _execute
    result = self.func(*self.args, **self.kwargs)
  File "/home/vagrant/devel/pulp_rpm/pulp_rpm/app/tasks/synchronizing.py", line 340, in synchronize
    if optimize and is_optimized_sync(repository, remote, remote_url):
  File "/home/vagrant/devel/pulp_rpm/pulp_rpm/app/tasks/synchronizing.py", line 265, in is_optimized_sync
    repomd = cr.Repomd(repomd_path)
  File "/usr/local/lib/pulp/lib64/python3.6/site-packages/createrepo_c/__init__.py", line 155, in __init__
    xml_parse_repomd(path, self)
  File "/usr/local/lib/pulp/lib64/python3.6/site-packages/createrepo_c/__init__.py", line 346, in xml_parse_repomd
    return _createrepo_c.xml_parse_repomd(path, repomdobj, warningcb)
createrepo_c.CreaterepoCError: Parse error '/var/lib/pulp/tmp/17561@pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com/2fa8ea54-d7f3-4a70-a8fb-ec8574c0d89d/tmp6qtbrxs1' at line: 1 (Document is empty
)
pulp [44eeb4dec48c40979879b5971b9e6ff2]: 127.0.0.1 - admin [28/Jun/2021:18:16:11 +0000] "GET /pulp/api/v3/tasks/2fa8ea54-d7f3-4a70-a8fb-ec8574c0d89d/ HTTP/1.1" 200 1943 "-" "python-requests/2.25.1"
Updated by ggainey over 3 years ago
The problem was introduced when we fixed the misuse of urljoin in commit fd130b. Workaround patch:
(master) ~/github/Pulp3/pulp_rpm $ git diff
diff --git a/pulp_rpm/app/tasks/synchronizing.py b/pulp_rpm/app/tasks/synchronizing.py
index d6d62e3b..e1ce7337 100644
--- a/pulp_rpm/app/tasks/synchronizing.py
+++ b/pulp_rpm/app/tasks/synchronizing.py
@@ -186,7 +186,9 @@ def get_repomd_file(remote, url):
         pulpcore.plugin.download.DownloadResult: downloaded repomd.xml
     """
-    downloader = remote.get_downloader(url=urlpath_sanitize(url, "repodata/repomd.xml"))
+    from urllib.parse import urljoin
+    downloader = remote.get_downloader(url=urljoin(url, "repodata/repomd.xml"))
+    # downloader = remote.get_downloader(url=urlpath_sanitize(url, "repodata/repomd.xml"))
     try:
         result = downloader.fetch()
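The two versions of get_repomd_file() in the patch build different repomd.xml URLs from the same mirrorlist URL. A minimal sketch in plain Python: `urljoin` is the real standard-library function, while `naive_path_join` is a hypothetical stand-in for `urlpath_sanitize`, assuming it simply joins slash-stripped segments with "/" (its real implementation may differ).

```python
from urllib.parse import urljoin

MIRRORLIST = (
    "https://mirrors.fedoraproject.org/mirrorlist"
    "?repo=epel-8&arch=x86_64&infra=stock&content=centos"
)


def naive_path_join(*parts):
    # Hypothetical stand-in for urlpath_sanitize(): strip slashes from each
    # part and glue the non-empty ones together with "/".
    return "/".join(p.strip("/") for p in parts if p.strip("/"))


# urljoin() resolves the suffix as a relative reference (RFC 3986): the last
# path segment ("mirrorlist") is replaced and the base query string is dropped.
old_url = urljoin(MIRRORLIST, "repodata/repomd.xml")

# Plain string joining leaves the query string in place, so the suffix ends
# up inside the "content" parameter instead of the URL path.
new_url = naive_path_join(MIRRORLIST, "repodata/repomd.xml")

print(old_url)  # https://mirrors.fedoraproject.org/repodata/repomd.xml
print(new_url)  # https://mirrors.fedoraproject.org/mirrorlist?repo=epel-8&arch=x86_64&infra=stock&content=centos/repodata/repomd.xml
```

Under that assumption, the old URL points at a path that does not exist on the mirrorlist host (hence a 404), while the new one keeps the query string and tucks the suffix into the `content` parameter, which is consistent with the server answering with the mirrorlist itself.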
Updated by ggainey over 3 years ago
Description of what happened here:
- Incoming URL is https://mirrors.fedoraproject.org/mirrorlist?repo=epel-8&arch=x86_64&infra=stock&content=centos
- Old code:
  - creates the repomd.xml URL using urljoin(remote_url, 'repodata/repomd.xml')
  - fetch_remote_url() attempts get_repomd_file(broken-url)
  - get_repomd_file() interprets the resulting 404 as "return None"
  - fetch_remote_url() interprets "get_repomd_file() == None" as "must be a mirror, try fetch_mirror()"
  - fetch_mirror() uses the full remote URL, which works.
- Replacing urljoin():
  - the new code generates a different URL
  - this, too, is wrong, but fetching that URL actually returns the mirrorlist file
  - get_repomd_file() assumes "I got a 200, must be repomd.xml" and returns it instead of trying fetch_mirror()
  - the caller of get_repomd_file() assumes it has a repomd.xml, passes it to createrepo_c, and chaos ensues.
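The last two steps fail because any 200 response is treated as repomd.xml. One defensive option, sketched here under the assumption that the downloaded body is available as bytes, is to confirm the body really is a repomd document before handing it to createrepo_c, so that a mirrorlist (like the text that triggered the parse error in the log above) could be routed to fetch_mirror() instead. `looks_like_repomd` is a hypothetical helper, not part of pulp_rpm, and this is not necessarily how the actual fix works; the sample bodies are illustrative.

```python
import xml.etree.ElementTree as ET

# Namespace used by the root element of repomd.xml files.
REPO_NS = "http://linux.duke.edu/metadata/repo"


def looks_like_repomd(data: bytes) -> bool:
    """Return True only if data parses as XML whose root element is <repomd>."""
    try:
        root = ET.fromstring(data)
    except ET.ParseError:
        # Empty bodies and plain-text mirrorlists are not well-formed XML.
        return False
    # Accept the namespaced tag and, leniently, an un-namespaced one.
    return root.tag in ("{%s}repomd" % REPO_NS, "repomd")


repomd_body = (
    b'<?xml version="1.0" encoding="UTF-8"?>\n'
    b'<repomd xmlns="http://linux.duke.edu/metadata/repo">'
    b"<revision>1624900000</revision></repomd>"
)
# Illustrative mirrorlist body: a comment line followed by a mirror URL.
mirrorlist_body = (
    b"# repo = epel-8 arch = x86_64\n"
    b"https://mirror.example.com/epel/8/Everything/x86_64/\n"
)

print(looks_like_repomd(repomd_body))      # True
print(looks_like_repomd(mirrorlist_body))  # False
print(looks_like_repomd(b""))              # False
```

Checking the root element rather than just "is it XML" also rejects HTML or other XML error pages that a server might return with a 200.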
Updated by cityofships over 3 years ago
I'm guessing the mirrorlist scenario is not covered by any CI tests?
Updated by dalley over 3 years ago
- Status changed from NEW to ASSIGNED
- Assignee set to ggainey
- Sprint set to Sprint 99
Updated by ggainey over 3 years ago
cityofships wrote:
> I'm guessing the mirrorlist scenario is not covered by any CI tests?
It was, but not a mirrorlist URL with query parameters. It is now, as of https://github.com/pulp/pulp_rpm/pull/2032
Updated by dalley over 3 years ago
- Status changed from ASSIGNED to POST
- Triaged changed from No to Yes
- Sprint changed from Sprint 99 to Sprint 100
Updated by pulpbot over 3 years ago
Added by ggainey over 3 years ago
Updated by ggainey over 3 years ago
- Status changed from POST to MODIFIED
Applied in changeset 440155ba24c296c982c329e76cb3768c0bac53e9.
Updated by dalley over 3 years ago
- Copied to Backport #9026: Backport 8981 (Syncing mirrolist based remote fails) to 3.13.3 added
Updated by dalley over 3 years ago
- Status changed from MODIFIED to CLOSED - CURRENTRELEASE
Taught get_repomd_file() to recognize that some URLs come with parameters.
Necessary to drive the ability to recognize/respond to mirrorlist URLs.
fixes #8981
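The commit message says get_repomd_file() was taught that some URLs come with parameters. A minimal sketch of what "recognizing parameters" can mean, using only `urllib.parse`: split the URL, note whether it carries a query string, and append `repodata/repomd.xml` to the path rather than to the end of the string. `has_query_params` and `repomd_url` are hypothetical helpers for illustration, not the code from changeset 440155ba.

```python
from urllib.parse import urlsplit, urlunsplit


def has_query_params(url: str) -> bool:
    """True if url carries a query string, as mirrorlist URLs typically do."""
    return bool(urlsplit(url).query)


def repomd_url(url: str) -> str:
    """Append repodata/repomd.xml to the path of url, keeping any query string."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/") + "/repodata/repomd.xml"
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


mirrorlist = (
    "https://mirrors.fedoraproject.org/mirrorlist"
    "?repo=epel-8&arch=x86_64&infra=stock&content=centos"
)
base = "https://example.com/pub/epel/8/Everything/x86_64/"

print(has_query_params(mirrorlist))  # True
print(has_query_params(base))        # False
print(repomd_url(base))  # https://example.com/pub/epel/8/Everything/x86_64/repodata/repomd.xml
print(repomd_url(mirrorlist))
```

Keeping the query instead of discarding it is a deliberate choice in this sketch: a plain base URL may also carry parameters (for example an access token), so the presence of a query string is treated as a hint that the caller may need to fall back to fetch_mirror(), not as proof that the URL is a mirrorlist.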