Issue #3306
Closed: Unable to sync Lazy Atomic KS repos from CDN due to missing '/'
Description
1) RHEL Kickstart repositories that point to the CDN and are created via Katello or RHUI usually have the following feed URL structure:
Feed: https://cdn.redhat.com/content/..../server/7/7.0/x86_64/kickstart
Note that there is no '/' at the end.
Pulp treats Atomic KS repos available via the CDN the same way as yum KS repos: it creates distributions from them after parsing the treeinfo file. An Atomic KS repo has a similar feed URL:
Feed: https://cdn.redhat.com/content/..../rhel/atomic/7/7Server/x86_64/kickstart
Note that there is no '/' at the end.
2) When the KS repos are "lazy" synced, a catalog entry is created in db.lazy_content_catalog:
>db.lazy_content_catalog.find({"unit_type_id": "distribution"})
.....
{ "_id" : ObjectId("5a60dc6370be6f25290fb8f5"), "_ns" : "lazy_content_catalog",
"path" : "......./images/pxeboot/initrd.img.....", "importer_id" : "5a60da1d70be6f37511634c6",
"unit_id" : "......", "unit_type_id" : "distribution",
"url" : "https://cdn.redhat.com/content/dist/rhel/server/7/7.0/x86_64/kickstart/images/pxeboot/initrd.img",
"checksum" : ".....", "checksum_algorithm" : "sha256", "revision" : 1, "data" : { } }
.....
Note the url entry that is generated. It comes from this code:
https://github.com/pulp/pulp_rpm/blob/master/plugins/pulp_rpm/plugins/importers/yum/parse/treeinfo.py#L235
....
def update_catalog_entries(self, unit, files):
    .....
    .....
    entry.path = os.path.join(root, _file[RELATIVE_PATH])
    entry.url = urljoin(self.feed, _file[RELATIVE_PATH])
    ......
So the URL is built by joining the feed and the file's relative path.
For Atomic KS repos, however, this code generates the wrong value, as this Mongo entry shows:
> db.lazy_content_catalog.find({"url":"https://cdn.redhat.com/content/dist/rhel/atomic/7/7Server/x86_64/images/pxeboot/vmlinuz"}).limit(1)
{ "_id" : ObjectId("5a60e63e70be6f25290fc30f"), "_ns" : "lazy_content_catalog",
"path" : "/var/lib/pulp/content/units/distribution/6d/7aa6882904de7b10c3d87991fd2b2ae896a40cb89aeb9915e274db07e2f101/images/pxeboot/vmlinuz", "importer_id" : "5a60da5a70be6f37522b5410", "unit_id" : "59366fab-0380-427b-a49b-e80bc4043e14", "unit_type_id" : "distribution",
"url" : "https://cdn.redhat.com/content/dist/rhel/atomic/7/7Server/x86_64/images/pxeboot/vmlinuz",
"revision" : 1, "data" : { } }
Note that the url field is missing the "kickstart" path segment, which makes the URL wrong.
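This is due to the non-intuitive behaviour of urljoin: if the base URL does not end with '/', its last path segment is treated like a file name and is replaced by the relative path.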
>>> from urlparse import urljoin
>>> urljoin("https://cdn.redhat.com/content/dist/rhel/atomic/7/7Server/x86_64/kickstart",
"images/pxeboot/initrd.img")
'https://cdn.redhat.com/content/dist/rhel/atomic/7/7Server/x86_64/images/pxeboot/initrd.img'
See how it chopped the word "kickstart" off the combined URL. This causes 404s when the lazy sync tries to download the content from the actual source.
urljoin works as expected if a "/" is appended to the feed URL:
>>> from urlparse import urljoin
>>> urljoin("https://cdn.redhat.com/content/dist/rhel/atomic/7/7Server/x86_64/kickstart/",
"images/pxeboot/initrd.img")
'https://cdn.redhat.com/content/dist/rhel/atomic/7/7Server/x86_64/kickstart/images/pxeboot/initrd.img'
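Both results follow from the "merge paths" step of RFC 3986 (section 5.2.3): for a relative reference, everything after the last '/' in the base path is dropped before the reference is appended. Below is a simplified Python 3 model of that step (in Python 3, urljoin lives in urllib.parse rather than Python 2's urlparse). merge_paths and simple_join are hypothetical helpers, not part of pulp_rpm, and they deliberately ignore dot segments, queries and fragments.

```python
from urllib.parse import urljoin, urlsplit, urlunsplit


def merge_paths(base_path: str, ref_path: str) -> str:
    """Simplified RFC 3986 'merge paths' step (assumes the base has a host).

    Everything after the last '/' of the base path is discarded before the
    reference is appended, so a feed without a trailing slash loses its
    final segment. '.' and '..' segments are not handled here.
    """
    if base_path == "":
        return "/" + ref_path
    return base_path[: base_path.rfind("/") + 1] + ref_path


def simple_join(base_url: str, ref_path: str) -> str:
    """Join a relative path (no scheme, no leading '/') onto base_url."""
    parts = urlsplit(base_url)
    return urlunsplit((parts.scheme, parts.netloc,
                       merge_paths(parts.path, ref_path), "", ""))


FEED = "https://cdn.redhat.com/content/dist/rhel/atomic/7/7Server/x86_64/kickstart"
REL = "images/pxeboot/initrd.img"

for base in (FEED, FEED + "/"):
    # For these two inputs the simplified merge agrees with the stdlib.
    assert simple_join(base, REL) == urljoin(base, REL)
    print(simple_join(base, REL))
```

The first line printed lacks "kickstart"; the second, built from the slash-terminated feed, keeps it.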
This code path is used by both RHEL KS and Atomic KS repos. However, the RHEL one works because the "feed" passed to it already ends with a slash.
That slash is appended much earlier during the sync, in https://github.com/pulp/pulp_rpm/blob/master/plugins/pulp_rpm/plugins/importers/yum/sync.py#L130-L131:
def sync_feed(self):
    .................
    ...........
    if repo_url:
        repo_url_slash = self._url_modify(repo_url, ensure_trailing_slash=True)
        ....
        try:
            # it returns None if it can't download repomd.xml
            if self.check_metadata(repo_url_slash):
                return [repo_url_slash]
        except PulpCodedException:
            pass
        .......
    .......
    return [repo_url]
For regular RHEL KS repos the above code works well, since https://github.com/pulp/pulp_rpm/blob/master/plugins/pulp_rpm/plugins/importers/yum/sync.py#L136-L137 returns "repo_url_slash".
However, Atomic KS repos do not have any repodata or repomd.xml files. This causes check_metadata to fail, so the last line of the method returns "repo_url", i.e. the variable without the slash.
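The fallback can be modelled in a few lines to show how the two repo types diverge. This is a simplified Python 3 sketch, not the pulp_rpm code: ensure_trailing_slash is a hypothetical stand-in for self._url_modify(..., ensure_trailing_slash=True), the has_repomd callable stands in for check_metadata, and the cdn.example.com URLs are placeholders.

```python
from typing import Callable, List
from urllib.parse import urljoin


def ensure_trailing_slash(url: str) -> str:
    """Hypothetical stand-in for _url_modify(url, ensure_trailing_slash=True)."""
    return url if url.endswith("/") else url + "/"


def sync_feed(repo_url: str, has_repomd: Callable[[str], bool]) -> List[str]:
    """Simplified model of the pre-fix fallback in sync_feed().

    has_repomd is a hypothetical stand-in for check_metadata(): it reports
    whether repodata/repomd.xml could be fetched under the given URL.
    """
    repo_url_slash = ensure_trailing_slash(repo_url)
    if has_repomd(repo_url_slash):
        return [repo_url_slash]  # RHEL KS repos: repomd.xml found
    return [repo_url]            # Atomic KS repos: fall through, no slash


REL = "images/pxeboot/vmlinuz"
rhel_feed = sync_feed("https://cdn.example.com/rhel/kickstart", lambda url: True)[0]
atomic_feed = sync_feed("https://cdn.example.com/atomic/kickstart", lambda url: False)[0]

print(urljoin(rhel_feed, REL))
# https://cdn.example.com/rhel/kickstart/images/pxeboot/vmlinuz
print(urljoin(atomic_feed, REL))
# https://cdn.example.com/atomic/images/pxeboot/vmlinuz
```

The second URL has lost "kickstart", which matches the bad catalog entry shown earlier.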
This issue can be fixed in multiple places:
a)
....
def update_catalog_entries(self, unit, files):
    .....
    .....
    entry.path = os.path.join(root, _file[RELATIVE_PATH])
    entry.url = urljoin(self.feed, _file[RELATIVE_PATH])
    ......
Make sure self.feed ends with a '/' before joining.
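Fix (a) could look roughly like the helper below, which normalises the feed before joining. This is a Python 3 sketch: catalog_url is a hypothetical name (update_catalog_entries does the join inline), and the feed URLs are placeholders.

```python
from urllib.parse import urljoin


def catalog_url(feed: str, relative_path: str) -> str:
    """Build the lazy-catalog URL for a file listed in the treeinfo file.

    Sketch of fix (a): normalise the feed to end with '/' so that urljoin
    appends relative_path instead of replacing the feed's last segment.
    """
    if not feed.endswith("/"):
        feed += "/"
    return urljoin(feed, relative_path)


# Placeholder feeds: with or without the trailing slash, the result is the same.
for feed in ("https://cdn.example.com/atomic/kickstart",
             "https://cdn.example.com/atomic/kickstart/"):
    print(catalog_url(feed, "images/pxeboot/vmlinuz"))
```

One caveat: the relative path must not start with '/', because urljoin treats such a reference as absolute and discards the whole path of the feed.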
b)
def sync_feed(self):
    .................
    ...........
    return [repo_url]
Make sure that the URL returned here ends with a slash.
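Fix (b) can be sketched with a simplified model in which every return path hands back the slash-terminated URL, so later joins keep the "kickstart" segment. This is Python 3 and not the actual pulp_rpm change: ensure_trailing_slash and has_repomd are hypothetical stand-ins for _url_modify and check_metadata, the parts of sync_feed elided in the excerpt are left out, and the URL is a placeholder.

```python
from typing import Callable, List
from urllib.parse import urljoin


def ensure_trailing_slash(url: str) -> str:
    """Hypothetical stand-in for _url_modify(url, ensure_trailing_slash=True)."""
    return url if url.endswith("/") else url + "/"


def sync_feed(repo_url: str, has_repomd: Callable[[str], bool]) -> List[str]:
    """Sketch of fix (b): every return path yields a slash-terminated URL."""
    repo_url_slash = ensure_trailing_slash(repo_url)
    if has_repomd(repo_url_slash):
        return [repo_url_slash]
    # Previously this returned [repo_url]; a repo without repomd.xml
    # (such as an Atomic KS repo) now gets the trailing slash as well.
    return [repo_url_slash]


# Placeholder Atomic-style feed: no repomd.xml and no trailing slash.
feed = sync_feed("https://cdn.example.com/atomic/kickstart", lambda url: False)[0]
print(urljoin(feed, "images/pxeboot/vmlinuz"))
# https://cdn.example.com/atomic/kickstart/images/pxeboot/vmlinuz
```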
Updated by ttereshc about 5 years ago
- Project changed from Infrastructure to RPM Support
- Status changed from NEW to ASSIGNED
- Assignee set to ttereshc
Added by ttereshc about 5 years ago
Updated by ttereshc about 5 years ago
- Status changed from ASSIGNED to POST
- Sprint/Milestone set to 53
Updated by ttereshc about 5 years ago
- Status changed from POST to MODIFIED
Applied in changeset cc776c6b69cc5e4b1dcfd80a5e50500f81561852.
Updated by ttereshc about 5 years ago
A workaround for older versions of Pulp RPM that don't contain this fix: update your repository's feed so that the URL ends with a slash (/):
https://cdn.redhat.com/content/..../server/7/7.0/x86_64/kickstart/
Added by ttereshc about 5 years ago
Ensure trailing slash is always added to the feed
closes #3306 https://pulp.plan.io/issues/3306
(cherry picked from commit cc776c6b69cc5e4b1dcfd80a5e50500f81561852)
Updated by ttereshc about 5 years ago
Applied in changeset 88860c2d7b3d536638f31d17ba7a6805aefd1e21.
Updated by pcreech about 5 years ago
- Status changed from 5 to CLOSED - CURRENTRELEASE
Ensure trailing slash is always added to the feed
closes #3306 https://pulp.plan.io/issues/3306