Issue #3306

closed

Unable to sync Lazy Atomic KS repos from CDN due to missing '/'

Added by paji@redhat.com over 6 years ago. Updated about 5 years ago.

Status: CLOSED - CURRENTRELEASE
Priority: Normal
Assignee:
Sprint/Milestone: -
Start date:
Due date:
Estimated time:
Severity: 3. High
Version:
Platform Release: 2.15.1
OS:
Triaged: No
Groomed: No
Sprint Candidate: No
Tags: Pulp 2
Sprint: Sprint 31
Quarter:

Description

1) RHEL Kickstart (KS) repositories that point to the CDN and are created via Katello or RHUI usually have the following feed URL structure:

Feed: https://cdn.redhat.com/content/..../server/7/7.0/x86_64/kickstart  

Note that there is no '/' at the end.
Pulp treats Atomic KS repos available via the CDN the same as a yum KS repo: it creates distributions from them after parsing the treeinfo file. An Atomic KS repo has a similar feed URL:

Feed: https://cdn.redhat.com/content/..../rhel/atomic/7/7Server/x86_64/kickstart

Note that, again, there is no '/' at the end.

2) When the KS repos are "lazy" synced, a catalog entry is created in db.lazy_content_catalog:

>db.lazy_content_catalog.find({"unit_type_id": "distribution"})
.....
{ "_id" : ObjectId("5a60dc6370be6f25290fb8f5"), "_ns" : "lazy_content_catalog", 
"path" : "......./images/pxeboot/initrd.img.....", "importer_id" : "5a60da1d70be6f37511634c6", 
"unit_id" : "......", "unit_type_id" : "distribution", 
"url" : "https://cdn.redhat.com/content/dist/rhel/server/7/7.0/x86_64/kickstart/images/pxeboot/initrd.img",
 "checksum" : ".....", "checksum_algorithm" : "sha256", "revision" : 1, "data" : {  } }
.....

Note the url entry that is generated. It comes from this code:
https://github.com/pulp/pulp_rpm/blob/master/plugins/pulp_rpm/plugins/importers/yum/parse/treeinfo.py#L235

....
def update_catalog_entries(self, unit, files):
.....
.....
          entry.path = os.path.join(root, _file[RELATIVE_PATH])
          entry.url = urljoin(self.feed, _file[RELATIVE_PATH])
......

So the url is built by joining the feed and the relative path.

For Atomic KS repos, however, this code generates the wrong value, as this mongo entry shows:

> db.lazy_content_catalog.find({"url":"https://cdn.redhat.com/content/dist/rhel/atomic/7/7Server/x86_64/images/pxeboot/vmlinuz"}).limit(1)
{ "_id" : ObjectId("5a60e63e70be6f25290fc30f"), "_ns" : "lazy_content_catalog", 
"path" : "/var/lib/pulp/content/units/distribution/6d/7aa6882904de7b10c3d87991fd2b2ae896a40cb89aeb9915e274db07e2f101/images/pxeboot/vmlinuz", "importer_id" : "5a60da5a70be6f37522b5410", "unit_id" : "59366fab-0380-427b-a49b-e80bc4043e14", "unit_type_id" : "distribution", 
"url" : "https://cdn.redhat.com/content/dist/rhel/atomic/7/7Server/x86_64/images/pxeboot/vmlinuz", 
"revision" : 1, "data" : {  } }

Note that the url field is missing the "kickstart" path segment, which makes it the wrong URL.

This is due to the non-intuitive behaviour of urljoin.

>>> from urlparse import urljoin
>>> urljoin("https://cdn.redhat.com/content/dist/rhel/atomic/7/7Server/x86_64/kickstart",
"images/pxeboot/initrd.img")

'https://cdn.redhat.com/content/dist/rhel/atomic/7/7Server/x86_64/images/pxeboot/initrd.img'

See how it chopped off the word kickstart in the combined URL. This causes 404s when the lazy sync tries to fetch the file from the actual source.

urljoin works as intended if a "/" is appended to the feed URL.

>>> from urlparse import urljoin
>>> urljoin("https://cdn.redhat.com/content/dist/rhel/atomic/7/7Server/x86_64/kickstart/",
"images/pxeboot/initrd.img")

'https://cdn.redhat.com/content/dist/rhel/atomic/7/7Server/x86_64/kickstart/images/pxeboot/initrd.img'
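The truncation follows from how a relative reference is resolved against a base URL: everything after the last '/' in the base path is discarded before the relative path is appended. A minimal sketch of that merge step, assuming a simple relative path with no leading '/', no '..' segments and no query string. merged_path is a hypothetical helper written for illustration and is not part of Pulp.

```python
try:
    from urllib.parse import urljoin, urlsplit  # Python 3
except ImportError:
    from urlparse import urljoin, urlsplit  # Python 2, as used by Pulp 2


def merged_path(base_url, relative_path):
    """Hypothetical helper: mimic the path-merge step of urljoin.

    Only valid for simple relative paths (assumption: no leading '/',
    no '..' segments, no query string).
    """
    base_path = urlsplit(base_url).path
    # Keep the base path up to and including its last '/', then append.
    return base_path[:base_path.rfind("/") + 1] + relative_path


feed = "https://cdn.redhat.com/content/dist/rhel/atomic/7/7Server/x86_64/kickstart"
relative = "images/pxeboot/initrd.img"

# Without a trailing slash, "kickstart" is the last segment and is dropped.
print(merged_path(feed, relative))
# With a trailing slash, the last segment is empty, so nothing is lost.
print(merged_path(feed + "/", relative))
```

For these inputs the helper's result matches the path component of what urljoin returns, which is why a feed without a trailing slash loses its final segment.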

This code path is used by both RHEL KS and Atomic KS repos. However, the RHEL one works because the "feed" passed to it already ends with a slash.

That slash is appended much earlier during the sync, in sync_feed: https://github.com/pulp/pulp_rpm/blob/master/plugins/pulp_rpm/plugins/importers/yum/sync.py#L130-L131

    def sync_feed(self):
       .................
       ...........
        if repo_url:
            repo_url_slash = self._url_modify(repo_url, ensure_trailing_slash=True)
      ....
            try:
                # it returns None if it can't download repomd.xml
                if self.check_metadata(repo_url_slash):
                    return [repo_url_slash]
            except PulpCodedException:
                pass
      .......
      .......
        return [repo_url]

For regular RHEL KS repos the above code works well, since https://github.com/pulp/pulp_rpm/blob/master/plugins/pulp_rpm/plugins/importers/yum/sync.py#L136-L137 returns "repo_url_slash".
However, Atomic KS repos do not have any repodata or repomd files. This causes 'check_metadata' to fail, and the last line of that method just returns "repo_url", i.e. the variable without the slash.

This issue can be fixed in more than one place:
a)

https://github.com/pulp/pulp_rpm/blob/master/plugins/pulp_rpm/plugins/importers/yum/parse/treeinfo.py#L235

....
def update_catalog_entries(self, unit, files):
.....
.....
          entry.path = os.path.join(root, _file[RELATIVE_PATH])
          entry.url = urljoin(self.feed, _file[RELATIVE_PATH])
......

Make sure self.feed has a '/' at the end before joining.
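One way option a) might look, as a sketch only. join_feed_url is a hypothetical helper, not existing Pulp code, and it assumes the feed URL carries no query string or fragment (appending '/' to such a URL would need more care).

```python
try:
    from urllib.parse import urljoin  # Python 3
except ImportError:
    from urlparse import urljoin  # Python 2, as used by Pulp 2


def join_feed_url(feed, relative_path):
    """Hypothetical helper: join a feed URL and a relative path safely.

    Ensures the feed ends with '/' so urljoin keeps its last path segment.
    Assumption: the feed has no query string or fragment.
    """
    if not feed.endswith("/"):
        feed += "/"
    return urljoin(feed, relative_path)


# The feed yields the same result with or without a trailing slash.
feed = "https://cdn.redhat.com/content/dist/rhel/atomic/7/7Server/x86_64/kickstart"
print(join_feed_url(feed, "images/pxeboot/vmlinuz"))
print(join_feed_url(feed + "/", "images/pxeboot/vmlinuz"))
```

In update_catalog_entries, the entry.url assignment could then call such a helper in place of the bare urljoin.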

b)

https://github.com/pulp/pulp_rpm/blob/master/plugins/pulp_rpm/plugins/importers/yum/sync.py#L130-L131

    def sync_feed(self):
       .................
       ...........
        return [repo_url]

Make sure that it returns a repo URL that ends with a slash.
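A simplified model of option b), under stated assumptions. sync_feed_sketch and ensure_trailing_slash are hypothetical stand-ins and not the real sync_feed or _url_modify. The check_metadata argument is a plain callable replacing the real metadata probe, and a broad Exception stands in for PulpCodedException. The elided parts of the real method are not modelled.

```python
def ensure_trailing_slash(url):
    """Hypothetical stand-in for _url_modify(url, ensure_trailing_slash=True)."""
    return url if url.endswith("/") else url + "/"


def sync_feed_sketch(repo_url, check_metadata):
    """Simplified, hypothetical model of the fallback in sync_feed().

    check_metadata is a callable that returns a truthy value when
    repomd.xml can be downloaded, and None (or raises) otherwise.
    """
    repo_url_slash = ensure_trailing_slash(repo_url)
    try:
        if check_metadata(repo_url_slash):
            return [repo_url_slash]
    except Exception:  # stands in for PulpCodedException
        pass
    # Proposed change: fall back to the slash-terminated URL instead of
    # the original repo_url, so repos without repodata still join correctly.
    return [repo_url_slash]


# An Atomic KS repo has no repomd.xml, so the probe reports None.
print(sync_feed_sketch(
    "https://cdn.redhat.com/content/dist/rhel/atomic/7/7Server/x86_64/kickstart",
    lambda url: None,
))
```

With this change both branches return the same slash-terminated URL, so the feed handed to the treeinfo code is consistent whether or not the repo has yum metadata.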
