Issue #3306

closed

Unable to sync Lazy Atomic KS repos from CDN due to missing '/'

Added by paji@redhat.com over 6 years ago. Updated about 5 years ago.

Status: CLOSED - CURRENTRELEASE
Priority: Normal
Assignee:
Sprint/Milestone: -
Start date:
Due date:
Estimated time:
Severity: 3. High
Version:
Platform Release: 2.15.1
OS:
Triaged: No
Groomed: No
Sprint Candidate: No
Tags: Pulp 2
Sprint: Sprint 31
Quarter:

Description

1) RHEL Kickstart (KS) repositories that point to the CDN and are created via Katello or RHUI usually have a feed URL of the following form:

Feed: https://cdn.redhat.com/content/..../server/7/7.0/x86_64/kickstart

Note that there is no '/' at the end.

Pulp treats Atomic KS repos available via the CDN the same as a yum KS repo: it creates distributions from them after parsing the treeinfo file. An Atomic KS repo has a similar feed URL:

Feed: https://cdn.redhat.com/content/..../rhel/atomic/7/7Server/x86_64/kickstart

Again, there is no '/' at the end.

2) When the KS repos are "lazy" synced, a catalog entry is created in db.lazy_content_catalog:

>db.lazy_content_catalog.find({"unit_type_id": "distribution"})
.....
{ "_id" : ObjectId("5a60dc6370be6f25290fb8f5"), "_ns" : "lazy_content_catalog", 
"path" : "......./images/pxeboot/initrd.img.....", "importer_id" : "5a60da1d70be6f37511634c6", 
"unit_id" : "......", "unit_type_id" : "distribution", 
"url" : "https://cdn.redhat.com/content/dist/rhel/server/7/7.0/x86_64/kickstart/images/pxeboot/initrd.img",
 "checksum" : ".....", "checksum_algorithm" : "sha256", "revision" : 1, "data" : {  } }
.....

Note the url entry that is generated. It is produced by this code:
https://github.com/pulp/pulp_rpm/blob/master/plugins/pulp_rpm/plugins/importers/yum/parse/treeinfo.py#L235

....
def update_catalog_entries(self, unit, files):
.....
.....
          entry.path = os.path.join(root, _file[RELATIVE_PATH])
          entry.url = urljoin(self.feed, _file[RELATIVE_PATH])
......

So the url is the feed joined with the relative path.

For Atomic KS repos, however, this code generates the wrong value, as shown by this mongo entry:

> db.lazy_content_catalog.find({"url":"https://cdn.redhat.com/content/dist/rhel/atomic/7/7Server/x86_64/images/pxeboot/vmlinuz"}).limit(1)
{ "_id" : ObjectId("5a60e63e70be6f25290fc30f"), "_ns" : "lazy_content_catalog", 
"path" : "/var/lib/pulp/content/units/distribution/6d/7aa6882904de7b10c3d87991fd2b2ae896a40cb89aeb9915e274db07e2f101/images/pxeboot/vmlinuz", "importer_id" : "5a60da5a70be6f37522b5410", "unit_id" : "59366fab-0380-427b-a49b-e80bc4043e14", "unit_type_id" : "distribution", 
"url" : "https://cdn.redhat.com/content/dist/rhel/atomic/7/7Server/x86_64/images/pxeboot/vmlinuz", 
"revision" : 1, "data" : {  } }

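Note that the url field is missing the "kickstart" path segment, which makes it the wrong URL.

This is due to the non-intuitive behaviour of urljoin: when the base URL does not end with '/', its last path segment is replaced by the relative path.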

>>> from urlparse import urljoin
>>> urljoin("https://cdn.redhat.com/content/dist/rhel/atomic/7/7Server/x86_64/kickstart",
"images/pxeboot/initrd.img")

'https://cdn.redhat.com/content/dist/rhel/atomic/7/7Server/x86_64/images/pxeboot/initrd.img'

See how it dropped "kickstart" from the combined URL. This causes 404s when the lazy sync tries to fetch the content from the actual source.

urljoin works if a "/" is added to the feed URL.

>>> from urlparse import urljoin
>>> urljoin("https://cdn.redhat.com/content/dist/rhel/atomic/7/7Server/x86_64/kickstart/",
"images/pxeboot/initrd.img")

'https://cdn.redhat.com/content/dist/rhel/atomic/7/7Server/x86_64/kickstart/images/pxeboot/initrd.img'

This code path is used by both RHEL KS and Atomic KS repos. However, the RHEL one works because the "feed" passed to it already has a trailing slash.

That slash is appended much earlier during the sync as a part of https://github.com/pulp/pulp_rpm/blob/master/plugins/pulp_rpm/plugins/importers/yum/sync.py#L130-L131.

    def sync_feed(self):
       .................
       ...........
        if repo_url:
            repo_url_slash = self._url_modify(repo_url, ensure_trailing_slash=True)
      ....
            try:
                # it returns None if it can't download repomd.xml
                if self.check_metadata(repo_url_slash):
                    return [repo_url_slash]
            except PulpCodedException:
               pass
      .......
      .......
        return [repo_url]

For regular RHEL KS repos the above code works well, since https://github.com/pulp/pulp_rpm/blob/master/plugins/pulp_rpm/plugins/importers/yum/sync.py#L136-L137 returns "repo_url_slash".
However, Atomic KS repos do not have any repodata or repomd files. This causes 'check_metadata' to fail, and the last line of that method just returns "repo_url", i.e. the variable without the slash.
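
The fall-through can be reproduced outside Pulp with a simplified stand-in for the tail of sync_feed. This is only a sketch: select_feed, with_trailing_slash and has_repomd are hypothetical names, has_repomd is a stub replacing check_metadata, and the example.com URLs are placeholders rather than real CDN paths.

```python
def with_trailing_slash(url):
    """Return url unchanged if it already ends with '/', else append one."""
    return url if url.endswith("/") else url + "/"


def select_feed(repo_url, has_repomd):
    """Simplified stand-in for the end of sync_feed().

    has_repomd is a hypothetical callable standing in for check_metadata():
    it returns True when repomd.xml can be downloaded from the given URL.
    """
    repo_url_slash = with_trailing_slash(repo_url)
    if has_repomd(repo_url_slash):
        return [repo_url_slash]
    # Fall-through described in the issue: the URL without the slash.
    return [repo_url]


# Placeholder URLs (assumptions for illustration only).
rhel = "https://cdn.example.com/rhel/server/7/7.0/x86_64/kickstart"
atomic = "https://cdn.example.com/rhel/atomic/7/7Server/x86_64/kickstart"


def has_repomd(url):
    # Pretend only the RHEL repository publishes repodata/repomd.xml.
    return "/atomic/" not in url


rhel_feeds = select_feed(rhel, has_repomd)      # slash appended
atomic_feeds = select_feed(atomic, has_repomd)  # slash missing
```

With these stubs the RHEL feed comes back with the trailing slash and the Atomic feed comes back without it, which is the input that later trips up urljoin.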

This issue can be fixed in either of two places:
a)

https://github.com/pulp/pulp_rpm/blob/master/plugins/pulp_rpm/plugins/importers/yum/parse/treeinfo.py#L235

....
def update_catalog_entries(self, unit, files):
.....
.....
          entry.path = os.path.join(root, _file[RELATIVE_PATH])
          entry.url = urljoin(self.feed, _file[RELATIVE_PATH])
......

Make sure self.feed has a '/' at the end before joining.
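
A minimal sketch of option (a), assuming a small helper is acceptable at the join site. join_feed is a hypothetical name, not something that exists in pulp_rpm, and this is not necessarily how the eventual fix was written.

```python
try:
    from urllib.parse import urljoin  # Python 3
except ImportError:
    from urlparse import urljoin  # Python 2, as used in the issue


def join_feed(feed, relative_path):
    """Join a feed URL and a relative path without losing the last segment.

    Hypothetical helper: appends '/' to the feed when it is missing, so that
    urljoin treats the last path segment as a directory.
    """
    if not feed.endswith("/"):
        feed += "/"
    return urljoin(feed, relative_path)


feed = "https://cdn.redhat.com/content/dist/rhel/atomic/7/7Server/x86_64/kickstart"
url = join_feed(feed, "images/pxeboot/vmlinuz")
# "kickstart" is now preserved in the joined URL.
```

The helper is idempotent, so feeds that already end with a slash (the RHEL case) produce the same result as before.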

b)

https://github.com/pulp/pulp_rpm/blob/master/plugins/pulp_rpm/plugins/importers/yum/sync.py#L130-L131.

    def sync_feed(self):
       .................
       ...........
        return [repo_url]

Make sure that it returns a repo URL that ends with a slash.
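
Option (b) could look roughly like the following, written as a standalone function rather than the real method. sync_feed_candidates and metadata_ok are hypothetical names; metadata_ok stands in for check_metadata, and the placeholder URL is an assumption. The actual change in pulp_rpm may differ.

```python
def sync_feed_candidates(repo_url, metadata_ok):
    """Sketch of a sync_feed() tail that always returns a slash-terminated URL.

    metadata_ok is a hypothetical callable standing in for check_metadata().
    """
    repo_url_slash = repo_url if repo_url.endswith("/") else repo_url + "/"
    if metadata_ok(repo_url_slash):
        return [repo_url_slash]
    # Even without repodata (the Atomic KS case), keep the trailing slash so
    # that later urljoin calls do not drop the last path segment.
    return [repo_url_slash]


# Placeholder URL; metadata_ok always fails here, as it would for Atomic KS.
atomic = "https://cdn.example.com/rhel/atomic/7/7Server/x86_64/kickstart"
feeds = sync_feed_candidates(atomic, lambda url: False)
```

Fixing it here means every consumer of the feed sees a slash-terminated URL, whereas option (a) only protects the catalog-entry join.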

Actions #2

Updated by ttereshc over 6 years ago

  • Project changed from Infrastructure to RPM Support
  • Status changed from NEW to ASSIGNED
  • Assignee set to ttereshc

Added by ttereshc over 6 years ago

Revision cc776c6b | View on GitHub

Ensure trailing slash is always added to the feed

closes #3306 https://pulp.plan.io/issues/3306

Actions #3

Updated by ttereshc over 6 years ago

  • Status changed from ASSIGNED to POST
  • Sprint/Milestone set to 53
Actions #4

Updated by ttereshc over 6 years ago

  • Status changed from POST to MODIFIED
Actions #5

Updated by pcreech over 6 years ago

  • Platform Release set to 2.15.1
Actions #6

Updated by ttereshc over 6 years ago

A workaround for older versions of Pulp RPM which don't contain this fix: update the feed of your repository so it has a slash / at the end of the provided URL:
https://cdn.redhat.com/content/..../server/7/7.0/x86_64/kickstart/

Added by ttereshc over 6 years ago

Revision 88860c2d | View on GitHub

Ensure trailing slash is always added to the feed

closes #3306 https://pulp.plan.io/issues/3306

(cherry picked from commit cc776c6b69cc5e4b1dcfd80a5e50500f81561852)

Actions #7

Updated by ttereshc over 6 years ago

Actions #8

Updated by pcreech about 6 years ago

  • Status changed from MODIFIED to 5
Actions #9

Updated by pcreech about 6 years ago

  • Status changed from 5 to CLOSED - CURRENTRELEASE
Actions #10

Updated by bmbouter about 6 years ago

  • Sprint set to Sprint 31
Actions #11

Updated by bmbouter about 6 years ago

  • Sprint/Milestone deleted (53)
Actions #12

Updated by bmbouter about 5 years ago

  • Tags Pulp 2 added
