Issue #7141

closed

lazy sync does not properly handle upstream repos with duplicate content but different repo layouts

Added by jsherril@redhat.com over 3 years ago. Updated over 3 years ago.

Status: CLOSED - WORKSFORME
Priority: Normal
Sprint/Milestone: -
Start date:
Due date:
Estimated time:
Severity: 2. Medium
Version:
Platform Release:
OS:
Triaged: Yes
Groomed: No
Sprint Candidate: No
Tags: Pulp 2
Sprint: Sprint 77
Quarter:

Description

Say you have two repos that contain the same rpm, but at different paths:

os /Packages/f/foo.rpm

ks /Packages/foo.rpm

Now you sync them both using 'on_demand', but let's say the os repo gets the unit imported first. The rpm unit gets created with a relativepath of:

/Packages/f/foo.rpm

and then a lazy_content_catalog entry gets created with a url of: https://server.example.com/os//Packages/f/foo.rpm

This is all correct. Now the unit gets processed for the ks repo. It correctly reuses the same unit, but then creates a second lazy_content_catalog entry with a url of: https://server.example.com/ks/Packages/f/foo.rpm

It's using the relativepath of the rpm unit, rather than the path the file actually has in the ks repo (/Packages/foo.rpm), to build the lazy_content_catalog entry's url attribute. In reality this looks like:

> db.lazy_content_catalog.find({"path": {$regex: '.*libXxf86vm\-devel\-1\.1\.4\-9\.el8\.i686\.rpm'}})
{ "_id" : ObjectId("5f07ee48cc531034cce38acc"), "_ns" : "lazy_content_catalog", "path" : "/var/lib/pulp/content/units/rpm/8a/cd9d02545dff8fab381aaa6185a778a26cacbec1585bcd8f7b2f6509f254a2/libXxf86vm-devel-1.1.4-9.el8.i686.rpm", "importer_id" : "5f07ed47cc53103b7b1f02c9", "unit_id" : "305ec066-9d0f-46a7-a198-6b966218a40e", "unit_type_id" : "rpm", "url" : "https://cdn.redhat.com/content/dist/rhel8/8.2/x86_64/appstream/kickstart/Packages/libXxf86vm-devel-1.1.4-9.el8.i686.rpm", "checksum" : "e375334723b40b39a407d243d1dab859a6edf1b2b383faa68c257c1afb399e2f", "checksum_algorithm" : "sha256", "revision" : 1, "data" : {  } }
{ "_id" : ObjectId("5f07ef17cc531034b8afd793"), "_ns" : "lazy_content_catalog", "path" : "/var/lib/pulp/content/units/rpm/8a/cd9d02545dff8fab381aaa6185a778a26cacbec1585bcd8f7b2f6509f254a2/libXxf86vm-devel-1.1.4-9.el8.i686.rpm", "importer_id" : "5f07ed0dcc53103b7b1f02b5", "unit_id" : "305ec066-9d0f-46a7-a198-6b966218a40e", "unit_type_id" : "rpm", "url" : "https://cdn.redhat.com/content/dist/rhel8/8/x86_64/appstream/os/Packages/libXxf86vm-devel-1.1.4-9.el8.i686.rpm", "checksum" : "e375334723b40b39a407d243d1dab859a6edf1b2b383faa68c257c1afb399e2f", "checksum_algorithm" : "sha256", "revision" : 1, "data" : {  } }
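The URL construction described above can be sketched in a few lines of Python. This is only an illustration of the suspected behaviour, not Pulp's actual importer code: `catalog_url`, `feeds`, and `upstream_paths` are hypothetical names, and the feed URLs are the example ones from this description.

```python
def catalog_url(feed_url, relative_path):
    """Join a repo feed URL and a package path (hypothetical helper)."""
    return feed_url.rstrip("/") + "/" + relative_path.lstrip("/")


# Example feeds and the real location of foo.rpm in each upstream repo.
feeds = {
    "os": "https://server.example.com/os",
    "ks": "https://server.example.com/ks",
}
upstream_paths = {
    "os": "Packages/f/foo.rpm",
    "ks": "Packages/foo.rpm",
}

# The shared unit keeps the path of whichever repo imported it first (os here).
unit_relativepath = upstream_paths["os"]

# Suspected behaviour: every repo's catalog entry reuses the unit's path.
buggy = {repo: catalog_url(feed, unit_relativepath) for repo, feed in feeds.items()}

# Expected behaviour: each entry uses the path found in that repo's own metadata.
expected = {repo: catalog_url(feed, upstream_paths[repo]) for repo, feed in feeds.items()}

for repo in feeds:
    print(repo, buggy[repo], expected[repo], buggy[repo] == expected[repo])
```

Under these assumptions the os entry is the same either way, while the ks entry points at a path that does not exist upstream.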

Directions to reproduce:

  1. Sync the rhel 8 base os repo using on_demand
  2. Sync the rhel 8 kickstart repo using on_demand

Then attempt to fetch each rpm from the kickstart repo or the base os repo (maybe a random assortment from each).

Result: you will get a lot of 404s from the streamer app:

Jul 13 17:19:35 dhcp-8-30-46 pulp_streamer: pulp.streamer.server:INFO: Download failed [404]: https://cdn.redhat.com/content/dist/rhel8/8/x86_64/appstream/os/Packages/texlive-luatex85-20180414-14.el8.noarch.rpm

This is because it's using the wrong relative path when fetching rpms from the kickstart repo. It's non-deterministic which lazy_content_catalog entry it will pick, so some requests will get a 404 and some won't. Retrying the download of an rpm may result in it working.
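To make the order dependence concrete, here is a minimal sketch, assuming the streamer ends up using one catalog entry per request. It is not the pulp_streamer implementation; `fake_fetch` and `fetch_first_available` are hypothetical stand-ins, and the URLs are the example ones from this description. It also shows one possible mitigation, which is to try every entry for the unit before giving up.

```python
def fake_fetch(url):
    """Stand-in for an HTTP GET: only the os path exists upstream in this example."""
    available = {"https://server.example.com/os/Packages/f/foo.rpm"}
    return (200, b"rpm-bytes") if url in available else (404, None)


def fetch_first_available(entries, fetch):
    """Try each catalog URL in turn and return the first one that succeeds."""
    for entry in entries:
        status, _body = fetch(entry["url"])
        if status == 200:
            return entry["url"]
    return None


# Two catalog entries for the same unit, one of them built from the wrong path.
entries = [
    {"url": "https://server.example.com/ks/Packages/f/foo.rpm"},
    {"url": "https://server.example.com/os/Packages/f/foo.rpm"},
]

# Using a single entry: the outcome depends on which one happens to be picked.
single_pick_statuses = [fake_fetch(entry["url"])[0] for entry in entries]

# Falling back across all entries succeeds regardless of their order.
winner = fetch_first_available(entries, fake_fetch)
winner_reversed = fetch_first_available(list(reversed(entries)), fake_fetch)
```

With a single pick, the same rpm can return a 404 on one request and succeed on the next, which matches the retry behaviour described above.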
