Issue #7141
closed
lazy sync does not properly handle upstream repos with duplicate content but different repo layouts
Description
Say you have two repos that contain the same rpm, but at different paths:
os /Packages/f/foo.rpm
ks /Packages/foo.rpm
Now you sync them both using 'on_demand', but let's say the os repo gets the unit imported first. The rpm unit gets created with a relativepath of:
/Packages/f/foo.rpm
and then a lazy_content_catalog entry gets created with a url of: https://server.example.com/os//Packages/f/foo.rpm
This is all correct. Now the unit gets processed for the ks repo. It correctly reuses the same unit, but then creates a 2nd lazy_content_catalog entry with a url of: https://server.example.com/ks/Packages/f/foo.rpm
It's using the relativepath of the rpm unit to build the lazy_content_catalog entry's url attribute, rather than the path the rpm actually has in the ks repo (/Packages/foo.rpm). In reality this looks like:
> db.lazy_content_catalog.find({"path": {$regex: '.*libXxf86vm\-devel\-1\.1\.4\-9\.el8\.i686\.rpm'}})
{ "_id" : ObjectId("5f07ee48cc531034cce38acc"), "_ns" : "lazy_content_catalog", "path" : "/var/lib/pulp/content/units/rpm/8a/cd9d02545dff8fab381aaa6185a778a26cacbec1585bcd8f7b2f6509f254a2/libXxf86vm-devel-1.1.4-9.el8.i686.rpm", "importer_id" : "5f07ed47cc53103b7b1f02c9", "unit_id" : "305ec066-9d0f-46a7-a198-6b966218a40e", "unit_type_id" : "rpm", "url" : "https://cdn.redhat.com/content/dist/rhel8/8.2/x86_64/appstream/kickstart/Packages/libXxf86vm-devel-1.1.4-9.el8.i686.rpm", "checksum" : "e375334723b40b39a407d243d1dab859a6edf1b2b383faa68c257c1afb399e2f", "checksum_algorithm" : "sha256", "revision" : 1, "data" : { } }
{ "_id" : ObjectId("5f07ef17cc531034b8afd793"), "_ns" : "lazy_content_catalog", "path" : "/var/lib/pulp/content/units/rpm/8a/cd9d02545dff8fab381aaa6185a778a26cacbec1585bcd8f7b2f6509f254a2/libXxf86vm-devel-1.1.4-9.el8.i686.rpm", "importer_id" : "5f07ed0dcc53103b7b1f02b5", "unit_id" : "305ec066-9d0f-46a7-a198-6b966218a40e", "unit_type_id" : "rpm", "url" : "https://cdn.redhat.com/content/dist/rhel8/8/x86_64/appstream/os/Packages/libXxf86vm-devel-1.1.4-9.el8.i686.rpm", "checksum" : "e375334723b40b39a407d243d1dab859a6edf1b2b383faa68c257c1afb399e2f", "checksum_algorithm" : "sha256", "revision" : 1, "data" : { } }
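The url construction described above can be sketched in a few lines, using the foo.rpm example from the description. This is not Pulp's importer code: the function names, the FEEDS and UPSTREAM_PATHS dicts, and the single-slash join are all assumptions made for illustration.

```python
# Hypothetical sketch of the problem, NOT Pulp's actual importer code.

# Assumed feed URLs for the two example repos.
FEEDS = {
    "os": "https://server.example.com/os/",
    "ks": "https://server.example.com/ks/",
}

# Where foo.rpm really lives in each upstream repo (from the example above).
UPSTREAM_PATHS = {
    "os": "Packages/f/foo.rpm",
    "ks": "Packages/foo.rpm",
}


def build_catalog_url(feed_url, relative_path):
    """Join a feed URL and a relative path with exactly one slash."""
    return feed_url.rstrip("/") + "/" + relative_path.lstrip("/")


def buggy_entries(unit_relativepath):
    """Every repo reuses the relativepath stored on the shared unit."""
    return {
        repo: build_catalog_url(feed, unit_relativepath)
        for repo, feed in FEEDS.items()
    }


def expected_entries():
    """Every repo uses the path from its own upstream metadata."""
    return {
        repo: build_catalog_url(feed, UPSTREAM_PATHS[repo])
        for repo, feed in FEEDS.items()
    }


if __name__ == "__main__":
    # The os repo imported the unit first, so the unit carries the os layout.
    unit_relativepath = UPSTREAM_PATHS["os"]
    for repo, url in sorted(buggy_entries(unit_relativepath).items()):
        print("buggy   ", repo, url)
    for repo, url in sorted(expected_entries().items()):
        print("expected", repo, url)
```

In this sketch the os entry is the same either way, while the ks entry points at /Packages/f/foo.rpm, a path that does not exist in the ks repo.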
Directions to reproduce:
- Sync the RHEL 8 base os repo using on_demand
- Sync the RHEL 8 kickstart repo using on_demand
- Attempt to fetch each rpm from the kickstart repo or base os repo (maybe a random assortment of each)
Result: you will get a lot of 404s from the streamer app, for example:
Jul 13 17:19:35 dhcp-8-30-46 pulp_streamer: pulp.streamer.server:INFO: Download failed [404]: https://cdn.redhat.com/content/dist/rhel8/8/x86_64/appstream/os/Packages/texlive-luatex85-20180414-14.el8.noarch.rpm
This is because it's using the wrong relative path when fetching rpms from the kickstart repo. It's non-deterministic as to which lazy_content_catalog entry it will pick, so some requests will get a 404 and some won't. Retrying the download of an rpm may result in it working.
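To see which units are affected, one option is to group catalog entries by their storage path and flag any path that maps to more than one url. The sketch below does that in plain Python on the two entries from the query output above, trimmed to the fields it needs. The helper names are hypothetical, and this is not how the streamer itself selects an entry.

```python
from collections import defaultdict

# Storage path shared by both catalog entries in the query output above.
UNIT_PATH = (
    "/var/lib/pulp/content/units/rpm/8a/"
    "cd9d02545dff8fab381aaa6185a778a26cacbec1585bcd8f7b2f6509f254a2/"
    "libXxf86vm-devel-1.1.4-9.el8.i686.rpm"
)

# The two entries, trimmed to path, importer_id and url.
CATALOG = [
    {
        "path": UNIT_PATH,
        "importer_id": "5f07ed47cc53103b7b1f02c9",
        "url": "https://cdn.redhat.com/content/dist/rhel8/8.2/x86_64/"
               "appstream/kickstart/Packages/"
               "libXxf86vm-devel-1.1.4-9.el8.i686.rpm",
    },
    {
        "path": UNIT_PATH,
        "importer_id": "5f07ed0dcc53103b7b1f02b5",
        "url": "https://cdn.redhat.com/content/dist/rhel8/8/x86_64/"
               "appstream/os/Packages/"
               "libXxf86vm-devel-1.1.4-9.el8.i686.rpm",
    },
]


def urls_by_path(entries):
    """Map each storage path to the set of distinct urls recorded for it."""
    grouped = defaultdict(set)
    for entry in entries:
        grouped[entry["path"]].add(entry["url"])
    return grouped


def ambiguous_paths(entries):
    """Return only the paths that have more than one candidate url."""
    return {
        path: sorted(urls)
        for path, urls in urls_by_path(entries).items()
        if len(urls) > 1
    }


if __name__ == "__main__":
    for path, urls in ambiguous_paths(CATALOG).items():
        print(path)
        for url in urls:
            print("   candidate:", url)
```

Any path reported here has at least two candidate urls, so whether a download succeeds depends on which entry happens to be picked.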