Issue #7141

closed

lazy sync does not properly handle upstream repos with duplicate content but different repo layouts

Added by jsherril@redhat.com almost 4 years ago. Updated over 3 years ago.

Status: CLOSED - WORKSFORME
Priority: Normal
Sprint/Milestone: -
Start date:
Due date:
Estimated time:
Severity: 2. Medium
Version:
Platform Release:
OS:
Triaged: Yes
Groomed: No
Sprint Candidate: No
Tags: Pulp 2
Sprint: Sprint 77
Quarter:

Description

Say you have two repos that contain the same rpm, but at different paths:

os /Packages/f/foo.rpm

ks /Packages/foo.rpm

Now you sync them both using 'on_demand', but let's say the os repo gets the unit imported first. The rpm unit gets created with a relativepath of:

/Packages/f/foo.rpm

and then a lazy_content_catalog entry gets created with a url of: https://server.example.com/os//Packages/f/foo.rpm

This is all correct. Now the unit gets processed for the ks repo. It correctly reuses the same unit, but then creates a second lazy_content_catalog entry with a url of: https://server.example.com/ks/Packages/f/foo.rpm

It's using the relativepath of the rpm unit to build the lazy_content_catalog entry's url attribute. In reality this looks like:

> db.lazy_content_catalog.find({"path": {$regex: '.*libXxf86vm\-devel\-1\.1\.4\-9\.el8\.i686\.rpm'}})
{ "_id" : ObjectId("5f07ee48cc531034cce38acc"), "_ns" : "lazy_content_catalog", "path" : "/var/lib/pulp/content/units/rpm/8a/cd9d02545dff8fab381aaa6185a778a26cacbec1585bcd8f7b2f6509f254a2/libXxf86vm-devel-1.1.4-9.el8.i686.rpm", "importer_id" : "5f07ed47cc53103b7b1f02c9", "unit_id" : "305ec066-9d0f-46a7-a198-6b966218a40e", "unit_type_id" : "rpm", "url" : "https://cdn.redhat.com/content/dist/rhel8/8.2/x86_64/appstream/kickstart/Packages/libXxf86vm-devel-1.1.4-9.el8.i686.rpm", "checksum" : "e375334723b40b39a407d243d1dab859a6edf1b2b383faa68c257c1afb399e2f", "checksum_algorithm" : "sha256", "revision" : 1, "data" : {  } }
{ "_id" : ObjectId("5f07ef17cc531034b8afd793"), "_ns" : "lazy_content_catalog", "path" : "/var/lib/pulp/content/units/rpm/8a/cd9d02545dff8fab381aaa6185a778a26cacbec1585bcd8f7b2f6509f254a2/libXxf86vm-devel-1.1.4-9.el8.i686.rpm", "importer_id" : "5f07ed0dcc53103b7b1f02b5", "unit_id" : "305ec066-9d0f-46a7-a198-6b966218a40e", "unit_type_id" : "rpm", "url" : "https://cdn.redhat.com/content/dist/rhel8/8/x86_64/appstream/os/Packages/libXxf86vm-devel-1.1.4-9.el8.i686.rpm", "checksum" : "e375334723b40b39a407d243d1dab859a6edf1b2b383faa68c257c1afb399e2f", "checksum_algorithm" : "sha256", "revision" : 1, "data" : {  } }
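
The URL construction described above can be sketched in Python. This is only an illustration of the behavior as reported, not code taken from Pulp; the `Unit` class and the helper functions `join_url`, `catalog_url_from_unit`, and `catalog_url_from_repo` are hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    """Hypothetical stand-in for an rpm unit shared between repos."""
    filename: str
    relativepath: str  # path recorded when the unit was first imported


def join_url(feed: str, path: str) -> str:
    """Join a feed URL and a path with exactly one slash between them."""
    return feed.rstrip("/") + "/" + path.lstrip("/")


def catalog_url_from_unit(feed: str, unit: Unit) -> str:
    """Behavior described in the report: reuse the unit's stored relativepath."""
    return join_url(feed, unit.relativepath)


def catalog_url_from_repo(feed: str, path_in_repo: str) -> str:
    """Expected behavior: use the path listed in this repo's own metadata."""
    return join_url(feed, path_in_repo)


# The unit was created while syncing "os", so it carries the os layout.
unit = Unit(filename="foo.rpm", relativepath="/Packages/f/foo.rpm")
ks_feed = "https://server.example.com/ks/"

# URL the report says gets stored for ks (does not exist upstream):
print(catalog_url_from_unit(ks_feed, unit))
# URL that matches the ks repo layout:
print(catalog_url_from_repo(ks_feed, "/Packages/foo.rpm"))
```

The two printed URLs differ only in the `f/` path segment, which is the difference between a successful download and a 404 for the ks repo.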

Directions to reproduce:

  1. Sync the RHEL 8 base os repo using on_demand
  2. Sync the RHEL 8 kickstart repo using on_demand
  3. Attempt to fetch each rpm from the kickstart repo or base os repo (maybe a random assortment of each)

Results: you will get a lot of 404s from the streamer app:

Jul 13 17:19:35 dhcp-8-30-46 pulp_streamer: pulp.streamer.server:INFO: Download failed [404]: https://cdn.redhat.com/content/dist/rhel8/8/x86_64/appstream/os/Packages/texlive-luatex85-20180414-14.el8.noarch.rpm

This is because it's using the wrong relative path when fetching rpms from the kickstart repo. It's non-deterministic as to which lazy_content_catalog entry it will pick, so some will get a 404 and some won't. Retrying the download of an rpm may result in it working.
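
To see which stored files have more than one candidate URL, the catalog entries can be grouped by their `path`. The sketch below works on plain dictionaries, for example documents exported from the `lazy_content_catalog` collection. The function name `urls_by_path`, the importer ids, and the shortened sample entries are hypothetical and modeled on the foo.rpm example above.

```python
from collections import defaultdict


def urls_by_path(entries):
    """Map each storage path to its sorted (importer_id, url) pairs,
    keeping only paths that have more than one distinct URL."""
    grouped = defaultdict(set)
    for entry in entries:
        grouped[entry["path"]].add((entry["importer_id"], entry["url"]))
    return {
        path: sorted(pairs)
        for path, pairs in grouped.items()
        if len({url for _, url in pairs}) > 1
    }


# Hypothetical, shortened entries; real documents carry more fields.
entries = [
    {"path": "/units/rpm/foo.rpm", "importer_id": "os-importer",
     "url": "https://server.example.com/os/Packages/f/foo.rpm"},
    {"path": "/units/rpm/foo.rpm", "importer_id": "ks-importer",
     "url": "https://server.example.com/ks/Packages/f/foo.rpm"},
    {"path": "/units/rpm/bar.rpm", "importer_id": "os-importer",
     "url": "https://server.example.com/os/Packages/b/bar.rpm"},
]

for path, pairs in urls_by_path(entries).items():
    print(path)
    for importer_id, url in pairs:
        print("  ", importer_id, url)
```

Each URL listed for a path can then be checked against the upstream repo by hand. Several entries per path are not a problem in themselves; the report's concern is an entry whose URL does not exist upstream.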

Actions #1

Updated by jsherril@redhat.com almost 4 years ago

  • Description updated (diff)

Actions #3

Updated by rchan almost 4 years ago

  • Status changed from NEW to ASSIGNED
  • Assignee set to dkliban@redhat.com
  • Sprint set to Sprint 77

dkliban says: investigated how to fix ^ and got a patch working; will make a PR tomorrow. We will also need to write a script for users to clean up their DB and fix existing systems.

Actions #4

Updated by ttereshc almost 4 years ago

  • Triaged changed from No to Yes

Actions #5

Updated by pulpbot almost 4 years ago

  • Status changed from ASSIGNED to POST

Actions #6

Updated by dkliban@redhat.com over 3 years ago

  • Status changed from POST to CLOSED - WORKSFORME

Even though I said that this bug exists in Pulp, it was only based on my initial reading of the code. I've tried to reproduce the bug and I was not able to. Lazy Catalog Entries are being created correctly for each repository layout. After reading the code again, the correct behavior makes sense. The problem experienced by Katello users is most likely related to the fact that repositories created by Katello use custom importers and/or distributors. I'll help investigate it from the Katello side.
