Story #4295

Updated by ttereshc almost 5 years ago

h3. Motivation 

 Currently it is possible to have multiple Advisories (UpdateRecord content units, previously known as "errata") with the same @id@ in one repo version. Those content units are not full duplicates: they have the same @id@ but different content. That leads to the creation of a bad repository at publication time (the @id@ of an advisory should be unique per yum repo, so Pulp needs to publish only one advisory per @id@). 

 h3. Possible cases 

 At sync time in additive mode: 
 # Repo version 1 contains an updateA, in repo version 2 a newer version of the updateA is being added 
 # Repo version 1 contains an updateA, in repo version 2 an older version of the updateA is being added 
 # Repo version 1 contains an updateA, in repo version 2 an alternative version of the updateA (e.g. for different distribution) is being added 

 In mirror mode for sync, the incoming version of updateA is always taken, regardless of any criteria. 

 h3. Suggested solution 

 Each repo version should have no more than one UpdateRecord with the same @id@. 
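
 A minimal sketch of this invariant in plain Python, for illustration only. The helper name @duplicated_advisory_ids@ and its input (a flat list of advisory ids taken from one repo version) are hypothetical and not part of the Pulp API:

```python
from collections import Counter


def duplicated_advisory_ids(advisory_ids):
    """Return, in sorted order, every id that occurs more than once.

    `advisory_ids` is assumed to be an iterable of the `id` values of all
    UpdateRecords in a single repo version (hypothetical input shape).
    """
    counts = Counter(advisory_ids)
    return sorted(adv_id for adv_id, seen in counts.items() if seen > 1)
```

 An empty result means the repo version satisfies the rule; any returned id would need to be resolved before publication.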

 Decide which update to keep based on the criteria defined below. 

 Proposal for sync case: `updated_date` (it's already among required fields on the model). 
 Technical proposal: 
  - create a new stage @AdvisoryContentUnitMerger@ which searches for UpdateRecords with the same @id@, applies the criteria, passes the needed UpdateRecord further down the pipeline, and removes the old UpdateRecord from the repo if needed. 
  - the new stage should precede the @ErratumContentUnitSaver@ stage (which should be renamed to @AdvisoryContentUnitSaver@) 

 h4. Criteria 

 In case of mirror sync, just pick the incoming UpdateRecord. 

 In all other cases: 

     * updated_dates are the same, pkglist intersection is empty (e.g. base repo merged with debuginfo repo) -> a *new* UpdateRecord content unit with the combined pkglist is created and added to the repo; the old UpdateRecord is removed from the repo. 

     * updated_dates differ, pkglist intersection is non-empty (update/re-sync/upload-new case) -> the UpdateRecord with the newer updated_date should be in the repo. 

     * updated_dates differ, pkglist intersection is empty - ERROR CONDITION! (base and -debuginfo repos are from different versions, not at the same date) 
     ** tell the user to make sure that the repos being merged are up-to-date, and then retry 

     * updated_dates are the same, pkglist intersection is non-empty and not equal to either pkglist - ERROR CONDITION! 
     ** never-happen case - "something is Terribly Wrong Here" 
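
 The decision table above (mirror sync, then same/different updated_date crossed with empty/non-empty pkglist intersection) can be sketched in plain Python as follows. This is only an illustration: @Advisory@, @Action@, @AdvisoryConflict@ and @resolve@ are hypothetical names, not the real Pulp models or stages. The criteria do not say what to do when the dates match and one pkglist fully contains the other; the sketch assumes the advisory with the larger pkglist is kept.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum


class Action(Enum):
    KEEP_INCOMING = "keep_incoming"
    KEEP_EXISTING = "keep_existing"
    MERGE = "merge"


class AdvisoryConflict(Exception):
    """Raised for the two ERROR CONDITION cases of the criteria."""


@dataclass(frozen=True)
class Advisory:
    # Hypothetical stand-in for an UpdateRecord, not the real Pulp model.
    id: str
    updated_date: datetime
    pkglist: frozenset  # package identifiers, e.g. NEVRA strings


def resolve(existing, incoming, mirror=False):
    """Return (Action, Advisory) for two advisories that share an id."""
    if existing.id != incoming.id:
        raise ValueError("advisories must share the same id")
    if mirror:
        # Mirror sync: always take the incoming advisory.
        return Action.KEEP_INCOMING, incoming

    same_date = existing.updated_date == incoming.updated_date
    common = existing.pkglist & incoming.pkglist

    if same_date and not common:
        # e.g. base repo merged with debuginfo repo: combine the pkglists.
        merged = Advisory(existing.id, existing.updated_date,
                          existing.pkglist | incoming.pkglist)
        return Action.MERGE, merged
    if not same_date and common:
        # update/re-sync/upload-new: the newer updated_date wins.
        if incoming.updated_date > existing.updated_date:
            return Action.KEEP_INCOMING, incoming
        return Action.KEEP_EXISTING, existing
    if not same_date:
        raise AdvisoryConflict(
            "updated_dates differ and pkglists do not intersect; "
            "make sure the repos being merged are up-to-date and retry")
    if common != existing.pkglist and common != incoming.pkglist:
        raise AdvisoryConflict(
            "same updated_date but pkglists only partially overlap")
    # ASSUMPTION (not covered by the criteria): same date and one pkglist
    # contains the other, so keep the advisory with the larger pkglist.
    if common == incoming.pkglist:
        return Action.KEEP_EXISTING, existing
    return Action.KEEP_INCOMING, incoming
```

 Returning an action together with the resulting advisory would let a calling stage decide whether the old UpdateRecord has to be removed from the repo, and gives a single place to log any change made to an advisory.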

 "A relevant functional test":https://github.com/pulp/pulp_rpm/blob/3e89fa57e3e5b63a90dd18f5c29d5086c17f0ce8/pulp_rpm/tests/functional/api/test_sync.py#L212 is currently being skipped.
