Story #4295
Updated by ttereshc over 5 years ago
h3. Motivation

Currently it is possible to have multiple Advisories (UpdateRecord content units, previously known as "errata") with the same @id@ in one repo version. Those content units are not full duplicates: they have the same @id@ but different content. This produces a broken repository at publication time (the @id@ of an advisory must be unique per yum repo, so Pulp needs to publish only one advisory per @id@).

h3. Possible cases

At sync time in additive mode:

# Repo version 1 contains an updateA, in repo version 2 a newer version of the updateA is being added
# Repo version 1 contains an updateA, in repo version 2 an older version of the updateA is being added
# Repo version 1 contains an updateA, in repo version 2 an alternative version of the updateA (e.g. for a different distribution) is being added

In mirror mode for sync, the incoming version of the updateA is always taken, regardless of any criteria.

h3. Suggested solution

Each repo version should have no more than one UpdateRecord with the same @id@. Decide which update to keep based on the criteria defined below.

Technical proposal:

* create a new stage @AdvisoryContentUnitMerger@ which searches for UpdateRecords with the same @id@ and applies the criteria, then passes the needed UpdateRecord further down the pipeline and removes the old UpdateRecord from a repo if needed.
* the new stage should precede the @ErratumContentUnitSaver@ stage (which should be renamed to @AdvisoryContentUnitSaver@)

h4. Criteria

In case of mirror sync, just pick the incoming UpdateRecord.

In all other cases:

* updated_dates are the same, pkglist intersection is empty (e.g. base repo merged with debuginfo repo) -> a *new* UpdateRecord content unit with the combined pkglist is created and added to a repo, the old UpdateRecord is removed from a repo.
* updated_dates differ, pkglist intersection is non-empty (update/re-sync/upload-new case) -> the UpdateRecord with the newer updated_date should be in a repo.
* updated_dates differ, pkglist intersection is empty - ERROR CONDITION! (the base and -debuginfo repos were built from different versions, not at the same date)
** tell the user to make sure that the merging repos are up-to-date, and then retry
* updated_dates are the same, pkglist intersection is non-empty and not equal to either pkglist - ERROR CONDITION!
** never-happen case - "something is Terribly Wrong Here"

"A relevant functional test":https://github.com/pulp/pulp_rpm/blob/3e89fa57e3e5b63a90dd18f5c29d5086c17f0ce8/pulp_rpm/tests/functional/api/test_sync.py#L212 which is currently being skipped.
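
The criteria above can be sketched as a single decision function. This is only an illustration, not pulp_rpm code: the @Advisory@ dataclass, @AdvisoryConflict@ and @resolve@ are hypothetical names, and a plain set of package names stands in for the real pkglist. The criteria do not say what to do when the dates are equal and one pkglist fully contains the other, so the sketch assumes the larger pkglist is kept.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import FrozenSet


class AdvisoryConflict(Exception):
    """Raised for the two ERROR CONDITION cases in the criteria."""


@dataclass(frozen=True)
class Advisory:
    # Hypothetical stand-in for an UpdateRecord; not the pulp_rpm model.
    id: str
    updated_date: datetime
    pkglist: FrozenSet[str]


def resolve(existing: Advisory, incoming: Advisory, mirror: bool = False) -> Advisory:
    """Return the single advisory that should remain in the repo version."""
    if existing.id != incoming.id:
        raise ValueError("resolve() only applies to advisories sharing an id")
    if mirror:
        # Mirror sync: the incoming advisory always wins.
        return incoming

    same_date = existing.updated_date == incoming.updated_date
    overlap = existing.pkglist & incoming.pkglist

    if same_date and not overlap:
        # e.g. base repo merged with debuginfo repo: combine the pkglists.
        merged = existing.pkglist | incoming.pkglist
        return Advisory(existing.id, existing.updated_date, merged)
    if not same_date and overlap:
        # update / re-sync / upload-new case: keep the newer advisory.
        return max(existing, incoming, key=lambda adv: adv.updated_date)
    if not same_date:
        raise AdvisoryConflict(
            "dates differ and pkglists do not intersect; "
            "make sure the merging repos are up-to-date, then retry"
        )
    if overlap == existing.pkglist or overlap == incoming.pkglist:
        # ASSUMPTION (not in the criteria): one pkglist contains the other,
        # so keep the advisory with the larger pkglist.
        return max(existing, incoming, key=lambda adv: len(adv.pkglist))
    raise AdvisoryConflict("same date but partially overlapping pkglists")
```

A stage like the proposed @AdvisoryContentUnitMerger@ could call such a function for each incoming UpdateRecord whose @id@ already exists in the repo version, then pass the result further down the pipeline.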