Story #4295
As a user, a repository version has no advisories with the same id
Status: Closed
% Done: 100%
Description
Motivation
Currently it is possible to have multiple advisories (UpdateRecord content units) with the same id in one repository version. These content units are not full duplicates: they share the same id but have different content. This leads to an invalid repository being created at publication time, because an advisory id must be unique within a yum repository, so Pulp needs to publish only one advisory per id.
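To make the problem concrete, here is a minimal sketch of detecting conflicting advisory ids in a repository version. It assumes a hypothetical `Advisory` dataclass standing in for the UpdateRecord model (only `id`, `updated_date`, and `pkglist` are modeled), and the advisory ids are made up. Identical units collapse into a single set entry, so only ids that map to genuinely different content are reported.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Advisory:
    """Hypothetical stand-in for an UpdateRecord content unit (not the pulp_rpm model)."""
    id: str
    updated_date: str
    pkglist: frozenset


def conflicting_ids(advisories):
    """Return, sorted, the ids that map to more than one distinct advisory."""
    by_id = defaultdict(set)
    for adv in advisories:
        # Frozen dataclasses are hashable, so exact duplicates collapse here.
        by_id[adv.id].add(adv)
    return sorted(adv_id for adv_id, units in by_id.items() if len(units) > 1)


# Example: two different units share the id "RHSA-1".
repo_version = [
    Advisory("RHSA-1", "2019-01-01", frozenset({"foo-1.0"})),
    Advisory("RHSA-1", "2019-02-01", frozenset({"foo-1.1"})),
    Advisory("RHSA-2", "2019-01-01", frozenset({"bar-2.0"})),
]
print(conflicting_ids(repo_version))  # ['RHSA-1']
```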
Possible cases
At sync time in additive mode:
- Repo version 1 contains an updateA, in repo version 2 a newer version of the updateA is being added
- Repo version 1 contains an updateA, in repo version 2 an older version of the updateA is being added
- Repo version 1 contains an updateA, in repo version 2 an alternative version of the updateA (e.g. for different distribution) is being added
In mirror mode, the incoming version of updateA is always taken, regardless of any criteria.
Suggested solution
For any addition of content to an RPM repository version, ensure that there is no more than one UpdateRecord with the same id.
Decide which UpdateRecord to keep based on the criteria defined below.
Criteria
In case of a mirror sync, just pick the incoming UpdateRecord.
In all other cases:
- updated_dates are the same, pkglist intersection is empty (e.g. base repo merged with debuginfo repo) -> a new UpdateRecord content unit with the combined pkglist is created and added to the repo, and the old UpdateRecord is removed from the repo.
- updated_dates differ, pkglist intersection is non-empty (update/re-sync/upload-new case) -> the UpdateRecord with the newer updated_date should be in the repo.
- updated_dates differ, pkglist intersection is empty -> ERROR CONDITION! (the base and -debuginfo built repos are from different versions, not from the same date)
  - tell the user to make sure that the repos being merged are up-to-date, and then retry
- updated_dates are the same, pkglist intersection is non-empty and not equal to either pkglist -> ERROR CONDITION!
  - never-happen case: "something is Terribly Wrong Here"
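The criteria above can be expressed as a single decision function. This is a minimal sketch, not the pulp_rpm implementation: `Advisory`, `AdvisoryConflict`, and `resolve` are hypothetical names, and `pkglist` is modeled as a plain set of package strings. The criteria leave one case open (same updated_date where one pkglist contains the other); the sketch assumes the record with the superset pkglist is kept, and marks that assumption in a comment.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Advisory:
    """Hypothetical stand-in for an UpdateRecord: only the fields the criteria use."""
    id: str
    updated_date: datetime
    pkglist: frozenset  # package identifiers as strings (assumption)


class AdvisoryConflict(Exception):
    """Raised for the ERROR CONDITION cases listed in the criteria."""


def resolve(existing: Advisory, incoming: Advisory, mirror: bool = False) -> Advisory:
    """Return the one Advisory to keep for a given id; the other is dropped."""
    if mirror:
        # Mirror sync: the incoming record always wins.
        return incoming
    same_date = existing.updated_date == incoming.updated_date
    overlap = existing.pkglist & incoming.pkglist
    if same_date and not overlap:
        # e.g. base repo merged with debuginfo repo: new record with combined pkglist.
        return Advisory(existing.id, existing.updated_date,
                        existing.pkglist | incoming.pkglist)
    if not same_date and overlap:
        # update / re-sync / upload-new: the newer updated_date wins.
        return max(existing, incoming, key=lambda adv: adv.updated_date)
    if not same_date:
        # Dates differ and pkglists are disjoint: the merged repos are out of step.
        raise AdvisoryConflict(
            f"{existing.id}: make sure the merged repos are up-to-date, then retry")
    if overlap not in (existing.pkglist, incoming.pkglist):
        # Same date, partial overlap: the "never-happen" case.
        raise AdvisoryConflict(f"{existing.id}: something is Terribly Wrong Here")
    # Same date, one pkglist contains the other. Not covered by the criteria;
    # ASSUMPTION: keep the record whose pkglist is the superset.
    return existing if existing.pkglist >= incoming.pkglist else incoming
```

Returning exactly one record per id keeps the function free of side effects; whichever caller uses it decides how the losing record is removed from the repository version.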
A relevant functional test is currently being skipped.
Related issues
Updated by ttereshc almost 6 years ago
- Subject changed from Remove UpdateRecord duplicates based on the `updated` timestamp to Remove UpdateRecord duplicates based on the `updated_date`
- Description updated (diff)
Updated by bmbouter almost 6 years ago
The technical proposal confuses me a bit, but I probably don't understand it. If the original duplicate is already associated with the repo version, the pipeline will only handle the newly mutated content unit, which will be a distinct content unit. So the only one you could "not pass along" would be the new one.
In thinking a bit about this, it seems the RemoveDuplicates stage isn't able to provide a "comparison based" removal. A Q() object is effectively built for each content unit, one-by-one.
In considering how we could generalize RemoveDuplicates a bit to allow the plugin writer to have more control over the Q() object, there are a few options, but at that point RemoveDuplicates would have almost no code in it. So that gets me thinking that this issue should just create a brand-new custom stage that builds the Q() object RPM needs, and use that stage in the custom pipeline for RPM.
Updated by ttereshc almost 6 years ago
- Description updated (diff)
> If the original duplicate is already associated with the repo version the pipeline will only handle the newly mutated content unit which will be a distinct content unit. So the only one you could "not pass along" would be the new one.
Yes, that sounds right to me. You "pass along" the mutated content unit if you want to have it in a new repo version instead of the original one.
You "don't pass along" the mutated content unit if you want to keep the original one.
> So that gets me thinking that this issue should just make a brand-new custom stage for handling this that just builds the Q() object that RPM needs and use that in the custom pipeline for RPM.
+1 to a new custom stage
I updated the description.
Updated by bmbouter almost 6 years ago
- Description updated (diff)
The description looks right. I touched it up a bit for clarity. I also added the part where the RemoveDuplicates stage was still planned to be used.
Maybe check the diff to see if it's right?
Updated by bmbouter almost 6 years ago
- Description updated (diff)
After some IRC discussion, we are making the new stage either pass along or not pass along UpdateRecords, so that all unassociating can happen in one place in the pipeline, i.e. RemoveDuplicates.
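The pass/don't-pass approach can be sketched in plain Python. This is a simplified, synchronous stand-in for a pipeline stage (real Pulp stages are asyncio-based and use the pulpcore plugin API, which is not modeled here); `Advisory`, `filter_advisories`, and `newest_wins` are hypothetical names. Losing incoming records are simply not yielded, and superseded records are collected in `to_remove` so that unassociation can happen later in a single place, as RemoveDuplicates would do.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Iterator, Set


@dataclass(frozen=True)
class Advisory:
    """Hypothetical stand-in for an UpdateRecord content unit."""
    id: str
    updated_date: str  # ISO-8601 date string, so text comparison orders by date
    pkglist: frozenset


def filter_advisories(
    existing: Dict[str, Advisory],
    incoming: Iterable[Advisory],
    resolve: Callable[[Advisory, Advisory], Advisory],
    to_remove: Set[Advisory],
) -> Iterator[Advisory]:
    """Yield only the incoming advisories that should be passed along.

    `existing` maps id -> advisory currently kept and is updated in place;
    superseded advisories are added to `to_remove` for one later removal step.
    """
    for adv in incoming:
        current = existing.get(adv.id)
        if current is None:
            # No clash for this id: pass the advisory along unchanged.
            existing[adv.id] = adv
            yield adv
            continue
        keep = resolve(current, adv)
        if keep is current:
            # The kept record wins: do not pass the incoming one along.
            continue
        # The incoming (or a merged) record wins: pass it along and mark the
        # superseded one for removal.
        to_remove.add(current)
        existing[adv.id] = keep
        yield keep


def newest_wins(a: Advisory, b: Advisory) -> Advisory:
    # Example rule only: keep the record with the later updated_date.
    return max(a, b, key=lambda adv: adv.updated_date)
```

The `resolve` callback is a parameter so the stage logic stays separate from the decision rules; any function implementing the criteria from the description could be plugged in instead of `newest_wins`.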
Updated by daviddavis almost 6 years ago
- Groomed changed from No to Yes
- Sprint Candidate changed from No to Yes
Updated by daviddavis over 5 years ago
- Sprint changed from Sprint 51 to Sprint 52
Updated by amacdona@redhat.com over 5 years ago
- Sprint changed from Sprint 53 to Sprint 54
Updated by ttereshc over 5 years ago
- Sprint changed from Sprint 54 to Sprint 55
Updated by ttereshc over 5 years ago
- Subject changed from Remove UpdateRecord duplicates based on the `updated_date` to As a user, after sync a repository version has no advisories with the same id
- Description updated (diff)
- Groomed changed from Yes to No
Updated by dkliban@redhat.com over 5 years ago
- Sprint changed from Sprint 55 to Sprint 56
Updated by daviddavis about 5 years ago
- Related to Story #5084: As a user, after copy or repo version creation there are no advisories with the same id added
Updated by ttereshc about 5 years ago
- Subject changed from As a user, after sync a repository version has no advisories with the same id to As a user, a repository version has no advisories with the same id
- Description updated (diff)
Updated by ttereshc about 5 years ago
- Status changed from NEW to ASSIGNED
- Assignee set to ttereshc
Updated by ttereshc about 5 years ago
- Related to deleted (Story #5084: As a user, after copy or repo version creation there are no advisories with the same id)
Updated by ttereshc about 5 years ago
- Has duplicate Story #5084: As a user, after copy or repo version creation there are no advisories with the same id added
Added by ttereshc about 5 years ago
Updated by ttereshc about 5 years ago
- Status changed from ASSIGNED to POST
Added by ttereshc about 5 years ago
Revision 1d507db4
Extend finalize_new_version to resolve advisory duplicates
Also move createrepo_c object generation for advisories to the corresponding models. Add the ability to generate a createrepo_c object with additional collections.
Updated by ttereshc about 5 years ago
- Status changed from POST to MODIFIED
- % Done changed from 0 to 100
Applied in changeset 1d507db453d4e6a91518beb4981a434a29cc3c01.
Updated by ttereshc about 5 years ago
- Status changed from MODIFIED to CLOSED - CURRENTRELEASE
Identify previous repo version to use in finalization steps
re #4295 https://pulp.plan.io/issues/4295