Story #3934

Updated by bmbouter about 3 years ago

h1. Previous description

h2. Problem

Both rpm and docker have a situation where content units can mutate. The DeclarativeVersion will treat a mutated content unit as a new unit and add it to a RepositoryVersion with sync. This is in addition to the previous version (the unmutated one). This effectively adds the same unit to the RepositoryVersion twice.

What would be great is if the older one would be removed by DeclarativeVersion as part of the pipeline in some kind of configurable way.

h2. Solution

Make a new stage called RemoveDuplicates that takes two parameters, 'type' and 'field_name'. 'type' is the content unit type that the stage should inspect. 'field_name' is the name of the field that needs to be unique within the RepositoryVersion. For example, RPM will configure this stage with type=pulp_rpm.UpdateRecord and field_name='id'.

This new stage will unassociate any units of type=type whose field_name value matches that of one of the units emitted in the DeclarativeContent stream. It will be a batching stage, handling batches of units at a time.

The stage can be used directly by plugin writers. This functionality will also be added as an option to DeclarativeVersion called <code>remove_duplicates</code> which will take the following form:

'type': 'pulp_rpm.UpdateRecord',
'field_name': 'id'

Notice how the stage takes only 1 duplicate type, but the DeclarativeVersion takes a list of them. The DeclarativeVersion will create one RemoveDuplicates stage for each item in the list, making the pipeline a variable length depending on the data passed into <code>DeclarativeVersion</code>.

These extra stages should be run before the AssociateContent stage.

h1. Updated description

h2. Problem

It's possible that erratum records can mutate. We need to support this by storing a new erratum record in the database when this happens. These erratum records will have the same erratum id (e.g. RHSA-2018:2557). However, we also need to ensure that erratum ids are unique per repository.

h2. Solution

* Define the natural key for UpdateRecords as all the fields on the UpdateRecord (except pk). Make sure we're storing a new record when an erratum changes. This includes changes to the erratum's collections or packages. The <code>digest</code> field on UpdateRecord should indicate if two erratum records are the same or not.
* Add a step during sync to remove duplicate errata for a repository. Duplicates are errata that share the same errata id. This could be a stage.

While it's not a requirement for this task, we should think about how we can generalize this solution for other plugins.
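As a rough illustration of the idea discussed here, the following Python sketch shows one way a digest over all fields could tell mutated errata apart, and how stale records sharing the same id could be selected for removal. It deliberately avoids the Pulp plugin API: <code>ErratumRecord</code>, <code>compute_digest</code> and <code>units_to_remove</code> are hypothetical names invented for this example, and the real UpdateRecord model and stage wiring may look quite different.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ErratumRecord:
    """Hypothetical stand-in for an erratum record (not the real UpdateRecord)."""
    id: str
    title: str
    packages: tuple = ()


def compute_digest(record):
    """Hash every field, so any change to the erratum yields a different digest."""
    payload = json.dumps(asdict(record), sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def units_to_remove(existing, incoming, field_name="id"):
    """Return existing units superseded by an incoming unit.

    A unit is superseded when an incoming unit has the same value for
    `field_name` but a different digest, i.e. the erratum has mutated.
    """
    incoming_digests = {
        getattr(unit, field_name): compute_digest(unit) for unit in incoming
    }
    stale = []
    for unit in existing:
        key = getattr(unit, field_name)
        if key in incoming_digests and incoming_digests[key] != compute_digest(unit):
            stale.append(unit)
    return stale
```

For example, if a repository already holds an <code>ErratumRecord</code> with id RHSA-2018:2557 and a sync brings in a record with the same id but an extra package, <code>units_to_remove</code> returns the older record, while an unchanged record is left alone. Taking <code>field_name</code> as a parameter is one possible way to keep the step reusable by other plugins.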