Story #3934


h2. Problem

In both the RPM and Docker plugins, content units can mutate. DeclarativeVersion treats a mutated content unit as a new unit and adds it to the RepositoryVersion during sync, alongside the previous (unmutated) version. The result is effectively the same unit added to the RepositoryVersion twice.

Ideally, DeclarativeVersion would remove the older unit as part of the pipeline, in a configurable way.

h2. Solution

Make a new stage called RemoveDuplicates that takes two parameters, 'type' and 'field_list' (a list or tuple). 'type' is the content unit type the stage should inspect. 'field_list' is the list of field names whose combined values must be unique within the RepositoryVersion. For example, RPM will configure this stage with <code>type=pulp_rpm.UpdateRecord</code> and <code>field_list=['id']</code>. Docker would use the stage twice: first with <code>type=pulp_docker.Tag</code>, <code>field_list=['name', 'manifest']</code>; second with <code>type=pulp_docker.Tag</code>, <code>field_list=['name', 'manifest_list']</code>.

This new stage will unassociate any units of type 'type' whose values for the fields in 'field_list' match those of a unit emitted in the DeclarativeContent stream. It will be a batching stage, handling units in batches. (Note: batching may perform poorly here, since units of multiple types may flow through the stream.)
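A minimal, framework-free sketch of the per-batch matching logic follows. It assumes a hypothetical <code>Unit</code> class with a <code>pk</code> identity in place of real Pulp models, and it only computes which existing units should be unassociated; a real stage would also pass every unit downstream and perform the unassociation itself.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Sequence, Set, Tuple


@dataclass
class Unit:
    """Hypothetical stand-in for a content unit: type label, primary key, field values."""
    type: str
    pk: int
    fields: Dict[str, object] = field(default_factory=dict)


class RemoveDuplicates:
    """Sketch of the proposed stage: one content type, one list of fields that must be unique."""

    def __init__(self, type: str, field_list: Sequence[str]):
        self.type = type
        self.field_list = tuple(field_list)

    def _key(self, unit: Unit) -> Tuple[object, ...]:
        # The tuple of values that must be unique within the RepositoryVersion.
        return tuple(unit.fields.get(name) for name in self.field_list)

    def process_batch(self, repo_units: List[Unit], batch: List[Unit]) -> Set[int]:
        """Return pks of units already in the version that collide with a unit in this batch."""
        incoming: Dict[Tuple[object, ...], Set[int]] = {}
        for unit in batch:
            # Batches can mix content types; only inspect the configured one.
            if unit.type == self.type:
                incoming.setdefault(self._key(unit), set()).add(unit.pk)
        return {
            unit.pk
            for unit in repo_units
            if unit.type == self.type
            and self._key(unit) in incoming
            # An unchanged unit re-emitted by the sync keeps its pk and is not removed.
            and unit.pk not in incoming[self._key(unit)]
        }

    def run(self, repo_units: List[Unit], stream: Iterable[Unit], batch_size: int = 100) -> Set[int]:
        """Consume the stream in batches and collect every pk that should be unassociated."""
        to_remove: Set[int] = set()
        batch: List[Unit] = []
        for unit in stream:
            batch.append(unit)
            if len(batch) >= batch_size:
                to_remove |= self.process_batch(repo_units, batch)
                batch = []
        if batch:  # flush the final partial batch
            to_remove |= self.process_batch(repo_units, batch)
        return to_remove
```

With <code>type="pulp_rpm.UpdateRecord"</code> and <code>field_list=["id"]</code>, a mutated record that arrives with a new pk but the same <code>id</code> causes the older record's pk to be returned, while an unchanged record re-emitted with its existing pk is left alone.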

Plugin writers can use the stage directly. This functionality will also be exposed as a DeclarativeVersion option called <code>remove_duplicates</code>, which will take the following form:

<code>remove_duplicates=[{'type': 'pulp_rpm.UpdateRecord', 'field_names': ['id']}]</code>

Notice that the stage handles only one duplicate type, while DeclarativeVersion takes a list of them. DeclarativeVersion will create one RemoveDuplicates stage per list item, so the pipeline's length varies with the data passed into <code>DeclarativeVersion</code>.
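The one-stage-per-entry construction could look roughly like the sketch below. The <code>build_pipeline</code> function and the <code>RemoveDuplicates</code> dataclass are hypothetical illustrations rather than DeclarativeVersion's actual API, and the base stage names other than AssociateContent are placeholders.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Tuple, Union


@dataclass(frozen=True)
class RemoveDuplicates:
    """Hypothetical descriptor for one RemoveDuplicates stage."""
    type: str
    field_list: Tuple[str, ...]


Stage = Union[str, RemoveDuplicates]


def build_pipeline(base_stages: Sequence[str],
                   remove_duplicates: Optional[List[Dict]] = None) -> List[Stage]:
    """Insert one RemoveDuplicates stage per config entry, just before AssociateContent."""
    stages: List[Stage] = list(base_stages)
    dedupe_stages = [
        RemoveDuplicates(type=item["type"], field_list=tuple(item["field_names"]))
        for item in (remove_duplicates or [])
    ]
    idx = stages.index("AssociateContent")
    return stages[:idx] + dedupe_stages + stages[idx:]
```

For example, passing the two Docker Tag entries produces a pipeline two stages longer than the base one, with both dedupe stages placed immediately ahead of AssociateContent; passing nothing returns the base pipeline unchanged.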

These extra stages should be run before the AssociateContent stage.