Story #3934

closed

RPM Support - Story #3202: As a user, I can sync RPM/SRPM/Erratum from a remote Yum/DNF repository

As a plugin writer, I can have a stage that removes duplicates

Added by daviddavis over 6 years ago. Updated about 5 years ago.

Status: CLOSED - CURRENTRELEASE
Priority: Normal
Category: -
Sprint/Milestone:
Start date:
Due date:
% Done: 100%
Estimated time:
Platform Release:
Groomed: Yes
Sprint Candidate: Yes
Tags:
Sprint: Sprint 46
Quarter:

Description

Problem

Both RPM and Docker have a situation where content units can mutate. During sync, DeclarativeVersion treats a mutated content unit as a new unit and adds it to the RepositoryVersion alongside the previous, unmutated one. The RepositoryVersion then effectively contains the same unit twice.

What would be great is if the older one would be removed by DeclarativeVersion as part of the pipeline in some kind of configurable way.

Solution

Make a new stage called RemoveDuplicates that takes two parameters, 'type' and 'field_list' (a list or tuple). 'type' is the content unit type that the stage should inspect. 'field_list' is the list of field names that need to be unique within the RepositoryVersion. For example, RPM would configure this stage with `type=pulp_rpm.UpdateRecord` and `field_list=['id']`. A Docker example would use the stage twice, first with `type=pulp_docker.Tag`, `field_list=['name', 'manifest']`; second with `type=pulp_docker.Tag`, `field_list=['name', 'manifest_list']`.

This new stage will unassociate any units of the given type whose values for the fields in field_list match those of one of the units emitted in the DeclarativeContent stream. It will be a batching stage, handling batches of units at a time. (Note: batching might perform poorly here, since multiple types may be flowing through the stream.)
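The matching rule described above can be sketched in plain Python. This is only an illustration of the idea, not the stage itself: `Unit` and `find_duplicates` are hypothetical names, and units are modeled as simple tuples rather than Pulp models or querysets.

```python
from collections import namedtuple

# Hypothetical stand-in for a content unit; real Pulp models are not used here.
Unit = namedtuple("Unit", ["pk", "type", "fields"])


def find_duplicates(existing_units, incoming_batch, unit_type, field_list):
    """Return existing units that an incoming unit of `unit_type` would duplicate."""

    def key(unit):
        # The uniqueness key is the tuple of values for the configured fields.
        return tuple(unit.fields.get(name) for name in field_list)

    incoming_pks = {u.pk for u in incoming_batch}
    incoming_keys = {key(u) for u in incoming_batch if u.type == unit_type}
    return [
        u
        for u in existing_units
        if u.type == unit_type
        and u.pk not in incoming_pks  # the very same unit is not a duplicate
        and key(u) in incoming_keys
    ]


existing = [
    Unit(1, "pulp_rpm.UpdateRecord", {"id": "RHSA-1", "updated": "2018-01-01"}),
    Unit(2, "pulp_rpm.UpdateRecord", {"id": "RHSA-2", "updated": "2018-01-01"}),
]
batch = [
    # A mutated erratum: same id, different content.
    Unit(3, "pulp_rpm.UpdateRecord", {"id": "RHSA-1", "updated": "2018-06-01"}),
    # A different type flowing through the same batch is ignored.
    Unit(4, "pulp_rpm.Package", {"id": "RHSA-2"}),
]
to_remove = find_duplicates(existing, batch, "pulp_rpm.UpdateRecord", ["id"])
```

Here only the older erratum with id RHSA-1 is selected for removal. The unit of the other type in the batch is skipped, which is also why mixed-type batches may do little useful work per stage.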

The stage can be used directly by plugin writers. This functionality will also be added as an option to DeclarativeVersion called remove_duplicates, which will take the following form:

[{
    'type': 'pulp_rpm.UpdateRecord',
    'field_names': ['id']
}]

Notice how the stage takes only one duplicate type, but the DeclarativeVersion takes a list of them. The DeclarativeVersion will create one RemoveDuplicates stage for each item in the list, so the length of the pipeline varies with the data passed into DeclarativeVersion.

These extra stages should be run before the AssociateContent stage.
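A rough sketch of how the pipeline could grow with the remove_duplicates option follows. It assumes stages can be represented as simple labels; `build_pipeline`, "FirstStage", and "SaveContent" are hypothetical placeholders, and only RemoveDuplicates and AssociateContent come from this story.

```python
def build_pipeline(remove_duplicates=None):
    """Return an ordered list of stage descriptions (illustrative only)."""
    # Placeholder names for whatever stages precede duplicate removal.
    stages = ["FirstStage", "SaveContent"]
    # One RemoveDuplicates stage per configured item.
    for item in remove_duplicates or []:
        stages.append(("RemoveDuplicates", item["type"], tuple(item["field_names"])))
    # The extra stages run before content is associated.
    stages.append("AssociateContent")
    return stages


pipeline = build_pipeline(
    remove_duplicates=[
        {"type": "pulp_docker.Tag", "field_names": ["name", "manifest"]},
        {"type": "pulp_docker.Tag", "field_names": ["name", "manifest_list"]},
    ]
)
```

With two items the sketch yields five stages, and with no items it yields three, with AssociateContent last in both cases.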


Related issues

Blocks RPM Support - Task #3954: Prevent duplicate Package content in repos (CLOSED - CURRENTRELEASE, ttereshc)

Blocks File Support - Task #4028: Prevent duplicate files in repositories (CLOSED - CURRENTRELEASE, bmbouter)

Blocks Container Support - Story #4172: Remove duplicate tags from repository during sync (CLOSED - CURRENTRELEASE, amacdona@redhat.com)

Actions #1

Updated by daviddavis over 6 years ago

  • Description updated (diff)
Actions #2

Updated by daviddavis over 6 years ago

  • Description updated (diff)
Actions #3

Updated by daviddavis over 6 years ago

  • Subject changed from Remove duplicate UpdateRecords after performing sync to Remove duplicate UpdateRecords for repos after performing sync
Actions #4

Updated by daviddavis over 6 years ago

  • Description updated (diff)
Actions #5

Updated by bmbouter over 6 years ago

  • Tracker changed from Task to Story
  • Project changed from RPM Support to Pulp
  • Subject changed from Remove duplicate UpdateRecords for repos after performing sync to As a plugin writer, I can have a stage that removes duplicates
  • Description updated (diff)
  • Sprint/Milestone deleted (Pulp 3 RPM MVP)

Rewriting to be a generalized core stage.

Actions #6

Updated by bmbouter over 6 years ago

In order to work on this, it would be best if we could have a pulp-smash test committed that causes a mutated erratum to be associated with a repo version in addition to the original, unmutated erratum.

Actions #7

Updated by daviddavis over 6 years ago

bmbouter, agreed. Is there a pulp 2 smash test for this scenario that we could re-use?

Actions #8

Updated by daviddavis over 6 years ago

  • Blocks Task #3954: Prevent duplicate Package content in repos added
Actions #9

Updated by daviddavis over 6 years ago

  • Blocks Task #4028: Prevent duplicate files in repositories added
Actions #10

Updated by daviddavis about 6 years ago

Two comments on this:

  • I think that this needs to accept a list of fields instead of a single field. Consider the case of duplicate rpms which are unique by nevra (5 fields) or docker tags (3 fields: name, manifest__pk, manifest_list__pk) as a docker repo could have two tags with the same name (one for a manifest and one for a manifest_list).
  • Also, I wonder if this field_list should be defined on the content class kind of like how we define the natural key uniqueness on Content now.
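Both points in this comment can be illustrated with a small sketch. It assumes a class-level attribute, here called `repo_key_fields`, which is a hypothetical name and not an existing Pulp attribute; the `Tag` class is a plain Python stand-in, not the pulp_docker model.

```python
class Content:
    # Hypothetical attribute: fields that must be unique within a repository version.
    repo_key_fields = ()


class Tag(Content):
    repo_key_fields = ("name", "manifest_pk", "manifest_list_pk")

    def __init__(self, name, manifest_pk=None, manifest_list_pk=None):
        self.name = name
        self.manifest_pk = manifest_pk
        self.manifest_list_pk = manifest_list_pk


def repo_key(unit):
    """Build the composite uniqueness key from the fields declared on the class."""
    return tuple(getattr(unit, field) for field in unit.repo_key_fields)


# Two tags share a name, but one points at a manifest and the other at a manifest list.
tag_for_manifest = Tag("latest", manifest_pk=10)
tag_for_manifest_list = Tag("latest", manifest_list_pk=20)
same_name = tag_for_manifest.name == tag_for_manifest_list.name
same_key = repo_key(tag_for_manifest) == repo_key(tag_for_manifest_list)
```

Keyed on name alone, the two tags would collide; with the composite key they stay distinct, which is the case a single-field option cannot express.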
Actions #11

Updated by amacdona@redhat.com about 6 years ago

  • Related to Story #4172: Remove duplicate tags from repository during sync added
Actions #12

Updated by amacdona@redhat.com about 6 years ago

  • Description updated (diff)
  • Groomed changed from No to Yes
  • Sprint Candidate changed from No to Yes
Actions #13

Updated by jortel@redhat.com about 6 years ago

  • Sprint set to Sprint 46
Actions #14

Updated by amacdona@redhat.com about 6 years ago

  • Related to deleted (Story #4172: Remove duplicate tags from repository during sync)
Actions #15

Updated by amacdona@redhat.com about 6 years ago

  • Blocks Story #4172: Remove duplicate tags from repository during sync added
Actions #16

Updated by amacdona@redhat.com about 6 years ago

  • Status changed from NEW to ASSIGNED
  • Assignee set to amacdona@redhat.com
Actions #17

Updated by amacdona@redhat.com about 6 years ago

  • Tags Pulp 3 RC Blocker added
Actions #18

Updated by amacdona@redhat.com about 6 years ago

  • Status changed from ASSIGNED to POST
Actions #19

Updated by amacdona@redhat.com about 6 years ago

  • Status changed from POST to MODIFIED
  • % Done changed from 0 to 100

Applied in changeset commit:pulpcore-plugin|c320a0d1bce8cd68ff5cc56d1f6fb023ff72ad64.

Actions #20

Updated by daviddavis over 5 years ago

  • Sprint/Milestone set to 3.0.0
Actions #21

Updated by bmbouter over 5 years ago

  • Tags deleted (Pulp 3, Pulp 3 RC Blocker)
Actions #22

Updated by bmbouter about 5 years ago

  • Status changed from MODIFIED to CLOSED - CURRENTRELEASE
