
Issue #4316

closed

Content with the same natural key may be shared when not completely identical.

Added by jortel@redhat.com over 5 years ago. Updated over 3 years ago.

Status:
CLOSED - WONTFIX
Priority:
Normal
Assignee:
-
Category:
-
Sprint/Milestone:
-
Start date:
Due date:
Estimated time:
Severity:
2. Medium
Version:
Platform Release:
OS:
Triaged:
Yes
Groomed:
No
Sprint Candidate:
No
Tags:
Sprint:
Quarter:

Description

The Problem

During the content creation stages, content is de-duplicated by comparing the natural key of DeclarativeContent.content with that of Content already in the DB. Although the matched content has the same natural key, there is no guarantee that the full content definition is the same: the attributes may differ, as may the number of artifacts or their relative paths. This is unlikely, but it can happen. The concern is that content created by multiple sources is (silently) shared without verification that it is 100% identical.

Example (all three units share the natural key name=apache, version=1.0):

Content (name=apache, version=1.0)
   |__(one.json)__ Artifact (digest=A)
   |__(two.json)__ Artifact (digest=B)

Content (name=apache, version=1.0)
   |__(one.json)__ Artifact (digest=A)
   |__(two.json)__ Artifact (digest=B)
   |__(three.json)__ Artifact (digest=C)

Content (name=apache, version=1.0)
   |__(files/one.json)__ Artifact (digest=A)
   |__(files/two.json)__ Artifact (digest=B)
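The collision above can be reproduced with a small sketch. ContentUnit and natural_key() are hypothetical stand-ins, not Pulp's models, and artifacts are assumed to be modeled as a mapping from relative path to digest. All three units collapse to one natural key even though their artifact layouts all differ.

```python
from dataclasses import dataclass, field


@dataclass
class ContentUnit:
    """Hypothetical stand-in for a content unit (not Pulp's actual model)."""
    name: str
    version: str
    # Maps relative path -> artifact digest.
    artifacts: dict = field(default_factory=dict)

    def natural_key(self):
        # De-duplication looks only at these fields.
        return (self.name, self.version)


first = ContentUnit("apache", "1.0", {"one.json": "A", "two.json": "B"})
second = ContentUnit("apache", "1.0", {"one.json": "A", "two.json": "B", "three.json": "C"})
third = ContentUnit("apache", "1.0", {"files/one.json": "A", "files/two.json": "B"})

units = [first, second, third]
# All three collide on the natural key...
print(len({u.natural_key() for u in units}))  # 1
# ...but their artifact layouts are all different.
print(len({frozenset(u.artifacts.items()) for u in units}))  # 3
```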

Detection

This is the tough part. The primary goal is to detect occurrences and alert users.

Perhaps Content could provide a comparison method that the stage calls. The base implementation could compare the number of artifacts and their relative paths. Plugin writers would override it in concrete content types to perform deeper comparisons as needed.
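The proposal can be sketched as follows, assuming plain dataclasses in place of Pulp's Django models. BaseContent, PackageContent, matches() and resolve() are hypothetical names. The base check compares the artifact count and relative paths, a plugin subclass adds digest and attribute checks, and resolve() still shares the existing unit (today's behavior) but logs a warning so the mismatch is no longer silent.

```python
import logging
from dataclasses import dataclass, field

logger = logging.getLogger("dedupe-sketch")


@dataclass
class BaseContent:
    """Hypothetical base content type (not Pulp's actual model)."""
    natural_key: tuple
    # Maps relative path -> artifact digest.
    artifacts: dict = field(default_factory=dict)

    def matches(self, other):
        """Base comparison: same number of artifacts at the same relative paths."""
        # Cheap count check first, then compare the sets of relative paths.
        return (len(self.artifacts) == len(other.artifacts)
                and set(self.artifacts) == set(other.artifacts))


@dataclass
class PackageContent(BaseContent):
    """Hypothetical plugin type that performs a deeper comparison."""
    summary: str = ""

    def matches(self, other):
        if not super().matches(other):
            return False
        # Type-specific checks: artifact digests and an extra attribute.
        return self.artifacts == other.artifacts and self.summary == other.summary


def resolve(incoming, existing_by_key):
    """Return the unit to use, warning when a natural-key match is not identical."""
    existing = existing_by_key.get(incoming.natural_key)
    if existing is None:
        return incoming
    if not incoming.matches(existing):
        logger.warning("Content %s matches an existing unit by natural key but differs",
                       incoming.natural_key)
    # Keep sharing the existing unit, as the current pipeline does.
    return existing


existing = PackageContent(("apache", "1.0"), {"one.json": "A", "two.json": "B"})
db = {existing.natural_key: existing}
same = PackageContent(("apache", "1.0"), {"one.json": "A", "two.json": "B"})
extra = PackageContent(("apache", "1.0"), {"one.json": "A", "two.json": "B", "three.json": "C"})
print(same.matches(existing))   # True
print(extra.matches(existing))  # False
```

Returning the existing unit keeps the sketch focused on detection and alerting; a stricter variant could raise instead, at the cost of failing the sync.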

This comparison will come with some cost, since every natural-key match would need a full comparison rather than a key lookup alone.

Remedies

Currently, the user would need to remove the offending content from all repositories and then delete it as part of orphan cleanup. Other ideas?

Actions #1

Updated by mdellweg over 5 years ago

I know this would radically change the data model, but we could stop reusing content across repositories altogether. Then the same combination of, say, [name, version, architecture] would only need to be consistent within a single repository. Artifacts and data in storage would still be reused, of course.
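A rough sketch of this alternative, assuming a toy in-memory store rather than Pulp's schema: content is keyed by (repository, natural key), so consistency is only enforced within one repository, while artifact storage stays deduplicated by digest. Store and add() are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Store:
    """Hypothetical store: content scoped per repository, artifacts shared globally."""
    # digest -> artifact data, shared across all repositories.
    artifacts: dict = field(default_factory=dict)
    # (repository, natural key) -> {relative path: digest}
    content: dict = field(default_factory=dict)

    def add(self, repo, natural_key, files):
        """files maps relative path -> (digest, data)."""
        key = (repo, natural_key)
        layout = {path: digest for path, (digest, _) in files.items()}
        existing = self.content.get(key)
        if existing is not None and existing != layout:
            # Consistency is only required within a single repository.
            raise ValueError(f"{natural_key} is inconsistent within repository {repo!r}")
        for digest, data in files.values():
            # Artifact storage is still deduplicated by digest.
            self.artifacts.setdefault(digest, data)
        self.content[key] = layout


store = Store()
store.add("repo-a", ("apache", "1.0"), {"one.json": ("A", b"a"), "two.json": ("B", b"b")})
store.add("repo-b", ("apache", "1.0"), {"files/one.json": ("A", b"a"), "files/two.json": ("B", b"b")})
print(len(store.content), len(store.artifacts))  # 2 2
```

Both repositories hold their own apache-1.0 with different layouts, yet only two artifacts are stored.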

I thought this (or something very similar) was discussed elsewhere, but I cannot find the ticket.

Actions #2

Updated by CodeHeeler over 5 years ago

  • Triaged changed from No to Yes
Actions #3

Updated by bmbouter almost 5 years ago

  • Tags deleted (Pulp 3)
Actions #4

Updated by daviddavis over 3 years ago

  • Status changed from NEW to CLOSED - WONTFIX
