Content with the same natural key may be shared when not completely identical.
During the content creation stages, content is de-duplicated by comparing the natural key of DeclarativeContent.content with the natural key of Content already in the DB. Although a matched unit has the same natural key, there is no guarantee that its full definition is the same. Its attributes may differ, and so may its artifacts (their number and/or their relative paths). This is unlikely, but it can happen. The concern is that content created by multiple sources may be (silently) shared without verifying that it is 100% identical.
For example, the following three content units share the natural key (name=apache, version=1.0) but differ in their artifacts:

Content (name=apache, version=1.0)
  |__(one.json)__ Artifact (digest=A)
  |__(two.json)__ Artifact (digest=B)

Content (name=apache, version=1.0)
  |__(one.json)__ Artifact (digest=A)
  |__(two.json)__ Artifact (digest=B)
  |__(three.json)__ Artifact (digest=C)

Content (name=apache, version=1.0)
  |__(files/one.json)__ Artifact (digest=A)
  |__(files/two.json)__ Artifact (digest=B)
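A minimal sketch of why natural-key de-duplication cannot tell these units apart. It assumes a hypothetical `Unit` class whose natural key is `(name, version)` and whose artifacts are modeled as a plain mapping from relative path to digest; this is an illustration, not the pulpcore model.

```python
from dataclasses import dataclass, field


@dataclass
class Unit:
    """Hypothetical stand-in for a content unit; not the pulpcore model."""
    name: str
    version: str
    # relative path -> artifact digest
    artifacts: dict = field(default_factory=dict)

    def natural_key(self):
        # Only these fields are used to match against existing content.
        return (self.name, self.version)


first = Unit("apache", "1.0", {"one.json": "A", "two.json": "B"})
second = Unit("apache", "1.0", {"one.json": "A", "two.json": "B", "three.json": "C"})
third = Unit("apache", "1.0", {"files/one.json": "A", "files/two.json": "B"})

# All three collapse to the same key, so de-duplication treats them as one unit.
keys = {u.natural_key() for u in (first, second, third)}
print(keys)  # {('apache', '1.0')}

# Yet their artifact layouts differ.
print(first.artifacts == second.artifacts)  # False
print(first.artifacts == third.artifacts)   # False
```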
Detecting this is the tough part. The primary goal is to detect such occurrences and alert users.
Perhaps Content could provide a comparison method that the stage calls. The base implementation could compare the number of artifacts and their relative paths. Plugin writers would override it in concrete content types to perform deeper comparisons as needed.
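A minimal sketch of that idea, with hypothetical names throughout (`Unit`, `is_equivalent`, `ExamplePackage`, `dedupe` are illustrative, not pulpcore APIs). The base method compares the artifacts' relative paths, a plugin subclass adds an attribute and digest check, and the de-duplication step logs a warning rather than silently reusing a mismatched unit.

```python
import logging
from dataclasses import dataclass, field

log = logging.getLogger("dedupe")


@dataclass
class Unit:
    """Hypothetical base content type; not the pulpcore model."""
    name: str
    version: str
    # relative path -> artifact digest
    artifacts: dict = field(default_factory=dict)

    def natural_key(self):
        return (self.name, self.version)

    def is_equivalent(self, other):
        # Base check: same number of artifacts at the same relative paths.
        # Equal key sets imply equal counts, since dict keys are unique.
        return self.artifacts.keys() == other.artifacts.keys()


@dataclass
class ExamplePackage(Unit):
    """Hypothetical plugin type that performs a deeper comparison."""
    summary: str = ""

    def is_equivalent(self, other):
        return (super().is_equivalent(other)
                and self.summary == other.summary
                and self.artifacts == other.artifacts)  # digests too


def dedupe(new_unit, existing_by_key):
    """Return the unit to use, warning when a natural-key match is not equivalent."""
    key = new_unit.natural_key()
    existing = existing_by_key.get(key)
    if existing is None:
        existing_by_key[key] = new_unit
        return new_unit
    if not new_unit.is_equivalent(existing):
        log.warning("Content %s matches existing content by natural key but differs", key)
    # The matched unit is still reused; the warning only surfaces the mismatch.
    return existing
```

Usage: calling `dedupe()` with the first and then the second `apache` unit from the example above returns the first unit both times, but logs a warning on the second call because the artifact paths differ.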
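This comparison will come with some cost, since the artifacts (and their relative paths) of each matched Content must be loaded from the DB to compare them.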
Currently, the user would need to remove the offending content from all repositories and then delete it during orphan cleanup. Other ideas?
I know this would radically change the data model, but we could stop reusing content across repositories altogether. Then the same combination of, say, [name, version, architecture] would only need to be consistent within a single repository. Artifacts and data in storage would still be reused, of course.
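A rough sketch of this proposed data-model change, using hypothetical in-memory structures rather than the real Pulp schema. Content uniqueness is scoped by `(repository, name, version, architecture)`, while artifacts remain shared by digest.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Artifact:
    digest: str


class Store:
    """Hypothetical store: content is per repository, artifacts are global."""

    def __init__(self):
        self.artifacts = {}  # digest -> Artifact, shared across repositories
        self.content = {}    # (repo, name, version, arch) -> {relpath: Artifact}

    def artifact(self, digest):
        # Reuse the stored artifact if this digest is already known.
        return self.artifacts.setdefault(digest, Artifact(digest))

    def add(self, repo, name, version, arch, files):
        key = (repo, name, version, arch)
        new = {path: self.artifact(d) for path, d in files.items()}
        existing = self.content.get(key)
        if existing is not None and existing != new:
            # Consistency is only enforced within a single repository.
            raise ValueError(f"conflicting content for {key}")
        self.content[key] = new
        return new
```

Usage: adding `apache 1.0` to `repo1` with `one.json`/`two.json` and to `repo2` with `files/one.json`/`files/two.json` succeeds, since each repository holds its own definition, while artifacts `A` and `B` are each stored only once.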
I thought this (or something very similar) had been discussed elsewhere, but I cannot find the ticket.