Issue #6815
closedPulp - Story #6134: [EPIC] Pulp import/export
Not all advisory models are defined for import/export
Description
Advisory models:
class UpdateRecord(Content)
class UpdateCollection(BaseModel):
update_record = models.ManyToManyField(UpdateRecord, related_name="collections")
class UpdateCollectionPackage(BaseModel):
update_collection = models.ForeignKey(UpdateCollection, related_name='packages', on_delete=models.CASCADE)
class UpdateReference(BaseModel):
update_record = models.ForeignKey(UpdateRecord, related_name="references", on_delete=models.CASCADE)
Currently only UpdateRecord
is considered.
Related issues
Updated by ggainey over 4 years ago
@ttereshenc - A number of these have no 'natural key' supported at the DB level via uniqueness-constraints. I can work around that, but I think it might just be an oversight.
- UpdateCollection: is 'name' expected to be unique? And what about not-null - as currently defined, all fields except pulp_id/created are allowed to be null - is that right?
- UpdateCollectionPackage: is 'sum,sum_type' expected to be unique? sum_type can be null - is that right? What about 'filename' or 'name'?
- UpdateReference: is 'href,ref_type' expected to be unique?
Once I have answers to these, I can make the necessary changes and then make them all exportable.
Updated by ttereshc over 4 years ago
Yes, all those auxiliary models have no uniqueness constraint and are created/removed for every advisory.
digest
on the UpdateRecord model reflects all content of advisory, including those models.
- UpdateCollection: is 'name' expected to be unique? And what about not-null - as currently defined, all fields except pulp_id/created are allowed to be null - is that right?
In theory, it should be unique within an advisory with a specific id. It's not checked/enforced in pulp3 at the moment but we have a story filed.
At a DB level they are not required to be unique.
As for the null constraints, advisory can have anything/nothing, you know ;) I can check why we chose null
for the name
as well, probably name
is always present but it definitely can be an empty string. It also depends how createrepo_c works with it. Let me know if it's critical and I can dig deeper.
- UpdateCollectionPackage: is 'sum,sum_type' expected to be unique? sum_type can be null - is that right? What about 'filename' or 'name'?
sum
and sum_type
are often absent. filename
is potentially a good candidate. name
is not enough, should be all NEVRA then.
FWIW, if we choose the path of introducing uniqueness constraints, I think it should be for all 3 models. It also affects sync code significantly and requires data migration which needs to remove duplicated entries and create correct relations for all affected advisories.
- UpdateReference: is 'href,ref_type' expected to be unique?
Probably within the same advisory. I don't think there is any guarantee that it's this way and they can also be absent.
Updated by ggainey over 4 years ago
- Status changed from NEW to ASSIGNED
- Assignee set to ggainey
- Sprint set to Sprint 77
Updated by ggainey over 4 years ago
There are two issues with the current UpdateCollection model:
- name is nullable. Because all nulls are unique in postgres, this means that you could have two UpdateCollections null/update-id-1 and null/update-id-1, and the would be considered 'unique'. If a collection can truly be unnamed, we need to make "not-named' be an empty string instead of null.
- update_collection is a many-to-many relationship instead of a foreign-key relationship. This looks like just an error tha needs to be fixed.
Fixing this in existing databases will require:
- for all rpm_updatecollection where name is null, set name = ""
- create a foreign-key column, update_record_id, in rpm_updatecollection
- find all existing update-collections that are related to exactly one update-record
- for all elements found in step-2:
- fill in update_record_id-column with the update-record-id they are associated with
- remove these entries from rpm_updatecollection_update_record
- for all rpm_updatecollection rows where update-collection-id is still null:
- if none, break
- for each UpdateCollection, find all the UpdateRecords it is associated with
- create a duplicate UpdateCollection (and associated UpdateCollectionPackages), assigned each to a specific UpdateRecord
- remove records from rpm_updatecollection_update_record for the originating UpdateRecord
- if rpm_updatecollection_update_record is not empty - ERROR
- update rpm_updatecollection.name to NOT be nullable
- set a uniqueness-constraint on rpm_updatecollection of (name, update-record-id)
Definitely a Fun migration...
A thought : if the API doesn't expose creating/linking updatecollection to updaterecords directly, then we just have to understand how advisory-upload/create and sync use this model - the data may not be as broken as the db-schema allows it to be. It would let us skip steps 5-6 above.
Updated by ggainey over 4 years ago
- Related to Task #7195: Enforce correct uniqueness for UpdateCollection at the Model level added
Updated by pulpbot over 4 years ago
- Status changed from ASSIGNED to POST
Added by ggainey over 4 years ago
Updated by ggainey over 4 years ago
- Status changed from POST to MODIFIED
Applied in changeset db0a7926eba0c402271f2b45a2298447b35a6f3d.
Updated by pulpbot over 4 years ago
- Status changed from MODIFIED to CLOSED - CURRENTRELEASE
Add remaining RPM-entities to import/export.
Includes UpdateReference, UpdateCollection, and UpdateCollectionPackage.
Added basic test-skeleton for import/export of pulp_rpm
Updated travis-configs to enable ALLOWED_(IMPORT|EXPORT)_PATHS to let CI run the import/export testcases successfully.
fixes #6815