Issue #6815: Not all advisory models are defined for import/export - RPM Support - Pulp

Actions

Send by e-mail Copy link

Issue #6815

closed

Pulp - Story #6134: [EPIC] Pulp import/export

Not all advisory models are defined for import/export

Added by ttereshc over 4 years ago. Updated over 4 years ago.

Status:

CLOSED - CURRENTRELEASE

Priority:

Normal

Assignee:

ggainey

Sprint/Milestone:

3.6.0

Start date:

Due date:

Estimated time:

Severity:

2. Medium

Version:

Platform Release:

OS:

Triaged:

Yes

Groomed:

Sprint Candidate:

Tags:

Sprint:

Sprint 79

Quarter:

Description

Advisory models:

class UpdateRecord(Content)

class UpdateCollection(BaseModel):
    update_record = models.ManyToManyField(UpdateRecord, related_name="collections")

class UpdateCollectionPackage(BaseModel):
   update_collection = models.ForeignKey(UpdateCollection, related_name='packages', on_delete=models.CASCADE) 

class UpdateReference(BaseModel):
    update_record = models.ForeignKey(UpdateRecord, related_name="references", on_delete=models.CASCADE)

Currently only UpdateRecord is considered.

Related issues

Actions

Copy link

Updated by ttereshc over 4 years ago

Description updated (diff)

Actions

Copy link

Updated by ggainey over 4 years ago

Parent issue set to #6134

Actions

Copy link

Updated by ggainey over 4 years ago

Triaged changed from No to Yes

Actions

Copy link

Updated by ggainey over 4 years ago

@ttereshenc - A number of these have no 'natural key' supported at the DB level via uniqueness-constraints. I can work around that, but I think it might just be an oversight.

UpdateCollection: is 'name' expected to be unique? And what about not-null - as currently defined, all fields except pulp_id/created are allowed to be null - is that right?
UpdateCollectionPackage: is 'sum,sum_type' expected to be unique? sum_type can be null - is that right? What about 'filename' or 'name'?
UpdateReference: is 'href,ref_type' expected to be unique?

Once I have answers to these, I can make the necessary changes and then make them all exportable.

Actions

Copy link

Updated by ttereshc over 4 years ago

Yes, all those auxiliary models have no uniqueness constraint and are created/removed for every advisory. digest on the UpdateRecord model reflects all content of advisory, including those models.

UpdateCollection: is 'name' expected to be unique? And what about not-null - as currently defined, all fields except pulp_id/created are allowed to be null - is that right?

In theory, it should be unique within an advisory with a specific id. It's not checked/enforced in pulp3 at the moment but we have a story filed. At a DB level they are not required to be unique. As for the null constraints, advisory can have anything/nothing, you know ;) I can check why we chose null for the name as well, probably name is always present but it definitely can be an empty string. It also depends how createrepo_c works with it. Let me know if it's critical and I can dig deeper.

UpdateCollectionPackage: is 'sum,sum_type' expected to be unique? sum_type can be null - is that right? What about 'filename' or 'name'?

sum and sum_type are often absent. filename is potentially a good candidate. name is not enough, should be all NEVRA then. FWIW, if we choose the path of introducing uniqueness constraints, I think it should be for all 3 models. It also affects sync code significantly and requires data migration which needs to remove duplicated entries and create correct relations for all affected advisories.

UpdateReference: is 'href,ref_type' expected to be unique?

Probably within the same advisory. I don't think there is any guarantee that it's this way and they can also be absent.

Actions

Copy link

Updated by ggainey over 4 years ago

Status changed from NEW to ASSIGNED
Assignee set to ggainey
Sprint set to Sprint 77

Actions

Copy link

Updated by ggainey over 4 years ago

There are two issues with the current UpdateCollection model:

name is nullable. Because all nulls are unique in postgres, this means that you could have two UpdateCollections null/update-id-1 and null/update-id-1, and the would be considered 'unique'. If a collection can truly be unnamed, we need to make "not-named' be an empty string instead of null.
update_collection is a many-to-many relationship instead of a foreign-key relationship. This looks like just an error tha needs to be fixed.

Fixing this in existing databases will require:

for all rpm_updatecollection where name is null, set name = ""
create a foreign-key column, update_record_id, in rpm_updatecollection
find all existing update-collections that are related to exactly one update-record
for all elements found in step-2:
1. fill in update_record_id-column with the update-record-id they are associated with
2. remove these entries from rpm_updatecollection_update_record
for all rpm_updatecollection rows where update-collection-id is still null:
1. if none, break
2. for each UpdateCollection, find all the UpdateRecords it is associated with
3. create a duplicate UpdateCollection (and associated UpdateCollectionPackages), assigned each to a specific UpdateRecord
4. remove records from rpm_updatecollection_update_record for the originating UpdateRecord
if rpm_updatecollection_update_record is not empty - ERROR
update rpm_updatecollection.name to NOT be nullable
set a uniqueness-constraint on rpm_updatecollection of (name, update-record-id)

Definitely a Fun migration...

A thought : if the API doesn't expose creating/linking updatecollection to updaterecords directly, then we just have to understand how advisory-upload/create and sync use this model - the data may not be as broken as the db-schema allows it to be. It would let us skip steps 5-6 above.

Actions

Copy link