Project

Profile

Help

Issue #8329

closed

pulp2content can have duplicate entries under certain re-migration conditions

Added by ttereshc about 3 years ago. Updated about 3 years ago.

Status:
CLOSED - CURRENTRELEASE
Priority:
Normal
Assignee:
Sprint/Milestone:
Start date:
Due date:
Estimated time:
Severity:
2. Medium
Platform Release:
OS:
Triaged:
No
Groomed:
No
Sprint Candidate:
No
Tags:
Sprint:
Sprint 92
Quarter:

Description

Pulp2Content model has a FK to Pulp2Repository as a part of the uniqueness constraint on the model.
The FK can be null, because it is set only for content types which depend on a repo and can be associated only with one of them. Since each null is unique, it creates duplicated entries, if something changed about this content in pulp2.

To reproduce

  • migrate pulp2 rpm on_demand content
  • download that content in pulp2
  • run migration again

Solution

  • Remove FK from the uniqueness constraint
  • FOR ERRATA ONLY: set pulp2_subuid to a repo_id value where an FK to a repo has been set
  • this will maintain the necessary uniqueness for errata
  • subuid is currently only used by deb-migration
  • data migration is needed
  • find the dups; keep the one with the most-recent pulp2_last_updated, delete the others
  • for all pulp2_content_type_id=errata in pulp_2to3_migration_pulp2content, set pulp2_subid = pulp2_repo_id
Actions #1

Updated by ttereshc about 3 years ago

  • Subject changed from pulp2content can have duplicate entries under under certain re-migration conditions to pulp2content can have duplicate entries under certain re-migration conditions
  • Description updated (diff)
Actions #2

Updated by ggainey about 3 years ago

  • Status changed from NEW to ASSIGNED
  • Assignee set to ggainey
  • Sprint set to Sprint 91
Actions #3

Updated by ggainey about 3 years ago

Reproducer script. Assumes pulp-cli and jq installed/available. Migrates ALL Pulp2-rpm-repos - if you have a lot, it will take...a while. Spits out dups at end.

#!/usr/bin/bash                                                                              
REPO_NAME=rpm-signed                                                                         
pulp-admin login -u admin -p admin                                                           
#                                                                                            
pulp-admin rpm repo delete --repo-id ${REPO_NAME}                                            
pulp-admin orphan remove --all                                                               
#                                                                                            
pulp-admin rpm repo create --repo-id ${REPO_NAME} --relative-url ${REPO_NAME} --feed https://fixtures.pulpproject.org/rpm-signed/ --download-policy on_demand
pulp-admin rpm repo sync run --repo-id ${REPO_NAME}                                          
#                                                                                            
PLAN_HREF=$(pulp migration plan create --plan '{"plugins": [{"type": "rpm"}]}' | jq -r '.pulp_href')
echo "PLAN_HREF = ${PLAN_HREF}"                                                              
pulp migration plan run --href ${PLAN_HREF}                                                  
#                                                                                            
pulp-admin rpm repo update --repo-id ${REPO_NAME} --download-policy immediate                
pulp-admin rpm repo sync run --repo-id ${REPO_NAME}                                          
#                                                                                            
pulp migration plan run --href ${PLAN_HREF}                                                  
PGPASSWORD=pulp psql -U pulp -d pulp --host 127.0.0.1 \                                      
    -c 'select pulp2_id, pulp2_content_type_id, pulp2_repo_id, pulp2_subid, count(*) from pulp_2to3_migration_pulp2content group by pulp2_id, pulp2_content_type_id, pulp2_repo_id, pulp2_subid having count(*) > 1;'
Actions #4

Updated by ipanova@redhat.com about 3 years ago

  • Sprint/Milestone set to 0.9.0
Actions #5

Updated by ggainey about 3 years ago

The migration to fix this has to accomplish the following:

  • Find all the duplicates
  • for each duplicate:
    • pick one (most-recent pulp_created?)
    • fill in pulp2_subid with the pulp2_repo_id value
    • point everything (see tables below) pointing to any of the other duplicates to 'this' entry
    • delete the 'other' duplicates

Tables that have FK-references to pulp_2to3_migration_pulp2content.pulp_id :

  • pulp_2to3_migration_pulp2erratum
  • pulp_2to3_migration_pulp2srpm
  • pulp_2to3_migration_pulp2packagegroup
  • pulp_2to3_migration_pulp2modulemd
  • pulp_2to3_migration_pulp2debreleasearchitecture
  • pulp_2to3_migration_pulp2rpm
  • pulp_2to3_migration_pulp2debcomponent
  • pulp_2to3_migration_pulp2manifest
  • pulp_2to3_migration_pulp2modulemddefaults
  • pulp_2to3_migration_pulp2manifestlist
  • pulp_2to3_migration_pulp2debpackage
  • pulp_2to3_migration_pulp2packagelangpacks
  • pulp_2to3_migration_pulp2blob
  • pulp_2to3_migration_pulp2debcomponentpackage
  • pulp_2to3_migration_pulp2yumrepometadatafile
  • pulp_2to3_migration_pulp2packageenvironment
  • pulp_2to3_migration_pulp2packagecategory
  • pulp_2to3_migration_pulp2distribution
  • pulp_2to3_migration_pulp2tag
  • pulp_2to3_migration_pulp2iso
  • pulp_2to3_migration_pulp2debrelease
Actions #8

Updated by rchan about 3 years ago

  • Sprint changed from Sprint 91 to Sprint 92
Actions #9

Updated by ggainey about 3 years ago

  • Description updated (diff)

Notes from collab w/ttereshc:

  • Deleting from Pulp2Content using Django "should" take care of the related detail-objects.
  • Only errata need to have subid filled in to maintain uniqueness corectly.
  • Choose newest pulp2_last_updated to keep
  • Setting up the uniqueness-constraint in a way that works around Postgres' treatment of NULL would require postgres-specific-code, which will cause us maintenance woes in the future - so, no.
Actions #10

Updated by ggainey about 3 years ago

PR https://github.com/pulp/pulp-2to3-migration/pull/331 fixes the unique-constraint issue. However, with the uniqueness actually being enforced, the reproducer above now fails trying to do the second migration, with the following stacktrace:

Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: pulp [f38e75f950d6404da3f078824f53aa31]: rq.worker:ERROR: Traceback (most recent call last):
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: File "/usr/local/lib/pulp/lib64/python3.6/site-packages/django/db/backends/base/base.py", line 240, in _commit
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: return self.connection.commit()
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: psycopg2.errors.ForeignKeyViolation: insert or update on table "pulp_2to3_migration_pulp2rpm" violates foreign key constraint "pulp_2to3_migration__pulp2content_id_508869c4_fk_pulp_2to3"
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: DETAIL:  Key (pulp2content_id)=(ec5e7c59-e1fb-497a-9b53-ac27c1fdc973) is not present in table "pulp_2to3_migration_pulp2content".
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: The above exception was the direct cause of the following exception:
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: Traceback (most recent call last):
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: File "/usr/local/lib/pulp/lib64/python3.6/site-packages/rq/worker.py", line 975, in perform_job
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: rv = job.perform()
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: File "/usr/local/lib/pulp/lib64/python3.6/site-packages/rq/job.py", line 696, in perform
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: self._result = self._execute()
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: File "/usr/local/lib/pulp/lib64/python3.6/site-packages/rq/job.py", line 719, in _execute
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: return self.func(*self.args, **self.kwargs)
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: File "/home/vagrant/devel/pulp-2to3-migration/pulp_2to3_migration/app/tasks/migrate.py", line 77, in migrate_from_pulp2
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: pre_migrate_all_content(plan)
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: File "/home/vagrant/devel/pulp-2to3-migration/pulp_2to3_migration/app/pre_migration.py", line 68, in pre_migrate_all_content
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: pre_migrate_content_type(content_model, mutable_type, lazy_type, premigrate_hook)
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: File "/home/vagrant/devel/pulp-2to3-migration/pulp_2to3_migration/app/pre_migration.py", line 218, in pre_migrate_content_type
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: content_model.pulp_2to3_detail.pre_migrate_content_detail(pulp2content_batch)
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: File "/home/vagrant/devel/pulp-2to3-migration/pulp_2to3_migration/app/plugin/rpm/pulp_2to3_models.py", line 144, in pre_migrate_content_detail
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: batch_size=DEFAULT_BATCH_SIZE)
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: File "/usr/local/lib/pulp/lib64/python3.6/site-packages/django/db/models/manager.py", line 82, in manager_method
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: return getattr(self.get_queryset(), name)(*args, **kwargs)
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: File "/usr/local/lib/pulp/lib64/python3.6/site-packages/django/db/models/query.py", line 480, in bulk_create
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: obj_without_pk._state.db = self.db
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: File "/usr/local/lib/pulp/lib64/python3.6/site-packages/django/db/transaction.py", line 240, in __exit__
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: connection.commit()
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: File "/usr/local/lib/pulp/lib64/python3.6/site-packages/django/db/backends/base/base.py", line 262, in commit
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: self._commit()
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: File "/usr/local/lib/pulp/lib64/python3.6/site-packages/django/db/backends/base/base.py", line 240, in _commit
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: return self.connection.commit()
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: File "/usr/local/lib/pulp/lib64/python3.6/site-packages/django/db/utils.py", line 89, in __exit__
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: raise dj_exc_value.with_traceback(traceback) from exc_value
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: File "/usr/local/lib/pulp/lib64/python3.6/site-packages/django/db/backends/base/base.py", line 240, in _commit
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: return self.connection.commit()
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: django.db.utils.IntegrityError: insert or update on table "pulp_2to3_migration_pulp2rpm" violates foreign key constraint "pulp_2to3_migration__pulp2content_id_508869c4_fk_pulp_2to3"
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: DETAIL:  Key (pulp2content_id)=(ec5e7c59-e1fb-497a-9b53-ac27c1fdc973) is not present in table "pulp_2to3_migration_pulp2content".
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: Traceback (most recent call last):
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: File "/usr/local/lib/pulp/lib64/python3.6/site-packages/django/db/backends/base/base.py", line 240, in _commit
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: return self.connection.commit()
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: psycopg2.errors.ForeignKeyViolation: insert or update on table "pulp_2to3_migration_pulp2rpm" violates foreign key constraint "pulp_2to3_migration__pulp2content_id_508869c4_fk_pulp_2to3"
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: DETAIL:  Key (pulp2content_id)=(ec5e7c59-e1fb-497a-9b53-ac27c1fdc973) is not present in table "pulp_2to3_migration_pulp2content".
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: The above exception was the direct cause of the following exception:
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: Traceback (most recent call last):
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: File "/usr/local/lib/pulp/lib64/python3.6/site-packages/rq/worker.py", line 975, in perform_job
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: rv = job.perform()
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: File "/usr/local/lib/pulp/lib64/python3.6/site-packages/rq/job.py", line 696, in perform
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: self._result = self._execute()
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: File "/usr/local/lib/pulp/lib64/python3.6/site-packages/rq/job.py", line 719, in _execute
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: return self.func(*self.args, **self.kwargs)
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: File "/home/vagrant/devel/pulp-2to3-migration/pulp_2to3_migration/app/tasks/migrate.py", line 77, in migrate_from_pulp2
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: pre_migrate_all_content(plan)
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: File "/home/vagrant/devel/pulp-2to3-migration/pulp_2to3_migration/app/pre_migration.py", line 68, in pre_migrate_all_content
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: pre_migrate_content_type(content_model, mutable_type, lazy_type, premigrate_hook)
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: File "/home/vagrant/devel/pulp-2to3-migration/pulp_2to3_migration/app/pre_migration.py", line 218, in pre_migrate_content_type
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: content_model.pulp_2to3_detail.pre_migrate_content_detail(pulp2content_batch)
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: File "/home/vagrant/devel/pulp-2to3-migration/pulp_2to3_migration/app/plugin/rpm/pulp_2to3_models.py", line 144, in pre_migrate_content_detail
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: batch_size=DEFAULT_BATCH_SIZE)
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: File "/usr/local/lib/pulp/lib64/python3.6/site-packages/django/db/models/manager.py", line 82, in manager_method
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: return getattr(self.get_queryset(), name)(*args, **kwargs)
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: File "/usr/local/lib/pulp/lib64/python3.6/site-packages/django/db/models/query.py", line 480, in bulk_create
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: obj_without_pk._state.db = self.db
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: File "/usr/local/lib/pulp/lib64/python3.6/site-packages/django/db/transaction.py", line 240, in __exit__
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: connection.commit()
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: File "/usr/local/lib/pulp/lib64/python3.6/site-packages/django/db/backends/base/base.py", line 262, in commit
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: self._commit()
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: File "/usr/local/lib/pulp/lib64/python3.6/site-packages/django/db/backends/base/base.py", line 240, in _commit
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: return self.connection.commit()
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: File "/usr/local/lib/pulp/lib64/python3.6/site-packages/django/db/utils.py", line 89, in __exit__
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: raise dj_exc_value.with_traceback(traceback) from exc_value
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: File "/usr/local/lib/pulp/lib64/python3.6/site-packages/django/db/backends/base/base.py", line 240, in _commit
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: return self.connection.commit()
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: django.db.utils.IntegrityError: insert or update on table "pulp_2to3_migration_pulp2rpm" violates foreign key constraint "pulp_2to3_migration__pulp2content_id_508869c4_fk_pulp_2to3"
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: DETAIL:  Key (pulp2content_id)=(ec5e7c59-e1fb-497a-9b53-ac27c1fdc973) is not present in table "pulp_2to3_migration_pulp2content".

Making the constraint strict/correct is pointing out a problem elsewhere in the migration code. Investigation continues.

Added by ggainey about 3 years ago

Revision 86f0f99d | View on GitHub

Fixes a Pulp2Content unique-constraint that wasn't very unique.

In Postgres, all NULLs are unique - so if a nullable field is included in a UiniqueConstraint, it allows rows that differ only by the NULL.

Build two constraints, one when the 'optional' field is NULL, and one where it has content.

fixes #8329 [nocoverage]

Added by ggainey about 3 years ago

Revision 86f0f99d | View on GitHub

Fixes a Pulp2Content unique-constraint that wasn't very unique.

In Postgres, all NULLs are unique - so if a nullable field is included in a UiniqueConstraint, it allows rows that differ only by the NULL.

Build two constraints, one when the 'optional' field is NULL, and one where it has content.

fixes #8329 [nocoverage]

Added by ggainey about 3 years ago

Revision 86f0f99d | View on GitHub

Fixes a Pulp2Content unique-constraint that wasn't very unique.

In Postgres, all NULLs are unique - so if a nullable field is included in a UiniqueConstraint, it allows rows that differ only by the NULL.

Build two constraints, one when the 'optional' field is NULL, and one where it has content.

fixes #8329 [nocoverage]

Actions #11

Updated by ggainey about 3 years ago

  • Status changed from ASSIGNED to MODIFIED
Actions #12

Updated by pulpbot about 3 years ago

  • Status changed from MODIFIED to CLOSED - CURRENTRELEASE

Also available in: Atom PDF