Issue #8329
closedpulp2content can have duplicate entries under certain re-migration conditions
Description
Pulp2Content model has a FK to Pulp2Repository as a part of the uniqueness constraint on the model.
The FK can be null, because it is set only for content types which depend on a repo and can be associated only with one of them.
Since each null is unique, it creates duplicated entries, if something changed about this content in pulp2.
To reproduce¶
- migrate pulp2 rpm on_demand content
- download that content in pulp2
- run migration again
Solution¶
- Remove FK from the uniqueness constraint
- FOR ERRATA ONLY: set pulp2_subuid to a repo_id value where an FK to a repo has been set
- this will maintain the necessary uniqueness for errata
- subuid is currently only used by deb-migration
- data migration is needed
- find the dups; keep the one with the most-recent pulp2_last_updated, delete the others
- for all pulp2_content_type_id=errata in pulp_2to3_migration_pulp2content, set pulp2_subid = pulp2_repo_id
Updated by ttereshc over 3 years ago
- Subject changed from pulp2content can have duplicate entries under under certain re-migration conditions to pulp2content can have duplicate entries under certain re-migration conditions
- Description updated (diff)
Updated by ggainey over 3 years ago
- Status changed from NEW to ASSIGNED
- Assignee set to ggainey
- Sprint set to Sprint 91
Updated by ggainey over 3 years ago
Reproducer script. Assumes pulp-cli and jq installed/available. Migrates ALL Pulp2-rpm-repos - if you have a lot, it will take...a while. Spits out dups at end.
#!/usr/bin/bash
REPO_NAME=rpm-signed
pulp-admin login -u admin -p admin
#
pulp-admin rpm repo delete --repo-id ${REPO_NAME}
pulp-admin orphan remove --all
#
pulp-admin rpm repo create --repo-id ${REPO_NAME} --relative-url ${REPO_NAME} --feed https://fixtures.pulpproject.org/rpm-signed/ --download-policy on_demand
pulp-admin rpm repo sync run --repo-id ${REPO_NAME}
#
PLAN_HREF=$(pulp migration plan create --plan '{"plugins": [{"type": "rpm"}]}' | jq -r '.pulp_href')
echo "PLAN_HREF = ${PLAN_HREF}"
pulp migration plan run --href ${PLAN_HREF}
#
pulp-admin rpm repo update --repo-id ${REPO_NAME} --download-policy immediate
pulp-admin rpm repo sync run --repo-id ${REPO_NAME}
#
pulp migration plan run --href ${PLAN_HREF}
PGPASSWORD=pulp psql -U pulp -d pulp --host 127.0.0.1 \
-c 'select pulp2_id, pulp2_content_type_id, pulp2_repo_id, pulp2_subid, count(*) from pulp_2to3_migration_pulp2content group by pulp2_id, pulp2_content_type_id, pulp2_repo_id, pulp2_subid having count(*) > 1;'
Updated by ggainey over 3 years ago
The migration to fix this has to accomplish the following:
- Find all the duplicates
- for each duplicate:
- pick one (most-recent pulp_created?)
- fill in pulp2_subid with the pulp2_repo_id value
- point everything (see tables below) pointing to any of the other duplicates to 'this' entry
- delete the 'other' duplicates
Tables that have FK-references to pulp_2to3_migration_pulp2content.pulp_id :
- pulp_2to3_migration_pulp2erratum
- pulp_2to3_migration_pulp2srpm
- pulp_2to3_migration_pulp2packagegroup
- pulp_2to3_migration_pulp2modulemd
- pulp_2to3_migration_pulp2debreleasearchitecture
- pulp_2to3_migration_pulp2rpm
- pulp_2to3_migration_pulp2debcomponent
- pulp_2to3_migration_pulp2manifest
- pulp_2to3_migration_pulp2modulemddefaults
- pulp_2to3_migration_pulp2manifestlist
- pulp_2to3_migration_pulp2debpackage
- pulp_2to3_migration_pulp2packagelangpacks
- pulp_2to3_migration_pulp2blob
- pulp_2to3_migration_pulp2debcomponentpackage
- pulp_2to3_migration_pulp2yumrepometadatafile
- pulp_2to3_migration_pulp2packageenvironment
- pulp_2to3_migration_pulp2packagecategory
- pulp_2to3_migration_pulp2distribution
- pulp_2to3_migration_pulp2tag
- pulp_2to3_migration_pulp2iso
- pulp_2to3_migration_pulp2debrelease
Updated by ggainey over 3 years ago
Promising paths to follow:
Updated by ggainey over 3 years ago
Accomplishing this with loops in plpgsql:
Example for how to do indices in postgres where one of the columns could be null:
Updated by ggainey over 3 years ago
- Description updated (diff)
Notes from collab w/ttereshc:
- Deleting from Pulp2Content using Django "should" take care of the related detail-objects.
- Only errata need to have subid filled in to maintain uniqueness corectly.
- Choose newest pulp2_last_updated to keep
- Setting up the uniqueness-constraint in a way that works around Postgres' treatment of NULL would require postgres-specific-code, which will cause us maintenance woes in the future - so, no.
Updated by ggainey over 3 years ago
PR https://github.com/pulp/pulp-2to3-migration/pull/331 fixes the unique-constraint issue. However, with the uniqueness actually being enforced, the reproducer above now fails trying to do the second migration, with the following stacktrace:
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: pulp [f38e75f950d6404da3f078824f53aa31]: rq.worker:ERROR: Traceback (most recent call last):
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: File "/usr/local/lib/pulp/lib64/python3.6/site-packages/django/db/backends/base/base.py", line 240, in _commit
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: return self.connection.commit()
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: psycopg2.errors.ForeignKeyViolation: insert or update on table "pulp_2to3_migration_pulp2rpm" violates foreign key constraint "pulp_2to3_migration__pulp2content_id_508869c4_fk_pulp_2to3"
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: DETAIL: Key (pulp2content_id)=(ec5e7c59-e1fb-497a-9b53-ac27c1fdc973) is not present in table "pulp_2to3_migration_pulp2content".
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: The above exception was the direct cause of the following exception:
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: Traceback (most recent call last):
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: File "/usr/local/lib/pulp/lib64/python3.6/site-packages/rq/worker.py", line 975, in perform_job
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: rv = job.perform()
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: File "/usr/local/lib/pulp/lib64/python3.6/site-packages/rq/job.py", line 696, in perform
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: self._result = self._execute()
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: File "/usr/local/lib/pulp/lib64/python3.6/site-packages/rq/job.py", line 719, in _execute
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: return self.func(*self.args, **self.kwargs)
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: File "/home/vagrant/devel/pulp-2to3-migration/pulp_2to3_migration/app/tasks/migrate.py", line 77, in migrate_from_pulp2
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: pre_migrate_all_content(plan)
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: File "/home/vagrant/devel/pulp-2to3-migration/pulp_2to3_migration/app/pre_migration.py", line 68, in pre_migrate_all_content
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: pre_migrate_content_type(content_model, mutable_type, lazy_type, premigrate_hook)
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: File "/home/vagrant/devel/pulp-2to3-migration/pulp_2to3_migration/app/pre_migration.py", line 218, in pre_migrate_content_type
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: content_model.pulp_2to3_detail.pre_migrate_content_detail(pulp2content_batch)
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: File "/home/vagrant/devel/pulp-2to3-migration/pulp_2to3_migration/app/plugin/rpm/pulp_2to3_models.py", line 144, in pre_migrate_content_detail
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: batch_size=DEFAULT_BATCH_SIZE)
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: File "/usr/local/lib/pulp/lib64/python3.6/site-packages/django/db/models/manager.py", line 82, in manager_method
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: return getattr(self.get_queryset(), name)(*args, **kwargs)
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: File "/usr/local/lib/pulp/lib64/python3.6/site-packages/django/db/models/query.py", line 480, in bulk_create
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: obj_without_pk._state.db = self.db
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: File "/usr/local/lib/pulp/lib64/python3.6/site-packages/django/db/transaction.py", line 240, in __exit__
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: connection.commit()
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: File "/usr/local/lib/pulp/lib64/python3.6/site-packages/django/db/backends/base/base.py", line 262, in commit
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: self._commit()
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: File "/usr/local/lib/pulp/lib64/python3.6/site-packages/django/db/backends/base/base.py", line 240, in _commit
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: return self.connection.commit()
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: File "/usr/local/lib/pulp/lib64/python3.6/site-packages/django/db/utils.py", line 89, in __exit__
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: raise dj_exc_value.with_traceback(traceback) from exc_value
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: File "/usr/local/lib/pulp/lib64/python3.6/site-packages/django/db/backends/base/base.py", line 240, in _commit
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: return self.connection.commit()
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: django.db.utils.IntegrityError: insert or update on table "pulp_2to3_migration_pulp2rpm" violates foreign key constraint "pulp_2to3_migration__pulp2content_id_508869c4_fk_pulp_2to3"
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: DETAIL: Key (pulp2content_id)=(ec5e7c59-e1fb-497a-9b53-ac27c1fdc973) is not present in table "pulp_2to3_migration_pulp2content".
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: Traceback (most recent call last):
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: File "/usr/local/lib/pulp/lib64/python3.6/site-packages/django/db/backends/base/base.py", line 240, in _commit
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: return self.connection.commit()
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: psycopg2.errors.ForeignKeyViolation: insert or update on table "pulp_2to3_migration_pulp2rpm" violates foreign key constraint "pulp_2to3_migration__pulp2content_id_508869c4_fk_pulp_2to3"
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: DETAIL: Key (pulp2content_id)=(ec5e7c59-e1fb-497a-9b53-ac27c1fdc973) is not present in table "pulp_2to3_migration_pulp2content".
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: The above exception was the direct cause of the following exception:
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: Traceback (most recent call last):
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: File "/usr/local/lib/pulp/lib64/python3.6/site-packages/rq/worker.py", line 975, in perform_job
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: rv = job.perform()
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: File "/usr/local/lib/pulp/lib64/python3.6/site-packages/rq/job.py", line 696, in perform
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: self._result = self._execute()
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: File "/usr/local/lib/pulp/lib64/python3.6/site-packages/rq/job.py", line 719, in _execute
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: return self.func(*self.args, **self.kwargs)
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: File "/home/vagrant/devel/pulp-2to3-migration/pulp_2to3_migration/app/tasks/migrate.py", line 77, in migrate_from_pulp2
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: pre_migrate_all_content(plan)
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: File "/home/vagrant/devel/pulp-2to3-migration/pulp_2to3_migration/app/pre_migration.py", line 68, in pre_migrate_all_content
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: pre_migrate_content_type(content_model, mutable_type, lazy_type, premigrate_hook)
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: File "/home/vagrant/devel/pulp-2to3-migration/pulp_2to3_migration/app/pre_migration.py", line 218, in pre_migrate_content_type
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: content_model.pulp_2to3_detail.pre_migrate_content_detail(pulp2content_batch)
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: File "/home/vagrant/devel/pulp-2to3-migration/pulp_2to3_migration/app/plugin/rpm/pulp_2to3_models.py", line 144, in pre_migrate_content_detail
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: batch_size=DEFAULT_BATCH_SIZE)
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: File "/usr/local/lib/pulp/lib64/python3.6/site-packages/django/db/models/manager.py", line 82, in manager_method
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: return getattr(self.get_queryset(), name)(*args, **kwargs)
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: File "/usr/local/lib/pulp/lib64/python3.6/site-packages/django/db/models/query.py", line 480, in bulk_create
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: obj_without_pk._state.db = self.db
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: File "/usr/local/lib/pulp/lib64/python3.6/site-packages/django/db/transaction.py", line 240, in __exit__
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: connection.commit()
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: File "/usr/local/lib/pulp/lib64/python3.6/site-packages/django/db/backends/base/base.py", line 262, in commit
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: self._commit()
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: File "/usr/local/lib/pulp/lib64/python3.6/site-packages/django/db/backends/base/base.py", line 240, in _commit
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: return self.connection.commit()
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: File "/usr/local/lib/pulp/lib64/python3.6/site-packages/django/db/utils.py", line 89, in __exit__
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: raise dj_exc_value.with_traceback(traceback) from exc_value
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: File "/usr/local/lib/pulp/lib64/python3.6/site-packages/django/db/backends/base/base.py", line 240, in _commit
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: return self.connection.commit()
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: django.db.utils.IntegrityError: insert or update on table "pulp_2to3_migration_pulp2rpm" violates foreign key constraint "pulp_2to3_migration__pulp2content_id_508869c4_fk_pulp_2to3"
Mar 09 02:33:16 pulp2-nightly-pulp3-source-centos7.padre-fedora.example.com rq[5594]: DETAIL: Key (pulp2content_id)=(ec5e7c59-e1fb-497a-9b53-ac27c1fdc973) is not present in table "pulp_2to3_migration_pulp2content".
Making the constraint strict/correct is pointing out a problem elsewhere in the migration code. Investigation continues.
Added by ggainey over 3 years ago
Added by ggainey over 3 years ago
Revision 86f0f99d | View on GitHub
Fixes a Pulp2Content unique-constraint that wasn't very unique.
In Postgres, all NULLs are unique - so if a nullable field is included in a UiniqueConstraint, it allows rows that differ only by the NULL.
Build two constraints, one when the 'optional' field is NULL, and one where it has content.
fixes #8329 [nocoverage]
Added by ggainey over 3 years ago
Revision 86f0f99d | View on GitHub
Fixes a Pulp2Content unique-constraint that wasn't very unique.
In Postgres, all NULLs are unique - so if a nullable field is included in a UiniqueConstraint, it allows rows that differ only by the NULL.
Build two constraints, one when the 'optional' field is NULL, and one where it has content.
fixes #8329 [nocoverage]
Updated by ggainey over 3 years ago
- Status changed from ASSIGNED to MODIFIED
Applied in changeset pulp:pulp-2to3-migration|86f0f99d2abd16f397dece908c03cecee92a3aa0.
Updated by pulpbot over 3 years ago
- Status changed from MODIFIED to CLOSED - CURRENTRELEASE
Fixes a Pulp2Content unique-constraint that wasn't very unique.
In Postgres, all NULLs are unique - so if a nullable field is included in a UiniqueConstraint, it allows rows that differ only by the NULL.
Build two constraints, one when the 'optional' field is NULL, and one where it has content.
fixes #8329 [nocoverage]