Issue #4060
closedQueryExistingArtifacts stage does not prevent duplicates within a stream
Description
Note: This problem occurs because content_2 (below) either is passed by the QueryExisting stage twice. This problem will occur in the save stage. Either way, 2 or more dupes are in the Queue for the save stage at the same time. Whether the dupes are saved in separate batches or together in 1 batch, we will fail with an integrity error.
I've encountered this while implementing Docker sync, so there isn't a simple reproducer yet. This issue occurs when a sync has multiple metadata files that refer to the same content, which has the same artifact.
metadata_1.artifacts = [content_1, content_2]
metadata_2.artifacts = [content_2, content_3]
content_2 and its artifact are able to pass through the pipeline twice resulting in a duplicate key.
django.db.utils.IntegrityError: duplicate key value violates unique constraint "pulp_app_artifact_sha256_key"
The actual error occurs in the ArtifactSaver stage during a bulk save, but AFAICT this stage needs to rely on unique units to allow bulk_save.
Full traceback:
Traceback (most recent call last):
File "/home/vagrant/.virtualenvs/pulp/lib64/python3.6/site-packages/rq/worker.py", line 793, in perform_job
rv = job.perform()
File "/home/vagrant/.virtualenvs/pulp/lib64/python3.6/site-packages/rq/job.py", line 599, in perform
self._result = self._execute()
File "/home/vagrant/.virtualenvs/pulp/lib64/python3.6/site-packages/rq/job.py", line 605, in _execute
return self.func(*self.args, **self.kwargs)
File "/home/vagrant/devel/pulp_docker/pulp_docker/app/tasks/synchronizing.py", line 44, in synchronize
DeclarativeVersion(first_stage, repository).create()
File "/home/vagrant/devel/pulp/plugin/pulpcore/plugin/stages/declarative_version.py", line 125, in create
loop.run_until_complete(pipeline)
File "/usr/lib64/python3.6/asyncio/base_events.py", line 468, in run_until_complete
return future.result()
File "/home/vagrant/devel/pulp/plugin/pulpcore/plugin/stages/api.py", line 128, in create_pipeline
await asyncio.gather(*futures)
File "/home/vagrant/devel/pulp/plugin/pulpcore/plugin/stages/artifact_stages.py", line 323, in __call__
Artifact.objects.bulk_create(artifacts_to_save)
File "/home/vagrant/.virtualenvs/pulp/lib64/python3.6/site-packages/django/db/models/manager.py", line 82, in manager_method
return getattr(self.get_queryset(), name)(*args, **kwargs)
File "/home/vagrant/.virtualenvs/pulp/lib64/python3.6/site-packages/django/db/models/query.py", line 465, in bulk_create
ids = self._batched_insert(objs_without_pk, fields, batch_size)
File "/home/vagrant/.virtualenvs/pulp/lib64/python3.6/site-packages/django/db/models/query.py", line 1149, in _batched_insert
inserted_id = self._insert(item, fields=fields, using=self.db, return_id=True)
File "/home/vagrant/.virtualenvs/pulp/lib64/python3.6/site-packages/django/db/models/query.py", line 1136, in _insert
return query.get_compiler(using=using).execute_sql(return_id)
File "/home/vagrant/.virtualenvs/pulp/lib64/python3.6/site-packages/django/db/models/sql/compiler.py", line 1289, in execute_sql
cursor.execute(sql, params)
File "/home/vagrant/.virtualenvs/pulp/lib64/python3.6/site-packages/django/db/backends/utils.py", line 100, in execute
return super().execute(sql, params)
File "/home/vagrant/.virtualenvs/pulp/lib64/python3.6/site-packages/django/db/backends/utils.py", line 68, in execute
return self._execute_with_wrappers(sql, params, many=False, executor=self._execute)
File "/home/vagrant/.virtualenvs/pulp/lib64/python3.6/site-packages/django/db/backends/utils.py", line 77, in _execute_with_wrappers
return executor(sql, params, many, context)
File "/home/vagrant/.virtualenvs/pulp/lib64/python3.6/site-packages/django/db/backends/utils.py", line 85, in _execute
return self.cursor.execute(sql, params)
File "/home/vagrant/.virtualenvs/pulp/lib64/python3.6/site-packages/django/db/utils.py", line 89, in __exit__
raise dj_exc_value.with_traceback(traceback) from exc_value
File "/home/vagrant/.virtualenvs/pulp/lib64/python3.6/site-packages/django/db/backends/utils.py", line 85, in _execute
return self.cursor.execute(sql, params)
django.db.utils.IntegrityError: duplicate key value violates unique constraint "pulp_app_artifact_sha256_key"
DETAIL: Key (sha256)=(8ddc19f16526912237dd8af81971d5e4dd0587907234be2b83e249518d5b673f) already exists.
Related issues
Problem: ArtifactSaver stage doesn't handle IntegrityErrors
Solution: Use a QueryAndSaveArtifacts stage instead of ArtifactSaver
This patch introduces a new stage that first looks for existing artifact before trying to save any artifacts.
re: #4060 https://pulp.plan.io/issues/4060