Issue #6463
[closed] pulp 3.2.1 duplicate key error when sync
Added by binlinf0 over 4 years ago. Updated about 4 years ago.
Description
We noticed a few errors when syncing repos:
"error": {
"description": "duplicate key value violates unique constraint \"core_repositoryversion_repository_id_number_3c54ce50_uniq\"\nDETAIL: Key (repository_id, number)=(59eb02b1-edab-46e3-a69b-d69a8b314f20, 2) already exists.\n",
Please investigate what could cause this.
Updated by bmbouter over 4 years ago
Originally I thought I had seen this error before from another dev, but now I realize it's not the same and this is the first report of it. To help resolve it we need some more info, but unfortunately I'm not sure exactly what to ask for.
- I believe the error is saying that repository already has a version "2" in it. Is that your read also?
- Can you show us the repository versions output for that repository?
- Do you have any insight into the various operations that were running when this occurred?
- When's the first time you observed it?
- How many times have you observed it and how frequently?
- How many resource managers are you running?
Updated by binlinf0 over 4 years ago
Task status
# ./get /pulp/api/v3/tasks/d5149ed8-225b-4fbe-831f-2bb4d190d8f2/
HTTP/1.1 200 OK
Allow: GET, PATCH, DELETE, HEAD, OPTIONS
Connection: keep-alive
Content-Length: 3826
Content-Type: application/json
Date: Tue, 21 Apr 2020 19:29:30 GMT
Server: nginx/1.16.1
Vary: Accept, Cookie
X-Frame-Options: SAMEORIGIN

{
  "created_resources": [],
  "error": {
    "description": "duplicate key value violates unique constraint \"core_repositoryversion_repository_id_number_3c54ce50_uniq\"\nDETAIL: Key (repository_id, number)=(ec123c49-0900-4eb6-a635-e156d9f1cf67, 2) already exists.\n",
    "traceback": " File \"/opt/utils/venv/pulp/3.7.3/lib64/python3.7/site-packages/rq/worker.py\", line 884, in perform_job\n rv = job.perform()\n File \"/opt/utils/venv/pulp/3.7.3/lib64/python3.7/site-packages/rq/job.py\", line 664, in perform\n self._result = self._execute()\n File \"/opt/utils/venv/pulp/3.7.3/lib64/python3.7/site-packages/rq/job.py\", line 670, in _execute\n return self.func(*self.args, **self.kwargs)\n File \"/opt/utils/venv/pulp/3.7.3/lib64/python3.7/site-packages/pulp_rpm/app/tasks/synchronizing.py\", line 152, in synchronize\n dv.create()\n File \"/opt/utils/venv/pulp/3.7.3/lib64/python3.7/site-packages/pulpcore/plugin/stages/declarative_version.py\", line 141, in create\n with self.repository.new_version() as new_version:\n File \"/opt/utils/venv/pulp/3.7.3/lib64/python3.7/site-packages/pulp_rpm/app/models/repository.py\", line 75, in new_version\n version.save()\n File \"/opt/utils/venv/pulp/3.7.3/lib64/python3.7/site-packages/django/db/models/base.py\", line 741, in save\n force_update=force_update, update_fields=update_fields)\n File \"/opt/utils/venv/pulp/3.7.3/lib64/python3.7/site-packages/django/db/models/base.py\", line 779, in save_base\n force_update, using, update_fields,\n File \"/opt/utils/venv/pulp/3.7.3/lib64/python3.7/site-packages/django/db/models/base.py\", line 870, in _save_table\n result = self._do_insert(cls._base_manager, using, fields, update_pk, raw)\n File \"/opt/utils/venv/pulp/3.7.3/lib64/python3.7/site-packages/django/db/models/base.py\", line 908, in _do_insert\n using=using, raw=raw)\n File \"/opt/utils/venv/pulp/3.7.3/lib64/python3.7/site-packages/django/db/models/manager.py\", line 82, in manager_method\n return getattr(self.get_queryset(), name)(*args, **kwargs)\n File \"/opt/utils/venv/pulp/3.7.3/lib64/python3.7/site-packages/django/db/models/query.py\", line 1186, in _insert\n return query.get_compiler(using=using).execute_sql(return_id)\n File \"/opt/utils/venv/pulp/3.7.3/lib64/python3.7/site-packages/django/db/models/sql/compiler.py\", line 1375, in execute_sql\n cursor.execute(sql, params)\n File \"/opt/utils/venv/pulp/3.7.3/lib64/python3.7/site-packages/django/db/backends/utils.py\", line 67, in execute\n return self._execute_with_wrappers(sql, params, many=False, executor=self._execute)\n File \"/opt/utils/venv/pulp/3.7.3/lib64/python3.7/site-packages/django/db/backends/utils.py\", line 76, in _execute_with_wrappers\n return executor(sql, params, many, context)\n File \"/opt/utils/venv/pulp/3.7.3/lib64/python3.7/site-packages/django/db/backends/utils.py\", line 84, in _execute\n return self.cursor.execute(sql, params)\n File \"/opt/utils/venv/pulp/3.7.3/lib64/python3.7/site-packages/django/db/utils.py\", line 89, in __exit__\n raise dj_exc_value.with_traceback(traceback) from exc_value\n File \"/opt/utils/venv/pulp/3.7.3/lib64/python3.7/site-packages/django/db/backends/utils.py\", line 84, in _execute\n return self.cursor.execute(sql, params)\n"
  },
  "finished_at": "2020-04-21T19:29:13.283924Z",
  "name": "pulp_rpm.app.tasks.synchronizing.synchronize",
  "progress_reports": [],
  "pulp_created": "2020-04-21T19:29:13.117971Z",
  "pulp_href": "/pulp/api/v3/tasks/d5149ed8-225b-4fbe-831f-2bb4d190d8f2/",
  "reserved_resources_record": [
    "/pulp/api/v3/repositories/rpm/rpm/ec123c49-0900-4eb6-a635-e156d9f1cf67/",
    "/pulp/api/v3/remotes/rpm/rpm/e18c1386-3f3a-4edf-93ea-73b314da475c/"
  ],
  "started_at": "2020-04-21T19:29:13.231035Z",
  "state": "failed",
  "worker": "/pulp/api/v3/workers/fdd85b92-d77c-446f-af25-552032266b12/"
}

repo versions
# ./get /pulp/api/v3/repositories/rpm/rpm/ec123c49-0900-4eb6-a635-e156d9f1cf67/versions/
HTTP/1.1 200 OK
Allow: GET, HEAD, OPTIONS
Connection: keep-alive
Content-Length: 867
Content-Type: application/json
Date: Tue, 21 Apr 2020 19:30:40 GMT
Server: nginx/1.16.1
Vary: Accept, Cookie
X-Frame-Options: SAMEORIGIN

{
  "count": 2,
  "next": null,
  "previous": null,
  "results": [
    {
      "base_version": null,
      "content_summary": {
        "added": {
          "rpm.package": {
            "count": 928,
            "href": "/pulp/api/v3/content/rpm/packages/?repository_version_added=/pulp/api/v3/repositories/rpm/rpm/ec123c49-0900-4eb6-a635-e156d9f1cf67/versions/1/"
          }
        },
        "present": {
          "rpm.package": {
            "count": 928,
            "href": "/pulp/api/v3/content/rpm/packages/?repository_version=/pulp/api/v3/repositories/rpm/rpm/ec123c49-0900-4eb6-a635-e156d9f1cf67/versions/1/"
          }
        },
        "removed": {}
      },
      "number": 1,
      "pulp_created": "2020-02-06T04:13:45.867491Z",
      "pulp_href": "/pulp/api/v3/repositories/rpm/rpm/ec123c49-0900-4eb6-a635-e156d9f1cf67/versions/1/"
    },
    {
      "base_version": null,
      "content_summary": {
        "added": {},
        "present": {},
        "removed": {}
      },
      "number": 0,
      "pulp_created": "2020-02-04T21:09:59.856262Z",
      "pulp_href": "/pulp/api/v3/repositories/rpm/rpm/ec123c49-0900-4eb6-a635-e156d9f1cf67/versions/0/"
    }
  ]
}
This happens when we kick off syncing of all 270 repos. It happens every time we run the sync. We have one resource manager with about 10 workers. Some repos are pointing to upstream repos which have the same content.
Updated by binlinf0 over 4 years ago
I am removing and recreating most of the repos to see if it happens again. I will leave one repo with failed sync tasks for troubleshooting purposes.
Updated by dkliban@redhat.com over 4 years ago
- Status changed from NEW to CLOSED - NOTABUG
I suspect that the initial repositories were created with an earlier version of Pulp that had a bug related to this, though I have not been able to find a bug in our issue tracker to point at for sure.
Without specific reproduction steps, we are unable to keep this bug open. Feel free to re-open if you figure out how to reproduce the issue.
Updated by dkliban@redhat.com over 4 years ago
- Status changed from CLOSED - NOTABUG to NEW
I was able to reproduce this bug by cancelling a sync of a kickstart repo. My database ended up with a repository version that has complete=False. As a result the next() method is not giving the correct next repository version number on subsequent syncs[0].
[0] https://github.com/pulp/pulpcore/blob/master/pulpcore/app/models/repository.py#L630
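The failure mode described above can be imitated outside Pulp. The following is only a minimal sketch using an in-memory SQLite table, not pulpcore's actual code: the table layout, the `next_number_complete_only` helper, and the rule that numbering looks only at complete versions are all assumptions made for illustration. It shows how a leftover row with complete=False can collide with the number chosen for the next version, which would also be consistent with the versions listing earlier in this thread showing only versions 0 and 1 while the error names version 2.

```python
import sqlite3


def make_db():
    """Create a toy table with the same kind of (repository_id, number) unique constraint."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE repositoryversion ("
        " repository_id TEXT NOT NULL,"
        " number INTEGER NOT NULL,"
        " complete INTEGER NOT NULL,"
        " UNIQUE (repository_id, number))"
    )
    return conn


def next_number_complete_only(conn, repo_id):
    """ASSUMPTION for illustration: the next number is derived from complete versions only."""
    row = conn.execute(
        "SELECT MAX(number) FROM repositoryversion"
        " WHERE repository_id = ? AND complete = 1",
        (repo_id,),
    ).fetchone()
    return (row[0] if row[0] is not None else -1) + 1


def simulate():
    """Return the chosen version number and the constraint error message, if any."""
    conn = make_db()
    repo = "repo-a"  # hypothetical repository id
    insert = "INSERT INTO repositoryversion VALUES (?, ?, ?)"
    conn.execute(insert, (repo, 0, 1))
    conn.execute(insert, (repo, 1, 1))
    # A canceled sync leaves version 2 behind with complete = 0.
    conn.execute(insert, (repo, 2, 0))
    # The next sync ignores the incomplete row and picks 2 again.
    number = next_number_complete_only(conn, repo)
    try:
        conn.execute(insert, (repo, number, 0))
    except sqlite3.IntegrityError as exc:
        return number, str(exc)
    return number, None


if __name__ == "__main__":
    print(simulate())
```

In this toy model the second insert is rejected by the unique constraint, which is the same class of error as the one in the task output.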
Updated by dkliban@redhat.com over 4 years ago
The RepositoryVersion context manager needs to clean up any incomplete versions in the __enter__ method. However, that is probably not safe to do if running outside of a task.
https://github.com/pulp/pulpcore/blob/master/pulpcore/app/models/repository.py#L758
Updated by bmbouter over 4 years ago
dkliban@redhat.com wrote:
The RepositoryVersion context manager needs to clean up any incomplete versions in the __enter__ method. However, that is probably not safe to do if running outside of a task.
https://github.com/pulp/pulpcore/blob/master/pulpcore/app/models/repository.py#L758
Maybe __enter__ should do that, but the design originally had this cleanup occurring in another place and I believe that is currently broken. Here's the original design to handle this case (RQ OOM, or a power issue abruptly halting a worker, causing its __exit__ not to be called).
- Worker OOMs while working, leaving the db with its RepositoryVersion having complete=False
- Worker restarts thanks to systemd, but it receives a new PID and therefore a new worker name
- Each Pulp worker checks every few seconds for "missing workers". Those are workers who have stopped heartbeating. That check occurs here which calls check_worker_heartbeat.
- A worker that is shown as offline triggers mark_worker_offline which should provide all necessary cleanup.
I think the issue is that mark_worker_offline is not checking for created_resources with complete=False in the tasks it's canceling. For this to occur without race conditions we also have to be sure that the creation of a resource and its association with a task as a created resource happen in one database transaction.
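The design above can be sketched in a few lines. This is a toy model over plain dictionaries, not Pulp's implementation: `find_missing_workers`, `mark_offline_and_cleanup`, the `HEARTBEAT_TTL` value, and the record layouts are all hypothetical names chosen for illustration. It only shows the idea that a worker whose heartbeat has gone stale gets marked offline, its running tasks are canceled, and any incomplete resources those tasks created are removed.

```python
from datetime import datetime, timedelta

# Hypothetical timeout; the real value used by Pulp is not stated in this thread.
HEARTBEAT_TTL = timedelta(seconds=30)


def find_missing_workers(workers, now, ttl=HEARTBEAT_TTL):
    """Return names of online workers whose last heartbeat is older than ttl."""
    return [
        name
        for name, worker in workers.items()
        if worker["online"] and now - worker["last_heartbeat"] > ttl
    ]


def mark_offline_and_cleanup(name, workers, tasks, versions):
    """Mark a worker offline, cancel its running tasks, drop their incomplete versions."""
    workers[name]["online"] = False
    removed = []
    for task in tasks:
        if task["worker"] == name and task["state"] == "running":
            task["state"] = "canceled"
            for version_id in task["created_resources"]:
                # Only incomplete versions are leftovers; complete ones are kept.
                if not versions[version_id]["complete"]:
                    del versions[version_id]
                    removed.append(version_id)
    return removed


if __name__ == "__main__":
    now = datetime(2020, 4, 21, 19, 30, 0)
    workers = {
        "w1": {"online": True, "last_heartbeat": now - timedelta(seconds=120)},
        "w2": {"online": True, "last_heartbeat": now - timedelta(seconds=5)},
    }
    tasks = [{"worker": "w1", "state": "running", "created_resources": ["v2"]}]
    versions = {"v1": {"complete": True}, "v2": {"complete": False}}
    for missing in find_missing_workers(workers, now):
        print(missing, mark_offline_and_cleanup(missing, workers, tasks, versions))
```

Note that this sketch depends on the task already listing the incomplete version among its created resources, which is the race condition concern raised in the comment above.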
Updated by mdellweg over 4 years ago
I would not check for complete==False in mark_worker_offline, as this is very special to repository versions.
Transferring the cleanup duty to the task (the object persisted in the database) by calling some cleanup provided for certain task methods might be better (more scalable).
But as discussed, the matter with repository versions is more complicated due to versions also being created in synchronous calls (by some plugins). With that constraint in mind, __enter__ is the first common location for both code paths.
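To make the __enter__ idea concrete, here is a minimal in-memory sketch. It is not pulpcore's RepositoryVersion; the `Repo`, `Version`, and `NewVersion` classes are hypothetical stand-ins. On entry the context manager discards any versions left incomplete by an earlier interrupted run, so the number it picks next cannot collide with a stale row.

```python
from dataclasses import dataclass, field


@dataclass
class Version:
    number: int
    complete: bool = False


@dataclass
class Repo:
    versions: list = field(default_factory=list)

    def new_version(self):
        return NewVersion(self)


class NewVersion:
    """Hypothetical context manager that cleans up stale versions on entry."""

    def __init__(self, repo):
        self.repo = repo
        self.version = None

    def __enter__(self):
        # Drop leftovers from runs whose __exit__ was never called.
        self.repo.versions = [v for v in self.repo.versions if v.complete]
        number = max((v.number for v in self.repo.versions), default=-1) + 1
        self.version = Version(number)
        self.repo.versions.append(self.version)
        return self.version

    def __exit__(self, exc_type, exc, tb):
        if exc_type is None:
            self.version.complete = True
        else:
            # On failure, remove the version that was being built.
            self.repo.versions.remove(self.version)
        return False  # never swallow exceptions


if __name__ == "__main__":
    repo = Repo([Version(0, True), Version(1, True), Version(2, False)])
    with repo.new_version() as version:
        print(version.number)
    print(repo.versions)
```

As noted earlier in the thread, deleting incomplete versions on entry is probably only safe when the caller holds exclusive access to the repository (for example inside a task), since otherwise it could remove a version another process is still building.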
Updated by fao89 over 4 years ago
- Triaged changed from No to Yes
- Sprint set to Sprint 72
Updated by dalley about 4 years ago
- Status changed from NEW to ASSIGNED
- Assignee set to dalley
Updated by dalley about 4 years ago
Here's my reproducer script for future reference:
# Create a remote pointing at a kickstart repo, and an empty repository.
http POST $BASE_ADDR/pulp/api/v3/remotes/rpm/rpm/ name=foo url=http://mirror.linux.duke.edu/pub/centos/8/BaseOS/x86_64/kickstart/
export REMOTE_HREF=$(http $BASE_ADDR/pulp/api/v3/remotes/rpm/rpm/ | jq -r '.results[0] | .pulp_href')
http POST $BASE_ADDR/pulp/api/v3/repositories/rpm/rpm/ name=foo
export REPO_HREF=$(http $BASE_ADDR/pulp/api/v3/repositories/rpm/rpm/ | jq -r '.results[0] | .pulp_href')
# Start a sync, then cancel it while it is still running.
http POST $BASE_ADDR${REPO_HREF}sync/ remote=$REMOTE_HREF
export TASK_HREF=$(http $BASE_ADDR/pulp/api/v3/tasks/ | jq -r '.results[0] | .pulp_href')
sleep 10
http PATCH $BASE_ADDR$TASK_HREF state=canceled
sleep 2
# Sync again; this second sync is the one expected to hit the duplicate key error.
http POST $BASE_ADDR${REPO_HREF}sync/ remote=$REMOTE_HREF
Updated by pulpbot about 4 years ago
- Status changed from ASSIGNED to POST
Updated by pulpbot about 4 years ago
Updated by dalley about 4 years ago
- Has duplicate Issue #7220: When a task crashes, the incomplete repo version is not cleaned up and leads to duplicate key error when creating new repo versions added
Added by dalley about 4 years ago
Updated by dalley about 4 years ago
- Status changed from POST to MODIFIED
Applied in changeset pulpcore|1851b70ec76d39ac8a05fa2dbbd96d0fc157253a.
Updated by pulpbot about 4 years ago
- Status changed from MODIFIED to CLOSED - CURRENTRELEASE
Updated by ttereshc about 4 years ago
- Related to Backport #7737: Backport request: 6463: duplicate key error when sync to pulpcore 3.6/pulp-rpm added
Updated by daviddavis about 4 years ago
- Related to Backport #7844: Backport version cleanup fix to 3.6 added
Fix duplicate key error after incomplete sync task
closes: #6463 https://pulp.plan.io/issues/6463