Issue #9292

race condition syncing repositories with the same tags results in bad data within the database

Added by jsherril@redhat.com about 2 months ago. Updated 11 days ago.

Status:
CLOSED - CURRENTRELEASE
Priority:
Normal
Assignee:
Sprint/Milestone:
Start date:
Due date:
Estimated time:
Severity:
2. Medium
Platform Release:
OS:
Triaged:
Yes
Groomed:
No
Sprint Candidate:
No
Tags:
Katello
Sprint:
Quarter:

Description

On a 4-core VM with 12 GB of RAM, I synced 2 repos at the same time:

http://perf54.perf.lab.eng.bos.redhat.com:5000

test_repo11 & test_repo12

(Apologies for these being internal only.) After syncing, we can use the following query to identify tags that appear in more than one repository:

select container_tag.name, container_tag.content_ptr_id
from core_repositorycontent
inner join container_tag
  on container_tag.content_ptr_id = core_repositorycontent.content_id
inner join core_repositorycontent as core_repositorycontent2
  on core_repositorycontent.content_id = core_repositorycontent2.content_id
  and core_repositorycontent.repository_id != core_repositorycontent2.repository_id;

In my case I had 106 of them. Here's one:

ver154 | 5ba8ddfc-fde7-4485-8880-fae8f3901d8e

Let's examine that tag:

pulpcore=#  select * from container_Tag where content_ptr_id = '5ba8ddfc-fde7-4485-8880-fae8f3901d8e';
            content_ptr_id            |  name  |          tagged_manifest_id          
--------------------------------------+--------+--------------------------------------
 5ba8ddfc-fde7-4485-8880-fae8f3901d8e | ver154 | 8b3c4916-0012-443b-b2a7-3421e65a51d7

Let's see which repos it is in:

pulpcore=# select repository_id, content_id from core_repositorycontent where content_id = '5ba8ddfc-fde7-4485-8880-fae8f3901d8e' order by repositorY_id;
            repository_id             |              content_id              
--------------------------------------+--------------------------------------
 16cc1c09-4d64-4b43-a138-250074f56ef7 | 5ba8ddfc-fde7-4485-8880-fae8f3901d8e
 60aa08e6-5997-4103-91c3-1b0730d16024 | 5ba8ddfc-fde7-4485-8880-fae8f3901d8e
(2 rows)

It's in 2 repos! This is unexpected. Is its manifest in both?

pulpcore=# select repository_id, content_id from core_repositorycontent where content_id = '8b3c4916-0012-443b-b2a7-3421e65a51d7' order by repositorY_id;
            repository_id             |              content_id              
--------------------------------------+--------------------------------------
 60aa08e6-5997-4103-91c3-1b0730d16024 | 8b3c4916-0012-443b-b2a7-3421e65a51d7
(1 row)

No, its manifest is only in one. This isn't right.

If I sync these sequentially, the problem doesn't seem to occur.
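
The manual check above (a tag present in a repository whose manifest is not) can be turned into a single query. The following is a minimal sketch, not something taken from Pulp: it rebuilds only the columns visible in the psql output in an in-memory SQLite database, loads the one example tag from this report, and lists every (repository, tag) pair whose tagged manifest is missing from that repository. The real tables have more columns than shown here, so the query would likely need adjusting before running it against an actual pulpcore database.

```python
import sqlite3

# Simplified copies of the two tables, with only the columns that appear
# in the psql output of this issue (an assumption, not the real schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE core_repositorycontent (repository_id TEXT, content_id TEXT);
    CREATE TABLE container_tag (
        content_ptr_id TEXT, name TEXT, tagged_manifest_id TEXT
    );
""")

# IDs copied from the issue description.
TAG = "5ba8ddfc-fde7-4485-8880-fae8f3901d8e"
MANIFEST = "8b3c4916-0012-443b-b2a7-3421e65a51d7"
REPO_A = "16cc1c09-4d64-4b43-a138-250074f56ef7"
REPO_B = "60aa08e6-5997-4103-91c3-1b0730d16024"

conn.execute("INSERT INTO container_tag VALUES (?, ?, ?)", (TAG, "ver154", MANIFEST))
conn.executemany(
    "INSERT INTO core_repositorycontent VALUES (?, ?)",
    [(REPO_A, TAG), (REPO_B, TAG), (REPO_B, MANIFEST)],
)

# A tag is inconsistent in a repository if the manifest it points to
# is not part of that same repository.
QUERY = """
    SELECT rc.repository_id, t.name, t.tagged_manifest_id
    FROM core_repositorycontent AS rc
    INNER JOIN container_tag AS t ON t.content_ptr_id = rc.content_id
    WHERE NOT EXISTS (
        SELECT 1 FROM core_repositorycontent AS m
        WHERE m.repository_id = rc.repository_id
          AND m.content_id = t.tagged_manifest_id
    )
    ORDER BY rc.repository_id
"""
orphans = conn.execute(QUERY).fetchall()
for repository_id, name, manifest_id in orphans:
    print(f"{name}: manifest {manifest_id} missing from repo {repository_id}")
```

With the data from this report, only the repository ending in 56ef7 is flagged, which matches the manual inspection.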


Related issues

Copied to Container Support - Backport #9334: Backport 2.8: race condition syncing repositories with the same tags results in bad data within the database (CLOSED - CURRENTRELEASE)


Associated revisions

Revision 7285b7ab View on GitHub
Added by mdellweg about 2 months ago

Refactor sync pipeline with content resolution

This will no longer create hollow tags that will be updated later, but wait for their creation until the Manifest is in the database.

fixes #9292

History

#2 Updated by mdellweg about 2 months ago

  • Triaged changed from No to Yes

#3 Updated by mdellweg about 2 months ago

  • Status changed from NEW to ASSIGNED
  • Assignee set to mdellweg

I have a theory about this: The sync pipeline creates hollow tags first and interrelates them with their manifests in a later stage. So the symptom described here will happen if two syncs try to create a tag "latest" without a manifest. Due to database constraints, they will end up referencing the same tag after the FindExistingContent stage. Later, both will interrelate this tag with their own instance of the manifest. The one going last will obviously win.

The idea to solve this will be to refactor the interrelation into the first stage by using the resolution feature. That way, the tags will only be created in their final state. The concurrent pipelines will no longer fight over them.
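
To make the theory concrete, here is a toy model of the two approaches. It is a sketch under assumptions and not pulp_container code: `Store`, `sync_hollow_first`, `sync_resolved`, and the manifest names are hypothetical, and the interleaving of two concurrent syncs is modelled by simply running both "create" steps before either "interrelate" step.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Tag:
    name: str
    manifest: Optional[str] = None  # None means the tag is still hollow


class Store:
    """Stand-in for the database: one row per (name, manifest) key."""

    def __init__(self):
        self.rows = {}

    def get_or_create(self, name, manifest=None):
        # Return the existing row for this key, or create it.
        return self.rows.setdefault((name, manifest), Tag(name, manifest))


def sync_hollow_first(store, syncs):
    # Stage 1 (both syncs): create hollow tags. Both get the SAME row.
    pending = [(repo, manifest, store.get_or_create(name))
               for repo, name, manifest in syncs]
    # Later stage (both syncs): interrelate. The last writer wins.
    repos = {}
    for repo, manifest, tag in pending:
        tag.manifest = manifest
        repos[repo] = tag
    return repos


def sync_resolved(store, syncs):
    # The manifest is known before the tag is created, so each sync
    # creates (or finds) a tag that is already in its final state.
    return {repo: store.get_or_create(name, manifest)
            for repo, name, manifest in syncs}


# Hypothetical input: two repos, same tag name, different manifests.
syncs = [("test_repo11", "ver154", "manifest-a"),
         ("test_repo12", "ver154", "manifest-b")]

racy = sync_hollow_first(Store(), syncs)
fixed = sync_resolved(Store(), syncs)

print("hollow first:", {repo: tag.manifest for repo, tag in racy.items()})
print("resolved:    ", {repo: tag.manifest for repo, tag in fixed.items()})
```

In the hollow-first variant both repositories end up holding one shared tag that points at the second sync's manifest. In the resolved variant each repository gets a tag pointing at its own manifest.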

#4 Updated by pulpbot about 2 months ago

  • Status changed from ASSIGNED to POST

#5 Updated by mdellweg about 2 months ago

  • Status changed from POST to MODIFIED

#6 Updated by mdellweg about 2 months ago

  • Copied to Backport #9334: Backport 2.8: race condition syncing repositories with the same tags results in bad data within the database added

#8 Updated by ipanova@redhat.com 11 days ago

  • Sprint/Milestone set to 2.9.0

#9 Updated by pulpbot 11 days ago

  • Status changed from MODIFIED to CLOSED - CURRENTRELEASE
