Project

Profile

Help

Issue #5707

base_version can cause content can show as being added and removed in the same version

Added by bmbouter 8 months ago. Updated 6 months ago.

Status:
CLOSED - CURRENTRELEASE
Priority:
Normal
Assignee:
Category:
-
Sprint/Milestone:
Start date:
Due date:
Estimated time:
Severity:
2. Medium
Version:
Platform Release:
OS:
Triaged:
Yes
Groomed:
No
Sprint Candidate:
No
Tags:
Sprint:
Sprint 63

Description

This was originally reported on IRC by @gmbnomis.

Reproducer

1. Create an empty RepositoryVersion, version 1
2. Create a second RepositoryVersion, adding content unit A
3. Create a third RepositoryVersion, by adding content unit A again AND specify Version 1 as base_version

Issue

RepositoryVersion 3 shows content A as being both removed and added in the same repo.


Related issues

Has duplicate Pulp - Story #5731: Syncing creates RepositoryContent records even if no content was actually addedCLOSED - DUPLICATE

<a title="Actions" class="icon-only icon-actions js-contextmenu" href="#">Actions</a>

Associated revisions

Revision 96b76e06 View on GitHub
Added by gmbnomis 6 months ago

Avoid superfluous RepositoryContent records

When adding and removing the same content in a repository version, don't update the RepositoryContent record (resulting in an superfluous RepositoryContent for which version_added == version_removed == current version). Instead, just delete the existing record. This fixes two problems:

  1. Such content is shown as added and removed.
  2. Re-adding the content again in the same version leads to a IntegrityError because the uniqueness constraint is violated.

Handle the "remove then re-add" case analogously.

https://pulp.plan.io/issues/5707 closes #5707

Revision c5a887fa View on GitHub
Added by gmbnomis 6 months ago

Avoid superfluous RepositoryContent records

When adding and removing the same content in a repository version, don't update the RepositoryContent record (resulting in an superfluous RepositoryContent for which version_added == version_removed == current version). Instead, just delete the existing record. This fixes two problems:

  1. Such content is shown as added and removed.
  2. Re-adding the content again in the same version leads to a IntegrityError because the uniqueness constraint is violated.

Handle the "remove then re-add" case analogously.

https://pulp.plan.io/issues/5707 closes #5707

(cherry picked from commit 96b76e0687ac52e8bc68751ec414c9b6b4501dc2)

Revision 7129d7ef View on GitHub
Added by pulpbot 6 months ago

Avoid superfluous RepositoryContent records (#487)

When adding and removing the same content in a repository version, don't update the RepositoryContent record (resulting in an superfluous RepositoryContent for which version_added == version_removed == current version). Instead, just delete the existing record. This fixes two problems:

  1. Such content is shown as added and removed.
  2. Re-adding the content again in the same version leads to a IntegrityError because the uniqueness constraint is violated.

Handle the "remove then re-add" case analogously.

https://pulp.plan.io/issues/5707 closes #5707

(cherry picked from commit 96b76e0687ac52e8bc68751ec414c9b6b4501dc2)

Co-authored-by: gmbnomis

History

#1 Updated by daviddavis 8 months ago

  • Tracker changed from Issue to Task
  • % Done set to 0

#2 Updated by daviddavis 8 months ago

  • Tracker changed from Task to Issue
  • Severity set to 2. Medium
  • Triaged set to No

#3 Updated by daviddavis 8 months ago

There's a similar situation in #5731 where added content is actually removed by the finalize step.

#4 Updated by daviddavis 8 months ago

  • Related to Story #5731: Syncing creates RepositoryContent records even if no content was actually added added

#5 Updated by fao89 8 months ago

  • Triaged changed from No to Yes
  • Sprint set to Sprint 62

#6 Updated by rchan 7 months ago

  • Sprint changed from Sprint 62 to Sprint 63

#7 Updated by gmbnomis 6 months ago

  • Assignee set to gmbnomis

#8 Updated by gmbnomis 6 months ago

I have written two implementations and would like to get feedback on which one we should take (they have different tradeoffs):

1. Normalize the representation during add_content/remove_content (see https://github.com/gmbnomis/pulpcore/commit/00f284b2aa00cfadad5347e7d2ec33df10cf05a1)
2. Add a dedicated normalization function that is called when required (see https://github.com/gmbnomis/pulpcore/commit/baf80d7dd3f2615b8206fb660f9ef88a65a284fb)

Additionally, there is a variant of 2 (see https://github.com/gmbnomis/pulpcore/commit/3a65935d40a4b2788e9e383225a1596325f44c6b):

2a. Add the dedicated normalization function from 2. and adapt the added/removed computation to filter out the non-normalized content

Pros/cons as I see it:

Solution 1:
+ simple implementation
+ added() and removed() always return the correct added and removed content sets
- adds queries to all add_content()/remove_content() calls

Solution 2:
+ In almost all cases, normalization is run only two times for each repository version (before and after finalization). add_content()/remove_content() need additional DB operations in very complex cases only (content is removed/added multiple times)
- added()/removed() may show content that isn't actually added/removed when called on a non-normalized repository version (i.e. during creation of a repository version)
- When we add functionality (e.g. different finalizations for different operations) in the future, we may have to call normalization in additional places

Solution 2a:
+ In almost all cases, normalization is run only a single time for each repository version (before saving). add_content()/remove_content() need additional DB operations in very complex cases only (content is removed/added multiple times)
+ added() and removed() always return the correct added and removed content sets
- added()/removed() have more complicated queries in the common case (but it is still the same number of queries)

#9 Updated by gmbnomis 6 months ago

I have added/updated option 2a to the comment above which simplifies the trade-off: Before, the main trade-off was between speed and correctness. Now it is mainly between simplicity of implementation and speed. I wouldn't consider the original option 2 anymore.

#10 Updated by daviddavis 6 months ago

Thank you for working on some proof-of-concepts for this. I've reviewed both solutions and they both look good to me. With 2/2a, Pulp could still end up with superfluous records in the database if the plugin writer is not careful. Is that correct?

Also, I'm wondering if you have any numbers regarding the trade off in speed.

#11 Updated by gmbnomis 6 months ago

daviddavis wrote:

Thank you for working on some proof-of-concepts for this. I've reviewed both solutions and they both look good to me. With 2/2a, Pulp could still end up with superfluous records in the database if the plugin writer is not careful. Is that correct?

If the plugin writer uses the context manager, all superfluous records will be removed when exiting the context. Such records may exist during the creation of a version, but in 2a, they are not visible to the plugin writer as added/removed handle them correctly.

Also, I'm wondering if you have any numbers regarding the trade off in speed.

Excellent point, I had a superficial look into this. Short answer: It is very unlikely that 2/2a are worth the effort.

Basically, option 1 creates an additional round trip to the DB for every add_content/remove_content call and I feared that this could be a considerable overhead during sync. That's probably only true if there is not much work to do for a single add_content/remove_content. But sync handles larger batches and that's not the way these calls are used.

In the common case (no superfluous records), the additional round trip (that update/deletes nothing) is negligible when compared to the cost of actually creating/updating RepositoryContent instances.

#12 Updated by gmbnomis 6 months ago

  • Status changed from NEW to MODIFIED

#14 Updated by daviddavis 6 months ago

  • Related to deleted (Story #5731: Syncing creates RepositoryContent records even if no content was actually added)

#15 Updated by daviddavis 6 months ago

  • Has duplicate Story #5731: Syncing creates RepositoryContent records even if no content was actually added added

#17 Updated by bmbouter 6 months ago

  • Sprint/Milestone set to 3.0.1

#18 Updated by bmbouter 6 months ago

  • Status changed from MODIFIED to CLOSED - CURRENTRELEASE

Please register to edit this issue

Also available in: Atom PDF