Issue #5707
closedbase_version can cause content can show as being added and removed in the same version
Description
This was originally reported on IRC by gmbnomis.
Reproducer¶
1. Create an empty RepositoryVersion, version 1
2. Create a second RepositoryVersion, adding content unit A
3. Create a third RepositoryVersion, by adding content unit A again AND specify Version 1 as base_version
Issue¶
RepositoryVersion 3 shows content A as being both removed and added in the same repo.
Related issues
Updated by daviddavis about 5 years ago
- Tracker changed from Issue to Task
- % Done set to 0
Updated by daviddavis about 5 years ago
- Tracker changed from Task to Issue
- Severity set to 2. Medium
- Triaged set to No
Updated by daviddavis about 5 years ago
There's a similar situation in #5731 where added content is actually removed by the finalize step.
Updated by daviddavis about 5 years ago
- Related to Story #5731: Syncing creates RepositoryContent records even if no content was actually added added
Updated by fao89 about 5 years ago
- Triaged changed from No to Yes
- Sprint set to Sprint 62
Updated by gmbnomis about 5 years ago
I have written two implementations and would like to get feedback on which one we should take (they have different tradeoffs):
1. Normalize the representation during add_content/remove_content (see https://github.com/gmbnomis/pulpcore/commit/00f284b2aa00cfadad5347e7d2ec33df10cf05a1)
2. Add a dedicated normalization function that is called when required (see https://github.com/gmbnomis/pulpcore/commit/baf80d7dd3f2615b8206fb660f9ef88a65a284fb)
Additionally, there is a variant of 2 (see https://github.com/gmbnomis/pulpcore/commit/3a65935d40a4b2788e9e383225a1596325f44c6b):
2a. Add the dedicated normalization function from 2. and adapt the added/removed computation to filter out the non-normalized content
Pros/cons as I see it:
Solution 1:
+ simple implementation
+ added()
and removed()
always return the correct added and removed content sets
- adds queries to all add_content()
/remove_content()
calls
Solution 2:
+ In almost all cases, normalization is run only two times for each repository version (before and after finalization). add_content()
/remove_content()
need additional DB operations in very complex cases only (content is removed/added multiple times)
- added()
/removed()
may show content that isn't actually added/removed when called on a non-normalized repository version (i.e. during creation of a repository version)
- When we add functionality (e.g. different finalizations for different operations) in the future, we may have to call normalization in additional places
Solution 2a:
+ In almost all cases, normalization is run only a single time for each repository version (before saving). add_content()
/remove_content()
need additional DB operations in very complex cases only (content is removed/added multiple times)
+ added()
and removed()
always return the correct added and removed content sets
- added()
/removed()
have more complicated queries in the common case (but it is still the same number of queries)
Updated by gmbnomis about 5 years ago
I have added/updated option 2a to the comment above which simplifies the trade-off: Before, the main trade-off was between speed and correctness. Now it is mainly between simplicity of implementation and speed. I wouldn't consider the original option 2 anymore.
Updated by daviddavis about 5 years ago
Thank you for working on some proof-of-concepts for this. I've reviewed both solutions and they both look good to me. With 2/2a, Pulp could still end up with superfluous records in the database if the plugin writer is not careful. Is that correct?
Also, I'm wondering if you have any numbers regarding the trade off in speed.
Updated by gmbnomis about 5 years ago
daviddavis wrote:
Thank you for working on some proof-of-concepts for this. I've reviewed both solutions and they both look good to me. With 2/2a, Pulp could still end up with superfluous records in the database if the plugin writer is not careful. Is that correct?
If the plugin writer uses the context manager, all superfluous records will be removed when exiting the context. Such records may exist during the creation of a version, but in 2a, they are not visible to the plugin writer as added/removed handle them correctly.
Also, I'm wondering if you have any numbers regarding the trade off in speed.
Excellent point, I had a superficial look into this. Short answer: It is very unlikely that 2/2a are worth the effort.
Basically, option 1 creates an additional round trip to the DB for every add_content/remove_content call and I feared that this could be a considerable overhead during sync. That's probably only true if there is not much work to do for a single add_content/remove_content. But sync handles larger batches and that's not the way these calls are used.
In the common case (no superfluous records), the additional round trip (that update/deletes nothing) is negligible when compared to the cost of actually creating/updating RepositoryContent instances.
Added by gmbnomis about 5 years ago
Updated by gmbnomis about 5 years ago
- Status changed from NEW to MODIFIED
Applied in changeset pulpcore|96b76e0687ac52e8bc68751ec414c9b6b4501dc2.
Added by gmbnomis about 5 years ago
Revision c5a887fa | View on GitHub
Avoid superfluous RepositoryContent records
When adding and removing the same content in a repository version, don't update the RepositoryContent record (resulting in an superfluous RepositoryContent for which version_added == version_removed == current version). Instead, just delete the existing record. This fixes two problems:
- Such content is shown as added and removed.
- Re-adding the content again in the same version leads to a IntegrityError because the uniqueness constraint is violated.
Handle the "remove then re-add" case analogously.
https://pulp.plan.io/issues/5707 closes #5707
(cherry picked from commit 96b76e0687ac52e8bc68751ec414c9b6b4501dc2)
Updated by gmbnomis about 5 years ago
Applied in changeset pulpcore|c5a887fa6f174f44e1452ed45b2aadde9e15abf2.
Updated by daviddavis about 5 years ago
- Related to deleted (Story #5731: Syncing creates RepositoryContent records even if no content was actually added)
Updated by daviddavis about 5 years ago
- Has duplicate Story #5731: Syncing creates RepositoryContent records even if no content was actually added added
Added by pulpbot about 5 years ago
Revision 7129d7ef | View on GitHub
Avoid superfluous RepositoryContent records (#487)
When adding and removing the same content in a repository version, don't update the RepositoryContent record (resulting in an superfluous RepositoryContent for which version_added == version_removed == current version). Instead, just delete the existing record. This fixes two problems:
- Such content is shown as added and removed.
- Re-adding the content again in the same version leads to a IntegrityError because the uniqueness constraint is violated.
Handle the "remove then re-add" case analogously.
https://pulp.plan.io/issues/5707 closes #5707
(cherry picked from commit 96b76e0687ac52e8bc68751ec414c9b6b4501dc2)
Co-authored-by: gmbnomis gmbnomis@users.noreply.github.com
Updated by pulpbot about 5 years ago
Applied in changeset pulpcore|7129d7ef2ea1fdc5255ba658459fce1b34cc6bc0.
Updated by bmbouter about 5 years ago
- Status changed from MODIFIED to CLOSED - CURRENTRELEASE
Avoid superfluous RepositoryContent records
When adding and removing the same content in a repository version, don't update the RepositoryContent record (resulting in an superfluous RepositoryContent for which version_added == version_removed == current version). Instead, just delete the existing record. This fixes two problems:
Handle the "remove then re-add" case analogously.
https://pulp.plan.io/issues/5707 closes #5707