Issue #9101

Content_artifact is not updated

Added by ppicka 3 months ago. Updated 2 months ago.

Status: CLOSED - CURRENTRELEASE
Priority: High
Assignee:
Category: -
Sprint/Milestone:
Start date:
Due date:
Estimated time:
Severity: 2. Medium
Version:
Platform Release:
OS:
Triaged: Yes
Groomed: No
Sprint Candidate: No
Tags:
Sprint: Sprint 103
Quarter:

Description

When content is created with the "on_demand" policy (only remote artifacts) and the same content is then synced with the 'immediate' policy (artifacts downloaded), content_artifact.artifact of that content is not updated.

It looks like the QueryExistingContent stage should check for this update.
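To make the failure mode concrete, here is a minimal in-memory sketch in plain Python (no Django) of the check described above. `ContentArtifact` and `DeclaredArtifact` below are hypothetical stand-ins for the pulpcore model and for what an immediate sync declares, and artifacts are represented by made-up strings; only the idea that an existing record whose `artifact` is `None` must be patched once a downloaded artifact is available comes from this issue.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ContentArtifact:
    """Hypothetical stand-in for a stored ContentArtifact row."""
    content_id: int
    relative_path: str
    artifact: Optional[str]  # None when only a remote artifact exists (on_demand)


@dataclass
class DeclaredArtifact:
    """Hypothetical stand-in for what an immediate sync declares for a unit."""
    content_id: int
    relative_path: str
    artifact: Optional[str]  # set once the file has been downloaded


def find_stale(existing: List[ContentArtifact],
               declared: List[DeclaredArtifact]) -> List[ContentArtifact]:
    """Return stored records that lack an artifact the current sync has downloaded."""
    by_key = {(ca.content_id, ca.relative_path): ca for ca in existing}
    stale = []
    for d in declared:
        ca = by_key.get((d.content_id, d.relative_path))
        if ca is not None and ca.artifact is None and d.artifact is not None:
            stale.append(ca)
    return stale


# First sync (on_demand): record 1 is saved without an artifact.
stored = [ContentArtifact(1, "1.iso", None), ContentArtifact(2, "2.iso", "sha256:bb")]
# Second sync (immediate): the same content now carries downloaded artifacts.
incoming = [DeclaredArtifact(1, "1.iso", "sha256:aa"), DeclaredArtifact(2, "2.iso", "sha256:bb")]

print([ca.relative_path for ca in find_stale(stored, incoming)])  # ['1.iso']
```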

Reproducer script:

#!/usr/bin/env bash

# Random 5-letter lowercase names, so the script can be rerun without name clashes
rand_name() { head /dev/urandom | tr -dc a-z | head -c5; }

export REPO1=$(rand_name)
export REPO2=$(rand_name)
export REMOTE1=$(rand_name)
export REMOTE2=$(rand_name)


# sync content on_demand (only remote artifacts are created)
pulp file repository create --name "$REPO1"
pulp file remote create --name "$REMOTE1" \
    --url 'https://fixtures.pulpproject.org/file/PULP_MANIFEST' \
    --policy 'on_demand'

pulp file repository sync --name "$REPO1" --remote "$REMOTE1"

# URL of the file content added in version 1 of the repository
export ADDED=$(pulp file repository version show --repository "$REPO1" --version 1 | \
jq -r '.content_summary | .added | ."file.file" | .href')

# quoted so the shell does not try to glob the '?' in the query string
http ":24817${ADDED}"
echo "there are no checksums here, as only remote artifacts are present"
echo "press [enter] to continue"
read -r

# sync the same content immediate (artifacts are downloaded)
pulp file repository create --name "$REPO2"
pulp file remote create --name "$REMOTE2" \
    --url 'https://fixtures.pulpproject.org/file/PULP_MANIFEST' \
    --policy 'immediate'

pulp file repository sync --name "$REPO2" --remote "$REMOTE2"

export ADDED=$(pulp file repository version show --repository "$REPO2" --version 1 | \
jq -r '.content_summary | .added | ."file.file" | .href')

http ":24817${ADDED}"
echo "as the artifacts were downloaded, checksums should be visible"
echo "if you check in the django shell or db, content_artifacts do not have an updated 'artifact' relation"

Files

content_saver_filter_update_in_loop.patch (1.15 KB), lmjachky, 08/06/2021 11:44 PM
ca_bulk_update.patch (3.72 KB), gerrod, 08/11/2021 03:17 AM
clipboard-202108112318-j9dgm.png (109 KB), lmjachky, 08/11/2021 11:18 PM

Related issues

Related to Pulp - Issue #8305: Deleting a remote used as source for live content corrupts ContentArtifact records (NEW)
Copied to Pulp - Backport #9261: Backport #9101 "Content_artifact is not updated" to 3.14.z (CLOSED - CURRENTRELEASE)

Associated revisions

Revision c2b732e5
Added by gerrod 2 months ago

Properly update present ContentAtifacts after immediate sync

fixes: #9101

History

#1 Updated by dkliban@redhat.com 3 months ago

  • Priority changed from Normal to High
  • Severity changed from 2. Medium to 3. High
  • Triaged changed from No to Yes
  • Sprint set to Sprint 101

#2 Updated by dkliban@redhat.com 3 months ago

  • Priority changed from High to Normal
  • Severity changed from 3. High to 2. Medium

#3 Updated by ipanova@redhat.com 3 months ago

  • Sprint changed from Sprint 101 to Sprint 102

#4 Updated by lmjachky 3 months ago

  • Status changed from NEW to ASSIGNED
  • Assignee set to lmjachky

#5 Updated by dalley 3 months ago

  • Related to Issue #8305: Deleting a remote used as source for live content corrupts ContentArtifact records added

#6 Updated by lmjachky 3 months ago

(1) One possible solution is to update the corresponding reference (i.e., the artifact) of ContentArtifact objects in the ContentSaver stage. This stage is responsible for saving Content, and it assumes that once content has been saved, it never needs to create or update that content again. I am attaching a patch that resolves the reported issue. However, I will look for a better approach, because the attached patch makes Django issue a database call for every content unit that was already saved, which can degrade performance.

(2) Maybe replacing the call ContentArtifact.objects.bulk_get_or_create(content_artifact_bulk) with ContentArtifact.objects.bulk_create_or_update(content_artifact_bulk) would be a better solution, after making this code run even when content_already_saved == False?

(3) Another solution would be to add another stage (e.g., ContentUpdater) that goes through the saved content and checks whether the DeclarativeContent objects are correctly mapped to the database (i.e., to ContentArtifact records). It would then call bulk_update on the content that needs to be updated. However, this would only be an after-the-fact remedy for stages that should have handled the issue in the first place.
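Option (3) can be pictured as an extra asynchronous stage that sits after the saver, inspects each batch, and applies all fixes for that batch in one bulk write. The sketch below is a plain asyncio producer/consumer under stated assumptions: the `content_updater` name, the queue protocol with a `None` end-of-stream marker, and the dict standing in for the ContentArtifact table are all hypothetical and are not pulpcore's stage API.

```python
import asyncio
from typing import Dict, List, Optional, Tuple

Key = Tuple[int, str]  # (content_id, relative_path)


async def content_updater(in_q: asyncio.Queue, out_q: asyncio.Queue,
                          table: Dict[Key, Optional[str]], calls: List[int]) -> None:
    """Hypothetical extra stage placed after the content saver.

    For each batch it compares declared artifacts with the stored ones and
    applies all fixes in one bulk write, then passes the batch downstream.
    A None item on the queue marks the end of the stream.
    """
    while (batch := await in_q.get()) is not None:
        fixes = {key: art for key, art in batch
                 if art is not None and key in table and table[key] is None}
        if fixes:
            table.update(fixes)       # stands in for a single bulk_update() call
            calls.append(len(fixes))  # record one round trip and its size
        await out_q.put(batch)
    await out_q.put(None)             # propagate the end-of-stream marker


async def main():
    table = {(1, "1.iso"): None, (2, "2.iso"): "sha256:bb"}
    calls: List[int] = []
    in_q, out_q = asyncio.Queue(), asyncio.Queue()
    for item in ([((1, "1.iso"), "sha256:aa"), ((2, "2.iso"), "sha256:bb")], None):
        await in_q.put(item)
    await content_updater(in_q, out_q, table, calls)
    return table, calls


print(asyncio.run(main()))
# ({(1, '1.iso'): 'sha256:aa', (2, '2.iso'): 'sha256:bb'}, [1])
```

As the comment notes, the drawback of this design is that it repairs mappings after earlier stages have already skipped them, at the cost of an extra pass over every batch.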

#7 Updated by gerrod 3 months ago

After looking over your suggestions, I came up with this patch, which does all the work in (hopefully) two DB calls. I did basic checks to make sure it fixes the problem, but I haven't checked whether it covers all edge cases or whether it really makes only two DB calls.
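The difference between option (1) and the two-call goal can be shown with a small simulation that counts database round trips for a batch of 5,000 units (the repository size used in comment #8). This is not the contents of ca_bulk_update.patch; `FakeDB`, `fix_per_unit`, and `fix_batched` are hypothetical names, and a dict stands in for the ContentArtifact table.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Key = Tuple[int, str]  # (content_id, relative_path)


@dataclass
class FakeDB:
    """Hypothetical in-memory ContentArtifact table that counts round trips."""
    rows: Dict[Key, Optional[str]] = field(default_factory=dict)
    queries: int = 0

    def update_one(self, key: Key, artifact: str) -> None:
        # One UPDATE ... WHERE content_id = ? AND relative_path = ?
        self.queries += 1
        if key in self.rows:
            self.rows[key] = artifact

    def select_missing(self, content_ids) -> List[Key]:
        # One SELECT for every row of these content units that has no artifact.
        self.queries += 1
        return [k for k, art in self.rows.items() if k[0] in content_ids and art is None]

    def bulk_update(self, changes: Dict[Key, str]) -> None:
        # One UPDATE statement covering all changed rows.
        self.queries += 1
        self.rows.update(changes)


def fix_per_unit(db: FakeDB, declared: Dict[Key, Optional[str]]) -> None:
    """Option (1): one UPDATE per already-saved unit, needed or not."""
    for key, artifact in declared.items():
        if artifact is not None:
            db.update_one(key, artifact)


def fix_batched(db: FakeDB, declared: Dict[Key, Optional[str]]) -> None:
    """Two calls per batch: fetch the stale rows once, then write all fixes at once."""
    missing = db.select_missing({key[0] for key in declared})
    changes = {k: declared[k] for k in missing if declared.get(k) is not None}
    if changes:
        db.bulk_update(changes)


def fresh_db(n: int) -> FakeDB:
    return FakeDB(rows={(i, f"{i}.iso"): None for i in range(n)})


declared = {(i, f"{i}.iso"): f"sha256:{i}" for i in range(5000)}
a, b = fresh_db(5000), fresh_db(5000)
fix_per_unit(a, declared)
fix_batched(b, declared)
print(a.queries, b.queries, a.rows == b.rows)  # 5000 2 True
```

In Django, the batched write would most naturally map onto `QuerySet.bulk_update(objs, ["artifact"])` (available since Django 2.2), although depending on the database backend and `batch_size` it may be split into more than one statement.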

#8 Updated by lmjachky 3 months ago


I performed a couple of benchmark tests to check that Gerrod's patch does not decrease performance (since filter() is called n times, similarly to my update() call). I ran Pavel's reproducer script on a repository with 5,000 files multiple times in a row (so only the first two syncs actually created new content; the rest only checked the existing content). I enabled profiling and logged the average service time of the ContentSaver stage.

Compared to my patch, Gerrod's patch barely increased the time spent in the stage when the content already existed in the database. Therefore, I conclude that we should follow the approach Gerrod has presented.
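The comparison above relies on the average service time of one stage across repeated syncs. As a generic illustration of collecting such per-call timings in plain Python, here is a small decorator sketch; `timed` and `content_saver_batch` are hypothetical names, and this is not Pulp's own stage profiling.

```python
import time
from functools import wraps
from statistics import mean


def timed(samples: list):
    """Record the wall-clock duration of every call to the decorated function."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                samples.append(time.perf_counter() - start)
        return wrapper
    return decorator


saver_times: list = []


@timed(saver_times)
def content_saver_batch(batch):
    # Hypothetical stand-in for the work ContentSaver does on one batch.
    return sum(len(name) for name in batch)


for _ in range(5):  # e.g. repeated syncs of the same 5,000-file repository
    content_saver_batch([f"{i}.iso" for i in range(5000)])

print(f"{len(saver_times)} calls, average service time {mean(saver_times) * 1000:.3f} ms")
```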

#9 Updated by rchan 3 months ago

  • Sprint changed from Sprint 102 to Sprint 103

#10 Updated by daviddavis 2 months ago

  • Priority changed from Normal to High

I just ran into this with Satellite QE. The bad part is that the Content gets stuck without an artifact; there's no way to fix the problem other than deleting the Content and starting over. I think this warrants higher priority.

#12 Updated by dalley 2 months ago

lmjachky, is the attached patch ready to go, then? Could a PR be created before Monday/Tuesday so that this BZ can be addressed?

#13 Updated by dalley 2 months ago

  • Status changed from ASSIGNED to POST
  • Assignee changed from lmjachky to gerrod
  • Sprint/Milestone set to 3.15.0

#14 Updated by dalley 2 months ago

  • Copied to Backport #9261: Backport #9101 "Content_artifact is not updated" to 3.14.z added

#17 Updated by gerrod 2 months ago

  • Status changed from POST to MODIFIED

#18 Updated by pulpbot 2 months ago

  • Status changed from MODIFIED to CLOSED - CURRENTRELEASE
