Issue #8716
closed
Advisory-digest-calculation can result in the 'same' advisory having different digests
Description
We calculate an advisory's digest by formatting everything about it into the format createrepo understands, and then generating a hash of the resulting string. This relies on a given advisory always being constructed in exactly the same way every time.
When asking for lists (e.g., collections, references, package-lists) via Django queries during this process, we don't specify an order_by. This puts us at the mercy of Postgres, which does not guarantee any specific row ordering in the absence of an explicit ORDER BY.
As a result, computing the digest for a given advisory can give different answers, depending on the load on Postgres and the state of its optimizer and/or cache.
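A minimal sketch of the failure mode, in plain Python rather than the actual pulp_rpm code. The dictionary layout, the `render_advisory` format, and the example identifiers are all assumptions made for illustration; only the idea (hash of a rendered string whose sub-lists arrive in arbitrary order) comes from the description above.

```python
import hashlib

SUBLISTS = ("collections", "references", "pkglist")  # hypothetical field names


def render_advisory(advisory, ordered):
    """Render an advisory to a string; sort each sub-list first if `ordered`."""
    parts = ["id=" + advisory["id"]]
    for key in SUBLISTS:
        items = advisory[key]
        if ordered:
            items = sorted(items)  # stand-in for an explicit order_by
        parts.append(key + "=" + ",".join(items))
    return "\n".join(parts)


def digest(advisory, ordered):
    """SHA-256 of the rendered advisory string."""
    rendered = render_advisory(advisory, ordered)
    return hashlib.sha256(rendered.encode("utf-8")).hexdigest()


def reordered(advisory):
    """Copy of the advisory with every sub-list reversed.

    This stands in for the database returning the same rows in a
    different order on a later query.
    """
    copy = dict(advisory)
    for key in SUBLISTS:
        copy[key] = list(reversed(advisory[key]))
    return copy


# Hypothetical advisory; the values are made up for the example.
first = {
    "id": "EXAMPLE-2021:0001",
    "collections": ["coll-a", "coll-b"],
    "references": ["ref-1", "ref-2", "ref-3"],
    "pkglist": ["pkg-x", "pkg-y"],
}
second = reordered(first)

unordered_match = digest(first, ordered=False) == digest(second, ordered=False)
ordered_match = digest(first, ordered=True) == digest(second, ordered=True)
print("unordered digests match:", unordered_match)
print("ordered digests match:", ordered_match)
```

Without sorting, the two renderings differ and so do their hashes, even though the advisory content is the same; with a fixed order, the digests agree.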
To reproduce this, I uploaded a large number of advisories into pulp, repeatedly. The number of advisory-Artifacts will (often) climb by some number, even after all the advisories 'exist' in Pulp.
Using the advisories in this repo: https://github.com/ggainey/pulp_startup/tree/main/lero_tests/tmp/bla
And this to do the uploads:
cd lero_tests/tmp/bla
REPO_HREF=$(pulp rpm repository create --name bar | jq -r '.pulp_href')
for x in *; do http --form POST :/pulp/api/v3/content/rpm/advisories/ file@"./$x" repository="$REPO_HREF"; done
Each advisory-upload is its own task. Occasionally you will see the number of advisories rise, even after all 356 in the dataset have been created once.
pulp rpm repository version show --repository bar | jq '.content_summary.present."rpm.advisory".count'
356
(re-run upload loop)
pulp rpm repository version show --repository bar | jq '.content_summary.present."rpm.advisory".count'
392
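In the run above, the count rose from 356 to 392, meaning 36 re-uploads were stored as new content. To see which advisories were affected, one could group the stored records by advisory id and look for ids with more than one digest. The sketch below assumes you have already exported `(advisory_id, digest)` pairs by some means; that export step, the pair layout, and the sample values are hypothetical and not part of the reproducer.

```python
from collections import defaultdict


def find_divergent(records):
    """Return {advisory_id: sorted digests} for ids seen with more than one digest."""
    digests_by_id = defaultdict(set)
    for advisory_id, digest in records:
        digests_by_id[advisory_id].add(digest)
    return {
        advisory_id: sorted(digests)
        for advisory_id, digests in digests_by_id.items()
        if len(digests) > 1
    }


# Hypothetical export: the second advisory was stored under two digests.
sample = [
    ("EXAMPLE-2021:0001", "aaa111"),
    ("EXAMPLE-2021:0002", "bbb222"),
    ("EXAMPLE-2021:0002", "ccc333"),
    ("EXAMPLE-2021:0001", "aaa111"),
]
divergent = find_divergent(sample)
print(divergent)
```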
NOTE: uploading an advisory that already exists will not only fail, it will also make it impossible to upload further advisories until https://pulp.plan.io/issues/8683 is fixed. Addressing this issue probably can (and needs to) wait until after 8683 is fixed and merged.
NOTE: you may see "ReservedResource deleted" errors in the logs during this test. That problem is tracked as https://pulp.plan.io/issues/8708
Related issues
Updated by ggainey over 3 years ago
- Related to Issue #8683: re-uploading an existing advisory makes new advisory upload fails added
Updated by ggainey over 3 years ago
- Related to Issue #8708: Tasking: Uploading many advisories in a row can have intermittent failures added
Updated by ggainey over 3 years ago
Updated by pulpbot over 3 years ago
Updated by dalley over 3 years ago
- Triaged changed from No to Yes
- Sprint set to Sprint 96
Added by ggainey over 3 years ago
Updated by ggainey over 3 years ago
- Status changed from POST to MODIFIED
Applied in changeset 671bff8c4cdaeba274d453c5f3a671113ec2bea9.
Updated by pulpbot over 3 years ago
- Status changed from MODIFIED to CLOSED - CURRENTRELEASE
Ensure sub-object ordering when computing an UpdateRecord's digest.
If collections/references/packagelists aren't explicitly ordered, you can end up with two identical advisories having different digests (and therefore being different Artifacts).
Note: there is no test with this PR, because forcing the undesirable behavior requires a heavy concurrent load on Postgres, and even then it may not happen, depending on externals like disk speed and Postgres caching. See the referenced issue for a manual reproducer.
fixes #8716
Required PR: https://github.com/pulp/pulp_rpm/pull/1981
[nocoverage]