Advisory-digest-calculation can result in the 'same' advisory having different digests
We calculate the digest by formatting everything about an advisory into the format createrepo understands, and then generating a hash of the resulting string. This relies on a given advisory being constructed in exactly the same way every time.
When asking for lists (e.g., collections, references, package-lists) via Django queries during this process, we don't specify an order_by. This means we are at the mercy of Postgres, which does not guarantee any specific row order in the absence of an explicit order_by.
This means that creating the digest for a given advisory can result in different answers, depending on the load on postgres and the state of its optimizer and/or cache.
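A minimal sketch of the failure mode, in plain Python rather than Pulp's actual code. The `render` helper and the sample advisory/reference ids are hypothetical stand-ins for the createrepo-style formatting step; the only point is that hashing a string built from the same rows in a different order gives a different digest.

```python
import hashlib


def render(advisory_id, references):
    # Hypothetical stand-in for the createrepo-style formatting step:
    # one line per field, in whatever order the rows arrived.
    lines = [f"id={advisory_id}"]
    lines += [f"ref={ref}" for ref in references]
    return "\n".join(lines)


def digest(advisory_id, references):
    # Hash the rendered string, as described above.
    return hashlib.sha256(render(advisory_id, references).encode("utf-8")).hexdigest()


# Same advisory, same references, returned in two different row orders.
first = digest("ADVISORY-0001", ["CVE-A", "CVE-B"])
second = digest("ADVISORY-0001", ["CVE-B", "CVE-A"])
print(first == second)  # False: row order alone changes the digest
```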
To reproduce this, I uploaded a large number of advisories into pulp, repeatedly. The number of advisory-Artifacts will (often) climb by some number, even after all the advisories 'exist' in Pulp.
Using the advisories in this repo: https://github.com/ggainey/pulp_startup/tree/main/lero_tests/tmp/bla
And this to do the uploads:
cd lero_tests/tmp/bla
REPO_HREF=`pulp rpm repository create --name bar | jq -r '.pulp_href'`
for x in *; do http --form POST :/pulp/api/v3/content/rpm/advisories/ file@./$x repository=$REPO_HREF; done
Each advisory-upload is its own task. Occasionally you will see the number of advisories rise, even after all 356 in the dataset have been created once.
pulp rpm repository version show --repository bar | jq '.content_summary.present."rpm.advisory".count'
356
(re-run upload loop)
pulp rpm repository version show --repository bar | jq '.content_summary.present."rpm.advisory".count'
392
NOTE: uploading an advisory that already exists will not only fail, it will also make it impossible to upload further advisories until https://pulp.plan.io/issues/8683 gets fixed. Addressing this issue probably can (and needs to) wait until after 8683 is fixed/merged.
NOTE: you may see "ReservedResource deleted" errors in the logs during this test. This problem has been opened as https://pulp.plan.io/issues/8708
Added by ggainey about 1 year ago
Ensure sub-object ordering when computing an UpdateRecord's digest.
If collections/references/packagelists aren't explicitly ordered, you can end up with two identical advisories having different digests (and therefore being different Artifacts).
Note: there is no test with this PR, because forcing the undesirable behavior requires a heavy concurrent load on postgres, and even then it may not happen, depending on externals like disk-speed and postgres-caching. See the referenced issue for a manual reproducer.
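As an illustration of the idea only (not the code from this PR), here is a sketch that sorts every sub-object list by an explicit key before hashing. The dict layout, the field names (`type`, `id`, `name`, `packages`) and the sort keys are all assumptions made for the example; the real UpdateRecord models may differ.

```python
import hashlib


def stable_digest(advisory):
    # `advisory` is a hypothetical plain-dict stand-in for an UpdateRecord.
    parts = [f"id={advisory['id']}"]
    # Sort each sub-object list by an explicit key, so the rendered string
    # no longer depends on the order in which the rows were returned.
    for ref in sorted(advisory["references"], key=lambda r: (r["type"], r["id"])):
        parts.append(f"ref={ref['type']}:{ref['id']}")
    for coll in sorted(advisory["collections"], key=lambda c: c["name"]):
        parts.append(f"collection={coll['name']}")
        for pkg in sorted(coll["packages"]):
            parts.append(f"package={pkg}")
    return hashlib.sha256("\n".join(parts).encode("utf-8")).hexdigest()


# The same advisory, with every sub-list in a different order.
a = {
    "id": "ADVISORY-0001",
    "references": [{"type": "cve", "id": "CVE-A"}, {"type": "bugzilla", "id": "1"}],
    "collections": [
        {"name": "coll-1", "packages": ["foo-1.0", "bar-2.0"]},
        {"name": "coll-2", "packages": ["baz-3.0"]},
    ],
}
b = {
    "id": "ADVISORY-0001",
    "references": [{"type": "bugzilla", "id": "1"}, {"type": "cve", "id": "CVE-A"}],
    "collections": [
        {"name": "coll-2", "packages": ["baz-3.0"]},
        {"name": "coll-1", "packages": ["bar-2.0", "foo-1.0"]},
    ],
}
print(stable_digest(a) == stable_digest(b))  # True: ordering no longer matters
```

With Django querysets, the analogous step would be an explicit order_by on each related lookup; which fields to order by depends on the models and is not specified here.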