New artifact is downloaded even when we already have a content unit with the same identity
I'll share a pulp_ansible edge case:
We have rh-certified and community collections that share the same identity (namespace, name, version). Ideally the content would be identical, but in some cases the sha256s differ:
"sha256": "297358d05551bd104eb94be40edaa14dd55ea9f9a2f3953f58f0197689536c7d", "size": 153850
"size": 153853, "sha256": "d3f47947d6a0ac81f5fc8d65ef38abe0ec953765504c9421640d81f0f39462c7"
So when you sync from one source and then re-sync from another source, the existing content is reused, but a new artifact is downloaded because the sha256 is different. In the end, we have:
- Content (Source A)
- ContentArtifact (Source A)
- Artifact (Source A)
- Artifact (Source B)
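
To make the resulting state concrete, here is a minimal toy model in plain Python. It is not Pulp's actual code or models: the `Store` class, its `sync` method, and the identity tuple are hypothetical names for illustration. It only assumes that content is deduplicated on identity and artifacts on sha256, as described above.

```python
from dataclasses import dataclass, field

# Digests and sizes taken from the two sources quoted above.
SHA_A = "297358d05551bd104eb94be40edaa14dd55ea9f9a2f3953f58f0197689536c7d"
SHA_B = "d3f47947d6a0ac81f5fc8d65ef38abe0ec953765504c9421640d81f0f39462c7"


@dataclass
class Store:
    """Toy in-memory stand-in for the database (hypothetical, not Pulp's models)."""
    content: dict = field(default_factory=dict)            # identity -> content id
    artifacts: dict = field(default_factory=dict)          # sha256 -> size
    content_artifacts: dict = field(default_factory=dict)  # content id -> sha256

    def sync(self, identity, sha256, size):
        # Content is deduplicated on (namespace, name, version).
        content_id = self.content.setdefault(identity, len(self.content) + 1)
        # Artifacts are deduplicated on digest, so a new sha256 adds a new artifact.
        self.artifacts.setdefault(sha256, size)
        # The content-to-artifact link is only created for the first artifact seen.
        self.content_artifacts.setdefault(content_id, sha256)
        return content_id


store = Store()
identity = ("example_namespace", "example_name", "1.0.0")  # hypothetical identity
first = store.sync(identity, SHA_A, 153850)   # sync from Source A
second = store.sync(identity, SHA_B, 153853)  # re-sync from Source B

print(len(store.content), len(store.content_artifacts), len(store.artifacts))
```

After both syncs the model holds one content unit, one content-artifact link (to Source A's digest), and two artifacts, which matches the list above.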
Specifically, on the pulp_ansible side, the post_save step reads its data from the current artifact. In the example above, that means Artifact (Source B) data would be applied to Content (Source A): https://github.com/pulp/pulp_ansible/blob/master/pulp_ansible/app/tasks/collections.py#L587-L641
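
The mismatch can be illustrated with a small sketch. This is not the linked pulp_ansible code: `post_save` here is a hypothetical stand-in that simply copies data from whichever artifact the sync just handled, and the dictionaries are assumed shapes chosen for the example.

```python
SHA_A = "297358d05551bd104eb94be40edaa14dd55ea9f9a2f3953f58f0197689536c7d"
SHA_B = "d3f47947d6a0ac81f5fc8d65ef38abe0ec953765504c9421640d81f0f39462c7"

# Artifact already linked to the existing content (Source A).
linked_artifact = {"sha256": SHA_A, "size": 153850}
# Artifact downloaded during the re-sync (Source B).
current_artifact = {"sha256": SHA_B, "size": 153853}

# Existing content unit, still pointing at Source A's artifact.
content = {"artifact_sha256": linked_artifact["sha256"]}


def post_save(content, artifact):
    """Hypothetical post-save step: derive content data from the given artifact."""
    updated = dict(content)
    updated["data_from_sha256"] = artifact["sha256"]
    updated["data_from_size"] = artifact["size"]
    return updated


result = post_save(content, current_artifact)
# True when the content's data came from an artifact it is not linked to.
mismatch = result["data_from_sha256"] != result["artifact_sha256"]
print(mismatch)
```

In this sketch `mismatch` is true: the content stays linked to Source A's artifact while its derived data comes from Source B's.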