Issue #9542
closed
Repositories with packages in common (same hash) redownload the file even if it is already in the artifact folder
Description
Given that the workers use the remote repository metadata, which contains package hashes, to inform the sync process, pulp_rpm should check whether the package already exists locally as an artifact. If it does, it should skip the artifact download and create a content object that references the existing artifact and is associated with the repository being synced. This avoids downloading the same file over and over during syncs.
I validated this by setting up two remotes and two repositories. Both remotes point to the same remote repository (Appstream), and each local repository points to one of the remotes. After I finished syncing one of the repositories using the "immediate" policy, I set up tcpdump to monitor GET requests:
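To make the expected behavior concrete, here is a minimal sketch of the check described above. It is not pulp_rpm code: `artifacts_by_sha256`, `fake_download`, and `sync_package` are hypothetical names, and a plain dict stands in for the artifact table.

```python
import hashlib

# Hypothetical in-memory stand-in for the artifact table, keyed by SHA256 hex digest.
artifacts_by_sha256 = {}
download_count = 0


def fake_download(url):
    """Pretend to fetch a package; counts calls so redownloads are visible."""
    global download_count
    download_count += 1
    return b"payload for " + url.rsplit("/", 1)[-1].encode()


def sync_package(url, expected_sha256):
    """Return the artifact bytes, downloading only when the hash is unknown."""
    existing = artifacts_by_sha256.get(expected_sha256)
    if existing is not None:
        return existing  # reuse the stored artifact: no network request
    data = fake_download(url)
    if hashlib.sha256(data).hexdigest() != expected_sha256:
        raise ValueError("downloaded file does not match the metadata hash")
    artifacts_by_sha256[expected_sha256] = data
    return data


# Two remotes advertise the same package hash; only the first sync should download.
digest = hashlib.sha256(b"payload for foo.rpm").hexdigest()
sync_package("https://remote-a.example/foo.rpm", digest)
sync_package("https://remote-b.example/foo.rpm", digest)
print(download_count)  # 1
```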
tcpdump -i <interface_name> -s 0 -A 'tcp[((tcp[12:1] & 0xf0) >> 2):4] = 0x47455420'
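For anyone unfamiliar with that filter, the short sketch below shows what its two pieces compute. The helper name `tcp_header_length` is mine, purely for illustration.

```python
# The constant in the filter is the four ASCII bytes that start an HTTP GET request.
magic = (0x47455420).to_bytes(4, "big")
print(magic)  # b'GET '


def tcp_header_length(data_offset_byte):
    """Mirror (tcp[12:1] & 0xf0) >> 2 from the filter.

    The high nibble of byte 12 is the TCP data offset in 32-bit words,
    so (byte >> 4) * 4 is the same as (byte & 0xf0) >> 2.
    """
    return (data_offset_byte & 0xF0) >> 2


# A TCP header without options has data offset 5 (byte 0x50), i.e. 20 bytes,
# so the filter compares the first 4 payload bytes against b'GET '.
print(tcp_header_length(0x50))  # 20
```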
Next I began the sync of the second repository. I then watched as the GET requests showed the second sync was clearly redownloading every file.
This makes little sense given that the artifacts are stored as files named after their SHA256 hash (minus the first two characters, which are used as the folder name). In other words, there can only be one blob per unique file hash.
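As a rough illustration of that layout, the sketch below derives a storage path from a file's contents. The function name `artifact_relative_path` and the `artifact` root directory are assumptions for the example, not taken from the pulpcore source.

```python
import hashlib
from pathlib import PurePosixPath


def artifact_relative_path(data: bytes) -> PurePosixPath:
    """Build <root>/<first two hex chars>/<remaining 62 hex chars> from the SHA256."""
    digest = hashlib.sha256(data).hexdigest()
    # "artifact" as the root directory name is an assumption for this sketch.
    return PurePosixPath("artifact") / digest[:2] / digest[2:]


path_a = artifact_relative_path(b"same package bytes")
path_b = artifact_relative_path(b"same package bytes")
# Identical content always maps to the same path, so one blob per hash.
print(path_a == path_b)  # True
```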
Related issues
Fixes Artifact redownloading bug
The sync_to_async_iterable wraps the Artifact queryset, but unlike querysets, it can't be reused. This causes subsequent iterations through it to not actually iterate.
closes #9542
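The single-use behavior can be reproduced without Pulp or Django. The sketch below uses a plain async generator as a stand-in, on the assumption that the wrapper behaves the same way once exhausted; `single_use` and `collect` are hypothetical names and this is not the actual patch.

```python
import asyncio


async def single_use(items):
    """Stand-in for a one-shot async wrapper: exhausted after one full pass."""
    for item in items:
        yield item


async def collect(aiterable):
    """Drain an async iterable into a list."""
    return [item async for item in aiterable]


async def main():
    wrapped = single_use(["artifact-1", "artifact-2"])
    first = await collect(wrapped)
    second = await collect(wrapped)  # the generator is already exhausted here
    return first, second


first, second = asyncio.run(main())
print(first)   # ['artifact-1', 'artifact-2']
print(second)  # []
```

One way around this, under the same assumption, is to materialize the results into a list once and iterate over the list, or to create a fresh wrapper for each pass.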