Handle duplicate packages in upstream repos

Some repositories list certain packages in their metadata twice. This
shouldn't happen, but it does. createrepo_c deduplicates by virtue of
parsing everything into a dict keyed by pkgid, but the iterative parser
does not. This eventually causes a mismatch once the iterative parser
reaches a package that the createrepo_c primary parser has already
handled. So we keep a list of the pkgids we've already written out and
skip a package whenever the iterative parser hits its pkgid a second (or
third, etc.) time.
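The skip-already-seen-pkgids idea can be sketched roughly as follows.
This is not the actual pulp_rpm code. It assumes a hypothetical helper,
skip_duplicate_pkgids, that takes an iterable of (pkgid, package) pairs
from the iterative parser. It tracks the seen pkgids in a set rather
than a list, which gives constant-time membership checks.

```python
from typing import Iterable, Iterator, Set, Tuple, TypeVar

T = TypeVar("T")


def skip_duplicate_pkgids(
    packages: Iterable[Tuple[str, T]],
) -> Iterator[Tuple[str, T]]:
    """Yield each (pkgid, package) pair only the first time its pkgid appears.

    Hypothetical helper: the input is assumed to be (pkgid, package) pairs
    from an iterative metadata parser that does not deduplicate on its own.
    """
    seen: Set[str] = set()  # pkgids already written out
    for pkgid, package in packages:
        if pkgid in seen:
            # Second (or third, etc.) occurrence of this pkgid: skip it.
            continue
        seen.add(pkgid)
        yield pkgid, package
```

For example, the input [("abc", "foo-1.0"), ("def", "bar-2.0"),
("abc", "foo-1.0")] yields only the first two pairs. Each pkgid is then
emitted once, in order of first appearance.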
backports: #8944 https://pulp.plan.io/issues/8944
fixes #8962
(cherry picked from commit 8393a60695dd28a38d515cd7376396734626ae16)