Issue #1843
Updated by jcline@redhat.com over 8 years ago
Syncing some distributions published by Pulp with a different Pulp server can fail. If the repository in question contains a PULP_DISTRIBUTION.xml metadata file, it is possible for Pulp to re-publish it with invalid data. As a result, a second Pulp server syncing from the first might be unable to download files referenced by PULP_DISTRIBUTION.xml.

This is because some files referenced by the PULP_DISTRIBUTION.xml file are intentionally skipped during publish[0]. However, the metadata is not altered in any way[1], so it still references files that do not exist in the version published by Pulp (but do exist upstream).

A specific example of this is the RHEL6[2] kickstart repository. It contains a PULP_DISTRIBUTION.xml file that references `repodata/productid`. During sync this file is downloaded along with the XML file, but when the repository is published, it is explicitly skipped.

Ultimately, this occurs because Pulp blindly syncs and publishes this PULP_DISTRIBUTION.xml file[1] while filtering the content retrieved using it. To fix this, we should generate or alter the PULP_DISTRIBUTION.xml file we publish to ensure we don't create invalid metadata. However, a bigger question is whether filtering content[0] is even appropriate. I suspect it is not. This issue is not meant to address that problem, though.

[0] https://github.com/pulp/pulp_rpm/blob/pulp-rpm-2.8.2-1/plugins/pulp_rpm/plugins/distributors/yum/publish.py#L796-L797
[1] https://github.com/pulp/pulp_rpm/blob/pulp-rpm-2.8.2-1/plugins/pulp_rpm/plugins/importers/yum/parse/treeinfo.py#L437-L441
[2] https://cdn.redhat.com/content/dist/rhel/server/6/6Server/x86_64/kickstart/
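
As a rough illustration of the proposed fix (altering the PULP_DISTRIBUTION.xml that gets published so it no longer lists skipped files), here is a minimal sketch. It is not the pulp_rpm implementation. It assumes the metadata is a root element with un-namespaced `<file>` children whose text is a relative path; the function name `filter_distribution_xml`, the `skipped_paths` argument, and the `images/boot.iso` entry are hypothetical and exist only for this example.

```python
import xml.etree.ElementTree as ET


def filter_distribution_xml(xml_text, skipped_paths):
    """Return the XML with <file> entries for skipped paths removed.

    Assumption: each referenced file is a direct <file> child of the root,
    and its text is the relative path of the file.
    """
    root = ET.fromstring(xml_text)
    # Copy the list first so removing children does not disturb iteration.
    for file_element in list(root.findall("file")):
        if (file_element.text or "").strip() in skipped_paths:
            root.remove(file_element)
    return ET.tostring(root, encoding="unicode")


# Hypothetical metadata; only repodata/productid comes from this issue.
example = (
    '<pulp_distribution version="1">'
    "<file>images/boot.iso</file>"
    "<file>repodata/productid</file>"
    "</pulp_distribution>"
)

# The set stands in for whatever the publish step decides to skip.
filtered = filter_distribution_xml(example, {"repodata/productid"})
```

With something along these lines, the published metadata would only reference files that the publish step actually makes available, so a downstream Pulp server would not request `repodata/productid`.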