Most DRPMs are missing after a sync
When syncing http://mirror.centos.org/centos/6/updates/x86_64/, a user noticed that even though there are ~1000 DRPMs, only ~220 get synced. I was able to reproduce this on 2.8.6 (there's no option for 2.8.6 in the "Version" field).
#. pulp-admin rpm repo create --repo-id el6-updates --download-policy on_demand --feed http://mirror.centos.org/centos/6/updates/x86_64/
#. pulp-admin rpm repo sync run --repo-id el6-updates
#. Note the large number of missing DRPMs
There is nothing in the logs that indicates something went wrong.
#1 Updated by mhrivnak over 4 years ago
- Status changed from NEW to CLOSED - NOTABUG
Most of the drpms listed in the HTML directory index for that repo are not present in the XML metadata. The crude parsing method below shows about 234 drpms in the XML metadata.
$ curl -s http://ftp.cvut.cz/centos/6/updates/x86_64/repodata/245eccdf7402d7d6cfb7d87245a387ee06d679760b9414cb5ba2938ab074ca18-prestodelta.xml.gz | zgrep "<newpackage" | wc -l
234
For any DRPMs not listed in the prestodelta file, pulp (or any other client) does not have a way to find them.
So it appears that pulp is working correctly, and the creator of that repo chose to not keep older drpms in the repo's metadata.
#2 Updated by email@example.com over 4 years ago
- Status changed from CLOSED - NOTABUG to NEW
That is not how prestodelta.xml works and Pulp is not working correctly. See http://docs.pulpproject.org/plugins/pulp_rpm/tech-reference/rpm.html#prestodelta-xml
$ curl -s http://ftp.cvut.cz/centos/6/updates/x86_64/repodata/245eccdf7402d7d6cfb7d87245a387ee06d679760b9414cb5ba2938ab074ca18-prestodelta.xml.gz | zgrep "</delta>" | wc -l
1079
#3 Updated by mhrivnak over 4 years ago
Ah, very interesting. So pulp has been parsing these incorrectly for a very long time. I'm surprised it took this long to notice. Looking at the parsing logic, it assumes only one file per "newpackage" entry, and uses whichever "delta" match ElementTree returns first.
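To illustrate the difference between the two counts, here is a minimal sketch, not Pulp's actual importer code. The sample XML and the function names `first_delta_only` and `all_deltas` are made up for illustration, and the element layout is assumed to match the prestodelta format described in the linked docs. Taking only the first "delta" under each "newpackage" drops every additional delta.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down prestodelta sample: "foo" has two deltas, "bar" has one.
SAMPLE = """<prestodelta>
  <newpackage name="foo" epoch="0" version="1.2" release="3.el6" arch="x86_64">
    <delta oldepoch="0" oldversion="1.0" oldrelease="1.el6">
      <filename>drpms/foo-1.0-1.el6_1.2-3.el6.x86_64.drpm</filename>
    </delta>
    <delta oldepoch="0" oldversion="1.1" oldrelease="1.el6">
      <filename>drpms/foo-1.1-1.el6_1.2-3.el6.x86_64.drpm</filename>
    </delta>
  </newpackage>
  <newpackage name="bar" epoch="0" version="2.0" release="1.el6" arch="x86_64">
    <delta oldepoch="0" oldversion="1.9" oldrelease="1.el6">
      <filename>drpms/bar-1.9-1.el6_2.0-1.el6.x86_64.drpm</filename>
    </delta>
  </newpackage>
</prestodelta>"""


def first_delta_only(xml_text):
    """Mimic the described behaviour: one DRPM per newpackage entry."""
    root = ET.fromstring(xml_text)
    filenames = []
    for pkg in root.iter("newpackage"):
        delta = pkg.find("delta")  # find() returns only the first match
        if delta is not None:
            filenames.append(delta.findtext("filename"))
    return filenames


def all_deltas(xml_text):
    """Collect every delta under every newpackage entry."""
    root = ET.fromstring(xml_text)
    return [
        delta.findtext("filename")
        for pkg in root.iter("newpackage")
        for delta in pkg.findall("delta")  # findall() returns all matches
    ]


if __name__ == "__main__":
    print(len(first_delta_only(SAMPLE)), "DRPMs with first-match parsing")
    print(len(all_deltas(SAMPLE)), "DRPMs when iterating over all deltas")
```

On the real repo metadata, the first approach would correspond to the "newpackage" count and the second to the "delta" count.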
Correcting that could be a pretty big change. The filename is part of the unit key, so at least the modeling can probably stay the same. But parsing in the importer will have to be improved to iterate over all the delta entries, and then the distributor will need to sort all the units by "newpackage" (which is not in the unit key, but does get saved on the model) so it can publish the deltas together.
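A rough sketch of the publish-side grouping, under stated assumptions: the units are represented here as plain dicts with hypothetical field names (`name`, `epoch`, `version`, `release`, `arch`, `filename`), not the real Pulp model, and `group_by_newpackage` and `build_prestodelta` are illustrative helpers rather than existing distributor functions. The idea is to bucket units by their "newpackage" identity so that all deltas for one package are written under a single entry.

```python
from collections import defaultdict
import xml.etree.ElementTree as ET


def group_by_newpackage(units):
    """Bucket DRPM units by the package they upgrade to."""
    grouped = defaultdict(list)
    for unit in units:
        key = (unit["name"], unit["epoch"], unit["version"],
               unit["release"], unit["arch"])
        grouped[key].append(unit)
    return grouped


def build_prestodelta(units):
    """Build a prestodelta tree with one newpackage element per group."""
    root = ET.Element("prestodelta")
    for key, members in sorted(group_by_newpackage(units).items()):
        name, epoch, version, release, arch = key
        pkg = ET.SubElement(root, "newpackage", name=name, epoch=epoch,
                            version=version, release=release, arch=arch)
        # Sort by filename so the output is stable between publishes.
        for unit in sorted(members, key=lambda u: u["filename"]):
            delta = ET.SubElement(pkg, "delta")
            ET.SubElement(delta, "filename").text = unit["filename"]
    return root


# Hypothetical units: two deltas for "foo", one for "bar".
UNITS = [
    {"name": "foo", "epoch": "0", "version": "1.2", "release": "3.el6",
     "arch": "x86_64", "filename": "drpms/foo-1.1-1.el6_1.2-3.el6.x86_64.drpm"},
    {"name": "bar", "epoch": "0", "version": "2.0", "release": "1.el6",
     "arch": "x86_64", "filename": "drpms/bar-1.9-1.el6_2.0-1.el6.x86_64.drpm"},
    {"name": "foo", "epoch": "0", "version": "1.2", "release": "3.el6",
     "arch": "x86_64", "filename": "drpms/foo-1.0-1.el6_1.2-3.el6.x86_64.drpm"},
]

if __name__ == "__main__":
    print(ET.tostring(build_prestodelta(UNITS), encoding="unicode"))
```

A real implementation would also need to carry over the remaining per-delta fields (old version attributes, sequence, size, checksum), which are omitted here for brevity.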
#6 Updated by bmbouter over 1 year ago
- Status changed from NEW to CLOSED - WONTFIX
Pulp 2 is approaching maintenance mode, and this Pulp 2 ticket is not being actively worked on. As such, it is being closed as WONTFIX. Pulp 2 is still accepting contributions though, so if you want to contribute a fix for this ticket, please reopen or comment on it. If you don't have permissions to reopen this ticket, or you want to discuss an issue, please reach out via the developer mailing list.