Published RPM metadata isn't sorted properly
Ticket moved to GitHub: "pulp/pulp_rpm/2274":https://github.com/pulp/pulp_rpm/issues/2274
RPM metadata should be published in order, which helps with compression efficiency (see associated BZ). The createrepo_c tool does this, but not via the library itself, so Pulp is still publishing unordered metadata.
Note that the metadata is "fine": it works, it's just inefficient to compress.
createrepo_c uses location_href as the sort key.
Problem: Pulp mixes location_href values together from many different repositories, and because they are meaningless in that context, it basically ignores them. So we store useless data in the database.
We should remove the location_href and location_base fields (the latter is entirely unused), and replace them with just a filename, which we can possibly use to reconstruct a location_href if we need to keep it for backwards compatibility. Then we can properly sort by it, and we can use it directly in various places without needing to rewrite the value constantly.
location_href is not a "real" part of the RPM package metadata, only a value that createrepo_c happens to provide on its package objects and that we copied over. This probably shouldn't have been done.
We should therefore sort by the filename, which is basically equivalent to sorting by location_href since within a repository all the packages should have the same directory.
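A minimal sketch of the idea, not Pulp's actual code: the `Package` class, the helper names, and the `Packages/<first letter>/` layout used to rebuild a location_href are all assumptions made for illustration.

```python
import posixpath
from dataclasses import dataclass


@dataclass(frozen=True)
class Package:
    # Hypothetical stand-in for a stored package record, not Pulp's model.
    name: str
    location_href: str


def filename_of(pkg):
    """Return only the final path component of the stored location_href."""
    return posixpath.basename(pkg.location_href)


def sorted_for_publish(packages):
    """Order packages by filename, ignoring whatever directory they came from."""
    return sorted(packages, key=filename_of)


def rebuild_location_href(filename):
    """Rebuild a location_href from a filename.

    Assumes a Packages/<first letter>/<filename> layout; this is an
    illustrative choice, not something the ticket specifies.
    """
    return posixpath.join("Packages", filename[0].lower(), filename)


# The hrefs below deliberately come from differently laid out repositories.
packages = [
    Package("zsh", "Packages/z/zsh-5.8-9.el9.x86_64.rpm"),
    Package("bash", "pool/bash-5.1.8-6.el9.x86_64.rpm"),
    Package("acl", "Packages/a/acl-2.3.1-3.el9.x86_64.rpm"),
]

ordered = [filename_of(p) for p in sorted_for_publish(packages)]
rebuilt = [rebuild_location_href(name) for name in ordered]
```

Sorting on the filename alone means the result does not depend on which repository a package was originally synced from.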
Updated by dalley over 2 years ago
This applies to module metadata too - we should probably sort alphabetically like upstream does.
This comes with an upside of making most of the metadata generation deterministic, and making it easier to verify that we've generated the correct metadata.
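One way such a determinism check could look, as a rough sketch: `render_metadata` is a hypothetical, heavily simplified renderer (real repodata is far richer), and the package filenames are made up. The point is only that sorting before rendering makes the output digest independent of input order.

```python
import hashlib
import random


def render_metadata(filenames):
    """Render a toy metadata document with entries in sorted order.

    Hypothetical stand-in for real metadata generation.
    """
    lines = ['<package href="{}"/>'.format(name) for name in sorted(filenames)]
    return "\n".join(lines).encode("utf-8")


def metadata_digest(filenames):
    """Return the SHA-256 hex digest of the rendered metadata."""
    return hashlib.sha256(render_metadata(filenames)).hexdigest()


filenames = [
    "zsh-5.8-9.el9.x86_64.rpm",
    "acl-2.3.1-3.el9.x86_64.rpm",
    "bash-5.1.8-6.el9.x86_64.rpm",
]

# Shuffle with a fixed seed to simulate the database returning rows
# in a different order on another run.
shuffled = list(filenames)
random.Random(0).shuffle(shuffled)

is_deterministic = metadata_digest(filenames) == metadata_digest(shuffled)
```

With unsorted output, the same comparison would only succeed when both runs happened to see the packages in the same order.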