Filter "artifacts" when rendering modulemd according to RPMs in repo
Currently, if we upload a modulemd document to a Pulp repo and publish that repo, the modulemd document is rendered verbatim in the resulting yum repo, including all artifacts.
It would be useful instead if the "artifacts" section were filtered to only include those RPMs which are actually in the repo. This would integrate better with build systems which will produce an entire module without knowledge of how the artifacts in that module may later be split over multiple repos.
Example: koji produces a modulemd like this: https://kojipkgs.fedoraproject.org//packages/nodejs/10/20181011185441.a5b0195c/files/module/modulemd.x86_64.txt
Note that in the modulemd, the artifacts list includes binary, debuginfo and source RPMs together, e.g.:
artifacts:
  rpms:
    ...
    - nodejs-1:10.12.0-1.module_2302+cf7b8058.src
    - nodejs-1:10.12.0-1.module_2302+cf7b8058.x86_64
    - nodejs-debuginfo-1:10.12.0-1.module_2302+cf7b8058.x86_64
    - nodejs-debugsource-1:10.12.0-1.module_2302+cf7b8058.x86_64
    - nodejs-devel-1:10.12.0-1.module_2302+cf7b8058.x86_64
    - nodejs-devel-debuginfo-1:10.12.0-1.module_2302+cf7b8058.x86_64
    ...
(Note: please ignore that this modulemd file is named with x86_64 but contains other arches too - that's a bug which will be fixed, so I stripped them from my example)
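To make the mix of artifact types in that list concrete, here is a minimal Python sketch that splits each entry into its name, epoch, version, release and arch, then groups it as source, debuginfo or binary. The helper names `parse_nevra` and `classify` are hypothetical, and the grouping rule (arch `src` means source; a name ending in `-debuginfo` or `-debugsource` means debuginfo) is an assumption based on common packaging convention, not something the modulemd format itself defines.

```python
import re

# Matches the "name-epoch:version-release.arch" form used in the artifacts list.
NEVRA_RE = re.compile(
    r"^(?P<name>.+)-(?P<epoch>\d+):(?P<version>[^-]+)-(?P<release>.+)\.(?P<arch>[^.]+)$"
)


def parse_nevra(artifact):
    """Split an artifact string into its NEVRA components (hypothetical helper)."""
    match = NEVRA_RE.match(artifact)
    if match is None:
        raise ValueError("not in name-epoch:version-release.arch form: %r" % artifact)
    return match.groupdict()


def classify(artifact):
    """Return 'source', 'debuginfo' or 'binary' (assumed naming convention)."""
    nevra = parse_nevra(artifact)
    if nevra["arch"] == "src":
        return "source"
    if nevra["name"].endswith(("-debuginfo", "-debugsource")):
        return "debuginfo"
    return "binary"


artifacts = [
    "nodejs-1:10.12.0-1.module_2302+cf7b8058.src",
    "nodejs-1:10.12.0-1.module_2302+cf7b8058.x86_64",
    "nodejs-debuginfo-1:10.12.0-1.module_2302+cf7b8058.x86_64",
    "nodejs-debugsource-1:10.12.0-1.module_2302+cf7b8058.x86_64",
    "nodejs-devel-1:10.12.0-1.module_2302+cf7b8058.x86_64",
    "nodejs-devel-debuginfo-1:10.12.0-1.module_2302+cf7b8058.x86_64",
]

# Group the example artifacts by their assumed target repo type.
groups = {}
for artifact in artifacts:
    groups.setdefault(classify(artifact), []).append(artifact)
```

With the six artifacts above, this grouping puts one entry under source, two under binary and three under debuginfo.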
In practice, binary/debuginfo/source RPMs are (by policy) almost always shipped into separate repos. But the build system doesn't know about this (and shouldn't have to), so it makes sense that it produced a modulemd containing all of these together.
If we want to release this module, it would be most convenient if we could simply upload this modulemd into the source, binary and debuginfo repos, then upload the source, binary and debuginfo RPMs into those three repos, and have the modulemd rendered in each repo containing only the RPMs present in that repo. If Pulp doesn't do this, we will need an extra step outside Pulp to filter the modulemd before uploading it.
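The requested filtering could look roughly like the sketch below. This is only an illustration of the idea, not Pulp's implementation or API: it assumes the modulemd has already been parsed into a plain dict with the RPM list under `data.artifacts.rpms`, and that the repo's contents are available as a set of strings in the same name-epoch:version-release.arch form. The function name `filter_artifacts` is hypothetical.

```python
import copy


def filter_artifacts(modulemd, repo_nevras):
    """Return a copy of the modulemd whose artifact list only names RPMs in the repo.

    modulemd:    parsed modulemd document as a dict (assumed layout:
                 modulemd["data"]["artifacts"]["rpms"] is a list of strings)
    repo_nevras: set of RPM identifiers present in the target repo, in the
                 same string form as the artifact entries (assumption)
    """
    filtered = copy.deepcopy(modulemd)  # leave the uploaded document untouched
    artifacts = filtered.get("data", {}).get("artifacts", {})
    rpms = artifacts.get("rpms")
    if rpms is not None:
        # Keep the original ordering; drop anything not shipped in this repo.
        artifacts["rpms"] = [rpm for rpm in rpms if rpm in repo_nevras]
    return filtered


module = {
    "document": "modulemd",
    "version": 2,
    "data": {
        "name": "nodejs",
        "stream": "10",
        "artifacts": {
            "rpms": [
                "nodejs-1:10.12.0-1.module_2302+cf7b8058.src",
                "nodejs-1:10.12.0-1.module_2302+cf7b8058.x86_64",
                "nodejs-debuginfo-1:10.12.0-1.module_2302+cf7b8058.x86_64",
            ]
        },
    },
}

# A binary repo that only contains the x86_64 nodejs package.
binary_repo = {"nodejs-1:10.12.0-1.module_2302+cf7b8058.x86_64"}
published = filter_artifacts(module, binary_repo)
```

Working on a copy means the same uploaded document can be rendered differently for each repo at publish time, which is the same approach described below for erratum pkglists.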
In considering whether this makes sense, I'd point to prior art:
- Pulp's own handling of erratum pkglists already works like this: an erratum has one pkglist, which is then filtered per repo at updateinfo.xml rendering time.
- Pungi, which is already used to publish modular yum repos, works this way - see https://pagure.io/pungi/blob/32bb9aeabe8d3f020599b51ca6aecafd36c0e4ec/f/pungi/phases/createrepo.py#_225
Steps to reproduce
- Find a modulemd containing a mix of src, debuginfo and binary RPMs in "artifacts"
- Upload the modulemd and (only) the binary RPMs to a repo
- Publish the repo
Actual result: Published YAML for the module contains all artifacts present in the input.
Expected result: Published YAML for the module contains only those artifacts which are present both in the input and in the repo (i.e. only the binary RPMs).
#2 Updated by rmcgover almost 3 years ago
Could Pulp team please groom this soon?
We need to know at least if you agree in principle with this idea, or if you reject it. Or is there more info required?
We'll need something in our toolchain to solve this problem (that modulemd files are generated with more artifacts than will be shipped to the target repo) and we need to plan whether that can be Pulp, or whether it must be something else.
#4 Updated by rmcgover almost 3 years ago
- Status changed from NEW to CLOSED - NOTABUG
Thanks Ina for pushing that discussion forward.
In that thread we confirmed that it's harmless to have extra artifacts in modulemd docs beyond what's present in the target repo. Also, it is planned to remove the Pungi behavior I'd pointed to in the issue description.
That means there's no requirement for Pulp to do anything here. This issue was based on my mistaken understanding of how modules were to be consumed, so I'll withdraw this request.