Story #3982
As a user, I can get artifacts which belong to a specific module
Status: closed
Description
Motivation
- Currently there is no way to copy modules with all their artifacts without bringing in all dependencies.
- Internally Pulp parses modulemd artifact names (specified in metadata) and looks for each RPM by NEVRA every time it needs to know which RPM units belong to a module.
The modulemd fetch content units call returns a list of entries like this:
{"metadata"=>
{"_storage_path"=>
"/var/lib/pulp/content/units/modulemd/d6/3abf7a6a5638d4aeb257aea68e09f8ea39b017aed73d196b7f9a6bd9d1ecfd",
"name"=>"django",
"stream"=>"1.6",
"artifacts"=>
["python-django-bash-completion-0:1.6.11.7-1.module_1560+089ce146.noarch",
"python2-django-0:1.6.11.7-1.module_1560+089ce146.noarch"],
"checksum"=>
"5c6054966a7981e48e2e8b2b7f9e2a33fc58ae36cb7aeab9a0cb096b16739f50",
"_last_updated"=>1533230776,
"_content_type_id"=>"modulemd",
"profiles"=>
{"default"=>["python2-django"], "python2_development"=>["python2-django"]},
"summary"=>"A high-level Python Web framework",
"downloaded"=>true,
"version"=>20180307130104,
"pulp_user_metadata"=>{},
"context"=>"c2c572ec",
"_ns"=>"units_modulemd",
"_id"=>"33d9aff8-2c70-42ac-b0fc-0a8eef87f266",
"arch"=>"noarch",
"description"=>
"Django is a high-level Python Web framework that encourages rapid development and a clean, pragmatic design. It focuses on automating as much as possible and adhering to the DRY (Don't Repeat Yourself) principle."},
"updated"=>"2018-08-02T17:26:16Z",
"repo_id"=>"311e01ab-29b7-4b3c-90f4-29b17480b22e",
"created"=>"2018-08-02T17:26:16Z",
"unit_type_id"=>"modulemd",
"unit_id"=>"33d9aff8-2c70-42ac-b0fc-0a8eef87f266",
"_id"=>{"$oid"=>"5b633eb8cc36bbe621415477"}}
Note:
"artifacts"=>
["python-django-bash-completion-0:1.6.11.7-1.module_1560+089ce146.noarch",
"python2-django-0:1.6.11.7-1.module_1560+089ce146.noarch"],
What Katello would like, with respect to the publish operation, is an RPM UUID/unit id mapping for each of these RPMs. This will help Katello account for the RPMs that got copied over, and hence determine which modules to copy over.
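For reference, each entry in artifacts has the form name-epoch:version-release.arch. Below is a minimal sketch of splitting such a string into its NEVRA parts, which is the kind of lookup key the current per-request search needs. The names Nevra and parse_nevra are illustrative only and are not taken from the Pulp code base; the handling of a missing epoch is an assumption.

```python
from typing import NamedTuple


class Nevra(NamedTuple):
    """Illustrative container, not a Pulp type."""
    name: str
    epoch: str
    version: str
    release: str
    arch: str


def parse_nevra(artifact: str) -> Nevra:
    """Split 'name-epoch:version-release.arch' into its parts."""
    # The arch follows the last dot; release and version follow the
    # last two hyphens. The name itself may contain hyphens.
    rest, arch = artifact.rsplit(".", 1)
    rest, release = rest.rsplit("-", 1)
    name, epoch_version = rest.rsplit("-", 1)
    # Assumption of this sketch: a missing epoch is treated as "0".
    epoch, sep, version = epoch_version.partition(":")
    if not sep:
        epoch, version = "0", epoch_version
    return Nevra(name, epoch, version, release, arch)


parsed = parse_nevra("python2-django-0:1.6.11.7-1.module_1560+089ce146.noarch")
print(parsed)
```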
Suggested API change
The artifacts field can't be modified, for semver reasons.
Using a serializer, add a new field to the output of a module:
"pulp_artifacts_map"=>
[{"filename": "python-django-bash-completion-1.6.11.7-1.module_1560+089ce146.noarch",
"unit_id": <uuid>},
{"filename":"python2-django-1.6.11.7-1.module_1560+089ce146.noarch",
"unit_id": <uuid>}]
filename should correspond to the filename in Pulp and can/will be different from the one mentioned in the artifacts field.
unit_id is a UUID of an RPM related to the module of interest.
Note: at the moment only RPMs can be present in the artifacts, so it's not necessary to have a type for each entry of the pulp_artifacts_map. It can be added at any later point if needed, since it's an additive change.
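A rough sketch of how a serializer could assemble pulp_artifacts_map entries. It assumes the filename is the artifact string with the epoch dropped, as in the example above, and that unit ids come from some lookup keyed by the artifact string. The function names, the lookup dict and the choice to skip artifacts with no matching RPM are all hypothetical, and the unit ids are placeholders.

```python
def artifact_to_filename(artifact):
    """'name-epoch:version-release.arch' -> 'name-version-release.arch'."""
    head, sep, tail = artifact.partition(":")
    if not sep:
        return artifact  # no epoch present, nothing to drop
    name, _epoch = head.rsplit("-", 1)
    return f"{name}-{tail}"


def build_artifacts_map(artifacts, unit_id_by_artifact):
    """Return map entries for the artifacts that have a known RPM unit."""
    return [
        {"filename": artifact_to_filename(a), "unit_id": unit_id_by_artifact[a]}
        for a in artifacts
        # Assumption: artifacts whose RPM is not in Pulp are left out.
        if a in unit_id_by_artifact
    ]


artifacts = [
    "python-django-bash-completion-0:1.6.11.7-1.module_1560+089ce146.noarch",
    "python2-django-0:1.6.11.7-1.module_1560+089ce146.noarch",
]
# Placeholder ids; real values would be RPM unit UUIDs.
known_units = {artifacts[0]: "unit-a", artifacts[1]: "unit-b"}
artifacts_map = build_artifacts_map(artifacts, known_units)
```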
Suggested solution
Create a separate collection which maps a module to an RPM, modulemd_artifact_map
Two fields: modulemd_id and artifact_id, and they are unique together (an RPM can belong to multiple modules)
Records are created at sync or upload time.
Records are removed in a post_delete hook of Modulemd model.
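The intended behaviour of the collection can be illustrated with a small in-memory stand-in. This is not the actual storage layer: a plain Python set of (modulemd_id, artifact_id) pairs is swapped in for the database collection and its unique-together index, and the class and method names are hypothetical.

```python
class ModulemdArtifactMap:
    """In-memory stand-in for the modulemd_artifact_map collection."""

    def __init__(self):
        # A set of (modulemd_id, artifact_id) pairs enforces "unique together".
        self._records = set()

    def add(self, modulemd_id, artifact_id):
        """Insert a record; return False if the pair already exists."""
        pair = (modulemd_id, artifact_id)
        if pair in self._records:
            return False
        self._records.add(pair)
        return True

    def artifacts_of(self, modulemd_id):
        """RPM ids mapped to the given module."""
        return sorted(a for m, a in self._records if m == modulemd_id)

    def modules_of(self, artifact_id):
        """Module ids the given RPM belongs to (there can be several)."""
        return sorted(m for m, a in self._records if a == artifact_id)

    def on_modulemd_deleted(self, modulemd_id):
        """What a post_delete hook would do: drop the module's records."""
        self._records = {(m, a) for m, a in self._records if m != modulemd_id}


mapping = ModulemdArtifactMap()
mapping.add("module-1", "rpm-1")
mapping.add("module-2", "rpm-1")   # same RPM, second module: allowed
mapping.add("module-1", "rpm-1")   # duplicate pair: ignored
mapping.on_modulemd_deleted("module-1")
```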
Sync: create/update(?) mapping during sync.
RPM upload: check if an RPM is modular and, if it is, require the module NSVCA to be provided.
Modulemd upload: create/update the map for RPMs which are in a repo; there is no requirement to have all artifacts in a repo.
Copy:
- during RPM copy - check if RPM is modular and if it is, check if it's a part of any module in a destination repo.
- during Modulemd copy - check if any RPMs in a destination repo belong to that module.
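The two copy-time checks could look roughly like the sketch below. It assumes an RPM carries a flag saying whether it is modular and a NEVRA string in the same form as the entries of a module's artifacts list. The Rpm and Modulemd classes, the is_modular field and both function names are hypothetical stand-ins, not Pulp models.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rpm:
    unit_id: str
    nevra: str         # same form as entries in a module's artifacts list
    is_modular: bool   # hypothetical flag


@dataclass(frozen=True)
class Modulemd:
    unit_id: str
    artifacts: tuple   # artifact strings from the module metadata


def links_for_rpm_copy(rpm, dest_modules):
    """RPM copy: map the RPM to every destination module that lists it."""
    if not rpm.is_modular:
        return []
    return [(m.unit_id, rpm.unit_id) for m in dest_modules
            if rpm.nevra in m.artifacts]


def links_for_modulemd_copy(module, dest_rpms):
    """Modulemd copy: map the module to its RPMs already in the destination."""
    wanted = set(module.artifacts)
    return [(module.unit_id, r.unit_id) for r in dest_rpms
            if r.is_modular and r.nevra in wanted]


nevra = "python2-django-0:1.6.11.7-1.module_1560+089ce146.noarch"
django = Modulemd("module-1", (nevra,))
rpm = Rpm("rpm-1", nevra, True)
plain = Rpm("rpm-2", "foo-0:1.0-1.noarch", False)
rpm_links = links_for_rpm_copy(rpm, [django])
module_links = links_for_modulemd_copy(django, [rpm, plain])
```

Each returned (modulemd_id, artifact_id) pair is a record that would be created in the mapping collection.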
Publish: no changes
Removal of a modulemd content unit from Pulp: remove the mapping in the post_delete hook for modulemd
Migration: process existing modules and modular rpms and create the mapping collection, focus on performance
Further potential improvements
Unassociation of RPM: if RPM is modular and module is still in a repo => reject/complain. (This behaviour might be unwanted by some users)
Applicability: leverage the mapping when deciding which modules are applicable.
Depsolving: maybe the part which copies artifacts can be improved
Open questions
- Do we want to have a more generic map? parent_id and child_id? E.g. is there a need to map modulemd_defaults to modulemds in the future?
- What to do at re-sync time? Do we need to check that no artifacts were added every time?