Story #3982
Updated by ttereshc over 5 years ago
h3. Motivation

* Currently there is no way to copy modules with all their artifacts without bringing in all dependencies.
* Internally, Pulp parses the modulemd artifact names (specified in the metadata) and looks up each RPM by NEVRA every time it needs to know which RPM units belong to a module.

The modulemd fetch content units call returns a list like this:

<pre>
{"metadata"=>
  {"_storage_path"=>
    "/var/lib/pulp/content/units/modulemd/d6/3abf7a6a5638d4aeb257aea68e09f8ea39b017aed73d196b7f9a6bd9d1ecfd",
   "name"=>"django",
   "stream"=>"1.6",
   "artifacts"=>
    ["python-django-bash-completion-0:1.6.11.7-1.module_1560+089ce146.noarch",
     "python2-django-0:1.6.11.7-1.module_1560+089ce146.noarch"],
   "checksum"=>
    "5c6054966a7981e48e2e8b2b7f9e2a33fc58ae36cb7aeab9a0cb096b16739f50",
   "_last_updated"=>1533230776,
   "_content_type_id"=>"modulemd",
   "profiles"=>
    {"default"=>["python2-django"],
     "python2_development"=>["python2-django"]},
   "summary"=>"A high-level Python Web framework",
   "downloaded"=>true,
   "version"=>20180307130104,
   "pulp_user_metadata"=>{},
   "context"=>"c2c572ec",
   "_ns"=>"units_modulemd",
   "_id"=>"33d9aff8-2c70-42ac-b0fc-0a8eef87f266",
   "arch"=>"noarch",
   "description"=>
    "Django is a high-level Python Web framework that encourages rapid development and a clean, pragmatic design. It focuses on automating as much as possible and adhering to the DRY (Don't Repeat Yourself) principle."},
 "updated"=>"2018-08-02T17:26:16Z",
 "repo_id"=>"311e01ab-29b7-4b3c-90f4-29b17480b22e",
 "created"=>"2018-08-02T17:26:16Z",
 "unit_type_id"=>"modulemd",
 "unit_id"=>"33d9aff8-2c70-42ac-b0fc-0a8eef87f266",
 "_id"=>{"$oid"=>"5b633eb8cc36bbe621415477"}}
</pre>

Note:

<pre>
"artifacts"=>
 ["python-django-bash-completion-0:1.6.11.7-1.module_1560+089ce146.noarch",
  "python2-django-0:1.6.11.7-1.module_1560+089ce146.noarch"],
</pre>

What Katello would like with respect to the publish operation is an RPM UUID/unit ID mapping for each of these RPMs.
This will help Katello account for the RPMs that were copied over and, from that, determine which modules to copy over.

h3. Suggested API change

The @artifacts@ field can't be modified for semver reasons. Using a serializer, add a new field to the output of a module:

<pre>
"pulp_artifacts_map"=>
 [{"filename": "python-django-bash-completion-1.6.11.7-1.module_1560+089ce146.noarch", "unit_id": <uuid>},
  {"filename":"python2-django-1.6.11.7-1.module_1560+089ce146.noarch", "unit_id": <uuid>}]
</pre>

@filename@ should correspond to the filename in Pulp and can/will differ from the one mentioned in the @artifacts@ field. @unit_id@ is the UUID of an RPM related to the module of interest.

Note: at the moment only RPMs can be present in the artifacts, so it's not necessary to have a @type@ for each entry of the @pulp_artifacts_map@. It can be added at any later point if needed, since it's an additive change.

h3. Suggested solution

Create a separate collection, @modulemd_artifact_map@, which maps a module to an RPM.

* Two fields: @modulemd_id@ and @artifact_id@, unique together (an RPM can belong to multiple modules).
* Records are created at sync or upload time.
* Records are removed in a post_delete hook of the Modulemd model.

* Sync: create/update(?) the mapping during sync.
* RPM upload: check whether the RPM is modular and, if it is, require the module NSVCA to be provided.
* Modulemd upload: create/update the map for RPMs which are in a repo; there is no requirement to have all artifacts in a repo.
* Copy: no changes.
* Publish: no changes.
* Removal of a modulemd content unit from Pulp: remove the mapping in the post_delete hook for modulemd.
* Migration: process existing modules and modular RPMs and create the mapping collection; focus on performance.

h3. Further potential improvements

* Unassociation of RPM: if the RPM is modular and its module is still in a repo => reject/complain. (This behaviour might be unwanted by some users.)
* Applicability: leverage the mapping when deciding which modules are applicable.
* Depsolving: maybe the part which copies artifacts can be improved.

h3. Open questions

# Do we want to have a more generic map, with @parent_id@ and @child_id@? E.g. is there a need to map modulemd_defaults to modulemds in the future?
# What to do at re-sync time? Do we need to check every time that no artifacts were added?
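
To make the suggested API change more concrete, here is a minimal Python sketch of how @pulp_artifacts_map@ entries could be derived from the @artifacts@ list. It is an illustration under assumptions, not the Pulp implementation: the function names and the @unit_ids_by_filename@ lookup are hypothetical, and it assumes (as the example in the story suggests) that the Pulp filename is the artifact string with the epoch removed. Artifacts without a known RPM unit are skipped, in line with the point that not all artifacts have to be in a repo.

```python
import re

# Modulemd artifacts use the N-E:V-R.A form, for example
# "python2-django-0:1.6.11.7-1.module_1560+089ce146.noarch".
NEVRA_RE = re.compile(
    r"^(?P<name>.+)-(?P<epoch>\d+):(?P<version>[^-]+)"
    r"-(?P<release>[^-]+)\.(?P<arch>[^.]+)$"
)


def artifact_to_filename(artifact):
    """Drop the epoch from an N-E:V-R.A string, giving N-V-R.A.

    Assumption: this epoch-less form is the "filename" used in the map.
    """
    match = NEVRA_RE.match(artifact)
    if match is None:
        raise ValueError("not an N-E:V-R.A string: %r" % artifact)
    return "{name}-{version}-{release}.{arch}".format(**match.groupdict())


def build_artifacts_map(artifacts, unit_ids_by_filename):
    """Build pulp_artifacts_map entries for artifacts with a known RPM unit.

    unit_ids_by_filename is a hypothetical lookup (filename -> unit_id)
    standing in for a query against the proposed mapping collection.
    """
    result = []
    for artifact in artifacts:
        filename = artifact_to_filename(artifact)
        unit_id = unit_ids_by_filename.get(filename)
        if unit_id is not None:  # artifact is not available as an RPM unit
            result.append({"filename": filename, "unit_id": unit_id})
    return result
```

Calling @build_artifacts_map@ with the two @artifacts@ entries from the example above and a lookup that knows only one of the RPMs would return a single-entry list, so a consumer such as Katello can see which artifacts of the module are actually available.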