Story #3982
closed
As a user, I can get artifacts which belong to a specific module
0%
Description
Motivation
- Currently there is no way to copy modules with all their artifacts without bringing in all dependencies.
- Internally Pulp parses modulemd artifact names (specified in metadata) and looks for each RPM by NEVRA every time it needs to know which RPM units belong to a module.
The modulemd fetch content units call returns a list like this:
{"metadata"=>
{"_storage_path"=>
"/var/lib/pulp/content/units/modulemd/d6/3abf7a6a5638d4aeb257aea68e09f8ea39b017aed73d196b7f9a6bd9d1ecfd",
"name"=>"django",
"stream"=>"1.6",
"artifacts"=>
["python-django-bash-completion-0:1.6.11.7-1.module_1560+089ce146.noarch",
"python2-django-0:1.6.11.7-1.module_1560+089ce146.noarch"],
"checksum"=>
"5c6054966a7981e48e2e8b2b7f9e2a33fc58ae36cb7aeab9a0cb096b16739f50",
"_last_updated"=>1533230776,
"_content_type_id"=>"modulemd",
"profiles"=>
{"default"=>["python2-django"], "python2_development"=>["python2-django"]},
"summary"=>"A high-level Python Web framework",
"downloaded"=>true,
"version"=>20180307130104,
"pulp_user_metadata"=>{},
"context"=>"c2c572ec",
"_ns"=>"units_modulemd",
"_id"=>"33d9aff8-2c70-42ac-b0fc-0a8eef87f266",
"arch"=>"noarch",
"description"=>
"Django is a high-level Python Web framework that encourages rapid development and a clean, pragmatic design. It focuses on automating as much as possible and adhering to the DRY (Don't Repeat Yourself) principle."},
"updated"=>"2018-08-02T17:26:16Z",
"repo_id"=>"311e01ab-29b7-4b3c-90f4-29b17480b22e",
"created"=>"2018-08-02T17:26:16Z",
"unit_type_id"=>"modulemd",
"unit_id"=>"33d9aff8-2c70-42ac-b0fc-0a8eef87f266",
"_id"=>{"$oid"=>"5b633eb8cc36bbe621415477"}}
Note:
"artifacts"=>
["python-django-bash-completion-0:1.6.11.7-1.module_1560+089ce146.noarch",
"python2-django-0:1.6.11.7-1.module_1560+089ce146.noarch"],
What Katello would like, with respect to the publish operation, is an RPM UUID/unit ID mapping for each of these RPMs. This will help Katello account for the RPMs that got copied over and hence determine which modules to copy over.
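The artifact strings in the output above appear to follow the `name-epoch:version-release.arch` form. As a minimal sketch of the per-artifact parsing that the motivation describes (not Pulp's actual implementation; `Nevra` and `parse_artifact` are hypothetical names), each string can be split from the right:

```python
from collections import namedtuple

# Hypothetical container for the parsed parts of one artifact string.
Nevra = namedtuple("Nevra", ["name", "epoch", "version", "release", "arch"])


def parse_artifact(artifact):
    """Split 'name-epoch:version-release.arch' into its five parts.

    Splitting from the right keeps hyphens inside the package name intact.
    A missing epoch is assumed to mean "0".
    """
    rest, arch = artifact.rsplit(".", 1)
    name, epoch_version, release = rest.rsplit("-", 2)
    epoch, sep, version = epoch_version.partition(":")
    if not sep:
        # No "epoch:" prefix was present, so the whole token is the version.
        epoch, version = "0", epoch_version
    return Nevra(name, epoch, version, release, arch)


nevra = parse_artifact(
    "python2-django-0:1.6.11.7-1.module_1560+089ce146.noarch"
)
print(nevra.name, nevra.version, nevra.release)
```

Looking up each RPM by the resulting NEVRA on every request is the repeated work that a stored unit ID mapping would avoid.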
Suggested API change
The artifacts field can't be modified due to semver reasons.
Using a serializer, add a new field to the output of a module:
"pulp_artifacts_map"=>
[{"filename": "python-django-bash-completion-1.6.11.7-1.module_1560+089ce146.noarch",
"unit_id": <uuid>},
{"filename":"python2-django-1.6.11.7-1.module_1560+089ce146.noarch",
"unit_id": <uuid>}]
filename should correspond to the filename in Pulp and can/will be different from the one mentioned in the artifacts field.
unit_id is a UUID of an RPM related to the module of interest.
Note: at the moment only RPMs can be present in the artifacts, so it's not necessary to have type for each entry of the pulp_artifacts_map. At any point later it can be added if needed, since it's an additive change.
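A rough sketch of how such a serializer step could look, using plain dicts rather than Pulp's real serializer classes. The names `artifact_to_filename`, `serialize_module` and `unit_ids_by_filename` are hypothetical, and skipping artifacts that have no known RPM unit is an assumption, not something this ticket decides:

```python
def artifact_to_filename(artifact):
    """Drop the 'epoch:' part of an artifact string to get the filename form
    used in the pulp_artifacts_map example (name-version-release.arch)."""
    rest, arch = artifact.rsplit(".", 1)
    name, epoch_version, release = rest.rsplit("-", 2)
    version = epoch_version.split(":", 1)[-1]
    return "{}-{}-{}.{}".format(name, version, release, arch)


def serialize_module(module, unit_ids_by_filename):
    """Return a copy of the module dict with 'pulp_artifacts_map' added.

    The original 'artifacts' field is left untouched, so the change stays
    additive. Artifacts without a known RPM unit are skipped (assumption).
    """
    out = dict(module)
    out["pulp_artifacts_map"] = [
        {"filename": filename, "unit_id": unit_ids_by_filename[filename]}
        for filename in map(artifact_to_filename, module.get("artifacts", []))
        if filename in unit_ids_by_filename
    ]
    return out


module = {
    "name": "django",
    "stream": "1.6",
    "artifacts": [
        "python-django-bash-completion-0:1.6.11.7-1.module_1560+089ce146.noarch",
        "python2-django-0:1.6.11.7-1.module_1560+089ce146.noarch",
    ],
}
# Placeholder unit ID for illustration only.
known_rpms = {
    "python2-django-1.6.11.7-1.module_1560+089ce146.noarch": "rpm-unit-id-1",
}
print(serialize_module(module, known_rpms)["pulp_artifacts_map"])
```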
Suggested solution
Create a separate collection which maps a module to an RPM, modulemd_artifact_map.
Two fields: modulemd_id and artifact_id, and they are unique together (an RPM can belong to multiple modules).
Records are created at sync or upload time.
Records are removed in a post_delete hook of Modulemd model.
Sync: create/update(?) mapping during sync.
RPM upload: check if an RPM is modular and, if it is, require the module NSVCA to be provided.
Modulemd upload: create/update map for RPMs which are in a repo, no requirement to have all artifacts in a repo.
Copy:
- during RPM copy - check if RPM is modular and if it is, check if it's a part of any module in a destination repo.
- during Modulemd copy - check if any RPMs in a destination repo belong to that module.
Publish: no changes
Removal of a modulemd content unit from Pulp: remove the mapping in a post_delete hook for modulemd
Migration: process existing modules and modular rpms and create the mapping collection, focus on performance
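The behaviour proposed above can be illustrated with a small in-memory stand-in for the collection. This is only a sketch under stated assumptions: the real collection would live in the database, and the class and method names here are hypothetical.

```python
class ModulemdArtifactMap:
    """In-memory stand-in for the proposed modulemd_artifact_map collection.

    Each record is a (modulemd_id, artifact_id) pair. Storing the pairs in a
    set gives the "unique together" property while still letting one RPM
    belong to several modules.
    """

    def __init__(self):
        self._pairs = set()

    def add(self, modulemd_id, artifact_id):
        """Record a mapping at sync or upload time; True if it was new."""
        pair = (modulemd_id, artifact_id)
        created = pair not in self._pairs
        self._pairs.add(pair)
        return created

    def artifacts_for(self, modulemd_id):
        """RPM unit IDs that belong to the given module."""
        return {a for m, a in self._pairs if m == modulemd_id}

    def modules_for(self, artifact_id):
        """Module unit IDs that the given RPM belongs to."""
        return {m for m, a in self._pairs if a == artifact_id}

    def remove_module(self, modulemd_id):
        """What a delete hook on the Modulemd model would do."""
        self._pairs = {(m, a) for m, a in self._pairs if m != modulemd_id}


mapping = ModulemdArtifactMap()
mapping.add("module-a", "rpm-1")
mapping.add("module-b", "rpm-1")   # same RPM, second module
mapping.add("module-a", "rpm-1")   # duplicate pair, ignored
mapping.remove_module("module-a")
print(mapping.modules_for("rpm-1"))
```

During a copy, `modules_for` and `artifacts_for` correspond to the two checks listed for RPM copy and Modulemd copy respectively.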
Further potential improvements
Unassociation of RPM: if RPM is modular and module is still in a repo => reject/complain. (This behaviour might be unwanted by some users)
Applicability: leverage the mapping when deciding which modules are applicable.
Depsolving: maybe the part which copies artifacts can be improved
Open questions
- Do we want to have a more generic map? parent_id and child_id? E.g. is there a need to map modulemd_defaults to modulemds in the future?
- What to do at re-sync time? Do we need to check that no artifacts were added every time?
Related issues
Updated by ipanova@redhat.com over 6 years ago
What you see on the module is translated information provided by libmodulemd. To provide the rpm_uuid information in the module, we'd need to query the RPM collection before saving the module to find the UUID. That is problematic because 1) it will be time consuming, 2) the RPM can later be removed from the DB and the modulemd will then reference a UUID which is no longer valid, 3) the RPM may not be in the DB yet, or 4) it is in the DB but not yet associated with the destination repo.
Another reason is that this feature will be unacceptable for CDT. Given the way they compose the repos, this feature will limit them and make it impossible to follow their regular workflow. As far as I understood, the modules are first copied over into destination repos and then the RPMs get uploaded.
Read the constraints and pattern usage [0]. It describes the relationship between modules and defaults, but the same workflow applies to modules and RPMs.
What prevents you from doing the same query to find the RPM UUID that we would do at the moment of saving the Modulemd module?
Updated by jomitsch@redhat.com over 6 years ago
Thanks for raising these concerns, Ina. I can explain our issue in Katello so maybe we can come up with a solution.
We have content views in Katello, which have versions that are individual repos in Pulp. These content views allow filtering. For example, I can exclude rpm 'foo' from a content view and publish a new version, which creates a Pulp repo where 'foo' is excluded.
We are now including Module Streams in these content view versions.
When a user has a Content View Version that has filtered out, e.g., 'nodejs' packages, the packages will not be included in the new version (a new repo in Pulp) but the module streams will still be copied over. This could include module streams that include 'nodejs' artifacts. So the end result is that the user can enable module streams like a 'nodejs' module stream, but the repo doesn't actually have the underlying RPMs.
To solve this, we were hoping that we could associate module stream artifacts with the RPMs themselves. We are discussing matching the RPM name with the artifact string and scoping per repository as an alternative, but are wondering if Pulp has any solutions.
Updated by dkliban@redhat.com over 6 years ago
- Tracker changed from Issue to Story
- % Done set to 0
Updated by ttereshc over 6 years ago
I agree with Ina that we should not change a DB model, because it can easily bring data inconsistency.
I'm not sure that we can change our API (as requested in the description) due to semver. It's not just an addition, but a change of the format.
How does Katello identify which modules to copy right now?
You say that it's possible that a module is copied but not all underlying RPMs are present in the destination repo. Correct?
Does Katello use recursive copy for modules? It should copy all the RPMs as well.
Updated by bmbouter over 5 years ago
- Status changed from NEW to CLOSED - WONTFIX
Pulp 2 is approaching maintenance mode, and this Pulp 2 ticket is not being actively worked on. As such, it is being closed as WONTFIX. Pulp 2 is still accepting contributions though, so if you want to contribute a fix for this ticket, please reopen or comment on it. If you don't have permissions to reopen this ticket, or you want to discuss an issue, please reach out via the developer mailing list.
Updated by ttereshc over 5 years ago
- Subject changed from Need api for Module md artifact -> rpm associations to As a user, I can know artifacts which belong to a specific module
- Status changed from CLOSED - WONTFIX to NEW
Re-opening as per Katello request.
Updated by ttereshc over 5 years ago
- Subject changed from As a user, I can know artifacts which belong to a specific module to As a user, I can see artifacts which belong to a specific module
Updated by ttereshc over 5 years ago
- Subject changed from As a user, I can see artifacts which belong to a specific module to As a user, I can get artifacts which belong to a specific module
- Description updated (diff)
Updated by ipanova@redhat.com over 5 years ago
Unassociation of RPM: I would let users do what they want to; as you mentioned, it might be an unwanted restriction. We need to make sure we have this documented, with the consequences and side effects explained.
What would be the benefit of mapping defaults and modulemd? They do not depend on each other, and when we copy/remove modules we do not touch defaults and vice versa.
At re-sync time: as far as I remember, the NSVCA of a module would change (i.e. it would be considered a new, different module) if the list of artifacts changed. So during re-sync, if we find a module with ID X in the mapping collection then we just skip it; otherwise we create a mapping, because it is a new module entry.
Updated by ttereshc over 5 years ago
I think you are right, we shouldn't be worried about changes at resync time.
FWIW, I added changes for the copy operation. Forgot about it initially.
Updated by ttereshc over 5 years ago
- Status changed from NEW to CLOSED - WONTFIX
The changes for the request are quite broad and large.
The main driving reason was to copy modules and their artifacts without dependencies.
This will be solved as a part of #4718 - artifacts will be copied by default.
Updated by ttereshc over 5 years ago
- Related to Issue #4718: Module integrity is not preserved at copy time added