Story #9316
closed
As a user, I can mirror the packages in a repo (kick out ones that are no longer in the upstream) without mirroring the metadata.
100%
Description
This used to be how things worked, then we changed how mirroring worked. But there are rare cases when metadata mirroring is not possible (see related issue) so we should add back a way to do this.
edit: I have two ideas. They are not mutually exclusive, we could implement the first and then later move in the direction of the second without much trouble.
Idea #1
A simple extension of our current API. On the sync URL, we would support one additional parameter, which is only valid if mirror=True (else it will either do nothing, or fail to validate). Because it only adds a small option to the sync API and can be done on a plugin by plugin basis, there is no need for a migration.
- mirror=False means additive sync, the same as it does currently
- For mirror=True mirror_type=$value, a value of:
  - mirror_type="original" (or "exact") would signify metadata mirroring
  - mirror_type="reproduction" would signify content mirror (+ autopublish? *)
The exact names of the modes are open for discussion.
* If all mirror modes functioned as though they did an immediate publish, it might allow Katello and RHUI to drop some separate codepaths. On the other hand it would carry the limitation that you can't do a content-only-mirror sync without publishing.
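As a rough illustration of Idea #1, the validation on the sync call might look like the sketch below. The function name is hypothetical, the mode names are the placeholders from the proposal, and two choices are assumptions: the sketch rejects mirror_type when mirror=False (the proposal leaves open whether it would fail or be ignored), and it defaults to metadata mirroring when mirror=True is given without a type.

```python
from typing import Optional

# Placeholder mode names from the proposal; the final names were still open.
MIRROR_TYPES = {"original", "reproduction"}


def validate_sync_options(mirror: bool = False, mirror_type: Optional[str] = None) -> str:
    """Return the effective sync mode, or raise ValueError for an invalid combination."""
    if not mirror:
        if mirror_type is not None:
            # Assumption: reject rather than silently ignore.
            raise ValueError("mirror_type is only valid when mirror=True")
        return "additive"
    if mirror_type is None:
        # Assumption: mirror=True without a type keeps metadata mirroring.
        return "original"
    if mirror_type not in MIRROR_TYPES:
        raise ValueError(f"unknown mirror_type: {mirror_type!r}")
    return mirror_type
```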
Idea #2
We deprecate the mirror option globally, and add a "sync_policy" option to replace it. This would work similarly to "download_policy" and would be a setting on the remote. Instead of being a boolean, many different options could be available, potentially customizable by the plugins. We would discourage using the mirror or mirror_type options to override this setting, although we would allow it for backwards compatibility.
For metadata mirroring, content-only-mirroring, and additive sync respectively the options would be one of:
- clone | mirror | replica | exact | exact_clone | exact_replica | exact_mirror
- content_mirror | content_clone | inexact_clone | inexact_replica | inexact_mirror | reproduction
- additive
Because this involves changing the remote - probably the base remote class, this would require a migration and a more substantial time investment across plugins.
Since more flexibility in function is allowed than mirror_type, it would be hard for this option to have consistent behavior re: ending with a publication.
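To make Idea #2 concrete, here is a minimal sketch of a policy stored on the remote with a legacy per-sync override. Everything in it is hypothetical: the Remote class is a stand-in rather than the pulpcore model, the policy names are just one pick from the candidate lists above, and mapping the legacy mirror=True to metadata mirroring is an assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class SyncPolicy(Enum):
    # One placeholder name per mode; the thread lists several candidates for each.
    MIRROR = "mirror"                  # metadata mirroring
    CONTENT_MIRROR = "content_mirror"  # content-only mirroring
    ADDITIVE = "additive"              # additive sync


@dataclass
class Remote:
    # Hypothetical stand-in for a remote, not the real model.
    url: str
    sync_policy: SyncPolicy = SyncPolicy.ADDITIVE


def effective_policy(remote: Remote, mirror: Optional[bool] = None) -> SyncPolicy:
    """Pick the policy for one sync; a legacy mirror flag overrides the remote."""
    if mirror is None:
        # No override given: the remote's stored policy wins.
        return remote.sync_policy
    # Assumption: legacy mirror=True means metadata mirroring, mirror=False additive.
    return SyncPolicy.MIRROR if mirror else SyncPolicy.ADDITIVE
```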
Related issues
Updated by dalley over 3 years ago
- Related to Issue #9303: "xml:base" / "location_base" feature of RPM metadata is incompatible with some Pulp use cases, and is handled incorrectly in others added
Updated by dalley over 3 years ago
- Tracker changed from Issue to Story
- % Done set to 0
- Severity deleted (2. Medium)
- Triaged deleted (No)
Ask Katello about implementation details
Updated by dalley over 3 years ago
- Related to Issue #9231: The interaction of skip_types and mirror=True is unintuitive added
Updated by ggainey over 3 years ago
One issue I have w/ Option 2 is that I believe that these settings should stay owned by the repository. I can have a remote that is used for two different repositories - one wants to reflect the exact state of that upstream to its clients (ie mirror), and another wants to have control over the content and publication.
Also - how do plugins that don't publish (e.g. pulp_file) respond if this is put into a base class (either Remote or Repository)?
Anyway - these are just initial thoughts, need to mull this over some more.
Updated by dalley over 3 years ago
- Status changed from NEW to ASSIGNED
- Assignee set to dalley
Updated by dalley over 3 years ago
- Related to Story #8856: As a user, I have a convenient UX for mirroring repositories added
Updated by dalley over 3 years ago
> One issue I have w/ Option 2 is that I believe that these settings should stay owned by the repository. I can have a remote that is used for two different repositories - one wants to reflect the exact state of that upstream to its clients (ie mirror), and another wants to have control over the content and publication.
If you think about the use case for why we need this, it makes sense to put the information on the remote permanently. We don't want the user to have to keep track of which repos can be synced in mirrored-metadata mode and which can't be. Katello sidesteps this issue because they have a separate set of settings stored in their database, but Pulp CLI / API users don't.
I do think option 2 would be something we would try to do alongside a push to discourage workflows apart from 1-to-1 mappings between remotes and repos. The core issue is that in the latter case where the user wants to have control, having remotes doesn't make sense at all because it would interfere strongly with their modifications. We should instead be bolstering the copy / modify workflows to suit their needs.
This is partly outlined in this issue: https://pulp.plan.io/issues/8856
> Also - how do plugins that don't publish (e.g. pulp_file) respond if this is put into a base class (either Remote or Repository)?
The file plugin does actually publish but I take your point. The supported sync policies wouldn't be universal. So each plugin would allow the subset of sync policies that they support.
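One way to picture "each plugin allows the subset it supports" is a class attribute that subclasses override. This is only a sketch: the class names, the attribute name, and in particular which policies each plugin would support are illustrative assumptions, not statements about the real plugins.

```python
class BaseRemote:
    # Hypothetical base class; by default only additive sync is allowed.
    supported_sync_policies = frozenset({"additive"})

    def __init__(self, sync_policy: str = "additive") -> None:
        if sync_policy not in self.supported_sync_policies:
            allowed = ", ".join(sorted(self.supported_sync_policies))
            raise ValueError(
                f"{type(self).__name__} does not support {sync_policy!r}; allowed: {allowed}"
            )
        self.sync_policy = sync_policy


class RpmRemote(BaseRemote):
    # Illustrative subset only.
    supported_sync_policies = frozenset({"additive", "mirror", "content_mirror"})


class FileRemote(BaseRemote):
    # Illustrative subset only.
    supported_sync_policies = frozenset({"additive", "mirror"})
```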
Updated by pulpbot over 3 years ago
- Status changed from ASSIGNED to POST
Updated by ipanova@redhat.com over 3 years ago
I like idea #1.
For plugins that do not have repodata, the metadata mirroring and content-only-mirroring options would behave in the same way. It feels wrong to add unnecessary complexity and confusion for the plugins that would not benefit from this change because they would not use that mode.
Posting here some convo about the option naming ideas.
ipanova: what about we be explicit and have a combination of "mirror=True and --content_only" and "mirror=True and --all" or "mirror=true and --clone" or replica? When mirror is set to true, content_only and replica will be mutually exclusive, and when mirror is set to false they can just be ignored.
dralley: I'm a little worried about doing that in the API because it might paint ourselves into another corner. I'm not sure if we will ever need a new mode - fwiw I can't think of one we might need, but since we just got burned :( For the CLI though that sounds great, we do have some convenience options like that where it doesn't map directly to the API
ipanova: well yeah, this is a good point. I mean I can't think of yet another mode either :D
dralley: it's a completely reasonable option and might be fine if we don't need any more modes though. It's worth discussing
ipanova: but in case there will be one in the future we could deprecate the content-only and replica and introduce mirror_modes. The reason I am trying to avoid 'mirror and mirror_mode' is that it is confusing, because mirror is already a mode of the syncing operation.
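The CLI-only convenience flags discussed above could be sketched as follows. This uses argparse purely for illustration (it is not how any Pulp CLI is actually built), and the flag names and the mirror_mode request field are hypothetical, borrowed from the conversation.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="sync")
    parser.add_argument("--mirror", action="store_true")
    # --content-only and --replica cannot be combined.
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--content-only", action="store_true")
    group.add_argument("--replica", action="store_true")
    return parser


def to_api_params(args: argparse.Namespace) -> dict:
    """Translate the convenience flags into a single (hypothetical) API field."""
    if not args.mirror:
        # With mirror off, the mode flags are simply ignored.
        return {"mirror": False}
    # Assumption: plain --mirror behaves like --replica.
    mode = "content_only" if args.content_only else "replica"
    return {"mirror": True, "mirror_mode": mode}
```

Keeping the flags in the CLI and a single mode field in the API leaves room for a later mode without another boolean, which is the concern raised in the conversation.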
Updated by ggainey over 3 years ago
RE naming, for Option 1 - how about "full-clone" and "content-clone"? I'm trying to have values that mean what they're doing, without being sentences :)
Updated by jsherril@redhat.com over 3 years ago
my vote for naming: mirror_type=complete vs mirror_type=content_only
alternative: mirror_type=complete vs mirror_type=regenerate_metadata
Updated by ttereshc over 3 years ago
+1 to option 1
as for the naming:
+1 to mirror_type=complete vs mirror_type=content_only
+0 to mirror_type=clone | exact vs mirror_type=content_only
Updated by quba42 over 3 years ago
I have not yet read the whole issue, but I would just like to insert myself to declare that I have a somewhat related problem for pulp_deb:
https://pulp.plan.io/issues/8756
In the pulp_deb case, it is mirror=false that is causing problems, since metadata types should always be mirrored (since it does not make sense/is not possible to have multiple versions of a single metadata file within a single repo version).
Perhaps we could have some design discussion about metadata and mirroring?
Updated by dalley over 3 years ago
@quba It's not quite the same issue [0] but I do think what you brought up is relevant to the discussion.
It sounds like, even without taking the RPM sense of "metadata mirroring" into consideration, mirror is doing too much and isn't applicable to all plugins. For the Debian plugin these issues might be solvable but, for a theoretical git plugin, what would mirror=False even mean? And some registries don't permit removing content, so it would be useless in that situation also.
Maybe we should aim to avoid making mirror any more important than it already is. How about:
Option 3
(this is still only referring to RPM plugin changes, not Debian or pulpcore)
We take the "sync_policy" option from option 2 but put it on the /sync/ endpoint, as in option 1, instead of on the remote. mirror would remain and keep its backwards-compatible behavior, but for the purposes of the RPM plugin it would be deprecated.
sync_policy={ additive, mirror_complete, mirror_content_only }
- If sync_policy is not explicitly provided and mirror=False, the default would be additive
- If sync_policy is not explicitly provided and mirror=True, the default would be mirror_complete
- If sync_policy is explicitly provided we would ignore the value of mirror, or perhaps raise an error if the values conflict.
(the default value of mirror is False, so the default mode of a sync without either explicitly set would be additive, which is also backwards compatible)
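The defaulting rules above can be written as a small resolver. This is a sketch, not the plugin's actual code: the function name and the strict switch are hypothetical, and mirror=None is used here to stand for "not provided" so that the conflict check can tell an explicit mirror=False from an omitted one.

```python
from typing import Optional

SYNC_POLICIES = ("additive", "mirror_complete", "mirror_content_only")


def resolve_sync_policy(
    sync_policy: Optional[str] = None,
    mirror: Optional[bool] = None,
    strict: bool = False,
) -> str:
    """Apply the Option 3 rules; strict=True raises on conflicts instead of ignoring mirror."""
    if sync_policy is None:
        # Fall back to the legacy flag; an omitted or False mirror means additive.
        return "mirror_complete" if mirror else "additive"
    if sync_policy not in SYNC_POLICIES:
        raise ValueError(f"unknown sync_policy: {sync_policy!r}")
    if strict and mirror is not None:
        # Both mirror_* policies imply mirror=True; additive implies mirror=False.
        implied_mirror = sync_policy != "additive"
        if mirror != implied_mirror:
            raise ValueError(f"mirror={mirror} conflicts with sync_policy={sync_policy!r}")
    return sync_policy
```

With strict left off, an explicit sync_policy always wins, which matches the "ignore the value of mirror" variant; turning it on gives the "raise an error if the values conflict" variant.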
Like option 1, there's no migration, but we avoid using or promoting reliance on any over-specialized flags. And we could pull mirror out of the core serializer and move it to the plugins, because it doesn't make sense for all plugins. And new plugins can avoid it entirely, since it's only there for backwards compat.
[0] You should probably either A) kick out the old metadata when validating the repository version, e.g. [1], or B) adopt a strategy more similar to what the RPM plugin is doing: create a publication with the metadata directly at sync-time, instead of storing metadata as content.
Updated by dkliban@redhat.com over 3 years ago
Each existing plugin should provide its own implementation of the 'mirror' parameter for the repository sync API. All new plugins should carefully consider if they want to provide such a parameter or if they want to provide something that makes more sense for their users.
Updated by quba42 over 3 years ago
dalley, thanks for the consideration and suggestions. I put a topic on tomorrow's "Open Floor Agenda" as well. Perhaps we could discuss this a bit more there.
Added by dalley over 3 years ago
Updated by dalley over 3 years ago
- Status changed from POST to MODIFIED
- % Done changed from 0 to 100
Applied in changeset 3b275d9b6757bc55617cd969b49912459f1d0aa2.
Updated by pulpbot over 3 years ago
- Status changed from MODIFIED to CLOSED - CURRENTRELEASE
As a user, I can perform a content-only mirror of a repo.
Required PR: https://github.com/pulp/pulpcore/pull/1684
closes: #9316 https://pulp.plan.io/issues/9316