Story #9316

closed

As a user, I can mirror the packages in a repo (kick out ones that are no longer in the upstream) without mirroring the metadata.

Added by dalley over 3 years ago. Updated over 3 years ago.

Status: CLOSED - CURRENTRELEASE
Priority: Normal
Assignee:
Sprint/Milestone:
Start date:
Due date:
% Done: 100%
Estimated time:
Platform Release:
Groomed: No
Sprint Candidate: No
Tags:
Sprint: Sprint 107
Quarter: Q4-2021

Description

This is how mirroring used to work before we changed it. However, there are rare cases where metadata mirroring is not possible (see related issue), so we should add back a way to do this.

edit: I have two ideas. They are not mutually exclusive; we could implement the first and later move in the direction of the second without much trouble.

Idea #1

A simple extension of our current API. On the sync URL, we would support one additional parameter, which is only valid if mirror=True (else it will either do nothing, or fail to validate). Because it only adds a small option to the sync API and can be done on a plugin by plugin basis, there is no need for a migration.

mirror=False means additive sync, the same as it does currently

For mirror=True with mirror_type=$value, the value would mean:

  1. mirror_type="original" (or "exact") would signify metadata mirroring
  2. mirror_type="reproduction" would signify content mirror (+ autopublish? *)

The exact names of the modes are open for discussion.

* If all mirror modes functioned as though they did an immediate publish, it might allow Katello and RHUI to drop some separate codepaths. On the other hand it would carry the limitation that you can't do a content-only-mirror sync without publishing.
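
A minimal sketch of how the Idea #1 validation could look, written in plain Python rather than as actual serializer code. The function name validate_sync_options is hypothetical, the mode names are the placeholders proposed above, and two behaviors are assumptions made for illustration: mirror_type is rejected (rather than ignored) when mirror=False, and mirror=True without a mirror_type falls back to "original".

```python
from typing import Optional

# Placeholder mode names from the proposal; the final names are still open.
MIRROR_TYPES = ("original", "reproduction")


def validate_sync_options(mirror: bool = False, mirror_type: Optional[str] = None) -> str:
    """Return the effective sync mode for a (mirror, mirror_type) pair."""
    if not mirror:
        # Assumption: fail validation instead of silently ignoring the option.
        if mirror_type is not None:
            raise ValueError("mirror_type is only valid when mirror=True")
        return "additive"

    # Assumption: with no explicit mirror_type, use metadata mirroring.
    if mirror_type is None:
        mirror_type = "original"

    if mirror_type not in MIRROR_TYPES:
        raise ValueError(f"unknown mirror_type: {mirror_type!r}")
    return mirror_type
```

With these assumptions, a plain sync stays additive, mirror=True alone keeps metadata mirroring, and mirror=True with mirror_type="reproduction" selects the content-only mirror.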

Idea #2

We deprecate the mirror option globally, and add a "sync_policy" option to replace it. This would work similarly to "download_policy" and would be a setting on the remote. Instead of being a boolean, many different options could be available, potentially customizable by the plugins. We would discourage using the mirror or mirror_type options to override this setting, although we would allow it for backwards compatibility.

For metadata mirroring, content-only-mirroring, and additive sync respectively the options would be one of:

  1. clone | mirror | replica | exact | exact_clone | exact_replica | exact_mirror
  2. content_mirror | content_clone | inexact_clone | inexact_replica | inexact_mirror | reproduction
  3. additive

Because this involves changing the remote (probably the base remote class), this would require a migration and a more substantial time investment across plugins.

Since this allows more flexibility in function than mirror_type does, it would be hard for this option to behave consistently with regard to ending with a publication.
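
A rough sketch of the Idea #2 shape, using plain dataclasses instead of the real model classes. Everything here is hypothetical: the Remote and RpmRemote classes, the SUPPORTED_SYNC_POLICIES attribute, the effective_policy helper, and the choice of "exact", "content_mirror" and "additive" out of the candidate names listed above. It only illustrates a policy stored on the remote, restricted per plugin, with the legacy mirror flag still able to override it.

```python
from dataclasses import dataclass
from typing import ClassVar, FrozenSet, Optional


@dataclass
class Remote:
    """Hypothetical base remote: only additive sync is assumed to be universal."""

    url: str
    sync_policy: str = "additive"
    SUPPORTED_SYNC_POLICIES: ClassVar[FrozenSet[str]] = frozenset({"additive"})

    def __post_init__(self) -> None:
        if self.sync_policy not in self.SUPPORTED_SYNC_POLICIES:
            raise ValueError(f"unsupported sync_policy: {self.sync_policy!r}")


@dataclass
class RpmRemote(Remote):
    """Hypothetical plugin remote that widens the set of allowed policies."""

    SUPPORTED_SYNC_POLICIES: ClassVar[FrozenSet[str]] = frozenset(
        {"additive", "exact", "content_mirror"}
    )


def effective_policy(remote: Remote, mirror: Optional[bool] = None) -> str:
    """Use the remote's policy unless the legacy mirror flag is passed explicitly."""
    if mirror is None:
        return remote.sync_policy
    # Backwards-compatible override: mirror=True maps to metadata mirroring.
    policy = "exact" if mirror else "additive"
    if policy not in remote.SUPPORTED_SYNC_POLICIES:
        raise ValueError(f"{policy!r} is not supported by this remote")
    return policy
```

For example, an RpmRemote created with sync_policy="content_mirror" would sync as a content-only mirror by default, while passing mirror=True at sync time would still force the "exact" behavior.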


Related issues

Related to RPM Support - Issue #9303: "xml:base" / "location_base" feature of RPM metadata is incompatible with some Pulp use cases, and is handled incorrectly in others (CLOSED - CURRENTRELEASE, dalley)
Related to RPM Support - Issue #9231: The interaction of skip_types and mirror=True is unintuitive (CLOSED - DUPLICATE, dalley)
Related to Pulp - Story #8856: As a user, I have a convenient UX for mirroring repositories (CLOSED - DUPLICATE)

Actions #1

Updated by dalley over 3 years ago

  • Related to Issue #9303: "xml:base" / "location_base" feature of RPM metadata is incompatible with some Pulp use cases, and is handled incorrectly in others added
Actions #2

Updated by dalley over 3 years ago

  • Quarter set to Q4-2021
Actions #3

Updated by dalley over 3 years ago

  • Tracker changed from Issue to Story
  • % Done set to 0
  • Severity deleted (2. Medium)
  • Triaged deleted (No)

Ask Katello about implementation details

Actions #4

Updated by dalley over 3 years ago

  • Related to Issue #9231: The interaction of skip_types and mirror=True is unintuitive added
Actions #5

Updated by dalley over 3 years ago

  • Sprint set to Sprint 106
Actions #6

Updated by dalley over 3 years ago

  • Description updated (diff)
Actions #7

Updated by dalley over 3 years ago

  • Description updated (diff)
Actions #8

Updated by dalley over 3 years ago

  • Description updated (diff)
Actions #9

Updated by dalley over 3 years ago

  • Description updated (diff)
Actions #10

Updated by ggainey over 3 years ago

One issue I have w/ Option 2 is that I believe that these settings should stay owned by the repository. I can have a remote that is used for two different repositories - one wants to reflect the exact state of that upstream to its clients (ie mirror), and another wants to have control over the content and publication.

Also - how do plugins that don't publish (e.g. pulp_file) respond if this is put into a base class (either Remote or Repository)?

Anyway - these are just initial thoughts, need to mull this over some more.

Actions #11

Updated by dalley over 3 years ago

  • Status changed from NEW to ASSIGNED
  • Assignee set to dalley
Actions #12

Updated by dalley over 3 years ago

  • Related to Story #8856: As a user, I have a convenient UX for mirroring repositories added
Actions #13

Updated by dalley over 3 years ago

    One issue I have w/ Option 2 is that I believe that these settings should stay owned by the repository. I can have a remote that is used for two different repositories - one wants to reflect the exact state of that upstream to its clients (ie mirror), and another wants to have control over the content and publication.

If you think about the use case for why we need this, it makes sense to put the information on the remote permanently. We don't want the user to have to keep track of which repos can be synced in mirrored-metadata mode and which can't be. Katello sidesteps this issue because they have a separate set of settings stored in their database, but Pulp CLI / API users don't.

I do think option 2 would be something we would try to do alongside a push to discourage workflows apart from 1-to-1 mappings between remotes and repos. The core issue is that in the latter case where the user wants to have control, having remotes doesn't make sense at all because it would interfere strongly with their modifications. We should instead be bolstering the copy / modify workflows to suit their needs.

This is partly outlined in this issue: https://pulp.plan.io/issues/8856

    Also - how do plugins that don't publish (e.g. pulp_file) respond if this is put into a base class (either Remote or Repository)?

The file plugin does actually publish, but I take your point. The supported sync policies wouldn't be universal, so each plugin would allow only the subset of sync policies that it supports.

Actions #14

Updated by pulpbot over 3 years ago

  • Status changed from ASSIGNED to POST
Actions #15

Updated by ipanova@redhat.com over 3 years ago

I like idea #1. For plugins that do not have repodata, the metadata mirroring and content-only-mirroring options would behave the same way. It feels wrong to add unnecessary complexity and confusion for the plugins that would not benefit from this change because they would not use that mode.

Posting here some convo about the option naming ideas.

ipanova
what about we be explicit and have a combination of "mirror=True and --content_only" and "mirror=True and --all"
or "mirror=true and --clone"
or replica
when mirror set to true content_only and replica will be mutually exclusive
and when mirror set to false they can be just ignored
dralley
I'm a little worried about doing that in the API because it might paint ourselves into another corner.  I'm not sure if we will ever need a new mode - fwiw I can't think of one we might need, but since we just got burned :(

For the CLI though that sounds great, we do have some convenience options like that where it doesn't map directly to the API

ipanova

    dralley
    I'm a little worried about doing that in the API because it might paint ourselves into another corner.  I'm not sure if we will ever need a new mode - fwiw I can't think of one we might need, but since we just got burned :(

    For the CLI though that sounds great, we do have some convenience options like that where it doesn't map directly to the API

well yeah this is a good point

I mean i cant think of yet another mode either :D
dralley
it's a completely reasonable option and might be fine if we don't need any more modes though.  It's worth discussing
ipanova
but in case there is one in the future, we could deprecate content-only and replica and introduce mirror_modes. The reason I am trying to avoid 'mirror and mirror_mode' is that it is confusing,
because mirror is itself already a mode of the sync operation
Actions #16

Updated by ggainey over 3 years ago

RE naming, for Option 1 - how about "full-clone" and "content-clone"? I'm trying to have values that mean what they're doing, without being sentences :)

Actions #17

Updated by jsherril@redhat.com over 3 years ago

my vote for naming: mirror_type=complete vs mirror_type=content_only

alternative: mirror_type=complete vs mirror_type=regenerate_metadata

Actions #18

Updated by ttereshc over 3 years ago

+1 to option 1

as for the naming:
+1 to mirror_type=complete vs mirror_type=content_only
+0 to mirror_type=clone | exact vs mirror_type=content_only

Actions #19

Updated by quba42 over 3 years ago

I have not yet read the whole issue, but I would just like to insert myself to declare that I have a somewhat related problem for pulp_deb:

https://pulp.plan.io/issues/8756

In the pulp_deb case, it is mirror=false that is causing problems, since metadata types should always be mirrored (since it does not make sense/is not possible to have multiple versions of a single metadata file within a single repo version).

Perhaps we could have some design discussion about metadata and mirroring?

Actions #20

Updated by dalley over 3 years ago

@quba It's not quite the same issue [0] but I do think what you brought up is relevant to the discussion.

It sounds like, even without taking the RPM sense of "metadata mirroring" into consideration, mirror is doing too much and isn't applicable to all plugins. For the Debian plugin these issues might be solvable but, for a theoretical git plugin, what would mirror=False even mean? And some registries don't permit removing content, so it would be useless in that situation also.

Maybe we should aim to avoid making mirror any more important than it already is. How about:

Option 3

(this is still only referring to RPM plugin changes, not Debian or pulpcore)

We take the "sync_policy" option from option 2 but put it on the /sync/ endpoint, as in option 1, instead of on the remote. mirror would remain and still have backwards compatible behavior, but for the purposes of the RPM plugin it would be deprecated.

sync_policy={ additive, mirror_complete, mirror_content_only }

  • If sync_policy is not explicitly provided and mirror=False, the default would be additive
  • If sync_policy is not explicitly provided and mirror=True, the default would be mirror_complete
  • If sync_policy is explicitly provided we would ignore the value of mirror or perhaps raise an error if the values conflict.

(the default value of mirror is False, so the default mode of a sync without either explicitly set would be additive, which is also backwards compatible)
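
The defaulting rules above can be sketched as a small resolver. This is only an illustration in plain Python, not the plugin's code: the SyncPolicy enum, the resolve_sync_policy function and its strict flag are hypothetical names, and strict stands in for the still-open choice between ignoring mirror and raising an error on conflict.

```python
from enum import Enum
from typing import Optional


class SyncPolicy(str, Enum):
    ADDITIVE = "additive"
    MIRROR_COMPLETE = "mirror_complete"
    MIRROR_CONTENT_ONLY = "mirror_content_only"


def resolve_sync_policy(
    mirror: bool = False,
    sync_policy: Optional[str] = None,
    strict: bool = False,
) -> SyncPolicy:
    """Map the legacy mirror flag and the new sync_policy to one effective policy."""
    if sync_policy is None:
        # No explicit policy: fall back to the backwards-compatible meaning of mirror.
        return SyncPolicy.MIRROR_COMPLETE if mirror else SyncPolicy.ADDITIVE

    policy = SyncPolicy(sync_policy)  # raises ValueError for unknown values

    # Assumption: only mirror=True with "additive" is treated as a conflict,
    # because mirror=False cannot be told apart from the default.
    if strict and mirror and policy is SyncPolicy.ADDITIVE:
        raise ValueError("mirror=True conflicts with sync_policy='additive'")

    # Otherwise an explicit sync_policy wins and mirror is ignored.
    return policy
```

Calling resolve_sync_policy() with no arguments yields the additive policy, and resolve_sync_policy(mirror=True) yields mirror_complete, which matches the existing behavior of both values of mirror.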

Like option 1, there's no migration, but we avoid using or promoting reliance on any over-specialized flags. And we could pull mirror out of the core serializer and move it to the plugins, because it doesn't make sense for all plugins. And new plugins can avoid it entirely, since it's only there for backwards compat.


[0] You should probably either A) kick out the old metadata when validating the repository version, e.g. [1], or B) adopt a strategy more similar to what the RPM plugin is doing: create a publication with the metadata directly at sync-time, instead of storing metadata as content

[1] https://github.com/pulp/pulp_rpm/blob/6a317cb5ac789772d1ddc2b571c9bdc1c387c847/pulp_rpm/app/models/repository.py#L276-L321

Actions #21

Updated by rchan over 3 years ago

  • Sprint changed from Sprint 106 to Sprint 107
Actions #22

Updated by dkliban@redhat.com over 3 years ago

Each existing plugin should provide its own implementation of the 'mirror' parameter for the repository sync API. All new plugins should carefully consider if they want to provide such a parameter or if they want to provide something that makes more sense for their users.

Actions #23

Updated by quba42 over 3 years ago

dalley, thanks for the consideration and suggestions. I put a topic on tomorrow's "Open Floor Agenda" as well. Perhaps we could discuss this a bit more there.

Actions #24

Updated by dalley over 3 years ago

  • Sprint/Milestone set to 3.16.0

Added by dalley over 3 years ago

Revision 3b275d9b | View on GitHub

As a user, I can perform a content-only mirror of a repo.

Required PR: https://github.com/pulp/pulpcore/pull/1684

closes: #9316 https://pulp.plan.io/issues/9316

Actions #25

Updated by dalley over 3 years ago

  • Status changed from POST to MODIFIED
  • % Done changed from 0 to 100
Actions #26

Updated by pulpbot over 3 years ago

  • Status changed from MODIFIED to CLOSED - CURRENTRELEASE
