Story #6353

As a user, I can mirror RPM repository content and metadata

Added by dkliban@redhat.com 10 months ago. Updated 2 months ago.

Status: NEW
Priority: Normal
Assignee: -
Start date:
Due date:
% Done: 0%
Estimated time:
Platform Release:
Groomed: No
Sprint Candidate: No
Tags:
Sprint:
Quarter:

Description

Motivation

  • Clients installing packages from RPM mirrors hosted by Pulp don't have access to the original metadata provided by the remote repository.
  • Caching and/or load balancing can break if multiple instances of Pulp produce different metadata when syncing from the same remote repository.
  • If a repo contains duplicated content under different paths, it can't be synced at all unless the path is part of the content's natural key.

Proposed solution

Add the ability to create repository versions that contain the original metadata from the remote repository.

This could be accomplished by the following:

  • Have a way to distinguish between repositories with managed content and exact-mirror repositories (e.g. create a repository with exact_mirror=True, or add a new dedicated repository type, RpmMirrorRepository).
  • For such repos, create a publication at sync time (including published artifacts and metadata).
  • For such repos, publish is a no-op and always returns the existing publication for the requested repo version.
  • For such repos, no modifications are allowed except syncing in mirror mode.
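
The four rules above can be sketched in plain Python. This is only an illustration under stated assumptions, not Pulp code: `MirrorRepository`, `Publication`, `sync`, `publish`, and `modify` are hypothetical names, and the in-memory dictionary stands in for real storage.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Publication:
    """Hypothetical stand-in: upstream metadata captured for one repo version."""
    version: int
    metadata: dict  # relative path -> raw bytes, copied verbatim from the remote


@dataclass
class MirrorRepository:
    """Hypothetical exact-mirror repository (not a real Pulp model)."""
    name: str
    latest_version: int = 0
    publications: dict = field(default_factory=dict)  # version -> Publication

    def sync(self, remote_metadata: dict) -> Publication:
        # Sync creates the new version and its publication in one step,
        # reusing the remote's metadata files unchanged.
        self.latest_version += 1
        publication = Publication(self.latest_version, dict(remote_metadata))
        self.publications[self.latest_version] = publication
        return publication

    def publish(self, version=None) -> Publication:
        # No-op: hand back the publication that sync already created.
        if version is None:
            version = self.latest_version
        return self.publications[version]

    def modify(self, add=(), remove=()):
        # Mirror repositories change only through sync.
        raise PermissionError("mirror repositories can only be changed by sync")
```

In this sketch, calling `publish()` twice for the same version returns the same object, which is the property that would keep several instances serving identical metadata.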

Pros

Cons

  • Doesn't solve the problem of various relative paths for the same content in a general way.
  • Requires a separate code path (at times) to handle this type of repository.

Related issues

Related to Pulp - Story #5200: Support 'mirrored' metadata (CLOSED - WONTFIX)


History

#1 Updated by dkliban@redhat.com 10 months ago

  • Description updated (diff)

#2 Updated by ttereshc 10 months ago

  • Sprint/Milestone set to Pulp 3.x RPM (Katello 3.16)

#3 Updated by lmjachky 10 months ago

  • Status changed from NEW to ASSIGNED
  • Assignee set to lmjachky

#4 Updated by rchan 10 months ago

  • Status changed from ASSIGNED to NEW
  • Assignee deleted (lmjachky)
  • Sprint/Milestone deleted (Pulp 3.x RPM (Katello 3.16))

#5 Updated by rchan 10 months ago

  • Sprint/Milestone set to Pulp 3.x RPM (Katello 3.16)

#6 Updated by ttereshc 9 months ago

  • Sprint/Milestone changed from Pulp 3.x RPM (Katello 3.16) to Priority items (outside of planned milestones/releases)

#7 Updated by ttereshc 9 months ago

  • Priority changed from Normal to High

#8 Updated by ttereshc 8 months ago

  • Sprint/Milestone changed from Priority items (outside of planned milestones/releases) to Pulp 3.x RPM (Katello 4.1)

#9 Updated by jsherril@redhat.com 8 months ago

The RPMDistribution will need to support users providing a repository or a repository version in addition to publications.

Ideally we wouldn't have to generate a normal yum publication when going this route, as those are quite expensive to generate.

#10 Updated by dkliban@redhat.com 8 months ago

jsherril@redhat.com wrote:

> The RPMDistribution will need to support users providing a repository or a repository version in addition to publications.

> Ideally we wouldn't have to generate a normal yum publication when going this route, as those are quite expensive to generate.

You would not need to create a publication. That's why we need to be able to serve the repository version directly.
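
A rough sketch of how a distribution might pick what to serve when it accepts a repository, a repository version, or a publication. Everything here is hypothetical: the `Distribution` fields, the `resolve_served_source` helper, and the rule that exactly one field may be set are assumptions for illustration, not the RPM plugin's actual API.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Distribution:
    """Hypothetical distribution; references are plain strings in this sketch."""
    base_path: str
    publication: Optional[str] = None
    repository_version: Optional[str] = None
    repository: Optional[str] = None


def resolve_served_source(
    dist: Distribution, latest_versions: Dict[str, str]
) -> Tuple[str, str]:
    """Return (kind, reference) for the content this distribution serves."""
    candidates = {
        "publication": dist.publication,
        "repository_version": dist.repository_version,
        "repository": dist.repository,
    }
    chosen = [(kind, ref) for kind, ref in candidates.items() if ref is not None]
    if len(chosen) != 1:
        # Assumption: the three options are mutually exclusive.
        raise ValueError(
            "set exactly one of publication, repository_version, repository"
        )
    kind, ref = chosen[0]
    if kind == "repository":
        # Follow the repository to whatever its newest version currently is.
        return "repository_version", latest_versions[ref]
    return kind, ref
```

With a repository set, the distribution follows new versions as they are synced, and no publication needs to be generated along the way.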

#11 Updated by ttereshc 8 months ago

  • Priority changed from High to Normal

#12 Updated by ttereshc 7 months ago

  • Related to Story #5200: Support 'mirrored' metadata added

#13 Updated by sskracic 4 months ago

This feature would be very welcome in RHUI as it would save us from regenerating the metadata every time the repo content is updated. So yes, it gets our votes!

#14 Updated by ttereshc 3 months ago

  • Description updated (diff)

#15 Updated by dalley 3 months ago

Open question: Should the DeclarativeContent pipeline be extended to allow this functionality, or should it remain entirely within the plugin?

The latter might make more sense for the initial implementation, but if Debian wants to switch to this method we might want to be able to share the implementation.

This is a separate question from the invasive generic proposal.

#16 Updated by ipanova@redhat.com 3 months ago

dalley wrote:

> Open question: Should the DeclarativeContent pipeline be extended to allow this functionality, or should it remain entirely within the plugin?

> The latter might make more sense for the initial implementation, but if Debian wants to switch to this method we might want to be able to share the implementation.

I would suggest keeping the changes entirely in the plugin for now. Both the RPM and Debian plugins have complex pipelines, so it would be good to implement the proposal first and then decouple what can be shared.

> This is a separate question from the invasive generic proposal.

#17 Updated by ipanova@redhat.com 3 months ago

> This could be accomplished by the following:

> Have a way to distinguish between repositories with managed content and with the exact mirror (e.g. create a repository with exact_mirror=True or a new dedicated repository type, RpmMirrorRepository)

I think having a separate repo type will be a cleaner solution: we can disable endpoints we do not want to expose, for example the /modify endpoint, and also take control over which options to enable. I agree that this type of repo should be immutable, meaning no content can be added to it or removed from it.

> For such repos, create a publication at sync time (includes published artifacts and metadata).

I wonder how we would leave room for the user to specify the signing_service and gpg_check options?

> For such repos, publish is no-op and always returns the existing publication for the requested repo version.

Apparently in this step we could allow the user to re-publish the repo with the signing_service and gpg_check options if needed, but definitely not allow setting checksum_types.

> For such repos, no modifications are allowed except the sync in mirror mode.

I guess we should not allow skipping types.
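
One way to picture the option rules suggested above is a small validator. This is a hedged sketch: the option names `signing_service`, `gpg_check`, and `checksum_types` come from this comment, while `skip_types`, both function names, and the allow-list approach are assumptions made for illustration.

```python
# Options a mirror repository might accept when re-publishing (assumption).
MIRROR_PUBLISH_ALLOWED = {"signing_service", "gpg_check"}

# Sync options a mirror repository might refuse; "skip_types" is a
# hypothetical name for "skipping types".
MIRROR_SYNC_REJECTED = {"skip_types"}


def validate_mirror_publish_options(options: dict) -> dict:
    """Accept only the allow-listed publish options; reject everything else."""
    rejected = sorted(set(options) - MIRROR_PUBLISH_ALLOWED)
    if rejected:
        raise ValueError(
            "not allowed for mirror repositories: " + ", ".join(rejected)
        )
    return dict(options)


def validate_mirror_sync_options(options: dict) -> dict:
    """Refuse sync options that would make the mirror differ from the remote."""
    rejected = sorted(set(options) & MIRROR_SYNC_REJECTED)
    if rejected:
        raise ValueError(
            "not allowed for mirror repositories: " + ", ".join(rejected)
        )
    return dict(options)
```

An allow-list for publish means any option added later is rejected for mirrors until someone decides it is safe.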

I am wondering: has it been considered to add metadata as a separate content type to the mirror repo type? This could allow us to distribute the repository right away without needing to create the publication. On the other hand, I would not know how we'd allow the user to set a gpg_check option, for example. This idea is obviously far from flawless; I'm just throwing it on the table for discussion.

#18 Updated by ttereshc 2 months ago

> I wonder how we would leave room for the user to specify the signing_service and gpg_check options?

I'm not sure that we need to provide a way to sign repo metadata here. The idea is to have a pure mirror of the remote repo without any changes. But maybe I'm just not aware of a use case and customers will be interested in it. I'm open to feedback here.

> I am wondering: has it been considered to add metadata as a separate content type to the mirror repo type? This could allow us to distribute the repository right away without needing to create the publication. On the other hand, I would not know how we'd allow the user to set a gpg_check option, for example. This idea is obviously far from flawless; I'm just throwing it on the table for discussion.

I believe it was one of the initial ideas. One of the reasons the current proposal is different is that it's potentially one step closer to a more generalised solution for the relative_path problem, while a separate content type for metadata won't help in that case. Another potential concern is that this content type would need to be managed in a special way: it's not a content type you want to show to a user, and you need to disallow the copy operation for it (i.e. never associate this content with any other repo), etc.
