Story #6353
As a user, I can mirror RPM repository content and metadata
Description
Motivation
- Clients installing packages from RPM mirrors hosted by Pulp don't have access to the original metadata provided in the remote repository.
- Caching and/or load balancing break when multiple Pulp instances syncing from the same remote repository produce different metadata.
- If a repository contains the same content under different paths, it cannot be synced at all unless the path is part of the content's natural key.
Proposed solution
Add the ability to create repository versions that contain the original metadata from the remote repository.
This could be accomplished by the following:
- Have a way to distinguish between repositories with managed content and exact mirrors (e.g. create a repository with exact_mirror=True or a new dedicated repository type, RpmMirrorRepository)
- For such repos, create a publication at sync time (includes published artifacts and metadata).
- For such repos, publish is no-op and always returns the existing publication for the requested repo version.
- For such repos, no modifications are allowed except the sync in mirror mode.
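To make the four rules above concrete, here is a minimal Python sketch of how a mirror-only repository type could behave. It is not pulpcore or pulp_rpm code: `MirrorRepository`, `Publication`, and the `sync`/`publish`/`modify` methods are hypothetical stand-ins, and content is reduced to a mapping from relative path to artifact digest.

```python
"""Hypothetical model of an exact-mirror repository (not a Pulp API)."""
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Publication:
    version_number: int
    # relative_path -> artifact digest, copied verbatim from the remote,
    # including metadata files such as repodata/repomd.xml
    files: Dict[str, str]


@dataclass
class MirrorRepository:
    name: str
    # Version 0 starts out empty; each mirror-mode sync appends a new version.
    versions: List[Dict[str, str]] = field(default_factory=lambda: [{}])
    publications: Dict[int, Publication] = field(default_factory=dict)

    def sync(self, remote_files: Dict[str, str]) -> Publication:
        """Create a new version and its publication in one step."""
        version_number = len(self.versions)
        self.versions.append(dict(remote_files))
        # Paths are kept as-is, so identical content published under two
        # different paths does not collide.
        publication = Publication(version_number, dict(remote_files))
        self.publications[version_number] = publication
        return publication

    def publish(self, version_number: int) -> Publication:
        """No-op publish: hand back the publication created at sync time."""
        return self.publications[version_number]

    def modify(self, add=None, remove=None) -> None:
        """Only a mirror-mode sync may change the content of this repository."""
        raise PermissionError(
            f"{self.name!r} is an exact mirror; use a mirror-mode sync instead"
        )
```

Because the relative path is the key rather than part of the content's identity, the same artifact can appear under two paths (the third motivation point) without conflict in this model.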
Pros
- non-invasive, only additive model changes
- can be implemented in the plugin that needs it, or moved to pulpcore if pulpcore allows plugin input at certain points.
- leaves room for further improvement to handle the more general case; see the full proposal at https://hackmd.io/02KBjCD3Q0WP7p4ALwzhJw#relative_path-in-PublishedArtifact-only
Cons
- doesn't solve the problem of different relative paths for the same content in a general way
- requires a separate code path (at times) to handle this type of repository.
Related issues
History
#1
Updated by dkliban@redhat.com about 1 year ago
- Description updated
#2
Updated by ttereshc about 1 year ago
- Sprint/Milestone set to Pulp 3.x RPM (Katello 3.16)
#3
Updated by lmjachky about 1 year ago
- Status changed from NEW to ASSIGNED
- Assignee set to lmjachky
#4
Updated by rchan about 1 year ago
- Status changed from ASSIGNED to NEW
- Assignee deleted (lmjachky)
- Sprint/Milestone deleted (Pulp 3.x RPM (Katello 3.16))
#5
Updated by rchan about 1 year ago
- Sprint/Milestone set to Pulp 3.x RPM (Katello 3.16)
#6
Updated by ttereshc about 1 year ago
- Sprint/Milestone changed from Pulp 3.x RPM (Katello 3.16) to Priority items (outside of planned milestones/releases)
#9
Updated by jsherril@redhat.com 11 months ago
The RPMDistribution will need to support users providing a repository or a repository version in addition to publications.
Ideally, we wouldn't have to generate a normal yum publication when going this route, as those are quite expensive to generate.
#10
Updated by dkliban@redhat.com 11 months ago
jsherril@redhat.com wrote:
> The RPMDistribution will need to support users providing a repository or a repository version in addition to publications.
> Ideally, we wouldn't have to generate a normal yum publication when going this route, as those are quite expensive to generate.
You would not need to create a publication. That's why we need to be able to serve the repository version directly.
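A minimal sketch of the distribution behaviour discussed here, in which a distribution points at exactly one of a publication, a repository (served at its latest version), or a specific repository version. The classes and the `resolve` method are hypothetical, not the pulpcore or pulp_rpm models; the sketch assumes a mirrored repository version already contains the original metadata files, so nothing is generated when serving it.

```python
"""Hypothetical distribution that serves a publication or a repository version."""
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class RepositoryVersion:
    number: int
    files: Dict[str, str]  # relative_path -> artifact digest


@dataclass
class Repository:
    versions: List[RepositoryVersion] = field(default_factory=list)

    def latest_version(self) -> RepositoryVersion:
        return max(self.versions, key=lambda v: v.number)


@dataclass
class Publication:
    files: Dict[str, str]


@dataclass
class Distribution:
    # Exactly one of these is expected to be set.
    publication: Optional[Publication] = None
    repository: Optional[Repository] = None
    repository_version: Optional[RepositoryVersion] = None

    def resolve(self, path: str) -> Optional[str]:
        """Return the artifact digest served at `path`, or None if absent."""
        sources = (self.publication, self.repository, self.repository_version)
        if sum(s is not None for s in sources) != 1:
            raise ValueError(
                "set exactly one of publication, repository, repository_version"
            )
        if self.publication is not None:
            files = self.publication.files
        elif self.repository_version is not None:
            files = self.repository_version.files
        else:
            # A repository is resolved to its latest version at request time.
            files = self.repository.latest_version().files
        return files.get(path)
```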
#12
Updated by ttereshc 10 months ago
- Related to Story #5200: Support 'mirrored' metadata added
#15
Updated by dalley 6 months ago
Open question: Should the DeclarativeContent pipeline be extended to allow this functionality, or should it remain entirely within the plugin?
The latter might make more sense for the initial implementation, but if Debian wants to switch to this method, we might want to be able to share the implementation.
This is a separate question from the invasive generic proposal.
#16
Updated by ipanova@redhat.com 6 months ago
dalley wrote:
> Open question: Should the DeclarativeContent pipeline be extended to allow this functionality, or should it remain entirely within the plugin?
> The latter might make more sense for the initial implementation, but if Debian wants to switch to this method, we might want to be able to share the implementation.
I would suggest keeping the changes entirely in the plugin for now. Both the RPM and Debian plugins have complex pipelines; it would be good to implement the proposal first and then decouple what can be shared.
> This is a separate question from the invasive generic proposal.
#17
Updated by ipanova@redhat.com 6 months ago
> This could be accomplished by the following:
> Have a way to distinguish between repositories with managed content and exact mirrors (e.g. create a repository with exact_mirror=True or a new dedicated repository type, RpmMirrorRepository)
I think a separate repo type would be a cleaner solution: we could disable endpoints we do not want to expose, for example the /modify endpoint, and also control which options to enable. I agree that this type of repo should be immutable, meaning no content can be added to or removed from it.
> For such repos, create a publication at sync time (includes published artifacts and metadata).
I wonder how we would leave room for the user to specify the signing_service and gpg_check options.
> For such repos, publish is no-op and always returns the existing publication for the requested repo version.
Perhaps at this step we could allow the user to republish the repo with the signing_service and gpg_check options if needed, but definitely not allow setting checksum_types.
> For such repos, no modifications are allowed except the sync in mirror mode.
I guess we should not allow skipping types.
I am wondering: has anyone considered adding metadata as a separate content type to the mirror repo type? This could allow us to distribute the repository right away without creating a publication. On the other hand, I don't know how we would let the user set a gpg_check option, for example. This idea is obviously far from flawless; I'm just throwing it on the table for discussion.
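The restrictions proposed in this comment could be enforced by validating each request before any work starts. This is a hypothetical sketch: the option names `signing_service`, `gpg_check`, and `checksum_types` come from the discussion, `skip_types` is assumed to be the name of the sync option for skipping content types, and the helper functions are illustrative rather than pulp_rpm's actual serializers or viewsets.

```python
"""Hypothetical request validation for a mirror-only repository type."""
from typing import Any, Dict

# Options the discussion suggests could still be allowed on re-publish
# (an assumption for illustration, not pulp_rpm behaviour).
ALLOWED_REPUBLISH_OPTIONS = frozenset({"signing_service", "gpg_check"})
# Actions a mirror repository would expose; "modify" is deliberately absent.
ALLOWED_ACTIONS = frozenset({"sync", "publish"})


def check_action(action: str) -> None:
    """Reject endpoints that a mirror repository type would not expose."""
    if action not in ALLOWED_ACTIONS:
        raise PermissionError(f"{action!r} is not available on a mirror repository")


def validate_republish_options(options: Dict[str, Any]) -> Dict[str, Any]:
    """Allow only the whitelisted publish options, e.g. reject checksum_types."""
    rejected = sorted(set(options) - ALLOWED_REPUBLISH_OPTIONS)
    if rejected:
        raise ValueError(
            f"not allowed for a mirror repository: {', '.join(rejected)}"
        )
    return options


def validate_sync_options(options: Dict[str, Any]) -> Dict[str, Any]:
    """A mirror-mode sync must not skip any content types."""
    if options.get("skip_types"):
        raise ValueError("skip_types cannot be used with a mirror-mode sync")
    return options
```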
#18
Updated by ttereshc 6 months ago
> I wonder how we would leave room for the user to specify the signing_service and gpg_check options.
I'm not sure that we need to provide a way to sign repo metadata here. The idea is to have a pure mirror of the remote repo without any changes. But maybe I'm just not aware of a use case and customers will be interested in it. I'm open to feedback here.
> I am wondering: has anyone considered adding metadata as a separate content type to the mirror repo type? This could allow us to distribute the repository right away without creating a publication. On the other hand, I don't know how we would let the user set a gpg_check option, for example. This idea is obviously far from flawless; I'm just throwing it on the table for discussion.
I believe it was one of the initial ideas. One reason the current proposal differs is that it is potentially one step closer to a more generalised solution for the relative_path problem, while a separate content type for metadata wouldn't help in that case. Another potential concern is that this content type would need special handling: it's not a content type you want to show to a user, and you need to disallow copy operations for it (i.e. never associate this content with any other repo), etc.
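To illustrate the special handling described in this reply, here is a hypothetical Python sketch of a metadata content type that is hidden from user listings and refused by copy operations. The `mirror_metadata` type name, `ContentUnit`, and both helper functions are invented for illustration and do not correspond to pulpcore APIs.

```python
"""Hypothetical special handling for a metadata content type (not a Pulp API)."""
from dataclasses import dataclass
from typing import Iterable, List

# Hypothetical content type name for mirrored repository metadata.
HIDDEN_TYPES = frozenset({"mirror_metadata"})


@dataclass(frozen=True)
class ContentUnit:
    content_type: str
    relative_path: str


def visible_content(units: Iterable[ContentUnit]) -> List[ContentUnit]:
    """What a user-facing listing would show: metadata units are hidden."""
    return [u for u in units if u.content_type not in HIDDEN_TYPES]


def copy_content(units: Iterable[ContentUnit], target: List[ContentUnit]) -> None:
    """Copy units into another repository, refusing metadata units outright."""
    units = list(units)
    blocked = [u.relative_path for u in units if u.content_type in HIDDEN_TYPES]
    if blocked:
        # Check before copying anything so the target is left unchanged.
        raise PermissionError(f"metadata cannot be copied: {blocked}")
    target.extend(units)
```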
#19
Updated by dalley 3 months ago
- Blocked by Story #7815: As a plugin writer, pulpcore ensures that a job working directory is set/removed properly added
#21
Updated by rchan about 2 months ago
- Sprint changed from Sprint 90 to Sprint 91
#22
Updated by rchan about 1 month ago
- Sprint changed from Sprint 91 to Sprint 92