Story #5086

closed

As a user, I have an exporter so that I can ship content on a disc or on a "dumb" webserver

Added by fachleitner almost 5 years ago. Updated over 4 years ago.

Status:
CLOSED - CURRENTRELEASE
Priority:
Normal
Assignee:
Category:
-
Sprint/Milestone:
Start date:
Due date:
% Done:

100%

Estimated time:
Platform Release:
Groomed:
Yes
Sprint Candidate:
Yes
Tags:
Sprint:
Sprint 61
Quarter:

Description

Problem

To allow offline updates or installations, as well as "cheap" hosting on a "dumb" HTTP file server, exporting Pulp content
is required. Pulp 2 documents this in the manual: https://docs.pulpproject.org/en/2.18/plugins/pulp_rpm/user-guide/recipes.html#export-repositories-and-repository-groups

Design

In Pulp3 an 'exporter' can send content out of a Pulp system. An exporter named FileSystemExporter should be created that accepts a full path to a filesystem location to which it exports content. FileSystemExporter should be an abstract base class that contains the logic to export a set of content to a filesystem location. If the content is already on the filesystem, it will use hardlinks; otherwise (e.g. when using S3) it will write the file to the filesystem.

The base FileSystemExporter should be extended in core by two Master classes, FileSystemRepoExporter and FileSystemPublicationExporter. FileSystemRepoExporter takes either a 'repository' or a 'repository_version', but not both; if given a 'repository', it exports the latest repository_version. FileSystemPublicationExporter takes a publication and exports it.

A plugin can extend and define a Detail class for one of these two Master classes in order to export its content.
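
A minimal sketch of how these classes might fit together (Python, illustrative only; the method names and signatures are assumptions, not the final pulpcore API):

    import os
    import shutil


    class FileSystemExporter:
        """Abstract base: writes a set of content artifacts to a filesystem path."""

        def __init__(self, name, path):
            self.name = name
            self.path = path  # full filesystem path to export into

        def _export_artifact(self, relative_path, artifact_path):
            """Hardlink when the artifact lives on the local filesystem, copy otherwise.

            artifact_path is assumed to be a local file path here; an object-storage
            backend (e.g. S3) would instead stream the bytes into the destination file.
            """
            dest = os.path.join(self.path, relative_path)
            os.makedirs(os.path.dirname(dest), exist_ok=True)
            try:
                os.link(artifact_path, dest)          # cheap hardlink on the same filesystem
            except OSError:
                shutil.copyfile(artifact_path, dest)  # fall back to copying the bytes


    class FileSystemRepoExporter(FileSystemExporter):
        """Master class: exports a repository_version (or a repository's latest version)."""


    class FileSystemPublicationExporter(FileSystemExporter):
        """Master class: exports the published metadata and artifacts of a Publication."""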

Note: explore the possibility of exporting from multiple repos.

On Demand Content

This feature does not work for on-demand content because the content must be present locally for Pulp to write it to a filesystem.
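
For illustration, a pre-flight check along these lines could enforce that and fail early (hypothetical helper and field names, not actual pulpcore code):

    def assert_all_artifacts_present(repository_version):
        # content_artifacts() is an assumed helper returning ContentArtifact rows;
        # artifact is None when the backing file was never downloaded (on_demand/streamed).
        missing = [ca for ca in repository_version.content_artifacts() if ca.artifact is None]
        if missing:
            raise RuntimeError(
                f"{len(missing)} content units have no downloaded artifact; "
                "re-sync with policy='immediate' before exporting."
            )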


Related issues

Related to Pulp - Story #5559: As a plugin writer, I cannot export my Publication to POSIX filesystems (CLOSED - CURRENTRELEASE, daviddavis)

Actions #1

Updated by bmbouter almost 5 years ago

How much de-duplication do you expect on the remote filesystem? Say you have Repo A and Repo B, both of which have package 'foo' with the same NEVRA and checksum. Should 1 or 2 copies be stored on the filesystem? Not having to de-duplicate remotely makes this a lot easier, but also not as awesome. Overall I'd like to do something simple first.

What are your requirements?

Actions #2

Updated by apollo13 almost 5 years ago

Since our main use case would be generating quarterly updates, we'd be perfectly fine with no deduplication for now. All in all, it would surely be nice, but it's not required.

Actions #3

Updated by bmbouter almost 5 years ago

  • Subject changed from As an operator, I want to export a distribution, such that I can ship it on a disc or on a "dumb" webserver to As an user, I have an exporter that I can ship it on a disc or on a "dumb" webserver
  • Description updated (diff)

We need to define the fields needed specifically.

Actions #4

Updated by bmbouter almost 5 years ago

  • Description updated (diff)
Actions #5

Updated by bmbouter almost 5 years ago

  • Sprint/Milestone set to 3.0.0
  • Platform Release deleted (3.0.0)
Actions #6

Updated by fachleitner almost 5 years ago

bmbouter wrote:

On Demand Content

This feature does not work for on-demand content because the content must be present locally for Pulp to write it to a filesystem.

I wonder if the remote content policy should really make a difference? I might be getting this wrong, but when the exporter requests the content, "on_demand" content would be fetched from the remote or served from disk, "streamed" content would be streamed, and "immediate" content would be served from disk, no?

Actions #7

Updated by bmbouter almost 5 years ago

fachleitner wrote:

I wonder if the remote content policy should really make a difference? I might be getting this wrong, but when the exporter requests the content, "on_demand" content would be fetched from the remote or served from disk, "streamed" content would be streamed, and "immediate" content would be served from disk, no?

These are great questions. This may be more info than you're interested in, but I want to share the info anyway.

Architecturally, the pulp content app and the pulp API are peer services that both face the user but not each other. Deploying Pulp in a way where the pulp content app also serves the Pulp API could be practically difficult, as load balancers or network firewall and routing details begin to matter a lot. For example, the services could be non-routable to each other.

Instead there is heavy code sharing between the pulp content app and the pulp api code in the form of the Pulp downloaders, downloader factory, etc. This means that for the exporter code to "get undownloaded content" it needs to make its own downloaders, download things, save them, etc, which means it effectively needs to cover all the same corner cases as a re-sync itself.

So if fetching on-demand content for export is equivalent to a re-sync with policy='immediate', then we could either trigger one before the export or ask the user to do it themselves. Auto-triggering a sync is practically complicated because the sync endpoint is provided by the plugin, and it could also be unexpected for a user who relies entirely on policy='on_demand' or policy='streamed' to use this exporter and end up having to store a copy of all that content in their Pulp in addition to their exported area (two copies).

For ^ reasons, I believe the least surprising thing to do is to halt an export that contains on-demand content and report to the user that they should re-sync the repository with policy='immediate' to acquire all content before trying to export it. Then they can run the export task again.

Feedback on this reasoning and other ideas are welcome.

Actions #8

Updated by daviddavis almost 5 years ago

  • Groomed changed from No to Yes
  • Sprint Candidate changed from No to Yes
Actions #9

Updated by rchan almost 5 years ago

  • Sprint set to Sprint 56
Actions #10

Updated by dkliban@redhat.com almost 5 years ago

This exporter should support accepting multiple repository versions.

Actions #11

Updated by fachleitner almost 5 years ago

bmbouter wrote:

For ^ reasons, I believe the least surprising thing to do is to halt an export that contains on-demand content and report to the user that they should re-sync the repository with policy='immediate' to acquire all content before trying to export it. Then they can run the export task again.

Feedback on this reasoning and other ideas are welcome.

Thanks for the explanation. I think my question came from the assumption that the exporter would be a client of the content app, like reposync or yum/dnf. Do I understand correctly that the exporter would be a separate "service" (or "app"), parallel to the content server, and thus it needs the content on disk to be able to export it?

Actions #12

Updated by bmbouter almost 5 years ago

The user-facing docs don't exist yet (this ticket adds them), but the Exporter is code that runs in the Pulp tasking system (like all other Pulp tasks). The workflow facilitates a "push" model where a command triggered against Pulp causes Pulp to export the content.

The idea of having a "pull" tool that crawls the repository's contents as served from the Content App and saves them as a form of "export" would also be ok. In that case, to remain content-type agnostic, the tool would probably have to list content via the Pulp API and then fetch it from the content app one by one. That would work, and is more of a "pull" model executed from outside Pulp. That is a different idea than a Pulp Exporter, though.
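
For comparison, a rough sketch of that "pull" approach run from outside Pulp; the endpoint path, field names, distribution URL, and credentials are assumptions for a pulp_file-style repository, not a documented workflow:

    import os
    import requests

    API = "https://pulp.example.com/pulp/api/v3"
    CONTENT_APP = "https://pulp.example.com/pulp/content/my-distribution"
    DEST = "/var/exports/my-repo"
    AUTH = ("admin", "password")

    # List content units via the REST API, then fetch each file from the content app.
    url = f"{API}/content/file/files/"   # plugin-specific listing endpoint (example)
    while url:
        page = requests.get(url, auth=AUTH).json()
        for unit in page["results"]:
            rel_path = unit["relative_path"]
            dest = os.path.join(DEST, rel_path)
            os.makedirs(os.path.dirname(dest), exist_ok=True)
            with requests.get(f"{CONTENT_APP}/{rel_path}", stream=True) as resp:
                resp.raise_for_status()
                with open(dest, "wb") as f:
                    for chunk in resp.iter_content(chunk_size=1 << 20):
                        f.write(chunk)
        url = page["next"]   # DRF-style pagination: None when the last page is reached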

Actions #13

Updated by rchan over 4 years ago

  • Sprint changed from Sprint 56 to Sprint 57
Actions #14

Updated by ipanova@redhat.com over 4 years ago

  • Description updated (diff)
Actions #15

Updated by rchan over 4 years ago

  • Sprint changed from Sprint 57 to Sprint 58
Actions #16

Updated by daviddavis over 4 years ago

  • Sprint/Milestone changed from 3.0.0 to 71
Actions #17

Updated by bmbouter over 4 years ago

  • Sprint/Milestone changed from 71 to 3.0.0
Actions #18

Updated by rchan over 4 years ago

  • Sprint changed from Sprint 58 to Sprint 59
Actions #19

Updated by daviddavis over 4 years ago

Per our discussion with bmbouter and dkliban, removing this from the sprint since we don't plan to work on it this sprint. This still should be up for consideration for 3.0 GA though.

Actions #20

Updated by daviddavis over 4 years ago

  • Sprint deleted (Sprint 59)
Actions #21

Updated by daviddavis over 4 years ago

  • Groomed changed from Yes to No
  • Sprint Candidate changed from Yes to No
Actions #22

Updated by daviddavis over 4 years ago

  • Groomed changed from No to Yes
  • Sprint Candidate changed from No to Yes
Actions #23

Updated by daviddavis over 4 years ago

I have a couple thoughts on this issue.

First, I think it might be better to start development in the plugins rather than core. I'm worried that we might make some incorrect assumptions about how plugins want to export their content. Then we'll be stuck with a solution that doesn't work for all plugins due to semver.

Secondly, I wonder if this needs to be a GA blocker. Do any stakeholders need this or does this work require backwards incompatible changes?

Actions #24

Updated by bmbouter over 4 years ago

daviddavis wrote:

I have a couple thoughts on this issue.

First, I think it might be better to start development in the plugins rather than core. I'm worried that we might make some incorrect assumptions about how plugins want to export their content. Then we'll be stuck with a solution that doesn't work for all plugins due to semver.

From a risk perspective, starting in the plugins is a strategy I can understand, but there are challenges with that approach here. If we're not shipping it in core, what ships it? We could pick "a plugin" and put it there, but this code would be compatible with any Publication, so users who don't use "that plugin" would need to install a plugin they don't want in order to get this.

Secondly, I wonder if this needs to be a GA blocker. Do any stakeholders need this or does this work require backwards incompatible changes?

There are two reasons I think this should be a GA blocker, but lmk what you think.

1) We need to be sure that the Artifact storage format of a Publication can be fully represented on disk. Currently it cannot, due to POSIX filesystem incompatibilities between a file and a directory with the same name. We know this by thinking about the problem, but we don't know what other issues we'll run into fulfilling that requirement. Fixing at least this known problem is one backwards incompatible change. This export story is the proof point that we can do that (and that there aren't more backwards incompatible changes).

2) We need to have at least 1 working example for GA of an exporter to be sure it works. Currently there is no code that uses the existing core machinery for exporters.

What do you think about ^?

Actions #25

Updated by bmbouter over 4 years ago

  • Related to Story #5559: As a plugin writer, I cannot export my Publication to POSIX filesystems added
Actions #26

Updated by bmbouter over 4 years ago

  • Sprint set to Sprint 60
Actions #27

Updated by daviddavis over 4 years ago

  • Status changed from NEW to ASSIGNED
  • Assignee set to daviddavis
Actions #28

Updated by daviddavis over 4 years ago

  • Description updated (diff)
Actions #29

Updated by bmbouter over 4 years ago

This looks good. 2 questions I had:

1) Are we going to raise a validation error when the repo/repo_version or publication doesn't have all Artifacts locally? That would be good to me so we fail early in that case.

2) I had expected the two objects would be Detail classes based off of this existing master object.

Actions #30

Updated by rchan over 4 years ago

  • Sprint changed from Sprint 60 to Sprint 61
Actions #32

Updated by daviddavis over 4 years ago

  • Status changed from POST to MODIFIED
  • % Done changed from 0 to 100

Added by daviddavis over 4 years ago

Revision d9ee7ed3 | View on GitHub

Add support for exporting publications to file system.

Required PR: https://github.com/pulp/pulpcore/pull/362

ref #5086 https://pulp.plan.io/issues/5086

Actions #33

Updated by bmbouter over 4 years ago

  • Status changed from MODIFIED to CLOSED - CURRENTRELEASE
