Task #2868
closed
Platform support for publishing.
100%
Description
Overview
The platform needs to support the composition and inventory of publications. Each publication is a representation of a repository's content that can be consumed by a specific technology, for example YUM. A publication is created by a Publisher, which is associated with the repository and provided by a plugin. A repository may have multiple publishers. A publication is composed of two types of files: content Artifacts, and metadata generated by the publisher when the publication is created. Once created, a publication may be distributed for consumption in many ways. The platform will support common online distribution methods such as HTTP and HTTPS, and offline distribution methods such as creating an ISO.
Additional Goals
- Clean separation between publishing and how a publication is distributed.
- Eliminate the use of symlinks as the primary method of publishing.
- Eliminate need for each plugin to provide an Apache .conf file for distributing via http/https.
- Prevent orphaned content from being deleted while published.
Design
Tables
[Publisher]---*[publication]---*[distribution]
[Exporter]
The Publication table contains publications; each is associated with the Publisher that created it. A Distribution defines a method for making publications available for consumption. Attributes currently modeled on Publisher that pertain to distribution, such as Auth, would be removed. Auth would be handled by Apache and selected by the user based on Distribution.base_path.
Publication
- [pk] id - The primary key.
- [fk] publisher_id - The publisher that created the publication. A constraint ensures the publication is deleted when the publisher is deleted.
- created - When the publication was created.
The PublishedArtifact table contains linkage to content Artifacts.
PublishedArtifact
- [pk] id - The primary key.
- [fk] publication_id - A publication. A constraint ensures the row is deleted when the publication is deleted.
- [fk] artifact_id - An associated content artifact.
- relative_path - The relative path component within the URL that is also relative to the Distribution.base_path.
The PublishedMetadata table contains linkage to generated metadata files.
PublishedMetadata
- [pk] id - The primary key.
- [fk] publication_id - A publication. A constraint ensures the row is deleted when the publication is deleted.
- file - An absolute path to a metadata file. Stored in: /var/lib/pulp/published/metadata/<id>/<name>
- relative_path - The relative path component within the URL that is also relative to the Distribution.base_path.
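A minimal sketch of the three tables above, using SQLite from the Python standard library rather than the platform's actual models. The SQL, the `publisher` and `artifact` stub tables, the `storage_path` column, and the helper names `build_sample_db` and `count` are assumptions made for illustration. The sketch assumes a publication is removed when its publisher is deleted (as confirmed later in the discussion), and it leaves the artifact foreign key without a cascade as one possible way to keep published content from being deleted.

```python
import sqlite3

# Hypothetical schema: column names follow the design, the SQL is an assumption.
SCHEMA = """
CREATE TABLE publisher (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE artifact (id INTEGER PRIMARY KEY, storage_path TEXT NOT NULL);
CREATE TABLE publication (
    id INTEGER PRIMARY KEY,
    publisher_id INTEGER NOT NULL REFERENCES publisher(id) ON DELETE CASCADE,
    created TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE published_artifact (
    id INTEGER PRIMARY KEY,
    publication_id INTEGER NOT NULL REFERENCES publication(id) ON DELETE CASCADE,
    artifact_id INTEGER NOT NULL REFERENCES artifact(id),
    relative_path TEXT NOT NULL
);
CREATE TABLE published_metadata (
    id INTEGER PRIMARY KEY,
    publication_id INTEGER NOT NULL REFERENCES publication(id) ON DELETE CASCADE,
    file TEXT NOT NULL,
    relative_path TEXT NOT NULL
);
"""


def build_sample_db():
    """Create an in-memory database holding one publication with two files."""
    db = sqlite3.connect(":memory:")
    db.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    db.executescript(SCHEMA)
    db.execute("INSERT INTO publisher (id, name) VALUES (1, 'publisher-1')")
    # Placeholder storage and metadata paths, not real Pulp locations.
    db.execute("INSERT INTO artifact (id, storage_path) VALUES (1, '/tmp/example/dog.rpm')")
    db.execute("INSERT INTO publication (id, publisher_id) VALUES (1, 1)")
    db.execute(
        "INSERT INTO published_artifact (publication_id, artifact_id, relative_path) "
        "VALUES (1, 1, 'packages/dog.rpm')"
    )
    db.execute(
        "INSERT INTO published_metadata (publication_id, file, relative_path) "
        "VALUES (1, '/tmp/example/repomd.xml', 'repodata/repomd.xml')"
    )
    return db


def count(db, table):
    """Return the number of rows in the named table."""
    return db.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
```

In this sketch, deleting the publisher row removes its publications and their PublishedArtifact and PublishedMetadata rows, while an attempt to delete an artifact that is still referenced by a PublishedArtifact row is rejected by the foreign key.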
Distribution
A Distribution maps a Publication to a distribution mechanism. It defines the URLs (paths) under which a publication is distributed, by which protocols, and likely how access is authorized. Distributions also define which publications are live (visible for consumption) for a publisher (and thus repository).
Distribution
- [pk] id - The primary key.
- [fk] repository_id - An (optional) repository. When set, the publication is updated whenever a new publication is created by a publisher (for the specified repository).
- [fk] publisher_id - An (optional) publisher. When set, the publication is updated whenever a new publication is created by the publisher (for the specified repository).
- [fk] publication_id - An (optional) publication (mutable).
- name - The distribution name (e.g. rawhide, stable).
- base_path - The base path for the publication. This is the root of the path component of URLs.
- http - HTTP is enabled (bool). The distribution is served by pulp using HTTP.
- https - HTTPS is enabled (bool). The distribution is served by pulp using HTTPS.
An Exporter defines a method of generating a static (external) file tree for a Distribution. Exporters are not needed for the HTTP/HTTPS distribution mechanism provided by the Pulp platform. They are only needed to generate an external representation of a Distribution, for example rsync to a static CDN or crane support. This is likely a master-detail pattern; only the master table is shown here.
Exporter
- [pk] id - The primary key.
- name - The exporter name.
- last_export - The timestamp of the last successful export.
Sample Data
Publisher
-------------------------------
publisher-1, ...
Publication
-------------------------------
publication-1, publisher-1, ...
publication-2, publisher-1, ...
PublishedMetadata
-------------------------------
<id>, publication-1, /var/lib/pulp/published/../repodata/repomd.xml
<id>, publication-1, /var/lib/pulp/published/../repodata/primary.xml
PublishedArtifact
-------------------------------
<id>, publication-1, artifact-1, packages/dog.rpm
<id>, publication-1, artifact-2, packages/cat.rpm
Distribution
-------------------------------
<id>, publisher-1, publication-1, rawhide, f25/rawhide/x86_64, true
<id>, publisher-1, publication-2, stable, f25/stable/x86_64, false
General Flows
Create A Repository
1. Create a repository.
2. Create a publisher associated with the repository.
3. Create desired distributions associated with the publisher. Each distribution will be configured with a base_path and http and/or https enabled as desired.
Publishing:
"The publisher will compose a publication"
1. Publisher creates a publication using the plugin API.
2. Publisher adds content artifacts to the publication.
3. Publisher generates some metadata files in the working dir.
4. Publisher adds the metadata files to the publication using the plugin API.
5. Publisher commits (publishes) the publication. The plugin API ensures this is atomic.
6. Distributions with auto_updated=true are updated with the new publication_id.
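The publishing steps above can be sketched in memory as follows. This is an illustration under stated assumptions, not the plugin API: the `Publication` and `Distribution` dataclasses, the `complete` flag, and the `publish` function are hypothetical names, and the atomic commit of step 5 is reduced to flipping a flag once everything has been added.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Publication:
    publisher: str
    artifacts: Dict[str, str] = field(default_factory=dict)  # relative_path -> artifact id
    metadata: Dict[str, str] = field(default_factory=dict)   # relative_path -> file path
    complete: bool = False  # hypothetical marker for "committed"


@dataclass
class Distribution:
    name: str
    base_path: str
    publisher: str
    auto_updated: bool = False
    publication: Optional[Publication] = None


def publish(publisher, artifacts, metadata, distributions):
    """Compose a publication, then point auto-updated distributions at it."""
    publication = Publication(publisher=publisher)  # step 1: create the publication
    publication.artifacts.update(artifacts)         # step 2: add content artifacts
    publication.metadata.update(metadata)           # steps 3-4: add generated metadata
    publication.complete = True                     # step 5: commit (simplified)
    for dist in distributions:                      # step 6: update distributions
        if dist.publisher == publisher and dist.auto_updated:
            dist.publication = publication
    return publication
```

With the sample data, a rawhide distribution created with `auto_updated=True` would pick up the new publication, while a stable distribution with `auto_updated=False` would keep whatever publication it already had.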
Client makes a GET request for content (or metadata):
1. Request is routed to the content (WSGI) application (just like in pulp2 for RPM).
2. Query to get the Distribution.
3. Match the request scheme against the http and https fields. Return 404 when the scheme is not enabled.
4. Query the PublishedArtifact and PublishedMetadata tables (in that order) by URL path component to get the artifact or the metadata.
5. Forward the artifact storage path (or metadata path) to:
- the streamer, when the file is not stored locally
- x-send (or stream using Django in dev environments), when the file is stored locally
6. Done.
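A rough sketch of steps 2 through 4 of the request flow, using in-memory objects in place of database queries. The `resolve` function, the dataclasses, and the dictionary lookups are assumptions for illustration. The `path` argument is taken to be the part of the URL after the published root, and the forwarding in step 5 is left out: the function only returns the file path that would be forwarded.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Publication:
    artifacts: Dict[str, str] = field(default_factory=dict)  # relative_path -> storage path
    metadata: Dict[str, str] = field(default_factory=dict)   # relative_path -> file path


@dataclass
class Distribution:
    base_path: str
    http: bool = False
    https: bool = False
    publication: Optional[Publication] = None


def resolve(scheme: str, path: str,
            distributions: List[Distribution]) -> Tuple[int, Optional[str]]:
    """Map a request to (status, file path) following steps 2-4."""
    path = path.lstrip("/")
    for dist in distributions:
        # Step 2: find the distribution whose base_path prefixes the request path.
        prefix = dist.base_path.strip("/") + "/"
        if not path.startswith(prefix):
            continue
        # Step 3: the request scheme must be enabled on the distribution.
        enabled = {"http": dist.http, "https": dist.https}.get(scheme, False)
        if not enabled or dist.publication is None:
            return 404, None
        relative_path = path[len(prefix):]
        # Step 4: look in published artifacts first, then published metadata.
        for table in (dist.publication.artifacts, dist.publication.metadata):
            if relative_path in table:
                return 200, table[relative_path]
        return 404, None
    return 404, None
```

A distribution with no publication set resolves to 404 here, which matches the idea that only a publication referenced by a Distribution is live.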
Apache Configuration
The platform will provide an /etc/httpd/conf.d/pulp.conf that configures support for HTTP and HTTPS. Published content would be consumed using URLs with a base of:
/pulp/published/<path>
where path is the <Distribution.base_path>/(<PublishedArtifact.relative_path>|<PublishedMetadata.relative_path>)
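As a small illustration of how such a URL path is composed, assuming the `/pulp/published` base described here (the function name `published_url_path` is hypothetical):

```python
import posixpath

PUBLISHED_ROOT = "/pulp/published"  # URL base named in this section


def published_url_path(base_path: str, relative_path: str) -> str:
    """Join the published root, Distribution.base_path and a file's relative_path."""
    return posixpath.join(
        PUBLISHED_ROOT, base_path.strip("/"), relative_path.lstrip("/")
    )
```

For the sample data, a base_path of f25/rawhide/x86_64 and a relative_path of packages/dog.rpm would give /pulp/published/f25/rawhide/x86_64/packages/dog.rpm.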
Exporting
An exporter is executed using the plugin-defined endpoint. Example: /api/exporter/<type>/export/.
Updated by jortel@redhat.com over 7 years ago
- Description updated (diff)
Discussed with @jsherrill on Katello team. Positive feedback overall.
How to support Auth is still an open question. The intention is to delegate as much of Auth as possible to something outside of Pulp (such as Apache).
Discussions have included:
Option 1
Having the user configure locations within /pulp/published/ based on Auth needs. Then map distributions to those locations. This will present significant challenges to working with Katello/Candlepin. Mainly that there would likely be conflicts in URL (path) hierarchies needed for Candlepin vs. configured locations for Auth. For example: mapping protected repositories (distributions) to a location with a URL path prefix of protected/.
Option 2
Having the WSGI access script perform checks and annotate the request using headers. Checks would include things like:
- client certificate provided and verified
- entitlement verified
- basic auth verified
- other
Then, the Content (WSGI) app could check for headers in the request and enforce based on Auth properties of the Distribution. The heavy lifting is still delegated.
Other options?
Updated by jortel@redhat.com over 7 years ago
- Tags deleted (Pulp 3 Plugin Writer Alpha)
Updated by bmbouter over 7 years ago
I think the x-send option when serving content is separate work that comes as its own feature, which is enabled by configuration and defaults to off.
Can that be filed as a separate issue? Since this will only be run in a dev environment in the near term, this issue can provide only the base implementation that serves files through Django.
Updated by bmbouter over 7 years ago
Providing the Apache config looks good. I want to confirm that it will not be committed to pulp/pulp, which has to stay as pure Python. Instead, the Apache configs should be added to the Pulp install Ansible roles produced by #2840. I think producing that config file and adding it to the Ansible playbook can all be 1 task that is tracked independently.
What do others think?
Updated by jortel@redhat.com over 7 years ago
bmbouter wrote:
I think the x-send option when serving content is separate work that comes as its own feature, which is enabled by configuration and defaults to off.
Can that be filed as a separate issue? Since this will only be run in a dev environment in the near term, this issue can provide only the base implementation that serves files through Django.
Splitting #2895 into 2 tasks:
- content app without x-send
- update content app to be x-send aware and work with Apache
makes sense to me.
Updated by jortel@redhat.com over 7 years ago
bmbouter wrote:
Providing the Apache config looks good. I want to confirm that it will not be committed to pulp/pulp, which has to stay as pure Python. Instead, the Apache configs should be added to the Pulp install Ansible roles produced by #2840. I think producing that config file and adding it to the Ansible playbook can all be 1 task that is tracked independently.
What do others think?
Makes sense.
Updated by ttereshc over 7 years ago
Publication
[pk] id - The primary key.
[fk] publisher_id - The publisher that created the publication. Constraint ensure it's deleted when the publication is deleted.
s/when the publication is deleted/when the publisher is deleted/ ?
The PublishedArtifact table contains linkage to both content Artifacts and generated metadata files.
Is it an outdated statement? iiuc, metadata is linked to a publication in its own table.
PublishedArtifact
[pk] id - The primary key.
[fk] publication_id - A publication. Constraint ensure it's deleted when the publication is deleted.
[fk] artifact_id - An (optional) associated content artifact.
Why is artifact_id optional? To create a publication and add the artifact later? I am not sure I understand the use case here.
Distributions also define which publications are live
How? If publication_id is present then it's live?
Distributors.
They are only needed to generate an external representation of a Distribution. For example: rsync to static CDN or crane support.
What about tracking the work which was done by those Distributors? Last time of rsync, for example.
If there are multiple Distributors for the same publication_id, we probably would like to store some details related to each Distributor.
Or is all of that going to be defined on a specific Distributor model and just not on a Master one?
Updated by jortel@redhat.com over 7 years ago
ttereshc wrote:
Publication
[pk] id - The primary key.
[fk] publisher_id - The publisher that created the publication. Constraint ensure it's deleted when the publication is deleted.
s/when the publication is deleted/when the publisher is deleted/ ?
Yes, deleted when publisher is deleted.
Also, we'll likely need a reaper task to clean up publications not associated with a Distribution.
The PublishedArtifact table contains linkage to both content Artifacts and generated metadata files.
Is it an outdated statement? iiuc, metadata is linked to a publication in its own table.
Yes, it's outdated and will fix it.
PublishedArtifact
[pk] id - The primary key.
[fk] publication_id - A publication. Constraint ensure it's deleted when the publication is deleted.
[fk] artifact_id - An (optional) associated content artifact.
Why is artifact_id optional? To create a publication and add the artifact later? I am not sure I understand the use case here.
It's not. That is left over from an earlier iteration of the design. Will fix it.
Distributions also define which publications are live
How? If publication_id is present then it's live?
Yes. When the publication's id is set in a Distribution, the Content App can find it when resolving URLs. Thus, the publication is now visible (live).
Distributors.
They are only needed to generate an external representation of a Distribution. For example: rsync to static CDN or crane support.
What about tracking the work which was done by those Distributors? Last time of rsync, for example.
Yes, we could add that: last_run.
If there are multiple Distributors for the same publication_id, we probably would like to store some details related to each Distributor.
Or is all of that going to be defined on a specific Distributor model and just not on a Master one?
Using the master/detail model like publishers, each concrete class of Distributor will have its own model with whatever attributes are needed.
Updated by bmbouter over 7 years ago
@jortel: FYI, we currently don't have a plan for reaper tasks. We may continue what we did in Pulp2 but I really hope we don't. I also don't think Pulp3 has that code in it right now.
Updated by amacdona@redhat.com over 6 years ago
- Status changed from NEW to CLOSED - CURRENTRELEASE
This issue is complete, all of the child tasks are MODIFIED.