Story #4020
Extend Content App to serve Artifacts from ContentArtifact.relative_path data associated w/ the repo_version associated with the publication (closed)
Description
Problem
Publishing a large repository many times creates a huge number of PublishedArtifact objects. These effectively duplicate data that already exists in Pulp, e.g. ContentArtifact.relative_path.
Also, a plugin writer recently pointed out that they don't need PublishedMetadata objects; having a distribution serve the content itself at ContentArtifact.relative_path is enough. They indicated that (a) having to provide a publisher didn't add value for them and (b) having to create a huge number of duplicate records they don't need was concerning.
Solution
Introduce an attribute on Publication called pass_through that defaults to False. When resolving a request, the content app would:
1. Query as it does normally, looking for PublishedArtifact and PublishedMetadata objects.
2. If nothing matched and Publication.pass_through == True, also search ContentArtifact.relative_path for units in the associated RepositoryVersion.
This is easy to add because Publication already has a ForeignKey to RepositoryVersion. Note that PublishedArtifact and PublishedMetadata are still searched first.
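The lookup order above can be sketched in Python. This is a minimal, in-memory sketch under stated assumptions: the dataclasses and the resolve() helper are hypothetical stand-ins, not Pulp's actual models or content app code, and each dict simply maps a relative path to whatever would be served.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


# Hypothetical stand-ins for the Pulp models named in this ticket; the real
# content app queries the database through Django models instead.
@dataclass
class RepositoryVersion:
    # ContentArtifact.relative_path -> artifact identifier (e.g. a digest)
    content_artifacts: Dict[str, str] = field(default_factory=dict)


@dataclass
class Publication:
    repository_version: RepositoryVersion  # the existing FK to RepositoryVersion
    published_metadata: Dict[str, str] = field(default_factory=dict)
    published_artifacts: Dict[str, str] = field(default_factory=dict)
    pass_through: bool = False  # defaults to False, as proposed


def resolve(publication: Publication, path: str) -> Optional[str]:
    """Return what should be served for `path`, or None (i.e. a 404)."""
    # 1. PublishedMetadata and PublishedArtifact are always searched first.
    if path in publication.published_metadata:
        return publication.published_metadata[path]
    if path in publication.published_artifacts:
        return publication.published_artifacts[path]
    # 2. Only pass-through publications fall back to the ContentArtifact
    #    relative paths of the associated repository version.
    if publication.pass_through:
        return publication.repository_version.content_artifacts.get(path)
    return None
```

For example, a request for a.rpm that exists only as a ContentArtifact resolves to None while pass_through is False, and to the artifact once pass_through is True; a PublishedArtifact at the same path would still take precedence.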
Benefits
- Users who don't need PublishedMetadata will have a simpler experience (as they requested)
Updated by dkliban@redhat.com about 6 years ago
Our current design allows the plugin writer to create publishers that can create publications that filter out some content from a repository version. This means users have two opportunities to compose a repository: at repository version creation time and when creating a publication. I would prefer to provide only one such opportunity: at repository version creation time.
Another content type that does not require generating metadata at publish time is Maven. All the metadata is part of the content. So I can see a benefit for that plugin.
Updated by daviddavis about 6 years ago
+1 from me. I think PublishedArtifact is a remnant from before repo versions. Being able to remove the table and the need to create a ton of records for every publication would be a big improvement and simplification.
Also, publication already has a FK to repo version so I think we're good there.
Updated by jortel@redhat.com about 6 years ago
The purpose of the PublishedArtifact was to provide the publisher with the opportunity to publish each Artifact with a custom relative path.
For example, consider a (remote) DNF repository that looks like:
a.rpm
b.rpm
repodata/
The ContentArtifact.relative_path (and PublishedArtifact) for each artifact would be:
a.rpm
b.rpm
The PublishedArtifact provides for publishing in a different (custom) structure, e.g. to a packages/ directory:
packages/
  a.rpm
  b.rpm
repodata/
The PublishedArtifact.relative_path (for rpms) would be:
packages/a.rpm
packages/b.rpm
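A publisher that wants a custom layout like this has to compute a new relative path for every artifact. The sketch below uses a hypothetical helper (not part of Pulp's plugin API) to show that mapping from ContentArtifact.relative_path to PublishedArtifact.relative_path. A pass-through publication cannot express it, since it serves each artifact at its ContentArtifact.relative_path unchanged.

```python
import posixpath
from typing import Dict, Iterable


def packages_layout(relative_paths: Iterable[str], prefix: str = "packages") -> Dict[str, str]:
    """Map each ContentArtifact.relative_path to a custom published path.

    Hypothetical helper: it keeps only the file name and places it under
    `prefix`, mirroring the packages/ example in this comment.
    """
    published: Dict[str, str] = {}
    seen = set()
    for rel in relative_paths:
        target = posixpath.join(prefix, posixpath.basename(rel))
        # Flattening can make two different sources collide; refuse that.
        if target in seen:
            raise ValueError(f"two artifacts would be published at {target!r}")
        seen.add(target)
        published[rel] = target
    return published
```

Called with ["a.rpm", "b.rpm"], it maps them to packages/a.rpm and packages/b.rpm, matching the PublishedArtifact.relative_path values listed above.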
Perhaps we can support the publish-as-is use case by providing something like an as-is attribute to Publication since it already has a FK to the version. The content app could then resolve to ContentArtifact.relative_path as suggested when not matched to PublishedMetadata or PublishedArtifact. The core could provide base publishing that produces a PublishedVersion.
Updated by bmbouter about 6 years ago
@jortel I see what you're saying about it providing that customization point, but is anyone using it? I was trying to think if anyone was, but all the plugins I found were using ContentArtifact.relative_path as-is. Even RPM uses it as-is: https://github.com/pulp/pulp_rpm/blob/9cd7b8237194bc79b5454c7e53eaba673a21077a/pulp_rpm/app/tasks/publishing.py#L194
Everything we add takes away from the other things so if this feature isn't being used by any plugins we should consider removing it until it's needed by someone (I think). Are plugin writers using this?
Updated by gmbnomis about 6 years ago
Doesn't this proposal mean that, on content creation, the plugin has to decide on the exact relative location of the artifact in every publication that will ever occur?
What if:
- The plugin writer wants to change the publication layout, as was done for RPM in Pulp 2.12 [0]
- The plugin writer wants to support user-selectable layouts when publishing
- There is a new "v2" of the repo structure, which necessitates path changes. The plugin writer wants to support publishing existing artifacts using the new structure.
I am not saying that this is absolutely necessary, but we should know the consequences of this decision. We could say that these are use cases for a live API, because they are more complex than what Pulp core is willing to handle.
The other idea I had is what @jortel just described. :-)
[0] https://pulpproject.org/2016/11/14/yum-repo-layout-changes/
Updated by gmbnomis about 6 years ago
bmbouter wrote:
Everything we add takes away from the other things so if this feature isn't being used by any plugins we should consider removing it until it's needed by someone (I think). Are plugin writers using this?
Yes, I am using this in pulp_cookbook, but only because it is convenient. Content units just have the name of the tar file as their relative path (e.g. pulp-1.2.3.tar.gz). The publication is at cookbook_files/pulp/1_2_3/pulp-1.2.3.tar.gz. But I could already use this path at content creation time.
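The cookbook path above depends only on the cookbook's name and version, which is why it could be computed at content creation time and stored as the relative path. A minimal sketch, assuming (hypothetically) that name and version can be parsed from the tarball's file name; a real plugin would more likely take them from the content unit's own fields, and cookbook_publish_path() is an illustrative name, not pulp_cookbook's API.

```python
def cookbook_publish_path(file_name: str) -> str:
    """Derive the published path for a cookbook tarball.

    Assumes the hypothetical naming convention <name>-<version>.tar.gz,
    where the version itself contains no hyphen.
    """
    suffix = ".tar.gz"
    if not file_name.endswith(suffix):
        raise ValueError(f"not a cookbook tarball: {file_name!r}")
    stem = file_name[: -len(suffix)]
    # Split on the last hyphen so names like "my-cookbook" stay intact.
    name, sep, version = stem.rpartition("-")
    if not sep or not name or not version:
        raise ValueError(f"cannot parse name and version from {file_name!r}")
    # Versions are stored as directories with dots replaced by underscores.
    return f"cookbook_files/{name}/{version.replace('.', '_')}/{file_name}"
```

With "pulp-1.2.3.tar.gz" this yields cookbook_files/pulp/1_2_3/pulp-1.2.3.tar.gz, the layout described in the comment.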
Updated by bmbouter about 6 years ago
- Subject changed from Replace PublishedArtifact with a reference to a RepositoryVersion to Extend Content App to serve Artifacts from ContentArtifact.relative_path data associated w/ the repo_version associated with the publication
- Description updated
I'm convinced from these posts that there are valid use cases for PublishedArtifact and PublishedMetadata. I rewrote this ticket to leave those things alone, and instead extend the content app.
The plugin writer I'm working w/ would really like to have this soon.
Updated by bmbouter about 6 years ago
- Description updated
Based on mailing list and IRC discussion, we should make this an option that defaults to off. I've revised the ticket accordingly.
Updated by jortel@redhat.com about 6 years ago
- Status changed from NEW to ASSIGNED
- Assignee set to jortel@redhat.com
Updated by bmbouter about 6 years ago
I had not written this before but a main goal of this (for me) is to remove a step for the user! Currently you need to create a "publisher". I believe with this change they could just POST to create a publication w/ pass_through=True and the repository version and they could be done. One more call to a distributor and that content is live.
@jortel does ^ make sense to you? Can ^ be done as part of this piece of work?
Updated by jortel@redhat.com about 6 years ago
bmbouter wrote:
I had not written this before but a main goal of this (for me) is to remove a step for the user! Currently you need to create a "publisher". I believe with this change they could just POST to create a publication w/ pass_through=True and the repository version and they could be done. One more call to a distributor and that content is live.
@jortel does ^ make sense to you? Can ^ be done as part of this piece of work?
Yes. I envisioned the same thing. Will include it.
Updated by jortel@redhat.com about 6 years ago
- Status changed from ASSIGNED to POST
Updated by daviddavis about 6 years ago
- Blocks Task #4034: Use the pass_through option when generating new publications added
Added by jortel@redhat.com about 6 years ago
Revision 1c8ef717
Add support for pass-through publications. closes #4020
Updated by jortel@redhat.com about 6 years ago
- Status changed from POST to MODIFIED
- % Done changed from 0 to 100
Applied in changeset pulp|1c8ef71753b02f9ef3cda1cdf0f75615292a16a9.
Updated by bmbouter almost 5 years ago
- Status changed from MODIFIED to CLOSED - CURRENTRELEASE