Story #3570
As a plugin writer, I have declarative PendingVersion
Status: closed
Description
Problem Statements
1. To use Changesets, plugin writers have to compute the additions and removals parameters themselves, using differencing code they maintain.
2. Changesets require plugin writers to download metadata up front to compute additions and removals, which puts metadata downloading outside of the Changeset's stream processing.
Solution
Introduce a new object called PendingVersion which consumes a plugin-provided generator named 'pending_content' that emits PendingContent objects with associated PendingArtifacts. The generator performs the metadata downloading and yields 100% of the PendingContent and PendingArtifact objects it discovers in that metadata. This differs from Changesets, which expected only the PendingContent objects the user believed needed to be downloaded; PendingVersion accepts all PendingContent that the user wants present in the new repository version.
Handling Removals
The PendingVersion should get an additional parameter called sync_mode which accepts either 'mirror' or 'additive'. With 'additive', no content is removed. With 'mirror', any content units in the RepositoryVersion that were not emitted by 'pending_content' are removed.
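The two sync_mode behaviors can be sketched as a small dispatch over set differences of natural keys. This is an illustrative sketch, not the proposed implementation; the function name and the plain-tuple keys are assumptions:

```python
def removals_for(sync_mode, existing_keys, declared_keys):
    """Compute the natural keys to remove from the new repository version.

    'additive' never removes anything; 'mirror' removes whatever was not
    re-declared by the plugin's pending_content generator.
    """
    if sync_mode == 'additive':
        return set()
    if sync_mode == 'mirror':
        return set(existing_keys) - set(declared_keys)
    raise ValueError("unknown sync_mode: %r" % sync_mode)


# Natural keys shown here as simple (name, version) tuples for illustration.
existing = {("one", "1.0"), ("two", "1.0")}
declared = {("one", "1.0"), ("three", "2.0")}

assert removals_for('additive', existing, declared) == set()
assert removals_for('mirror', existing, declared) == {("two", "1.0")}
```

Note that in mirror mode the removals can only be finalized after 'pending_content' has been fully drained, since any unit might still be re-declared later in the stream.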
Why Declarative?
This is a declarative interface because every time the PendingVersion runs, the plugin code "declares" the content to be contained in a new repository version.
Separation of Concerns
PendingVersion will effectively produce the additions and removals information and pass that data via stream processing to the Changesets. This allows each object to provide a clear piece of functionality for easy maintenance.
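The layering described above could look roughly like the sketch below: the declared-content stream is drained once, additions are yielded as they pass through, and the declared keys are remembered so removals can be computed afterwards. All names besides the PendingVersion/Changeset roles they stand in for are hypothetical:

```python
def split_stream(declared, existing_keys, key=lambda c: c):
    """Drain a declared-content stream once, yielding additions (content
    not already present) while collecting every declared key so that
    mirror-mode removals can be computed after the stream is exhausted.

    This stands in for the work PendingVersion would do before handing
    additions/removals to the lower-level Changeset machinery.
    """
    seen = set()

    def additions():
        for content in declared:
            k = key(content)
            seen.add(k)
            if k not in existing_keys:
                yield content

    return additions, seen


# Content units shown as plain strings for illustration.
existing = {"one", "two"}
additions, seen = split_stream(iter(["one", "three"]), existing)
added = list(additions())    # "one" is already present, so only "three" is added
removals = existing - seen   # "two" was not re-declared (mirror mode)
```

The key point is that nothing here requires the whole stream in memory: additions flow through one at a time, and only the (small) set of keys is retained.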
Thing example
The Changeset docs give a Thing example. This is the entire example, rewritten to use the new interface.
>>>
>>> from pulpcore.plugin.changeset import (
>>>     PendingArtifact, PendingContent, PendingVersion)
>>> from pulpcore.plugin.models import Artifact, Content, Remote, Repository
>>>
>>>
>>> class Thing(Content):
>>>     pass
>>>
>>>
>>> class ThingRemote(Remote):
>>>
>>>     def _pending_content(self):
>>>         metadata = ...  # <fetched metadata>
>>>         for item in metadata:
>>>             # Create a concrete model instance for Thing content
>>>             # using the (thing) metadata.
>>>             thing = Thing(...)
>>>             # Create a pending content instance using the model along with a
>>>             # pending artifact for each file associated with the content.
>>>             content = PendingContent(
>>>                 thing,
>>>                 artifacts={
>>>                     PendingArtifact(
>>>                         Artifact(size=1024, sha256='...'), 'http://..', 'one.img'),
>>>                     PendingArtifact(
>>>                         Artifact(size=4888, sha256='...'), 'http://..', 'two.img'),
>>>                 })
>>>             yield content
>>>
>>>     def sync(self):
>>>         pending_version = PendingVersion(
>>>             self, pending_content=self._pending_content, sync_mode='mirror')
>>>         repo_version = pending_version.create()
>>>
Technical Design
The PendingVersion should drain from 'pending_content' similar to how the Changesets drain from 'additions' today. It uses the Content's natural key to determine whether the content is already downloaded, and does the same for all associated PendingArtifact objects.
Note that the PendingVersion will also create the repository version for the user. The create() call is blocking. Before it is called, the new repository version does not exist. While it runs, the repo_version exists but is not complete. After the call returns, the repo_version is finished, with complete=True.
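The "already downloaded" check can be sketched as a membership test on the content's natural key. The key fields below are illustrative (real content types define their own key fields), and dicts stand in for model instances:

```python
def natural_key(content):
    # For a Thing, the natural key might be just its type, name, and
    # version; each content type defines its own uniqueness fields.
    return (content["type"], content["name"], content["version"])


def filter_additions(pending_items, existing_index):
    """Yield only pending items whose natural key is not already present,
    mirroring how PendingVersion would drain 'pending_content' and skip
    content the repository already has."""
    for item in pending_items:
        if natural_key(item) not in existing_index:
            yield item


existing = {("thing", "one", "1.0")}
pending = [
    {"type": "thing", "name": "one", "version": "1.0"},    # already present
    {"type": "thing", "name": "three", "version": "2.0"},  # new
]
new_items = list(filter_additions(pending, existing))      # only "three" remains
```

The same idea applies to PendingArtifact objects, keyed by their digests instead of content natural keys.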
Wait, what happened to SizedIterable and BatchIterator?
Since this is fully stream processing, we can't know early on how many items there will be. It's common for stream processing systems not to know how many items they are processing, especially when reading the data is part of the stream processing itself. That is the case here.
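Without a SizedIterable, progress can only be reported as a running count (or a total that is continually adjusted). A minimal sketch of count-only reporting over a stream of unknown length; the dict-based report is a stand-in, not the real Pulp progress API:

```python
def report_progress(stream, report):
    """Wrap a stream of unknown length, bumping a counter as items pass
    through. No total is set, because the stream's length is unknown
    until it has been fully drained."""
    for item in stream:
        report["done"] = report.get("done", 0) + 1
        yield item


report = {}
items = list(report_progress(iter(["a", "b", "c"]), report))
# Once the stream is drained, report["done"] reflects the item count.
```

This is the trade-off jortel raises below: either progress shows only a completed count, or the TOTAL must be adjusted as new items are discovered.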
Those changes are being made as part of task https://pulp.plan.io/issues/3582
Updated by jortel@redhat.com over 6 years ago
This has potential.
This seems like a higher level abstraction that is different than the ChangeSet. Perhaps something named PendingVersion since the plugin writer is really (declaratively) defining the set of content to be included in the new repository version rather than a set of changes to be applied. The PendingVersion could/should use the ChangeSet internally. I should consider leaving the ChangeSet as-is (potentially minus the SizedIterator) in the PluginAPI as well in support of our layered API approach.
The SizedIterator is only needed to report progress. Without this information, I don't see how progress can be reported (by anything) without constantly adjusting the TOTAL reported. Are you proposing this and will it be acceptable?
Updated by bmbouter over 6 years ago
- Subject changed from As a plugin writer, I have declarative Changesets to As a plugin writer, I have declarative PendingVersion
- Description updated (diff)
After @jortel and I talked we thought it would be good to add a new object on top which does the differencing and drives the changeset additions and removals. This pairs with the work in https://pulp.plan.io/issues/3582#note-1
This new object would be the go-to interface for plugin writers and would also create the repo version as well as mark it as completed.
Updated by jortel@redhat.com over 6 years ago
The changes in the description look good!
Suggestions:
This is a change from before where plugin writers would only yield PendingContent objects to Changesets
- if they believed they needed to be downloaded.
+ if they want the content to be added to the repository.
This is a declarative interface because every time the PendingVersion runs the plugin code
- "declares" the remote repository state.
+ "declares" the content to be contained in a new repository version.
I think PendingVersion.create() would be more appropriate than apply().
Updated by bmbouter over 6 years ago
- Description updated (diff)
Those are good changes. Thanks @jortel. I incorporated them into the ticket also.
Updated by gmbnomis over 6 years ago
I am not sure that I understand:
2. Changesets require plugin writers to download metadata to compute additions and removals, which puts metadata downloading outside of the Changeset's stream processing.
In the example, the metadata is fetched before entering the generator part. Or does "fetch metadata" mean that this can be an iterator that dynamically gets metadata? (which is probably tricky to get right)
Updated by bmbouter over 6 years ago
- Status changed from NEW to ASSIGNED
I'm building a prototype on top of Jortel's changes https://github.com/pulp/pulp/pull/3464/. I'll post a link to it when it's available.
gmbnomis, yes, "fetch metadata" means that an iterator the plugin writer provides would be fetching the metadata as part of the stream. In the example, that is _pending_content().
Updated by bmbouter over 6 years ago
- Assignee set to bmbouter
Assigning so others know someone is working on a prototype.
Updated by bmbouter over 6 years ago
- Status changed from ASSIGNED to NEW
- Assignee deleted (bmbouter)
I'm no longer actively working on this issue.
Updated by bmbouter over 6 years ago
- Status changed from NEW to CLOSED - WONTFIX
With DeclarativeVersion available, I don't plan to pursue this work. I'm closing as WONTFIX along with the associated PR.
Updated by bmbouter over 4 years ago
- Tags Performance added
- Tags deleted (Sync Performance)