Story #3570
Updated by bmbouter over 6 years ago
h3. Problem Statements

1. To use the Changesets, plugin writers have to compute the additions and removals parameters using differencing code they maintain.
2. Changesets require plugin writers to download metadata to compute additions and removals, which puts metadata downloading outside of the Changeset's stream processing.

h3. Solution

Introduce a new object called PendingVersion which uses a generator named 'pending_content'. The generator emits PendingContent objects, just like plugin writers already do for the 'additions' parameter to Changesets. The plugin-provided generator will perform the metadata downloading and yield 100% of the PendingContent and PendingArtifact objects it discovers in that metadata. This is a change from before, where plugin writers would only yield PendingContent objects to Changesets if they believed they needed to be downloaded.

h5. Handling Removals

The PendingVersion should get an additional parameter called sync_mode which accepts either 'mirror' or 'additive'. If using 'additive', no content units are removed. If using 'mirror', any content units in the RepositoryVersion that were not emitted by 'pending_content' should be removed.

h5. Why Declarative?

This is a declarative interface because every time the PendingVersion runs, the plugin code "declares" the remote repository state.

h5. Separation of Concerns

PendingVersion will effectively produce the additions and removals information and pass that data via stream processing to the Changesets. This allows each object to provide a clear piece of functionality for easy maintenance.

h3. Thing example

The "Changeset docs":https://docs.pulpproject.org/en/3.0/nightly/plugins/plugin-api/changeset.html give a Thing example. This is the entire example, rewritten to use the new interface.
<pre><code class="python">
>>>
>>> from pulpcore.plugin.changeset import (
>>>     ChangeSet, PendingArtifact, PendingContent)
>>> from pulpcore.plugin.models import Artifact, Content, Remote, Repository
>>> # PendingVersion is the object proposed by this story; its import
>>> # location is not specified here.
>>>
>>>
>>> class Thing(Content):
>>>     pass
>>>
>>>
>>> class ThingRemote(Remote):
>>>
>>>     def _pending_content(self):
>>>         metadata = ...  # <fetched metadata>
>>>         for item in metadata:
>>>             # Create a concrete model instance for Thing content
>>>             # using the (thing) metadata.
>>>             thing = Thing(...)
>>>             # Create a pending content instance using the model along with a
>>>             # pending artifact for each file associated with the content.
>>>             content = PendingContent(
>>>                 thing,
>>>                 artifacts={
>>>                     PendingArtifact(
>>>                         Artifact(size=1024, sha256='...'), 'http://..', 'one.img'),
>>>                     PendingArtifact(
>>>                         Artifact(size=4888, sha256='...'), 'http://..', 'two.img'),
>>>                 })
>>>             yield content
>>>
>>>     def sync(self):
>>>         pending_version = PendingVersion(
>>>             self, pending_content=self._pending_content, sync_mode='mirror')
>>>         # apply() blocks until the new repository version is complete.
>>>         repo_version = pending_version.apply()
>>>
</code></pre>

h3. Technical Design

The PendingVersion should drain from 'pending_content' similar to how the Changesets drain from 'additions' today. It uses the Content's natural key to determine if it is already downloaded, and does the same for all associated PendingArtifact objects.

Note that the PendingVersion will also create the repository version for the user. The apply() call is blocking. Before it is called, the new repository version is not created. While it runs, the repo_version is created but not complete. After the call returns the repo_version, it is finished with complete=True.

h3. Wait what happened to SizedIterable and BatchIterator?

Since it's fully stream processing we can't know early on how many items there will be.
It's common for stream processing systems to not know how many items they are processing, especially when reading the data is part of the stream processing itself. That is the case here. Those changes are being made as part of task https://pulp.plan.io/issues/3582. SizedIterable and BatchIterator may still be usable inside the Changeset, but they don't have a use in the Plugin API with this change.
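
To make the sync_mode rule from "Handling Removals" and the unknown-length point above concrete, here is a minimal sketch in plain Python. It is not pulpcore code and not the proposed implementation: @plan_sync@, @pending_keys@ and the string keys are hypothetical stand-ins for PendingVersion, the plugin's 'pending_content' generator and Content natural keys. It only shows that a generator of unknown size can be drained once while additions and removals are worked out along the way.

```python
from typing import Hashable, Iterable, Iterator, Set, Tuple


def pending_keys() -> Iterator[str]:
    """Hypothetical stand-in for a plugin's 'pending_content' generator.

    It yields natural keys one at a time, as if they were being parsed
    out of remote metadata while that metadata is downloaded.
    """
    for key in ("one.img", "two.img", "three.img"):
        yield key


def plan_sync(
    existing: Set[Hashable],
    pending: Iterable[Hashable],
    sync_mode: str = "mirror",
) -> Tuple[Set[Hashable], Set[Hashable], int]:
    """Drain `pending` exactly once and return (additions, removals, count).

    The total number of items is only known after the stream is exhausted.
    """
    if sync_mode not in ("mirror", "additive"):
        raise ValueError(f"unknown sync_mode: {sync_mode!r}")

    seen: Set[Hashable] = set()
    additions: Set[Hashable] = set()
    count = 0
    for key in pending:
        count += 1
        seen.add(key)
        if key not in existing:
            # Not in the current version yet, so it has to be added.
            additions.add(key)

    # 'mirror' removes whatever the remote no longer declares;
    # 'additive' never removes anything.
    removals = existing - seen if sync_mode == "mirror" else set()
    return additions, removals, count


existing_keys = {"one.img", "stale.img"}
additions, removals, count = plan_sync(existing_keys, pending_keys(), "mirror")
```

With 'additive' the same call would return the same additions and an empty removals set. Note that the sketch has to remember every emitted key in order to compute mirror removals, which is one reason the count and the removals are only available at the end of the stream.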