Story #3570

Updated by bmbouter about 6 years ago

h3. Problem Statements 

 1. To use Changesets, plugin writers have to compute the additions and removals parameters themselves, using differencing code they maintain.

 2. Changesets require plugin writers to download metadata to compute additions and removals, which puts metadata downloading outside of the Changeset's stream processing.


h3. Solution

Introduce a new object called PendingVersion which accepts a plugin-provided generator named 'pending_content' that emits PendingContent objects with their associated PendingArtifacts. The generator performs the metadata downloading and yields 100% of the PendingContent and PendingArtifact objects it discovers in that metadata. This differs from Changesets, which expected only the PendingContent objects the user thought needed to be downloaded; PendingVersion accepts all of the PendingContent that the user wants present in the new repository version.

h5. Handling Removals

The PendingVersion should get an additional parameter called sync_mode which accepts either 'mirror' or 'additive'. With 'additive', no content is removed. With 'mirror', any content units in the previous RepositoryVersion that were not emitted by 'pending_content' are removed.
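The sync_mode behavior could be sketched as follows. This is an illustrative sketch, not the real PendingVersion internals; compute_removals and its arguments are hypothetical names standing in for database-backed natural-key lookups.

```python
def compute_removals(existing_keys, emitted_keys, sync_mode):
    """Return the natural keys to remove from the new repository version."""
    if sync_mode == 'additive':
        # 'additive' never removes anything.
        return set()
    if sync_mode == 'mirror':
        # 'mirror' removes whatever the old version contained that the
        # 'pending_content' generator did not emit.
        return set(existing_keys) - set(emitted_keys)
    raise ValueError("sync_mode must be 'mirror' or 'additive'")
```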

h5. Why Declarative?

This is a declarative interface because every time the PendingVersion runs, the plugin code "declares" the content to be contained in a new repository version.

h5. Separation of Concerns

PendingVersion will effectively produce the additions and removals information and pass that data via stream processing to the Changesets. This allows each object to provide a clear piece of functionality for easy maintenance.
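The streaming hand-off can be pictured as a generator that lazily filters out already-present content while recording everything it saw, so removals can be computed afterwards. This is a hedged illustration only; the function and argument names are hypothetical, and real content would be compared by its natural key rather than by value.

```python
def additions(pending_content, existing_keys, seen_keys):
    """Lazily yield only the content not already in the repository,
    while recording every emitted item in seen_keys for later
    removal computation (the 'mirror' case)."""
    for content in pending_content():
        seen_keys.add(content)
        if content not in existing_keys:
            yield content
```

Because this is a generator, the Changeset can start downloading the first new unit before the plugin has finished reading its metadata.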


h3. Thing example

The "Changeset docs":https://docs.pulpproject.org/en/3.0/nightly/plugins/plugin-api/changeset.html give a Thing example. Below is that entire example, rewritten to use the new interface.

 <pre><code class="python"> 
 >>> 
 >>> from pulpcore.plugin.changeset import ( 
 >>>       ChangeSet, PendingArtifact, PendingContent, PendingVersion) 
 >>> from pulpcore.plugin.models import Artifact, Content, Remote, Repository 
 >>> 
 >>> 
 >>> class Thing(Content): 
 >>>       pass 
 >>> 
 >>> 
 >>> class ThingRemote(Remote): 
 >>> 
 >>>       def _pending_content(self): 
 >>>           metadata = ...  # <fetched metadata> 
 >>>           for item in metadata: 
 >>>               # Create a concrete model instance for Thing content 
 >>>               # using the (thing) metadata. 
 >>>               thing = Thing(...) 
 >>>               # Create a pending content instance using the model along with a 
 >>>               # pending artifact for each file associated with the content. 
 >>>               content = PendingContent( 
 >>>                   thing, 
 >>>                   artifacts={ 
 >>>                       PendingArtifact( 
 >>>                           Artifact(size=1024, sha256='...'), 'http://..', 'one.img'), 
 >>>                       PendingArtifact( 
 >>>                           Artifact(size=4888, sha256='...'), 'http://..', 'two.img'), 
 >>>                   }) 
 >>>               yield content 
 >>> 
 >>>       def sync(self): 
 >>>           pending_version = PendingVersion(self, pending_content=self._pending_content, sync_mode='mirror') 
 >>>           repo_version = pending_version.create() 
 >>> 
 </code></pre> 


h3. Technical Design

The PendingVersion should drain from 'pending_content' similar to how the Changesets drain from 'additions' today. It uses each Content's natural key to determine whether that content is already downloaded, and does the same for all associated PendingArtifact objects.
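The draining step might look like the sketch below. All names here are hypothetical stand-ins: `pending_stream` yields (natural key, content, artifact digests) tuples, and the two sets stand in for database queries against existing Content and Artifact rows.

```python
def drain(pending_stream, existing_content_keys, existing_artifact_digests):
    """Yield (content, artifacts_to_download) pairs, skipping anything
    whose natural key or digest shows it is already downloaded."""
    for key, content, artifact_digests in pending_stream:
        if key in existing_content_keys:
            # Content already present; neither it nor its artifacts
            # need any work.
            continue
        to_download = [d for d in artifact_digests
                       if d not in existing_artifact_digests]
        yield content, to_download
```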

Note that the PendingVersion also creates the repository version for the user. The create() call is blocking: before it is called, the new repository version does not exist; while it runs, the repo_version exists but is not complete; after the call returns, the repo_version is finished with complete=True.
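That lifecycle can be shown with a minimal stand-in (these Fake* classes are illustrative only, not Pulp models): the version object exists only once create() begins, and complete is set before create() returns.

```python
class FakeRepositoryVersion:
    def __init__(self):
        # A freshly created version starts out incomplete.
        self.complete = False


class FakePendingVersion:
    def create(self):
        version = FakeRepositoryVersion()  # version now exists, incomplete
        # ... drain pending_content, download artifacts, apply removals ...
        version.complete = True            # finalized before returning
        return version
```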

h3. Wait, what happened to SizedIterable and BatchIterator?

Since this is fully stream processing, we can't know early on how many items there will be. It's common for stream-processing systems not to know how many items they are processing, especially when reading the data is part of the stream processing itself. That is the case here.
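For intuition, an unsized stream can still be processed in fixed-size batches without ever knowing the total count up front; this generic sketch (not Pulp code) shows the idea behind dropping the "sized" requirement.

```python
from itertools import islice


def batches(iterable, size):
    """Yield lists of up to `size` items from an iterable whose total
    length is unknown until it is exhausted."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch
```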

Those changes are being made as part of task https://pulp.plan.io/issues/3582
