Story #3570

Updated by bmbouter over 6 years ago

h3. Problem Statements 

 1. To use the Changeset, plugin writers have to compute the additions and removals parameters themselves, using differencing code that they maintain. 

 2. The Changeset requires plugin writers to download metadata before they can compute additions and removals, which puts metadata downloading outside of the Changeset's stream processing. 


 h3. Solution 

 Replace the additions and removals parameters of the Changeset() constructor with a generator named 'pending_content', which emits PendingContent objects just as plugin writers already do for the 'additions' parameter. The plugin-provided generator performs the metadata downloading and yields a PendingContent object, together with its PendingArtifact objects, for every content unit it discovers in that metadata. This is a change from before, when plugin writers would only yield PendingContent objects that they believed needed to be downloaded. 

 h5. Handling Removals 

 The Changeset should get an additional parameter called sync_mode, which accepts either 'mirror' or 'additive'. With 'additive', no content units are removed. With 'mirror', any content units in the RepositoryVersion that were not emitted by 'pending_content' are removed. 
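
 The difference between the two modes can be sketched as follows. This is only an illustration under stated assumptions: the function name @compute_removals@ is hypothetical, and content units are represented by plain hashable natural keys rather than real model instances. It is not the Changeset's actual implementation.

```python
def compute_removals(existing_keys, emitted_keys, sync_mode):
    """Return the natural keys that would be removed from the repository.

    existing_keys: keys of content units already in the RepositoryVersion.
    emitted_keys:  keys of every unit emitted by 'pending_content'.
    sync_mode:     'mirror' or 'additive' (hypothetical stand-in for the
                   proposed Changeset parameter).
    """
    if sync_mode == 'additive':
        # Additive syncs never remove anything.
        return set()
    if sync_mode == 'mirror':
        # Mirror syncs remove whatever the remote no longer declares.
        return set(existing_keys) - set(emitted_keys)
    raise ValueError("sync_mode must be 'mirror' or 'additive'")
```

 For example, with existing keys a, b, c and emitted keys b, c, d, 'mirror' would remove only a, while 'additive' would remove nothing. 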

 h5. Why Declarative? 

 This is a declarative interface because, every time the Changeset runs, the plugin code "declares" the remote repository state and leaves it to the Changeset to work out how to get the local repository there. 


 h3. Thing example 

 The "Changeset docs":https://docs.pulpproject.org/en/3.0/nightly/plugins/plugin-api/changeset.html give a Thing example. This is the entire example, rewritten to use the new interface. 

 <pre><code class="python"> 
 >>> 
 >>> from pulpcore.plugin.changeset import ( 
 >>>       ChangeSet, PendingArtifact, PendingContent) 
 >>> from pulpcore.plugin.models import Artifact, Content, Remote, Repository 
 >>> 
 >>> 
 >>> class Thing(Content): 
 >>>       pass 
 >>> 
 >>> 
 >>> class ThingRemote(Remote): 
 >>> 
 >>>       def _pending_content(self): 
 >>>           metadata = []  # placeholder for the <fetched metadata> 
 >>>           for item in metadata: 
 >>>               # Create a concrete model instance for Thing content 
 >>>               # using the (thing) metadata. 
 >>>               thing = Thing(...) 
 >>>               # Create a pending content instance using the model along with a 
 >>>               # pending artifact for each file associated with the content. 
 >>>               content = PendingContent( 
 >>>                   thing, 
 >>>                   artifacts={ 
 >>>                       PendingArtifact( 
 >>>                           Artifact(size=1024, sha256='...'), 'http://..', 'one.img'), 
 >>>                       PendingArtifact( 
 >>>                           Artifact(size=4888, sha256='...'), 'http://..', 'two.img'), 
 >>>                   }) 
 >>>               yield content 
 >>> 
 >>>       def sync(self): 
 >>>           changeset = ChangeSet(self, pending_content=self._pending_content, sync_mode='mirror') 
 >>>           changeset.apply_and_drain() 
 >>> 
 </code></pre> 


 h3. Technical Design 

 The Changeset should drain 'pending_content' just as it drains 'additions' today. It uses each Content's natural key to determine whether that unit is already downloaded, and it makes the same check for every associated PendingArtifact object. 
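
 A minimal sketch of that draining step, assuming each emitted item can be reduced to a hashable natural key. The names @drain@, @known_keys@ and @natural_key@ are hypothetical and chosen only for illustration. The real Changeset would query the database rather than an in-memory set.

```python
def drain(pending_content, known_keys, natural_key):
    """Consume the generator once and split out the units that are new.

    pending_content: iterable of emitted items (stand-ins for PendingContent).
    known_keys:      set of natural keys for units that already exist locally.
    natural_key:     callable mapping an item to its hashable natural key.

    Returns (additions, seen), where 'additions' lists the items whose key
    was not already known and 'seen' holds every key that was emitted.
    """
    additions = []
    seen = set()
    for item in pending_content:
        key = natural_key(item)
        seen.add(key)
        if key not in known_keys:
            # Only units that are not already present need downloading.
            additions.append(item)
    return additions, seen
```

 The set of seen keys is what a 'mirror' sync would later compare against the existing repository content to decide removals. 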

 h3. Wait, what happened to SizedIterable and BatchIterator? 

 Because the Changeset now works entirely by stream processing, it cannot know early on how many items there will be. It is common for stream processing systems not to know how many items they will process, especially when reading the data is part of the stream processing itself, as it is here. 

 SizedIterable and BatchIterator may still be usable inside the Changeset, but they don't have a use in the Plugin API with this change.
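
 Batching does not require knowing the total size up front. The sketch below shows one way to batch a generator of unknown length using only the standard library. The function name @batches@ is hypothetical, and this is not the existing BatchIterator implementation.

```python
from itertools import islice


def batches(iterable, size):
    """Yield lists of up to 'size' items from an iterable of unknown length."""
    iterator = iter(iterable)
    while True:
        # islice pulls at most 'size' items without needing len().
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch
```

 Each batch is produced as soon as enough items have been emitted, so the final batch may be shorter than the others. 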
