Project

Profile

Help

Task #3051

Updated by mhrivnak over 6 years ago

This solves the same problem as issue #2987, but it does enforcement at the data model level. It is more complex than the string-parsing proposed in #2987, but enforcing the schema in the database, instead of in python, is a big advantage. 

 I call this the “Jerry Springer” model. You’ll see why... 

 I am not recommending this solution over the one proposed in #2987, but it is worth considering. At this early stage of Pulp 3 development, it could make sense to proceed with the quicker solution. At a minimum, this approach could be used as a future replacement for the implementation of #2987. 

 This solution approaches the issue like a filesystem. To manage the integrity of a filesystem tree, you have to independently represent and track each node in the tree.  

 We will track each segment of a path as a node and store the full tree in the database. This lets us identify for sure which nodes are leaf nodes. Then the challenge is to enforce that only leaf nodes can have a distribution. 

 Consider for the moment that we’ll track each node’s “path” and “parent”. For example “a/b/c/”, and a reference to the node for path “a/b/”. 

 We can accomplish enforcement with two constraints: 
 * A node’s path must be unique. 
 * A node cannot be a parent and have a distribution. This is another way of expressing “only leaf nodes can have a distribution”. 

 Of course the challenge with the second constraint is in figuring out if a given node is a parent (Now you’re understanding the name… [0]). On a filesystem, a directory stores references to its children. But in a relational DB, it’s much more desirable to have a FK on the child that references its parent. Given that, is there anything we can do on a node, without querying other nodes, to test whether it is a parent? Yes! 

 We can add an extra field to the model that is a unique ID, in addition to the PK, and use that as the “to_field” on the ForeignKey field. Let’s call it “willing_parent_id” to signify that a node is willing to be a parent if the field is populated. If that field is null, we are guaranteed that there are no children. That’s half the battle. 

 The second half is enforcing that either “willing_parent_id” or “distribution” must be null. We can do that at the DB level using the “constraints” feature of our relational database. Django doesn’t fully expose that, but we can use the technique described here: 

 https://www.fusionbox.com/blog/detail/custom-database-constraints-in-django/594/ 

 The constraint would be approximately: 

 <pre><code class="sql"> 
 ALTER TABLE distributed_path ADD CONSTRAINT has_distribution_or_children CHECK (distribution_id IS NULL OR willing_parent_id IS NULL) 
 </code></pre> 

 The model would look approximately like this (completely untested) code: 

 <pre><code class="python"> 
 class DistributedPath(models.Model): DistributedPath(Model): 
	 path = models.CharField(unique=True) 
	 willing_parent_id = models.UUIDField(unique=True, null=True, default=uuid.uuid4) 
	 parent = models.ForeignKey('self', null=True, to_field='willing_parent_id', on_delete=models.PROTECT) to_field='willing_parent_id') 
	 distribution = models.ForeignKey(Distribution, null=True) 
 </code></pre> 

 [0] For those not familiar, Jerry Springer is famous in the US for his trashy TV show that's well-known for revealing paternity test results on air.

Back