Task #3051

closed

Prevent Distribution base_path overlap in the data model

Added by mhrivnak over 6 years ago. Updated almost 5 years ago.

Status: CLOSED - DUPLICATE
Priority: Normal
Assignee: -
Category: -
% Done: 0%
Groomed: No
Sprint Candidate: No

Description

This solves the same problem as issue #2987, but it does enforcement at the data model level. It is more complex than the string parsing proposed in #2987, but enforcing the schema in the database, instead of in Python, is a big advantage.

I call this the “Jerry Springer” model. You’ll see why...

I am not recommending this solution over the one proposed in #2987, but it is worth considering. At this early stage of Pulp 3 development, it could make sense to proceed with the quicker solution. At a minimum, this approach could be used as a future replacement for the implementation of #2987.

This solution approaches the issue like a filesystem. To manage the integrity of a filesystem tree, you have to independently represent and track each node in the tree.

We will track each segment of a path as a node and store the full tree in the database. This lets us identify for sure which nodes are leaf nodes. Then the challenge is to enforce that only leaf nodes can have a distribution.

Consider for the moment that we’ll track each node’s “path” and “parent”. For example, the node for “a/b/c/” stores that path plus a reference to the node for path “a/b/”.
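
To make the tree concrete, here is a minimal sketch of how a single base path expands into the nodes that would need to exist. The helper name path_nodes is hypothetical and not part of Pulp; it only illustrates the path/parent pairing described above.

```python
def path_nodes(base_path):
    """Return (path, parent_path) pairs for every node implied by base_path.

    Hypothetical helper for illustration: each path segment becomes one node,
    and each node points at the node one level up (None for a top-level node).
    """
    segments = [s for s in base_path.split("/") if s]
    nodes = []
    parent = None
    for i in range(len(segments)):
        path = "/".join(segments[: i + 1]) + "/"
        nodes.append((path, parent))
        parent = path
    return nodes


print(path_nodes("a/b/c/"))
# [('a/', None), ('a/b/', 'a/'), ('a/b/c/', 'a/b/')]
```

Only the last pair, the leaf, would be eligible to carry a distribution.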

We can accomplish enforcement with two constraints:

  • A node’s path must be unique.
  • A node cannot be a parent and have a distribution. This is another way of expressing “only leaf nodes can have a distribution”.

Of course the challenge with the second constraint is in figuring out if a given node is a parent (Now you’re understanding the name… [0]). On a filesystem, a directory stores references to its children. But in a relational DB, it’s much more desirable to have a FK on the child that references its parent. Given that, is there anything we can do on a node, without querying other nodes, to test whether it is a parent? Yes!

We can add an extra field to the model that is a unique ID, in addition to the PK, and use that as the “to_field” on the ForeignKey field. Let’s call it “willing_parent_id” to signify that a node is willing to be a parent if the field is populated. If that field is null, we are guaranteed that there are no children. That’s half the battle.
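
The following sketch tries the idea out with SQLite from the Python standard library instead of Django and PostgreSQL, so the table layout and the run helper are assumptions made for illustration. It shows that a foreign key pointing at a unique, nullable “willing_parent_id” column stops a node from nulling that column while children still reference it.

```python
import sqlite3
import uuid

# isolation_level=None puts the connection in autocommit mode, so each
# statement stands alone and a failed one does not undo earlier inserts.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.execute(
    """
    CREATE TABLE distributed_path (
        id INTEGER PRIMARY KEY,
        path TEXT NOT NULL UNIQUE,
        willing_parent_id TEXT UNIQUE,
        parent_id TEXT REFERENCES distributed_path (willing_parent_id)
    )
    """
)


def run(sql, params=()):
    """Execute one statement; return True if the database accepted it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


parent_key = str(uuid.uuid4())
run("INSERT INTO distributed_path (path, willing_parent_id) VALUES ('a/', ?)", (parent_key,))
run("INSERT INTO distributed_path (path, parent_id) VALUES ('a/b/', ?)", (parent_key,))

# "a/" has a child, so it cannot stop being a willing parent.
blocked = not run("UPDATE distributed_path SET willing_parent_id = NULL WHERE path = 'a/'")

# "a/b/" has a null willing_parent_id, so nothing can name it as a parent:
# any value put in parent_id must match an existing willing_parent_id.
orphan_rejected = not run(
    "INSERT INTO distributed_path (path, parent_id) VALUES ('a/b/c/', ?)",
    (str(uuid.uuid4()),),
)

# Once the child is gone, "a/" may give up its willing_parent_id.
run("DELETE FROM distributed_path WHERE path = 'a/b/'")
released = run("UPDATE distributed_path SET willing_parent_id = NULL WHERE path = 'a/'")

print(blocked, orphan_rejected, released)
```

All three flags should come out True: the database itself refuses the update while a child exists, without any query over other nodes.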

The second half is enforcing that at least one of “willing_parent_id” and “distribution” is null. We can do that at the DB level with a CHECK constraint in our relational database. Django doesn’t fully expose that, but we can use the technique described here:

https://www.fusionbox.com/blog/detail/custom-database-constraints-in-django/594/

The constraint would be approximately:

ALTER TABLE distributed_path ADD CONSTRAINT has_distribution_or_children CHECK (distribution_id IS NULL OR willing_parent_id IS NULL)
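
As a quick way to see the CHECK in action, the sketch below uses SQLite via Python's sqlite3 module. This is an assumption for illustration only: SQLite cannot add a constraint with ALTER TABLE, so the same expression is declared inline in CREATE TABLE, and the column set and the insert_ok helper are hypothetical.

```python
import sqlite3

# Autocommit mode keeps each INSERT independent of the others.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    """
    CREATE TABLE distributed_path (
        path TEXT NOT NULL UNIQUE,
        willing_parent_id TEXT UNIQUE,
        distribution_id INTEGER,
        CONSTRAINT has_distribution_or_children
            CHECK (distribution_id IS NULL OR willing_parent_id IS NULL)
    )
    """
)


def insert_ok(path, willing_parent_id, distribution_id):
    """Try to insert one node; return True if the database accepted it."""
    try:
        conn.execute(
            "INSERT INTO distributed_path VALUES (?, ?, ?)",
            (path, willing_parent_id, distribution_id),
        )
        return True
    except sqlite3.IntegrityError:
        return False


results = {
    "parent only": insert_ok("a/", "key-1", None),
    "leaf with distribution": insert_ok("a/b/", None, 1),
    "parent with distribution": insert_ok("c/", "key-2", 2),
    "duplicate path": insert_ok("a/", None, None),
}
print(results)
```

The first two inserts are accepted; the third violates the CHECK and the fourth violates the unique path, so both are rejected by the database.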

The model would look approximately like this (completely untested) code:

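import uuid
from django.db import models

class DistributedPath(models.Model):
    path = models.CharField(max_length=255, unique=True)  # max_length is an assumed value
    willing_parent_id = models.UUIDField(unique=True, null=True, default=uuid.uuid4)
    parent = models.ForeignKey('self', null=True, to_field='willing_parent_id', on_delete=models.PROTECT)
    # on_delete is required since Django 2.0; CASCADE is an assumed choice.
    distribution = models.ForeignKey(Distribution, null=True, on_delete=models.CASCADE)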

[0] For those not familiar, Jerry Springer is famous in the US for his trashy TV show that's well-known for revealing paternity test results on air.


Related issues

Related to Pulp - Task #2987: The Distribution ViewSet needs to prevent base_path overlap. (CLOSED - CURRENTRELEASE, daviddavis)
Related to Pulp - Task #3448: Warn users that distributor base paths should not overlap (CLOSED - CURRENTRELEASE, daviddavis)
Related to Pulp - Issue #3449: Requesting content from nested distribution path results in 500 error (CLOSED - WONTFIX)
Related to Pulp - Story #3044: Distribution create/update operations should be asynchronous (CLOSED - CURRENTRELEASE, CodeHeeler)
