Project

Profile

Help

Issue #3541

Core should not add/remove content to a repository or create a repository_version without plugin input

Added by amacdona@redhat.com over 1 year ago. Updated about 2 hours ago.

Status:
NEW
Priority:
Normal
Assignee:
-
Category:
-
Sprint/Milestone:
Start date:
Due date:
Severity:
2. Medium
Version:
Platform Release:
Blocks Release:
OS:
Backwards Incompatible:
No
Triaged:
Yes
Groomed:
Yes
Sprint Candidate:
No
Tags:
QA Contact:
Complexity:
Smash Test:
Verified:
No
Verification Required:
No
Sprint:
Sprint 60

Description

General Problem:

Some plugins have validation requirements for the membership of content units in a repository. The validation required depends on the plugin and the content type. Currently add/remove is done with a POST to v3/repositories/1234/versions/. This does not involve the plugins at all, so there is no opportunity for plugins to create these validations.

Example Problem with Docker:

ManifestLists cannot be added unless the Manifests they refer to are also in the repository.
L1 is ManifestList, and it refers to (contains) 2 Manifests, M1, M2. The repository is considered corrupt if it contains L1, but not M1 and M2. M1 and M2 could be in the repository without L1.

Solutions:

Add a plugin opportunity to Validate

Allow a plugin to define a validate_repo_version

class DockerManifestList(Content):

    ...

    @staticmethod
    def validate_repo_version(qs_content, repo_version):
        """ 
        Validates a repository version's DockerManifestList units.

        This method specifically ensures that the repo has all the right DockerManifests that correspond
        with this DockerManifestList

        Args:
            qs_content (django.Queryset): A Queryset of the detail instances for this content type
            repo_version (RepositoryVersion): The Repository version to validate

        Raises:
            django.core.exceptions.ValidationError: if the repository is invalid
        """ 
        # check all of the validation related DockerManifestList content
        if has_a_validation_problem:
            raise ValidationException('Repo foo has ManifestList Y but is missing Manifest Z')

Add a plugin add_content and remove_content

class ModuleMD(Content):

    ...

    @staticmethod
    def add_content(qa_add_content, repo_version):
        """ 
        Handles any extra work needed for ModuleMD content being added to a RepositoryVersion.

        Args:
            qs_content (django.Queryset): A Queryset of the detail instances for this content type
            repo_version (RepositoryVersion): The Repository version the content is being added to

        """ 
        # check on if I need to also add some other content also

    @staticmethod
    def remove_content(qa_add_content, repo_version):
        """ 
        Handles any extra work needed for ModuleMD content being removed from a RepositoryVersion.

        Args:
            qs_content (django.Queryset): A Queryset of the detail instances for this content type
            repo_version (RepositoryVersion): The Repository version the content is being removed from

        """ 
        # check if I should also remove corresponding RPMs
        #  qa_rpms_to_remove = ...
        if qa_rpms_to _remove
            repo_version.remove_content(qa_rpms_to_remove, call_plugin_hook=False)

It's possible we'll have a recursive call

RepositoryVersion.remove_content -> ModuleMD.remove_content -> RepositoryVersion.remove_content so the 'if' statement will prevent it by not causing an additional call to ModuleMD.remove_content if there are 0 items to remove.

The "existing add_content and remove_content": calls will receive an additional parameter call_plugin_hook which defaults to True. If True, the plugin callback will occur, otherwise it won't.


Related issues

Related to Pulp - Task #3522: Plan Master/Detail Tasks CLOSED - WONTFIX Actions
Related to Python Support - Issue #3604: Bugs around creating content units MODIFIED Actions
Duplicated by Pulp - Issue #4740: Pulpcore doesn't provide a way to guarantee uniqueness in repo versions CLOSED - DUPLICATE Actions

History

#1 Updated by amacdona@redhat.com over 1 year ago

  • Related to Task #3522: Plan Master/Detail Tasks added

#2 Updated by milan over 1 year ago

  • Description updated (diff)

#3 Updated by bmbouter over 1 year ago

Option 1 has a problem with it that there are some plugin types that don't need validation and now they would have to provide extra views, which for them just means more work.

Because of ^, I believe option 2 is ideal for the most number of plugin writers and users. Here's one description of what that solution could look like in practice:

class DockerManifestList(ContentUnit):

    ...

    @staticmethod
    def validate_content_in_repo_version(repo_version):
        """ 
        Validates a repository version's DockerManifestList units.

        This method specifically ensures that the repo has all the right DockerManifests that correspond
        with this DockerManifestList

        Args:
            repo_version (RepositoryVersion): The Repository version to validate

        Raises:
            django.core.exceptions.ValidationError: if the repository is invalid
        """ 
        # check all of the validation related DockerManifestList content
        if has_a_validation_problem:
            raise ValidationException('Repo foo has ManifestList Y but is missing Manifest Z')

What we can do then is add a validate method to RepositoryVersion itself that will call the validate method for each unique type of unit in the repo. If any of them fail allow the exception to be raised. This would look like:

class RepositoryVersion(Model):

    ...

    def validate(self):
        for type_name in all_unique_content_unit_types_in_self:
            type_cls = get_the_class_from_the_name(type_name) # Gets a reference to the class
            type_cls.validate_content_in_repo_version()

This could be called by anyone who wants to validate a RepositoryVersion so that is a new added value. One place specifically where that would be useful would be on the __exit__(...) context manager on RepositoryVerison.

The net effect would be that:

1. plugin writers can provide validation
2. Core can use that validation to validate repository versions
3. This provides multiple layers of validation so that plugin writer code can use these facilities to create value also.

#4 Updated by amacdona@redhat.com over 1 year ago

  • Triaged changed from No to Yes

#5 Updated by amacdona@redhat.com over 1 year ago

  • Related to Issue #3604: Bugs around creating content units added

#6 Updated by daviddavis 6 months ago

  • Sprint/Milestone set to 3.0

#7 Updated by daviddavis 6 months ago

  • Blocks Task #4028: Prevent duplicate files in repositories added

#8 Updated by daviddavis 6 months ago

  • Duplicated by Issue #4740: Pulpcore doesn't provide a way to guarantee uniqueness in repo versions added

#9 Updated by bmbouter 6 months ago

  • Tags deleted (Pulp 3)

#10 Updated by bmbouter about 1 month ago

  • Sprint/Milestone changed from 3.0 to 71

#11 Updated by bmbouter about 1 month ago

  • Sprint/Milestone changed from 71 to 3.0

#12 Updated by daviddavis 14 days ago

  • Blocks deleted (Task #4028: Prevent duplicate files in repositories)

#13 Updated by bmbouter 8 days ago

  • Subject changed from Core should not add/remove content to a repository without plugin input to Core should not add/remove content to a repository or create a repository_version without plugin input
  • Description updated (diff)

#14 Updated by bmbouter 8 days ago

  • Description updated (diff)

#15 Updated by bmbouter 8 days ago

With SomeContent.add_content and SomeContent.remove_content modifying the RepositoryVersion possible, each call to RepositoryVersion.add_content and RepositoryVersion.remove_content plugin writers need to be aware of this.

I looked at the Stages API, and I believe it's safe because the last thing it does is call RepositoryVersion.add_content for the batch here and then requests a new batch. Similarly in RepositoryVersion.remove_content is called here before a new batch is requested.

#16 Updated by bmbouter 8 days ago

  • Description updated (diff)

#17 Updated by bmbouter 8 days ago

  • Description updated (diff)

#18 Updated by bmbouter 7 days ago

  • Sprint set to Sprint 60

#19 Updated by ttereshc 6 days ago

  • Groomed changed from No to Yes

#20 Updated by gmbnomis 3 days ago

I am very sorry to be late again with my comment, but, looking at the current proposal, it feels very much like a distributed implementation of a plugin specific add/remove endpoint (the former option 1). What I mean by this: If we implement creating a repository version using a plugin endpoint, we would customize the same operations (add, remove, and validation), but not per content type, but per plugin.

I think that doing it on plugin level would allow us to follow a more "holistic" approach:

1. The validation as proposed can only validate content types that are present, but not content type that belong to a plugin but are not there. If, for example, a plugin has a content type that needs to be in a (non-empty) repo version exactly once (say a file with external signatures), it would be harder to implement a check for the "content is missing" case. Basically, all other content types would have to check the presence of the potentially missing content type.

2. Since a plugin knows exactly which content types it manages, it is easier to take a "holistic" more optimized approach and
  • only check content types that are necessary
  • reuse database query results
  • query multiple content types at once (e.g. using prefetching)

3. If add_content/remove_content implement additional functionality, it is easier to support additional parameters if the endpoint belongs to the plugin. For example, whether to remove RPMs based on ModuleMD removal could be a flag to the add/remove POST (I have no idea whether this makes sense, but you hopefully get the idea why additional plugin specific parameters might be sensible)

4. I don't know if repositories containing content of multiple plugins is still a thing, But if so, validation as proposed would always re-validate the entire content across all content types even if the added/removed content only belongs to a single plugin.

Of course, there are drawbacks to a plugin specific endpoint:
1. As already said, plugins will need to do additional work even in the "no customization required" case. (But this is the case for other entities already)
2. One cannot add/remove content from multiple plugins in a single add/remove operation (and thus, in a single repo version).

I reckon that 2. is the more important point here. If we regard this as a key feature, the "hooks per content type" approach may be better suited. Otherwise, the per plugin approach looks more flexible and easier to optimize in the future (IMHO).

Regardless on how we decide, I have two suggestions to the approach described in the issue above:

  • I have the strong feeling that putting add_content, remove_content and validate_repo_version hooks into Content is overloading the Content classes. Content should be pretty basic and should not have intricate knowledge of RepoVersion. I think we need to find a way to put this functionality somewhere else and link it to the respective Content class/type.
  • The recursive call_plugin_hook parameter feels error prone. What if we split up the methods? In this case, we would have additional "raw" content add/remove methods in RepositoryVersion that call no hooks and are to be used by content add/remove hooks.

#21 Updated by bmbouter 1 day ago

gmbnomis wrote:

I am very sorry to be late again with my comment, but, looking at the current proposal, it feels very much like a distributed implementation of a plugin specific add/remove endpoint (the former option 1). What I mean by this: If we implement creating a repository version using a plugin endpoint, we would customize the same operations (add, remove, and validation), but not per content type, but per plugin.

I think that doing it on plugin level would allow us to follow a more "holistic" approach:

1. The validation as proposed can only validate content types that are present, but not content type that belong to a plugin but are not there. If, for example, a plugin has a content type that needs to be in a (non-empty) repo version exactly once (say a file with external signatures), it would be harder to implement a check for the "content is missing" case. Basically, all other content types would have to check the presence of the potentially missing content type.

I agree doing it at the plugin level could be better than the content-by-content level. The case that a whole type is missing motivates that well.

2. Since a plugin knows exactly which content types it manages, it is easier to take a "holistic" more optimized approach and
  • only check content types that are necessary
  • reuse database query results
  • query multiple content types at once (e.g. using prefetching)

Agreed. I like the "plugin check" replacing the "type-by-type check"

3. If add_content/remove_content implement additional functionality, it is easier to support additional parameters if the endpoint belongs to the plugin. For example, whether to remove RPMs based on ModuleMD removal could be a flag to the add/remove POST (I have no idea whether this makes sense, but you hopefully get the idea why additional plugin specific parameters might be sensible)

I agree. For the copy case this makes perfect sense. The challenge is that these checks also need to be at the add_content and remove_content API itself so cases like sync can automatically handle them as well. Having the additional call to the "holistic" plugin check in RepositoryVersion.add_content and RepositoryVersion.remove_content is easy enough, but passing extra parameters in there isn't. What do you think about the need for this to be included for all RepositoryVersion manipulations, not just those driven by plugin endpoints?

4. I don't know if repositories containing content of multiple plugins is still a thing, But if so, validation as proposed would always re-validate the entire content across all content types even if the added/removed content only belongs to a single plugin.

I believe @dalley has suggested we adopt master/detail repositories and add the validation there, and treat them similar to how other plugins require a plugin writer to make it's Detail instance.

Of course, there are drawbacks to a plugin specific endpoint:
1. As already said, plugins will need to do additional work even in the "no customization required" case. (But this is the case for other entities already)

I'm still concerned about this not being inclusive of all RepositoryVersion manipulations.

2. One cannot add/remove content from multiple plugins in a single add/remove operation (and thus, in a single repo version).

I reckon that 2. is the more important point here. If we regard this as a key feature, the "hooks per content type" approach may be better suited. Otherwise, the per plugin approach looks more flexible and easier to optimize in the future (IMHO).

Regardless on how we decide, I have two suggestions to the approach described in the issue above:

  • I have the strong feeling that putting add_content, remove_content and validate_repo_version hooks into Content is overloading the Content classes. Content should be pretty basic and should not have intricate knowledge of RepoVersion. I think we need to find a way to put this functionality somewhere else and link it to the respective Content class/type.

+1 to this approach, let's adjust the plan

  • The recursive call_plugin_hook parameter feels error prone. What if we split up the methods? In this case, we would have additional "raw" content add/remove methods in RepositoryVersion that call no hooks and are to be used by content add/remove hooks.

I didn't quiet follow this last part.

#22 Updated by daviddavis 1 day ago

The recursive call_plugin_hook parameter feels error prone. What if we split up the methods? In this case, we would have additional "raw" content add/remove methods in RepositoryVersion that call no hooks and are to be used by content add/remove hooks.

I agree with this. I think that we're overloading add_content and remove_content and I also worry about these functions being called recursively. Where would the content add/remove hooks be called from though?

#23 Updated by bmbouter about 18 hours ago

We have several plugin writer conventions about which python modules they will place various objects. We could do one more of those and have the plugin writer write a MyPluginRepositoryVersionHandler. This would be in plugin_name.app.handlers. It could have the add_content and remove_content, and validate interfaces there.

Core would provide pulpcore.plugin.handlers.RepositoryVersionHandler and plugin writers would subclass it. The core defines:

class RepositoryVersionHandler:

    @clsmethod
    def add_content(cls, qs_add_content, repo_version):
        pass

    @clsmethod
    def remove_content(cls, qs_remove_content, repo_version):
        pass

    @clsmethod
    def validate(cls, repo_version):
        pass

    def repo_key_implementation(qs_add_content, repo_version, model_class, repo_unit_key):
        # the implementation of repo_key
        # not enabled by default
        # remove the content form repo_version when repo_unit_key has "another one" already in it.

Then a subclass would be:

class FileRepositoryVersionHandler(RepositoryVersionHandler):

    @clsmethod
    def add_content(cls, qs_add_content, repo_version):
        cls.repo_key_implementation(qs_add_content, repo_version, FileContent, 'relative_path')
        # This effectively "enables" the repo_key functionality here instead of on the Content object.

Core would load this and know that for any Content type provided by this plugin, this is the Handler to call. The RepositoryVersion.add_content() and RepositoryVersion.remove_content() would call these handles for any content in the queryset it is adding/removing. There would be 1 call per plugin so it could do it holistically.

Providing a RepositoryVersionHandler is optional. Plugins that don't need it won't have to specify it.

#24 Updated by daviddavis about 2 hours ago

This sounds great. +1 from me.

I wonder if the add_content could handle https://pulp.plan.io/issues/5567#note-8 by default too?

#25 Updated by gmbnomis about 2 hours ago

bmbouter wrote:

We have several plugin writer conventions about which python modules they will place various objects. We could do one more of those and have the plugin writer write a MyPluginRepositoryVersionHandler. This would be in plugin_name.app.handlers. It could have the add_content and remove_content, and validate interfaces there.

I like the idea of a handler. However, this scheme does not allow to customize the handler using parameters (as discussed above).

Perhaps we could make the handler instance-based instead of class-based. In this scheme, the plugin would prepare a handler instance (if it needs one) and pass it to RepositoryVersion as a parameter.

For sync, the handler instance would be an additional optional parameter to DeclarativeVersion.
For the add/remove endpoint (which would be specific to the plugin), the creation of the handler needs to happen in the respective task (which could pass parameters from the add/remove call to the handler)

One important use case for the handler parameters could be to have different behavior of the handler during sync. (For example auto-generating/auto-removing some additional content may only be desired on the add/remove endpoint, but not for sync)

Core would load this and know that for any Content type provided by this plugin, this is the Handler to call. The RepositoryVersion.add_content() and RepositoryVersion.remove_content() would call these handles for any content in the queryset it is adding/removing. There would be 1 call per plugin so it could do it holistically.

The instance based handler is definitely more work for the plugin writer (because it is not automatically picked up like your proposed scheme and we have more boilerplate code for the endpoint). OTOH, I think the instance based approach is more flexible.

Please register to edit this issue

Also available in: Atom PDF