Project

Profile

Help

Refactor #3074

closed

Story #3209: As a user, I have Repository Versions

Move sync logic from models to tasks

Added by mhrivnak about 7 years ago. Updated almost 5 years ago.

Status:
CLOSED - CURRENTRELEASE
Priority:
Normal
Category:
-
Sprint/Milestone:
Start date:
Due date:
% Done:

100%

Estimated time:
Platform Release:
Groomed:
Yes
Sprint Candidate:
Yes
Tags:
Sprint:
Sprint 31
Quarter:

Description

Most of the work of this move will be handled by plugins, so refactor tasks will be created to implement this change for each plugin. The scope of this refactor should be limited to Importers/sync for now, so Publish is deliberately left out for now.

The pulpcore changes are:

  1. Remove models.Importer.sync
  2. Remove the viewsets.Importer.sync detail route
  3. remove the sync task from tasks/importer.py
  4. add UserFacingTask to the plugin API
  5. add RepositoryVersion model to plugin API
  6. ensure plugin API docs include new objects

Plugin refactor issues should:

  1. Create a celery task (probably called `sync`) in pulp_myplugin.app.tasks.py
  2. define a POST detail route (probably called sync) on MyTypeImporterViewset that deploys the celery task with appropriate locks. This viewset should accept `repository` as a POST body parameter.

Related issues

Related to File Support - Refactor #3260: Move sync logic from Importer model to a taskCLOSED - CURRENTRELEASEamacdona@redhat.com

Actions
Has duplicate Pulp - Task #3224: Update plugin API with repository version changesCLOSED - DUPLICATE

Actions
Blocks Python Support - Task #3294: Update Python plugin to sync with versioned repositoriesMODIFIEDamacdona@redhat.com

Actions
Actions #1

Updated by jortel@redhat.com about 7 years ago

Class names need to be unique within any given object model. However, the django models are part of the plugin Data Model and the functional objects are part of the plugin Domain Model. Two completely different object models. The models.FileImporter(Model) IsA Model is completely different than say plugin.FileImporter which is not. They are easily distinguishable by namespace and class hierarchy. What's making the naming hard is the perception that the natural and appropriate name for both is not available. The alternatives, such as FileImporterController, don't seem appropriate because they are contrived just for differentiation.

I'm not dismissing the concern that having 2 classes with same name within a plugin project might be confusing for some developers. My perspective comes from working on other projects with much larger object models than pulp where this was not uncommon or confusing. I don't anticipate convincing anyone and don't want to start a debate. Just wanted to make the point.

Actions #2

Updated by mhrivnak about 7 years ago

wrote:

Class names need to be unique within any given object model. However, the django models are part of the plugin Data Model and the functional objects are part of the plugin Domain Model. Two completely different object models. The models.FileImporter(Model) IsA Model is completely different than say plugin.FileImporter which is not. They are easily distinguishable by namespace and class hierarchy. What's making the naming hard is the perception that the natural and appropriate name for both is not available. The alternatives, such as FileImporterController, don't seem appropriate because they are contrived just for differentiation.

I basically think the same and would be happy for both to have the same class name, easily differentiated by namespaces. That could at least be a fine starting point.

Actions #3

Updated by bmbouter about 7 years ago

I think using the same name will be confusing. +1 to the controller language. I can't think of anything better. What about these names?

pulpcore.plugin.models.Importer      <---   the importer as the user thinks about it. Optionally subclassed by the plugin writer.
pulpcore.plugin.controllers.ImporterController    <--- the new object the plugin writer would subclass and provide.

pulpcore.plugin.models.Publisher      <---   the publisher as the user think about it. Optionally subclassed by the plugin writer.
pulpcore.plugin.controllers.PublisherController    <--- the new object the plugin writer would subclass and provide.

How will platform discover the subclass of ImporterController and PublisherController? Any plugin can provide multiple importers or distributors and there could be several plugins. How will we know the right one specifically?

Actions #4

Updated by mhrivnak almost 7 years ago

Thinking about naming today, and where the convention in django came from of objects that interact with a specific model being named $Modelname$Behavior, I wondered if the genesis of that convention might be ModelForms.

https://docs.djangoproject.com/en/1.11/topics/forms/modelforms/#modelform

The example is clear. myapp.models.Article is an Article. ArticleForm is a form that happens to work with Articles. The name is reasonable even if it turns into myapp.forms.ArticleForm, because the darned thing is a form.

@jortel raised the interesting question: should then the model be named ArticleModel? We concluded no, because the model's main purpose in life is to represent an Article. It is a data structure more than an actor, and it's reasonable to name it after just the thing it represents.

So in short, I talked myself into being more ok with a name like ImporterController.

Actions #5

Updated by jortel@redhat.com almost 7 years ago

I wish I could start this comment by offering a better suggestion but, ATM, I don't have one. In any case, Controllers are an anti-pattern in OO and this naming really should be avoided. In most cases, difficulty in selecting a class name suggests that something isn't modeled correctly. Although in this case, the problem is a naming conflict between the pulp (plugin) domain model and the data model. The problem is that both names seem appropriate within their respective models. To resolve this in a way that ignores the different namespaces requires us to change the name of something that already has an appropriate name to something less appropriate.

Looking at the our data model, each class (table) is ,as all objects should be, a noun within the pulp problem domain. For example: User, Repository, Publication, Distribution. Notice nothing (except Importer and Publisher) ends in er. The operations in our django model objects pertain only to persistence which is why these classes are part of the data model. However, our plugin contributed objects deviate from this by ending in er. This naming implies that these objects have operations that are outside the scope of persistence. I believe our naming conflict problem should be resolved by finding a more appropriate name for the Importer and Publisher data model classes. But I'm not sure how. Keep in mind that names that include Config are not appropriate in a data model. Any ideas?

Actions #6

Updated by amacdona@redhat.com almost 7 years ago

Importers

Looking at Importers as they are today, I the fields fall into 2 distinct categories:

  • external source config
    • feed_url
    • ssl stuff
    • proxy stuff
    • username/password
  • sync settings
    • validate
    • sync_mode
    • download_policy

The external source fields would map nicely to a model called ExternalSource, (or ExternalRepo if you prefer) which would be subclassed by plugins. ex. RPMExternalSource, PythonExternalSource. A user would create an ExternalSource to model each repo that they want to sync. examples: (these would be REST calls, but it is simpler to explain with python)

    RPMExternalSource(name=epel7, feed_url=https://dl.fedoraproject.org/pub/epel/7/x86_64/) 
    PythonExternalSource(name=pypi, feed_url=pypi.python.org)

In my view, an external source would not be owned by a Pulp repo. Multiple repos could have the same external source.

The other half of the Importer (the Synchronizer) is just a behavior that the user doesn't need to know about at all. It is implied by the "type" of the ExternalSource. I'm a lot less concerned about what we call this because it is plugin-writer facing, but not user-facing.

The sync settings need not be modeled at all. A call to sync would look like this:

Python

POST to /v3/repoversions/

post_body = {
    "external_source": "pypi",
    "download_policy": "immediate",
    "sync_mode": "additive",
    "validate": True
    "plugin_settings": {
        "sync_types": ["sdist", "whl"]
        "download_dependencies": True
    }
}

RPM

POST to /v3/repoversions/

post_body = {
    "external_source": "epel7",
    "download_policy": "immediate",
    "sync_mode": "additive",
    "validate": True
    "plugin_settings": {
        "download_dependencies": True
    }
}

If we wanted to, we could store any of the sync options, including plugin_settings, in a `sync_settings = GenericKeyValueField()` on the Repository model. If we want to continue to think of a Repository as a content stream from a source, this would let the users see what a Repository is.

Publishers

I am not sure we even need a user-facing Publisher anymore. The Publisher could keep its name, but become a plugin-writer interface. Plugin specific settings can used in the POST body, like for sync. If we wanted to keep a userfacing object for persistent settings, PublishSettings seems right to me.

Actions #7

Updated by jortel@redhat.com almost 7 years ago

The proposal of storing plugin (importer) settings on the repository has a cardinality problem. Pulp2 does not support multiple importers but I'm convinced that pulp3 needs to. Also, the example does not provide the platform with information to select a specific plugin (importer). As for the publisher, I believe we need to keep publisher information in the DB as well.

Actions #8

Updated by amacdona@redhat.com almost 7 years ago

Proposal: Plugins define sync and publish tasks. This make sense because:

  1. The plugin API needs to expose the tasking system for complex copy
  2. sync and publish are purely plugin activities, and are entirely run in a task

Currently, pulpcore defines an Importer and Publisher master ViewSet, which is the parent of the plugin's detail viewset, FileImporter and FilePublisher. The pulpcore ImporterViewSet and PublisherViewSet define sync and publish endpoints, which deploy a generic sync task that calls sync on the detail model. Instead, the detail ViewSets should deploy their own sync tasks.

PulpFile Example Detail ViewSet:

class FileImporterViewSet(pulpcore.plugin.ImporterViewSet):
    @detail_route
    def sync(self):
        async_result = pulp_file.tasks.sync.apply_async_with_reservation(...)
        return OperationPostponed(async_result)

This will give the plugin writers the ability to add extra POST body arguments and perform synchronous validation. The task that is deployed is also written by the plugin writer, so they have absolute control over the entire life of the request. This is what is already expected for features like complex copy.

The sync logic on the plugins would be moved from the Detail Importer to a task, but would stay relatively the same.


@shared_task(base=UserFacingTask)
def sync(importer_pk):
    importer = models.FileImporter.objects.get(importer_pk) # Note, no cast() necessary the plugin knows which type
    synchronizer = Synchronizer(old_version, new_version).run()

Currently, working directory is set by the generic task, but this could be moved to a base task, WorkingDirectoryTask which would be part of the plugin api.

This change would result in the following REST API changes:

POST /v3/importers/file/123456789/sync/ repository=repository_href
POST /v3/publishers/file/123456789/publish/ repository_version=repository_version
Actions #9

Updated by amacdona@redhat.com almost 7 years ago

Another benefit of having the plugin viewsets call plugin tasks is that the pattern is consistent and expandable. The plugin's sync route calls the plugin sync task, the plugin's rich copy route calls a rich copy task etc. This gives plugin writers one pattern to follow, which would be reinforced by the app's architecture:

pulp_plugin/
    app/
        models.py
        viewsets.py
        tasks.py

This also restricts RepositoryVersion to be a "manual" addition and removal of specified ContentUnits, which is consistent with all of our other DRF ViewSets, which are only CRUD without complex custom actions. Anything that is not CRUD would be implemented separately at action endpoints, keeping concerns separate.

Actions #10

Updated by amacdona@redhat.com almost 7 years ago

  • Subject changed from plugins define import and publish behavior separate from Model objects to Move sync logic from models to tasks
  • Description updated (diff)
  • Parent issue set to #3209

Updated based on repository versions discussion. Since these behaviors will mostly be defined by plugins, this task is to remove the pulpcore implementations that will be replaced by plugin code.

Actions #11

Updated by amacdona@redhat.com almost 7 years ago

  • Related to Refactor #3260: Move sync logic from Importer model to a task added
Actions #12

Updated by amacdona@redhat.com almost 7 years ago

  • Description updated (diff)

The reasoning behind this change is documented below:

The problem was described previously in an email:

While working with the FileImporter, I discovered an interesting problem
with circular imports.

An Importer is both a django Model and has a "sync" method, as shown here:

https://github.com/pulp/pulp_file/blob/d684f75e/pulp_file/app/models.py#L69

We decided to design it this way for simplicity, so that a plugin writer
would have a minimum of objects to subclass. It also reduces the number of
objects for the core to discover. The Importer would have methods for sync,
copy, upload, etc. This is very similar to Pulp 2. And just like in Pulp 2,
we know that plugins will often want to implement much of the sync logic in
one or more separate python modules.

When I tried to move the FileImporter's sync code to a separate module, I
quickly found a circular import problem. The FileImporter wants to
instantiate and use a Synchronizer. The Synchronizer of course needs to use
the data model, both the FileContent model and the corresponding Key
namedtuple. A plugin with more types would have yet more models.

Thinking of the idea that circular imports often indicate a design problem,
there is a single and specific design choice throwing a wrench in the
gears. We have a single class acting as both model and controller: the
FileImporter. It both defines part of the data model, AND it is expected to
perform an action (sync) that makes use of several different models
(classic controller territory, and definitely not in-line with OOP).

We made the choice deliberately to put the plugin API methods (sync,
upload, etc.) on the Importer to give the plugin writer a simple
experience. But here we see that is already causing difficulty for plugin
writing.

Perhaps we should re-consider having the Importer both act as model and
controller. I'm using those terms loosely, not to imply that we should have
a strict MVC pattern or anything like that, but just to describe what type
of role any given portion of code is expected to play.

In-person discussion concluded that we should move behaviors related to sync, upload, copy, remove, publish, or anything similar off of the Model objects and into a separate area of each plugin's code. This is a very similar pattern to Pulp 2.

Two new objects will be required: one to hold the importer-related behavior, and another to hold publisher-related behavior. There was not agreement on what a good name would be, but there was a broad desire to keep the models named "Importer" and "Publisher".

The plugin API should provide base classes for each of the two new models. When a plugin implements a subclass, that subclass must be discoverable by the core. For example when a sync task is run on an Importer as known through the REST API, the core must find this other plugin-provided object that implements behaviors that go with the database representation of an Importer.

The new importer-related object could be named ImporterController, pulp_<sometype>.app.controllers.Importer, ImportDoer, or some other name.

The new objects may have references to the models to which they correspond. The models must not have references to the new objects.

Actions #13

Updated by amacdona@redhat.com almost 7 years ago

  • Description updated (diff)
Actions #14

Updated by dkliban@redhat.com almost 7 years ago

  • Sprint/Milestone set to 53
  • Groomed changed from No to Yes
  • Sprint Candidate changed from No to Yes
Actions #15

Updated by amacdona@redhat.com almost 7 years ago

  • Description updated (diff)
Actions #16

Updated by amacdona@redhat.com almost 7 years ago

  • Has duplicate Task #3220: Expand plugin writer's guide to include RepositoryVersion creating actions (including sync) added
Actions #17

Updated by amacdona@redhat.com almost 7 years ago

  • Description updated (diff)
Actions #18

Updated by amacdona@redhat.com almost 7 years ago

  • Description updated (diff)
Actions #19

Updated by amacdona@redhat.com almost 7 years ago

  • Description updated (diff)
Actions #20

Updated by amacdona@redhat.com almost 7 years ago

  • Has duplicate deleted (Task #3220: Expand plugin writer's guide to include RepositoryVersion creating actions (including sync))
Actions #21

Updated by amacdona@redhat.com almost 7 years ago

  • Has duplicate Task #3224: Update plugin API with repository version changes added
Actions #22

Updated by amacdona@redhat.com almost 7 years ago

  • Status changed from NEW to ASSIGNED
Actions #23

Updated by amacdona@redhat.com almost 7 years ago

  • Assignee set to amacdona@redhat.com
Actions #24

Updated by amacdona@redhat.com almost 7 years ago

  • Status changed from ASSIGNED to POST

Added by amacdona@redhat.com almost 7 years ago

Revision d1abe36f | View on GitHub

Move sync dispatch to plugin API

Modifies the plugin API, moving responsibility for sync from parent classes to the plugins.

closes #3074 https://pulp.plan.io/issues/3074

Added by amacdona@redhat.com almost 7 years ago

Revision d1abe36f | View on GitHub

Move sync dispatch to plugin API

Modifies the plugin API, moving responsibility for sync from parent classes to the plugins.

closes #3074 https://pulp.plan.io/issues/3074

Actions #25

Updated by amacdona@redhat.com almost 7 years ago

  • Status changed from POST to MODIFIED
Actions #27

Updated by amacdona@redhat.com almost 7 years ago

  • Blocks Task #3294: Update Python plugin to sync with versioned repositories added

Added by dalley over 6 years ago

Revision 131ae9ac | View on GitHub

Clean up plugin API Importer docstrings

re: #3074 https://pulp.plan.io/issues/3074

Added by dalley over 6 years ago

Revision 131ae9ac | View on GitHub

Clean up plugin API Importer docstrings

re: #3074 https://pulp.plan.io/issues/3074

Actions #29

Updated by bmbouter over 6 years ago

  • Sprint set to Sprint 31
Actions #30

Updated by bmbouter over 6 years ago

  • Sprint/Milestone deleted (53)
Actions #31

Updated by daviddavis over 5 years ago

  • Sprint/Milestone set to 3.0.0
Actions #32

Updated by bmbouter over 5 years ago

  • Tags deleted (Pulp 3)
Actions #33

Updated by bmbouter almost 5 years ago

  • Status changed from MODIFIED to CLOSED - CURRENTRELEASE

Also available in: Atom PDF