Story #2873

As an authenticated user, I can associate a Content unit with a Repository

Added by dkliban@redhat.com over 2 years ago. Updated 6 months ago.

Status: CLOSED - CURRENTRELEASE
Priority: Normal
Assignee: -
Category: -
Sprint/Milestone:
Start date:
Due date:
% Done: 0%
Platform Release:
Blocks Release:
Backwards Incompatible: No
Groomed: No
Sprint Candidate: No
Tags:
QA Contact:
Complexity:
Smash Test:
Verified: No
Verification Required: No
Sprint:

Description

For an API user to associate a Content unit with a Repository, Pulp 3 needs to have the following:

- a viewset for the RepositoryContent model that can handle CRD operations
- a serializer for the viewset that returns all serialized fields of the RepositoryContent model
- an API endpoint at /api/v3/repositories/<repo-id>/content/
- a POST request to the /api/v3/repositories/<repo-id>/content/ endpoint creates Content. The body of the request contains a dictionary like the following:

{
"contents": ["< content >", "< content2 >"]
}

RepositoryContent is created for each content specified in the payload. The repo-id is derived from the URL.

- A GET request to /api/v3/repositories/<repo-id>/content/<content-id> returns the serialized RepositoryContent.

- A DELETE request to /api/v3/repositories/<repo-id>/content/<content-id> removes the RepositoryContent from the database.

- PUT/PATCH requests to /api/v3/content/units/<content unit id>/ raise an error (405 Method Not Allowed).

RepositoryContent is a through model [0].

[0] https://docs.djangoproject.com/en/1.9/topics/db/models/#intermediary-manytomany
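For readers unfamiliar with through models, the shape of the relationship can be sketched in plain Python. This is an illustrative stand-in, not Pulp's actual models; it mirrors what a Django `ManyToManyField(through=...)` intermediary table records, and the hypothetical `add_content` method mimics the POST payload `{"contents": [...]}` creating one through-row per unit:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RepositoryContent:
    # Intermediary ("through") row linking one repository to one content unit.
    repository_id: str
    content_id: str


class Repository:
    def __init__(self, repo_id):
        self.repo_id = repo_id
        self._relations = set()  # set of RepositoryContent rows

    def add_content(self, content_ids):
        # One through-row is created per content unit in the payload;
        # the repo id comes from the URL, not the request body.
        for cid in content_ids:
            self._relations.add(RepositoryContent(self.repo_id, cid))

    def remove_content(self, content_id):
        # DELETE removes the association only, not the content itself.
        self._relations.discard(RepositoryContent(self.repo_id, content_id))

    def content_ids(self):
        return sorted(r.content_id for r in self._relations)


repo = Repository("file_repo")
repo.add_content(["content1", "content2"])
print(repo.content_ids())  # ['content1', 'content2']
```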


Related issues

Related to Pulp - Story #2672: As a user, I can associate content with repositories using the content API CLOSED - WONTFIX Actions

History

#1 Updated by dkliban@redhat.com over 2 years ago

  • Tracker changed from Issue to Story
  • % Done set to 0

#2 Updated by bizhang over 2 years ago

  • Tags Pulp 3 Plugin Writer Alpha added

#3 Updated by mhrivnak over 2 years ago

Is this safe to do outside of a task that locks the repository?

#4 Updated by amacdona@redhat.com over 2 years ago

I'd prefer that we keep add/remove in a task. We can currently promise plugin writers that the relevant data will not change while their code is running, and that will make plugin writing much simpler.

If we were to make this synchronous, we would have to add some constraints on the plugin writers.

#5 Updated by bmbouter over 2 years ago

+1 to associations and unassociations occurring in tasks.

#6 Updated by mhrivnak over 2 years ago

Great. I figured you were assuming that, but I wanted to make sure.

#7 Updated by mhrivnak over 2 years ago

There's a mixing of types that we need to straighten out. If you POST a representation of something to a collection endpoint, you should expect that a GET request to that same collection endpoint will return representations of the same general object type. So we cannot POST an entire Content unit, but then GET and find a list of RepositoryContent representations.

If we have an endpoint "/api/v3/repositories/<name>/content/", it should return representations of Content.

Similarly, if we have "/api/v3/repositories/<name>/content/<content-id>", that should return a representation of the Content being identified.

If we instead or additionally want an endpoint to return relationships only, we would benefit from a more specific name. Wordy as it might get, "/api/v3/repositories/<name>/content_relations/" or similar would be appropriate.

To simply create a new Content unit without involving a repository, that makes sense as a POST to "/api/v3/content/".

To also add it to a repository, now we have a multi-step cross-object async operation that is awkward to force into a standard REST create operation; it makes more sense for a "controller" endpoint.

For this use case, where we want one endpoint that can both create a new Content object and associate it to a repository, we need a "controller" endpoint on either the content side or the repo side. I think the latter would work well as an upload endpoint on an Importer. That's the perfect place to say "Take this representation, create a new Content object from it, AND create a task that adds it to your repository."

POST /api/v3/repositories/<name>/importers/<type>/<name>/create_and_associate
<representation of new content>

# Response
HTTP/1.1 202 Accepted
<reference to a task>

It's tempting to split that up into two operations, though, each of which would be hugely valuable on its own.

POST /api/v3/content/
<representation of content>

# Response
HTTP/1.1 201 Created
<representation of newly-created object>

POST /api/v3/repositories/<name>/importers/<type>/<name>/add_content
<URI to a content object>
<URI to a content object>
<URI to a content object>

# Response
HTTP/1.1 202 Accepted
<reference to a task>

This pattern may actually be more convenient for the simple DIY content creation use case. When uploading multiple Content units to a repo, rather than ending up with a separate task to add each and every unit, this would enable a client to do all of the associating in one easily-trackable task that could also be atomic.
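The proposed two-step flow (synchronous 201 for content creation, asynchronous 202 plus a task for batch association) can be sketched with a stubbed API. Endpoint shapes and names here are illustrative only, taken from the proposal above, not from any shipped Pulp code:

```python
import uuid


class FakePulpAPI:
    """Stub of the proposed two-endpoint flow: create content directly,
    then add a batch of it to a repo via an importer action that spawns
    a single trackable task."""

    def __init__(self):
        self.content = {}
        self.tasks = {}
        self.repo_content = set()  # (repo, content_href) pairs

    def post_content(self, representation):
        # POST /api/v3/content/ -> 201 Created with the new object's href.
        href = f"/api/v3/content/{uuid.uuid4()}/"
        self.content[href] = representation
        return 201, href

    def post_add_content(self, repo, content_hrefs):
        # POST .../add_content -> 202 Accepted with a task reference.
        # One task associates the whole batch, so the operation is
        # trackable as a unit and could be made atomic.
        task = f"/api/v3/tasks/{uuid.uuid4()}/"
        self.tasks[task] = ("add", repo, tuple(content_hrefs))
        return 202, task

    def run_task(self, task):
        # The task, not the HTTP handler, creates the associations.
        _, repo, hrefs = self.tasks[task]
        self.repo_content.update((repo, h) for h in hrefs)


api = FakePulpAPI()
status, href = api.post_content({"path": "3.iso"})
status2, task = api.post_add_content("file_repo", [href])
api.run_task(task)
print(status, status2, len(api.repo_content))  # 201 202 1
```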

As a related point, for reasons we started discussing yesterday and will discuss more, I don't think we'll be able to do content association purely at the core level. The importer will need some involvement for cases like replacing conflicting units, gating on GPG signatures, etc.

So, what do you think of this 2-endpoint approach?

#8 Updated by bmbouter over 2 years ago

Wordy as it is, I think the suggestion of /api/v3/repositories/<name>/content_relations/ is the best, and allow the CRD operations on that to manage content unit relationships.

The controller methods are all valuable too, especially create_and_associate and bulk operations like bulk_associate_unassociate. I hope that we can use model CRUD operations and no factory methods for content (un)association for the MVP.

I was thinking more about content sync/upload and including the importer. If we did involve the importer with a special call it would be for upload not sync because the plugin writer's sync code will create all of the units and associations on its own. For upload though, we may want a complex unit type to be handled and for those cases specifically I think the importer can register a url and handler or something like that. That is important but a separate thing from this ticket because we still need a generic way to associate and unassociate content so exposing the relationships I think is good. I think the complex unit type ticket still needs to be filed.

For the features like GPG filtering we could have a hook that allows a plugin writer to perform association filtering and prevention. That may involve the importer, but it also could be a platform feature that we have them provide a direct implementation. Either way I think we can figure it out when we tackle that feature. It's a good thing to think about though.

For the conflicting units I should have more thoughts on that after we talk about how mutating content units will be handled in the next two weeks. I don't see that use case as special enough to require the importer to be involved in all content unit (un)associations.

#9 Updated by mhrivnak over 2 years ago

bmbouter wrote:

> Wordy as it is, I think the suggestion of /api/v3/repositories/<name>/content_relations/ is the best, and allow the CRD operations on that to manage content unit relationships.
>
> The controller methods are all valuable too, especially create_and_associate and bulk operations like bulk_associate_unassociate. I hope that we can use model CRUD operations and no factory methods for content (un)association for the MVP.

There are a couple of interesting aspects here. One is the idea of immutable content sets, aka repo versions. If we are to do that now or later, which either way I consider a killer feature for pulp 3, we cannot give a client/user the impression that they can add or remove relationships directly on a content set. They must initiate an operation that handles the creation of that relationship, and any other accounting that goes with it.

I'll describe the other aspect below, which is why we really need the factory methods in the MVP, even if we also allowed direct relationship manipulation.

> I was thinking more about content sync/upload and including the importer. If we did involve the importer with a special call it would be for upload not sync because the plugin writer's sync code will create all of the units and associations on its own. For upload though, we may want a complex unit type to be handled and for those cases specifically I think the importer can register a url and handler or something like that. That is important but a separate thing from this ticket because we still need a generic way to associate and unassociate content so exposing the relationships I think is good. I think the complex unit type ticket still needs to be filed.
>
> For the features like GPG filtering we could have a hook that allows a plugin writer to perform association filtering and prevention. That may involve the importer, but it also could be a platform feature that we have them provide a direct implementation. Either way I think we can figure it out when we tackle that feature. It's a good thing to think about though.
>
> For the conflicting units I should have more thoughts on that after we talk about how mutating content units will be handled in the next two weeks. I don't see that use case as special enough to require the importer to be involved in all content unit (un)associations.

I think there are enough special cases that it's not so special. Examples:
  • For some units, associating them with a repo really means creating a copy. See any Pulp 2 unit type where "repo_id" is part of the unit key, including docker tags, yum repo metadata files, yum package group, etc. The importer must be involved.
  • For some importers, they need to impose a filter, such as signature checking or collision checking.
  • Some types require deliberate replacing in case of collision. A docker tag copied in would overwrite an existing tag of the same name, for example. An rpm would replace one of the same NEVRA. An ISO (File) would replace one of the same path. A Puppet module might replace one of the same name in some cases (currently in Pulp 2 it's handled less gracefully).

As you point out, there are multiple ways we could implement that. There could be some kind of hook that the core grabs and runs, but is that really any different from how sync and publish get called? The core looks up the right object, grabs the associated code, and runs it. There's definitely an opportunity to have a default implementation that just creates a simple relationship with no other logic imposed, but that's getting into implementation details that aren't directly related to the REST API. One way or another, the Importer must be involved in importing content to a repository for many or most of our plugins.

Generally, Pulp has used the Importer as the single point of entry for content into a repository because of complexities like those listed above. We know for sure that Pulp 3 will require these abilities. Though we may not have discussed it directly, this has to be part of the MVP or else we would lose the ability to port half or more of our plugins.

Giving the user the ability to directly say "add this content to that repo" is simple and important for all the reasons we've discussed and agreed on. I don't see that it requires bypassing the Importer though. It's easily accomplished as an action on an Importer.

I'm good with allowing the user to directly create artifacts and Content itself. I'm even good with the idea of at some point allowing the user to directly create relationships without the Importer's involvement ("please sign this waiver..."), and I see value in that. Maybe we even should do that right now to enable simple cases to get off the ground quick. But unfortunately it isn't a substitute for having Importer involvement. And there's also the element of making sure we don't paint ourselves in a corner with regard to repo versions, which may need a bit more thought.

#10 Updated by amacdona@redhat.com over 2 years ago

  • Duplicates Story #2672: As a user, I can associate content with repositories using the content API added

#11 Updated by amacdona@redhat.com over 2 years ago

  • Duplicates deleted (Story #2672: As a user, I can associate content with repositories using the content API)

#12 Updated by amacdona@redhat.com over 2 years ago

  • Related to Story #2672: As a user, I can associate content with repositories using the content API added

#13 Updated by dalley over 2 years ago

I will help make sure this gets groomed by Wednesday

#14 Updated by bmbouter over 2 years ago

So if we do have factory methods, what are they? We'll need to handle adding, removing, and a mix of both add/remove ideally. What would these factory urls be?

#15 Updated by mhrivnak over 2 years ago

Great question. In Pulp 2, we have some unintuitive names, but it basically comes down to copy and remove.

Copy allows you to address the set of content in a repo, filter that set, and associate the result with the receiving repo. This is a common part of how users start a promotion workflow. They have a repo that mirrors some remote source, and then they copy content from that repo into some kind of "testing" repo whenever they are ready to consider deploying that specific content.

Remove allows you to match content within the set associated with a repo and have it removed.

A notably missing feature from Pulp 2 is the ability to associate an arbitrary piece of content without respect to any repo it might currently happen to be associated with. Once content is orphaned, there is no action a user can take to directly associate it to a repo. I think we want that ability in Pulp 3. It should be roughly the same functionality provided by copy, but just a different way of matching the content set.

All three operations require plugin involvement, so I think they are reasonable to leave on the Importer as they are in Pulp 2. That also fits the Pulp 3 approach of immutable content sets (repo versions).

POST /api/v3/repositories/<name>/importers/<type>/<name>/add_content
<URI to a content object>
<URI to a content object>
<URI to a content object>

# Response
HTTP/1.1 202 Accepted
<reference to a task>

"add_content" could be named "associate" perhaps, or something else similar.

Copy and Remove are surprisingly tricky to nail down because of the need for filtering. We could presume that a "content" endpoint will support some filtering features via query parameters, and that these endpoints could support the same query parameters.

POST /api/v3/repositories/<name>/importers/<type>/<name>/remove_content
<URI to a content object>
<URI to a content object>
<URI to a content object>

or

POST /api/v3/repositories/<name>/importers/<type>/<name>/remove_content?arch=i386

In any case, the product of the task that runs any of these operations should be a new repository version.
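The query-parameter filtering idea above (e.g. `remove_content?arch=i386`) amounts to selecting a subset of a repo's content set with a predicate before the task acts on it. A minimal sketch, with illustrative field names rather than Pulp's actual unit schema:

```python
def filter_content(units, **criteria):
    """Return units whose fields match every criterion, mimicking
    query-parameter filtering such as ?arch=i386."""
    return [u for u in units if all(u.get(k) == v for k, v in criteria.items())]


repo_content = [
    {"name": "pkg-a", "arch": "i386"},
    {"name": "pkg-b", "arch": "x86_64"},
    {"name": "pkg-c", "arch": "i386"},
]

# A remove_content?arch=i386 call would act on exactly these units:
to_remove = filter_content(repo_content, arch="i386")
print([u["name"] for u in to_remove])  # ['pkg-a', 'pkg-c']
```

The same predicate could back both copy (associate the matched set with the receiving repo) and remove (drop the matched set), which is why the two operations can share filtering semantics.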

#16 Updated by amacdona@redhat.com over 2 years ago

One alternative that should be considered before this is (re)implemented is the existing RepositoryContentViewSet:
https://github.com/pulp/pulp/blob/3.0-dev/platform/pulpcore/app/viewsets/repository.py#L198

This ViewSet can associate existing units, and disassociate. (Or add and remove, if we prefer that language.) Additionally, we can use filtering to show content for a particular repository.

Making use of the importer might be weird, but we might be able to accomplish that with filtering as well if necessary.

#17 Updated by mhrivnak over 2 years ago

Having chatted with @dkliban today, we are going to defer work on this issue until we can figure out if the RepositoryContentViewSet is functional and meets the need.

#18 Updated by dalley over 2 years ago

Minor nitpick, but since we decided (at least it seems like it was decided) to move away from language like "associate" and "disassociate" and toward language like "add/remove content to repository", we should reword this issue accordingly, and start trying to use that language from this point forward.

Email chain here:
https://www.redhat.com/archives/pulp-dev/2017-May/msg00048.html

#19 Updated by mhrivnak about 2 years ago

ttereshc volunteered to test and evaluate this.

It is implemented but likely needs to be task-ified to create the association with a lock on the repo.

#20 Updated by ttereshc about 2 years ago

Current state:

  • /api/v3/repositories/<repo_id>/content/

Only GET is available; implemented as a custom detail route on a Repository model, lists content of a particular repo.

$ http http://127.0.0.1:8000/api/v3/repositories/file_repo_copy/content/

HTTP/1.0 200 OK
Allow: GET, HEAD, OPTIONS
Content-Length: 350
Content-Type: application/json
Date: Thu, 10 Aug 2017 20:08:35 GMT
Server: WSGIServer/0.2 CPython/3.5.3
Vary: Accept, Cookie
X-Frame-Options: SAMEORIGIN

{
    "next": null,
    "previous": null,
    "results": [
        {
            "_href": "http://127.0.0.1:8000/api/v3/content/file/a102f252-e493-4056-aa14-83eb83bc4018/",
            "artifacts": {
                "3.iso": "http://127.0.0.1:8000/api/v3/artifacts/eff3d741-e81d-4fdd-a7d1-3c4e775c6591/" 
            },
            "digest": "d411a21216f2b496b6e79a9bb5aa3b70fa2315fa86afee9ca5e63991c9ae4398",
            "notes": {},
            "path": "3.iso",
            "type": "file" 
        }
    ]
}

  • /api/v3/repositorycontents/

Only GET and POST are currently available on a RepositoryContentViewset, no DELETE (destroy method should be defined explicitly, I guess because it's a viewset of a through model).
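The missing DELETE makes sense when you note that a through-row has no natural single-field lookup: it is addressed by the (repository, content) pair, so destroy has to be spelled out explicitly. An in-memory sketch of that CRD shape (illustrative names, not Pulp's actual viewset code):

```python
class RepositoryContentStore:
    """In-memory stand-in for CRD operations on a through model.
    Each row links one repository to one content unit, so destroy
    must address the pair explicitly rather than a single id."""

    def __init__(self):
        self._rows = {}

    def create(self, repository, content):
        # POST: add one association between a repo and a content unit.
        row = {"repository": repository, "content": content}
        self._rows[(repository, content)] = row
        return row

    def list(self):
        # GET: all associations between any repository and any content.
        return list(self._rows.values())

    def destroy(self, repository, content):
        # Explicit DELETE: remove the association, leave the content itself.
        return self._rows.pop((repository, content), None)


store = RepositoryContentStore()
store.create("file_repo", "content-1")
store.create("file_repo_copy", "content-1")
store.destroy("file_repo", "content-1")
print([r["repository"] for r in store.list()])  # ['file_repo_copy']
```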

GET lists all the associations between any repository and any content.

http GET http://127.0.0.1:8000/api/v3/repositorycontents/ 

HTTP/1.0 200 OK
Allow: GET, POST, HEAD, OPTIONS
Content-Length: 694
Content-Type: application/json
Date: Thu, 10 Aug 2017 20:06:46 GMT
Server: WSGIServer/0.2 CPython/3.5.3
Vary: Accept, Cookie
X-Frame-Options: SAMEORIGIN

{
    "next": null,
    "previous": null,
    "results": [
        {
            "content": "http://127.0.0.1:8000/api/v3/content/file/a6f4b106-d45a-4e08-a23d-6776b31cfd56/",
            "repository": "http://127.0.0.1:8000/api/v3/repositories/file_repo/" 
        },
        {
            "content": "http://127.0.0.1:8000/api/v3/content/file/a102f252-e493-4056-aa14-83eb83bc4018/",
            "repository": "http://127.0.0.1:8000/api/v3/repositories/file_repo_copy/" 
        },
        {
            "content": "http://127.0.0.1:8000/api/v3/content/file/a102f252-e493-4056-aa14-83eb83bc4018/",
            "repository": "http://127.0.0.1:8000/api/v3/repositories/file_repo/" 
        },
        {
            "content": "http://127.0.0.1:8000/api/v3/content/file/4a8637da-f11f-46cd-89f1-6bfad61f01b1/",
            "repository": "http://127.0.0.1:8000/api/v3/repositories/file_repo/" 
        }
    ]
}

POST adds one piece of content to a repo, though I don't think it checks the content type, so content of any type can currently be associated with any repo via this endpoint.
The URLs of the content and the repo must be provided.

$ http POST http://127.0.0.1:8000/api/v3/repositorycontents/ \
      content='http://127.0.0.1:8000/api/v3/content/file/a102f252-e493-4056-aa14-83eb83bc4018/' \
      repository='http://127.0.0.1:8000/api/v3/repositories/file_repo_copy/'

HTTP/1.0 201 Created
Allow: GET, POST, HEAD, OPTIONS
Content-Length: 166
Content-Type: application/json
Date: Thu, 10 Aug 2017 19:57:33 GMT
Server: WSGIServer/0.2 CPython/3.5.3
Vary: Accept, Cookie
X-Frame-Options: SAMEORIGIN

{
    "content": "http://127.0.0.1:8000/api/v3/content/file/a102f252-e493-4056-aa14-83eb83bc4018/",
    "repository": "http://127.0.0.1:8000/api/v3/repositories/file_repo_copy/" 
}

Apart from doing addition/removal inside a task, I guess content and repo type compatibility should be checked somewhere.

#21 Updated by bmbouter about 2 years ago

The URLs look a little strange, but they're REST compliant, so it should be easy for REST clients to interact with them.

I don't think we need to enforce (or check) content and repo type compatibility currently. I'm basing this on the MVP not having a use case that requires it.

I think the only requirement is to make it run the associate/disassociate in a task and return a 202 via HTTP with the call report.

Thanks for looking at this!

#22 Updated by dkliban@redhat.com about 2 years ago

  • Status changed from NEW to CLOSED - CURRENTRELEASE

#23 Updated by bmbouter almost 2 years ago

  • Tags deleted (Pulp 3 Plugin Writer Alpha)

Cleaning up Redmine tags

#24 Updated by daviddavis 6 months ago

  • Sprint/Milestone set to 3.0

#25 Updated by bmbouter 6 months ago

  • Tags deleted (Pulp 3)
