Story #7708

Improve content creation experience

Added by wibbit 6 days ago. Updated 1 day ago.

Status:
NEW
Priority:
Normal
Assignee:
-
Sprint/Milestone:
-
Start date:
Due date:
% Done:

0%

Estimated time:
Platform Release:
Groomed:
No
Sprint Candidate:
No
Tags:
Sprint:
Quarter:

Description

Currently when attempting to create RPM content from an already uploaded artefact, if a NEVRA already exists for that content, the task fails.

There are many reasons why a single NEVRA could translate to multiple SHAs, depending on build time, build host, etc.

From within Pulp, I don't believe there is any way (outside of parsing a string) to find the HREF of the conflicting package.

  1. An artefact has no knowledge of its content (I can't query the artefact for its NEVRA, and then query the content).
  2. Though you can see the filename when looking at content (fairly reliable, but not guaranteed), you can't query by file. You'd need to iterate over the entire content, or parse the filename (not reliable, as I believe filename and metadata can and often do vary).
  3. When a task fails due to a conflicting NEVRA, the Description string in the error message DOES include the NEVRA + conflicting pkgId; however, I'd rather avoid having to parse strings if at all possible.

So, some initial thoughts (not prescribing, just initial brainstorming that would work for me).

  1. The task does not fail, but instead returns the pkgId/href of the conflicting package.
  2. Provide a method to query content for a would-be created content from an artifact, and IF it were to conflict, provide one of: a) The NEVRA of the conflicting package, which can then be used to query the content for said package. b) All of the metadata from the would-be created content (not the existing content). c) Simply the href for the pre-existing package.

Options a) and b) should have no implications for RBAC, as they only return information about the artefact being referenced. They also allow the developer to subsequently query the content using the NEVRA, see whether there are more concerning differences between the pre-existing content and the desired new content, and choose what to do (delete the pre-existing content and replace it with the new, throw away the new and keep the old, etc.).
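For illustration, the follow-up query by NEVRA could look roughly like this from the client side. This is only a sketch: it assumes the RPM package list endpoint accepts name, epoch, version, release and arch as query filters (worth checking against the pulp_rpm API docs), and the host name and the helper nevra_query_url are made up for the example.

```python
from urllib.parse import urlencode

# RPM package content endpoint; the filter names used below are an assumption.
PACKAGES_ENDPOINT = "/pulp/api/v3/content/rpm/packages/"


def nevra_query_url(base_url, name, epoch, version, release, arch):
    """Build a list URL that filters packages by one NEVRA (hypothetical helper)."""
    params = {
        "name": name,
        "epoch": epoch,
        "version": version,
        "release": release,
        "arch": arch,
    }
    return f"{base_url.rstrip('/')}{PACKAGES_ENDPOINT}?{urlencode(params)}"


# Example NEVRA and host are placeholders, not real data.
url = nevra_query_url("https://pulp.example.com", "bash", "0", "5.1.8", "2.el9", "x86_64")
print(url)
```

The URL would then be fetched with whatever HTTP client is already in use, and the hrefs read from the results.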

History

#1 Updated by ttereshc 5 days ago

For the sake of starting a discussion: what if we add resource_href to the task report?
created_resources would then indicate whether the resource has been created or not.

The potential problem with this approach is when multiple resources are being created.
I think if we are creating a resource using POST to its dedicated endpoint (C of the CRUD), resource_href can be there. If it's a special action it should not be filled. E.g.
POST /pulp/api/v3/content/rpm/packages/ will have a resource_href filled in the task report when the task is complete.
POST /pulp/api/v3/rpm/rpm/repositories/<uuid>/sync won't have the resource_href filled in the task report when the task is complete.
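To make the idea concrete, here is a rough sketch of how a client might read such a task report. resource_href is the proposed field and does not exist in the task API today; the report dicts and hrefs below are made-up examples, and resolve_content_href is a hypothetical helper.

```python
def resolve_content_href(task_report):
    """Return (href, was_created) for a finished task report.

    Assumes the proposed `resource_href` field; `created_resources` is used
    only to tell a fresh resource from a pre-existing one.
    """
    href = task_report.get("resource_href")
    if href is None:
        # Special actions (e.g. sync) would leave resource_href empty.
        return None, False
    created = task_report.get("created_resources") or []
    return href, href in created


# Made-up reports for the three cases discussed above.
new_pkg = {"resource_href": "/content/a/", "created_resources": ["/content/a/"]}
existing_pkg = {"resource_href": "/content/b/", "created_resources": []}
sync_task = {"created_resources": ["/versions/1/"]}

print(resolve_content_href(new_pkg))
print(resolve_content_href(existing_pkg))
print(resolve_content_href(sync_task))
```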

Any security concerns? Will RBAC help here?

Most importantly, any other ideas?

#2 Updated by dalley 5 days ago

While it would still be stretching definitions, I think it might be OK to use a new field named "updated_resources" for this.

Even though content is immutable, I don't think it would necessarily be harmful to overload the meaning in this context, at least not as harmful as overloading "created_resources". And it would have uses elsewhere as well, so we can be less concerned about making a mess of the task API for this specific use case.

#3 Updated by bmbouter 5 days ago

If the Content that would be created from an artifact the user provides already exists, I think it would be useful to return what would have been created, as if it was created. Effectively, this changes our Content creation endpoints to a "get_or_create" type of expectation. Similarly for the Artifact creation endpoints. I see this change as a good thing for the user experience. Couldn't we do that with no additional fields added, and just use created_resources?

We don't RBAC content or artifacts themselves today, but in the future, when a user provides an artifact that already exists, or one that would produce content that already exists, the user would also get access to that existing artifact and existing content in the permissions system. They would not previously have had access to it. This is all future stuff though; for now, paragraph 1 here is my +1 to making this better.
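As a toy illustration of the "get_or_create" expectation from the first paragraph, assuming NEVRA alone is the uniqueness key: ContentStore, its href format and the sample NEVRA are all invented for the sketch and are not how Pulp stores content.

```python
class ContentStore:
    """Toy in-memory stand-in for content storage, keyed by NEVRA only."""

    def __init__(self):
        self._href_by_nevra = {}

    def get_or_create(self, nevra):
        """Return (href, created) instead of failing on a duplicate NEVRA."""
        href = self._href_by_nevra.get(nevra)
        if href is not None:
            return href, False
        href = f"/content/{len(self._href_by_nevra) + 1}/"  # made-up href format
        self._href_by_nevra[nevra] = href
        return href, True


store = ContentStore()
nevra = ("foo", "0", "1.0", "1.el8", "x86_64")  # placeholder NEVRA
first = store.get_or_create(nevra)
second = store.get_or_create(nevra)
print(first, second)
```

The second call hands back the same href with created set to False, which is the information the original request was after.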

#4 Updated by wibbit 1 day ago

From a personal perspective, the create_or_update/create_or_return works well for my workflow. I think I prefer return as opposed to update, especially if we can't update.

@bmbouter regarding the RBAC and access to content, the only thing I would highlight here is that, from an RPM perspective at least, I believe the NEVRA constitutes uniqueness, so there would be nothing stopping me uploading an empty package with the correct NEVRA details, which theoretically could give access to a package that I don't actually have.

What meaningful implication that has, I'm unsure; however, I thought it was worthwhile highlighting that just because you can generate the NEVRA does not mean the original content is available.

#5 Updated by ipanova@redhat.com 1 day ago

wibbit wrote:

From a personal perspective, the create_or_update/create_or_return works well for my workflow. I think I prefer return as opposed to update, especially if we can't update.

@bmbouter regarding the RBAC and access to content, the only thing I would highlight here is that, from an RPM perspective at least, I believe the NEVRA constitutes uniqueness, so there would be nothing stopping me uploading an empty package with the correct NEVRA details, which theoretically could give access to a package that I don't actually have.

What meaningful implication that has, I'm unsure; however, I thought it was worthwhile highlighting that just because you can generate the NEVRA does not mean the original content is available.

I agree with you, that's why we should consider using checksum as well.

#6 Updated by bmbouter 1 day ago

ipanova@redhat.com wrote:

I agree with you, that's why we should consider using checksum as well.

I agree as well. The ideal situation is one where any user who provides the binary data can have access to content which is identical to that binary data. Additionally, any time Pulp doesn't already have content units that correspond to user-provided binary data, that data should be able to be saved; for example, NEVRA "squatting" should not be possible.

#7 Updated by wibbit 1 day ago

bmbouter wrote:

ipanova@redhat.com wrote:

I agree with you, that's why we should consider using checksum as well.

I agree as well. The ideal situation is one where any user who provides the binary data can have access to content which is identical to that binary data. Additionally, any time Pulp doesn't already have content units that correspond to user-provided binary data, that data should be able to be saved; for example, NEVRA "squatting" should not be possible.

Just to be clear, NEVRA != HASH/CHECKSUM, because a valid "identical" package can be built at different times or on different hosts, so a hash of an artifact can't be used to compare RPM content.
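A small sketch of that point, with stand-in byte strings instead of real RPMs (the payloads, the Nevra tuple and the describe helper are all invented for illustration): the NEVRA matches across two builds and an "empty" artifact, while the SHA-256 digests all differ.

```python
import hashlib
from collections import namedtuple

Nevra = namedtuple("Nevra", "name epoch version release arch")


def describe(nevra, payload):
    """Pair the NEVRA metadata with the SHA-256 of the artifact bytes."""
    return {"nevra": nevra, "sha256": hashlib.sha256(payload).hexdigest()}


nevra = Nevra("foo", "0", "1.0", "1.el8", "x86_64")  # placeholder NEVRA

# Stand-ins for the same package built on different hosts, plus an empty artifact.
build_a = describe(nevra, b"payload built on host-a at 10:00")
build_b = describe(nevra, b"payload built on host-b at 11:30")
empty = describe(nevra, b"")

same_nevra = build_a["nevra"] == build_b["nevra"] == empty["nevra"]
distinct_checksums = len({build_a["sha256"], build_b["sha256"], empty["sha256"]}) == 3
print(same_nevra, distinct_checksums)
```

So matching on NEVRA alone treats all three as the same content, while matching on the checksum alone treats two legitimate rebuilds as unrelated.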

I'd not considered "NEVRA" squatting, which I'd see as more of an annoyance than a risk.

My concern was valid content being present, and someone using an "empty" artifact that has the correct NEVRA metadata to gain access to the pre-existing content in some way. Though I think via the API that is liable to be limited to metadata, so I'm unsure how much of a real risk this poses.

As for having a create_or_return versus a dedicated "query content based on this artifact": the latter would leave it to the API consumer to query first, but it would probably work better with the RBAC logic.

I think at this point I'm probably better off stepping back and letting those who know the platform best decide the right solution :D
