Story #3360

As a user, I can create a repository version from any repository version

Added by dkliban@redhat.com over 1 year ago. Updated 6 months ago.

Status:
MODIFIED
Priority:
Normal
Assignee:
Category:
-
Sprint/Milestone:
Start date:
Due date:
% Done:

100%

Platform Release:
Blocks Release:
Backwards Incompatible:
No
Groomed:
Yes
Sprint Candidate:
Yes
Tags:
Katello-P3
QA Contact:
Complexity:
Smash Test:
Verified:
No
Verification Required:
No
Sprint:
Sprint 41

Description

Motivation

As a user, I want to be able to revert a repository from its current version Y to an earlier version X.

As a user I want to treat different lifecycle environments as separate repositories. When the 'dev' repository is ready to be promoted to the 'testing' repository, I want to create a new repository version of 'testing' that is exactly the same as a specific repository version of 'dev'.

Currently this can be achieved by:

1. Determine which units need to be added to the latest repository version Y and which need to be removed to achieve the same content set as repository version X.
2. Create a new repository version Z by using information from step 1.

Step 1 requires the user to build tooling for calculating the diff. The user must also make sure that nothing changes in the repository between the time the diff is computed and the time the new repository version is created.
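
To make the current workaround concrete, here is a minimal sketch of the diff calculation from step 1, assuming each repository version can be represented as a set of content unit identifiers. The function name and the unit names are hypothetical and are not part of Pulp.

```python
def compute_delta(current_units, target_units):
    """Return (to_add, to_remove) that turn current_units into target_units."""
    current, target = set(current_units), set(target_units)
    to_add = target - current      # units present in X but missing from Y
    to_remove = current - target   # units present in Y but not in X
    return to_add, to_remove


# Hypothetical content sets: Y is the latest version, X is the one to restore.
version_y = {"bash-4.4", "curl-7.61", "openssl-1.1.1"}
version_x = {"bash-4.4", "curl-7.29", "openssl-1.0.2"}

to_add, to_remove = compute_delta(version_y, version_x)

# Applying the delta to Y reproduces X (step 2 of the workaround).
version_z = (version_y - to_remove) | to_add
```

The sketch also shows why the workaround is fragile: if version Y changes after `compute_delta` runs, the delta no longer produces X.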

Solution

A user should be able to create a repository version by specifying an href for another repository version. This would remove the responsibility of finding deltas between repository versions from the user.

The /api/v3/repositories/<id>/versions/ API should accept a 'base_version' parameter. The base_version can point to any repo version. The base_version param can be used in conjunction with 'add_content_units' and 'remove_content_units' which modify the content from base_version.
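
As a rough illustration of the proposal, the sketch below builds a request body for the versions endpoint and models the resulting content set in memory. Only the parameter names 'base_version', 'add_content_units' and 'remove_content_units' come from this story; the helper names, the example hrefs, and the choice to apply removals before additions are assumptions.

```python
import json


def build_version_request(base_version=None, add_content_units=(),
                          remove_content_units=()):
    """Build a request body for POST /api/v3/repositories/<id>/versions/."""
    body = {}
    if base_version is not None:
        body["base_version"] = base_version
    if add_content_units:
        body["add_content_units"] = list(add_content_units)
    if remove_content_units:
        body["remove_content_units"] = list(remove_content_units)
    return body


def resulting_content(base_content, add=(), remove=()):
    """Model the new version: start from the base, then remove, then add.

    The remove-then-add order is an assumption made for this sketch.
    """
    return (set(base_content) - set(remove)) | set(add)


# Hypothetical hrefs, for illustration only.
body = build_version_request(
    base_version="/api/v3/repositories/1/versions/7/",
    add_content_units=["/api/v3/content/example/42/"],
)
payload = json.dumps(body)  # what a client would send as the request body
```

With this shape, promoting 'dev' to 'testing' is a single request that names the 'dev' version as base_version, and no client-side diff is needed.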


Related issues

Related to Pulp - Story #3842: As a user, I can track the cloned repository versions via branches (NEW)
Related to Pulp - Test #4035: Test base_version parameter (CLOSED - COMPLETE)

Associated revisions

Revision d0ea420b View on GitHub
Added by daviddavis about 1 year ago

Allow users to specify a base_version when creating a repo version

fixes #3360
https://pulp.plan.io/issues/3360

History

#1 Updated by dkliban@redhat.com over 1 year ago

  • Description updated (diff)

#2 Updated by dkliban@redhat.com over 1 year ago

  • Description updated (diff)

#3 Updated by mhrivnak over 1 year ago

I've been a fan of the hotfix repository model in the past. It provides the needed capability, and I actually like that it keeps an exceptional workflow in a separate repo with a distinctly separate branch of history. As long as a user is aware of and looks at both repos (I agree this is an important usability factor that needs to be facilitated), it's easy to reason about and understand.

One way to address the concern of history being "split between original repository and 'hotfix' repository" is to focus on this history from the distribution perspective. We've talked before about keeping history of what version was available through a given distribution at any given time. I don't know where that stands today, but in this use case, the distribution is what ties the two repositories together, not just logically, but from the client experience. Tracking history there would make it obvious that a hotfix repository took the place of the main repo for a period of time.

Another idea worth considering is a formal "clone" type relationship between versions. Version 1 of the hotfix repo would have a foreign key to version 153 of the main repo, designating that it was created as a direct clone of that version. That may require that both versions be undeletable. Just an idea, but I like that it would facilitate clean branching into a separate repo with a clear and simple tie-in to the original.

Lastly, a benefit of "branching" into a new repository is that it provides an obvious and familiar opportunity for the user to describe the new line of history. In this case, they can use the word "hotfix" in the repo name, plus a small description. While it is likely possible to introduce an idea of different branches of history within a single repository, I think you would also need to add a way to descriptively identify those branches. It could be done, but as they say, "you'd have to do it." When the dust settled, you might find yourself with a new model called "Branch" that looks a whole lot like a repo.

#4 Updated by milan over 1 year ago

  • Groomed changed from No to Yes
  • Sprint Candidate changed from No to Yes

#5 Updated by bmbouter over 1 year ago

Even aside from this use case, I value the ability to set the content of a repo by specifying another repo version by URI. This allows for promotion or snapshot recovery use cases and is easy to implement.

#6 Updated by dkliban@redhat.com over 1 year ago

  • Subject changed from As a user, I can create a repository version from a base version to As a user, I can create a repository version from any repository version
  • Description updated (diff)
  • Groomed changed from Yes to No
  • Tags Pulp 3 MVP added

Updated the description to clarify the use cases and to limit how the new parameter can be used.

@mhrivnak, I think this still supports the use cases you described.

#7 Updated by jortel@redhat.com over 1 year ago

I'm concerned that a single endpoint whose behavior is controlled by a complicated combination of parameters isn't ideal, especially since some of the parameters are mutually exclusive. Having the /api/v3/repositories/<id>/versions/ endpoint do one thing seems cleaner. Should we go with the approach of simpler endpoints, I would suggest something like /api/v3/repositories/<id>/versions/clone/ that accepts a "version" URI parameter in the body, which is an href to an existing version. If we go with a single complex endpoint, I'd suggest a "clone_version" URI as the parameter.

As an aside, the existing add_content_units and remove_content_units should be renamed. "_content" is already plural so adding the "_units" is the singular form that's made plural. Should just be add_content, remove_content.

#8 Updated by milan over 1 year ago

I thought the same; could referring here to a concept the user may already be familiar with, such as Git, lead to false assumptions about Pulp's behavior? If not, I'd also prefer the POST:/repositories/<mine>/versions/<123>/clone/ endpoint.

Also would be nice to give a sample request and response (bodies) in the description to get a better idea how the endpoint should behave.

If the discussion is still ongoing about the solution, I'd vote for the "clone relationship", as suggested by @mhrivnak.
I suspect folding of the history branches onto a single repository history line will make telling what hotfix was included in what repo version messy. Even more so with multiple 'hotfixes' co-existing in overlapping time/history intervals, i.e. a hotfix of a hotfix, i.e. the "Oops, I forgot to add this rpm in the hotfix repo version production publication... There, done, almost screwed it up! ...now, where was the last non-fixed version so I can clone it again to restore the repo version Head(?!) and finally call it a day 3:43 am... " case.

#9 Updated by jortel@redhat.com over 1 year ago

Warning: Crazy thinking here.

Are users really looking to clone repositories or manage branches of the same repository? What if instead of cloning an entirely new repository, pulp supported branches. The hotfix scenario could just be a branch. Pulp could emulate git by having a "master" branch by default which points to the latest version (aka commit).

First, let me say there is a lot of room for improvement in many of these details :) This is just a raw idea dump.


RepositoryBranch(Model):
  • name
  • repository (FK)
  • version (number) (a.k.a "HEAD")

To list and manipulate branches: GET: /repositories/<id>/branches/

POST: /repositories/<id>/branches/
  • name
  • base (URI): (optional) repository version. Default: master/HEAD

POST: /repositories/<id>/versions/
  • branch <URI>: (optional) branch. Default: master. Used to automatically update the branch HEAD with newly created version.
  • base <URI>: (optional) base version.
  • add_content <list>: (optional) list of <URI> content.
  • remove_content <list>: (optional) list of <URI> content.

Note: Proposing ^^ a single endpoint here may seem to contradict my previous comment. But since the parameters are not mutually exclusive, I don't see this as complex.


HotFix example (using natural keys for clarity):

Repository RHEL-7 "master" at version 10

To hotfix:

POST: /repositories/RHEL-7/branches/
  • name: "hotfix"
  • base: /repositories/RHEL-7/versions/10/
POST: /repositories/RHEL-7/versions/
  • branch: "hotfix"
  • base: /repositories/RHEL-7/versions/10/
  • add_content: [...]

This ^ creates a new version and updates branch "hotfix" with new version: 11. User can then publish version 11 and "master" is unaffected.
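
The hotfix flow above can be sketched with a small in-memory model. This is only an illustration of the idea, not Pulp code: the Repository class, its attributes, and the content names are hypothetical, and a branch is reduced to a name that maps to its HEAD version number.

```python
from dataclasses import dataclass, field


@dataclass
class Repository:
    name: str
    versions: dict = field(default_factory=dict)  # version number -> frozenset of content
    branches: dict = field(default_factory=dict)  # branch name -> HEAD version number
    last_version: int = 0

    def create_branch(self, name, base=None):
        """Create a branch whose HEAD is `base` (default: HEAD of "master")."""
        self.branches[name] = self.branches["master"] if base is None else base

    def create_version(self, branch="master", base=None,
                       add_content=(), remove_content=()):
        """Create a version from `base` and move the branch HEAD to it."""
        if base is None:
            base = self.branches.get(branch)
        content = set(self.versions.get(base, frozenset()))
        content -= set(remove_content)
        content |= set(add_content)
        self.last_version += 1
        self.versions[self.last_version] = frozenset(content)
        self.branches[branch] = self.last_version
        return self.last_version


# Repository RHEL-7 with "master" at version 10 (hypothetical content).
repo = Repository("RHEL-7")
repo.versions[10] = frozenset({"openssl-1.0.1e"})
repo.branches["master"] = 10
repo.last_version = 10

repo.create_branch("hotfix", base=10)
new_version = repo.create_version(branch="hotfix", base=10,
                                  add_content={"openssl-1.0.1e-hotfix"})
```

After these calls the "hotfix" branch points at the new version while "master" still points at version 10, which matches the behavior described above.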

Continuing the craziness, the parameter to endpoints taking RepositoryVersion (URI), such as sync and publish, could also take either a RepositoryBranch URI (which gets translated to HEAD) or a RepositoryVersion URI.

If we adopted the branch model, I would advocate for RepositoryVersion.number to become a UUID instead of an integer for two reasons.
  • avoid concerns when the number is not contiguous.
  • we don't need to manage the Repository.last_version.

This could be a very powerful tool to efficiently support promotion workflows. Each environment (dev, testing, production) could be a branch instead of a separate repository.

#10 Updated by mdellweg over 1 year ago

I recently had some thoughts about branched versions, too. They are more from the database view than from the REST API.
Maybe they can extend the crazy thinking above. I will just cite myself:

As I understand it (from looking at the models), when publishing a version, we make a big SQL join over Content and ContentVersion, something like
---
select Content.*, last(ContentVersion) as LCV from Content, ContentVersion where Content.Repository = repo_id and (ContentVersion.version_added <= ver or ContentVersion.version_deleted <= ver) group by Content.id where LCV.version_added
---
Not sure whether this is valid SQL, but I hope I got the point across.

Now, if we added a foreignKey previous_version to RepositoryVersion, we could traverse that path to get an individual history of versions, and we would just need to restrict the query above to that list of versions. Voila, branched versions without much overhead to the database.
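
A minimal in-memory sketch of that traversal, assuming each version records its previous_version and the units it added or removed. It replays changes in Python rather than running the SQL query above, and every name and data value here is hypothetical.

```python
def lineage(previous_version, version):
    """Follow previous_version links from `version` back to the root."""
    chain = []
    while version is not None:
        chain.append(version)
        version = previous_version.get(version)
    return chain


def content_at(changes, previous_version, version):
    """Replay (added, removed) changes along the lineage, oldest first."""
    content = set()
    for v in reversed(lineage(previous_version, version)):
        added, removed = changes.get(v, (set(), set()))
        content -= removed
        content |= added
    return content


# Versions 3 and 4 both descend from version 2, so they form two branches.
previous_version = {1: None, 2: 1, 3: 2, 4: 2}
changes = {
    1: ({"a", "b"}, set()),
    2: ({"c"}, {"a"}),
    3: ({"d"}, set()),   # e.g. the main line of history
    4: ({"e"}, set()),   # e.g. a hotfix branch
}

main_content = content_at(changes, previous_version, 3)
hotfix_content = content_at(changes, previous_version, 4)
```

Restricting the replay to the lineage of a version is what keeps the two branches from seeing each other's changes.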

#11 Updated by amacdona@redhat.com over 1 year ago

I think we should move forward with a single endpoint:

POST /v3/repositories/1234/versions/ 

with the request.body:
{add_content: [], remove_content: [], base_version: <uuid>}

IMO, this is the simplest path forward. We can revisit this when we discuss possible branching workflows, but we shouldn't hold this work up to wait for a conclusion on branching.

#12 Updated by milan over 1 year ago

jortel@redhat.com wrote:

Warning: Crazy thinking here.

Are users really looking to clone repositories or manage branches of the same repository? What if instead of cloning an entirely new repository, pulp supported branches. The hotfix scenario could just be a branch. Pulp could emulate git by having a "master" branch by default which points to the latest version (aka commit).

I like this idea too, any formal branch/clone mechanism that reflects the non-linear history of repo versions should do.
Branches, being an abstraction closer to the common content versioning systems concepts, may make the user feel more at home so I actually prefer it more than the repo-cloning mechanism. This approach might also prove more space efficient when compared to the copying to create a clone.

First, let me say there is a lot of room for improvement in many of these details :) This is just a raw idea dump.


RepositoryBranch(Model):
  • name
  • repository (FK)
  • version (number) (a.k.a "HEAD")

To list and manipulate branches: GET: /repositories/<id>/branches/

POST: /repositories/<id>/branches/
  • name
  • base (URI): (optional) repository version. Default: master/HEAD

POST: /repositories/<id>/versions/
  • branch <URI>: (optional) branch. Default: master. Used to automatically update the branch HEAD with newly created version.
  • base <URI>: (optional) base version.
  • add_content <list>: (optional) list of <URI> content.
  • remove_content <list>: (optional) list of <URI> content.

Note: Proposing ^^ a single endpoint here may seem to contradict my previous comment. But since the parameters are not mutually exclusive, I don't see this as complex.

How about a formal HEAD endpoint? Just like with Git, HEAD would refer to whatever branch is currently "checked-out":

GET@/repositories/<id>/HEAD/:
{
    'branch': <branch URI>,
    'version': <branch version>,
}

POST@/repositories/<id>/HEAD/checkout/
# request body:
{
    'branch': <branch URI to check-out>,
}
This might have some interesting use cases; assuming all repo "pending" content units are "attached" to the (branch) HEAD:
  • add content units:
        POST@/repositories/<id>/HEAD/content/
        # request body:
        {'units': [
            <uri content unit x>,
            <uri content unit y>,
            ...,
        ]},
       # response body:
        {'units': [
           <content_unit_obj x>,
           <content_unit_obj y>,
        ]} 
     
  • remove units:
        DELETE@/repositories/<id>/HEAD/content/<unit_id>/
      
  • get current diff against current or another branch:
        GET@/repositories/<id>/HEAD/diff/?[branch=<branch_id>][&detailed=true]
        {
           # generic (core) diff:
           'added': [<unit_obj>, <unit_obj>, ...],
           'deleted': [<unit_obj>, <unit_obj>, ...],
           # specific (plug-in) diff:
           'updated/replaced': [{'old': <old_unit_obj>, 'new': <new_unit_obj>}, {....}, ...],
         }
      
  • commit changes:
        POST@/repositories/<id>/HEAD/commit/
        # response body:
        {
            'branch': <current branch>,
            'version': <new repo version obj>,
        } 
      

Branching would utilize a separate endpoint:

  GET@/repositories/<id>/branches/
   {'branches': [
      <branch_MASTER_obj>,
      <branch_a_obj>,
      <branch_b_obj>,
   ]}

  POST@/repositories/<id>/branches/
  # request body:
  {'branches': [
     {'name': <branch name>,
      'version': <repo version uri>},
  ]}
  # response body:
  {'branches': [
    <a new branch obj>,
  ]}

There would always be the MASTER branch, that can't be deleted:

  GET@/repositories/<id>/branches/MASTER/
  {
      'name': 'MASTER',
      'version': <repo version uri>,
  }


HotFix example (using natural keys for clarity):

Repository RHEL-7 "master" at version 10

To hotfix:

POST: /repositories/RHEL-7/branches/
  • name: "hotfix"
  • base: /repositories/RHEL-7/versions/10/
POST: /repositories/RHEL-7/versions/
  • branch: "hotfix"
  • base: /repositories/RHEL-7/versions/10/
  • add_content: [...]

This ^ creates a new version and updates branch "hotfix" with new version: 11. User can then publish version 11 and "master" is unaffected.

Continuing the craziness, the parameter to endpoints taking RepositoryVersion (URI), such as sync and publish, could also take either a RepositoryBranch URI (which gets translated to HEAD) or a RepositoryVersion URI.

With the formal HEAD endpoint:
  • create a hotfix branch:
        POST@/repositories/RHEL-7/branches/
        # request body:
        {'branches': [
            {'name': 'heartbleed_hotfix'},
            # absent 'version' means HEAD.branch.version; note: there's always the MASTER branch
        ]}
        # response body:
        {'branches': [
            {
                'name': 'heartbleed_hotfix',
                'id': <branch id>,
                'version': <MASTER.version>,
                '_uri': <link>,
            },
            ...,
            <MASTER branch obj>,
        ]}
      
  • checkout the branch:
        POST@/repositories/RHEL-7/HEAD/checkout/
        # request body:
        {
            'branch': 'heartbleed_hotfix',
         }
        # response body:
        {
           'branch': 'heartbleed_hotfix',
        }
      
  • "patch" the branch:
      POST@/repositories/RHEL-7/HEAD/content/
      # request body:
      {'content': [
          <uri heartbleed patch content unit>,
      ]}
      # response body:
      {'content':[
          <content heartbleed patch unit obj>,
          <content obj>,
          ...
       ]}
      
  • check everything is sane:
        GET@/repositories/RHEL-7/HEAD/
        {
            'branch': 'heartbleed_hotfix',
            'version': <current branch version (same as MASTER.version)>,
         }
    
        GET@/repositories/RHEL-7/HEAD/diff/
        {
           'added': [<the heartbleed patch unit obj>],
           'removed': [],
        }
      
  • commit the change:
      POST@/repositories/RHEL-7/HEAD/commit/
      # response body
      {
         'branch': 'heartbleed_hotfix',
         'version': <new version in the hotfix branch>,
      }
      

I think the ability to check before committing is the value added here.
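
A minimal sketch of that check-before-commit flow, assuming HEAD keeps pending additions and removals separate from the committed content until commit. The Head class and the unit names are hypothetical and only illustrate the idea.

```python
class Head:
    """Pending changes on top of the committed content of a branch."""

    def __init__(self, committed):
        self.committed = frozenset(committed)
        self.staged_add = set()
        self.staged_remove = set()

    def add(self, unit):
        self.staged_remove.discard(unit)
        self.staged_add.add(unit)

    def remove(self, unit):
        self.staged_add.discard(unit)
        self.staged_remove.add(unit)

    def diff(self):
        """Report only the changes that would alter the committed content."""
        return {
            "added": sorted(self.staged_add - self.committed),
            "removed": sorted(self.staged_remove & self.committed),
        }

    def commit(self):
        """Apply the pending changes and return the new committed content."""
        self.committed = frozenset(
            (self.committed - self.staged_remove) | self.staged_add
        )
        self.staged_add.clear()
        self.staged_remove.clear()
        return self.committed


head = Head({"openssl-1.0.1e"})
head.add("openssl-1.0.1g")       # stage the patch unit
pending = head.diff()            # inspect before committing
new_content = head.commit()
```

Nothing changes in the committed content until `commit()` is called, so a mistake caught in `diff()` can still be undone with `add()` or `remove()`.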

If we adopted the branch model, I would advocate for RepositoryVersion.number to become a UUID instead of an integer for two reasons.
  • avoid concerns when the number is not contiguous.
  • we don't need to manage the Repository.last_version.

+1

This could be a very powerful tool to efficiently support promotion workflows. Each environment (dev, testing, production) could be a branch instead of a separate repository.

+1

#13 Updated by milan over 1 year ago

mdellweg wrote:

I recently had some thoughts about branched versions, too. They are more from the database view than from the REST API.
Maybe they can extend the crazy thinking above. I will just cite myself:

As I understand it (from looking at the models), when publishing a version, we make a big SQL join over Content and ContentVersion, something like
---
select Content.*, last(ContentVersion) as LCV from Content, ContentVersion where Content.Repository = repo_id and (ContentVersion.version_added <= ver or ContentVersion.version_deleted <= ver) group by Content.id where LCV.version_added

I think the version expression isn't correct; better, maybe (ContentVersion.version_added <= ver < ContentVersion.version_deleted or ContentVersion.version_deleted < ver)

---
Not sure whether this is valid SQL, but I hope I got the point across.

Now, if we added a foreignKey previous_version to RepositoryVersion, we could traverse that path to get an individual history of versions, and we would just need to restrict the query above to that list of versions. Voila, branched versions without much overhead to the database.

This might work indeed; supposing we keep track of all the historical ContentVersion records and we declare a Content unit orphaned if no ContentVersion refers to it anymore. I guess any branching implies keeping track of the ContentVersion records indefinitely.

#14 Updated by milan over 1 year ago

amacdona@redhat.com wrote:

I think we should move forward with a single endpoint:

[...]
with the request.body: [...]

IMO, this is the simplest path forward. We can revisit this when we discuss possible branching workflows, but we shouldn't hold this work up to wait for a conclusion on branching.

It might not be the most convenient abstraction for the user to use and we might have to stick with its (branching) implications till Pulp4.
On the other hand, I don't see how this endpoint might impair branching (API) decisions later, so +1.

#15 Updated by milan over 1 year ago

...maybe the most value would be added with the ability to add custom tags to repo versions...

#16 Updated by amacdona@redhat.com over 1 year ago

  • Tags deleted (Pulp 3 MVP)

#17 Updated by jsherril@redhat.com over 1 year ago

Note: from a Katello perspective, it is required (and assumed) that we can specify a version from a different repository.

For example, I want a new version of repo B that matches repo A version 4.

#18 Updated by daviddavis over 1 year ago

  • Description updated (diff)

Updated the description based on email to pulp-dev list. Also, created a separate issue for discussing repo version branches:

https://pulp.plan.io/issues/3842

#19 Updated by daviddavis over 1 year ago

  • Related to Story #3842: As a user, I can track the cloned repository versions via branches added

#20 Updated by bmbouter over 1 year ago

  • Groomed changed from No to Yes

+1 to this story; I'm grooming it.

+1 to adding to a sprint.

#21 Updated by daviddavis over 1 year ago

  • Tags Katello-P3 added

#22 Updated by dkliban@redhat.com about 1 year ago

  • Sprint set to Sprint 41

#23 Updated by daviddavis about 1 year ago

  • Status changed from NEW to ASSIGNED
  • Assignee set to daviddavis

#24 Updated by daviddavis about 1 year ago

  • Status changed from ASSIGNED to MODIFIED
  • % Done changed from 0 to 100

#25 Updated by daviddavis about 1 year ago

  • Related to Test #4035: Test base_version parameter added

#26 Updated by daviddavis 6 months ago

  • Sprint/Milestone set to 3.0

#27 Updated by bmbouter 6 months ago

  • Tags deleted (Pulp 3)
