Story #5367

As a user, I can have a sync option that retains the latest N packages of the same name

Added by paji@redhat.com 5 months ago. Updated 5 months ago.

Status:
NEW
Priority:
Normal
Assignee:
-
Category:
-
Sprint/Milestone:
-
Start date:
Due date:
% Done:

0%

Platform Release:
Blocks Release:
Backwards Incompatible:
No
Groomed:
No
Sprint Candidate:
No
Tags:
QA Contact:
Complexity:
Smash Test:
Verified:
No
Verification Required:
No
Sprint:

Description

Goal

Provide a way to specify that only the latest N versions of each package name are retained in the repository.

Sync time option retain-old-count

RPM sync will take a sync-time option named retain-old-count with an integer value N. This will cause the latest N versions of each package to be retained. Two packages with the same name are considered the same package.

When the option is used, its value must be validated as an integer greater than 0; otherwise a Validation error is raised.

Determining latest

Latest is determined by comparing the <epoch, version, release> triplet.
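As a rough illustration of the triplet comparison, here is a minimal sketch. It is a simplified approximation of RPM's version ordering, not the real rpmvercmp: it ignores the special `~` and `^` characters, and the function names are hypothetical.

```python
import re
from functools import cmp_to_key

# Split a version or release string into runs of digits or runs of letters.
_SEGMENT = re.compile(r"[0-9]+|[a-zA-Z]+")


def compare_segments(a, b):
    """Simplified, rpmvercmp-like comparison. Returns -1, 0 or 1."""
    parts_a = _SEGMENT.findall(a)
    parts_b = _SEGMENT.findall(b)
    for x, y in zip(parts_a, parts_b):
        x_num, y_num = x.isdigit(), y.isdigit()
        if x_num and y_num:
            xi, yi = int(x), int(y)
            if xi != yi:
                return 1 if xi > yi else -1
        elif x_num != y_num:
            # A numeric segment is treated as newer than an alphabetic one.
            return 1 if x_num else -1
        elif x != y:
            return 1 if x > y else -1
    # All shared segments are equal: the string with segments left over is newer.
    if len(parts_a) != len(parts_b):
        return 1 if len(parts_a) > len(parts_b) else -1
    return 0


def compare_evr(a, b):
    """Compare two (epoch, version, release) triplets. A missing epoch counts as 0."""
    epoch_a, epoch_b = int(a[0] or 0), int(b[0] or 0)
    if epoch_a != epoch_b:
        return 1 if epoch_a > epoch_b else -1
    return compare_segments(a[1], b[1]) or compare_segments(a[2], b[2])


def sort_newest_first(evrs):
    """Return the triplets ordered from newest to oldest."""
    return sorted(evrs, key=cmp_to_key(compare_evr), reverse=True)
```

Epoch dominates, then version, then release, so a package with a higher epoch is newer regardless of its version string.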

Incompatibility with mirror=True

The mirror=True option requires the retained packages to be a mirror of the remote, so mirror=True cannot be used when the user is also specifying retain-old-count.

This needs to raise a Validation error if mirror=True and retain-old-count are specified together.
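The two validation rules (a positive integer, and no combination with mirror=True) could look roughly like this. It is a sketch only: `ValidationError` here is a local stand-in for whichever validation exception the plugin actually raises, and `validate_sync_options` is a hypothetical helper name.

```python
class ValidationError(ValueError):
    """Stand-in for the real validation exception (an assumption for this sketch)."""


def validate_sync_options(retain_old_count=None, mirror=False):
    """Raise ValidationError if the sync options are invalid or conflict."""
    if retain_old_count is None:
        # The option was not supplied, so there is nothing to check.
        return
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(retain_old_count, bool) or not isinstance(retain_old_count, int):
        raise ValidationError("retain-old-count must be an integer")
    if retain_old_count <= 0:
        raise ValidationError("retain-old-count must be greater than 0")
    if mirror:
        raise ValidationError("retain-old-count cannot be used together with mirror=True")
```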

Pulp2 Equivalent

In Pulp2 it was called retain-old-count.

Implementation

The RpmDeclarativeVersion should implement a create method that mimics the one from core. The only difference in the RPM one is that, instead of adding ContentUnassociation as the last stage, a new custom RPM stage should be added. Let's call that stage ContentUnassociationRetainN (as a working name).

The ContentUnassociationRetainN stage works in place of ContentUnassociation in core and runs with similar assumptions. Specifically, it receives queryset objects, not DeclarativeContent objects like earlier stages in the pipeline. You can see those emitted from the stage before.

The ContentUnassociationRetainN stage needs to further filter these unassociation querysets to exclude units that would be removed but shouldn't be, due to their NEVRA. The content associated with the repository is already in place, so between the querysets of items marked for removal and the "known good" content outside of those querysets, one should be able to compute the previous N versions somehow.
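One possible reading of that filtering step, as a plain-Python sketch with no Pulp or Django APIs: packages are modelled as (name, evr) pairs, and each evr is assumed to be already parsed into a tuple that sorts in version order. The function name `units_to_rescue` is hypothetical, and the sketch only covers rescuing units from the removal set, not pruning "known good" content.

```python
from collections import defaultdict


def units_to_rescue(known_good, marked_for_removal, retain_n):
    """Return the units marked for removal that should stay associated.

    For each package name, the newest retain_n versions across both inputs
    are kept; any of those that were marked for removal are "rescued".
    """
    by_name = defaultdict(list)
    for name, evr in known_good:
        by_name[name].append((evr, False))
    for name, evr in marked_for_removal:
        by_name[name].append((evr, True))

    rescued = set()
    for name, versions in by_name.items():
        # Newest first; evr tuples are assumed to sort in version order.
        versions.sort(key=lambda item: item[0], reverse=True)
        for evr, was_marked in versions[:retain_n]:
            if was_marked:
                rescued.add((name, evr))
    return rescued
```

With retain_n=2, a repository whose remote now only carries foo 1.3 would keep foo 1.2 as well and let foo 1.0 be unassociated.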

I think we can start with something inefficient but correct and improve it over time through profiling and the explain operator.

Katello Related Issue

https://projects.theforeman.org/issues/16154

History

#1 Updated by ipanova@redhat.com 5 months ago

  • Description updated (diff)

#2 Updated by bmbouter 5 months ago

  • Subject changed from Pulp 3 Limit rpm packages to sync. to As a user, I can have a sync option that retains the latest N packages of the same name
  • Description updated (diff)

A rewrite to bring a first-cut design we can use to iterate on before implementing.

#3 Updated by bmbouter 5 months ago

  • Tracker changed from Issue to Story
  • % Done set to 0

converting to story.

#4 Updated by ipanova@redhat.com 5 months ago

Thank you for writing this down. General workflow is clear and straightforward, however the caveat that was brought up during the meeting consists exactly in the part of calculation of previous N versions. Any ideas on that?

#5 Updated by bmbouter 5 months ago

ipanova@redhat.com wrote:

Thank you for writing this down. General workflow is clear and straightforward, however the caveat that was brought up during the meeting consists exactly in the part of calculation of previous N versions. Any ideas on that?

This was not the caveat question that I had heard at the meeting. It is a good one though. Here are some ideas. What do you think about these?

  1. slow option

Form a queryset that counts the number of packages per name, e.g. 'foo' v1, v1.2, v1.3 would yield a count of 3. Filter out any package names whose count is <= retain-old-count. The remaining names are the only ones that need additional consideration.

From there you could do the filtering in Python to determine which of the 'foo' packages need removal. Doing it in Python is probably too slow though.
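A plain-Python sketch of the slow option, with a list of (name, version) pairs standing in for the queryset. Versions are assumed to be already parsed into tuples that sort in version order, and both function names are hypothetical.

```python
from collections import Counter


def names_needing_pruning(packages, retain_n):
    """Return the package names that have more than retain_n versions."""
    counts = Counter(name for name, _ in packages)
    return {name for name, count in counts.items() if count > retain_n}


def packages_to_remove(packages, retain_n):
    """Return the (name, version) pairs that fall outside the newest retain_n."""
    removals = []
    for name in sorted(names_needing_pruning(packages, retain_n)):
        # Newest first, then drop everything after the first retain_n entries.
        versions = sorted((v for n, v in packages if n == name), reverse=True)
        removals.extend((name, v) for v in versions[retain_n:])
    return removals
```

The count step keeps the expensive per-name sorting limited to names that actually exceed the limit.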

  2. database speedups

To speed it up, create a PostgreSQL trigger that pre-computes this version rank as an internal integer for each Package row as it's inserted. Then use a Django filter to rank the results in version-sorted order on this field. This would negate the need for Python-based filtering. Keep the first N (newest) and unassociate the rest.

We have some triggers (different kinds but similar in concept) in pulp_ansible here for example: https://github.com/pulp/pulp_ansible/blob/master/pulp_ansible/app/migrations/0004_add_fulltext_search_indexes.py
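To show what ranking on a pre-computed integer could look like, here is a sketch that uses an in-memory SQLite database as a stand-in for PostgreSQL (it needs SQLite 3.25 or newer for window functions). The table layout, the `version_rank` column name, and the function name are all assumptions; the rank values are supplied by hand here rather than by a trigger.

```python
import sqlite3


def unassociate_candidates(rows, retain_n):
    """Return (name, evr) pairs that fall outside the newest retain_n per name.

    rows is an iterable of (name, evr, version_rank) tuples, where a higher
    version_rank means a newer package.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE package (name TEXT, evr TEXT, version_rank INTEGER)")
        conn.executemany("INSERT INTO package VALUES (?, ?, ?)", rows)
        query = """
            SELECT name, evr FROM (
                SELECT name, evr, version_rank,
                       ROW_NUMBER() OVER (
                           PARTITION BY name ORDER BY version_rank DESC
                       ) AS position
                FROM package
            )
            WHERE position > ?
            ORDER BY name, version_rank DESC
        """
        return conn.execute(query, (retain_n,)).fetchall()
    finally:
        conn.close()
```

The database does the per-name ordering, so no version comparison has to happen in Python at sync time.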
