As a user, I can have a sync option that retains the latest N packages of the same name
Provide a way to specify that, for example, only the latest N versions of every package with the same name are retained in the repository.
Sync time option retain-old-count
RPM sync will take a sync-time option named retain-old-count with an integer value N. This will cause the latest N versions of a package to be retained. Two packages with the same name are considered the same package.
When the option is used, its value needs to be validated as an integer greater than 0; a Validation error is raised otherwise.
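A minimal sketch of that check, in plain Python. The `ValidationError` class here is a hypothetical stand-in defined locally for illustration, and `validate_retain_old_count` is an assumed helper name, not an existing Pulp function.

```python
class ValidationError(ValueError):
    """Hypothetical stand-in for the framework's validation error."""


def validate_retain_old_count(value):
    """Return value if it is an integer greater than 0, else raise."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise ValidationError("retain-old-count must be an integer")
    if value <= 0:
        raise ValidationError("retain-old-count must be greater than 0")
    return value
```

Whether strings such as "3" should be coerced or rejected is left open by the story; this sketch rejects them.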
Latest is determined by comparing the <epoch, version, release> triplet.
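To illustrate the triplet comparison, here is a simplified sketch. It assumes RPM-style segment comparison (numeric segments compared as integers, a numeric segment beating an alphabetic one, and the longer version winning on a tie) and deliberately ignores special characters such as `~` and `^`. The function names are hypothetical; a real implementation would more likely rely on an existing RPM version comparison library.

```python
import re
from functools import cmp_to_key

_SEGMENT = re.compile(r"[0-9]+|[a-zA-Z]+")


def compare_segments(a, b):
    """Compare two version or release strings; return -1, 0, or 1."""
    segs_a, segs_b = _SEGMENT.findall(a), _SEGMENT.findall(b)
    for sa, sb in zip(segs_a, segs_b):
        a_num, b_num = sa.isdigit(), sb.isdigit()
        if a_num and b_num:
            ia, ib = int(sa), int(sb)
            if ia != ib:
                return 1 if ia > ib else -1
        elif a_num != b_num:
            # Assumption: a numeric segment is newer than an alphabetic one.
            return 1 if a_num else -1
        elif sa != sb:
            return 1 if sa > sb else -1
    # All shared segments are equal: the string with more segments is newer.
    return (len(segs_a) > len(segs_b)) - (len(segs_a) < len(segs_b))


def compare_evr(left, right):
    """Compare two (epoch, version, release) triplets."""
    (e1, v1, r1), (e2, v2, r2) = left, right
    e1, e2 = int(e1 or 0), int(e2 or 0)  # a missing epoch is treated as 0
    if e1 != e2:
        return 1 if e1 > e2 else -1
    return compare_segments(v1, v2) or compare_segments(r1, r2)


def newest_first(triplets):
    """Sort triplets so the latest package comes first."""
    return sorted(triplets, key=cmp_to_key(compare_evr), reverse=True)
```

The epoch is compared first, so a package with a higher epoch is newer regardless of its version and release.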
Incompatibility with mirror=True
The mirror=True option requires the retained packages to be a mirror of the remote, so mirror=True cannot be used when the user is also specifying retain-old-count.
This needs to raise a Validation error if mirror=True and retain-old-count are specified together.
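A sketch of the mutual-exclusion check, under the assumption that the sync options arrive as keyword arguments and that an unset retain-old-count is represented by `None`. `ValidationError` and `validate_sync_options` are hypothetical names used only for illustration.

```python
class ValidationError(ValueError):
    """Hypothetical stand-in for the framework's validation error."""


def validate_sync_options(mirror=False, retain_old_count=None):
    """Reject the combination of mirror=True and retain-old-count."""
    if mirror and retain_old_count is not None:
        raise ValidationError(
            "mirror=True and retain-old-count cannot be specified together"
        )
    return {"mirror": mirror, "retain_old_count": retain_old_count}
```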
In Pulp2 it was called
The RpmDeclarativeVersion should implement a create method that mimics the one from core. The only difference in the RPM one is that, instead of adding ContentUnassociation as the last stage, a new custom RPM stage should be implemented. Let's call that stage ContentUnassociationRetainN (as a working name).
The ContentUnassociationRetainN stage works in place of ContentUnassociation in core and runs with similar assumptions. Specifically, it receives queryset objects, not DeclarativeContent objects like earlier stages in the pipeline. You can see those emitted from the stage before
The ContentUnassociationRetainN stage needs to further filter these unassociation querysets to exclude units that would be removed but shouldn't be due to their NEVRA. The content associated with the repository is already in place, so from the querysets of items marked for removal and the "known good" content outside of those querysets, one should be able to compute the previous N versions somehow.
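One possible reading of that filtering step, sketched with plain Python collections instead of querysets. It assumes each package is reduced to a `(name, evr_key)` pair where `evr_key` is already a sortable value (here a tuple of integers, which is a simplification of real EVR comparison), and it only decides the fate of units marked for removal. The function name is hypothetical.

```python
from collections import defaultdict


def still_to_remove(kept, marked_for_removal, retain_n):
    """Return the marked units that fall outside the newest retain_n per name.

    kept and marked_for_removal are iterables of (name, evr_key) pairs.
    """
    by_name = defaultdict(list)
    for name, evr in kept:
        by_name[name].append((evr, False))
    for name, evr in marked_for_removal:
        by_name[name].append((evr, True))

    to_remove = set()
    for name, entries in by_name.items():
        # Newest first; only the evr_key takes part in the ordering.
        entries.sort(key=lambda entry: entry[0], reverse=True)
        for evr, marked in entries[retain_n:]:
            if marked:
                to_remove.add((name, evr))
    return to_remove
```

Marked units that land inside the newest N for their name are left out of the result, which corresponds to filtering them out of the unassociation querysets.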
I think we can start with something inefficient but correct and improve it over time through profiling and the explain operator.
Katello Related Issue
Thank you for writing this down. The general workflow is clear and straightforward; however, the caveat that was brought up during the meeting is exactly the calculation of the previous N versions. Any ideas on that?
This was not the caveat question that I had heard at the meeting. It is a good one though. Here are some ideas. What do you think about these?
- slow option
Form a queryset that counts the number of packages per name, e.g. 'foo' v1, v1.2, v1.3 would yield a count of 3. Filter out any package names whose count is <= retain-old-count. The remaining names are the only ones that need additional consideration.
From there you could do the filtering in Python to determine which of the 'foo' packages need removal. Doing it in Python is probably too slow though.
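The slow option could look roughly like the sketch below. It works on an in-memory list of `(name, evr_key)` pairs rather than a real queryset, and it treats `evr_key` as an already sortable tuple, which glosses over real EVR comparison. `packages_to_unassociate` is a hypothetical name.

```python
from collections import Counter


def packages_to_unassociate(packages, retain_n):
    """Return the (name, evr_key) pairs beyond the newest retain_n per name."""
    # Step 1: count packages per name.
    counts = Counter(name for name, _ in packages)
    # Step 2: only names with more than retain_n packages need a closer look.
    crowded = {name for name, count in counts.items() if count > retain_n}
    # Step 3: sort each crowded name newest first and drop the tail.
    result = []
    for name in sorted(crowded):
        versions = sorted(
            (evr for pkg_name, evr in packages if pkg_name == name),
            reverse=True,
        )
        result.extend((name, evr) for evr in versions[retain_n:])
    return result
```

With 'foo' at v1, v1.2, and v1.3 and a retain count of 2, only 'foo' v1 would be selected for removal.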
- database speedups
To speed it up, create a PostgreSQL trigger that pre-computes the version rank as an internal integer for each Package row as it's inserted. Then use a Django filter to order the results by this field in version-sorted order. This would remove the need for Python-based filtering. Keep the first N (newest) and unassociate the remainder.
We have some triggers (different kinds but similar in concept) in pulp_ansible here for example: https://github.com/pulp/pulp_ansible/blob/master/pulp_ansible/app/migrations/0004_add_fulltext_search_indexes.py
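To make the database idea concrete, here is a sketch that uses Python's built-in sqlite3 module as a stand-in for PostgreSQL and Django. The `package` table, its `version_rank` column, and the function name are all assumptions for illustration; the rank is inserted directly here, whereas the proposal above would have a trigger compute it.

```python
import sqlite3


def ids_to_unassociate(conn, retain_n):
    """Return ids of rows that have at least retain_n newer rows of the same name."""
    rows = conn.execute(
        """
        SELECT p.id
        FROM package AS p
        WHERE (
            SELECT COUNT(*)
            FROM package AS newer
            WHERE newer.name = p.name
              AND newer.version_rank > p.version_rank
        ) >= ?
        ORDER BY p.id
        """,
        (retain_n,),
    )
    return [row[0] for row in rows]


def build_example_db():
    """Create an in-memory table with a pre-computed integer rank per row."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE package ("
        "id INTEGER PRIMARY KEY, name TEXT, version_rank INTEGER)"
    )
    conn.executemany(
        "INSERT INTO package (name, version_rank) VALUES (?, ?)",
        [("foo", 10), ("foo", 12), ("foo", 13), ("bar", 10)],
    )
    return conn
```

Because the rank is a plain integer, the database does all of the ordering and counting, and no per-package comparison happens in Python.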