Story #206

Updated by bmbouter about 6 years ago

+++ This bug was initially created as a clone of "Bugzilla Bug #1004001": +++

Description of problem:

+++ This bug was initially created as a clone of Bug #1003999 +++

Filtering in pulp currently only affects published Content Views; there is no way
to apply a whitelist filtering algorithm to a yum repository feed.

There are often cases where repositories contain large sets of packages (architectures, subdirectories, etc.) that are never used, where the user may only want a small, specific subset of the large repo.

The user should have the ability to specify on the repo a whitelist of package names that acts as a filter, preventing all other packages from being synced and thus saving disk space and sync time.
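
The whitelist behaviour requested here can be sketched as follows. This is a minimal illustration, not Pulp's implementation: it assumes the whitelist arrives as a comma-separated string of package names, and that each package in the feed is a dict with hypothetical "name" and "size" keys. Treating an empty whitelist as "sync everything" is also an assumption.

```python
def parse_whitelist(option_value):
    """Split a comma-separated option string into a set of package names."""
    return {name.strip() for name in option_value.split(",") if name.strip()}


def filter_by_whitelist(available, whitelist):
    """Keep only the packages whose name is on the whitelist.

    An empty whitelist is treated as "no filtering" (an assumption).
    """
    if not whitelist:
        return list(available)
    return [pkg for pkg in available if pkg["name"] in whitelist]


# Hypothetical feed contents; real entries would come from the repo metadata.
feed = [
    {"name": "httpd", "size": 1200},
    {"name": "kernel", "size": 45000},
    {"name": "mysql", "size": 9000},
]

wanted = parse_whitelist("httpd, mysql")
to_sync = filter_by_whitelist(feed, wanted)

# The reported download total covers only the packages that will be synced.
total_to_download = sum(pkg["size"] for pkg in to_sync)
```

Computing the total from the filtered list rather than from the full feed keeps the reported download size consistent with what is actually fetched.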

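This will be done using a new importer option named 'packages_names' which will contain a comma-separated list of package names to be synced. This is designed to mimic the way the "pulp_python packages_names importer option": works. The total reported for downloading will also need to be correct based on the intersection of the target package metadata and the package set in the whitelist. The same metadata will be published, but only the packages in the metadata that are on the whitelist will be downloaded and published.

--- Additional comment from at 10/22/2013 13:53:04 ---

Another example is Oracle - the public repos for Oracle Linux include source RPMs in the same directory as the binary/noarch RPMs.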

When I synced this repo, over half of the space consumed was from the source RPMs:

$ sudo du -Lks /var/lib/pulp/published/http/repos/ol5/x86_64/
7759976 /var/lib/pulp/published/http/repos/ol5/x86_64/
$ sudo du -Lks /var/lib/pulp/published/http/repos/ol5/x86_64/*.src.rpm | awk '{SUM+=$1} END{print SUM}'
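
The same comparison can be approximated in Python. The sketch below is an illustration only: it sums apparent file sizes in bytes for files directly inside one directory, whereas du reports disk usage in kilobytes and descends into subdirectories, so the numbers will not match du exactly. The function names are hypothetical.

```python
import os


def rpm_usage(directory):
    """Return (total_bytes, source_rpm_bytes) for files directly in directory."""
    total = 0
    source = 0
    for entry in os.scandir(directory):
        # is_file() and stat() follow symlinks by default, similar to du -L.
        if not entry.is_file():
            continue
        size = entry.stat().st_size
        total += size
        if entry.name.endswith(".src.rpm"):
            source += size
    return total, source


def source_share(directory):
    """Return the fraction of the directory's bytes taken by source RPMs."""
    total, source = rpm_usage(directory)
    return source / total if total else 0.0
```

Calling source_share() on the published repo directory gives the fraction of space that a filter on source RPMs would save.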

--- Additional comment from at 10/22/2013 14:50:02 ---

The other use that I would like to see is for cloning local repos - i.e. "syncing" from one repo to another in order to create a "promote to production" process.

One model developed using Pulp v1 is described in a Usenix paper from 2011 [1]. This would enable "less risky" (i.e. the majority of) packages from an upstream distributor to be promoted automatically to the internal production repositories, but packages that require additional testing (e.g. kernel, or applications like mysql or httpd) can be filtered from the automatic sync. That way, the majority of updates can be pushed automatically to clients in a timely manner (or pulled via a standard "yum update"), while reducing the risk of introducing unexpected issues. This model enables package sets to be managed in one place, in the repository itself, rather than through excludes on each individual client. For example, you might have 3 repositories:

"Live" = repo synced daily from upstream distributor
"Unstable" = repo synced daily from "Live", excluding $risky_pkgs
"Stable" = repo synced daily from "Unstable", excluding $risky_pkgs

$risky_pkgs would be manually promoted from Live -> Unstable -> Stable, for example weekly.
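
The three-repository flow can be modelled in a few lines. This is a toy sketch under stated assumptions, not Pulp code: each repo is a plain dict mapping package name to version, the sync() and promote() helpers are hypothetical, and the version strings are placeholders.

```python
def sync(source, target, exclude=frozenset()):
    """Copy every package from source to target unless its name is excluded."""
    for name, version in source.items():
        if name not in exclude:
            target[name] = version


def promote(source, target, names):
    """Manually copy the named packages from source to target."""
    for name in names:
        if name in source:
            target[name] = source[name]


risky_pkgs = {"kernel", "mysql", "httpd"}

# Placeholder contents for the upstream-facing repo.
live = {"kernel": "2.0", "bash": "1.0", "httpd": "3.0"}
unstable = {}
stable = {}

# Daily automatic syncs skip the risky packages.
sync(live, unstable, exclude=risky_pkgs)
sync(unstable, stable, exclude=risky_pkgs)

# Weekly manual promotion moves the risky packages one stage forward.
promote(live, unstable, risky_pkgs)
```

After these steps "Unstable" holds everything from "Live", while "Stable" still holds only the packages that were never on the risky list.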

Similarly, different teams might choose to set different policies (maybe they want all kernel updates as soon as they are available, but they want to make sure that Python gets tested with their custom app first), so their $risky_pkgs_teamA list would be different. It enables all teams within an organization to set their own policies while still inheriting certain organization-wide policy (e.g. that packages must be at least a day old before being installed anywhere).
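
Layering a per-team list on top of an organization-wide rule might look like the sketch below. Everything in it is an assumption for illustration: the package dicts with "name" and "added" keys, the ORG_MIN_AGE constant, and the two team lists.

```python
from datetime import datetime, timedelta

# Organization-wide policy: packages must be at least a day old.
ORG_MIN_AGE = timedelta(days=1)


def auto_promotable(packages, team_risky, now):
    """Return sorted names of packages a team would promote automatically."""
    names = []
    for pkg in packages:
        old_enough = now - pkg["added"] >= ORG_MIN_AGE
        if old_enough and pkg["name"] not in team_risky:
            names.append(pkg["name"])
    return sorted(names)


now = datetime(2013, 10, 22, 12, 0)
packages = [
    {"name": "kernel", "added": now - timedelta(days=2)},
    {"name": "python", "added": now - timedelta(days=3)},
    {"name": "bash", "added": now - timedelta(hours=2)},
]

# Team A wants kernels immediately but tests Python first; team B is the reverse.
risky_pkgs_team_a = {"python"}
risky_pkgs_team_b = {"kernel"}

team_a = auto_promotable(packages, risky_pkgs_team_a, now)
team_b = auto_promotable(packages, risky_pkgs_team_b, now)
```

The age check applies to every team, so a package that is too new is held back regardless of which team list it appears on.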

It is not clear to me why the features of "cloning" and "sync filters" were removed between Pulp v1 and v2.


--- Additional comment from at 10/22/2013 15:21:47 ---

Based on the above use cases, I'd like to see sync filters for at least the following criteria:
- content type (e.g. RPM, source RPM)
- architecture (e.g. x86_64)
- package name match (e.g. 'kernel*' or 'kmod*')
- date package was added to the repo (e.g. "before 20131001" or "before -7days")
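
The four criteria above could be combined into a single filter object. The sketch below is one possible shape, not an existing Pulp API: the SyncFilter class, its field names, and the package dict keys ("type", "arch", "name", "added") are all hypothetical. Whether a match means "include" or "exclude" is left to the caller.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timedelta
from fnmatch import fnmatchcase
from typing import Optional, Tuple


def parse_cutoff(spec, now):
    """Turn an absolute '20131001' or relative '-7days' spec into a datetime."""
    match = re.fullmatch(r"-(\d+)days", spec)
    if match:
        return now - timedelta(days=int(match.group(1)))
    return datetime.strptime(spec, "%Y%m%d")


@dataclass
class SyncFilter:
    content_types: Tuple[str, ...] = ()
    arches: Tuple[str, ...] = ()
    name_patterns: Tuple[str, ...] = ()
    added_before: Optional[datetime] = None

    def matches(self, pkg):
        """Return True if pkg satisfies every criterion that is set."""
        if self.content_types and pkg["type"] not in self.content_types:
            return False
        if self.arches and pkg["arch"] not in self.arches:
            return False
        if self.name_patterns and not any(
            fnmatchcase(pkg["name"], pattern) for pattern in self.name_patterns
        ):
            return False
        if self.added_before and not pkg["added"] < self.added_before:
            return False
        return True


now = datetime(2013, 10, 22)
kernel_filter = SyncFilter(
    content_types=("rpm",),
    arches=("x86_64",),
    name_patterns=("kernel*", "kmod*"),
    added_before=parse_cutoff("-7days", now),
)
```

Unset criteria are skipped, so a filter with only name_patterns set behaves like a plain package-name match.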


--- Additional comment from at 12/02/2014 16:20:12 ---

*** Bug 1157857 has been marked as a duplicate of this bug. ***

--- Additional comment from at 12/16/2014 12:18:38 ---

Can we make a point on this and decide how we want to implement it?
I'd like to see this in Pulp since in my case it is a "blocking" feature.

Should we follow the old v1 approach? (Create a filter, link a filter to a repo)