Story #1769
Updated by jcline@redhat.com over 8 years ago
h2. Overview
Our RPM plugin claims that it syncs "distributions", but much of the upstream content is not mirrored. We are failing to handle what I am going to call "addon repositories" correctly for at least RHEL6 and RHEL7 (I have not confirmed others, but I expect this is across the board). The problem is that there are additional repositories within a kickstart-able repository and we fail to sync those repositories.
To witness this:
# Sync https://cdn.redhat.com/content/dist/rhel/server/6/6Server/x86_64/kickstart/ (or a mirror thereof)
# Inspect the published repository. Note that /content/dist/rhel/server/6/6Server/x86_64/kickstart/{LoadBalancer,HighAvailability,ResilientStorage,ScalableFileSystem}/ contain nothing except the upstream version of repomd.xml (which we shouldn't be syncing in any case).
# Inspect the database for one or more of the packages that are in the upstream repository's version of those directories. Note they are not present.
# To see the impact of this, try to kickstart from the Pulp repository and when it comes time to select additional repositories, try to select one of the missing ones (Load Balancer, for example). It will fail.
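As a rough aid for step 2, the check could be scripted. This is only a sketch: @addons_missing_content@ is a hypothetical helper (not part of Pulp), and it assumes you already have a list of published file paths relative to the kickstart root, obtained however is convenient.

```python
from pathlib import PurePosixPath

# Addon directory names taken from the RHEL6 kickstart tree mentioned above.
ADDONS = ("LoadBalancer", "HighAvailability", "ResilientStorage", "ScalableFileSystem")


def addons_missing_content(published_paths, addons=ADDONS):
    """Return the addon directories that hold nothing besides repomd.xml.

    published_paths: iterable of file paths relative to the kickstart root,
    using "/" as the separator (an assumption of this sketch).
    """
    files_by_addon = {name: [] for name in addons}
    for raw in published_paths:
        parts = PurePosixPath(raw).parts
        # Only consider files that live below one of the addon directories.
        if len(parts) > 1 and parts[0] in files_by_addon:
            files_by_addon[parts[0]].append(parts[-1])

    missing = []
    for name in addons:
        real_content = [f for f in files_by_addon[name] if f != "repomd.xml"]
        if not real_content:
            missing.append(name)
    return missing
```

An addon directory that is absent altogether is reported the same way as one that contains only repomd.xml, since in both cases there is nothing to install from.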
This story is about getting _our_ story straight when it comes to distribution trees. The first and most important task for this story is making sure we completely understand distribution trees and how to work with them.
These days, distribution trees seem to be generated by a tool called pungi[0] (which in turn uses tools like lorax[1]). Red Hat's release engineering team (RCM) has extensive documentation[2] on treeinfo files (which seem to describe a distribution tree), so it would be wise to talk to them to make sure we understand what a distribution tree is and how it varies from release to release (both in the past and in any upcoming changes). It's also worth noting that both lorax and RCM's tooling are written in Python and create/parse the treeinfo files.
Once we understand the role of the treeinfo file and what content falls into the distribution category, we should come up with a solution to handle past and present formats of distribution trees. One solution that has already been attempted is PULP_DISTRIBUTION.xml, which was a way to have us sync content that's not referenced by a treeinfo file _or_ the repodata. We need to determine whether having such a file is something we require (it may be required only to support older versions of distribution trees) and, if that is the case, we need to model the metadata file and the files it references. If we do choose to have such a file, we should spend some time designing it to make sure it fits our needs (contains validation data, has a well-defined schema, has a well-defined purpose, etc.). We should also make sure it is not a burden on the user (currently RCM has to generate PULP_DISTRIBUTION.xml manually).
The current version of the treeinfo file may well provide us with everything we need to sync a distribution tree. If that is the case, we should use it.
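To make that concrete, here is a minimal sketch of reading a treeinfo file with Python's standard @configparser@ and collecting any section that points at a nested repository. The sample content, the @addon-*@ section names, and the @repository@ key are assumptions for illustration only; the real layout differs between releases and should be confirmed against the productmd documentation[2]. @find_sub_repositories@ is a hypothetical helper, not existing Pulp code.

```python
import configparser

# Illustrative treeinfo content. Section and key names are assumptions
# for this sketch, not a verified copy of any real release's file.
SAMPLE_TREEINFO = """\
[general]
family = Red Hat Enterprise Linux
version = 7.2
arch = x86_64
variant = Server

[addon-HighAvailability]
id = HighAvailability
name = High Availability
repository = addons/HighAvailability

[addon-ResilientStorage]
id = ResilientStorage
name = Resilient Storage
repository = addons/ResilientStorage

[images-x86_64]
kernel = images/pxeboot/vmlinuz
initrd = images/pxeboot/initrd.img
"""


def find_sub_repositories(treeinfo_text):
    """Map each section that declares a 'repository' key to its relative path."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key names exactly as written
    parser.read_string(treeinfo_text)

    repos = {}
    for section in parser.sections():
        if parser.has_option(section, "repository"):
            repos[section] = parser.get(section, "repository")
    return repos


if __name__ == "__main__":
    for section, path in sorted(find_sub_repositories(SAMPLE_TREEINFO).items()):
        print(f"{section}: {path}")
```

If the treeinfo file does list every nested repository like this, the importer could treat each returned path as an additional repository to sync relative to the tree root.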
h2. Files Missing from treeinfo/repodata
This is a list of files I've noticed in various repositories that do not appear to be referenced by either the treeinfo file or the repodata files.
* License, EULA, and README text files
* GPG signing keys
* Release notes (either in a directory or in the root of the repository)
* `isolinux` and `EFI` directories that appear to contain boot images (in addition to the `images` and `LiveOS` directories that are mentioned in the treeinfo file).
* A few images in the images/ directory
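
A list like the one above could be produced mechanically rather than by eye. The sketch below assumes a local copy of the tree and a set of referenced paths that has already been gathered from the treeinfo file and the repodata (how that set is built is out of scope here). Both function names are hypothetical.

```python
import os


def list_tree_files(root):
    """Return every file under root as a '/'-separated path relative to root."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            found.add(rel.replace(os.sep, "/"))
    return found


def unreferenced_files(all_files, referenced):
    """Return the sorted paths in all_files that no metadata file refers to.

    all_files:  iterable of relative paths present in the tree
    referenced: iterable of relative paths named by treeinfo or repodata
    """
    return sorted(set(all_files) - set(referenced))
```

For example, @unreferenced_files(list_tree_files("/path/to/tree"), referenced)@ would return whatever a treeinfo-plus-repodata sync would leave behind, which is the set of files any PULP_DISTRIBUTION.xml replacement would need to cover.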
[0] https://pagure.io/pungi
[1] https://github.com/rhinstaller/lorax/
[2] https://release-engineering.github.io/productmd/