filelists and changelog metadata are not parsed properly - Pulp saves incorrect filelists and changelog metadata and generates incorrect metadata
- Foreman 2.5.2
- Katello 4.1.1
I am seeing filelists not being parsed correctly for Alma Linux repositories. Within the Katello UI, I have checked a random sample of packages from all of the Alma repositories I have synced (18 in total: x86_64, sources and debug-x86_64 for each of baseos, appstream, highavailability, powertools, extras and devel), and none of them list any files. For comparison, CentOS 8 Stream RPMS do list files.
The generated filelist served up by Katello contains each of the packages in the repo, but with no package contents. For example, the upstream baseos filelist is 2MB, whereas the generated filelist for the Katello-hosted repo is just 105KB and looks like the following:
<?xml version="1.0" encoding="UTF-8"?>
<filelists xmlns="http://linux.duke.edu/metadata/filelists" packages="1985">
  <package pkgid="59c2172d6e423d8adc3248c1983146492471678afccdfae8ab0f66e18a1aaaa5" name="kernel-debug" arch="x86_64">
    <version epoch="0" ver="4.18.0" rel="305.el8"/>
  </package>
  <package pkgid="c8120b541261d0ad425f369b1da5eef0aaad5883760ad0e149137027d102207c" name="avahi" arch="i686">
    <version epoch="0" ver="0.7" rel="20.el8"/>
  </package>
This is breaking system installs which use the local mirrored content, because packages included in the kickstart base package set have dependencies specified using files, and these are all unsatisfied due to the empty filelists. I have triggered resyncs from upstream mirrors without any change in behaviour.
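To illustrate why the empty filelists break installs: file-based dependencies (e.g. a package with `Requires: /usr/bin/bash`) are resolved by looking the path up in the filelists metadata. A minimal toy model of that lookup (illustrative only, not Pulp or dnf code; real resolvers like libdnf/libsolv build this index from filelists.xml):

```python
# Toy model of why empty filelists break dependency resolution.
# Names and data here are illustrative, not real resolver code.

def build_file_index(packages):
    """Map each provided file path to the package that provides it."""
    index = {}
    for name, files in packages.items():
        for path in files:
            index[path] = name
    return index

# With correct metadata, a file-based dependency such as "/usr/bin/bash"
# resolves to a package:
good = {"bash": ["/usr/bin/bash"], "systemd": ["/usr/bin/systemctl"]}
assert build_file_index(good).get("/usr/bin/bash") == "bash"

# With the corrupted metadata every package lists zero files, so the same
# dependency is unsatisfiable and the kickstart install fails:
broken = {"bash": [], "systemd": []}
assert build_file_index(broken).get("/usr/bin/bash") is None
```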
May be related to the similar-looking issue #8955?
What must be done about the packages which are already synced into the database with incorrect or missing metadata? If I understand the PR correctly, there are new functions that allow validating the data in the database against the data in the source repository. So the Katello people would have to implement calls to these new functions so that the broken information in a Katello/pulp3 installation can be fixed?
@gvde Most likely we will just write a script that will do an in-place repair - it hasn't been decided yet whether Katello would do anything to run it automatically. My assumption is - probably not.
We'll try to get 3.14.0 released by the end of the day (Thursday), which should let Katello package it by the end of the week. And then once that's out we'll evaluate the best way to address any existing issues.
Just to provide a bit of context on what happened, this is the same bug from 3.13.0, except the "fix" apparently didn't work on Python 3.6, which is what most Katello users are using.
Basically, the fix relied on a feature that wasn't clearly documented as being exclusive to Python 3.8+. Development was done on Python 3.8, so the problem didn't show up there, and it didn't raise any kind of error - it just silently did the wrong thing on Python 3.6.
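The thread doesn't name the exact Python 3.8+ feature involved, so as a generic illustration of the failure class (and one way to defend against it), here is a hedged sketch: gate version-dependent code paths explicitly so they fail loudly, or fall back, on older interpreters instead of silently misbehaving. The function names are mine, not the actual pulp_rpm fix:

```python
import sys

# Hypothetical minimum version for a code path that needs 3.8+ behavior.
MIN_FAST_PATH = (3, 8)

def fast_parse(data):
    # Stand-in for an implementation that requires Python 3.8+.
    return data.strip().split()

def portable_parse(data):
    # Stand-in for an implementation known to work on Python 3.6 as well.
    return data.strip().split()

def parse(data):
    # Choose the code path explicitly instead of assuming the runtime
    # supports the newer behavior; this turns a silent wrong result
    # into a deliberate, testable branch.
    if sys.version_info >= MIN_FAST_PATH:
        return fast_parse(data)
    return portable_parse(data)

print(parse("  kernel-debug avahi  "))  # -> ['kernel-debug', 'avahi']
```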
The CI does run on Python 3.6, so it would have caught it... if our tests covered it, which they did not unfortunately. They do now.
So this is a really unfortunate sequence of events, but we're going to make sure that A) there's guidance for addressing it, hopefully without too much trouble, and B) it doesn't happen again.
Thanks for the information.
But for the future: I guess the metadata won't be fully checked against the metadata in the database during each sync. So in general, it would probably be a good idea to have a check and/or repair function for this. The check could run in a Katello cron job every week (like the remove-orphans job), find inconsistencies, and send out a notification if there are any. And the advanced sync repository options could include a metadata repair option to make the corrections.
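A consistency check along the lines proposed above could be sketched roughly as follows. This is illustrative only, assuming direct access to a published filelists.xml body; the function name is mine, not an existing Katello or Pulp API. It simply flags packages that list no files at all, which is the symptom seen in this issue:

```python
import xml.etree.ElementTree as ET

# filelists.xml elements live in this XML namespace.
NS = "{http://linux.duke.edu/metadata/filelists}"

def packages_missing_files(filelists_xml):
    """Return names of packages that list no <file> entries."""
    root = ET.fromstring(filelists_xml)
    return [
        pkg.get("name")
        for pkg in root.findall(f"{NS}package")
        if pkg.find(f"{NS}file") is None
    ]

# Small made-up sample: "bash" lists a file, "avahi" lists none.
sample = """<filelists xmlns="http://linux.duke.edu/metadata/filelists" packages="2">
  <package pkgid="aa" name="bash" arch="x86_64">
    <version epoch="0" ver="4.4" rel="1"/>
    <file>/usr/bin/bash</file>
  </package>
  <package pkgid="bb" name="avahi" arch="i686">
    <version epoch="0" ver="0.7" rel="20.el8"/>
  </package>
</filelists>"""

print(packages_missing_files(sample))  # -> ['avahi']
```

A weekly job could run something like this over each published repo and notify when the list is non-empty. (Some packages legitimately ship no files, so a real check would compare against the upstream metadata rather than just flag empties.)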
At least as far as having a common place to put repair operations like this, I agree. Pulpcore has repair functionality which will correct missing or corrupted files in storage, or notify if they can't be redownloaded - and we could probably add plugin hooks for additional checks.
I don't think parsing the metadata a second time during the repair is realistic, because it would need a completely different parser in order to come up with a different result than the sync originally had, and that is not really an option. But we do now have a few tests that verify the generated metadata against the original metadata, and we will try to keep expanding that coverage.
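The kind of comparison those tests perform could look something like the sketch below: diff the per-package file lists between the original and the regenerated filelists, keyed by pkgid. This is a hedged illustration under the assumption that both XML bodies are available; it is not the actual pulp_rpm test code:

```python
import xml.etree.ElementTree as ET

NS = "{http://linux.duke.edu/metadata/filelists}"

def file_map(filelists_xml):
    """Map pkgid -> sorted file list from a filelists.xml body."""
    root = ET.fromstring(filelists_xml)
    return {
        pkg.get("pkgid"): sorted(f.text for f in pkg.findall(f"{NS}file"))
        for pkg in root.findall(f"{NS}package")
    }

def diverging_packages(original_xml, generated_xml):
    """pkgids whose file lists differ between original and generated metadata."""
    orig, gen = file_map(original_xml), file_map(generated_xml)
    return sorted(p for p in orig if gen.get(p) != orig[p])

# Made-up example: the generated copy has dropped bash's file entry.
original = """<filelists xmlns="http://linux.duke.edu/metadata/filelists" packages="1">
  <package pkgid="aa" name="bash" arch="x86_64">
    <file>/usr/bin/bash</file>
  </package>
</filelists>"""

generated = """<filelists xmlns="http://linux.duke.edu/metadata/filelists" packages="1">
  <package pkgid="aa" name="bash" arch="x86_64">
  </package>
</filelists>"""

print(diverging_packages(original, generated))  # -> ['aa']
```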