Issue #9107

filelists and changelog metadata is not parsed properly - Pulp saves incorrect filelists and changelog metadata and generates incorrect metadata

Added by optiz0r 6 days ago. Updated about 4 hours ago.

Status: MODIFIED
Priority: Urgent
Assignee:
Sprint/Milestone:
Start date:
Due date:
Estimated time:
Severity: 4. Urgent
Version:
Platform Release:
OS:
Triaged: Yes
Groomed: No
Sprint Candidate: No
Tags: Katello
Sprint: Sprint 101
Quarter:

Description

Versions:

  • Foreman 2.5.2
  • Katello 4.1.1
  • pulp-rpm-3.13.3

Upstream: http://lon.mirror.rackspace.com/almalinux/8/BaseOS/x86_64/os/repodata/.

I am seeing filelists not being parsed correctly for AlmaLinux repositories. In the Katello UI, I checked a random sample of packages from all of the Alma repositories I have synced (18 in total: x86_64, sources and debug-x86_64 for each of baseos, appstream, highavailability, powertools, extras and devel), and none of them list any files. For comparison, CentOS 8 Stream RPMs do list files.

The generated filelist served by Katello contains each of the packages in the repo, but with no package contents. For example, the upstream baseos filelist is 2MB, whereas the generated filelist for the Katello-hosted repo is just 105KB and looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<filelists xmlns="http://linux.duke.edu/metadata/filelists" packages="1985">
<package pkgid="59c2172d6e423d8adc3248c1983146492471678afccdfae8ab0f66e18a1aaaa5" name="kernel-debug" arch="x86_64">
  <version epoch="0" ver="4.18.0" rel="305.el8"/>
</package>
<package pkgid="c8120b541261d0ad425f369b1da5eef0aaad5883760ad0e149137027d102207c" name="avahi" arch="i686">
  <version epoch="0" ver="0.7" rel="20.el8"/>
</package>

This is breaking system installs which use the local mirrored content, because packages included in the kickstart base package set have dependencies specified using files, and these are all unsatisfied due to the empty filelists. I have triggered resyncs from upstream mirrors without any change in behaviour.
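One way to confirm the symptom described above is to count the `<package>` entries that carry no `<file>` children. The following is a minimal sketch using only the Python standard library; the function name and the sample document are illustrative assumptions and are not part of Pulp or Katello.

```python
import xml.etree.ElementTree as ET

# Namespace declared on the <filelists> root element in the excerpt above.
FILELISTS_NS = "http://linux.duke.edu/metadata/filelists"


def packages_without_files(xml_text):
    """Return (name, arch) for every <package> that has no <file> child.

    Hypothetical helper for illustration only.
    """
    root = ET.fromstring(xml_text)
    empty = []
    for pkg in root.iter("{%s}package" % FILELISTS_NS):
        # Fully qualified tag names work on every Python 3 version.
        if pkg.find("{%s}file" % FILELISTS_NS) is None:
            empty.append((pkg.get("name"), pkg.get("arch")))
    return empty


# Made-up sample: one empty package, one package with a file entry.
SAMPLE = """<filelists xmlns="http://linux.duke.edu/metadata/filelists" packages="2">
<package pkgid="aaa" name="kernel-debug" arch="x86_64">
  <version epoch="0" ver="4.18.0" rel="305.el8"/>
</package>
<package pkgid="bbb" name="avahi" arch="i686">
  <version epoch="0" ver="0.7" rel="20.el8"/>
  <file>/usr/sbin/avahi-daemon</file>
</package>
</filelists>"""

print(packages_without_files(SAMPLE))
```

Published filelists are normally compressed, so a real file would need to be decompressed (for example with the `gzip` module) before being passed in.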

May be related to the similar-looking issue #8955?


Related issues

Related to RPM Support - Test #8972: Add test for properly reading and writing the metadata (MODIFIED)

Associated revisions

Revision 552b0b23
Added by dalley about 4 hours ago

Fix unreliable filelist parsing

.findall() is not reliable in the face of namespaces. Don't use it anymore.
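The commit message does not show what replaced `.findall()`. As a hedged illustration only, one namespace-independent approach is to iterate over an element's direct children and compare local tag names. The helper names below are hypothetical and are not taken from the Pulp code base. For background: in the standard library's `xml.etree.ElementTree`, the `{*}tag` wildcard is only supported from Python 3.8 onward, and on earlier versions such a pattern simply matches nothing instead of raising an error. Whether that is the exact feature involved here is an assumption.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip a leading '{namespace}' from an ElementTree tag, if present."""
    return tag.rsplit("}", 1)[-1]


def children_named(element, name):
    """Direct children whose local tag name equals `name`, in any namespace.

    Hypothetical helper; behaves the same on every Python 3 version
    because it does not rely on path wildcards.
    """
    return [
        child
        for child in element
        if isinstance(child.tag, str) and local_name(child.tag) == name
    ]


# Made-up package entry using the filelists namespace.
pkg = ET.fromstring(
    '<package xmlns="http://linux.duke.edu/metadata/filelists">'
    '<version epoch="0" ver="0.7" rel="20.el8"/>'
    "<file>/usr/bin/foo</file>"
    '<file type="dir">/etc/foo</file>'
    "</package>"
)
print([f.text for f in children_named(pkg, "file")])
```

The trade-off is that this ignores the namespace entirely, which is acceptable when the document type is already known but would be too permissive for mixed-namespace documents.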

closes: #9107 https://pulp.plan.io/issues/9107
closes: #8972 https://pulp.plan.io/issues/8972

History

#1 Updated by dalley 5 days ago

  • Assignee set to dalley
  • Priority changed from Normal to High
  • Severity changed from 2. Medium to 3. High
  • Triaged changed from No to Yes

Reproduced

#2 Updated by dalley 5 days ago

  • Sprint set to Sprint 101

#3 Updated by dalley 5 days ago

  • Sprint/Milestone set to 3.14.0

#4 Updated by dalley 5 days ago

  • Related to Test #8972: Add test for properly reading and writing the metadata added

#5 Updated by pulpbot 5 days ago

  • Status changed from NEW to POST

#6 Updated by dalley 3 days ago

  • Subject changed from AlmaLinux filelists not parsed, generates empty filelists to filelists metadata is not parsed properly, Pulp generates empty filelists
  • Priority changed from High to Urgent
  • Severity changed from 3. High to 4. Urgent

#7 Updated by dalley 3 days ago

  • Subject changed from filelists metadata is not parsed properly, Pulp generates empty filelists to filelists and changelog metadata is not parsed properly - Pulp saves incorrect filelists and changelog metadata and generates incorrect metadata

#9 Updated by gvde 1 day ago

What must be done with the packages that are already synced into the database with incorrect or missing metadata? If I understand the PR correctly, there are new functions that allow validating the data in the database against the data in the source repository. So would the Katello people have to implement calls to these new functions to fix the broken information in an existing Katello/pulp3 installation?

#10 Updated by dalley 1 day ago

@gvde Most likely we will just write a script that does an in-place repair. It hasn't been decided yet whether Katello would do anything to run it automatically. My assumption is: probably not.

We'll try to get 3.14.0 released by the end of the day (Thursday), which should let Katello package it by the end of the week. And then once that's out we'll evaluate the best way to address any existing issues.

#11 Updated by dalley 1 day ago

Just to provide a bit of context on what happened, this is the same bug from 3.13.0, except the "fix" apparently didn't work on Python 3.6, which is what most Katello users are using.

Basically, the fix relied on a feature that wasn't clearly documented as being exclusive to Python 3.8+. Development was done on Python 3.8, so the problem didn't show up there, and on older versions it doesn't raise any kind of error; it just silently does the wrong thing.

The CI does run on Python 3.6, so it would have caught it... if our tests covered it, which they did not unfortunately. They do now.

So this is a really unfortunate sequence of events, but we're going to make sure that A) there's guidance for addressing it, hopefully without too much trouble, and B) it doesn't happen again.

#12 Updated by gvde 1 day ago

Thanks for the information.

But for the future: I guess the metadata in the database won't be fully checked against the upstream metadata during each sync. So in general it would probably be a good idea to have a check and/or repair function for this. The check could run in a weekly Katello cron job (like the remove orphans job), find inconsistencies, and send out a notification if there are any. The advanced sync options for a repository could then include a metadata repair option to make the corrections.

#13 Updated by dalley 1 day ago

At least as far as having a common place to put repair operations like this, I agree. Pulpcore has repair functionality which will correct missing or corrupted files in storage, or notify if they can't be redownloaded - and we could probably add plugin hooks for additional checks.

I don't know that parsing the metadata a second time during the repair is realistic because it would need to be a completely different parser in order to come up with a different result than the sync originally had, and that is not really an option. But we do now have a few tests to verify the metadata against the original metadata and we will try to keep expanding that coverage.
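To make the idea of verifying generated metadata against the original concrete, here is a minimal sketch that compares the file lists of two filelists documents package by package. It is not Pulp's actual test code; the function names and sample documents are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Fully qualified prefix for the filelists namespace.
NS = "{http://linux.duke.edu/metadata/filelists}"


def files_by_pkgid(xml_text):
    """Map each package's pkgid to the sorted list of its file paths."""
    root = ET.fromstring(xml_text)
    return {
        pkg.get("pkgid"): sorted(f.text or "" for f in pkg.iter(NS + "file"))
        for pkg in root.iter(NS + "package")
    }


def mismatched_packages(upstream_xml, generated_xml):
    """Return pkgids whose file list is missing or different in generated_xml."""
    upstream = files_by_pkgid(upstream_xml)
    generated = files_by_pkgid(generated_xml)
    return sorted(
        pkgid
        for pkgid, files in upstream.items()
        if generated.get(pkgid) != files
    )


# Made-up inputs: the generated copy has lost the <file> entry of "bbb".
UPSTREAM = (
    '<filelists xmlns="http://linux.duke.edu/metadata/filelists" packages="2">'
    '<package pkgid="aaa" name="a" arch="x86_64"><file>/usr/bin/a</file></package>'
    '<package pkgid="bbb" name="b" arch="x86_64"><file>/usr/bin/b</file></package>'
    "</filelists>"
)
GENERATED = (
    '<filelists xmlns="http://linux.duke.edu/metadata/filelists" packages="2">'
    '<package pkgid="aaa" name="a" arch="x86_64"><file>/usr/bin/a</file></package>'
    '<package pkgid="bbb" name="b" arch="x86_64"></package>'
    "</filelists>"
)

print(mismatched_packages(UPSTREAM, GENERATED))
```

This compares two published metadata files rather than database contents, so it only covers the "generated metadata" half of the problem discussed above.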

#14 Updated by dalley about 4 hours ago

  • Status changed from POST to MODIFIED
