Issue #2048
closed
Errata update failure during sync or upload
Status:
CLOSED - CURRENTRELEASE
Description
- Create a repo where at least one erratum has no `updated` field, or has an `updated` field in the wrong datetime format.
- Sync it
- Do something so that the next sync actually does work (for example, update the importer or remove a unit)
- Sync it again and see the error
Task Failed
Could not parse errata `updated` field: expected format '%Y-%m-%d %H:%M:%S'.
Fail to update the existing erratum SOME_ERRATUM_ID.
An erratum is not updated if its `updated` field is absent or cannot be parsed.
This behavior was introduced with this fix.
A malformed or missing `updated` field is found in some RHEL and EPEL repositories.
There are two options to fix it:
1. "can't parse, so never update"
- that was the case for all errata before 2.8.5
- we avoid a potential inconsistency between the erratum pkglist and the rest of the erratum metadata
- but we do not give users an opportunity to have a recent erratum version if the `updated` field is wrong
2. "can't parse, so always update"
- we give an opportunity to update the erratum if the `updated` field is wrong
- there are potentially several scenarios in which we can end up with wrong data in the erratum.
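The two options above can be sketched roughly as follows. This is a minimal illustration, not Pulp's actual code: `parse_updated`, `should_update`, and the `unparsable_policy` parameter are hypothetical names, and only the format string is taken from the error message in this issue.

```python
from datetime import datetime

# Format quoted in the error message above.
UPDATED_FORMAT = '%Y-%m-%d %H:%M:%S'


def parse_updated(value):
    """Return a datetime, or None if the value is missing or malformed."""
    if not value:
        return None
    try:
        return datetime.strptime(value, UPDATED_FORMAT)
    except (TypeError, ValueError):
        return None


def should_update(existing_updated, incoming_updated, unparsable_policy='never'):
    """Hypothetical helper: decide whether the incoming erratum replaces the stored one.

    unparsable_policy='never'  corresponds to option 1,
    unparsable_policy='always' corresponds to option 2.
    """
    existing = parse_updated(existing_updated)
    incoming = parse_updated(incoming_updated)
    if existing is None or incoming is None:
        # At least one side cannot be compared, so fall back to the policy.
        return unparsable_policy == 'always'
    # Both dates are valid: update only when the incoming erratum is newer.
    return incoming > existing
```

With option 1, `should_update('', '2016-02-01 00:00:00')` is false, so the stored erratum is left alone. With `unparsable_policy='always'` the same call is true, which is what opens the door to the scenarios described below.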
Bad scenario 1:
- create repo with some feed pointing to the repo with new errata version and bad `updated` field.
- sync it, now we have bad `updated` in the db.
- update repo with another feed which points to the repo with the same erratum but old version
- all the erratum metadata and package list will be overwritten
If we copied our repository before updating the feed, our update of the erratum in the db will affect not only the repo we updated but also all the copied ones.
I think having multiple copies of a repo is a common use case for our customers.
Bad scenario 2:
- create repo with some feed pointing to the repo with new errata version and bad `updated` field.
- sync it, now we have bad `updated` in the db.
- create another repo with some feed pointing to the repo with old errata version and bad `updated` field.
- sync it
- the erratum metadata will be overwritten, but package lists will be merged, so in db there will be old metadata and both old and new pkglist (the latter is correct)
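A toy model of the outcome in scenario 2, assuming metadata is overwritten wholesale and package lists are merged as described above. The `merge_erratum` function and the `description` field are hypothetical and only illustrate the resulting state; they are not Pulp's data model.

```python
def merge_erratum(stored, incoming):
    """Toy model of 'always update': overwrite metadata, merge package lists."""
    merged = dict(incoming)  # every metadata field comes from the incoming erratum
    pkgs = list(stored['pkglist'])
    for pkg in incoming['pkglist']:
        if pkg not in pkgs:
            pkgs.append(pkg)
    merged['pkglist'] = pkgs
    return merged


# First sync: new erratum version with a bad (empty) `updated` field.
new_version = {'id': 'SOME_ERRATUM_ID', 'description': 'new text',
               'updated': '', 'pkglist': ['foo-2.0']}
# Second sync from another feed: old erratum version, also unparsable.
old_version = {'id': 'SOME_ERRATUM_ID', 'description': 'old text',
               'updated': '', 'pkglist': ['foo-1.0']}

result = merge_erratum(new_version, old_version)
```

Here `result` ends up with the old description alongside both `foo-2.0` and `foo-1.0` in its package list, which is the metadata/pkglist mismatch described in the last step.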
- Sprint/Milestone set to 22
- Triaged changed from No to Yes
Neither of these is a great option, but I think I prefer option 1 because, in terms of Pulp's consistency, it's good to "avoid a potential inconsistency between the erratum pkglist and the rest of the erratum metadata".
If an erratum contains bad data, we should open an issue against the CDN tooling so that the data quality problem can be fixed.
Option 2 leaves Pulp in an inconsistent state in several cases so we probably shouldn't do that one.
I agree. Brian brings up a great point that we should encourage content creators to clean up the data and make the `updated` field parsable. That would allow their content to get updated if we go with option 1.
- Status changed from ASSIGNED to POST
- Priority changed from High to Urgent
Based on the user impact we've seen from pulp-list traffic, we decided to hold 2.8.6 another half-day to get this included.
- Platform Release set to 2.8.6
- Status changed from POST to MODIFIED
- % Done changed from 0 to 100
- Has duplicate Issue #2070: Could not parse errata `updated` field: expected format '%Y-%m-%d %H:%M:%S'. added
- Status changed from MODIFIED to 5
The Upgrade automation jobs have been passing.
- Status changed from 5 to 6
Verified
Synced repos with errata before upgrading to 2.8.6.
Verified that the same repo can be re-synced after the upgrade to 2.8.6.
- Status changed from 6 to CLOSED - CURRENTRELEASE
- Related to Task #2083: Issues common to 2.9.1 and 2.8 stream added
- Sprint/Milestone deleted (22)
Make the parsing of the erratum `updated` field more tolerant
closes #2048 https://pulp.plan.io/issues/2048