Issue #858
closed
As a user, I would like to receive updated errata metadata
Description
Pulp currently does not update already-synced errata metadata. For example, if erratum ABC-2014:100 is released, Pulp will sync it down once and then never check again for updates to that erratum. This is not correct behavior, since errata can be updated for a variety of reasons.
This story is for users to receive updates to already-synced errata. Pulp can check the 'updated' timestamp field to see if the existing unit is out of date. If so, Pulp will need to use the erratum from the updateinfo.xml instead of what's in the units collection. Note that we do not always want to overwrite the pkglist entirely. We may want to look at the 'shortname' on the individual elements in the pkglist to determine if we can overwrite just one set of packages. This is useful when an erratum is in multiple repos with different package lists per repo.
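A minimal sketch of the timestamp check described above. The function names and the list of accepted timestamp formats are assumptions made for illustration, not pulp_rpm's actual code; treating an unparseable timestamp as "not out of date" is also an assumption of this sketch.

```python
from datetime import datetime

# Hypothetical set of formats; which formats real updateinfo.xml feeds use is an assumption.
ASSUMED_FORMATS = ("%Y-%m-%d %H:%M:%S", "%Y-%m-%d")


def parse_updated(value):
    """Return a datetime for an 'updated' string, or None if no assumed format matches."""
    for fmt in ASSUMED_FORMATS:
        try:
            return datetime.strptime(value, fmt)
        except (TypeError, ValueError):
            continue
    return None


def is_out_of_date(existing_updated, incoming_updated):
    """True only when both timestamps parse and the incoming one is strictly newer."""
    existing = parse_updated(existing_updated)
    incoming = parse_updated(incoming_updated)
    if existing is None or incoming is None:
        # Cannot compare, so leave the stored erratum untouched.
        return False
    return incoming > existing
```

With this shape, a sync would replace the stored erratum only when `is_out_of_date(stored["updated"], parsed["updated"])` returns True.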
Deliverables:
- pulp_rpm changes to check for the 'updated' timestamp when importing errata
- release note for this feature
- additional zoo repos of the same erratum with different timestamps to allow demoing and QE testing of this feature
- testing that this change works correctly when the same erratum is in different repos (example: RHEL6 and 7)
Updated by mhrivnak about 9 years ago
- Platform Release set to 2.8.0
- Groomed set to No
- Sprint Candidate set to Yes
Updated by jortel@redhat.com almost 9 years ago
- Priority changed from Normal to High
- Platform Release deleted (2.8.0)
Updated by jortel@redhat.com almost 9 years ago
- Tracker changed from Story to Issue
- Severity set to 2. Medium
- Platform Release set to 2.8.1
- Triaged set to No
Updated by ttereshc over 8 years ago
- Status changed from NEW to ASSIGNED
- Assignee set to ttereshc
Updated by ttereshc over 8 years ago
This issue turned out to be a tricky one.
Below are my observations; please correct me if I am wrong.
Assumptions:
- the erratum id does not change, when its metadata is updated
- the same erratum can be in different repos (example: RHEL6 and 7)
- any errata metadata can be changed
- if errata metadata is updated there is no guarantee that it will be updated simultaneously in both repos (RHEL6 and 7) or even in the same way.
Right now Pulp stores errata in the collection `units_erratum`, and the database holds no information about either the Pulp repo or the feed related to each erratum.
If the erratum id is the same in different repos but the pkglist name is different, Pulp updates the erratum's pkglist by adding a new pkglist to it (this change was introduced by this commit).
So we have one record for each erratum (with erratum_id as a unit_key) even if it is present in different repositories.
So it looks like it is not safe to update errata metadata, if it could be different for different repositories (RHEL6 and 7).
Any thoughts?
Updated by semyers over 8 years ago
ttereshc wrote:
Assumptions:
- the erratum id does not change, when its metadata is updated
- the same erratum can be in different repos (example: RHEL6 and 7)
I believe these are definitely true.
- any errata metadata can be changed
- if errata metadata is updated there is no guarantee that it will be updated simultaneously in both repos (RHEL6 and 7) or even in the same way.
I don't know if these are true, but I think if assumptions are to be made, it's best to assume that they are. So, any errata metadata can be changed, and it is possible to update only the errata metadata in the RHEL6 repo without making any changes to the RHEL7 repo.
Right now Pulp stores errata in the collection `units_erratum`, and the database holds no information about either the Pulp repo or the feed related to each erratum.
If the erratum id is the same in different repos but the pkglist name is different, Pulp updates the erratum's pkglist by adding a new pkglist to it (this change was introduced by this commit).
To elaborate on this, because we don't keep track (that I know of) of which repository provides which packages, we don't have a reliable way of choosing which packages to remove from existing package lists. So, if the assumption above is true (any errata metadata can be changed), then that includes the package list. If this assumption is false, and specifically if the packagelist is guaranteed not to change for an errata, then this particular problem goes away.
So we have one record for each erratum (with erratum_id as a unit_key) even if it is present in different repositories.
So it looks like it is not safe to update errata metadata, if it could be different for different repositories (RHEL6 and 7). Any thoughts?
Getting solid answers about the two uncertain assumptions is probably the best choice, which I think means we need the answers to these questions before we can know the best way to proceed:
- Can all errata metadata be changed, or only specific fields? If it's only specific fields, which fields can be changed?
- When errata metadata is changed, is it changed for all errata with that ID in all repos where it exists in updateinfo, or are changes potentially made in only a single repo (or subset of repos) containing that errata?
Orthogonally related thought:
This is drastic, but one way forward is to make the errata primary key be a composite key including the errata_id and repo_id. I have some ideas about how to make this work, but would prefer simpler options for a more immediate solution and plan a much-needed (in my opinion) errata refactor for a later release.
Updated by rbarlow over 8 years ago
On Tuesday, March 22, 2016 7:45:01 PM EDT you wrote:
one way forward is to make the errata primary key be a composite key
including the errata_id and repo_id.
FWIW, this is how I solved the Tag uniqueness problem in pulp_docker.
Updated by ttereshc over 8 years ago
Answers from jluza:
- Can any errata metadata be changed, or only specific fields? (in case advisory id has not changed)
They can. I would say almost all fields that are filled in by a human are potentially vulnerable to mistakes, so they should be changeable.
- Can pkglist also be changed without changing the advisory id?
I suppose it shouldn't happen, but I guess it can.
- The same erratum can be in different repositories (for example, in RHEL6 and RHEL7).
Yes. But it should contain only packages that are in the repository. So you can have a multi-product advisory, but for RHEL-6 repos, generated advisories in updateinfo should contain only rhel-6 packages in the packagelist.
- When errata metadata is changed, is it changed for all errata with the same id in all repositories?
Yes.
Updated by ttereshc over 8 years ago
So for now I am going to overwrite the relevant pkglist (based on its name) and the rest of the errata metadata. Does anyone think it is still not safe to update errata metadata?
Updated by semyers over 8 years ago
- Platform Release changed from 2.8.1 to 2.8.2
Updated by ttereshc over 8 years ago
New info (at least for me).
Currently the same erratum in different repos (RHEL6 and 7) may have the same name for pkglist (but packages will be different as expected).
That means the fix for handling errata from different repositories by concatenating pkglists based on their name does not work :(
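A minimal sketch of the collision, assuming pkglist collections are plain dicts with `name` and `packages` keys and modelling the "overwrite the relevant pkglist based on its name" idea from the earlier comment. The function name and the package file names are hypothetical and not Pulp's actual code.

```python
def overwrite_by_name(existing_pkglists, incoming_pkglists):
    """Merge pkglist collections keyed only by name: the incoming one wins on a name clash."""
    by_name = {collection["name"]: collection for collection in existing_pkglists}
    for collection in incoming_pkglists:
        by_name[collection["name"]] = collection
    return list(by_name.values())


# The same erratum synced from two repos whose pkglists share one name (hypothetical data).
rhel6_pkglists = [{"name": "collection", "packages": ["foo-1.0-1.el6.rpm"]}]
rhel7_pkglists = [{"name": "collection", "packages": ["foo-1.0-1.el7.rpm"]}]

merged = overwrite_by_name(rhel6_pkglists, rhel7_pkglists)
# Only one collection survives, so the RHEL6 packages are no longer listed.
```

Because the name is the only key, nothing distinguishes the RHEL6 pkglist from the RHEL7 one, which is why a per-repo marker is needed.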
Updated by rbarlow over 8 years ago
Hello Tanya!
I think it might help if we change the uniqueness constraint on Errata to be a compound key of its name and the repo it is part of. Here's an example of how I solved a similar problem with the Tag model in pulp_docker:
https://github.com/pulp/pulp_docker/blob/a902085/plugins/pulp_docker/plugins/models.py#L253
If you follow this approach, it does introduce some other problems however. Now whenever Errata are "copied" between repositories, you actually need to create a new one that is like the old one, but has a different repo_id. I did that for the Tag here:
So my proposal isn't all roses and ponies, but it worked out OK for the Tag. Just a thought!
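A rough sketch of the compound-key idea and the copy consequence it brings. This is not the pulp_docker Tag model or any Pulp API; the `Erratum` class, its fields, and `copy_to_repo` are hypothetical names used only to illustrate the shape.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Erratum:
    """Hypothetical erratum unit whose identity includes the repository it belongs to."""
    errata_id: str
    repo_id: str
    title: str = ""
    pkglist: tuple = ()

    @property
    def unit_key(self):
        # Compound key: the same advisory in two repos yields two distinct units.
        return (self.errata_id, self.repo_id)


def copy_to_repo(erratum, dest_repo_id):
    """'Copying' creates a new unit like the old one, but with a different repo_id."""
    return replace(erratum, repo_id=dest_repo_id)
```

For example, `copy_to_repo(Erratum("ABC-2014:100", "rhel6"), "rhel7")` yields a second unit with key `("ABC-2014:100", "rhel7")` while the original stays unchanged.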
Updated by semyers over 8 years ago
- Platform Release changed from 2.8.2 to 2.8.3
Updated by semyers over 8 years ago
- Platform Release changed from 2.8.3 to 2.8.4
Updated by ttereshc over 8 years ago
- Status changed from ASSIGNED to POST
To handle the same errata in different repositories, `_pulp_repo_id` is added to each pkglist collection in the database. The unit key is still errata_id; nothing changed in that regard.
- `_pulp_repo_id` may be added to an erratum pkglist collection, or a new erratum pkglist collection may be added, only during sync or upload.
- During copy, no modifications are made to the erratum unit in the database.
- Pkglists are not removed during repository removal; only the erratum as a whole can be removed during orphan removal if no repository contains it.
- During publish, duplicates and empty collections may appear in pkglists in the updateinfo file.
Errata metadata is updated based on the `updated` field. If the `updated` field is in an unknown format, the erratum won't be updated.
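A minimal sketch of how tagging pkglist collections with `_pulp_repo_id` keeps per-repo package lists apart during sync or upload. Only the `_pulp_repo_id` field name comes from this issue; the function name, the dict layout, and the policy of replacing a repo's previous collections are assumptions of this sketch, not the code from the changeset.

```python
def merge_pkglists(existing_pkglists, incoming_pkglists, repo_id):
    """Tag incoming collections with repo_id and swap them in for that repo's old ones.

    Collections tagged with another repo, or not tagged at all, are kept as they are.
    """
    kept = [c for c in existing_pkglists if c.get("_pulp_repo_id") != repo_id]
    # dict(c, ...) builds a tagged copy so the parsed input is not mutated.
    tagged = [dict(c, _pulp_repo_id=repo_id) for c in incoming_pkglists]
    return kept + tagged
```

Two collections with the same name can now coexist in one erratum record, because the repo marker rather than the name decides which one a later sync touches.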
Added by ttereshc over 8 years ago
Updated by ttereshc over 8 years ago
- Status changed from POST to MODIFIED
- % Done changed from 0 to 100
Applied in changeset a8b0d87ad8724ac58ac8a8fc1f5a92ae9dffa22a.
Updated by semyers over 8 years ago
- Platform Release changed from 2.8.4 to 2.8.5
Updated by semyers over 8 years ago
- Status changed from 5 to CLOSED - CURRENTRELEASE
Fix sync and upload of the same erratum
Handle pkglist for the same errata in different repositories. Update errata metadata based on `updated` field.
closes #858 https://pulp.plan.io/issues/858