Issue #494
closed
Better handle repositories with duplicate NVREAs
Description
Description of problem:
Currently, users can easily end up with an unusable Pulp repository. If they sync a repository that replaces an old package with a new one that has the same NVREA, Pulp will sync the second package after syncing the first without complaint.
As a result, Pulp publishes yum metadata listing both packages, but since they have the same filename, only one package actually makes it to the file system. When a yum client then tries to install or update that package, it picks one of the entries from the yum metadata, and there is a 50/50 chance that it picks the entry for the wrong package, so checksum verification fails.
I would expect Pulp not to generate metadata like this when a repo contains packages with the same NVREA. Alternatively, the second package should fail to sync/import into the repo.
This is a common problem and has occurred across many different upstream repos.
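The failure mode described above can be illustrated with a minimal sketch. It is not Pulp code: the `Package` tuple, the `publish` function and the filename layout are assumptions made only to show why two metadata entries that share one filename leave one entry with a checksum that no longer matches the file on disk.

```python
import hashlib
from collections import namedtuple

# Hypothetical stand-in for an RPM unit; not Pulp's real model.
Package = namedtuple("Package", "name epoch version release arch payload")


def filename(pkg):
    # The epoch is not part of the filename, so equal NVREA means equal filename.
    return "{0.name}-{0.version}-{0.release}.{0.arch}.rpm".format(pkg)


def checksum(pkg):
    return hashlib.sha256(pkg.payload).hexdigest()


def publish(packages):
    """Naive publish: one metadata entry per package, one file per filename."""
    metadata = [(filename(p), checksum(p)) for p in packages]
    files = {}
    for p in packages:
        files[filename(p)] = p.payload  # a later package overwrites an earlier one
    return metadata, files


def broken_entries(metadata, files):
    """Metadata entries whose checksum does not match the published file."""
    return [(fn, cs) for fn, cs in metadata
            if hashlib.sha256(files[fn]).hexdigest() != cs]


unsigned = Package("foo", "0", "1.0", "1", "noarch", b"unsigned build")
signed = Package("foo", "0", "1.0", "1", "noarch", b"signed build")
metadata, files = publish([unsigned, signed])
```

Here `metadata` has two entries but `files` has one, and `broken_entries(metadata, files)` returns the entry for the unsigned build, which is the entry a client fails on.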
Version-Release number of selected component (if applicable):
2.4.0-1
Steps to Reproduce:
1. Create two rpms with the same nvrea
2. Upload them or sync them to a single repo
3. Publish the repo
4. Attempt to install that rpm from the repo
Actual results:
Client will throw an error as the checksum will not match
Expected results:
Only one package is in the primary.xml file and it matches what's actually on the file system.
Additional info:
+ This bug was cloned from Bugzilla Bug #1132659 +
More detail (added 4/27): Users will hit this issue if they sync a repo with unsigned RPMs, then resync and publish the repo once RPMs are signed.
Related issues
Updated by ipanova@redhat.com over 9 years ago
*** Bug 1098703 has been marked as a duplicate of this bug. ***
+ This comment was cloned from Bugzilla #1132659 comment 1 +
Updated by bcourt over 9 years ago
One way to fix this would be to add the unit key fields to the repo_content_units table so that an aggregation query could be used to check for duplicates.
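As a rough sketch of what such an aggregation query could look like, the function below builds a MongoDB aggregation pipeline that groups a repo's units by NEVRA and keeps only the groups with more than one member. It assumes the unit key fields (`name`, `epoch`, `version`, `release`, `arch`) have already been copied onto each repo_content_units document, which is the proposed change and not the current schema; the function name is hypothetical.

```python
# Assumed field names: these would have to be added to repo_content_units first.
NEVRA_FIELDS = ("name", "epoch", "version", "release", "arch")


def duplicate_nevra_pipeline(repo_id):
    """Build an aggregation pipeline that finds duplicate NEVRAs in one repo."""
    return [
        # Only look at RPM associations for the given repo.
        {"$match": {"repo_id": repo_id, "unit_type_id": "rpm"}},
        # Group by the NEVRA fields and collect the unit ids in each group.
        {"$group": {
            "_id": {field: "$" + field for field in NEVRA_FIELDS},
            "unit_ids": {"$push": "$unit_id"},
            "count": {"$sum": 1},
        }},
        # Keep only NEVRAs that occur more than once.
        {"$match": {"count": {"$gt": 1}}},
    ]
```

With pymongo, the result would be passed to the collection's `aggregate()` method; each returned document would then name one duplicated NEVRA together with the ids of the units that share it.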
Updated by cduryee over 9 years ago
Barnaby and I discussed this, and it sounds like there are a few options:
- allow repos to be created with duplicate NEVRAs but raise an error when attempting to publish
- do not allow repos to be created with duplicate NEVRAs
The first option is something we can do today. The second option would require a lot more work since we currently don't have a way for a plugin to inject additional code into the association process. For example, if we wanted to perform a query to find an existing NEVRA for each unit during unit association, there's no way to do that today.
Additional functionality to support this could be added in Pulp 2.8 if needed. There may be alternatives such as using versioned repos which may fix this in a more general way.
For 2.7 we think checking for duplicate NEVRAs during publish and raising an error that contains a list of the duplicates is best. It would be better to not allow the user to get into this state to begin with but I think this is a good first step since it catches the issue in Pulp and not on the clients.
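A publish-time check of this kind might look roughly like the sketch below. It is not the Pulp implementation: units are modelled as plain dicts, and `DuplicateNEVRAError` and `check_for_duplicate_nevras` are hypothetical names chosen for illustration.

```python
from collections import defaultdict


class DuplicateNEVRAError(Exception):
    """Raised at publish time; carries the duplicated NEVRAs (hypothetical)."""

    def __init__(self, duplicates):
        self.duplicates = duplicates
        lines = ["%s: checksums %s" % ("-".join(key), ", ".join(sums))
                 for key, sums in sorted(duplicates.items())]
        super().__init__("duplicate NEVRAs found:\n" + "\n".join(lines))


def check_for_duplicate_nevras(units):
    """Raise DuplicateNEVRAError if two units share a NEVRA.

    Each unit is assumed to be a dict with name, epoch, version, release,
    arch and checksum keys.
    """
    seen = defaultdict(list)
    for unit in units:
        key = (unit["name"], unit["epoch"], unit["version"],
               unit["release"], unit["arch"])
        seen[key].append(unit["checksum"])
    duplicates = {key: sums for key, sums in seen.items() if len(sums) > 1}
    if duplicates:
        raise DuplicateNEVRAError(duplicates)
```

Calling the check before any metadata is written means the publish fails as a whole and the error message tells the user which packages to remove.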
Updated by bmbouter over 9 years ago
cduryee wrote:
For 2.7 we think checking for duplicate NEVRAs during publish and raising an error that contains a list of the duplicates is best. It would be better to not allow the user to get into this state to begin with but I think this is a good first step since it catches the issue in Pulp and not on the clients.
Making this type of change in 2.7 would be good. Perhaps that is what this bug could be about?
Separate from that, how can I learn more about the "duplicate NVREA problem" in general? I keep hearing about it, but I don't fully understand it.
Updated by cduryee over 9 years ago
The downstream bz has a bit more info, but the problem is basically this: if you create a repo somewhere and sync it down into Pulp, then sign the RPMs and sync/publish the repo again, Pulp will attempt to publish both the signed and unsigned RPMs (which have the same filename but different checksums). The last one wins, which can cause issues since the RPM on disk may no longer have the checksum that the repodata expects.
I've seen this firsthand on other systems but have not reproduced it myself.
Updated by fang64@gmail.com over 9 years ago
I can confirm this issue. We were using the EPEL third-party repository, and it prevented our machines from retrieving the correct RPM version.
When someone republished an RPM with an identical version but differing content, machines in the field failed to update those packages or to retrieve them at all. This should be handled by matching the RPM to the metadata in the remote repository, so the correct one is provided to clients of the Pulp repository.
Otherwise this will end up being chaos when a user accidentally creates a new RPM without bumping the version and then populates a custom or third-party repository with it. We as users do not have control over upstream RPM repositories like EPEL, but that shouldn't prevent clients connecting to Pulp from being served the package that matches the metadata in the repository.
Updated by jsherril@redhat.com about 9 years ago
Just walked another user through fixing this
Updated by pgassmann about 9 years ago
jsherril@redhat.com wrote:
Just walked another user through fixing this
There are two repository options related to cleaning packages:
Repository Contents Behavior
  --remove-missing   - if "true", units that were previously in the external
                       feed but are no longer found will be removed from the
                       repository; defaults to false
  --retain-old-count - count indicating how many non-latest versions of a unit
                       to keep in a repository
How are these two options related? How do they combine?
Shouldn't retain-old-count 0 (default) imply remove-missing?
What's the difference between retain-old-count 5 with and without remove-missing?
Updated by mhrivnak about 9 years ago
--retain-old-count only considers how many versions of a package are currently in the pulp repo.
--remove-missing only considers whether a package currently exists in the remote repo.
To solve this, let's start by making the sync process remove a package from the pulp repo if a different package with the same NEVRA is retrieved from the remote feed.
Updated by mhrivnak about 9 years ago
I suggest adding a call before this line: https://github.com/pulp/pulp_rpm/blob/2.6-dev/plugins/pulp_rpm/plugins/importers/yum/listener.py#L85
that removes any units from the repo that have the same NEVRA.
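The suggested behaviour can be sketched as follows. This is an illustration only, not the code at the linked line: the repo is modelled as a plain list of dicts, and `nevra` and `associate` are hypothetical helper names.

```python
def nevra(unit):
    """Return the NEVRA tuple of a unit (modelled here as a plain dict)."""
    return (unit["name"], unit["epoch"], unit["version"],
            unit["release"], unit["arch"])


def associate(repo_units, new_unit):
    """Add new_unit to the repo, first dropping any unit with the same NEVRA.

    Returns a new list and leaves repo_units untouched.
    """
    key = nevra(new_unit)
    kept = [unit for unit in repo_units if nevra(unit) != key]
    kept.append(new_unit)
    return kept
```

Under this approach the most recently synced package always replaces an older one with the same NEVRA, so the repo never holds two units that would publish to the same filename.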
Updated by mhrivnak about 9 years ago
- Status changed from NEW to ASSIGNED
- Assignee set to ipanova@redhat.com
Updated by ipanova@redhat.com about 9 years ago
- Status changed from ASSIGNED to POST
- Platform Release set to 2.6.5
Updated by ipanova@redhat.com about 9 years ago
- Platform Release changed from 2.6.5 to 2.6.6
Added by ipanova@redhat.com about 9 years ago
Updated by ipanova@redhat.com about 9 years ago
- Status changed from POST to MODIFIED
Updated by bmbouter almost 9 years ago
- Related to Issue #1406: Uploading the same Content Unit twice causes a 500 error added
Updated by bmbouter almost 9 years ago
- Related to Story #213: Add upload option to have an rpm/srpm/drpm/etc overwrite an existing one with the same unit_key fields (omitting checksums) added
Updated by rbarlow almost 9 years ago
- Status changed from MODIFIED to CLOSED - CURRENTRELEASE
Handle repositories with duplicate NEVRA
closes #494 https://pulp.plan.io/issues/494
Units with a duplicate NEVRA are removed from the repo on sync or upload.