Issue #494

closed

better handle repositories with duplicate NVREAs

Added by jsherril@redhat.com about 9 years ago. Updated about 5 years ago.

Status: CLOSED - CURRENTRELEASE
Priority: High
Sprint/Milestone: -
Start date:
Due date:
Estimated time:
Severity: 2. Medium
Version: 2.4.2
Platform Release: 2.6.6
OS:
Triaged: Yes
Groomed: No
Sprint Candidate: No
Tags: Pulp 2
Sprint:
Quarter:

Description

Description of problem:

Currently, users can easily end up with an unusable Pulp repository. If they sync a repository that replaces an old package with a new one that has the same NEVRA, Pulp will happily sync the second package after the first.

As a result, Pulp publishes yum metadata listing both packages, but since they have the same filename, only one package actually makes it to the file system. When a yum client then tries to install or update that package, it picks one of the entries from the yum metadata, and there is a 50/50 chance that it picks the wrong entry, in which case checksum verification fails.
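The failure described above can be reproduced with a toy Python simulation (not Pulp code). It assumes, as this issue describes, that the published metadata lists both packages while the publish directory is keyed by filename, so the later write replaces the earlier one. The package contents, the filename and the simplified metadata entries are made-up placeholders; real primary.xml entries carry many more fields.

```python
import hashlib
import random


def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


# Two builds with the same NEVRA (hence the same filename) but different
# bytes, e.g. an unsigned build and a later signed one. Placeholder content.
unsigned = b"foo-1.0-1.el7.x86_64 (unsigned)"
signed = b"foo-1.0-1.el7.x86_64 (signed)"
filename = "foo-1.0-1.el7.x86_64.rpm"

# The published metadata lists both packages, each with its own checksum.
primary_entries = [
    {"location": filename, "checksum": sha256(unsigned)},
    {"location": filename, "checksum": sha256(signed)},
]

# The publish directory is keyed by filename, so the second write
# overwrites the first and only one file survives.
published_files = {}
for content in (unsigned, signed):
    published_files[filename] = content


def client_install(entry: dict) -> bool:
    """Return True if the downloaded file matches the entry's checksum."""
    downloaded = published_files[entry["location"]]
    return sha256(downloaded) == entry["checksum"]


results = [client_install(entry) for entry in primary_entries]
print(results)  # [False, True]: the entry for the overwritten file fails

# A client that picks one of the two entries at random fails about half the time.
entry = random.choice(primary_entries)
print("checksum ok" if client_install(entry) else "checksum mismatch")
```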

I would expect Pulp not to generate metadata like this when the repo contains packages with the same NEVRA. Alternatively, the second package should fail to sync or import into the repo.

This is a common problem and has occurred across many different upstream repos.

Version-Release number of selected component (if applicable):
2.4.0-1

Steps to Reproduce:
1. Create two RPMs with the same NEVRA but different content (and therefore different checksums)
2. Upload them or sync them to a single repo
3. Publish the repo
4. Attempt to install that rpm from the repo

Actual results:
The client throws an error because the checksum does not match.

Expected results:
Only one package is in the primary.xml file, and it matches what's actually on the file system.

Additional info:

+ This bug was cloned from Bugzilla Bug #1132659 +

More detail (added 4/27): Users will hit this issue if they sync a repo with unsigned RPMs, then resync and publish the repo once RPMs are signed.


Related issues

Related to RPM Support - Issue #1406: Uploading the same Content Unit twice causes a 500 error (CLOSED - CURRENTRELEASE, dkliban@redhat.com)
Related to RPM Support - Story #213: Add upload option to have an rpm/srpm/drpm/etc overwrite an existing one with the same unit_key fields (omitting checksums) (CLOSED - WONTFIX)

#1

Updated by ipanova@redhat.com about 9 years ago

*** Bug 1098703 has been marked as a duplicate of this bug. ***

+ This comment was cloned from Bugzilla #1132659 comment 1 +

#2

Updated by bcourt about 9 years ago

  • Assignee set to bcourt
#3

Updated by bcourt about 9 years ago

  • Assignee deleted (bcourt)
#4

Updated by bcourt about 9 years ago

One way to fix this would be to add the unit key fields to the repo_content_units table so that an aggregation query could be used to check for duplicates.
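bcourt's suggestion could look roughly like the following MongoDB aggregation pipeline, built as a Python list that could be passed to pymongo's `Collection.aggregate`. This is a sketch only: it assumes the unit key fields `name`, `epoch`, `version`, `release` and `arch` have been copied onto each `repo_content_units` document as proposed, and the `repo_id`, `unit_id` and `unit_type_id` field names and the `"rpm"` type id are assumptions as well.

```python
NEVRA_FIELDS = ("name", "epoch", "version", "release", "arch")


def duplicate_nevra_pipeline(repo_id: str) -> list:
    """Build an aggregation pipeline that reports NEVRAs appearing more than once.

    Hypothetical: assumes repo_content_units documents carry the NEVRA
    fields directly, which is what the comment above proposes adding.
    """
    # Group key: one sub-document per distinct NEVRA, e.g. {"name": "$name", ...}.
    nevra_key = {field: "$" + field for field in NEVRA_FIELDS}
    return [
        # Only look at RPM associations within a single repository.
        {"$match": {"repo_id": repo_id, "unit_type_id": "rpm"}},
        # Count associations per NEVRA and collect the associated unit ids.
        {"$group": {
            "_id": nevra_key,
            "count": {"$sum": 1},
            "unit_ids": {"$push": "$unit_id"},
        }},
        # Keep only NEVRAs that occur more than once, i.e. the duplicates.
        {"$match": {"count": {"$gt": 1}}},
    ]


# With pymongo this might be run as (database handle assumed):
#   duplicates = list(db.repo_content_units.aggregate(duplicate_nevra_pipeline("my-repo")))
```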

#6

Updated by bmbouter about 9 years ago

  • Severity changed from Medium to 2. Medium
#7

Updated by cduryee almost 9 years ago

  • Description updated (diff)
#8

Updated by cduryee almost 9 years ago

Barnaby and I discussed this; it sounds like there are two options:

  • allow repos to be created with duplicate NEVRAs but raise an error when attempting to publish
  • do not allow repos to be created with duplicate NEVRAs

The first option is something we can do today. The second option would require a lot more work since we currently don't have a way for a plugin to inject additional code into the association process. For example, if we wanted to perform a query to find an existing NEVRA for each unit during unit association, there's no way to do that today.

Additional functionality to support this could be added in Pulp 2.8 if needed. There may be alternatives such as using versioned repos which may fix this in a more general way.

For 2.7 we think checking for duplicate NEVRAs during publish and raising an error that contains a list of the duplicates is best. It would be better to not allow the user to get into this state to begin with but I think this is a good first step since it catches the issue in Pulp and not on the clients.
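The publish-time check proposed above amounts to grouping the repository's units by NEVRA and refusing to publish if any group has more than one member, reporting every duplicate in the error. The following Python sketch shows that idea; `Rpm`, `DuplicateNevraError` and `check_for_duplicate_nevras` are hypothetical names rather than Pulp APIs, and the checksums are placeholder strings.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Rpm(NamedTuple):
    name: str
    epoch: str
    version: str
    release: str
    arch: str
    checksum: str

    @property
    def nevra(self) -> Tuple[str, str, str, str, str]:
        return (self.name, self.epoch, self.version, self.release, self.arch)


class DuplicateNevraError(Exception):
    """Hypothetical error listing every NEVRA held by more than one unit."""

    def __init__(self, duplicates: Dict[tuple, List[Rpm]]):
        self.duplicates = duplicates
        # One line per duplicate NEVRA: name-epoch:version-release.arch: checksums
        lines = [
            "%s-%s:%s-%s.%s: %s" % (n, e, v, r, a, ", ".join(u.checksum for u in units))
            for (n, e, v, r, a), units in sorted(duplicates.items())
        ]
        super().__init__("Duplicate NEVRAs found:\n" + "\n".join(lines))


def check_for_duplicate_nevras(units: Iterable[Rpm]) -> None:
    """Raise DuplicateNevraError before publishing if any NEVRA repeats."""
    by_nevra: Dict[tuple, List[Rpm]] = defaultdict(list)
    for unit in units:
        by_nevra[unit.nevra].append(unit)
    duplicates = {key: group for key, group in by_nevra.items() if len(group) > 1}
    if duplicates:
        raise DuplicateNevraError(duplicates)


units = [
    Rpm("foo", "0", "1.0", "1", "x86_64", "aaa"),
    Rpm("foo", "0", "1.0", "1", "x86_64", "bbb"),
    Rpm("bar", "0", "2.0", "1", "noarch", "ccc"),
]
try:
    check_for_duplicate_nevras(units)
except DuplicateNevraError as err:
    print(err)  # Duplicate NEVRAs found:\nfoo-0:1.0-1.x86_64: aaa, bbb
```

Reporting all duplicates at once, rather than failing on the first, matches the suggestion that the error contain a list of the duplicates, so a user can clean them up in one pass.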

#9

Updated by bmbouter almost 9 years ago

cduryee wrote:

For 2.7 we think checking for duplicate NEVRAs during publish and raising an error that contains a list of the duplicates is best. It would be better to not allow the user to get into this state to begin with but I think this is a good first step since it catches the issue in Pulp and not on the clients.

Making this type of change in 2.7 would be good. Perhaps that is what this bug could be about?

Separately, how can I learn more about the "duplicate NVREA problem" in general? I keep hearing about it, but I don't fully understand it.

#10

Updated by cduryee almost 9 years ago

The downstream BZ has a bit more info, but the problem is basically this: if you create a repo somewhere and sync it into Pulp, then sign the RPMs and sync and publish the repo again, Pulp will attempt to publish both the signed and unsigned RPMs (which have the same filename but different checksums). The last one written wins, which can cause issues because the RPM on disk may no longer have the checksum the repodata says it should have.

I've seen this firsthand on other systems but have not reproduced it myself.

#11

Updated by fang64@gmail.com almost 9 years ago

I can confirm this issue: we were using the EPEL third-party repository, and it prevented our machines from retrieving the correct RPM version.

It happens when someone republishes an RPM with an identical version but different content. This caused machines in the field to fail to update those packages, or to retrieve them at all. This should be handled by matching the RPM to the metadata in the remote repository, so that the correct one is provided to clients of the Pulp repository.

Otherwise this will end up as chaos whenever a user accidentally builds a new RPM without bumping the version and then populates a custom or third-party repository with it. We as users do not control upstream RPM repositories like EPEL, but that shouldn't prevent clients connecting to Pulp from being served the correct package, the one that matches the metadata in the repository.

#12

Updated by jsherril@redhat.com over 8 years ago

Just walked another user through fixing this

#13

Updated by pgassmann over 8 years ago

wrote:

Just walked another user through fixing this

There are two repository options related to cleaning packages:

Repository Contents Behavior
  --remove-missing     if "true", units that were previously in the external
                       feed but are no longer found will be removed from the
                       repository; defaults to false
  --retain-old-count   count indicating how many non-latest versions of a unit
                       to keep in a repository

How are these two options related? How do they combine?
Shouldn't retain-old-count 0 (default) imply remove-missing?
What's the difference between retain-old-count 5 with and without remove-missing?

#14

Updated by mhrivnak over 8 years ago

--retain-old-count only considers how many versions of a package are currently in the Pulp repo.

--remove-missing only considers whether a package currently exists in the remote repo.

To solve this, let's start by making the sync process remove a package from the Pulp repo if a different package with the same NEVRA is retrieved from the remote feed.

#15

Updated by mhrivnak over 8 years ago

I suggest adding a call that removes any units from the repo that have the same NEVRA, placed before this line: https://github.com/pulp/pulp_rpm/blob/2.6-dev/plugins/pulp_rpm/plugins/importers/yum/listener.py#L85
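As a rough illustration of the sync-time approach, the following Python sketch disassociates any unit sharing the incoming unit's NEVRA before associating the new one. `Repo`, `remove_units_with_nevra`, `associate` and `add_unit_from_sync` are hypothetical stand-ins used only for illustration; they are not the code at the linked `listener.py` line or Pulp's association API.

```python
from typing import List, NamedTuple, Tuple

# (name, epoch, version, release, arch)
Nevra = Tuple[str, str, str, str, str]


class Rpm(NamedTuple):
    nevra: Nevra
    checksum: str


class Repo:
    """Toy in-memory stand-in for a repository's unit associations."""

    def __init__(self) -> None:
        self.units: List[Rpm] = []

    def remove_units_with_nevra(self, nevra: Nevra) -> List[Rpm]:
        """Disassociate every unit with this NEVRA and return the removed units."""
        removed = [u for u in self.units if u.nevra == nevra]
        self.units = [u for u in self.units if u.nevra != nevra]
        return removed

    def associate(self, unit: Rpm) -> None:
        self.units.append(unit)


def add_unit_from_sync(repo: Repo, unit: Rpm) -> List[Rpm]:
    """Replace any existing unit sharing the new unit's NEVRA, then add it."""
    removed = repo.remove_units_with_nevra(unit.nevra)
    repo.associate(unit)
    return removed


nevra = ("foo", "0", "1.0", "1", "x86_64")
repo = Repo()
add_unit_from_sync(repo, Rpm(nevra, "unsigned-sum"))          # first sync
removed = add_unit_from_sync(repo, Rpm(nevra, "signed-sum"))  # resync after signing
print(removed)     # the unsigned unit
print(repo.units)  # only the signed unit remains
```

In this toy model the most recently synced package always replaces earlier ones with the same NEVRA, so a resync after signing leaves a single unit per NEVRA and the published metadata can only describe the file that is actually on disk.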

#17

Updated by mhrivnak over 8 years ago

  • Status changed from NEW to ASSIGNED
  • Assignee set to ipanova@redhat.com
#18

Updated by ipanova@redhat.com over 8 years ago

  • Status changed from ASSIGNED to POST
  • Platform Release set to 2.6.5
#19

Updated by ipanova@redhat.com over 8 years ago

  • Platform Release changed from 2.6.5 to 2.6.6

Added by ipanova@redhat.com over 8 years ago

Revision ba84bd0b

Handle repositories with duplicate NEVRA

closes#494 https://pulp.plan.io/issues/494

Units with duplicate nevra are removed from the repo in case of sync or upload.

#20

Updated by ipanova@redhat.com over 8 years ago

  • Status changed from POST to MODIFIED
#21

Updated by bmbouter over 8 years ago

  • Related to Issue #1406: Uploading the same Content Unit twice causes a 500 error added
#22

Updated by bmbouter over 8 years ago

  • Related to Story #213: Add upload option to have an rpm/srpm/drpm/etc overwrite an existing one with the same unit_key fields (omitting checksums) added
#23

Updated by rbarlow about 8 years ago

  • Status changed from MODIFIED to CLOSED - CURRENTRELEASE
#26

Updated by bmbouter about 5 years ago

  • Tags Pulp 2 added
