Project

Profile

Help

Issue #2560

closed

Non-unique collection names in erratum pkglists causes yum not to show all packages

Added by ttereshc about 7 years ago. Updated about 5 years ago.

Status:
CLOSED - CURRENTRELEASE
Priority:
Normal
Assignee:
Sprint/Milestone:
-
Start date:
Due date:
Estimated time:
Severity:
2. Medium
Version:
Platform Release:
2.12.2
OS:
Triaged:
Yes
Groomed:
No
Sprint Candidate:
No
Tags:
Pulp 2
Sprint:
Sprint 16
Quarter:

Description

Yum relies on the following:

  • collection name in pkglist is unique to the certain erratum across all the repos
  • in case of the same collection names for the certain erratum in multiple repos, the first one is chosen.

To reproduce the issue:

  1. create repo A1 with packages P1, P2 and erratum E and packages P1 and P2 in its pkglist/collection.
  2. create repo A2 which is a copy of repo A1.
  3. remove P1 from repo A1
  4. remove P2 from repo A2
  5. publish both repos
  6. enable both repos for your yum client
  7. run `yum updateinfo list` (assuming you have old versions of P1 and P2 installed)
  8. observe only one package (P1 or P2, it depends which repo was "the first") listed for update

Potential solutions:

  • make collection names unique across repos, maybe by adding repo_id to its name
  • always publish the full pkglist for erratum even if packages are not available in the repo
  • something else?

Related issues

Related to RPM Support - Issue #2623: Errata publish performace degradation CLOSED - CURRENTRELEASEsemyersActions
Actions #2

Updated by bizhang about 7 years ago

  • Sprint/Milestone set to 32
  • Triaged changed from No to Yes
Actions #4

Updated by mhrivnak about 7 years ago

  • Sprint/Milestone changed from 32 to 33
Actions #5

Updated by semyers about 7 years ago

  • Status changed from NEW to ASSIGNED
  • Assignee set to semyers
Actions #6

Updated by semyers about 7 years ago

ttereshc wrote:

  • make collection names unique across repos, maybe by adding repo_id to its name

I think this is the best solution, and is something we should be able to do at the distribution step. I also plan to add some protection to prevent re-adding the string we use to make collection names unique in the event a pulp is being synced from another pulp; depending how far down a user goes down that rabbit hole, this protection would prevent, e.g. "collection-name-<repo-id>-<repo-id>-<repo-id>-<repo-id>"

Actions #7

Updated by semyers about 7 years ago

I've been trying to get at this for a while, and after getting my head back into errata-land, peeking at dnf's source to see how it works, etc, put together a real reproducer:

# assumed running in vagrant env, with zoo already added
# serve-http for ssl laziness
$ pulp-admin rpm repo update --repo-id zoo --relative-url zoo --serve-http true --feed https://repos.fedorapeople.org/repos/pulp/pulp/fixtures/rpm/
$ pulp-admin rpm repo create --repo-id zoo2 --relative-url zoo2 --serve-http true --feed https://repos.fedorapeople.org/repos/pulp/pulp/fixtures/rpm/
$ for repo in zoo zoo2; do pulp-admin rpm repo sync run --repo-id $repo; done
# remove a different package in the "Sea_Errata" from each zoo repo
$ pulp-admin rpm repo remove rpm --repo-id zoo --str-eq="name=penguin"
$ pulp-admin rpm repo remove rpm --repo-id zoo2 --str-eq="name=shark"
# republish to ensure removed packages are (not...) represented in repo
$ for repo in zoo zoo2; do pulp-admin rpm repo publish run --repo-id $repo; done
# for fun, check the repoview for zoo and zoo2 to manually verify packages removed
# while you're poking around in there, check to make sure the removed packages are
# also missing from Sea_Erratum in the updateinfo file (they should be)
# create and install penguin and shark 0.0-0, I followed the instructions here:
# http://www.redhat.com/archives/rpm-list/2006-April/msg00015.html
# Set name to penguin or shark, version to 0.0, release to 0
$ sudo dnf -y install /home/vagrant/rpmbuild/RPMS/noarch/*.rpm
# now...verify. Since shark 0 and penging 0 are installed, there are updates available in
# Sea_Erratum, id RHEA-2012:0055 in current fixtures. Add the synced repos and check errata
$ sudo dnf config-manager --add-repo http://localhost/pulp/repos/zoo --add-repo http://localhost/pulp/repos/zoo2
$ dnf updateinfo list
# see shark and penguin updates:
RHEA-2012:0055 security penguin-0.9.1-1.noarch
RHEA-2012:0055 security shark-0.1-1.noarch

So, at least when using our fixtures repo, I'm unable to reproduce this on the latest fedora, indicating to me that it is perhaps a bug in yum/dnf on older platforms. @tteresch, is my reproducer bad and I'm missing something, or is this possibly NOTABUG?

I'm happy to reproduce this again with fedora or rhel repos if desired.

Edit: Forgot to mention that these tests were done with recent (as of yesterday) tips of 2.12-dev on pulp and pulp_rpm

Actions #8

Updated by ttereshc about 7 years ago

semyers, my understanding of the issue matches pretty well with the way you tried to reproduce it.

I would try to reproduce it on a latest RHEL system since it has yum only and no dnf.
AFAIK, dnf has a codebase different from the yum's one and their logic and handling of various things can be different (including their plugins like updateinfo), that's my assumption at least.

If RHEL test will show no issue as well, maybe one more thing worth checking - the reporter in BZ said that the issue was reproducible always and he specified two repositories which we can try to sync and check (you can't use existing fixtures for that, but you can create locally fixtures you need, though I think it is easier just to use those from BZ).
This is mostly to prove that we reproduced everything correctly and only empty pkglists are messing up with yum's merge logic.

Actions #9

Updated by semyers about 7 years ago

Yeah, that's one of two things I've thought of to dig into a little deeper:

  1. Use yum on el7 instead of dnf on recent fedora
  2. Update fixtures to make our updateinfos look and act more like "official" ones (so all pkglists have a "short" name, as one example)

dnf is a backward-incompatible fork of yum, and still carries a lot of the same code, but also seems to contain bug fixes that haven't made their way back to yum, such as the ability to correctly merge errata provided by different repos.

So, like you've recommended, I'll work with yum on el7 using the repos from the bz and see how things go.

Actions #10

Updated by semyers about 7 years ago

(Sorta) good news, I'm able to reproduce this on el6 using the steps above, which means our fixtures do replicate the problem as described in the BZ bug referenced in the previous comment.

$ sudo yum updateinfo list|grep 0055
RHEA-2012:0055 security       shark-0.1-1.noarch

So, to test the rename solution, I updated the errata to publish with different collection names:
c.update({'errata_id': 'RHEA-2012:0055'}, {$set: {'pkglist.0.short': 'foo', 'pkglist.1.short': 'bar'}})

Then I republished, clean the yum metadata, and checked again:

$ sudo yum updateinfo list|grep 0055
RHEA-2012:0055 security       penguin-0.9.1-1.noarch
RHEA-2012:0055 security       shark-0.1-1.noarch

So indeed, changing the collection short name fixes this, and should be easy enough, as mentioned in comment #6. There's still a chance that this is a fixable yum bug, and I'm also curious about what happens in el7. I'll get those questions answered shortly if I can.

Actions #11

Updated by semyers about 7 years ago

I've reproduced this in el7 as well. I'm assuming that since this was fixed in dnf but not in yum that it was considered a backward-incompatibile change, and therefore not backported, which in turn means that means I'll be fixing it in pulp.

Actions #12

Updated by ttereshc about 7 years ago

+1 for your both last comments, semyers

Actions #13

Updated by semyers about 7 years ago

I just tested a fix where I only altered the collection names on publish, and this is still not enough. Pulp does all the right filtering, and ends up making two different package lists.

zoo:

    <pkglist>
      <collection short="zoo">
        <name>1</name>
        <package src="http://www.fedoraproject.org" name="walrus" epoch="0" version="5.21" release="1" arch="noarch">
          <filename>walrus-5.21-1.noarch.rpm</filename>
        </package>
        <package src="http://www.fedoraproject.org" name="shark" epoch="0" version="0.1" release="1" arch="noarch">
          <filename>shark-0.1-1.noarch.rpm</filename>
        </package>
      </collection>
    </pkglist>

zoo2:

    <pkglist>
      <collection short="zoo2">
        <name>1</name>
        <package src="http://www.fedoraproject.org" name="walrus" epoch="0" version="5.21" release="1" arch="noarch">
          <filename>walrus-5.21-1.noarch.rpm</filename>
        </package>
        <package src="http://www.fedoraproject.org" name="penguin" epoch="0" version="0.9.1" release="1" arch="noarch">
          <filename>penguin-0.9.1-1.noarch.rpm</filename>
        </package>
      </collection>
    </pkglist>

It looks like there was an issue with my testing in comment 10, and changing the collection's short name isn't good enough.

Actions #14

Updated by semyers about 7 years ago

semyers wrote:

It looks like there was an issue with my testing in comment 10, and changing the collection's short name isn't good enough.

I'm still unable to reproduce my results in comment 10, which leads me to believe I forgot a step in testing, like I forgot to 'sudo' the 'yum clean metadata' command, or I forgot to republish with --force-full after deleting a package from zoo/zoo2, or some other oversight.

Poking around at the yum source, it's the collection "name" element that matters, not the "short" attribute in the collection element (yay xml). My main problem was that I was focused on "collection name in pkglist" to mean "the 'short' attribute in the pkglist element used to name collections", not "the name element in individual collections that are found under a pkglist element". The difference is that the "short" attr of pkglists, as far as I can tell, is only used when writing updateinfo XML out; it's imported for the sole purpose of being exported, and never actually used. It's this bit of code that does the merging:

https://github.com/rpm-software-management/yum/blob/master/yum/update_md.py#L545-L553

Note that pkg['name'] in that code is actually the value of the "name" element inside a package collection inside a pkglist. You'd think this was maybe a package name, but you would be wrong. Or at least I was wrong.

tl;dr yum talks about "pkglist name" a lot, but the pkglist attr that looks like a name ("short") isn't the pkglist's name. The thing that matters to yum is a collection's <name> tag, which is obvious (now) when re-reading the issue description. pkglists can have multiple collections that don't have to have the same name, so the idea of a "pkglist name" is...ambiguous.

Time to make my short-name uniquifier work on that "name" element instead, and see how that gets yum working but (I assume) breaks dnf. :)

Fun aside: According to the pkglist dtd, "name" is a valid attribute on the collection tag, and a valid subelement. yum clearly uses the subelement, as does pulp.

Actions #15

Updated by semyers about 7 years ago

  • Status changed from ASSIGNED to POST

Sweet. After re-aiming the short name uniquifier at the collection name element, this happens on el6 and el7:

$ yum clean metadata; yum updateinfo list|grep 0055                                                                                                                                                      
RHEA-2012:0055 security penguin-0.9.1-1.noarch
RHEA-2012:0055 security shark-0.1-1.noarch

Here's a PR!

https://github.com/pulp/pulp_rpm/pull/1029

Test coverage for this bit was surprisingly good. I started writing some tests just for this, but found that I could get coverage for both branches (repo_id is or is not None) by adding one line each to two existing tests. Woot.

Actions #16

Updated by mhrivnak about 7 years ago

  • Sprint/Milestone changed from 33 to 34
Actions #17

Updated by semyers about 7 years ago

ttereshc posted a PR that conflicted with mine, and made some more work necessary. I've got all the tests passing again, so now I'm going through and double-checking docstrings an comments to make sure things aren't telling lies after my changes.

Actions #18

Updated by semyers about 7 years ago

I filed https://github.com/PulpQE/pulp-smash/issues/569 as an alternate take for testing that doesn't require using yum/dnf to test.

Added by semyers about 7 years ago

Revision 7705651b | View on GitHub

Errata collection names are unique per-repo

This ended up being a pain because of how much filtering work we were doing in the updateinfo XML context. I was able to make it work as-is, and just about had a PR up, but then Tanya posted this PR:

https://github.com/pulp/pulp_rpm/pull/1024

That PR opened up a possibility that I hasn't seriously considered, which was to move the unit/repo filtering logic out of the XML generation context. Now, the UpdatinfoXMLFileContext writes whatever you tell it to write in the add_unit_metadata step, and the filtering takes place in calling Publish steps. Since the PublishErrataStep knows what repo it's publishing (er, "distributing"), all of the business of trying to pull a repo_id off the conduit can go away.

Buried deep in all of this is the single line that actually fixes the bug, which is to set errata pkglist names to repo_id when publishing, as well as related tests to make sure that happens.

If desired, I can break out refactor from the collection name bugfix, and even open up a refactor issue in redmine going into more detail about why just making the change right in UpdateinfoXMLFileContext was not the right solution.

tl;dr we painted ourselves into a corner, Tanya gave us a way out, and I took it. :)

closes #2560 https://pulp.plan.io/issues/2560

Actions #19

Updated by semyers about 7 years ago

  • Status changed from POST to MODIFIED
Actions #20

Updated by semyers about 7 years ago

  • Platform Release set to 2.12.2
Actions #21

Updated by semyers about 7 years ago

  • Status changed from MODIFIED to POST

Tanya discovered that my fix introduced an unacceptable performance hit to publish errata, this is fixed in https://github.com/pulp/pulp_rpm/pull/1036

Actions #22

Updated by semyers about 7 years ago

  • Related to Issue #2623: Errata publish performace degradation added
Actions #23

Updated by semyers about 7 years ago

  • Status changed from POST to MODIFIED

I completely forgot that #2623 was created specifically for the performance regression, so I'm reverting this to its previous state and will update that issue appropriately.

Actions #25

Updated by semyers about 7 years ago

  • Status changed from MODIFIED to 5
Actions #26

Updated by pthomas@redhat.com about 7 years ago

verified

[root@bkr-hv01-guest11 ~]# pulp-admin rpm repo create --repo-id zoo --relative-url zoo --serve-http true --feed https://repos.fedorapeople.org/repos/pulp/pulp/fixtures/rpm/
Successfully created repository [zoo]

[root@bkr-hv01-guest11 ~]# pulp-admin rpm repo create --repo-id zoo2 --relative-url zoo2 --serve-http true --feed https://repos.fedorapeople.org/repos/pulp/pulp/fixtures/rpm/
Successfully created repository [zoo2]

[root@bkr-hv01-guest11 ~]# 
[root@bkr-hv01-guest11 ~]# 
[root@bkr-hv01-guest11 ~]# for repo in zoo zoo2; do pulp-admin rpm repo sync run --repo-id $repo; done
+----------------------------------------------------------------------+
                     Synchronizing Repository [zoo]
+----------------------------------------------------------------------+

This command may be exited via ctrl+c without affecting the request.

Downloading metadata...
[|]
... completed

Downloading repository content...
[==================================================] 100%
RPMs:       32/32 items
Delta RPMs: 0/0 items

... completed

Downloading distribution files...
[==================================================] 100%
Distributions: 0/0 items
... completed

Importing errata...
[-]
... completed

Importing package groups/categories...
[-]
... completed

Cleaning duplicate packages...
[\]
... completed

Task Succeeded

Initializing repo metadata
[-]
... completed

Publishing Distribution files
[-]
... completed

Publishing RPMs
[==================================================] 100%
32 of 32 items
... completed

Publishing Delta RPMs
... skipped

Publishing Errata
[==================================================] 100%
4 of 4 items
... completed

Publishing Comps file
[==================================================] 100%
4 of 4 items
... completed

Publishing Metadata.
[-]
... completed

Closing repo metadata
[-]
... completed

Generating sqlite files
... skipped

Generating HTML files
... skipped

Publishing files to web
[-]
... completed

Writing Listings File
[-]
... completed

Writing Listings File
[-]
... completed

Task Succeeded

+----------------------------------------------------------------------+
                    Synchronizing Repository [zoo2]
+----------------------------------------------------------------------+

This command may be exited via ctrl+c without affecting the request.

Downloading metadata...
[|]
... completed

Downloading repository content...
[-]
[==================================================] 100%
RPMs:       0/0 items
Delta RPMs: 0/0 items

... completed

Downloading distribution files...
[==================================================] 100%
Distributions: 0/0 items
... completed

Importing errata...
[-]
... completed

Importing package groups/categories...
[-]
... completed

Cleaning duplicate packages...
[\]
... completed

Task Succeeded

Initializing repo metadata
[-]
... completed

Publishing Distribution files
[-]
... completed

Publishing RPMs
[==================================================] 100%
32 of 32 items
... completed

Publishing Delta RPMs
... skipped

Publishing Errata
[==================================================] 100%
4 of 4 items
... completed

Publishing Comps file
[==================================================] 100%
4 of 4 items
... completed

Publishing Metadata.
[-]
... completed

Closing repo metadata
[-]
... completed

Generating sqlite files
... skipped

Generating HTML files
... skipped

Publishing files to web
[-]
... completed

Writing Listings File
[-]
... completed

Writing Listings File
[-]
... completed

Task Succeeded

[root@bkr-hv01-guest11 ~]# yum clean all
Loaded plugins: product-id, search-disabled-repos, subscription-manager
Cleaning repos: beaker-HighAvailability beaker-ResilientStorage beaker-Server
              : beaker-Server-NFV beaker-Server-NFV-debuginfo beaker-Server-RT
              : beaker-Server-RT-debuginfo beaker-Server-SAP beaker-Server-SAP-debuginfo
              : beaker-Server-SAPHANA beaker-Server-SAPHANA-debuginfo beaker-Server-debuginfo
              : beaker-Server-optional beaker-Server-optional-debuginfo beaker-harness
              : beaker-tasks epel pulp rhel-7-server-extras-rpms rhel-7-server-optional-rpms
              : rhel-7-server-rpms rhel-atomic-host-rpms zoo zoo2
Cleaning up everything
[root@bkr-hv01-guest11 ~]# pulp-admin rpm repo remove rpm --repo-id zoo --str-eq="name=penguin" 
This command may be exited via ctrl+c without affecting the request.

[\]
Running...

Units Removed:
  penguin-0.9.1-1-noarch

[root@bkr-hv01-guest11 ~]# pulp-admin rpm repo remove rpm --repo-id zoo2 --str-eq="name=shark" 
This command may be exited via ctrl+c without affecting the request.

[\]
Running...

Units Removed:
  shark-0.1-1-noarch

[root@bkr-hv01-guest11 ~]# for repo in zoo zoo2; do pulp-admin rpm repo publish run --force-full --repo-id $repo; done
+----------------------------------------------------------------------+
                      Publishing Repository [zoo]
+----------------------------------------------------------------------+

The following publish configuration options will be used:

Force Full:  True

This command may be exited via ctrl+c without affecting the request.

Initializing repo metadata
[-]
... completed

Publishing Distribution files
[-]
... completed

Publishing RPMs
[==================================================] 100%
31 of 31 items
... completed

Publishing Delta RPMs
... skipped

Publishing Errata
[==================================================] 100%
4 of 4 items
... completed

Publishing Comps file
[==================================================] 100%
4 of 4 items
... completed

Publishing Metadata.
[-]
... completed

Closing repo metadata
[-]
... completed

Generating sqlite files
... skipped

Generating HTML files
... skipped

Publishing files to web
[-]
... completed

Writing Listings File
[-]
... completed

Writing Listings File
[-]
... completed

Task Succeeded

+----------------------------------------------------------------------+
                      Publishing Repository [zoo2]
+----------------------------------------------------------------------+

The following publish configuration options will be used:

Force Full:  True

This command may be exited via ctrl+c without affecting the request.

Initializing repo metadata
[-]
... completed

Publishing Distribution files
[-]
... completed

Publishing RPMs
[==================================================] 100%
31 of 31 items
... completed

Publishing Delta RPMs
... skipped

Publishing Errata
[==================================================] 100%
4 of 4 items
... completed

Publishing Comps file
[==================================================] 100%
4 of 4 items
... completed

Publishing Metadata.
[-]
... completed

Closing repo metadata
[-]
... completed

Generating sqlite files
... skipped

Generating HTML files
... skipped

Publishing files to web
[-]
... completed

Writing Listings File
[-]
... completed

Writing Listings File
[-]
... completed

Task Succeeded

[root@bkr-hv01-guest11 ~]# yum updateinfo list |grep shark
RHEA-2012:0055 security       shark-0.1-1.noarch
[root@bkr-hv01-guest11 ~]# 
[root@bkr-hv01-guest11 ~]# yum updateinfo list |grep penguin
RHEA-2012:0055 security       penguin-0.9.1-1.noarch
[root@bkr-hv01-guest11 ~]# 
Actions #27

Updated by semyers about 7 years ago

We had a good team chat about the "Verification Required" flag on Monday, and decided that the release of 2.12.2 should not be blocked on the verification of this issue.

Actions #28

Updated by bizhang about 7 years ago

  • Status changed from 5 to CLOSED - CURRENTRELEASE
Actions #31

Updated by bmbouter about 6 years ago

  • Sprint set to Sprint 16
Actions #32

Updated by bmbouter about 6 years ago

  • Sprint/Milestone deleted (34)
Actions #33

Updated by bmbouter about 5 years ago

  • Tags Pulp 2 added

Also available in: Atom PDF