Issue #1843

closed

Pulp publishes invalid PULP_DISTRIBUTION.xml metadata

Added by jcline@redhat.com almost 8 years ago. Updated almost 5 years ago.

Status: CLOSED - CURRENTRELEASE
Priority: High
Sprint/Milestone: -
Start date:
Due date:
Estimated time:
Severity: 3. High
Version: 2.8.0
Platform Release: 2.8.3
OS:
Triaged: Yes
Groomed: No
Sprint Candidate: No
Tags: Pulp 2
Sprint: Sprint 1
Quarter:

Description

If a repository contains a PULP_DISTRIBUTION.xml metadata file, it is possible for Pulp to re-publish it with invalid data. This causes a second Pulp server syncing from the first to fail. Specifically, the PULP_DISTRIBUTION.xml file references files that do not exist in the version published by Pulp[0] (but do exist upstream).

For example, the RHEL6[2] kickstart repository contains a PULP_DISTRIBUTION.xml file that references `repodata/productid`. During sync this is downloaded along with the XML file, but when the repository is published, it is explicitly skipped.
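A rough way to spot this mismatch in a published tree is sketched below. It assumes PULP_DISTRIBUTION.xml is a flat list of `<file>` elements holding repository-relative paths; the function name `find_missing_references` and its parameters are hypothetical and not part of Pulp.

```python
import os
import xml.etree.ElementTree as ET


def find_missing_references(xml_text, published_root):
    """Return the paths listed in the metadata that are absent under published_root.

    Assumption: every referenced file appears as the text of a <file> element.
    """
    root = ET.fromstring(xml_text)
    missing = []
    for element in root.iter("file"):
        relative_path = (element.text or "").strip()
        if not relative_path:
            continue  # ignore empty entries
        if not os.path.isfile(os.path.join(published_root, relative_path)):
            missing.append(relative_path)
    return missing
```

For the RHEL6 case described above, a check like this would be expected to report `repodata/productid` when run against the published copy.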

Ultimately, this occurs because Pulp blindly syncs and publishes this PULP_DISTRIBUTION.xml file[1] while filtering content retrieved using it.

To fix this, we should be generating/altering the PULP_DISTRIBUTION.xml file we publish to ensure we don't create invalid metadata. However, a bigger question is whether or not filtering content[0] is even appropriate. I suspect it is not. This issue is not meant to address that problem, though.
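As an illustration of the "generate/alter on publish" idea, here is a minimal sketch that drops entries for files that were not published. It is not the code from the eventual fix; it assumes `<file>` elements are direct children of the root element, and `filter_distribution_xml` and `published_paths` are hypothetical names.

```python
import xml.etree.ElementTree as ET


def filter_distribution_xml(xml_text, published_paths):
    """Return the metadata with every <file> entry not in published_paths removed.

    Assumption: <file> elements are direct children of the root element and
    contain repository-relative paths.
    """
    root = ET.fromstring(xml_text)
    # Copy the list first so elements can be removed while looping.
    for element in list(root.findall("file")):
        if (element.text or "").strip() not in published_paths:
            root.remove(element)
    return ET.tostring(root, encoding="unicode")
```

The caller would pass the set of relative paths that the publish step actually wrote, so the republished metadata can only reference files a downstream Pulp is able to fetch.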

[0] https://github.com/pulp/pulp_rpm/blob/pulp-rpm-2.8.2-1/plugins/pulp_rpm/plugins/distributors/yum/publish.py#L796-L797
[1] https://github.com/pulp/pulp_rpm/blob/pulp-rpm-2.8.2-1/plugins/pulp_rpm/plugins/importers/yum/parse/treeinfo.py#L437-L441
[2] https://cdn.redhat.com/content/dist/rhel/server/6/6Server/x86_64/kickstart/

Actions #1

Updated by mmccune@redhat.com almost 8 years ago

  • Severity changed from 2. Medium to 3. High
  • Version set to 2.8.0

This is fairly severe in that it breaks a good portion of RHEL provisioning. Moved to High severity.

Actions #2

Updated by jcline@redhat.com almost 8 years ago

  • Description updated (diff)

Actions #3

Updated by jcline@redhat.com almost 8 years ago

  • Subject changed from Pulp-to-pulp distribution syncing is almost certainly broken in some cases to Pulp publishes invalid PULP_DISTRIBUTION.xml metadata
  • Description updated (diff)
  • Status changed from NEW to ASSIGNED
  • Assignee set to jcline@redhat.com

I've re-written the issue to narrow the focus, since the original was very broad. There are already several known issues with distributions (issue #1768, which was only a very short-term fix and doesn't address the incorrect modeling, and #1769, which describes content we fail to mirror).

I intend to ensure Pulp doesn't publish metadata that references files that don't exist. However, it may be that it won't reference files that need to exist. I don't know what is using (or not using) `repodata/productid` and I find it troubling that we don't mirror upstream, but I don't think I should tackle all the problems we have as part of this issue.

Actions #4

Updated by mhrivnak almost 8 years ago

A simple work-around that would improve, but not fix, the situation would be to do the same filtering during sync that we do during publish. Then at least Pulp deployments with that change would happily ignore the same files that publish ignores.

As you point out, a better option is to modify the XML at publish time to filter out any files that don't actually get published. This would be more effort, but is still very doable.

And of course the best option would require figuring out why exactly Pulp ignores those files, documenting that somewhere (at least in the code if not elsewhere), and determining whether skipping those files is in fact appropriate.

To unblock katello, perhaps a combination of the first two would be valuable. You could probably make a PR for the first work-around very quickly, and then follow with the second option shortly thereafter. That would buy us time to further investigate why pulp is doing this at all. What do you think of that?

Actions #5

Updated by mhrivnak almost 8 years ago

  • Priority changed from Normal to High
  • Sprint/Milestone set to 19
  • Platform Release set to 2.8.3

Actions #6

Updated by mhrivnak almost 8 years ago

  • Triaged changed from No to Yes

Actions #7

Updated by jcline@redhat.com almost 8 years ago

  • Status changed from ASSIGNED to POST

https://github.com/pulp/pulp_rpm/pull/846

Note that the first suggested work-around in note 4 isn't possible because it would break lazy syncs.

Added by Jeremy Cline almost 8 years ago

Revision 9f97669b | View on GitHub

Regenerate PULP_DISTRIBUTION.xml on publish if necessary

The PULP_DISTRIBUTION.xml file used to be saved from an upstream repository and republished without modification. This is problematic because files referenced by that file are filtered out during a publish. This commit is a short-term work-around to that problematic workflow. Without it, Pulp (or anything else using PULP_DISTRIBUTION.xml) will attempt to download files that don't exist in the published repository.

fixes #1843

Actions #8

Updated by Anonymous almost 8 years ago

  • Status changed from POST to MODIFIED
  • % Done changed from 0 to 100

Actions #9

Updated by semyers almost 8 years ago

  • Status changed from MODIFIED to 5

Actions #10

Updated by pthomas@redhat.com almost 8 years ago

  • Status changed from 5 to 6

verified


[root@ibm-x3250m4-03 ~]# pulp-admin rpm repo sync run --repo-id rhel6 
+----------------------------------------------------------------------+
                    Synchronizing Repository [rhel6]
+----------------------------------------------------------------------+

This command may be exited via ctrl+c without affecting the request.

Downloading metadata...
[|]
... completed

Downloading repository content...
[-]
[==================================================] 100%
RPMs:       0/0 items
Delta RPMs: 0/0 items

... completed

Downloading distribution files...
[==================================================] 100%
Distributions: 0/0 items
... completed

Importing errata...
[-]
... completed

Importing package groups/categories...
[-]
... completed

Cleaning duplicate packages...
[-]
... completed

Task Succeeded

Copying files
[-]
... completed

Initializing repo metadata
[-]
... completed

Publishing Distribution files
[|]
... completed

Publishing RPMs
[/]
... completed

Publishing Delta RPMs
... skipped

Publishing Errata
[-]
... completed

Publishing Comps file
[==================================================] 100%
212 of 212 items
... completed

Publishing Metadata.
[-]
... completed

Closing repo metadata
[-]
... completed

Generating sqlite files
... skipped

Publishing files to web
[\]
... completed

Writing Listings File
[-]
... completed

Writing Listings File
[-]
... completed

Task Succeeded

Actions #11

Updated by semyers almost 8 years ago

  • Status changed from 6 to CLOSED - CURRENTRELEASE

Actions #13

Updated by bmbouter about 6 years ago

  • Sprint set to Sprint 1

Actions #14

Updated by bmbouter about 6 years ago

  • Sprint/Milestone deleted (19)

Actions #15

Updated by bmbouter almost 5 years ago

  • Tags Pulp 2 added
