Issue #1088
closedIncremental publish could use the wrong primary.xml during fast-forward publish
Added by bcourt over 9 years ago. Updated over 5 years ago.
Description
The incremental publish searches for the most recently modified file with a filename that matches a regex. It would be more robust to identify the file from the old repomd.xml instead of searching for one on the filesystem. For example, if there are two primary.xml files in the repodata directory, the incremental publish should use the one from the repodata.xml and not the first one it finds.
This regex is performed here[0].
Files
Related issues
Updated by bmbouter almost 9 years ago
- Related to Issue #1087: publishing of distribution units should ignore files in repodata directory added
Updated by bmbouter almost 9 years ago
- Related to Issue #1086: pulp_distribution.xml sync should ignore repodata/* added
Updated by bmbouter almost 9 years ago
- Tracker changed from Issue to Refactor
- Description updated (diff)
- Groomed set to No
- Sprint Candidate set to Yes
Updated by dgregor@redhat.com over 8 years ago
If I understand the description correctly, pulp could use the wrong metadata files for a fast-forward publish, which would result in incorrect metadata being published. Please change this to a bug and raise the priority.
Updated by bmbouter over 8 years ago
- Tracker changed from Refactor to Issue
- Subject changed from incremental publish should use the repomd.xml to determine files to fast-forward to Incremental publish could use the wrong primary.xml during fast-forward publish
- Severity set to 2. Medium
- Triaged set to No
Switching from a refactor to a issue because this could produce incorrect behavior.
Updated by bmbouter over 8 years ago
- Groomed changed from No to Yes
After discussion in #pulp-dev produced a +1 to grooming this issue, I'm marking it as groomed.
Updated by mhrivnak over 8 years ago
- Severity changed from 2. Medium to 1. Low
- Triaged changed from No to Yes
Updated by jortel@redhat.com over 8 years ago
- Status changed from NEW to ASSIGNED
- Assignee set to jortel@redhat.com
Updated by jortel@redhat.com over 8 years ago
After digging into this, I found several things that need to be addressed.
- The fast forward context in the platform is biased toward RPM and is trying to do too much. I suggest the platform classes should only provide the capability of editing an existing XML file that can be used at the discretion of the plugin. The plugin can determine if the file exists and have the writer open it (for edit) instead of creating a new file. I don't think much of the existing logic should be reused. It's not very robust. It makes assumptions about the complexity and structure of the XML document and does not account for namespaces. This is part of the RPM bias.
- The distributor needs to read the existing repomd.xml before the new repomd.xml context is created. This way the information can be used to safely locate existing files. See attached example object for reading the repomd.xml.
- The distributor changed to use the repomd information to locate existing files for fast forward.
Updated by jortel@redhat.com over 8 years ago
- Status changed from ASSIGNED to NEW
- Assignee deleted (
jortel@redhat.com)
Updated by jortel@redhat.com about 8 years ago
- Priority changed from Normal to High
- Severity changed from 3. High to 1. Low
Updated by mhrivnak about 8 years ago
bmbouter, jortel, tteresch and I just talked through this problem.
Next steps: bmbouter is going to try to reproduce and will add notes here.
We all agreed that the likely culprit is the general one described previously in this bug, that timestamps are being used to choose the correct updateinfo file to use during publish, and those timestamps are likely being changed (mangled) by a preceding copy operation.
Open question: Does updateinfo get fast-forwarded at all?
Open question: At some point, the wrong updateinfo file is being chosen for use. Is that either 1) during fast-forward, when the previous publish is being inspected, or 2) at the very end when a new repomd file is being created, and the distributor just wants to know which file was created during the current publish.
We might be able to resolve the issue being seen currently, which only appears to affect updateinfo files, by doing issue #2096 and stop preserving old updateinfo files from one publish to the next. That would need to be vetted as an acceptable behavioral change.
If we find that the problem could be resolved by correctly inspecting a repomd file to identify the current updateinfo file, we could use the createrepo_c python library to do much of that work.
http://rpm-software-management.github.io/createrepo_c/python/lib.html#createrepo_c.Repomd
Updated by bmbouter about 8 years ago
In addition to the reproducer, I manually verified that the updateinfo file that is not referenced by the repo.md file does contain the expected errata that should be included by the incremental publish.
Updated by bmbouter about 8 years ago
Based on some of my testing, I don't think that updateinfo.xml is written incrementally during an incremental publish. The link in the original ticket description links to a codepath which does not provide handling for updateinfo.xml. It does provide incremental publish handling for several other metadata types.
I think that the repo.md writing logic is selecting the wrong updateinfo.xml. I am looking for where that codepath lives.
Updated by bmbouter about 8 years ago
- Priority changed from High to Low
- Sprint/Milestone deleted (
28) - Sprint Candidate changed from Yes to No
This bug was not the cause of the symptoms we were investigating like we originally thought. Since it was put onto the sprint on that basis I'm removing it from the sprint.
Updated by bmbouter over 5 years ago
- Status changed from NEW to CLOSED - WONTFIX
Pulp 2 is approaching maintenance mode, and this Pulp 2 ticket is not being actively worked on. As such, it is being closed as WONTFIX. Pulp 2 is still accepting contributions though, so if you want to contribute a fix for this ticket, please reopen or comment on it. If you don't have permissions to reopen this ticket, or you want to discuss an issue, please reach out via the developer mailing list.