Story #1724
closed
Publish should be a no-op if no units and no settings have changed since the last successful publish
Status:
CLOSED - CURRENTRELEASE
Description
Use case: a user has a script that publishes 10 repositories, and 1 of them fails. They re-run the script, which doesn't track which repository failed, so it publishes all 10 again. For the 9 that originally succeeded, Pulp should no-op, since no settings or units have been modified since the last successful publish. Even though publishes with no changed data are quick, each subsequent publish rewrites the files in the repodata, and they change in small ways. Because those filenames include checksums, the files from the second publish end up with completely different filenames. This is undesirable to some users.
Publish should be a no-op if all of the following are true:
- no units have been added since the last successful publish.
- no units have been removed since the last successful publish.
- no distributor has been modified since the last successful publish.
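The three conditions above can be sketched as a single predicate. This is only an illustration, not Pulp's implementation: the `RepoState` class, its field names, and the `publish_is_noop` function are assumptions made for the example (the field `last_unit_added` echoes the name mentioned later in this issue; the others are hypothetical).

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class RepoState:
    """Hypothetical snapshot of the timestamps the check needs.

    None means "this has never happened".
    """
    last_unit_added: Optional[datetime]
    last_unit_removed: Optional[datetime]
    distributor_last_updated: Optional[datetime]
    last_publish: Optional[datetime]  # last *successful* publish


def publish_is_noop(state: RepoState) -> bool:
    """Return True if nothing relevant changed since the last successful publish."""
    if state.last_publish is None:
        # Never published successfully, so there is nothing to skip.
        return False
    changes = (
        state.last_unit_added,
        state.last_unit_removed,
        state.distributor_last_updated,
    )
    # Strict "<" is deliberate: a change stamped at the same instant as the
    # publish is treated as possibly not included, so we publish again.
    return all(ts is None or ts < state.last_publish for ts in changes)
```

In this sketch a repository that has never been published is always published, and a tie between a change timestamp and the publish timestamp errs on the side of publishing.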
Here is a pulp-smash issue which corresponds with this: https://github.com/PulpQE/pulp-smash/issues/127
- Description updated (diff)
This could be done by having the platform track when the distributor config last changed. There is a similar desire to track when an importer config last changed, to determine when a full sync should happen.
In talking with @ipanova about this: why can't we make this a bug instead, and not introduce an option to turn this on? In that case, the behavior described would be on by default.
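A minimal sketch of what tracking the last config change could look like, assuming a hypothetical `Distributor` class with a `last_updated` field. None of these names are taken from Pulp's code, and `relative_url` in the usage note is just an example key.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Optional


class Distributor:
    """Hypothetical stand-in for a distributor record, not Pulp's actual model."""

    def __init__(self, config: Dict[str, Any]) -> None:
        self.config = dict(config)
        # Stays None until the config is changed for the first time.
        self.last_updated: Optional[datetime] = None

    def update_config(self, delta: Dict[str, Any],
                      now: Optional[datetime] = None) -> bool:
        """Merge delta into the config; bump last_updated only on a real change."""
        merged = {**self.config, **delta}
        if merged == self.config:
            # Re-submitting identical settings must not invalidate the last publish.
            return False
        self.config = merged
        self.last_updated = now or datetime.now(timezone.utc)
        return True
```

For example, calling `update_config({"relative_url": "a"})` on a distributor whose config already has that value leaves `last_updated` untouched, so a later publish could still be skipped.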
I like the idea of this being the default behavior. The absence of that behavior is inefficient, but is it a bug? I could see it both ways.
Regardless of what we call it, if we make that the default behavior, we need an option to force a full publish. We got into trouble with the yum importer when we tried a similar optimization that skips most of the work: we didn't provide an option to force a full sync, and we have several times needed to do tricks to help a user force a full sync. Imagine that /var/lib/pulp/published is lost or corrupted and needs to be recreated. Or a user goes poking around in there and messes up some published data. Or there's a bug in pulp that produces incorrect published data, and users need to re-publish after getting a fix.
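To illustrate the point about forcing a full publish, here is a rough sketch of how an override could sit in front of the skip logic. The function name, the `force_full` parameter, and the return strings are all hypothetical; the actual option is tracked separately in the force-publish story.

```python
from typing import Callable


def publish(has_changes: bool,
            do_full_publish: Callable[[], None],
            force_full: bool = False) -> str:
    """Run the publish unless it can be skipped; force_full always runs it.

    has_changes: result of whatever "did anything change?" check is in place.
    do_full_publish: callable that performs the real publish work.
    """
    if not force_full and not has_changes:
        return "skipped"
    do_full_publish()
    return "published"
```

With `force_full=True` the publish runs even when nothing has changed, which covers the recovery cases described above (lost or corrupted published data, or data produced by a since-fixed bug).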
mhrivnak, thanks for the input. We definitely need to be able to force a republish. The good news is we are tracking that work as https://pulp.plan.io/issues/1158
I'm going to rewrite this into a bug and drop the usage of a distributor option. Two different pulp user groups have identified this behavior as the expected behavior and it makes sense to me too.
- Tracker changed from Story to Issue
- Subject changed from As a user, I have a distributor option which causes publish to be a no-op if no units and no settings have changed since the last successful publish to Publish should be a no-op if no units and no settings have changed since the last successful publish
- Description updated (diff)
- Severity set to 2. Medium
- Triaged set to No
- Blocked by Story #1158: As a user, I can force full/fresh publish of rpms and not do an incremental publish added
- Severity changed from 2. Medium to 1. Low
- Triaged changed from No to Yes
Distributors should not be impacted by changes to importer settings, so I feel comfortable not considering the importer or its settings at all for this behavior. In the very small number of cases where we've needed the importer to make the distributor aware of something, we've managed to do so without requiring the distributor to look at the importer. Otherwise, the line between importers and distributors has been kept quite rigid.
- Sprint/Milestone set to 19
- Description updated (diff)
repo_id is immutable, so that won't be a problem.
display_name is not used programmatically to my knowledge. It's intended only for human consumption. I suppose in theory a human-readable name could reasonably be used to form a URL, like many blogs do, but I don't think any part of pulp does that currently. I'm in favor of explicitly documenting that it is not for programmatic use.
Thanks, mhrivnak. It sounds like the story as currently written will guard against the necessary cases. I think it's ready to be worked on.
- Status changed from NEW to ASSIGNED
- Assignee set to ipanova@redhat.com
- Blocked by Issue #1847: last_unit_added is not added in mongo repo collection records added
- Description updated (diff)
- Project changed from RPM Support to Pulp
- Status changed from ASSIGNED to POST
- Blocked by deleted (Issue #1847: last_unit_added is not added in mongo repo collection records)
- Sprint/Milestone changed from 19 to 20
- Platform Release set to 2.9.0
- Status changed from POST to MODIFIED
- % Done changed from 0 to 100
- Tracker changed from Issue to Story
- Groomed set to No
- Sprint Candidate set to No
- Groomed changed from No to Yes
- Sprint Candidate changed from No to Yes
- Status changed from MODIFIED to CLOSED - CURRENTRELEASE
- Sprint/Milestone deleted (20)
Publish should be a no-op if nothing changed since the last publish.
closes #1724 https://pulp.plan.io/issues/1724