Story #1724
Closed
Publish should be a no-op if no units and no settings have changed since the last successful publish
Description
Use case: a user has a script that publishes 10 repositories, and 1 of them fails. They re-run the script, which doesn't know or track which one failed, so it publishes all 10 again. For the 9 that originally succeeded, Pulp should no-op, since no settings or units have been modified since their last successful publish. Even though publishes with no changed data are quick, each subsequent publish rewrites files in the repodata, and those files change in small ways. Because checksums are used in those filenames, the second publish produces completely different filenames. This is undesirable to some users.
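A minimal sketch of why an unchanged republish can still rename files. It assumes repodata files are named `<sha256>-<name>` and that some incidental input differs between runs. Here that input is the gzip header's mtime field. This is one plausible cause chosen for illustration, not necessarily the exact one in Pulp, and `repodata_filename` is a hypothetical helper.

```python
import gzip
import hashlib


def repodata_filename(xml_body: bytes, mtime: int, name: str = "primary.xml.gz") -> str:
    """Hypothetical helper: compress xml_body and prefix the name with its sha256."""
    # The gzip header stores mtime, so identical XML compressed at different
    # times produces different bytes, and therefore a different checksum.
    compressed = gzip.compress(xml_body, mtime=mtime)
    digest = hashlib.sha256(compressed).hexdigest()
    return f"{digest}-{name}"


body = b"<metadata packages=\"1\">...</metadata>"
first = repodata_filename(body, mtime=1_460_000_000)
second = repodata_filename(body, mtime=1_460_000_060)
print(first == second)  # False: same units, different filename
```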
Publish should be a no-op if all of the following are true:
- no units have been added since the last successful publish.
- no units have been removed since the last successful publish.
- no distributor has been modified since the last successful publish.
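The three conditions above reduce to timestamp comparisons against the last successful publish. This is a minimal sketch. It assumes the repository records when a unit was last added or removed, and that the distributor records when it was last modified. `PublishState` and `publish_is_noop` are hypothetical names, not Pulp's API.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class PublishState:
    """Hypothetical snapshot of the timestamps the no-op check needs."""
    last_publish: Optional[datetime]              # last *successful* publish; None if never
    last_unit_added: Optional[datetime]
    last_unit_removed: Optional[datetime]
    distributor_last_updated: Optional[datetime]


def publish_is_noop(state: PublishState) -> bool:
    """True only if no unit was added or removed and the distributor was not modified."""
    if state.last_publish is None:
        return False  # never published successfully: a publish is required
    for changed_at in (
        state.last_unit_added,
        state.last_unit_removed,
        state.distributor_last_updated,
    ):
        # A missing timestamp is treated as "never changed" (an assumption).
        if changed_at is not None and changed_at > state.last_publish:
            return False
    return True
```

Treating a missing timestamp as "never changed" is a design choice of this sketch. A more conservative version would publish whenever any timestamp is unknown.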
Here is a pulp-smash issue which corresponds with this: https://github.com/PulpQE/pulp-smash/issues/127
Updated by mhrivnak almost 9 years ago
This could be done by having the platform track when the distributor config last changed. There is a similar desire to track when an importer config last changed, to determine when a full sync should happen.
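Below is a minimal sketch of that tracking. It assumes a hypothetical `DistributorRecord` that stores its config alongside a `last_updated` timestamp; this is not Pulp's actual model. The timestamp moves only when the effective config actually changes, so re-submitting identical settings does not defeat the no-op check.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Optional


class DistributorRecord:
    """Hypothetical stand-in for a stored distributor; not Pulp's model."""

    def __init__(self, config: Dict[str, Any]) -> None:
        self.config: Dict[str, Any] = {k: v for k, v in config.items() if v is not None}
        self.last_updated: datetime = datetime.now(timezone.utc)

    def update_config(self, changes: Dict[str, Any], now: Optional[datetime] = None) -> bool:
        """Merge `changes` into the config and return True if anything changed.

        A value of None removes the key (an assumption for this sketch).
        """
        merged = {**self.config, **changes}
        new_config = {k: v for k, v in merged.items() if v is not None}
        if new_config == self.config:
            return False  # identical settings: leave last_updated alone
        self.config = new_config
        self.last_updated = now if now is not None else datetime.now(timezone.utc)
        return True
```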
Updated by bmbouter over 8 years ago
In talking with @ipanova about this: why can't we make this a bug instead, rather than introducing an option to turn it on? In that case the behavior described would be on by default.
Updated by mhrivnak over 8 years ago
I like the idea of this being the default behavior. The absence of that behavior is inefficient, but is it a bug? I could see it both ways.
Regardless of what we call it, if we make that the default behavior, we need an option to force a full publish. We got into trouble with the yum importer when we tried a similar optimization that skips most of the work: we didn't provide an option to force a full sync, and several times since then we have needed tricks to help a user force one. Imagine that /var/lib/pulp/published is lost or corrupted and needs to be recreated. Or a user goes poking around in there and messes up some published data. Or there's a bug in pulp that produces incorrect published data, and users need to re-publish after getting a fix.
Updated by bmbouter over 8 years ago
mhrivnak, thanks for the input. We definitely need to be able to force a republish. The good news is we are tracking that work as https://pulp.plan.io/issues/1158
I'm going to rewrite this into a bug and drop the usage of a distributor option. Two different pulp user groups have identified this behavior as the expected behavior and it makes sense to me too.
Updated by bmbouter over 8 years ago
- Tracker changed from Story to Issue
- Subject changed from As a user, I have a distributor option which causes publish to be a no-op if no units and no settings have changed since the last successful publish to Publish should be a no-op if no units and no settings have changed since the last successful publish
- Description updated (diff)
- Severity set to 2. Medium
- Triaged set to No
Updated by mhrivnak over 8 years ago
- Blocked by Story #1158: As a user, I can force full/fresh publish of rpms and not do an incremental publish added
Updated by mhrivnak over 8 years ago
- Severity changed from 2. Medium to 1. Low
- Triaged changed from No to Yes
Updated by mhrivnak over 8 years ago
Distributors should not be impacted by changes to importer settings, so I feel comfortable not considering the importer or its settings at all for this behavior. In the very small number of cases where we've needed the importer to make the distributor aware of something, we've managed to do so without requiring the distributor to look at the importer. Otherwise, the line between importers and distributors has been kept quite rigid.
Updated by bmbouter over 8 years ago
- Description updated (diff)
+1 to not considering the importer settings. I've removed it from the description.
What about the attributes on the repo itself[0]? Those probably also need to be monitored, right? For example, the repo_id or the display_name would affect the published path, so modifying either of them and republishing should trigger a real publish even if no distributor and no units in the repo were modified.
Updated by mhrivnak over 8 years ago
repo_id is immutable, so that won't be a problem.
display_name is not used programmatically to my knowledge. It's intended only for human consumption. I suppose in theory a human-readable name could reasonably be used to form a URL, like many blogs do, but I don't think any part of pulp does that currently. I'm in favor of explicitly documenting that it is not for programmatic use.
Updated by bmbouter over 8 years ago
thanks mhrivnak. It sounds like the story as it is currently written will guard against the necessary cases. I think it's ready for working on.
Updated by ipanova@redhat.com over 8 years ago
- Status changed from NEW to ASSIGNED
- Assignee set to ipanova@redhat.com
Updated by ipanova@redhat.com over 8 years ago
- Blocked by Issue #1847: last_unit_added is not added in mongo repo collection records added
Updated by bmbouter over 8 years ago
Currently this is planned as an RPM feature, but I think it could be implemented almost as easily as a platform feature, which would make it available for all plugin types.
In the publish task itself, you could run all the checks described in this ticket before the call to _do_publish(). That would get implemented here[0].
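Here is a minimal sketch of such a platform-level check. It assumes the repo carries `last_unit_added`/`last_unit_removed` timestamps and the distributor carries `last_updated` and `last_publish`. `run_publish` and the `force_full` flag are hypothetical; the flag stands in for the force option tracked in #1158. `do_publish` stands in for the plugin's `_do_publish()`.

```python
from datetime import datetime, timezone
from typing import Any, Callable, Dict, Optional


def _changed_since(ts: Optional[datetime], since: datetime) -> bool:
    """True if a change timestamp exists and is newer than `since`."""
    return ts is not None and ts > since


def run_publish(
    repo: Dict[str, Any],
    distributor: Dict[str, Any],
    do_publish: Callable[[], None],
    force_full: bool = False,
) -> str:
    """Hypothetical publish task: skip when nothing changed, unless forced."""
    last = distributor.get("last_publish")
    unchanged = (
        last is not None
        and not _changed_since(repo.get("last_unit_added"), last)
        and not _changed_since(repo.get("last_unit_removed"), last)
        and not _changed_since(distributor.get("last_updated"), last)
    )
    if unchanged and not force_full:
        return "skipped"
    # Capture the start time before publishing: anything modified while the
    # publish runs will be newer than this and will trigger the next publish.
    started = datetime.now(timezone.utc)
    do_publish()  # stands in for the plugin's _do_publish(); may raise
    # Reached only on success, so a failed publish never counts as "last successful".
    distributor["last_publish"] = started
    return "published"
```

Recording the start time rather than the finish time keeps a unit added mid-publish from being silently skipped later. If `do_publish` raises, `last_publish` is left untouched.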
Updated by ipanova@redhat.com over 8 years ago
- Project changed from RPM Support to Pulp
- Status changed from ASSIGNED to POST
Updated by ipanova@redhat.com over 8 years ago
- Blocked by deleted (Issue #1847: last_unit_added is not added in mongo repo collection records)
Updated by ipanova@redhat.com over 8 years ago
- Sprint/Milestone changed from 19 to 20
Added by ipanova@redhat.com over 8 years ago
Revision 22b4eda3
Publish should be a no-op if nothing changed since the last publish.
Updated by ipanova@redhat.com over 8 years ago
- Status changed from POST to MODIFIED
- % Done changed from 0 to 100
Applied in changeset pulp|22b4eda3d894221a9b37433cfefd6a4e1071f332.
Updated by ipanova@redhat.com over 8 years ago
- Tracker changed from Issue to Story
- Groomed set to No
- Sprint Candidate set to No
Updated by ipanova@redhat.com over 8 years ago
- Groomed changed from No to Yes
- Sprint Candidate changed from No to Yes
Updated by pthomas@redhat.com over 8 years ago
Updated by semyers over 8 years ago
- Status changed from MODIFIED to CLOSED - CURRENTRELEASE
Publish should be a no-op if nothing changed since the last publish.
closes #1724 https://pulp.plan.io/issues/1724