Publish should be a no-op if no units and no settings have changed since the last successful publish
Use case: a user has a script that publishes 10 repositories, and 1 of them fails. They re-run the script, which doesn't track which one failed, so it publishes all 10 again. For the 9 that originally succeeded, Pulp should no-op, since no settings or units have been modified since the last successful publish. Even though publishes with no changed data are quick, each subsequent publish rewrites the files in the repodata, and they change in small ways. Because those filenames include checksums, the files from the second publish will have completely different filenames. This is undesirable to some users.
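To illustrate the filename churn described above, here is a minimal sketch. It assumes a metadata file is named by prefixing a SHA-256 digest of its content; the helper name, the suffix, and the sample content are hypothetical and are not taken from Pulp's code.

```python
import hashlib


def repodata_filename(content: bytes, suffix: str = "primary.xml.gz") -> str:
    """Hypothetical helper: name a metadata file after the digest of its content."""
    digest = hashlib.sha256(content).hexdigest()
    return f"{digest}-{suffix}"


# Same packages, but an embedded timestamp differs between the two publishes.
first = repodata_filename(b"<metadata timestamp='1000'>same packages</metadata>")
second = repodata_filename(b"<metadata timestamp='1001'>same packages</metadata>")

# Any byte-level difference in the content yields a different filename.
print(first == second)
```

Under these assumptions, even a one-character change in the rewritten file is enough to give it a new name, which is why a redundant publish is visible to consumers of the repository.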
Publish should be a no-op if all of the following are true:
- no units have been added since the last successful publish.
- no units have been removed since the last successful publish.
- no distributor has been modified since the last successful publish.
Here is a pulp-smash issue which corresponds with this: https://github.com/PulpQE/pulp-smash/issues/127
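The three conditions in the description could be checked roughly as follows. This is only a sketch: the RepoState fields and the publish_is_noop name are assumptions made for illustration, and Pulp may track these timestamps differently or not at all.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class RepoState:
    """Hypothetical bookkeeping; every field name here is an assumption."""
    last_publish: Optional[datetime]              # last *successful* publish
    last_unit_added: Optional[datetime]
    last_unit_removed: Optional[datetime]
    distributor_last_updated: Optional[datetime]


def publish_is_noop(state: RepoState) -> bool:
    """Return True only if nothing relevant changed since the last successful publish."""
    if state.last_publish is None:
        # Never published successfully, so a real publish is required.
        return False
    changes = (
        state.last_unit_added,
        state.last_unit_removed,
        state.distributor_last_updated,
    )
    # A change stamped at the same instant as the publish is treated as
    # "changed" so that the check errs on the side of publishing.
    return all(ts is None or ts < state.last_publish for ts in changes)
```

The strict comparison is a deliberate choice in this sketch: skipping a publish that was needed is worse than repeating one that was not.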
#4 Updated by mhrivnak almost 6 years ago
I like the idea of this being the default behavior. The absence of that behavior is inefficient, but is it a bug? I could see it both ways.
Regardless of what we call it, if we make that the default behavior, we need an option to force a full publish. We got into trouble with the yum importer when we tried a similar optimization that skips most of the work: we didn't provide an option to force a full sync, and several times we have needed to do tricks to help a user force one. Imagine that /var/lib/pulp/published is lost or corrupted and needs to be recreated. Or a user goes poking around in there and messes up some published data. Or there's a bug in pulp that produces incorrect published data, and users need to re-publish after getting a fix.
#5 Updated by bmbouter almost 6 years ago
I'm going to rewrite this into a bug and drop the usage of a distributor option. Two different pulp user groups have identified this behavior as the expected behavior and it makes sense to me too.
#6 Updated by bmbouter almost 6 years ago
- Tracker changed from Story to Issue
- Subject changed from As a user, I have a distributor option which causes publish to be a no-op if no units and no settings have changed since the last successful publish to Publish should be a no-op if no units and no settings have changed since the last successful publish
- Description updated (diff)
- Severity set to 2. Medium
- Triaged set to No
#9 Updated by mhrivnak almost 6 years ago
Distributors should not be impacted by changes to importer settings, so I feel comfortable not considering the importer or its settings at all for this behavior. In the very small number of cases where we've needed the importer to make the distributor aware of something, we've managed to do so without requiring the distributor to look at the importer. Otherwise, the line between importers and distributors has been kept quite rigid.
#11 Updated by bmbouter almost 6 years ago
- Description updated (diff)
+1 to not considering the importer settings. I've removed it from the description.
What about the attributes on the repo itself? Those probably also need to be monitored, right? For example, the repo-id or the display_name would affect the published path, so modifying those and republishing should cause a publish to happen even if no distributors were modified and no units in the repo were modified.
#12 Updated by mhrivnak almost 6 years ago
repo_id is immutable, so that won't be a problem.
display_name is not used programmatically to my knowledge. It's intended only for human consumption. I suppose in theory a human-readable name could reasonably be used to form a URL, like many blogs do, but I don't think any part of pulp does that currently. I'm in favor of explicitly documenting that it is not for programmatic use.
#16 Updated by bmbouter almost 6 years ago
Currently this is planned as an RPM feature, but I think it could be implemented almost as easily as a platform feature, which would make it available for all plugin types.
In the publish task itself, you could run all the checks described in this ticket before the call to _do_publish(). That would get implemented here.
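A rough sketch of that ordering, including the force option requested in comment #4, might look like the following. The function name, its parameters, and the return values are all hypothetical; do_publish stands in for the real _do_publish() call, whose actual signature is not shown in this ticket.

```python
from typing import Callable


def publish(
    repo_id: str,
    has_changes: Callable[[str], bool],
    do_publish: Callable[[str], str],
    force_full: bool = False,
) -> str:
    """Hypothetical publish task: run the change checks before doing any work.

    has_changes -- assumed callable that runs the checks from this ticket
    do_publish  -- stand-in for the real _do_publish() call
    force_full  -- assumed override that always performs a full publish
    """
    if not force_full and not has_changes(repo_id):
        # Nothing changed since the last successful publish: no-op.
        return "skipped"
    return do_publish(repo_id)


# Usage: record which repositories actually get published.
published = []


def fake_do_publish(repo_id: str) -> str:
    published.append(repo_id)
    return "published"


publish("repo-a", lambda r: False, fake_do_publish)                   # skipped
publish("repo-b", lambda r: True, fake_do_publish)                    # published
publish("repo-c", lambda r: False, fake_do_publish, force_full=True)  # forced
```

Putting the check in the platform-level task, as in this sketch, means each plugin's distributor would get the behavior without implementing it separately.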