Issue #3357
closeddocker_distributor_web is racy, not atomic
Description
docker_distributor_web's publish of the redirect file (used by crane) is:
- racy: there is a period of time during publish where neither old nor new file can be accessed
- not atomic: if publish fails part way through, the system can be left with no published redirect file, leaving the repo broken in crane indefinitely
This behavior comes from the order of operations here:
plugins/pulp_docker/plugins/distributors/publish_steps.py:
class V2WebPublisher(publish_step.PublishStep):
"""
This class performs the work of publishing a v2 Docker repository.
"""
def __init__(self, repo, publish_conduit, config, repo_content_unit_q=None):
(...)
app_file = configuration.get_redirect_file_name(repo)
app_publish_location = os.path.join(
configuration.get_app_publish_dir(config, docker_api_version), app_file)
self.working_dir = os.path.join(self.get_working_dir(), docker_api_version)
misc.mkdir(self.working_dir)
self.web_working_dir = os.path.join(self.get_working_dir(), 'web')
master_publish_dir = configuration.get_master_publish_dir(repo, config, docker_api_version)
atomic_publish_step = publish_step.AtomicDirectoryPublishStep(
self.get_working_dir(), [('', publish_dir), (app_file, app_publish_location)],
master_publish_dir, step_type=constants.PUBLISH_STEP_OVER_HTTP)
atomic_publish_step.description = _('Making v2 files available via web.')
self.add_child(PublishBlobsStep(repo_content_unit_q=repo_content_unit_q))
self.publish_manifests_step = PublishManifestsStep(
repo_content_unit_q=repo_content_unit_q)
self.add_child(self.publish_manifests_step)
self.publish_manifest_lists_step = PublishManifestListsStep(
repo_content_unit_q=repo_content_unit_q)
self.add_child(self.publish_manifest_lists_step)
self.add_child(PublishTagsStep())
self.add_child(atomic_publish_step)
self.add_child(RedirectFileStep(app_publish_location))
Note that the atomic_publish_step is second-last step, and then RedirectFileStep happens after that, writing to a file within the publish directory.
Example: I have a repo named redhat-e2e-container-e2e-container-test-product whose redirect file read by crane is /var/lib/pulp/published/docker/v2/app/redhat-e2e-container-e2e-container-test-product.json - a symlink to /var/lib/pulp/published/docker/v2/master/redhat-e2e-container-e2e-container-test-product/<timestamp>/redhat-e2e-container-e2e-container-test-product.json
In the publish sequence above, the last two steps are:
1. Rewrite symlinks, replacing old with new publish directory; includes rewrite of /var/lib/pulp/published/docker/v2/app/redhat-e2e-container-e2e-container-test-product.json to /var/lib/pulp/published/docker/v2/master/redhat-e2e-container-e2e-container-test-product/<this_publish_timestamp>/redhat-e2e-container-e2e-container-test-product.json
2. Write redirect file to /var/lib/pulp/published/docker/v2/web/redhat-e2e-container-e2e-container-test-product/redhat-e2e-container-e2e-container-test-product.json
Therefore there's a period of time between steps 1 and 2 where the redirect file doesn't exist. Also, if step 2 fails for any reason, the redirect file will be left missing indefinitely. In either case, if crane loads redirect files during that time, it will behave as though the repo doesn't exist.
To reproduce:¶
1. Publish a docker repo X, ensure it has redirect file and can be pulled by crane.
2. Set up next publish to fail during write of redirect file.
Easiest way to do this involves patching the source to trigger a failure:
diff --git a/plugins/pulp_docker/plugins/distributors/publish_steps.py b/plugins/pulp_docker/plugins/distributors/publish_steps.py
index 365e229..e4eb10a 100644
--- a/plugins/pulp_docker/plugins/distributors/publish_steps.py
+++ b/plugins/pulp_docker/plugins/distributors/publish_steps.py
@@ -209,6 +209,7 @@ class RedirectFileStep(publish_step.PublishStep):
"""
Publish the JSON file for Crane.
"""
+ raise RuntimeError("Simulated error")
registry = configuration.get_repo_registry_id(self.get_repo(), self.get_config())
redirect_url = configuration.get_redirect_url(self.get_config(), self.get_repo(), 'v2')
(Similarly a time.sleep could be introduced to simulate "writing the redirect file is slow" case)
Ensure Pulp services are restarted after the patch.
3. Trigger a publish of repo X and wait for it to fail.
4. Check redirect file under /var/lib/pulp/published/docker/v2/app for repo X. Check pull from crane.
Actual results¶
Redirect file for repo X does not exist at expected path.
Content of repo can't be pulled from crane. (Note: since crane loading of files is based on async polling, it could take a little while for this problem to appear)
Expected result¶
Redirect file for repo X exists at expected path, with content same as prior to the failed publish.
Content of repo can be pulled from crane, using the content same as prior to the failed publish.
Updated by dalley almost 7 years ago
- Severity changed from 2. Medium to 3. High
- Triaged changed from No to Yes
- Sprint Candidate changed from No to Yes
Updated by rmcgover over 6 years ago
Pull request: https://github.com/pulp/pulp_docker/pull/225
Updated by ipanova@redhat.com over 6 years ago
- Status changed from NEW to POST
- Assignee set to rmcgover
Added by rmcgover over 6 years ago
Added by rmcgover over 6 years ago
Revision 178d3280 | View on GitHub
Fix race in docker v2 publish
Previously, the last two steps of docker v2 publish were:
- Put symlink in place pointing at redirect file under master directory
- Create the redirect file
That means there's a period of time during publish where the redirect file for the repo doesn't exist at the expected path. If crane loads metadata at this time, it will fail to load metadata for the repo.
This commit fixes the issue by moving the symlink rewrite to the last step.
Added by rmcgover over 6 years ago
Revision 178d3280 | View on GitHub
Fix race in docker v2 publish
Previously, the last two steps of docker v2 publish were:
- Put symlink in place pointing at redirect file under master directory
- Create the redirect file
That means there's a period of time during publish where the redirect file for the repo doesn't exist at the expected path. If crane loads metadata at this time, it will fail to load metadata for the repo.
This commit fixes the issue by moving the symlink rewrite to the last step.
Added by rmcgover over 6 years ago
Revision 178d3280 | View on GitHub
Fix race in docker v2 publish
Previously, the last two steps of docker v2 publish were:
- Put symlink in place pointing at redirect file under master directory
- Create the redirect file
That means there's a period of time during publish where the redirect file for the repo doesn't exist at the expected path. If crane loads metadata at this time, it will fail to load metadata for the repo.
This commit fixes the issue by moving the symlink rewrite to the last step.
Updated by rmcgover over 6 years ago
- Status changed from POST to MODIFIED
Applied in changeset 178d32807ffad3d9a2758cf5702a1d84b5ecd842.
Added by rmcgover over 6 years ago
Revision c00b493f | View on GitHub
Fix race in docker v2 publish
Previously, the last two steps of docker v2 publish were:
- Put symlink in place pointing at redirect file under master directory
- Create the redirect file
That means there's a period of time during publish where the redirect file for the repo doesn't exist at the expected path. If crane loads metadata at this time, it will fail to load metadata for the repo.
This commit fixes the issue by moving the symlink rewrite to the last step.
closes #3357 https://pulp.plan.io/issues/3357
(cherry picked from commit 178d32807ffad3d9a2758cf5702a1d84b5ecd842)
Added by rmcgover over 6 years ago
Revision c00b493f | View on GitHub
Fix race in docker v2 publish
Previously, the last two steps of docker v2 publish were:
- Put symlink in place pointing at redirect file under master directory
- Create the redirect file
That means there's a period of time during publish where the redirect file for the repo doesn't exist at the expected path. If crane loads metadata at this time, it will fail to load metadata for the repo.
This commit fixes the issue by moving the symlink rewrite to the last step.
closes #3357 https://pulp.plan.io/issues/3357
(cherry picked from commit 178d32807ffad3d9a2758cf5702a1d84b5ecd842)
Updated by rmcgover over 6 years ago
Applied in changeset c00b493f89371e2080728dbb5288e34ee1bbff78.
Updated by bmbouter over 6 years ago
- Status changed from 5 to CLOSED - CURRENTRELEASE
Fix race in docker v2 publish
Previously, the last two steps of docker v2 publish were:
That means there's a period of time during publish where the redirect file for the repo doesn't exist at the expected path. If crane loads metadata at this time, it will fail to load metadata for the repo.
This commit fixes the issue by moving the symlink rewrite to the last step.
closes #3357 https://pulp.plan.io/issues/3357