Project

Profile

Help

Issue #3357

closed

docker_distributor_web is racy, not atomic

Added by rmcgover almost 7 years ago. Updated over 5 years ago.

Status:
CLOSED - CURRENTRELEASE
Priority:
Normal
Assignee:
Start date:
Due date:
Estimated time:
Severity:
3. High
Version - Docker:
3.1.0
Platform Release:
2.15.3
Target Release - Docker:
OS:
Triaged:
Yes
Groomed:
No
Sprint Candidate:
Yes
Tags:
Pulp 2
Sprint:
Sprint 33
Quarter:

Description

docker_distributor_web's publish of the redirect file (used by crane) is:

- racy: there is a period of time during publish where neither old nor new file can be accessed
- not atomic: if publish fails part way through, the system can be left with no published redirect file, leaving the repo broken in crane indefinitely

This behavior comes from the order of operations here:

plugins/pulp_docker/plugins/distributors/publish_steps.py:
class V2WebPublisher(publish_step.PublishStep):
    """
    This class performs the work of publishing a v2 Docker repository.
    """
    def __init__(self, repo, publish_conduit, config, repo_content_unit_q=None):
    (...)
        app_file = configuration.get_redirect_file_name(repo)
        app_publish_location = os.path.join(
            configuration.get_app_publish_dir(config, docker_api_version), app_file)
        self.working_dir = os.path.join(self.get_working_dir(), docker_api_version)
        misc.mkdir(self.working_dir)
        self.web_working_dir = os.path.join(self.get_working_dir(), 'web')
        master_publish_dir = configuration.get_master_publish_dir(repo, config, docker_api_version)
        atomic_publish_step = publish_step.AtomicDirectoryPublishStep(
            self.get_working_dir(), [('', publish_dir), (app_file, app_publish_location)],
            master_publish_dir, step_type=constants.PUBLISH_STEP_OVER_HTTP)
        atomic_publish_step.description = _('Making v2 files available via web.')
        self.add_child(PublishBlobsStep(repo_content_unit_q=repo_content_unit_q))
        self.publish_manifests_step = PublishManifestsStep(
            repo_content_unit_q=repo_content_unit_q)
        self.add_child(self.publish_manifests_step)
        self.publish_manifest_lists_step = PublishManifestListsStep(
            repo_content_unit_q=repo_content_unit_q)
        self.add_child(self.publish_manifest_lists_step)
        self.add_child(PublishTagsStep())
        self.add_child(atomic_publish_step)
        self.add_child(RedirectFileStep(app_publish_location))

Note that the atomic_publish_step is second-last step, and then RedirectFileStep happens after that, writing to a file within the publish directory.

Example: I have a repo named redhat-e2e-container-e2e-container-test-product whose redirect file read by crane is /var/lib/pulp/published/docker/v2/app/redhat-e2e-container-e2e-container-test-product.json - a symlink to /var/lib/pulp/published/docker/v2/master/redhat-e2e-container-e2e-container-test-product/<timestamp>/redhat-e2e-container-e2e-container-test-product.json

In the publish sequence above, the last two steps are:

1. Rewrite symlinks, replacing old with new publish directory; includes rewrite of /var/lib/pulp/published/docker/v2/app/redhat-e2e-container-e2e-container-test-product.json to /var/lib/pulp/published/docker/v2/master/redhat-e2e-container-e2e-container-test-product/<this_publish_timestamp>/redhat-e2e-container-e2e-container-test-product.json
2. Write redirect file to /var/lib/pulp/published/docker/v2/web/redhat-e2e-container-e2e-container-test-product/redhat-e2e-container-e2e-container-test-product.json

Therefore there's a period of time between steps 1 and 2 where the redirect file doesn't exist. Also, if step 2 fails for any reason, the redirect file will be left missing indefinitely. In either case, if crane loads redirect files during that time, it will behave as though the repo doesn't exist.

To reproduce:

1. Publish a docker repo X, ensure it has redirect file and can be pulled by crane.

2. Set up next publish to fail during write of redirect file.

Easiest way to do this involves patching the source to trigger a failure:

diff --git a/plugins/pulp_docker/plugins/distributors/publish_steps.py b/plugins/pulp_docker/plugins/distributors/publish_steps.py
index 365e229..e4eb10a 100644
--- a/plugins/pulp_docker/plugins/distributors/publish_steps.py
+++ b/plugins/pulp_docker/plugins/distributors/publish_steps.py
@@ -209,6 +209,7 @@ class RedirectFileStep(publish_step.PublishStep):
         """
         Publish the JSON file for Crane.
         """
+        raise RuntimeError("Simulated error")
         registry = configuration.get_repo_registry_id(self.get_repo(), self.get_config())
         redirect_url = configuration.get_redirect_url(self.get_config(), self.get_repo(), 'v2')

(Similarly a time.sleep could be introduced to simulate "writing the redirect file is slow" case)

Ensure Pulp services are restarted after the patch.

3. Trigger a publish of repo X and wait for it to fail.

4. Check redirect file under /var/lib/pulp/published/docker/v2/app for repo X. Check pull from crane.

Actual results

Redirect file for repo X does not exist at expected path.

Content of repo can't be pulled from crane. (Note: since crane loading of files is based on async polling, it could take a little while for this problem to appear)

Expected result

Redirect file for repo X exists at expected path, with content same as prior to the failed publish.

Content of repo can be pulled from crane, using the content same as prior to the failed publish.

Actions #2

Updated by rmcgover almost 7 years ago

  • Description updated (diff)
Actions #3

Updated by dalley almost 7 years ago

  • Severity changed from 2. Medium to 3. High
  • Triaged changed from No to Yes
  • Sprint Candidate changed from No to Yes
Actions #5

Updated by ipanova@redhat.com almost 7 years ago

  • Status changed from NEW to POST
  • Assignee set to rmcgover
Actions #6

Updated by ipanova@redhat.com almost 7 years ago

  • Sprint/Milestone set to 56

Added by rmcgover almost 7 years ago

Revision 178d3280 | View on GitHub

Fix race in docker v2 publish

Previously, the last two steps of docker v2 publish were:

  • Put symlink in place pointing at redirect file under master directory
  • Create the redirect file

That means there's a period of time during publish where the redirect file for the repo doesn't exist at the expected path. If crane loads metadata at this time, it will fail to load metadata for the repo.

This commit fixes the issue by moving the symlink rewrite to the last step.

closes #3357 https://pulp.plan.io/issues/3357

Added by rmcgover almost 7 years ago

Revision 178d3280 | View on GitHub

Fix race in docker v2 publish

Previously, the last two steps of docker v2 publish were:

  • Put symlink in place pointing at redirect file under master directory
  • Create the redirect file

That means there's a period of time during publish where the redirect file for the repo doesn't exist at the expected path. If crane loads metadata at this time, it will fail to load metadata for the repo.

This commit fixes the issue by moving the symlink rewrite to the last step.

closes #3357 https://pulp.plan.io/issues/3357

Added by rmcgover almost 7 years ago

Revision 178d3280 | View on GitHub

Fix race in docker v2 publish

Previously, the last two steps of docker v2 publish were:

  • Put symlink in place pointing at redirect file under master directory
  • Create the redirect file

That means there's a period of time during publish where the redirect file for the repo doesn't exist at the expected path. If crane loads metadata at this time, it will fail to load metadata for the repo.

This commit fixes the issue by moving the symlink rewrite to the last step.

closes #3357 https://pulp.plan.io/issues/3357

Added by rmcgover almost 7 years ago

Revision 178d3280 | View on GitHub

Fix race in docker v2 publish

Previously, the last two steps of docker v2 publish were:

  • Put symlink in place pointing at redirect file under master directory
  • Create the redirect file

That means there's a period of time during publish where the redirect file for the repo doesn't exist at the expected path. If crane loads metadata at this time, it will fail to load metadata for the repo.

This commit fixes the issue by moving the symlink rewrite to the last step.

closes #3357 https://pulp.plan.io/issues/3357

Actions #7

Updated by rmcgover almost 7 years ago

  • Status changed from POST to MODIFIED
Actions #8

Updated by bmbouter almost 7 years ago

  • Platform Release set to 2.15.3
Actions #9

Updated by bmbouter almost 7 years ago

  • Sprint set to Sprint 33
Actions #10

Updated by bmbouter almost 7 years ago

  • Sprint/Milestone deleted (56)

Added by rmcgover almost 7 years ago

Revision c00b493f | View on GitHub

Fix race in docker v2 publish

Previously, the last two steps of docker v2 publish were:

  • Put symlink in place pointing at redirect file under master directory
  • Create the redirect file

That means there's a period of time during publish where the redirect file for the repo doesn't exist at the expected path. If crane loads metadata at this time, it will fail to load metadata for the repo.

This commit fixes the issue by moving the symlink rewrite to the last step.

closes #3357 https://pulp.plan.io/issues/3357

(cherry picked from commit 178d32807ffad3d9a2758cf5702a1d84b5ecd842)

Added by rmcgover almost 7 years ago

Revision c00b493f | View on GitHub

Fix race in docker v2 publish

Previously, the last two steps of docker v2 publish were:

  • Put symlink in place pointing at redirect file under master directory
  • Create the redirect file

That means there's a period of time during publish where the redirect file for the repo doesn't exist at the expected path. If crane loads metadata at this time, it will fail to load metadata for the repo.

This commit fixes the issue by moving the symlink rewrite to the last step.

closes #3357 https://pulp.plan.io/issues/3357

(cherry picked from commit 178d32807ffad3d9a2758cf5702a1d84b5ecd842)

Actions #11

Updated by rmcgover almost 7 years ago

Actions #12

Updated by bmbouter almost 7 years ago

  • Status changed from MODIFIED to 5
Actions #13

Updated by bmbouter almost 7 years ago

  • Status changed from 5 to CLOSED - CURRENTRELEASE
Actions #14

Updated by bmbouter over 5 years ago

  • Tags Pulp 2 added

Also available in: Atom PDF