Project

Profile

Help

Issue #3357

docker_distributor_web is racy, not atomic

Added by rmcgover over 1 year ago. Updated 2 months ago.

Status:
CLOSED - CURRENTRELEASE
Priority:
Normal
Assignee:
Category:
-
Sprint/Milestone:
-
Severity:
3. High
Version - Docker:
3.1.0
Platform Release:
2.15.3
Blocks Release:
Target Release - Docker:
OS:
Backwards Incompatible:
No
Triaged:
Yes
Groomed:
No
Sprint Candidate:
Yes
Tags:
Pulp 2
QA Contact:
Complexity:
Smash Test:
Verified:
No
Verification Required:
No
Sprint:
Sprint 33

Description

docker_distributor_web's publish of the redirect file (used by crane) is:

- racy: there is a period of time during publish where neither old nor new file can be accessed
- not atomic: if publish fails part way through, the system can be left with no published redirect file, leaving the repo broken in crane indefinitely

This behavior comes from the order of operations here:

plugins/pulp_docker/plugins/distributors/publish_steps.py:
class V2WebPublisher(publish_step.PublishStep):
    """ 
    This class performs the work of publishing a v2 Docker repository.
    """ 
    def __init__(self, repo, publish_conduit, config, repo_content_unit_q=None):
    (...)
        app_file = configuration.get_redirect_file_name(repo)
        app_publish_location = os.path.join(
            configuration.get_app_publish_dir(config, docker_api_version), app_file)
        self.working_dir = os.path.join(self.get_working_dir(), docker_api_version)
        misc.mkdir(self.working_dir)
        self.web_working_dir = os.path.join(self.get_working_dir(), 'web')
        master_publish_dir = configuration.get_master_publish_dir(repo, config, docker_api_version)
        atomic_publish_step = publish_step.AtomicDirectoryPublishStep(
            self.get_working_dir(), [('', publish_dir), (app_file, app_publish_location)],
            master_publish_dir, step_type=constants.PUBLISH_STEP_OVER_HTTP)
        atomic_publish_step.description = _('Making v2 files available via web.')
        self.add_child(PublishBlobsStep(repo_content_unit_q=repo_content_unit_q))
        self.publish_manifests_step = PublishManifestsStep(
            repo_content_unit_q=repo_content_unit_q)
        self.add_child(self.publish_manifests_step)
        self.publish_manifest_lists_step = PublishManifestListsStep(
            repo_content_unit_q=repo_content_unit_q)
        self.add_child(self.publish_manifest_lists_step)
        self.add_child(PublishTagsStep())
        self.add_child(atomic_publish_step)
        self.add_child(RedirectFileStep(app_publish_location))

Note that the atomic_publish_step is second-last step, and then RedirectFileStep happens after that, writing to a file within the publish directory.

Example: I have a repo named redhat-e2e-container-e2e-container-test-product whose redirect file read by crane is /var/lib/pulp/published/docker/v2/app/redhat-e2e-container-e2e-container-test-product.json - a symlink to /var/lib/pulp/published/docker/v2/master/redhat-e2e-container-e2e-container-test-product/<timestamp>/redhat-e2e-container-e2e-container-test-product.json

In the publish sequence above, the last two steps are:

1. Rewrite symlinks, replacing old with new publish directory; includes rewrite of /var/lib/pulp/published/docker/v2/app/redhat-e2e-container-e2e-container-test-product.json to /var/lib/pulp/published/docker/v2/master/redhat-e2e-container-e2e-container-test-product/<this_publish_timestamp>/redhat-e2e-container-e2e-container-test-product.json
2. Write redirect file to /var/lib/pulp/published/docker/v2/web/redhat-e2e-container-e2e-container-test-product/redhat-e2e-container-e2e-container-test-product.json

Therefore there's a period of time between steps 1 and 2 where the redirect file doesn't exist. Also, if step 2 fails for any reason, the redirect file will be left missing indefinitely. In either case, if crane loads redirect files during that time, it will behave as though the repo doesn't exist.

To reproduce:

1. Publish a docker repo X, ensure it has redirect file and can be pulled by crane.

2. Set up next publish to fail during write of redirect file.

Easiest way to do this involves patching the source to trigger a failure:

diff --git a/plugins/pulp_docker/plugins/distributors/publish_steps.py b/plugins/pulp_docker/plugins/distributors/publish_steps.py
index 365e229..e4eb10a 100644
--- a/plugins/pulp_docker/plugins/distributors/publish_steps.py
+++ b/plugins/pulp_docker/plugins/distributors/publish_steps.py
@@ -209,6 +209,7 @@ class RedirectFileStep(publish_step.PublishStep):
         """ 
         Publish the JSON file for Crane.
         """ 
+        raise RuntimeError("Simulated error")
         registry = configuration.get_repo_registry_id(self.get_repo(), self.get_config())
         redirect_url = configuration.get_redirect_url(self.get_config(), self.get_repo(), 'v2')

(Similarly a time.sleep could be introduced to simulate "writing the redirect file is slow" case)

Ensure Pulp services are restarted after the patch.

3. Trigger a publish of repo X and wait for it to fail.

4. Check redirect file under /var/lib/pulp/published/docker/v2/app for repo X. Check pull from crane.

Actual results

Redirect file for repo X does not exist at expected path.

Content of repo can't be pulled from crane. (Note: since crane loading of files is based on async polling, it could take a little while for this problem to appear)

Expected result

Redirect file for repo X exists at expected path, with content same as prior to the failed publish.

Content of repo can be pulled from crane, using the content same as prior to the failed publish.

Associated revisions

Revision 178d3280 View on GitHub
Added by rmcgover over 1 year ago

Fix race in docker v2 publish

Previously, the last two steps of docker v2 publish were:

- Put symlink in place pointing at redirect file under master directory
- Create the redirect file

That means there's a period of time during publish where the redirect
file for the repo doesn't exist at the expected path. If crane loads
metadata at this time, it will fail to load metadata for the repo.

This commit fixes the issue by moving the symlink rewrite to the last
step.

closes #3357
https://pulp.plan.io/issues/3357

Revision 178d3280 View on GitHub
Added by rmcgover over 1 year ago

Fix race in docker v2 publish

Previously, the last two steps of docker v2 publish were:

- Put symlink in place pointing at redirect file under master directory
- Create the redirect file

That means there's a period of time during publish where the redirect
file for the repo doesn't exist at the expected path. If crane loads
metadata at this time, it will fail to load metadata for the repo.

This commit fixes the issue by moving the symlink rewrite to the last
step.

closes #3357
https://pulp.plan.io/issues/3357

Revision c00b493f View on GitHub
Added by rmcgover over 1 year ago

Fix race in docker v2 publish

Previously, the last two steps of docker v2 publish were:

- Put symlink in place pointing at redirect file under master directory
- Create the redirect file

That means there's a period of time during publish where the redirect
file for the repo doesn't exist at the expected path. If crane loads
metadata at this time, it will fail to load metadata for the repo.

This commit fixes the issue by moving the symlink rewrite to the last
step.

closes #3357
https://pulp.plan.io/issues/3357

(cherry picked from commit 178d32807ffad3d9a2758cf5702a1d84b5ecd842)

Revision c00b493f View on GitHub
Added by rmcgover over 1 year ago

Fix race in docker v2 publish

Previously, the last two steps of docker v2 publish were:

- Put symlink in place pointing at redirect file under master directory
- Create the redirect file

That means there's a period of time during publish where the redirect
file for the repo doesn't exist at the expected path. If crane loads
metadata at this time, it will fail to load metadata for the repo.

This commit fixes the issue by moving the symlink rewrite to the last
step.

closes #3357
https://pulp.plan.io/issues/3357

(cherry picked from commit 178d32807ffad3d9a2758cf5702a1d84b5ecd842)

History

#2 Updated by rmcgover over 1 year ago

  • Description updated (diff)

#3 Updated by dalley over 1 year ago

  • Severity changed from 2. Medium to 3. High
  • Triaged changed from No to Yes
  • Sprint Candidate changed from No to Yes

#5 Updated by ipanova@redhat.com over 1 year ago

  • Status changed from NEW to POST
  • Assignee set to rmcgover

#6 Updated by ipanova@redhat.com over 1 year ago

  • Sprint/Milestone set to 56

#7 Updated by rmcgover over 1 year ago

  • Status changed from POST to MODIFIED

#8 Updated by bmbouter over 1 year ago

  • Platform Release set to 2.15.3

#9 Updated by bmbouter over 1 year ago

  • Sprint set to Sprint 33

#10 Updated by bmbouter over 1 year ago

  • Sprint/Milestone deleted (56)

#11 Updated by rmcgover over 1 year ago

#12 Updated by bmbouter over 1 year ago

  • Status changed from MODIFIED to ON_QA

#13 Updated by bmbouter over 1 year ago

  • Status changed from ON_QA to CLOSED - CURRENTRELEASE

#14 Updated by bmbouter 2 months ago

  • Tags Pulp 2 added

Please register to edit this issue

Also available in: Atom PDF