Issue #3098

closed

Docker publish may fail with "OSError: [Errno 17] File exists" if two publishes are triggered at the same time

Added by rmcgover over 6 years ago. Updated almost 5 years ago.

Status: CLOSED - CURRENTRELEASE
Priority: High
Assignee:
Category: -
Sprint/Milestone:
Start date:
Due date:
Estimated time:
Severity: 2. Medium
Version:
Platform Release: 2.19.1
OS:
Triaged: Yes
Groomed: No
Sprint Candidate: No
Tags: Pulp 2
Sprint: Sprint 53
Quarter:

Description

If two docker_web_distributor_name_cli publishes are triggered at the same time on the same Pulp installation for two different repos, the publishes race with each other for access to the same path, which can lead to a failed publish. Here "same time" means that str(time.time()) evaluates to the same string for both publish tasks (i.e. the tasks are scheduled within the same hundredth of a second).
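To illustrate why the window is about a hundredth of a second: on Python 2, str() of a float keeps 12 significant digits, so a current Unix timestamp (10 digits before the decimal point) is left with two decimal places. The sketch below approximates that on any Python version with the ".12g" format; the helper name py2_style_str and the sample timestamps are made up for illustration.

```python
def py2_style_str(timestamp):
    """Approximate Python 2's str(float), which keeps 12 significant digits."""
    return format(timestamp, ".12g")


# Two hypothetical task start times, 3 milliseconds apart.
first = 1500000000.121
second = 1500000000.124

# Both collapse to the same string, so both tasks would derive the same name.
print(py2_style_str(first))
print(py2_style_str(second))
print(py2_style_str(first) == py2_style_str(second))
```
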

A publish may fail with a backtrace like this:

 Task pulp.server.managers.repo.publish.publish[28381da7-812d-436f-ad5b-b23785b4928b] raised unexpected: OSError(17, 'File exists')
 Traceback (most recent call last):
   File "/usr/lib/python2.7/site-packages/celery/app/trace.py", line 240, in trace_task
     R = retval = fun(*args, **kwargs)
   File "/usr/lib/python2.7/site-packages/pulp/server/async/tasks.py", line 473, in __call__
     return super(Task, self).__call__(*args, **kwargs)
   File "/usr/lib/python2.7/site-packages/pulp/server/async/tasks.py", line 103, in __call__
     return super(PulpTask, self).__call__(*args, **kwargs)
   File "/usr/lib/python2.7/site-packages/celery/app/trace.py", line 437, in __protected_call__
     return self.run(*args, **kwargs)
   File "/usr/lib/python2.7/site-packages/pulp/server/controllers/repository.py", line 968, in publish
     result = _do_publish(repo_obj, dist_id, dist_inst, transfer_repo, conduit, call_config)
   File "/usr/lib/python2.7/site-packages/pulp/server/controllers/repository.py", line 1020, in _do_publish
     publish_report = publish_repo(transfer_repo, conduit, call_config)
   File "/usr/lib/python2.7/site-packages/pulp/server/async/tasks.py", line 658, in wrap_f
     return f(*args, **kwargs)
   File "/usr/lib/python2.7/site-packages/pulp_docker/plugins/distributors/distributor_web.py", line 123, in publish_repo
     return self._publisher.publish()
   File "/usr/lib/python2.7/site-packages/pulp/plugins/util/publish_step.py", line 697, in publish
     return self.process_lifecycle()
   File "/usr/lib/python2.7/site-packages/pulp/plugins/util/publish_step.py", line 562, in process_lifecycle
     super(PluginStep, self).process_lifecycle()
   File "/usr/lib/python2.7/site-packages/pulp/plugins/util/publish_step.py", line 159, in process_lifecycle
     step.process()
   File "/usr/lib/python2.7/site-packages/pulp/plugins/util/publish_step.py", line 249, in process
     self._process_block()
   File "/usr/lib/python2.7/site-packages/pulp/plugins/util/publish_step.py", line 293, in _process_block
     self.process_main()
   File "/usr/lib/python2.7/site-packages/pulp/plugins/util/publish_step.py", line 910, in process_main
     os.symlink(timestamp_master_location, tmp_link_name)
 OSError: [Errno 17] File exists

This would be difficult to reproduce, but the problem is apparent from reading the code for the atomic directory publish step.

Here's some of the code from server/pulp/plugins/util/publish_step.py :

            # Create the parent directory of the published repository tree, if needed
            publish_dir_parent = os.path.dirname(publish_location)
            if not os.path.exists(publish_dir_parent):
                misc.mkdir(publish_dir_parent, 0750)

            if not self.only_publish_directory_contents:
                # Create a temporary symlink in the parent of the published directory tree
                tmp_link_name = os.path.join(publish_dir_parent, self.parent.timestamp)
                os.symlink(timestamp_master_location, tmp_link_name)

The last statement is the one which raised "File exists".
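The failure mode of that statement can be shown in isolation. This is a minimal sketch, not Pulp code: it creates the same symlink twice inside a throwaway directory and reports the errno of the second attempt. The directory layout, the link name and the function name second_symlink_errno are assumptions chosen for illustration.

```python
import errno
import os
import tempfile


def second_symlink_errno():
    """Create the same symlink twice; return the errno raised by the second call."""
    with tempfile.TemporaryDirectory() as parent:
        target = os.path.join(parent, "master")
        os.mkdir(target)
        # Stand-in for os.path.join(publish_dir_parent, self.parent.timestamp)
        link = os.path.join(parent, "1500000000.12")

        os.symlink(target, link)      # first task: succeeds
        try:
            os.symlink(target, link)  # second task, same name: fails
        except OSError as exc:
            return exc.errno
    return None


print(second_symlink_errno() == errno.EEXIST)
```
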

When publishing the docker v1 redirect file, the relevant variables have values such as:

publish_location = "/var/lib/pulp/published/docker/v1/app/<repo-id>.json"
publish_dir_parent = "/var/lib/pulp/published/docker/v1/app"
tmp_link_name = "/var/lib/pulp/published/docker/v1/app/<timestamp>"

If there are two publishes triggered with the same timestamp, then tmp_link_name will be equal for both tasks, and the tasks will therefore both attempt to create a symlink at the same path.
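One possible way to avoid the collision on tmp_link_name is to give each task a temporary link name that does not depend on the timestamp, then rename it into place. The sketch below is only an illustration under stated assumptions, not the fix adopted in Pulp: publish_link is a hypothetical helper, the uuid-based name is my choice, and it assumes the step finishes by renaming the temporary link onto publish_location (that part of the original code is not shown above).

```python
import os
import tempfile
import uuid


def publish_link(master_location, publish_location):
    """Point publish_location at master_location via a uniquely named temp symlink."""
    parent = os.path.dirname(publish_location)
    # A random name per task, so two tasks never share a temporary path.
    tmp_link_name = os.path.join(parent, "tmp-" + uuid.uuid4().hex)
    os.symlink(master_location, tmp_link_name)
    # On POSIX, rename replaces an existing symlink at the destination atomically.
    os.rename(tmp_link_name, publish_location)
    return publish_location


# Usage: two publishes into the same parent directory no longer collide.
with tempfile.TemporaryDirectory() as root:
    app_dir = os.path.join(root, "app")
    os.mkdir(app_dir)
    for repo_id in ("repo-a", "repo-b"):
        master = os.path.join(root, "master-" + repo_id)
        os.mkdir(master)
        publish_link(master, os.path.join(app_dir, repo_id))
    print(sorted(os.listdir(app_dir)))
```
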

I realize this sounds unlikely, but a crash due to this has happened on our installation.

This was observed on Pulp 2.8, but from review, all the relevant code seems unchanged in 2.14.
