Issue #3098
closed
Docker publish may fail with "OSError: [Errno 17] File exists" if two publishes triggered at same time
Description
If two docker_web_distributor_name_cli publishes are triggered on the same Pulp installation for two different repos at the same time, the publishes race with each other for access to the same path, which can lead to a failed publish. Here "same time" means that str(time.time()) evaluates to the same value for both publish tasks (i.e. the tasks are scheduled within the same hundredth of a second).
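To illustrate the hundredth-of-a-second granularity, here is a minimal sketch. It assumes Python 2 semantics, where str() of a float keeps 12 significant digits; the helper py2_str and the sample timestamps are made up for the example and are not part of Pulp. The "%.12g" format is used so the sketch behaves the same on Python 3.

```python
def py2_str(t):
    """Approximate Python 2's str(float), which keeps 12 significant digits.

    Assumption: "%.12g" matches Python 2's str() for non-integral values.
    """
    return "%.12g" % t


# Hypothetical task start times, a few milliseconds apart.
first = 1508170000.121
second = 1508170000.124
third = 1508170000.131

# With 10 digits before the decimal point, only 2 decimals survive.
print(py2_str(first))   # 1508170000.12
print(py2_str(second))  # 1508170000.12
print(py2_str(third))   # 1508170000.13

# The first two tasks would end up with identical timestamp strings.
print(py2_str(first) == py2_str(second))  # True
```

Under this assumption, any two tasks whose start times round to the same hundredth of a second get the same timestamp string.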
A publish may fail with a backtrace like this:
Task pulp.server.managers.repo.publish.publish[28381da7-812d-436f-ad5b-b23785b4928b] raised unexpected: OSError(17, 'File exists')
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/celery/app/trace.py", line 240, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/pulp/server/async/tasks.py", line 473, in __call__
    return super(Task, self).__call__(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/pulp/server/async/tasks.py", line 103, in __call__
    return super(PulpTask, self).__call__(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/celery/app/trace.py", line 437, in __protected_call__
    return self.run(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/pulp/server/controllers/repository.py", line 968, in publish
    result = _do_publish(repo_obj, dist_id, dist_inst, transfer_repo, conduit, call_config)
  File "/usr/lib/python2.7/site-packages/pulp/server/controllers/repository.py", line 1020, in _do_publish
    publish_report = publish_repo(transfer_repo, conduit, call_config)
  File "/usr/lib/python2.7/site-packages/pulp/server/async/tasks.py", line 658, in wrap_f
    return f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/pulp_docker/plugins/distributors/distributor_web.py", line 123, in publish_repo
    return self._publisher.publish()
  File "/usr/lib/python2.7/site-packages/pulp/plugins/util/publish_step.py", line 697, in publish
    return self.process_lifecycle()
  File "/usr/lib/python2.7/site-packages/pulp/plugins/util/publish_step.py", line 562, in process_lifecycle
    super(PluginStep, self).process_lifecycle()
  File "/usr/lib/python2.7/site-packages/pulp/plugins/util/publish_step.py", line 159, in process_lifecycle
    step.process()
  File "/usr/lib/python2.7/site-packages/pulp/plugins/util/publish_step.py", line 249, in process
    self._process_block()
  File "/usr/lib/python2.7/site-packages/pulp/plugins/util/publish_step.py", line 293, in _process_block
    self.process_main()
  File "/usr/lib/python2.7/site-packages/pulp/plugins/util/publish_step.py", line 910, in process_main
    os.symlink(timestamp_master_location, tmp_link_name)
OSError: [Errno 17] File exists
This would be difficult to reproduce on demand, but the cause can be seen by reading the code for the atomic directory publish step.
Here's some of the code from server/pulp/plugins/util/publish_step.py:
# Create the parent directory of the published repository tree, if needed
publish_dir_parent = os.path.dirname(publish_location)
if not os.path.exists(publish_dir_parent):
    misc.mkdir(publish_dir_parent, 0750)
if not self.only_publish_directory_contents:
    # Create a temporary symlink in the parent of the published directory tree
    tmp_link_name = os.path.join(publish_dir_parent, self.parent.timestamp)
    os.symlink(timestamp_master_location, tmp_link_name)
The last statement is the one which raised "File exists".
When publishing the docker v1 redirect file, the relevant variables will have values such as:
publish_location = "/var/lib/pulp/published/docker/v1/app/<repo-id>.json"
publish_dir_parent = "/var/lib/pulp/published/docker/v1/app"
tmp_link_name = "/var/lib/pulp/published/docker/v1/app/<timestamp>"
If there are two publishes triggered with the same timestamp, then tmp_link_name will be equal for both tasks, and the tasks will therefore both attempt to create a symlink at the same path.
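The collision can be shown in isolation with a small sketch. It does not use Pulp at all: the temporary directory stands in for publish_dir_parent, and the symlink targets, the timestamp value and the try_symlink helper are made up for the example. It assumes a POSIX system where unprivileged symlink creation is allowed.

```python
import errno
import os
import tempfile


def try_symlink(target, link_name):
    """Create a symlink; return None on success or the errno on failure."""
    try:
        os.symlink(target, link_name)
    except OSError as exc:
        return exc.errno
    return None


with tempfile.TemporaryDirectory() as publish_dir_parent:
    # Both "tasks" computed the same timestamp string (hypothetical value).
    timestamp = "1508170000.12"
    tmp_link_name = os.path.join(publish_dir_parent, timestamp)

    # The first publish creates the temporary link without trouble.
    first_result = try_symlink("/master/repo-a", tmp_link_name)
    # The second publish targets the same path and fails with EEXIST.
    second_result = try_symlink("/master/repo-b", tmp_link_name)

print(first_result)                   # None
print(second_result == errno.EEXIST)  # True
```

errno.EEXIST is error 17 on Linux, which matches the "[Errno 17] File exists" in the backtrace above.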
I realize this sounds unlikely, but a crash due to this has happened on our installation.
This was observed on Pulp 2.8, but from review, all the relevant code seems unchanged in 2.14.
Use timestamp and repo_id in the temporary directory name
This avoids a race condition when multiple repositories are published at the same time.
closes #3098 https://pulp.plan.io/issues/3098
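A rough sketch of the idea behind the fix, not the actual patch: if the temporary name includes the repo id as well as the timestamp, two repos published in the same hundredth of a second no longer share a path. The helper name tmp_link_name_for, the "-" separator and the ordering of the two parts are assumptions made for the example.

```python
import os
import tempfile


def tmp_link_name_for(publish_dir_parent, timestamp, repo_id):
    """Hypothetical helper: build a per-repo temporary link path."""
    return os.path.join(publish_dir_parent, "%s-%s" % (timestamp, repo_id))


with tempfile.TemporaryDirectory() as publish_dir_parent:
    # Same timestamp string for both publishes (hypothetical value).
    timestamp = "1508170000.12"
    link_a = tmp_link_name_for(publish_dir_parent, timestamp, "repo-a")
    link_b = tmp_link_name_for(publish_dir_parent, timestamp, "repo-b")

    # Distinct paths, so neither call raises "File exists".
    os.symlink("/master/repo-a", link_a)
    os.symlink("/master/repo-b", link_b)
    created = sorted(os.listdir(publish_dir_parent))

print(created)  # ['1508170000.12-repo-a', '1508170000.12-repo-b']
```

This only separates publishes of different repos; it relies on two publishes of the same repo never running concurrently.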