Issue #3098
closed
Docker publish may fail with "OSError: [Errno 17] File exists" if two publishes triggered at same time
Description
If two docker_web_distributor_name_cli publishes are triggered on the same Pulp installation for two different repos at the same time, the publishes race with each other for access to the same path, which can lead to a failed publish. Here "same time" means that str(time.time()) evaluates to the same string for both publish tasks (i.e. the tasks are scheduled within the same 100th of a second).
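To illustrate the granularity: under Python 2, str() of a float keeps 12 significant digits, so a present-day epoch time is rendered with two decimal places. The following is only a sketch; it uses '%.12g' to emulate that formatting on any Python version, and the helper name py2_style_timestamp and the sample times are hypothetical.

```python
def py2_style_timestamp(t):
    """Format a float the way Python 2's str() does (12 significant digits)."""
    return '%.12g' % t


# Two hypothetical task start times, about 3 ms apart.
first = 1508840000.121
second = 1508840000.124

# Both render as the same string, so both tasks would derive the same name.
same = py2_style_timestamp(first) == py2_style_timestamp(second)
```

Any two tasks whose start times round to the same hundredth of a second end up with an identical timestamp string.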
A publish may fail with a backtrace like this:
Task pulp.server.managers.repo.publish.publish[28381da7-812d-436f-ad5b-b23785b4928b] raised unexpected: OSError(17, 'File exists')
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/celery/app/trace.py", line 240, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/pulp/server/async/tasks.py", line 473, in __call__
    return super(Task, self).__call__(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/pulp/server/async/tasks.py", line 103, in __call__
    return super(PulpTask, self).__call__(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/celery/app/trace.py", line 437, in __protected_call__
    return self.run(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/pulp/server/controllers/repository.py", line 968, in publish
    result = _do_publish(repo_obj, dist_id, dist_inst, transfer_repo, conduit, call_config)
  File "/usr/lib/python2.7/site-packages/pulp/server/controllers/repository.py", line 1020, in _do_publish
    publish_report = publish_repo(transfer_repo, conduit, call_config)
  File "/usr/lib/python2.7/site-packages/pulp/server/async/tasks.py", line 658, in wrap_f
    return f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/pulp_docker/plugins/distributors/distributor_web.py", line 123, in publish_repo
    return self._publisher.publish()
  File "/usr/lib/python2.7/site-packages/pulp/plugins/util/publish_step.py", line 697, in publish
    return self.process_lifecycle()
  File "/usr/lib/python2.7/site-packages/pulp/plugins/util/publish_step.py", line 562, in process_lifecycle
    super(PluginStep, self).process_lifecycle()
  File "/usr/lib/python2.7/site-packages/pulp/plugins/util/publish_step.py", line 159, in process_lifecycle
    step.process()
  File "/usr/lib/python2.7/site-packages/pulp/plugins/util/publish_step.py", line 249, in process
    self._process_block()
  File "/usr/lib/python2.7/site-packages/pulp/plugins/util/publish_step.py", line 293, in _process_block
    self.process_main()
  File "/usr/lib/python2.7/site-packages/pulp/plugins/util/publish_step.py", line 910, in process_main
    os.symlink(timestamp_master_location, tmp_link_name)
OSError: [Errno 17] File exists
This would be difficult to reproduce, but can be revealed by reading the code for the atomic directory publish step.
Here's some of the code from server/pulp/plugins/util/publish_step.py :
# Create the parent directory of the published repository tree, if needed
publish_dir_parent = os.path.dirname(publish_location)
if not os.path.exists(publish_dir_parent):
    misc.mkdir(publish_dir_parent, 0750)

if not self.only_publish_directory_contents:
    # Create a temporary symlink in the parent of the published directory tree
    tmp_link_name = os.path.join(publish_dir_parent, self.parent.timestamp)
    os.symlink(timestamp_master_location, tmp_link_name)
The last statement is the one which raised "File exists".
When publishing the docker v1 redirect file, the relevant variables have values such as:
publish_location = "/var/lib/pulp/published/docker/v1/app/<repo-id>.json"
publish_dir_parent = "/var/lib/pulp/published/docker/v1/app"
tmp_link_name = "/var/lib/pulp/published/docker/v1/app/<timestamp>"
If there are two publishes triggered with the same timestamp, then tmp_link_name will be equal for both tasks, and the tasks will therefore both attempt to create a symlink at the same path.
I realize this sounds unlikely, but a crash due to this has happened on our installation.
This was observed on Pulp 2.8, but from review, all the relevant code seems unchanged in 2.14.
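The collision described above can be reproduced in isolation. This is only a sketch of the failure mode, not Pulp code: create_tmp_link and demo_collision are hypothetical helpers, the paths live in a throwaway temporary directory, and the timestamp value is made up. It assumes a platform where unprivileged symlink creation is allowed.

```python
import errno
import os
import shutil
import tempfile


def create_tmp_link(target, parent_dir, timestamp):
    """Mimic the failing step: a symlink named only after the timestamp."""
    tmp_link_name = os.path.join(parent_dir, timestamp)
    os.symlink(target, tmp_link_name)
    return tmp_link_name


def demo_collision():
    """Return the errno raised when a second 'publish' reuses the timestamp."""
    workdir = tempfile.mkdtemp()
    try:
        parent = os.path.join(workdir, 'app')
        os.mkdir(parent)
        timestamp = '1508840000.12'  # same string for both publishes
        # First publish creates the temporary link successfully.
        create_tmp_link(os.path.join(workdir, 'master-a'), parent, timestamp)
        try:
            # Second publish, different repo, same link path.
            create_tmp_link(os.path.join(workdir, 'master-b'), parent, timestamp)
        except OSError as exc:
            return exc.errno
        return None
    finally:
        shutil.rmtree(workdir)
```

Calling demo_collision() yields errno.EEXIST, the same error code as in the backtrace.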
Updated by dalley about 7 years ago
- Priority changed from Normal to High
- Triaged changed from No to Yes
Updated by mhrivnak about 7 years ago
- Project changed from Docker Support to Pulp
- Sprint/Milestone set to 47
Updated by bmbouter about 7 years ago
I don't expect Pulp to silence this error. If Pulp is publishing a file and that file is already there, I don't think we know it's safe to blindly overwrite it, or to skip the write and fail silently.
The real issue here is that I don't expect two publishes to be running concurrently that publish to an area of the filesystem that is shared.
Updated by twaugh about 7 years ago
Perhaps the reason for this was an implicit publish (after a sync when auto-publish=true) being performed at the same time as an explicit publish.
Updated by rmcgover about 7 years ago
I think you mean two publishes of a single repo. Multiple publishes for the same repo won't trigger this bug, since the reserved_resources mechanism prevents them from being scheduled at the same time. The bug has to be triggered by concurrent publishes of multiple repos.
Updated by jortel@redhat.com almost 7 years ago
- Sprint/Milestone changed from 53 to 54
Updated by jortel@redhat.com almost 7 years ago
- Sprint changed from Sprint 33 to Sprint 34
Updated by ttereshc almost 7 years ago
- Sprint changed from Sprint 34 to Sprint 33
Updated by jortel@redhat.com almost 7 years ago
- Sprint Candidate changed from No to Yes
Updated by amacdona@redhat.com over 6 years ago
- Sprint Candidate changed from Yes to No
Updated by ttereshc over 5 years ago
- Status changed from NEW to ASSIGNED
- Assignee set to ttereshc
Added by ttereshc over 5 years ago
Updated by ttereshc over 5 years ago
- Status changed from ASSIGNED to POST
Updated by ttereshc over 5 years ago
- Status changed from POST to MODIFIED
Applied in changeset pulp|62917bdfa2f3918693e8309bd049b7a86a22d03a.
Added by ttereshc over 5 years ago
Revision a78cecee | View on GitHub
Use timestamp and repo_id in the temporary directory name
To avoid race condition when multiple repositories are published at the same time.
closes #3098 https://pulp.plan.io/issues/3098
(cherry picked from commit 62917bdfa2f3918693e8309bd049b7a86a22d03a)
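The idea behind the commit message can be sketched as follows. This is not the actual patch: the helper name tmp_link_name_for and the '<timestamp>-<repo_id>' layout are assumptions made for illustration only.

```python
import os


def tmp_link_name_for(publish_dir_parent, timestamp, repo_id):
    """Build a temporary link path that is unique per repository.

    Hypothetical naming scheme: combining the timestamp with the repo id
    means two repos published in the same instant get distinct paths.
    """
    return os.path.join(publish_dir_parent, '%s-%s' % (timestamp, repo_id))


parent = '/var/lib/pulp/published/docker/v1/app'
link_a = tmp_link_name_for(parent, '1508840000.12', 'repo-a')
link_b = tmp_link_name_for(parent, '1508840000.12', 'repo-b')
```

Since publishes of the same repo are already serialized by reserved_resources, adding the repo id is enough to keep concurrent publishes from sharing a temporary path.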
Updated by ttereshc over 5 years ago
Applied in changeset pulp|a78ceceee2811cf558247d2dd6dc73b5fa75c37f.
Updated by dkliban@redhat.com over 5 years ago
- Status changed from MODIFIED to 5
Updated by dkliban@redhat.com over 5 years ago
- Status changed from 5 to CLOSED - CURRENTRELEASE