Project

Profile

Help

Issue #3098

Docker publish may fail with "OSError: [Errno 17] File exists" if two publishes triggered at same time

Added by rmcgover about 2 years ago. Updated 6 months ago.

Status:
CLOSED - CURRENTRELEASE
Priority:
High
Assignee:
Category:
-
Sprint/Milestone:
Start date:
Due date:
Severity:
2. Medium
Version:
Platform Release:
2.19.1
Blocks Release:
OS:
Backwards Incompatible:
No
Triaged:
Yes
Groomed:
No
Sprint Candidate:
No
Tags:
Pulp 2
QA Contact:
Complexity:
Smash Test:
Verified:
No
Verification Required:
No
Sprint:
Sprint 53

Description

If two docker_web_distributor_name_cli publishes are triggered on the same Pulp installation for two different repos at the same time, where "same time" means that str(time.time()) evaluates to the same for both publish tasks (i.e. tasks are scheduled within same 100th of a second), then the publishes race with each other for access to the same path, which can lead to a failed publish.

A publish may fail with a backtrace like this:

 Task pulp.server.managers.repo.publish.publish[28381da7-812d-436f-ad5b-b23785b4928b] raised unexpected: OSError(17, 'File exists')
 Traceback (most recent call last):
   File "/usr/lib/python2.7/site-packages/celery/app/trace.py", line 240, in trace_task
     R = retval = fun(*args, **kwargs)
   File "/usr/lib/python2.7/site-packages/pulp/server/async/tasks.py", line 473, in __call__
     return super(Task, self).__call__(*args, **kwargs)
   File "/usr/lib/python2.7/site-packages/pulp/server/async/tasks.py", line 103, in __call__
     return super(PulpTask, self).__call__(*args, **kwargs)
   File "/usr/lib/python2.7/site-packages/celery/app/trace.py", line 437, in __protected_call__
     return self.run(*args, **kwargs)
   File "/usr/lib/python2.7/site-packages/pulp/server/controllers/repository.py", line 968, in publish
     result = _do_publish(repo_obj, dist_id, dist_inst, transfer_repo, conduit, call_config)
   File "/usr/lib/python2.7/site-packages/pulp/server/controllers/repository.py", line 1020, in _do_publish
     publish_report = publish_repo(transfer_repo, conduit, call_config)
   File "/usr/lib/python2.7/site-packages/pulp/server/async/tasks.py", line 658, in wrap_f
     return f(*args, **kwargs)
   File "/usr/lib/python2.7/site-packages/pulp_docker/plugins/distributors/distributor_web.py", line 123, in publish_repo
     return self._publisher.publish()
   File "/usr/lib/python2.7/site-packages/pulp/plugins/util/publish_step.py", line 697, in publish
     return self.process_lifecycle()
   File "/usr/lib/python2.7/site-packages/pulp/plugins/util/publish_step.py", line 562, in process_lifecycle
     super(PluginStep, self).process_lifecycle()
   File "/usr/lib/python2.7/site-packages/pulp/plugins/util/publish_step.py", line 159, in process_lifecycle
     step.process()
   File "/usr/lib/python2.7/site-packages/pulp/plugins/util/publish_step.py", line 249, in process
     self._process_block()
   File "/usr/lib/python2.7/site-packages/pulp/plugins/util/publish_step.py", line 293, in _process_block
     self.process_main()
   File "/usr/lib/python2.7/site-packages/pulp/plugins/util/publish_step.py", line 910, in process_main
     os.symlink(timestamp_master_location, tmp_link_name)
 OSError: [Errno 17] File exists

This would be difficult to reproduce, but can be revealed by reading the code for the atomic directory publish step.

Here's some of the code from server/pulp/plugins/util/publish_step.py :

            # Create the parent directory of the published repository tree, if needed
            publish_dir_parent = os.path.dirname(publish_location)
            if not os.path.exists(publish_dir_parent):
                misc.mkdir(publish_dir_parent, 0750)

            if not self.only_publish_directory_contents:
                # Create a temporary symlink in the parent of the published directory tree
                tmp_link_name = os.path.join(publish_dir_parent, self.parent.timestamp)
                os.symlink(timestamp_master_location, tmp_link_name)

The last statement is the one which raised "File exists".

When publishing the docker v1 redirect file, then the relevant variables here will have values such as:

publish_location = "/var/lib/pulp/published/docker/v1/app/<repo-id>.json" 
publish_dir_parent = "/var/lib/pulp/published/docker/v1/app" 
tmp_link_name = "/var/lib/pulp/published/docker/v1/app/<timestamp>" 

If there are two publishes triggered with the same timestamp, then tmp_link_name will be equal for both tasks, and the tasks will therefore both attempt to create a symlink at the same path.

I realize this sounds unlikely, but a crash due to this has happened on our installation.

This was observed on Pulp 2.8, but from review, all the relevant code seems unchanged in 2.14.

Associated revisions

Revision 62917bdf View on GitHub
Added by ttereshc 7 months ago

Use timestamp and repo_id in the temporary directory name

To avoid race condition when multiple repositories are published
at the same time.

closes #3098
https://pulp.plan.io/issues/3098

Revision a78cecee View on GitHub
Added by ttereshc 6 months ago

Use timestamp and repo_id in the temporary directory name

To avoid race condition when multiple repositories are published
at the same time.

closes #3098
https://pulp.plan.io/issues/3098

(cherry picked from commit 62917bdfa2f3918693e8309bd049b7a86a22d03a)

History

#2 Updated by dalley about 2 years ago

  • Priority changed from Normal to High
  • Triaged changed from No to Yes

#3 Updated by mhrivnak about 2 years ago

  • Project changed from Docker Support to Pulp
  • Sprint/Milestone set to 47

#4 Updated by bmbouter about 2 years ago

I don't expect Pulp to silence this error. If Pulp is publishing a file and that file is already there, I don't think we know its safe to blindly overwrite it or to not write it and fail to write silently.

The real issue here is that I don't expect two publishes to be running concurrently that publish to an area of the filesystem that is shared.

#5 Updated by rchan almost 2 years ago

  • Sprint/Milestone changed from 47 to 48

#8 Updated by rchan almost 2 years ago

  • Sprint/Milestone changed from 48 to 52

#9 Updated by rchan almost 2 years ago

  • Sprint/Milestone changed from 52 to 53

#10 Updated by twaugh almost 2 years ago

Perhaps the reason for this was an implicit publish (after a sync when auto-publish=true) being performed at the same time as an explicit publish.

#11 Updated by rmcgover almost 2 years ago

I think you mean two publishes of a single repo. Multiple publishes for same repo won't trigger this bug since the reserved_resources mechanism prevents them from scheduling at the same time. The bug has to be triggered by concurrent publishes of multiple repos.

#12 Updated by jortel@redhat.com almost 2 years ago

  • Sprint/Milestone changed from 53 to 54

#13 Updated by rchan over 1 year ago

  • Sprint/Milestone changed from 54 to 56

#14 Updated by bmbouter over 1 year ago

  • Sprint set to Sprint 33

#15 Updated by bmbouter over 1 year ago

  • Sprint/Milestone deleted (56)

#16 Updated by jortel@redhat.com over 1 year ago

  • Sprint changed from Sprint 33 to Sprint 34

#17 Updated by ttereshc over 1 year ago

  • Sprint changed from Sprint 34 to Sprint 33

#18 Updated by jortel@redhat.com over 1 year ago

  • Sprint Candidate changed from No to Yes

#19 Updated by jortel@redhat.com over 1 year ago

  • Sprint deleted (Sprint 33)

#20 Updated by amacdona@redhat.com about 1 year ago

  • Sprint Candidate changed from Yes to No

#21 Updated by dkliban@redhat.com 8 months ago

  • Sprint set to Sprint 50

#22 Updated by rchan 8 months ago

  • Sprint changed from Sprint 50 to Sprint 51

#23 Updated by bmbouter 7 months ago

  • Tags Pulp 2 added

#24 Updated by rchan 7 months ago

  • Sprint changed from Sprint 51 to Sprint 52

#25 Updated by dkliban@redhat.com 7 months ago

  • Platform Release set to 2.19.1

#26 Updated by ttereshc 7 months ago

  • Status changed from NEW to ASSIGNED
  • Assignee set to ttereshc

#27 Updated by dkliban@redhat.com 7 months ago

  • Sprint/Milestone set to 2.19.1

#28 Updated by ttereshc 7 months ago

  • Status changed from ASSIGNED to POST

#29 Updated by rchan 6 months ago

  • Sprint changed from Sprint 52 to Sprint 53

#30 Updated by ttereshc 6 months ago

  • Status changed from POST to MODIFIED

#32 Updated by dkliban@redhat.com 6 months ago

  • Status changed from MODIFIED to ON_QA

#33 Updated by dkliban@redhat.com 6 months ago

  • Status changed from ON_QA to CLOSED - CURRENTRELEASE

Please register to edit this issue

Also available in: Atom PDF