Project

Profile

Help

Issue #3098

closed

Docker publish may fail with "OSError: [Errno 17] File exists" if two publishes triggered at same time

Added by rmcgover over 6 years ago. Updated almost 5 years ago.

Status:
CLOSED - CURRENTRELEASE
Priority:
High
Assignee:
Category:
-
Sprint/Milestone:
Start date:
Due date:
Estimated time:
Severity:
2. Medium
Version:
Platform Release:
2.19.1
OS:
Triaged:
Yes
Groomed:
No
Sprint Candidate:
No
Tags:
Pulp 2
Sprint:
Sprint 53
Quarter:

Description

If two docker_web_distributor_name_cli publishes are triggered on the same Pulp installation for two different repos at the same time, where "same time" means that str(time.time()) evaluates to the same for both publish tasks (i.e. tasks are scheduled within same 100th of a second), then the publishes race with each other for access to the same path, which can lead to a failed publish.

A publish may fail with a backtrace like this:

 Task pulp.server.managers.repo.publish.publish[28381da7-812d-436f-ad5b-b23785b4928b] raised unexpected: OSError(17, 'File exists')
 Traceback (most recent call last):
   File "/usr/lib/python2.7/site-packages/celery/app/trace.py", line 240, in trace_task
     R = retval = fun(*args, **kwargs)
   File "/usr/lib/python2.7/site-packages/pulp/server/async/tasks.py", line 473, in __call__
     return super(Task, self).__call__(*args, **kwargs)
   File "/usr/lib/python2.7/site-packages/pulp/server/async/tasks.py", line 103, in __call__
     return super(PulpTask, self).__call__(*args, **kwargs)
   File "/usr/lib/python2.7/site-packages/celery/app/trace.py", line 437, in __protected_call__
     return self.run(*args, **kwargs)
   File "/usr/lib/python2.7/site-packages/pulp/server/controllers/repository.py", line 968, in publish
     result = _do_publish(repo_obj, dist_id, dist_inst, transfer_repo, conduit, call_config)
   File "/usr/lib/python2.7/site-packages/pulp/server/controllers/repository.py", line 1020, in _do_publish
     publish_report = publish_repo(transfer_repo, conduit, call_config)
   File "/usr/lib/python2.7/site-packages/pulp/server/async/tasks.py", line 658, in wrap_f
     return f(*args, **kwargs)
   File "/usr/lib/python2.7/site-packages/pulp_docker/plugins/distributors/distributor_web.py", line 123, in publish_repo
     return self._publisher.publish()
   File "/usr/lib/python2.7/site-packages/pulp/plugins/util/publish_step.py", line 697, in publish
     return self.process_lifecycle()
   File "/usr/lib/python2.7/site-packages/pulp/plugins/util/publish_step.py", line 562, in process_lifecycle
     super(PluginStep, self).process_lifecycle()
   File "/usr/lib/python2.7/site-packages/pulp/plugins/util/publish_step.py", line 159, in process_lifecycle
     step.process()
   File "/usr/lib/python2.7/site-packages/pulp/plugins/util/publish_step.py", line 249, in process
     self._process_block()
   File "/usr/lib/python2.7/site-packages/pulp/plugins/util/publish_step.py", line 293, in _process_block
     self.process_main()
   File "/usr/lib/python2.7/site-packages/pulp/plugins/util/publish_step.py", line 910, in process_main
     os.symlink(timestamp_master_location, tmp_link_name)
 OSError: [Errno 17] File exists

This would be difficult to reproduce, but can be revealed by reading the code for the atomic directory publish step.

Here's some of the code from server/pulp/plugins/util/publish_step.py :

            # Create the parent directory of the published repository tree, if needed
            publish_dir_parent = os.path.dirname(publish_location)
            if not os.path.exists(publish_dir_parent):
                misc.mkdir(publish_dir_parent, 0750)

            if not self.only_publish_directory_contents:
                # Create a temporary symlink in the parent of the published directory tree
                tmp_link_name = os.path.join(publish_dir_parent, self.parent.timestamp)
                os.symlink(timestamp_master_location, tmp_link_name)

The last statement is the one which raised "File exists".

When publishing the docker v1 redirect file, then the relevant variables here will have values such as:

publish_location = "/var/lib/pulp/published/docker/v1/app/<repo-id>.json"
publish_dir_parent = "/var/lib/pulp/published/docker/v1/app"
tmp_link_name = "/var/lib/pulp/published/docker/v1/app/<timestamp>"

If there are two publishes triggered with the same timestamp, then tmp_link_name will be equal for both tasks, and the tasks will therefore both attempt to create a symlink at the same path.

I realize this sounds unlikely, but a crash due to this has happened on our installation.

This was observed on Pulp 2.8, but from review, all the relevant code seems unchanged in 2.14.

Actions #2

Updated by dalley over 6 years ago

  • Priority changed from Normal to High
  • Triaged changed from No to Yes
Actions #3

Updated by mhrivnak over 6 years ago

  • Project changed from Docker Support to Pulp
  • Sprint/Milestone set to 47
Actions #4

Updated by bmbouter over 6 years ago

I don't expect Pulp to silence this error. If Pulp is publishing a file and that file is already there, I don't think we know its safe to blindly overwrite it or to not write it and fail to write silently.

The real issue here is that I don't expect two publishes to be running concurrently that publish to an area of the filesystem that is shared.

Actions #5

Updated by rchan over 6 years ago

  • Sprint/Milestone changed from 47 to 48
Actions #8

Updated by rchan over 6 years ago

  • Sprint/Milestone changed from 48 to 52
Actions #9

Updated by rchan about 6 years ago

  • Sprint/Milestone changed from 52 to 53
Actions #10

Updated by twaugh about 6 years ago

Perhaps the reason for this was an implicit publish (after a sync when auto-publish=true) being performed at the same time as an explicit publish.

Actions #11

Updated by rmcgover about 6 years ago

I think you mean two publishes of a single repo. Multiple publishes for same repo won't trigger this bug since the reserved_resources mechanism prevents them from scheduling at the same time. The bug has to be triggered by concurrent publishes of multiple repos.

Actions #12

Updated by jortel@redhat.com about 6 years ago

  • Sprint/Milestone changed from 53 to 54
Actions #13

Updated by rchan about 6 years ago

  • Sprint/Milestone changed from 54 to 56
Actions #14

Updated by bmbouter about 6 years ago

  • Sprint set to Sprint 33
Actions #15

Updated by bmbouter about 6 years ago

  • Sprint/Milestone deleted (56)
Actions #16

Updated by jortel@redhat.com about 6 years ago

  • Sprint changed from Sprint 33 to Sprint 34
Actions #17

Updated by ttereshc about 6 years ago

  • Sprint changed from Sprint 34 to Sprint 33
Actions #18

Updated by jortel@redhat.com about 6 years ago

  • Sprint Candidate changed from No to Yes
Actions #19

Updated by jortel@redhat.com about 6 years ago

  • Sprint deleted (Sprint 33)
Actions #20

Updated by amacdona@redhat.com over 5 years ago

  • Sprint Candidate changed from Yes to No
Actions #21

Updated by dkliban@redhat.com about 5 years ago

  • Sprint set to Sprint 50
Actions #22

Updated by rchan almost 5 years ago

  • Sprint changed from Sprint 50 to Sprint 51
Actions #23

Updated by bmbouter almost 5 years ago

  • Tags Pulp 2 added
Actions #24

Updated by rchan almost 5 years ago

  • Sprint changed from Sprint 51 to Sprint 52
Actions #25

Updated by dkliban@redhat.com almost 5 years ago

  • Platform Release set to 2.19.1
Actions #26

Updated by ttereshc almost 5 years ago

  • Status changed from NEW to ASSIGNED
  • Assignee set to ttereshc
Actions #27

Updated by dkliban@redhat.com almost 5 years ago

  • Sprint/Milestone set to 2.19.1

Added by ttereshc almost 5 years ago

Revision 62917bdf | View on GitHub

Use timestamp and repo_id in the temporary directory name

To avoid race condition when multiple repositories are published at the same time.

closes #3098 https://pulp.plan.io/issues/3098

Actions #28

Updated by ttereshc almost 5 years ago

  • Status changed from ASSIGNED to POST
Actions #29

Updated by rchan almost 5 years ago

  • Sprint changed from Sprint 52 to Sprint 53
Actions #30

Updated by ttereshc almost 5 years ago

  • Status changed from POST to MODIFIED

Added by ttereshc almost 5 years ago

Revision a78cecee | View on GitHub

Use timestamp and repo_id in the temporary directory name

To avoid race condition when multiple repositories are published at the same time.

closes #3098 https://pulp.plan.io/issues/3098

(cherry picked from commit 62917bdfa2f3918693e8309bd049b7a86a22d03a)

Actions #32

Updated by dkliban@redhat.com almost 5 years ago

  • Status changed from MODIFIED to 5
Actions #33

Updated by dkliban@redhat.com almost 5 years ago

  • Status changed from 5 to CLOSED - CURRENTRELEASE

Also available in: Atom PDF