Issue #2979


Celery workers may deadlock when PULP_MAX_TASKS_PER_CHILD is used

Added by rmcgover over 6 years ago. Updated about 5 years ago.

Status: CLOSED - CURRENTRELEASE
Priority: Normal
Assignee: -
Category: -
Sprint/Milestone: -
Start date: -
Due date: -
Estimated time: -
Severity: 3. High
Version: 2.14.0
Platform Release: 2.14.2
OS: -
Triaged: Yes
Groomed: No
Sprint Candidate: No
Tags: Pulp 2
Sprint: Sprint 26
Quarter: -

Description

If the PULP_MAX_TASKS_PER_CHILD option in /etc/default/pulp_workers is uncommented, celery workers are prone to deadlock when handling tasks. When this happens, all tasks assigned to the deadlocked worker will not be processed. The parent celery process still participates in the worker heartbeat, so the system doesn't automatically recover from this.

In this scenario, the celery workers will log the warning:

UserWarning: MongoClient opened before fork. Create MongoClient with connect=False, or create client after forking. See PyMongo's documentation for details: http://api.mongodb.org/python/current/faq.html#using-pymongo-with-multiprocessing

The problem seems to be exactly as mentioned on the linked page: "Instances of MongoClient copied from the parent process have a high probability of deadlock in the child process due to inherent incompatibilities between fork(), threads, and locks. PyMongo will attempt to issue a warning if there is a chance of this deadlock occurring."
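
For illustration, a minimal sketch of the two fork-safe patterns PyMongo's FAQ recommends (the host/port values here are assumptions, not Pulp's configuration):

    import os
    from pymongo import MongoClient

    # Pattern 1: connect=False defers the actual connection until first
    # use, so no monitor threads or pooled sockets exist at fork time.
    client = MongoClient('localhost', 27017, connect=False)

    # Pattern 2: create the client only in the child, after forking.
    pid = os.fork()
    if pid == 0:
        child_client = MongoClient('localhost', 27017)
        child_client.admin.command('ping')  # connection created post-fork
        os._exit(0)
    else:
        os.waitpid(pid, 0)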

I was able to reproduce this on the Pulp development environment in 2.14-dev (bd1f496bd99175db45e585418327fc9834b8ee53).

Starting from a clean development environment, it could be reproduced by:

1. Adjust the following config to use a replica set and max tasks per child:

--- /etc/mongod.conf.old    2017-08-15 00:44:15.092104023 +0000
+++ /etc/mongod.conf    2017-08-15 00:49:29.418767574 +0000
@@ -112,7 +112,8 @@
 #operationProfiling:

 # replication Options - ReplSet settings
-#replication:
+replication:
+  replSetName: rs

 # sharding Options - Shard settings
 #sharding:

--- /etc/pulp/server.conf.old    2017-08-15 00:49:54.364820235 +0000
+++ /etc/pulp/server.conf    2017-08-15 00:50:02.265836914 +0000
@@ -50,7 +50,7 @@
 # seeds: localhost:27017
 # username:
 # password:
-# replica_set:
+replica_set: rs
 # ssl: false
 # ssl_keyfile:
 # ssl_certfile:

--- /etc/default/pulp_workers.old    2017-08-15 01:23:41.052240219 +0000
+++ /etc/default/pulp_workers    2017-08-15 01:23:47.120262349 +0000
@@ -9,4 +9,4 @@

 # To avoid memory leaks, Pulp can terminate and replace a worker after processing X tasks. If
 # left commented, process recycling is disabled. PULP_MAX_TASKS_PER_CHILD must be > 0.
-# PULP_MAX_TASKS_PER_CHILD=2
+PULP_MAX_TASKS_PER_CHILD=2

2. Initialize the mongo replica set; in the mongo shell:

 rs.initiate({"_id":"rs",members:[{_id:0,host:"localhost:27017"}]})

3. Make a couple of timing changes to the code so the issue is more easily reproducible

(It should be possible to reproduce without this, but it may require being very patient.)

diff --git a/server/pulp/server/db/connection.py b/server/pulp/server/db/connection.py
index 9cf01e0..f38ad92 100644
--- a/server/pulp/server/db/connection.py
+++ b/server/pulp/server/db/connection.py
@@ -63,6 +63,9 @@ def initialize(name=None, seeds=None, max_pool_size=None, replica_set=None, max_
     try:
         connection_kwargs = {}

+        # more frequent heartbeat makes race easier to reproduce
+        connection_kwargs['heartbeatfrequencyms'] = 1000
+
         if name is None:
             name = config.config.get('database', 'name')

--- /usr/lib64/python2.7/site-packages/pymongo/pool.py.old    2017-08-15 01:16:41.625710547 +0000
+++ /usr/lib64/python2.7/site-packages/pymongo/pool.py    2017-08-15 01:17:14.421830156 +0000
@@ -16,6 +16,7 @@
 import os
 import socket
 import threading
+import time

 from bson import DEFAULT_CODEC_OPTIONS
 from bson.py3compat import itervalues
@@ -608,6 +609,8 @@
                 # set.pop() isn't atomic in Jython less than 2.7, see
                 # http://bugs.jython.org/issue1854
                 with self.lock:
+                    # simulate unfavorable scheduling while holding lock
+                    time.sleep(.1)
                     sock_info, from_pool = self.sockets.pop(), True
             except KeyError:
                 # Can raise ConnectionFailure or CertificateError.

4. Restart mongo and pulp services

5. Start watching logs for completed tasks, e.g.

journalctl -f | grep 'succeeded in'

6. Flood celery worker with many tasks

e.g. I used this at the shell:

enqueue(){ celery --app=pulp.server.async.app call --exchange=C.dq --routing-key=reserved_resource_worker-2@pulp2.dev pulp.server.async.tasks._release_resource '--args=["test"]'; }
while true; do for i in $(seq 1 5); do for j in $(seq 1 20); do enqueue & done; sleep 1; done; wait; done

7. Observe logs

Watch the logs for some time (I needed about 1 hour to reproduce it most recently).

If messages about tasks succeeding stop appearing in logs, it may mean the deadlock has occurred.

When a worker has deadlocked like this then the following symptoms can be observed:

pstree looks like:

  |-celery,20139 /usr/bin/celery worker -n reserved_resource_worker-2@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-2.pid --maxtasksperchild=2
  |   |-celery,13916 /usr/bin/celery worker -n reserved_resource_worker-2@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-2.pid --maxtasksperchild=2
  |   |   `-{celery},13917
  |   |-{celery},20315
  |   |-{celery},20316
  |   `-{celery},20421

strace looks like:

[vagrant@pulp2 ~]$ sudo strace -p 13916 -f
strace: Process 13916 attached with 2 threads
[pid 13917] futex(0x7f89478f0bf0, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, 0xffffffff <unfinished ...>
[pid 13916] futex(0x7f89478f0bf0, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, 0xffffffff

(threads waiting on a lock which will never be unlocked)

python backtraces (collected with gdb) look like:

Thread 2 (Thread 0x7f8920910700 (LWP 13917)):
Traceback (most recent call first):
  Waiting for the GIL
  File "/usr/lib64/python2.7/site-packages/pymongo/topology.py", line 338, in update_pool
    with self._lock:
  File "/usr/lib64/python2.7/site-packages/pymongo/mongo_client.py", line 1075, in _process_periodic_tasks
    self._topology.update_pool()
  File "/usr/lib64/python2.7/site-packages/pymongo/mongo_client.py", line 406, in target
    MongoClient._process_periodic_tasks(client)
  File "/usr/lib64/python2.7/site-packages/pymongo/periodic_executor.py", line 93, in _run
    if not self._target():
  File "/usr/lib64/python2.7/threading.py", line 757, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/lib64/python2.7/threading.py", line 804, in __bootstrap_inner
    self.run()
  File "/usr/lib64/python2.7/threading.py", line 777, in __bootstrap
    self.__bootstrap_inner()

Thread 1 (Thread 0x7f8945632700 (LWP 13916)):
Traceback (most recent call first):
  Waiting for the GIL
  File "/usr/lib64/python2.7/site-packages/pymongo/topology.py", line 137, in open
    with self._lock:
  File "/usr/lib64/python2.7/site-packages/pymongo/mongo_client.py", line 757, in _get_topology
    self._topology.open()
  File "/usr/lib64/python2.7/site-packages/pymongo/mongo_client.py", line 819, in _send_message_with_response
    topology = self._get_topology()
  File "/usr/lib64/python2.7/site-packages/pymongo/cursor.py", line 850, in __send_message
    **kwargs)
  File "/usr/lib64/python2.7/site-packages/pymongo/cursor.py", line 1012, in _refresh
    self.__read_concern))
  File "/usr/lib64/python2.7/site-packages/pymongo/cursor.py", line 1090, in next
    if len(self.__data) or self._refresh():
  File "/usr/lib/python2.7/site-packages/mongoengine/queryset/base.py", line 1407, in next
    raw_doc = self._cursor.next()
  File "/home/vagrant/devel/pulp/server/pulp/server/async/tasks.py", line 289, in _release_resource
    for running_task in running_task_qs:
  File "/usr/lib/python2.7/site-packages/celery/app/trace.py", line 438, in __protected_call__
    return self.run(*args, **kwargs)
  File "/home/vagrant/devel/pulp/server/pulp/server/async/tasks.py", line 107, in __call__
    return super(PulpTask, self).__call__(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/celery/app/trace.py", line 240, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/celery/app/trace.py", line 349, in _fast_trace_task
    return _tasks[task].__trace__(uuid, args, kwargs, request)[0]
  File "/usr/lib64/python2.7/site-packages/billiard/pool.py", line 367, in workloop
    result = (True, prepare_result(fun(*args, **kwargs)))
  File "/usr/lib64/python2.7/site-packages/billiard/pool.py", line 295, in run
    sys.exit(self.workloop(pid=pid))
  File "/usr/lib64/python2.7/site-packages/billiard/process.py", line 292, in _bootstrap
    self.run()
  File "/usr/lib64/python2.7/site-packages/billiard/forking.py", line 105, in __init__
    code = process_obj._bootstrap()
  File "/usr/lib64/python2.7/site-packages/billiard/process.py", line 137, in start
    self._popen = Popen(self)
  File "/usr/lib64/python2.7/site-packages/billiard/pool.py", line 1068, in _create_worker_process
    w.start()
  File "/usr/lib64/python2.7/site-packages/billiard/pool.py", line 1237, in _repopulate_pool
    self._create_worker_process(self._avail_index())
  File "/usr/lib64/python2.7/site-packages/billiard/pool.py", line 1252, in _maintain_pool
    self._repopulate_pool(joined)
  File "/usr/lib64/python2.7/site-packages/billiard/pool.py", line 1260, in maintain_pool
    self._maintain_pool()
  File "/usr/lib/python2.7/site-packages/celery/concurrency/asynpool.py", line 415, in _event_process_exit
    self.maintain_pool()
  File "/usr/lib/python2.7/site-packages/kombu/async/hub.py", line 340, in create_loop
    cb(*cbargs)
  File "/usr/lib/python2.7/site-packages/celery/worker/loops.py", line 76, in asynloop
    next(loop)
  File "/usr/lib/python2.7/site-packages/celery/worker/consumer.py", line 838, in start
    c.loop(*c.loop_args())
  File "/usr/lib/python2.7/site-packages/celery/bootsteps.py", line 123, in start
    step.start(parent)
  File "/usr/lib/python2.7/site-packages/celery/worker/consumer.py", line 279, in start
    blueprint.start(self)
  File "/usr/lib/python2.7/site-packages/celery/bootsteps.py", line 374, in start
    return self.obj.start()
  File "/usr/lib/python2.7/site-packages/celery/bootsteps.py", line 123, in start
    step.start(parent)
  File "/usr/lib/python2.7/site-packages/celery/worker/__init__.py", line 206, in start
    self.blueprint.start(self)
  File "/usr/lib/python2.7/site-packages/celery/bin/worker.py", line 212, in run
    state_db=self.node_format(state_db, hostname), **kwargs
  File "/usr/lib/python2.7/site-packages/celery/bin/base.py", line 274, in __call__
    ret = self.run(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/celery/bin/worker.py", line 179, in run_from_argv
    return self(*args, **options)
  File "/usr/lib/python2.7/site-packages/celery/bin/celery.py", line 694, in execute
    ).run_from_argv(self.prog_name, argv[1:], command=argv[0])
  File "/usr/lib/python2.7/site-packages/celery/bin/celery.py", line 762, in handle_argv
    return self.execute(command, argv)
  File "/usr/lib/python2.7/site-packages/celery/bin/base.py", line 311, in execute_from_commandline
    return self.handle_argv(self.prog_name, argv[1:])
  File "/usr/lib/python2.7/site-packages/celery/bin/celery.py", line 770, in execute_from_commandline
    super(CeleryCommand, self).execute_from_commandline(argv)))
  File "/usr/lib/python2.7/site-packages/celery/bin/celery.py", line 81, in main
    cmd.execute_from_commandline(argv)
  File "/usr/lib/python2.7/site-packages/celery/__main__.py", line 30, in main
    main()
  File "/usr/bin/celery", line 9, in <module>
    load_entry_point('celery', 'console_scripts', 'celery')()

Related issues

Related to Pulp - Task #3048: Ensure that qpid connections aren't being affected by celery's forking mechanism (CLOSED - WONTFIX)

Actions #1

Updated by rmcgover over 6 years ago

  • Description updated (diff)
Actions #3

Updated by rmcgover over 6 years ago

  • Description updated (diff)
Actions #4

Updated by bmbouter over 6 years ago

  • Severity changed from 2. Medium to 3. High

To clarify, this is not unique to replica sets. Pymongo spawns those threads when used with or without replica sets.

The probability of a deadlock occurring on any given fork is low, but it is always there. Every time Celery forks, it can occur. By enabling PULP_MAX_TASKS_PER_CHILD, instead of forking exactly once at startup, celery forks repeatedly, at a rate determined by the value of PULP_MAX_TASKS_PER_CHILD. So this is new in the sense that the PULP_MAX_TASKS_PER_CHILD feature makes encountering a deadlock significantly more likely. In practice it was likely enough for a large Pulp user (the OP) to report it, so it's more than theoretical.

The severity should be high because when it occurs your entire system is deadlocked, which in production usage is a highly severe problem.

In terms of solving the problem, we should consider running Celery without forking altogether. Now that all of our supported distros use systemd, we can derive similar benefits by having systemd re-launch the worker after its process exits, perhaps due to a segfault.

Actions #5

Updated by bmbouter over 6 years ago

Specifically, to configure celery not to fork, we could run celery in "solo" mode: http://docs.celeryproject.org/en/latest/reference/celery.bin.worker.html?highlight=solo#cmdoption-celery-worker-p

Then systemd would need to be configured to auto-relaunch on shutdown.
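
For illustration, a hypothetical sketch of that setup: "-P solo" on the worker command line, plus Restart= in a systemd drop-in. The unit and file names here are assumptions, not the actual Pulp packaging:

    # Hypothetical drop-in, e.g.
    # /etc/systemd/system/pulp_worker-1.service.d/solo.conf
    [Service]
    # Clear the packaged ExecStart, then run celery with the solo pool
    # (no forked child per task).
    ExecStart=
    ExecStart=/usr/bin/celery worker -A pulp.server.async.app -P solo -n reserved_resource_worker-1@%%h
    # Re-launch automatically whenever the worker process exits.
    Restart=always
    RestartSec=5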

Actions #6

Updated by ttereshc over 6 years ago

  • Sprint/Milestone set to 43
  • Triaged changed from No to Yes
Actions #7

Updated by jortel@redhat.com over 6 years ago

  • Sprint/Milestone changed from 43 to 44
Actions #8

Updated by bmbouter over 6 years ago

We met to discuss some of our options. There are two new ones proposed:

1) Register atexit handlers to close the database connection. This should release all of the shared mutex-based resources, allowing the deadlock to be avoided (a sketch follows the notes below).

2) Adjust the usage of pymongo's db connection in the parent process to disconnect after it's done talking to the DB instead of holding the connection open. The worker heartbeats would be affected by this so there would be a lot of disconnecting/reconnecting occurring. It is one possible mitigation though.

NOTE1: Neither of the above approaches will help us avoid similar problems with Qpid or psycopg2.

NOTE2: The issue with solo mode and cancellation has not yet been verified. Perhaps cancellation with solo mode will work; we're not sure.
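
For illustration, a minimal sketch of option 1, assuming a module-level pymongo client along the lines of what pulp.server.db.connection holds (the _CONNECTION name is hypothetical):

    import atexit

    from pymongo import MongoClient

    _CONNECTION = MongoClient('localhost', 27017)

    def _close_connection():
        # Closing stops the client's monitor threads and releases its
        # pooled sockets, so they are not held at process exit.
        _CONNECTION.close()

    atexit.register(_close_connection)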

Actions #9

Updated by daviddavis over 6 years ago

  • Status changed from NEW to ASSIGNED
  • Assignee set to daviddavis
Actions #11

Updated by daviddavis over 6 years ago

Doing some more research, it looks like celery closes db connections on worker init when using django:

https://github.com/celery/celery/blob/master/celery/fixups/django.py#L135-L155

I think we can apply this same logic to mongo and use the worker_process_init signal to close connections to mongodb when a child starts.
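
For illustration, a minimal sketch of that idea, assuming mongoengine's default connection alias (the handler name is an assumption):

    from celery.signals import worker_process_init
    from mongoengine import connection

    @worker_process_init.connect
    def reconnect_mongo(**kwargs):
        # In the freshly forked child, drop the MongoClient inherited
        # from the parent; mongoengine creates a new one on next use.
        connection.disconnect()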

Also, I think the above code indicates that we're ok in regards to Pulp 3.

Actions #12

Updated by mhrivnak over 6 years ago

  • Sprint/Milestone changed from 44 to 45
Actions #13

Updated by daviddavis over 6 years ago

  • Status changed from ASSIGNED to POST
Actions #14

Updated by daviddavis over 6 years ago

  • Related to Task #3048: Ensure that qpid connections aren't being affected by celery's forking mechanism added
Actions #16

Updated by daviddavis over 6 years ago

I'm submitting a fix that appears to solve the problem (at least in my testing). Using the reproducer outlined in the description, across three trials I saw deadlocks occur at around 20 minutes, 60 minutes, and 100 minutes.

With the fix applied, I ran the reproducer overnight (> 12 hours) twice without seeing any deadlock.

Note that the warning message from Mongo still remains. From some investigation, it looks like mongoengine stores data on the Document metaclass, including a pymongo Topology object that holds a process id. Even though the connection is closed and a new one opened, this id value isn't updated, so we still see the warning.

Added by daviddavis over 6 years ago

Revision 0fdd2515 | View on GitHub

Reconnecting to mongodb after celery fork

fixes #2979 https://pulp.plan.io/issues/2979

Actions #17

Updated by daviddavis over 6 years ago

  • Status changed from POST to MODIFIED
Actions #18

Updated by daviddavis over 6 years ago

  • Subject changed from Celery workers may deadlock when PULP_MAX_TASKS_PER_CHILD and mongo replica set are used to Celery workers may deadlock when PULP_MAX_TASKS_PER_CHILD is used
  • Description updated (diff)
Actions #19

Updated by daviddavis over 6 years ago

This issue isn't strictly limited to instances of pulp running with mongo replica sets. Updated the title and description to that end.

Actions #20

Updated by pcreech over 6 years ago

  • Platform Release set to 2.14.2
Actions #21

Updated by pcreech over 6 years ago

  • Status changed from MODIFIED to 5
Actions #22

Updated by pthomas@redhat.com over 6 years ago

Verified

1. Update or uncomment PULP_MAX_TASKS_PER_CHILD=2 in /etc/default/pulp_workers
2. Make a change to /usr/lib64/python2.7/site-packages/pymongo/pool.py to add an extra sleep (see the comments). Remove pool.pyc and pool.pyo (if they exist)
3. Restart pulp
4. In one window, run:

enqueue(){ celery --app=pulp.server.async.app call --exchange=C.dq --routing-key=reserved_resource_worker-2@pulp2.dev pulp.server.async.tasks._release_resource '--args=["test"]'; }; while true; do for i in $(seq 1 5); do for j in $(seq 1 20); do enqueue & done; sleep 1; done; wait; done

5. In another window, run: journalctl -f | grep 'succeeded in'
6. Wait up to two hours. If the second window shows no more output, the worker has deadlocked.
Actions #23

Updated by pcreech over 6 years ago

  • Status changed from 5 to CLOSED - CURRENTRELEASE
Actions #24

Updated by bmbouter about 6 years ago

  • Sprint set to Sprint 26
Actions #25

Updated by bmbouter about 6 years ago

  • Sprint/Milestone deleted (45)
Actions #27

Updated by bmbouter about 5 years ago

  • Tags Pulp 2 added
