Issue #1801

closed

Pulp celery_beat and resource_manager are running, but logs say they are not running

Added by bmbouter almost 8 years ago. Updated almost 5 years ago.

Status: CLOSED - CURRENTRELEASE
Priority: High
Assignee:
Category: -
Sprint/Milestone: -
Start date:
Due date:
Estimated time:
Severity: 3. High
Version: 2.8.0
Platform Release: 2.8.3
OS:
Triaged: Yes
Groomed: No
Sprint Candidate: No
Tags: Pulp 2
Sprint: Sprint 1
Quarter:

Description

After an unknown amount of time, the Pulp infrastructure processes appear to die, and we receive these messages in the journal / logs:

pulp.server.async.scheduler:ERROR: There are 0 pulp_resource_manager processes running. Pulp will not operate correctly without at least one pulp_resource_mananger process running.
pulp.server.async.scheduler:ERROR: There are 0 pulp_celerybeat processes running. Pulp will not operate correctly without at least one pulp_celerybeat process running.

A restart resolves the issue, but restarting shouldn't be required for normal operation.

Actions #1

Updated by mhrivnak almost 8 years ago

  • Sprint/Milestone set to 19
Actions #2

Updated by bmbouter almost 8 years ago

I reproduced this in my environment, and pulp_celerybeat appears to be deadlocking in the kombu transport. A gdb trace of a deadlocked pulp_celerybeat process shows that the thread which processes event callbacks for incoming heartbeat messages is halted at the line marked with ">" in the GDB py-list output below:

Thread 5 (Thread 0x7f737da33700 (LWP 6551)):
1433                    'The Python package "qpid.messaging" is missing. Install it '
1434                    'with your package manager. You can also try `pip install '
1435                    'qpid-python`.')
1436    
1437        def _qpid_message_ready_handler(self, session):
>1438            os.write(self._w, '0')
1439    
1440        def _qpid_async_exception_notify_handler(self, obj_with_exception, exc):
1441            os.write(self._w, 'e')
1442    
1443        def on_readable(self, connection, loop):

That line corresponds with this line in the kombu code: https://github.com/celery/kombu/blob/93f6606e0a758c9cffb9b3c2ef6a239ed7027309/kombu/transport/qpid.py#L1474

That os.write call is the point of deadlock. I don't yet understand why it is deadlocking, but it is likely a thread safety issue around that pipe. The investigation continues.
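
As general background on how an os.write to a pipe can block: a pipe has a finite kernel buffer, and a blocking write stalls once that buffer is full and nothing is reading from the other end. The sketch below illustrates only that generic behavior. It is not the kombu code, and whether this is the mechanism behind the deadlock seen here is an assumption. The helper name bytes_until_pipe_full is hypothetical, and the write end is switched to non-blocking mode so the demonstration raises an error instead of hanging.

```python
import os


def bytes_until_pipe_full():
    """Write single bytes into a pipe that is never read and count how many fit.

    Hypothetical helper for illustration only; it is not part of kombu or Pulp.
    """
    read_fd, write_fd = os.pipe()
    # Non-blocking mode makes a full pipe raise BlockingIOError.
    # In the default blocking mode, the same write would simply never return.
    os.set_blocking(write_fd, False)
    written = 0
    try:
        while True:
            # One byte per notification, mirroring the os.write(self._w, '0') pattern.
            written += os.write(write_fd, b"0")
    except BlockingIOError:
        # The kernel buffer is full and nothing has drained the read end.
        pass
    finally:
        os.close(read_fd)
        os.close(write_fd)
    return written


if __name__ == "__main__":
    print("bytes written before the pipe filled:", bytes_until_pipe_full())
```

On Linux the default pipe capacity is typically 65536 bytes, so a writer that sends one byte per message would stall after roughly that many messages unless something reads from the other end.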

Actions #3

Updated by bmbouter almost 8 years ago

The root cause is identified, and I filed it in the Kombu upstream issue tracker. https://github.com/celery/kombu/issues/577

I'll be fixing it upstream, and then we'll cherry-pick that commit as a patch both to the version of python-kombu that Pulp carries and to the version in Rawhide.

Actions #4

Updated by rbarlow almost 8 years ago

On Thursday, March 31, 2016 9:17:01 PM EDT you wrote:

> I'll be fixing it upstream and then we'll cherry pick that commit as a patch to the version of python-kombu that Pulp carries along with the version in Rawhide.

Consider trying to get the patch into Fedora 24 as well so we don't have
this problem there. Thanks!

Actions #5

Updated by bmbouter almost 8 years ago

rbarlow wrote:

On Thursday, March 31, 2016 9:17:01 PM EDT you wrote:

> I'll be fixing it upstream and then we'll cherry pick that commit as a patch to the version of python-kombu that Pulp carries along with the version in Rawhide.

Consider trying to get the patch into Fedora 24 as well so we don't have
this problem there. Thanks!

Oh yes I will do this. I forgot Fedora 24 had branched. I'll submit the update to both Rawhide and F24.

Actions #6

Updated by mhrivnak almost 8 years ago

  • Triaged changed from No to Yes
Actions #7

Updated by bmbouter almost 8 years ago

This commit needs to be cherry-picked into the version we carry: https://github.com/celery/kombu/commit/277309f47a713a31885248b78df45e41d8d5e490

This regression was introduced with kombu 3.0.33. This fix needs to be on pulp-dev and newer branches. No existing 2.7 users use 3.0.33, so we can fix it in 2.7-dev without making a new 2.7 release to get the fix to existing users. The fix will be included with 2.8.2 from the merge forward to master.

Actions #8

Updated by bmbouter almost 8 years ago

  • Status changed from ASSIGNED to POST

Added by bmbouter almost 8 years ago

Revision c54adba5 | View on GitHub

Adds patch to python-kombu to fix pulp_celerybeat deadlock

closes #1801 https://pulp.plan.io/issues/1801

Actions #9

Updated by dgregor@redhat.com almost 8 years ago

  • Version set to 2.8.0
Actions #11

Updated by bmbouter almost 8 years ago

  • Private changed from No to Yes
Actions #12

Updated by bmbouter almost 8 years ago

  • Private changed from Yes to No
Actions #14

Updated by pthomas@redhat.com almost 8 years ago

Before updating kombu


[root@ibm-x3550m3-12 ~]# rpm -qa |grep kombu
python-kombu-3.0.33-4.pulp.el7.noarch
[root@ibm-x3550m3-12 ~]# 

[root@ibm-x3550m3-12 ~]# sudo qpid-stat  -q |grep  celeryev
  celeryev.223a4cfb-e1bd-4f6e-b146-0198d295e33a                                         Y              20.4k  86.0k  65.5k   18.0m  75.5m    57.6m        1     2
[root@ibm-x3550m3-12 ~]# journalctl -f -l
-- Logs begin at Mon 2016-04-04 21:51:34 CEST. --
Apr 05 13:55:02 ibm-x3550m3-12.lab.eng.brq.redhat.com pulp[32000]: pulp.server.async.scheduler:ERROR: There are 0 pulp_resource_manager processes running. Pulp will not operate correctly without at least one pulp_resource_mananger process running.
Apr 05 13:55:02 ibm-x3550m3-12.lab.eng.brq.redhat.com pulp[32000]: pulp.server.async.scheduler:ERROR: There are 0 pulp_celerybeat processes running. Pulp will not operate correctly without at least one pulp_celerybeat process running.
Actions #15

Updated by bmbouter almost 8 years ago

  • Status changed from POST to MODIFIED
  • % Done changed from 0 to 100
Actions #16

Updated by pthomas@redhat.com almost 8 years ago

Verified that msgIn & msgOut are the same and msgOut doesn't stop after 65k

[root@pulp-el7 ~]# rpm -qa |grep kombu
python-kombu-3.0.33-5.pulp.el7.noarch
[root@pulp-el7 ~]# sudo qpid-stat  -q |grep  celeryev
Queues
  queue                                                                            dur  autoDel  excl  msg   msgIn  msgOut  bytes  bytesIn  bytesOut  cons  bind
  =========================================================================================================================================================
 celeryev.9631492f-a29e-4bdc-b843-23911d505f2d                                         Y                 0   145k   145k      0    128m     128m        1     2


[root@pulp-el6 ~]# rpm -qa |grep kombu
python-kombu-3.0.33-5.pulp.el6.noarch
[root@pulp-el6 ~]# 

Queues
  queue                                                                            dur  autoDel  excl  msg   msgIn  msgOut  bytes  bytesIn  bytesOut  cons  bind
  =========================================================================================================================================================
 celeryev.0caa15b8-8829-441f-8ed2-231cd34a94dd                                                   Y                 0   156k   156k      0    142m     142m        1     2
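
The msgIn / msgOut comparison above can also be done programmatically. The following is a minimal sketch, assuming the qpid-stat -q row layout shown in this issue (the last eight columns are msg, msgIn, msgOut, bytes, bytesIn, bytesOut, cons, bind) and assuming the k/m/g suffixes are decimal multipliers. The function names parse_count and queue_stats are hypothetical and not part of any Qpid or Pulp tooling.

```python
# Column names for the trailing numeric fields of a `qpid-stat -q` row,
# as they appear in the header lines pasted in this issue.
COLUMNS = ["msg", "msgIn", "msgOut", "bytes", "bytesIn", "bytesOut", "cons", "bind"]

# Assumption: abbreviated counts use decimal multipliers.
MULTIPLIERS = {"k": 1e3, "m": 1e6, "g": 1e9}


def parse_count(token):
    """Convert an abbreviated count such as '145k' or '128m' to a float."""
    suffix = token[-1].lower()
    if suffix in MULTIPLIERS:
        return float(token[:-1]) * MULTIPLIERS[suffix]
    return float(token)


def queue_stats(row):
    """Parse one queue row, reading the numeric columns from the right.

    Counting from the right avoids depending on how many of the
    dur/autoDel/excl flag columns are filled in.
    """
    fields = row.split()
    values = fields[-len(COLUMNS):]
    stats = {name: parse_count(value) for name, value in zip(COLUMNS, values)}
    stats["queue"] = fields[0]
    return stats


if __name__ == "__main__":
    # Rows copied from the output pasted in this issue.
    before = queue_stats(
        "celeryev.223a4cfb-e1bd-4f6e-b146-0198d295e33a Y 20.4k 86.0k 65.5k 18.0m 75.5m 57.6m 1 2"
    )
    after = queue_stats(
        "celeryev.9631492f-a29e-4bdc-b843-23911d505f2d Y 0 145k 145k 0 128m 128m 1 2"
    )
    for label, stats in (("before", before), ("after", after)):
        drained = stats["msgIn"] == stats["msgOut"]
        print(label, stats["queue"], "msgIn == msgOut:", drained)
```

Because the displayed counts are rounded, equality of msgIn and msgOut here only means the two values match at the precision qpid-stat prints.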
Actions #17

Updated by bmbouter almost 8 years ago

The patch has been applied in Rawhide and is currently available.
I've also submitted an update to F24 here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-ec038bbf19

Actions #18

Updated by semyers almost 8 years ago

  • Platform Release changed from 2.8.2 to 2.8.3
Actions #26

Updated by semyers almost 8 years ago

  • Status changed from MODIFIED to 5
Actions #28

Updated by pthomas@redhat.com almost 8 years ago

  • Status changed from 5 to 6
Actions #29

Updated by semyers almost 8 years ago

  • Status changed from 6 to CLOSED - CURRENTRELEASE
Actions #31

Updated by bmbouter about 6 years ago

  • Sprint set to Sprint 1
Actions #32

Updated by bmbouter about 6 years ago

  • Sprint/Milestone deleted (19)
Actions #33

Updated by bmbouter almost 5 years ago

  • Tags Pulp 2 added
