Actions
Issue #2331
closedThe "unsafe_autoretry" feature does not retry with mongoengine.QuerySet codepaths
Start date:
Due date:
Estimated time:
Severity:
2. Medium
Version:
2.8.7
Platform Release:
OS:
CentOS 7
Triaged:
Yes
Groomed:
No
Sprint Candidate:
No
Tags:
Pulp 2
Sprint:
Sprint 11
Quarter:
Description
If on a running pulp deployment you stop mongod, pulp_celerybeat will eventually die and produce the following log output, including a traceback.
Instead, it should handle the connection failure gracefully and recover when mongod becomes available again.
This behavior has been seen "in the wild" when a full disk caused mongod to die.
Oct 11 14:04:09 localhost.localdomain pulp[12792]: pulp.server.async.worker_watcher:DEBUG: 'worker-heartbeat' sent at time 2016-10-11 19:04:09.617800 from resource_manager@localhost.localdomain, received at time: 2016-10-11 18:04:09.635230
Oct 11 14:04:39 localhost.localdomain pulp[12792]: pulp.server.async.scheduler:ERROR: localhost:27017: [Errno 111] Connection refused
Oct 11 14:04:47 localhost.localdomain pulp[12792]: pulp.server.async.scheduler:DEBUG: Checking if pulp_workers, pulp_celerybeat, or pulp_resource_manager processes are missing for more than 300 seconds
Oct 11 14:04:49 localhost.localdomain pulp[12792]: kombu.transport.qpid:DEBUG: Attempting to connect to qpid with SASL mechanism ANONYMOUS
Oct 11 14:04:49 localhost.localdomain pulp[12792]: kombu.transport.qpid:INFO: Connected to qpid with SASL mechanism ANONYMOUS
Oct 11 14:04:49 localhost.localdomain pulp[12792]: pulp.server.async.scheduler:INFO: Event Monitor Starting
Oct 11 14:04:49 localhost.localdomain pulp[12792]: kombu.transport.qpid:DEBUG: Attempting to connect to qpid with SASL mechanism ANONYMOUS
Oct 11 14:04:49 localhost.localdomain pulp[12792]: kombu.transport.qpid:INFO: Connected to qpid with SASL mechanism ANONYMOUS
Oct 11 14:04:49 localhost.localdomain pulp[12792]: kombu.mixins:INFO: Connected to qpid://localhost:5672//
Oct 11 14:04:49 localhost.localdomain pulp[12792]: pulp.server.async.worker_watcher:DEBUG: 'worker-heartbeat' sent at time 2016-10-11 19:04:49.940497 from resource_manager@localhost.localdomain, received at time: 2016-10-11 18:04:49.961244
Oct 11 14:05:17 localhost.localdomain pulp[12792]: pulp.server.async.scheduler:ERROR: localhost:27017: [Errno 111] Connection refused
Oct 11 14:05:18 localhost.localdomain pulp[12792]: pulp.server.async.worker_watcher:DEBUG: 'scheduler-event' sent at time 2016-10-11 18:05:18.236002 from scheduler@localhost.localdomain, received at time: 2016-10-11 18:05:18.236004
Oct 11 14:05:20 localhost.localdomain pulp[12792]: pulp.server.async.scheduler:ERROR: localhost:27017: [Errno 111] Connection refused
Oct 11 14:05:30 localhost.localdomain pulp[12792]: kombu.transport.qpid:DEBUG: Attempting to connect to qpid with SASL mechanism ANONYMOUS
Oct 11 14:05:30 localhost.localdomain pulp[12792]: kombu.transport.qpid:INFO: Connected to qpid with SASL mechanism ANONYMOUS
Oct 11 14:05:30 localhost.localdomain pulp[12792]: pulp.server.async.scheduler:INFO: Event Monitor Starting
Oct 11 14:05:30 localhost.localdomain pulp[12792]: kombu.transport.qpid:DEBUG: Attempting to connect to qpid with SASL mechanism ANONYMOUS
Oct 11 14:05:30 localhost.localdomain pulp[12792]: kombu.transport.qpid:INFO: Connected to qpid with SASL mechanism ANONYMOUS
Oct 11 14:05:30 localhost.localdomain pulp[12792]: kombu.mixins:INFO: Connected to qpid://localhost:5672//
Oct 11 14:05:30 localhost.localdomain pulp[12792]: pulp.server.async.worker_watcher:DEBUG: 'worker-heartbeat' sent at time 2016-10-11 19:05:30.561665 from reserved_resource_worker-0@localhost.localdomain, received at time: 2016-10-11 18:05:30.579888
Oct 11 14:05:48 localhost.localdomain pulp[12792]: celery.beat:CRITICAL: (12792-69504) beat raised exception <class 'pymongo.errors.ServerSelectionTimeoutError'>: ServerSelectionTimeoutError('localhost:27017: [Errno 111] Connection refused',)
Oct 11 14:05:48 localhost.localdomain pulp[12792]: celery.beat:CRITICAL: (12792-69504) Traceback (most recent call last):
Oct 11 14:05:48 localhost.localdomain pulp[12792]: celery.beat:CRITICAL: (12792-69504) File "/usr/lib/python2.7/site-packages/celery/apps/beat.py", line 112, in start_scheduler
Oct 11 14:05:48 localhost.localdomain pulp[12792]: celery.beat:CRITICAL: (12792-69504) beat.start()
Oct 11 14:05:48 localhost.localdomain pulp[12792]: celery.beat:CRITICAL: (12792-69504) File "/usr/lib/python2.7/site-packages/celery/beat.py", line 462, in start
Oct 11 14:05:48 localhost.localdomain pulp[12792]: celery.beat:CRITICAL: (12792-69504) interval = self.scheduler.tick()
Oct 11 14:05:48 localhost.localdomain pulp[12792]: celery.beat:CRITICAL: (12792-69504) File "/usr/lib/python2.7/site-packages/pulp/server/async/scheduler.py", line 252, in tick
Oct 11 14:05:48 localhost.localdomain pulp[12792]: celery.beat:CRITICAL: (12792-69504) worker_watcher.handle_worker_heartbeat(scheduler_event)
Oct 11 14:05:48 localhost.localdomain pulp[12792]: celery.beat:CRITICAL: (12792-69504) File "/usr/lib/python2.7/site-packages/pulp/server/async/worker_watcher.py", line 75, in handle_worker_heartbeat
Oct 11 14:05:48 localhost.localdomain pulp[12792]: celery.beat:CRITICAL: (12792-69504) worker = Worker.objects(name=event_info['worker_name']).first()
Oct 11 14:05:48 localhost.localdomain pulp[12792]: celery.beat:CRITICAL: (12792-69504) File "/usr/lib/python2.7/site-packages/mongoengine/queryset/base.py", line 264, in first
Oct 11 14:05:48 localhost.localdomain pulp[12792]: celery.beat:CRITICAL: (12792-69504) result = queryset[0]
Oct 11 14:05:48 localhost.localdomain pulp[12792]: celery.beat:CRITICAL: (12792-69504) File "/usr/lib/python2.7/site-packages/mongoengine/queryset/base.py", line 161, in __getitem__
Oct 11 14:05:48 localhost.localdomain pulp[12792]: celery.beat:CRITICAL: (12792-69504) return queryset._document._from_son(queryset._cursor[key],
Oct 11 14:05:48 localhost.localdomain pulp[12792]: celery.beat:CRITICAL: (12792-69504) File "/usr/lib64/python2.7/site-packages/pymongo/cursor.py", line 565, in __getitem__
Oct 11 14:05:48 localhost.localdomain pulp[12792]: celery.beat:CRITICAL: (12792-69504) for doc in clone:
Oct 11 14:05:48 localhost.localdomain pulp[12792]: celery.beat:CRITICAL: (12792-69504) File "/usr/lib64/python2.7/site-packages/pymongo/cursor.py", line 1097, in next
Oct 11 14:05:48 localhost.localdomain pulp[12792]: celery.beat:CRITICAL: (12792-69504) if len(self.__data) or self._refresh():
Oct 11 14:05:48 localhost.localdomain pulp[12792]: celery.beat:CRITICAL: (12792-69504) File "/usr/lib64/python2.7/site-packages/pymongo/cursor.py", line 1019, in _refresh
Oct 11 14:05:48 localhost.localdomain pulp[12792]: celery.beat:CRITICAL: (12792-69504) self.__read_concern))
Oct 11 14:05:48 localhost.localdomain pulp[12792]: celery.beat:CRITICAL: (12792-69504) File "/usr/lib64/python2.7/site-packages/pymongo/cursor.py", line 850, in __send_message
Oct 11 14:05:48 localhost.localdomain pulp[12792]: celery.beat:CRITICAL: (12792-69504) **kwargs)
Oct 11 14:05:48 localhost.localdomain pulp[12792]: celery.beat:CRITICAL: (12792-69504) File "/usr/lib64/python2.7/site-packages/pymongo/mongo_client.py", line 777, in _send_message_with_response
Oct 11 14:05:48 localhost.localdomain pulp[12792]: celery.beat:CRITICAL: (12792-69504) server = topology.select_server(selector)
Oct 11 14:05:48 localhost.localdomain pulp[12792]: celery.beat:CRITICAL: (12792-69504) File "/usr/lib64/python2.7/site-packages/pymongo/topology.py", line 141, in select_server
Oct 11 14:05:48 localhost.localdomain pulp[12792]: celery.beat:CRITICAL: (12792-69504) address))
Oct 11 14:05:48 localhost.localdomain pulp[12792]: celery.beat:CRITICAL: (12792-69504) File "/usr/lib64/python2.7/site-packages/pymongo/topology.py", line 117, in select_servers
Oct 11 14:05:48 localhost.localdomain pulp[12792]: celery.beat:CRITICAL: (12792-69504) self._error_message(selector))
Oct 11 14:05:48 localhost.localdomain pulp[12792]: celery.beat:CRITICAL: (12792-69504) ServerSelectionTimeoutError: localhost:27017: [Errno 111] Connection refused
Related issues
Actions