Project

Profile

Help

Issue #2331

closed

The "unsafe_autoretry" feature does not retry with mongoengine.QuerySet codepaths

Added by mhrivnak over 7 years ago. Updated almost 5 years ago.

Status:
CLOSED - WONTFIX
Priority:
High
Assignee:
Category:
-
Sprint/Milestone:
-
Start date:
Due date:
Estimated time:
Severity:
2. Medium
Version:
2.8.7
Platform Release:
OS:
CentOS 7
Triaged:
Yes
Groomed:
No
Sprint Candidate:
No
Tags:
Pulp 2
Sprint:
Sprint 11
Quarter:

Description

If on a running pulp deployment you stop mongod, pulp_celerybeat will eventually die and produce the following log output, including a traceback.

Instead, it should handle the connection failure gracefully and recover when mongod becomes available again.

This behavior has been seen "in the wild" when a full disk caused mongod to die.

Oct 11 14:04:09 localhost.localdomain pulp[12792]: pulp.server.async.worker_watcher:DEBUG: 'worker-heartbeat' sent at time 2016-10-11 19:04:09.617800 from resource_manager@localhost.localdomain, received at time: 2016-10-11 18:04:09.635230
Oct 11 14:04:39 localhost.localdomain pulp[12792]: pulp.server.async.scheduler:ERROR: localhost:27017: [Errno 111] Connection refused
Oct 11 14:04:47 localhost.localdomain pulp[12792]: pulp.server.async.scheduler:DEBUG: Checking if pulp_workers, pulp_celerybeat, or pulp_resource_manager processes are missing for more than 300 seconds
Oct 11 14:04:49 localhost.localdomain pulp[12792]: kombu.transport.qpid:DEBUG: Attempting to connect to qpid with SASL mechanism ANONYMOUS
Oct 11 14:04:49 localhost.localdomain pulp[12792]: kombu.transport.qpid:INFO: Connected to qpid with SASL mechanism ANONYMOUS
Oct 11 14:04:49 localhost.localdomain pulp[12792]: pulp.server.async.scheduler:INFO: Event Monitor Starting
Oct 11 14:04:49 localhost.localdomain pulp[12792]: kombu.transport.qpid:DEBUG: Attempting to connect to qpid with SASL mechanism ANONYMOUS
Oct 11 14:04:49 localhost.localdomain pulp[12792]: kombu.transport.qpid:INFO: Connected to qpid with SASL mechanism ANONYMOUS
Oct 11 14:04:49 localhost.localdomain pulp[12792]: kombu.mixins:INFO: Connected to qpid://localhost:5672//
Oct 11 14:04:49 localhost.localdomain pulp[12792]: pulp.server.async.worker_watcher:DEBUG: 'worker-heartbeat' sent at time 2016-10-11 19:04:49.940497 from resource_manager@localhost.localdomain, received at time: 2016-10-11 18:04:49.961244
Oct 11 14:05:17 localhost.localdomain pulp[12792]: pulp.server.async.scheduler:ERROR: localhost:27017: [Errno 111] Connection refused
Oct 11 14:05:18 localhost.localdomain pulp[12792]: pulp.server.async.worker_watcher:DEBUG: 'scheduler-event' sent at time 2016-10-11 18:05:18.236002 from scheduler@localhost.localdomain, received at time: 2016-10-11 18:05:18.236004
Oct 11 14:05:20 localhost.localdomain pulp[12792]: pulp.server.async.scheduler:ERROR: localhost:27017: [Errno 111] Connection refused
Oct 11 14:05:30 localhost.localdomain pulp[12792]: kombu.transport.qpid:DEBUG: Attempting to connect to qpid with SASL mechanism ANONYMOUS
Oct 11 14:05:30 localhost.localdomain pulp[12792]: kombu.transport.qpid:INFO: Connected to qpid with SASL mechanism ANONYMOUS
Oct 11 14:05:30 localhost.localdomain pulp[12792]: pulp.server.async.scheduler:INFO: Event Monitor Starting
Oct 11 14:05:30 localhost.localdomain pulp[12792]: kombu.transport.qpid:DEBUG: Attempting to connect to qpid with SASL mechanism ANONYMOUS
Oct 11 14:05:30 localhost.localdomain pulp[12792]: kombu.transport.qpid:INFO: Connected to qpid with SASL mechanism ANONYMOUS
Oct 11 14:05:30 localhost.localdomain pulp[12792]: kombu.mixins:INFO: Connected to qpid://localhost:5672//
Oct 11 14:05:30 localhost.localdomain pulp[12792]: pulp.server.async.worker_watcher:DEBUG: 'worker-heartbeat' sent at time 2016-10-11 19:05:30.561665 from reserved_resource_worker-0@localhost.localdomain, received at time: 2016-10-11 18:05:30.579888
Oct 11 14:05:48 localhost.localdomain pulp[12792]: celery.beat:CRITICAL: (12792-69504) beat raised exception <class 'pymongo.errors.ServerSelectionTimeoutError'>: ServerSelectionTimeoutError('localhost:27017: [Errno 111] Connection refused',)
Oct 11 14:05:48 localhost.localdomain pulp[12792]: celery.beat:CRITICAL: (12792-69504) Traceback (most recent call last):
Oct 11 14:05:48 localhost.localdomain pulp[12792]: celery.beat:CRITICAL: (12792-69504)   File "/usr/lib/python2.7/site-packages/celery/apps/beat.py", line 112, in start_scheduler
Oct 11 14:05:48 localhost.localdomain pulp[12792]: celery.beat:CRITICAL: (12792-69504)     beat.start()
Oct 11 14:05:48 localhost.localdomain pulp[12792]: celery.beat:CRITICAL: (12792-69504)   File "/usr/lib/python2.7/site-packages/celery/beat.py", line 462, in start
Oct 11 14:05:48 localhost.localdomain pulp[12792]: celery.beat:CRITICAL: (12792-69504)     interval = self.scheduler.tick()
Oct 11 14:05:48 localhost.localdomain pulp[12792]: celery.beat:CRITICAL: (12792-69504)   File "/usr/lib/python2.7/site-packages/pulp/server/async/scheduler.py", line 252, in tick
Oct 11 14:05:48 localhost.localdomain pulp[12792]: celery.beat:CRITICAL: (12792-69504)     worker_watcher.handle_worker_heartbeat(scheduler_event)
Oct 11 14:05:48 localhost.localdomain pulp[12792]: celery.beat:CRITICAL: (12792-69504)   File "/usr/lib/python2.7/site-packages/pulp/server/async/worker_watcher.py", line 75, in handle_worker_heartbeat
Oct 11 14:05:48 localhost.localdomain pulp[12792]: celery.beat:CRITICAL: (12792-69504)     worker = Worker.objects(name=event_info['worker_name']).first()
Oct 11 14:05:48 localhost.localdomain pulp[12792]: celery.beat:CRITICAL: (12792-69504)   File "/usr/lib/python2.7/site-packages/mongoengine/queryset/base.py", line 264, in first
Oct 11 14:05:48 localhost.localdomain pulp[12792]: celery.beat:CRITICAL: (12792-69504)     result = queryset[0]
Oct 11 14:05:48 localhost.localdomain pulp[12792]: celery.beat:CRITICAL: (12792-69504)   File "/usr/lib/python2.7/site-packages/mongoengine/queryset/base.py", line 161, in __getitem__
Oct 11 14:05:48 localhost.localdomain pulp[12792]: celery.beat:CRITICAL: (12792-69504)     return queryset._document._from_son(queryset._cursor[key],
Oct 11 14:05:48 localhost.localdomain pulp[12792]: celery.beat:CRITICAL: (12792-69504)   File "/usr/lib64/python2.7/site-packages/pymongo/cursor.py", line 565, in __getitem__
Oct 11 14:05:48 localhost.localdomain pulp[12792]: celery.beat:CRITICAL: (12792-69504)     for doc in clone:
Oct 11 14:05:48 localhost.localdomain pulp[12792]: celery.beat:CRITICAL: (12792-69504)   File "/usr/lib64/python2.7/site-packages/pymongo/cursor.py", line 1097, in next
Oct 11 14:05:48 localhost.localdomain pulp[12792]: celery.beat:CRITICAL: (12792-69504)     if len(self.__data) or self._refresh():
Oct 11 14:05:48 localhost.localdomain pulp[12792]: celery.beat:CRITICAL: (12792-69504)   File "/usr/lib64/python2.7/site-packages/pymongo/cursor.py", line 1019, in _refresh
Oct 11 14:05:48 localhost.localdomain pulp[12792]: celery.beat:CRITICAL: (12792-69504)     self.__read_concern))
Oct 11 14:05:48 localhost.localdomain pulp[12792]: celery.beat:CRITICAL: (12792-69504)   File "/usr/lib64/python2.7/site-packages/pymongo/cursor.py", line 850, in __send_message
Oct 11 14:05:48 localhost.localdomain pulp[12792]: celery.beat:CRITICAL: (12792-69504)     **kwargs)
Oct 11 14:05:48 localhost.localdomain pulp[12792]: celery.beat:CRITICAL: (12792-69504)   File "/usr/lib64/python2.7/site-packages/pymongo/mongo_client.py", line 777, in _send_message_with_response
Oct 11 14:05:48 localhost.localdomain pulp[12792]: celery.beat:CRITICAL: (12792-69504)     server = topology.select_server(selector)
Oct 11 14:05:48 localhost.localdomain pulp[12792]: celery.beat:CRITICAL: (12792-69504)   File "/usr/lib64/python2.7/site-packages/pymongo/topology.py", line 141, in select_server
Oct 11 14:05:48 localhost.localdomain pulp[12792]: celery.beat:CRITICAL: (12792-69504)     address))
Oct 11 14:05:48 localhost.localdomain pulp[12792]: celery.beat:CRITICAL: (12792-69504)   File "/usr/lib64/python2.7/site-packages/pymongo/topology.py", line 117, in select_servers
Oct 11 14:05:48 localhost.localdomain pulp[12792]: celery.beat:CRITICAL: (12792-69504)     self._error_message(selector))
Oct 11 14:05:48 localhost.localdomain pulp[12792]: celery.beat:CRITICAL: (12792-69504) ServerSelectionTimeoutError: localhost:27017: [Errno 111] Connection refused

Related issues

Related to Pulp - Issue #2423: As a user, I want pulp_celerybeat to survive a brief database outage without having to use the unsafe_autoretry feature.CLOSED - WORKSFORMEdaviddavisActions
Related to Pulp - Task #2417: Ensure all processes have initial and reconnect support for PostgreSQLCLOSED - CURRENTRELEASEbmbouter

Actions

Also available in: Atom PDF