Issue #956

Updated by bmbouter almost 9 years ago

This issue tracks the problem, but the fix is refactor #1084. Anyone who takes this issue should also take #1084, because the two go together. 

We discovered today that Pulp cannot truly use Mongo replica sets with automatic failover. Pulp uses MongoDB as Celery's results backend, and that is the component that fails, with the following traceback: 

 <pre> 
 <code class=python> 
 pulp: celery.worker.strategy:INFO: Received task: pulp.server.async.tasks._reserve_resource[8c66b5b7-3236-4da0-9ab2-00c476e7196f] 
 pulp: celery.worker.job:CRITICAL: Task pulp.server.async.tasks._reserve_resource[8c66b5b7-3236-4da0-9ab2-00c476e7196f] INTERNAL ERROR: AutoReconnect('not master',) 
 pulp: celery.worker.job:CRITICAL: Traceback (most recent call last): 
 pulp: celery.worker.job:CRITICAL:     File "/usr/lib/python2.6/site-packages/celery/app/trace.py", line 283, in trace_task 
 pulp: celery.worker.job:CRITICAL:       uuid, retval, SUCCESS, request=task_request, 
 pulp: celery.worker.job:CRITICAL:     File "/usr/lib/python2.6/site-packages/celery/backends/base.py", line 254, in store_result 
 pulp: celery.worker.job:CRITICAL:       request=request, **kwargs) 
 pulp: celery.worker.job:CRITICAL:     File "/usr/lib/python2.6/site-packages/celery/backends/mongodb.py", line 145, in _store_result 
 pulp: celery.worker.job:CRITICAL:       self.collection.save(meta) 
 pulp: celery.worker.job:CRITICAL:     File "/usr/lib/python2.6/site-packages/kombu/utils/__init__.py", line 322, in __get__ 
 pulp: celery.worker.job:CRITICAL:       value = obj.__dict__[self.__name__] = self.__get(obj) 
 pulp: celery.worker.job:CRITICAL:     File "/usr/lib/python2.6/site-packages/celery/backends/mongodb.py", line 240, in collection 
 pulp: celery.worker.job:CRITICAL:       collection.ensure_index('date_done', background='true') 
 pulp: celery.worker.job:CRITICAL:     File "/usr/lib64/python2.6/site-packages/pymongo/collection.py", line 916, in ensure_index 
 pulp: celery.worker.job:CRITICAL:       return self.create_index(key_or_list, cache_for, **kwargs) 
 pulp: celery.worker.job:CRITICAL:     File "/usr/lib64/python2.6/site-packages/pymongo/collection.py", line 823, in create_index 
 pulp: celery.worker.job:CRITICAL:       **self._get_wc_override()) 
 pulp: celery.worker.job:CRITICAL:     File "/usr/lib64/python2.6/site-packages/pymongo/collection.py", line 357, in insert 
 pulp: celery.worker.job:CRITICAL:       continue_on_error, self.__uuid_subtype), safe) 
 pulp: celery.worker.job:CRITICAL:     File "/usr/lib64/python2.6/site-packages/pymongo/mongo_client.py", line 929, in _send_message 
 pulp: celery.worker.job:CRITICAL:       raise AutoReconnect(str(e)) 
 pulp: celery.worker.job:CRITICAL: AutoReconnect: not master 
 </code> 
 </pre> 
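For context, Pulp points Celery at MongoDB through Celery's standard result backend settings; the actual configuration is the code linked at [0]. The snippet below is only a rough sketch of how a Celery 3.1 application is typically configured for a MongoDB results backend, with illustrative hostnames and database name:

<pre>
<code class=python>
# Rough sketch of configuring a MongoDB results backend in Celery 3.1;
# this is not Pulp's code (see [0] for that). The hostname and database
# name are illustrative.
from celery import Celery

app = Celery('example')
app.conf.CELERY_RESULT_BACKEND = 'mongodb'
app.conf.CELERY_MONGODB_BACKEND_SETTINGS = {
    'host': 'mongo1.example.com:27017',
    'database': 'pulp_database',
    'taskmeta_collection': 'celery_taskmeta',
}
</code>
</pre>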

There is a comment in the Pulp source, just above the results backend configuration[0], claiming that Celery 3.1 does not support replica sets. I have not independently verified this claim, but we'll need to either fix Celery so that it does support this, or find some other way around the problem so that replica sets are fully supported by Pulp, including automatic failover. 
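For comparison, a replica-set-aware connection with the pymongo 2.x client seen in the traceback looks roughly like the sketch below. The hostnames and replica set name are made up, and this is not how the Celery mongodb backend is written today; it only illustrates what the backend would need to do for automatic failover:

<pre>
<code class=python>
# Sketch of a replica-set-aware pymongo 2.x connection (not current
# Celery/Pulp behavior). MongoReplicaSetClient monitors the set and
# routes writes to whichever member is primary after an election.
# Hostnames and the replica set name are illustrative.
from pymongo import MongoReplicaSetClient

client = MongoReplicaSetClient(
    'mongo1.example.com:27017,mongo2.example.com:27017,mongo3.example.com:27017',
    replicaSet='rs0',
)
db = client['pulp_database']
db.celery_taskmeta.save({'_id': 'example-task-id', 'status': 'SUCCESS'})
</code>
</pre>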

 Steps to reproduce: 

1) Deploy a pool of three mongods, configured as a replica set. 
 2) Deploy Pulp, and configure its database connection with the three mongo replicas. Put the current primary as the first seed in the list. 
 3) Perform a few actions to ensure everything is working correctly. 
 4) Now reconfigure Pulp's seed list so that one of the secondaries is the first in the list. 
5) Perform an action that uses the results backend, such as a repository sync. This will fail with a traceback similar to the one above; the sketch after this list shows the failing write in isolation. 
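The failing write can be reproduced in isolation by writing through a connection that points at a member which is currently a secondary, which is roughly the situation the seed ordering puts Pulp in. A minimal sketch, assuming pymongo 2.x and made-up hostnames; this is not Pulp or Celery code:

<pre>
<code class=python>
# Illustrative only: writing task state through a connection that points at
# a secondary fails with "not master", as in the traceback above.
# The hostname, database, and collection names are assumptions.
from pymongo import MongoClient
from pymongo.errors import AutoReconnect

secondary = MongoClient('mongo2.example.com:27017')  # currently a secondary

try:
    secondary['pulp_database']['celery_taskmeta'].insert({'status': 'SUCCESS'})
except AutoReconnect as exc:
    print('write failed: %s' % exc)  # "not master"
</code>
</pre>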

 Alternatively: 

1) Deploy a pool of three mongods, configured as a replica set. 
 2) Deploy Pulp, and configure its database connection with the three mongo replicas. Put the current primary as the first seed in the list. 
 3) Perform a few actions to ensure everything is working correctly. 
4) Kill the current Mongo primary (see the sketch after this list for one way to simulate this without killing the process). 
 5) Perform an action that uses the results backend, such as a repository sync. This will fail with a traceback similar to the above. 
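For step 4, a convenient way to trigger the same failover during testing, without actually killing the mongod process, is to ask the current primary to step down. This is only a testing suggestion; the hostname below is illustrative:

<pre>
<code class=python>
# Testing aid: force a primary election instead of killing mongod.
# The hostname is illustrative.
from pymongo import MongoClient
from pymongo.errors import AutoReconnect

primary = MongoClient('mongo1.example.com:27017')
try:
    # Ask the current primary to step down for 60 seconds.
    primary.admin.command('replSetStepDown', 60)
except AutoReconnect:
    # The primary closes client connections when it steps down, so this
    # exception is expected here.
    pass
</code>
</pre>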


 Expected behavior: 

The order of the seeds in server.conf should not matter for Pulp to operate correctly. It should also be possible to kill the current Mongo primary and have Pulp continue operating smoothly. 
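Even a replica-set-aware client will see a brief window during an election when there is no writable primary, so the results backend will probably also need some retry behavior around AutoReconnect. The helper below is only a sketch for discussion, not existing Pulp or Celery code:

<pre>
<code class=python>
# Proposal sketch only (not existing Pulp/Celery code): retry a results
# backend write for a bounded time so a primary election does not fail
# the task outright.
import time

from pymongo.errors import AutoReconnect


def retry_on_autoreconnect(operation, attempts=5, delay=1.0):
    """Run operation(), retrying briefly while no primary is available."""
    for attempt in range(attempts):
        try:
            return operation()
        except AutoReconnect:
            if attempt == attempts - 1:
                raise
            time.sleep(delay)

# Hypothetical usage, where `collection` is the taskmeta collection:
# retry_on_autoreconnect(lambda: collection.save(meta))
</code>
</pre>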

 I've filed this against 2.4.0, as it affects every version of Pulp that has used Celery. 

 h2. QE instructions 

 You're actually verifying things that were done in #1080, but we're doing the verification on this issue. 

* Verify that the migration removes the celery_taskmeta collection (see the sketch after this list) 
 * Verify the release notes 
 * Verify that the fix which includes refactor #1080 passes a full regression test 
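For the first check, one quick way to confirm the collection is gone after the migration has run is to ask Mongo directly (checking from the mongo shell works just as well). The host and database name below are illustrative:

<pre>
<code class=python>
# QE convenience sketch: confirm the migration dropped celery_taskmeta.
# Host and database name are illustrative.
from pymongo import MongoClient

db = MongoClient('mongo1.example.com:27017')['pulp_database']
assert 'celery_taskmeta' not in db.collection_names(), \
    'celery_taskmeta should have been dropped by the migration'
</code>
</pre>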

 [0] https://github.com/pulp/pulp/blob/01fcf261c38f9b4b057839980f892f85a8697a27/server/pulp/server/async/celery_instance.py#L48-L53
