Issue #956

Task #1014: Short Term Improvements for Pulp's use of MongoDB

Pulp's Celery result backend connection cannot use Mongo replica sets with automatic failover

Added by rbarlow over 9 years ago. Updated over 5 years ago.

Status: CLOSED - CURRENTRELEASE
Priority: High
Category: -
Sprint/Milestone: -
Start date:
Due date:
Estimated time:
Severity: 3. High
Version: 2.4.0
Platform Release: 2.7.0
OS:
Triaged: Yes
Groomed: No
Sprint Candidate: No
Tags: Pulp 2
Sprint:
Quarter:

Description

This issue tracks the problem, but the fix is refactor #1084. Anyone who takes this issue needs to also take #1084, because they go together.

We hit an issue today where we discovered that Pulp cannot truly use Mongo replica sets with automatic failover. Pulp uses MongoDB as Celery's results backend, and that is the component that fails, with the following traceback:

pulp: celery.worker.strategy:INFO: Received task: pulp.server.async.tasks._reserve_resource[8c66b5b7-3236-4da0-9ab2-00c476e7196f]
pulp: celery.worker.job:CRITICAL: Task pulp.server.async.tasks._reserve_resource[8c66b5b7-3236-4da0-9ab2-00c476e7196f] INTERNAL ERROR: AutoReconnect('not master',)
pulp: celery.worker.job:CRITICAL: Traceback (most recent call last):
pulp: celery.worker.job:CRITICAL:   File "/usr/lib/python2.6/site-packages/celery/app/trace.py", line 283, in trace_task
pulp: celery.worker.job:CRITICAL:     uuid, retval, SUCCESS, request=task_request,
pulp: celery.worker.job:CRITICAL:   File "/usr/lib/python2.6/site-packages/celery/backends/base.py", line 254, in store_result
pulp: celery.worker.job:CRITICAL:     request=request, **kwargs)
pulp: celery.worker.job:CRITICAL:   File "/usr/lib/python2.6/site-packages/celery/backends/mongodb.py", line 145, in _store_result
pulp: celery.worker.job:CRITICAL:     self.collection.save(meta)
pulp: celery.worker.job:CRITICAL:   File "/usr/lib/python2.6/site-packages/kombu/utils/__init__.py", line 322, in __get__
pulp: celery.worker.job:CRITICAL:     value = obj.__dict__[self.__name__] = self.__get(obj)
pulp: celery.worker.job:CRITICAL:   File "/usr/lib/python2.6/site-packages/celery/backends/mongodb.py", line 240, in collection
pulp: celery.worker.job:CRITICAL:     collection.ensure_index('date_done', background='true')
pulp: celery.worker.job:CRITICAL:   File "/usr/lib64/python2.6/site-packages/pymongo/collection.py", line 916, in ensure_index
pulp: celery.worker.job:CRITICAL:     return self.create_index(key_or_list, cache_for, **kwargs)
pulp: celery.worker.job:CRITICAL:   File "/usr/lib64/python2.6/site-packages/pymongo/collection.py", line 823, in create_index
pulp: celery.worker.job:CRITICAL:     **self._get_wc_override())
pulp: celery.worker.job:CRITICAL:   File "/usr/lib64/python2.6/site-packages/pymongo/collection.py", line 357, in insert
pulp: celery.worker.job:CRITICAL:     continue_on_error, self.__uuid_subtype), safe)
pulp: celery.worker.job:CRITICAL:   File "/usr/lib64/python2.6/site-packages/pymongo/mongo_client.py", line 929, in _send_message
pulp: celery.worker.job:CRITICAL:     raise AutoReconnect(str(e))
pulp: celery.worker.job:CRITICAL: AutoReconnect: not master

A comment above the code block at [0] claims that Celery 3.1 does not support replica sets. I have not independently verified this claim, but we will need to either fix Celery so that it supports them, or find some other way around this problem so that replica sets are fully supported by Pulp, including automatic failover.
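
Until the backend handles failover, one interim approach is to retry operations that raise AutoReconnect while the replica set elects a new primary. A minimal sketch of such a retry wrapper; the AutoReconnect class here is a stand-in for pymongo.errors.AutoReconnect, and the attempt count and delay are arbitrary:

```python
import time


class AutoReconnect(Exception):
    """Stand-in for pymongo.errors.AutoReconnect."""


def retry_on_autoreconnect(operation, attempts=5, delay=0.1):
    """Call operation(), retrying while the replica set elects a new primary.

    Re-raises AutoReconnect if all attempts are exhausted.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except AutoReconnect:
            if attempt == attempts - 1:
                raise
            time.sleep(delay)
```

This only papers over short elections; it does not make the driver aware of the replica set topology, which is the real fix.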

Steps to reproduce:

1) Deploy a pool of three mongods, configured as a replica set.
2) Deploy Pulp, and configure its database connection with the three Mongo replicas. Put the current primary as the first seed in the list.
3) Perform a few actions to ensure everything is working correctly.
4) Now reconfigure Pulp's seed list so that one of the secondaries is first in the list.
5) Perform an action that uses the results backend, such as a repository sync. This will fail with a traceback similar to the above.
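
For reference, the seed list from step 2 lives in the [database] section of server.conf. A sketch with hypothetical hostnames, where the first seed is the current primary:

```ini
# /etc/pulp/server.conf -- hostnames are hypothetical
[database]
name: pulp_database
seeds: mongo1.example.com:27017,mongo2.example.com:27017,mongo3.example.com:27017
```

Step 4 amounts to moving mongo2 or mongo3 to the front of the seeds value and restarting Pulp's services.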

Alternatively:

1) Deploy a pool of three mongods, configured as a replica set.
2) Deploy Pulp, and configure its database connection with the three Mongo replicas. Put the current primary as the first seed in the list.
3) Perform a few actions to ensure everything is working correctly.
4) Kill the current Mongo primary.
5) Perform an action that uses the results backend, such as a repository sync. This will fail with a traceback similar to the above.

Expected behavior:

The order of the seeds in server.conf should not matter for Pulp to operate correctly. It should also be possible to kill the current Mongo primary and have Pulp continue operating smoothly.
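
Seed-order independence generally requires handing the driver the whole seed list plus the replica set name, so it can discover the primary itself. A sketch of a hypothetical helper (not Pulp's actual connection code) that builds such a MongoDB URI from a server.conf-style seeds string:

```python
def replica_set_uri(seeds, replica_set, database='pulp_database'):
    """Build a MongoDB URI from a comma-separated seed string.

    With replicaSet in the URI, the driver treats every seed as a
    discovery point and routes writes to whichever member is primary,
    so the order of the seeds does not matter.
    """
    hosts = ','.join(seed.strip() for seed in seeds.split(','))
    return 'mongodb://%s/%s?replicaSet=%s' % (hosts, database, replica_set)
```

For example, passing the three seeds from the reproduction steps in any order should yield a connection that survives a primary election.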

I've filed this against 2.4.0, as it affects every version of Pulp that has used Celery.

QE instructions

You're actually verifying things that were done in #1080, but we're doing the verification on this issue.

  • Verify that the migration removes the celery_taskmeta collection
  • Verify the release notes
  • Verify that the fix which includes refactor #1080 passes a full regression test

[0] https://github.com/pulp/pulp/blob/01fcf261c38f9b4b057839980f892f85a8697a27/server/pulp/server/async/celery_instance.py#L48-L53


Related issues

Related to Pulp - Refactor #1084: Stop Pulp from using the Celery results backend (CLOSED - CURRENTRELEASE, dkliban@redhat.com)
