Issue #956
Task #1014 (closed): Short Term Improvements for Pulp's use of MongoDB
Pulp's Celery result backend connection cannot use Mongo replica sets with automatic failover
Description
This issue tracks the problem, but the fix is refactor #1084. Anyone who assigns this issue to themselves also needs to take #1084, because the two go together.
We had an issue today where we discovered that Pulp cannot truly use Mongo replica sets with automatic failover. Pulp uses MongoDB as Celery's results backend, and that component fails with the following traceback:
pulp: celery.worker.strategy:INFO: Received task: pulp.server.async.tasks._reserve_resource[8c66b5b7-3236-4da0-9ab2-00c476e7196f]
pulp: celery.worker.job:CRITICAL: Task pulp.server.async.tasks._reserve_resource[8c66b5b7-3236-4da0-9ab2-00c476e7196f] INTERNAL ERROR: AutoReconnect('not master',)
pulp: celery.worker.job:CRITICAL: Traceback (most recent call last):
pulp: celery.worker.job:CRITICAL: File "/usr/lib/python2.6/site-packages/celery/app/trace.py", line 283, in trace_task
pulp: celery.worker.job:CRITICAL: uuid, retval, SUCCESS, request=task_request,
pulp: celery.worker.job:CRITICAL: File "/usr/lib/python2.6/site-packages/celery/backends/base.py", line 254, in store_result
pulp: celery.worker.job:CRITICAL: request=request, **kwargs)
pulp: celery.worker.job:CRITICAL: File "/usr/lib/python2.6/site-packages/celery/backends/mongodb.py", line 145, in _store_result
pulp: celery.worker.job:CRITICAL: self.collection.save(meta)
pulp: celery.worker.job:CRITICAL: File "/usr/lib/python2.6/site-packages/kombu/utils/__init__.py", line 322, in __get__
pulp: celery.worker.job:CRITICAL: value = obj.__dict__[self.__name__] = self.__get(obj)
pulp: celery.worker.job:CRITICAL: File "/usr/lib/python2.6/site-packages/celery/backends/mongodb.py", line 240, in collection
pulp: celery.worker.job:CRITICAL: collection.ensure_index('date_done', background='true')
pulp: celery.worker.job:CRITICAL: File "/usr/lib64/python2.6/site-packages/pymongo/collection.py", line 916, in ensure_index
pulp: celery.worker.job:CRITICAL: return self.create_index(key_or_list, cache_for, **kwargs)
pulp: celery.worker.job:CRITICAL: File "/usr/lib64/python2.6/site-packages/pymongo/collection.py", line 823, in create_index
pulp: celery.worker.job:CRITICAL: **self._get_wc_override())
pulp: celery.worker.job:CRITICAL: File "/usr/lib64/python2.6/site-packages/pymongo/collection.py", line 357, in insert
pulp: celery.worker.job:CRITICAL: continue_on_error, self.__uuid_subtype), safe)
pulp: celery.worker.job:CRITICAL: File "/usr/lib64/python2.6/site-packages/pymongo/mongo_client.py", line 929, in _send_message
pulp: celery.worker.job:CRITICAL: raise AutoReconnect(str(e))
pulp: celery.worker.job:CRITICAL: AutoReconnect: not master
Above this code block[0] there is a comment claiming that Celery 3.1 does not support replica sets. I have not independently verified this claim, but we will need to either fix Celery so that it supports them, or find another way around the problem so that Pulp fully supports replica sets, including automatic failover.
Steps to reproduce:
1) Deploy three mongod instances configured as a replica set.
2) Deploy Pulp and configure its database connection with the three Mongo replicas. Put the current primary first in the seed list.
3) Perform a few actions to ensure everything is working correctly.
4) Now reconfigure Pulp's seed list so that one of the secondaries is the first in the list.
5) Perform an action that uses the results backend, such as a repository sync. This will fail with a traceback similar to the above.
Alternatively:
1) Deploy three mongod instances configured as a replica set.
2) Deploy Pulp and configure its database connection with the three Mongo replicas. Put the current primary first in the seed list.
3) Perform a few actions to ensure everything is working correctly.
4) Kill the current Mongo primary.
5) Perform an action that uses the results backend, such as a repository sync. This will fail with a traceback similar to the above.
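Both reproductions can be modelled with a small toy in Python. This is not pymongo: `ToyReplicaSet`, `NotMaster`, `pinned_write` and `discovering_write` are hypothetical names, and the "election" simply promotes the lowest-sorting surviving member. The model only shows why a client that always writes to the first seed breaks in both scenarios, while a client that looks for the primary among the seeds keeps working.

```python
class NotMaster(Exception):
    """Stands in for pymongo's AutoReconnect('not master') in this toy."""


class ToyReplicaSet:
    """Hypothetical replica set: a set of live members, one of them primary."""

    def __init__(self, members, primary):
        self.alive = set(members)
        self.primary = primary

    def kill(self, member):
        self.alive.discard(member)
        if member == self.primary:
            # Toy election: promote the lowest-sorting survivor (real elections differ).
            self.primary = min(self.alive) if self.alive else None

    def is_master(self, member):
        if member not in self.alive:
            raise ConnectionError(member + " is down")
        return member == self.primary

    def write(self, member, doc):
        """Accept the write only on the primary; return the member that took it."""
        if not self.is_master(member):
            raise NotMaster("not master")
        return member


def pinned_write(rs, seeds, doc):
    """Naive client: sends every write to seeds[0], whatever its current role."""
    return rs.write(seeds[0], doc)


def discovering_write(rs, seeds, doc):
    """Replica-set-aware client: asks each seed for its role, writes to the primary."""
    for seed in seeds:
        try:
            if rs.is_master(seed):
                return rs.write(seed, doc)
        except ConnectionError:
            continue  # skip members that are down
    raise NotMaster("no primary reachable from the seed list")
```

With seeds `db1:27017,db2:27018,db3:27019` and `db1` as primary, `pinned_write` works until the seed order changes (first scenario, `NotMaster`) or `db1` is killed (second scenario, connection error). In both cases `discovering_write` still finds the primary.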
Expected behavior:
The order of the seeds in server.conf should not be important for Pulp to operate correctly. It should also be possible to kill the current Mongo primary, and Pulp should continue operating smoothly.
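One way to make seed order irrelevant is to hand the driver a connection string that names the replica set. With the `replicaSet` option, a replica-set-aware driver such as pymongo uses the seeds for discovery and then sends writes to whichever member is currently primary. This is a sketch: `build_mongo_uri` is a hypothetical helper, and the database name `pulp_database` and set name `rs0` are assumptions (`rs0` matches the QE setup further down).

```python
def build_mongo_uri(seeds, database, replica_set=None):
    """Build a MongoDB connection string from "host:port" seeds.

    With replicaSet set, a replica-set-aware driver uses the seeds only for
    discovery and routes writes to the current primary, so the order of the
    seeds no longer decides where writes go.
    """
    if not seeds:
        raise ValueError("at least one seed is required")
    uri = "mongodb://" + ",".join(s.strip() for s in seeds) + "/" + database
    if replica_set:
        uri += "?replicaSet=" + replica_set
    return uri
```

The resulting string can be passed to `pymongo.MongoClient(uri)`. The traceback above goes through Celery's own backend connection, so this would likely only help if that connection were built the same way. The eventual fix (#1084) sidesteps the question by dropping the results backend entirely.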
I've filed this against 2.4.0, as it affects every version of Pulp that has used Celery.
QE instructions
You're actually verifying things that were done in #1080, but we're doing the verification on this issue.
- Verify that the migration removes the celery_taskmeta collection
- Verify the release notes
- Verify that the fix which includes refactor #1080 passes a full regression test
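For the first QE item, a check along these lines may help. `remove_celery_taskmeta` is a hypothetical helper, not Pulp's actual migration. It assumes `db` behaves like a pymongo `Database` (pymongo 3.7+ provides `list_collection_names()`; older releases call it `collection_names()`). After the real migration has run, `'celery_taskmeta' in db.list_collection_names()` should be False.

```python
def remove_celery_taskmeta(db):
    """Drop Celery's result collection left behind by older Pulp releases.

    Hypothetical helper: `db` is anything with pymongo-style
    list_collection_names() and drop_collection(name) methods.
    Returns True if the collection existed and was dropped.
    """
    name = "celery_taskmeta"
    if name in db.list_collection_names():
        db.drop_collection(name)
        return True
    return False  # nothing to do; safe to run more than once
```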
Related issues
Updated by rbarlow over 9 years ago
A workaround is to make sure the current primary is always the first in the list of seeds in server.conf. If the primary changes, the seed order will need to be adjusted and Pulp (all services) will need to be restarted.
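The workaround can be scripted until the fix lands. This sketch assumes the `[database]` section of server.conf holds a comma-separated `seeds` option, so check your own file first. `reordered_seeds` is a hypothetical helper that only computes the new value; it does not rewrite server.conf, so comments there are left alone. You still edit the file and restart all Pulp services yourself. The current primary can be found with, for example, `rs.status()` in the mongo shell.

```python
import configparser


def reordered_seeds(conf_text, primary):
    """Return the [database] seeds value with `primary` moved to the front.

    Assumes a comma-separated "host:port" list, for example:
        seeds: db1:27017,db2:27018,db3:27019
    """
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    seeds = [s.strip() for s in parser.get("database", "seeds").split(",") if s.strip()]
    if primary not in seeds:
        raise ValueError(primary + " is not in the seed list")
    seeds.remove(primary)
    return ",".join([primary] + seeds)
```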
Updated by jortel@redhat.com over 9 years ago
- Priority changed from Normal to High
- Triaged changed from No to Yes
Updated by mhrivnak over 9 years ago
Code for reference: https://github.com/celery/celery/blob/3.1/celery/backends/mongodb.py
Updated by dkliban@redhat.com over 9 years ago
- Status changed from NEW to ASSIGNED
- Assignee set to dkliban@redhat.com
Updated by dkliban@redhat.com over 9 years ago
- Status changed from ASSIGNED to NEW
Updated by dkliban@redhat.com over 9 years ago
- Assignee deleted (dkliban@redhat.com)
Updated by bmbouter over 9 years ago
- Related to Refactor #1084: Stop Pulp from using the Celery results backend added
Updated by dkliban@redhat.com over 9 years ago
- Status changed from NEW to POST
Added by dkliban@redhat.com over 9 years ago
Revision aa4d57df
Removes MongoDB as celery result backend
This patch also removes the FailureHandler, which relied on checking the results backend to determine whether a scheduled task needs to have its schedule disabled after reaching a failure threshold. The logic is moved to the on_success and on_failure methods for Task defined in Pulp.
https://pulp.plan.io/issues/956 fixes #956 https://pulp.plan.io/issues/1084 fixes #1084
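The commit moves the failure-threshold logic into the task's `on_success` and `on_failure` hooks. Below is a rough sketch of that idea, not Pulp's code. `ScheduleRecord` and `ThresholdTask` are hypothetical, and the `failure_threshold`/`consecutive_failures` names are assumptions. Celery is not imported, so the sketch runs standalone, but the hook signatures mirror `celery.Task.on_success(retval, task_id, args, kwargs)` and `celery.Task.on_failure(exc, task_id, args, kwargs, einfo)`.

```python
class ScheduleRecord:
    """Hypothetical stand-in for the schedule entry a task was dispatched from."""

    def __init__(self, failure_threshold):
        self.failure_threshold = failure_threshold  # 0 or None: never auto-disable
        self.consecutive_failures = 0
        self.enabled = True


class ThresholdTask:
    """Counts failures in task hooks instead of reading a results backend."""

    def __init__(self, schedule):
        self.schedule = schedule

    def on_success(self, retval, task_id, args, kwargs):
        # Any success resets the streak of consecutive failures.
        self.schedule.consecutive_failures = 0

    def on_failure(self, exc, task_id, args, kwargs, einfo):
        s = self.schedule
        s.consecutive_failures += 1
        # Disable the schedule once the streak reaches the threshold.
        if s.failure_threshold and s.consecutive_failures >= s.failure_threshold:
            s.enabled = False
```

With this design, the worker updates the schedule as soon as the task finishes. Nothing has to read task state back out of a results backend, so `celery_taskmeta` is no longer needed.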
Updated by dkliban@redhat.com over 9 years ago
- Status changed from POST to MODIFIED
- % Done changed from 0 to 100
Applied in changeset pulp|aa4d57dfd60cb9ac34ec0ce79d95f5c03e1dbf55.
Updated by dkliban@redhat.com over 9 years ago
- Assignee set to dkliban@redhat.com
Updated by dkliban@redhat.com about 9 years ago
- Status changed from MODIFIED to 5
Updated by pthomas@redhat.com about 9 years ago
- Status changed from 5 to 6
verified
[root@sparks ~]#
[root@sparks ~]# ps -awx |grep mongo
9730 ? Sl 0:31 mongod --fork --nojournal --syslog --port 27017 --dbpath /root/rs0-0 --replSet rs0
10209 ? Sl 0:32 mongod --fork --nojournal --syslog --port 27018 --dbpath /root/rs0-1 --replSet rs0
10421 ? Sl 0:27 mongod --fork --nojournal --syslog --port 27019 --dbpath /root/rs0-2 --replSet rs0
17825 pts/0 S+ 0:00 grep --color=auto mongo
[root@sparks ~]#
[root@sparks ~]#
[root@sparks ~]# kill -9 10209
[root@sparks ~]# pulp-admin rpm repo sync run --repo-id zoo
+----------------------------------------------------------------------+
Synchronizing Repository [zoo]
+----------------------------------------------------------------------+
This command may be exited via ctrl+c without affecting the request.
Downloading metadata...
[\]
... completed
Downloading repository content...
[==================================================] 100%
RPMs: 0/0 items
Delta RPMs: 0/0 items
... completed
Downloading distribution files...
[==================================================] 100%
Distributions: 0/0 items
... completed
Importing errata...
[-]
... completed
Importing package groups/categories...
[-]
... completed
Task Succeeded
Copying files
[-]
... completed
Initializing repo metadata
[-]
... completed
Publishing Distribution files
[-]
... completed
Publishing RPMs
[-]
... completed
Publishing Delta RPMs
... skipped
Publishing Errata
[==================================================] 100%
4 of 4 items
... completed
Publishing Comps file
[==================================================] 100%
3 of 3 items
... completed
Publishing Metadata.
[-]
... completed
Closing repo metadata
[-]
... completed
Generating sqlite files
... skipped
Publishing files to web
[-]
... completed
Writing Listings File
[-]
... completed
Task Succeeded
Updated by amacdona@redhat.com about 9 years ago
- Status changed from 6 to CLOSED - CURRENTRELEASE