Issue #956

closed

Task #1014: Short Term Improvements for Pulp's use of MongoDB

Pulp's Celery result backend connection cannot use Mongo replica sets with automatic failover

Added by rbarlow almost 9 years ago. Updated about 5 years ago.

Status: CLOSED - CURRENTRELEASE
Priority: High
Category: -
Sprint/Milestone: -
Start date:
Due date:
Estimated time:
Severity: 3. High
Version: 2.4.0
Platform Release: 2.7.0
OS:
Triaged: Yes
Groomed: No
Sprint Candidate: No
Tags: Pulp 2
Sprint:
Quarter:

Description

This tracks the issue, but the fix is refactor #1084. Anyone who assigns this issue to themselves also needs to assign themselves #1084, because the two go together.

We discovered today that Pulp cannot truly use Mongo replica sets with automatic failover. Pulp uses MongoDB as Celery's results backend, and this is the component that fails with the following traceback:

pulp: celery.worker.strategy:INFO: Received task: pulp.server.async.tasks._reserve_resource[8c66b5b7-3236-4da0-9ab2-00c476e7196f]
pulp: celery.worker.job:CRITICAL: Task pulp.server.async.tasks._reserve_resource[8c66b5b7-3236-4da0-9ab2-00c476e7196f] INTERNAL ERROR: AutoReconnect('not master',)
pulp: celery.worker.job:CRITICAL: Traceback (most recent call last):
pulp: celery.worker.job:CRITICAL:   File "/usr/lib/python2.6/site-packages/celery/app/trace.py", line 283, in trace_task
pulp: celery.worker.job:CRITICAL:     uuid, retval, SUCCESS, request=task_request,
pulp: celery.worker.job:CRITICAL:   File "/usr/lib/python2.6/site-packages/celery/backends/base.py", line 254, in store_result
pulp: celery.worker.job:CRITICAL:     request=request, **kwargs)
pulp: celery.worker.job:CRITICAL:   File "/usr/lib/python2.6/site-packages/celery/backends/mongodb.py", line 145, in _store_result
pulp: celery.worker.job:CRITICAL:     self.collection.save(meta)
pulp: celery.worker.job:CRITICAL:   File "/usr/lib/python2.6/site-packages/kombu/utils/__init__.py", line 322, in __get__
pulp: celery.worker.job:CRITICAL:     value = obj.__dict__[self.__name__] = self.__get(obj)
pulp: celery.worker.job:CRITICAL:   File "/usr/lib/python2.6/site-packages/celery/backends/mongodb.py", line 240, in collection
pulp: celery.worker.job:CRITICAL:     collection.ensure_index('date_done', background='true')
pulp: celery.worker.job:CRITICAL:   File "/usr/lib64/python2.6/site-packages/pymongo/collection.py", line 916, in ensure_index
pulp: celery.worker.job:CRITICAL:     return self.create_index(key_or_list, cache_for, **kwargs)
pulp: celery.worker.job:CRITICAL:   File "/usr/lib64/python2.6/site-packages/pymongo/collection.py", line 823, in create_index
pulp: celery.worker.job:CRITICAL:     **self._get_wc_override())
pulp: celery.worker.job:CRITICAL:   File "/usr/lib64/python2.6/site-packages/pymongo/collection.py", line 357, in insert
pulp: celery.worker.job:CRITICAL:     continue_on_error, self.__uuid_subtype), safe)
pulp: celery.worker.job:CRITICAL:   File "/usr/lib64/python2.6/site-packages/pymongo/mongo_client.py", line 929, in _send_message
pulp: celery.worker.job:CRITICAL:     raise AutoReconnect(str(e))
pulp: celery.worker.job:CRITICAL: AutoReconnect: not master

A comment above this code block[0] claims that Celery 3.1 does not support replica sets. I have not independently verified this claim, but we'll need to either fix Celery so that it does support them, or find some other way around this problem so that replica sets are fully supported by Pulp, including automatic failover.

Steps to reproduce:

1) Deploy a pool of three mongod instances, configured as a replica set.
2) Deploy Pulp, and configure its database connection with the three mongo replicas. Put the current primary as the first seed in the list.
3) Perform a few actions to ensure everything is working correctly.
4) Now reconfigure Pulp's seed list so that one of the secondaries is the first in the list.
5) Perform an action that uses the results backend, such as a repository sync. This will fail with a traceback similar to the above.

Alternatively:

1) Deploy a pool of three mongod instances, configured as a replica set.
2) Deploy Pulp, and configure its database connection with the three mongo replicas. Put the current primary as the first seed in the list.
3) Perform a few actions to ensure everything is working correctly.
4) Kill the current Mongo primary.
5) Perform an action that uses the results backend, such as a repository sync. This will fail with a traceback similar to the above.

Expected behavior:

The order of the seeds in server.conf should not be important for Pulp to operate correctly. It should also be possible to kill the current Mongo primary, and Pulp should continue operating smoothly.
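
As a rough illustration of why seed order need not matter: a replica-set-aware MongoDB client is normally given the whole seed list plus the replica set name, and works out which member is primary on its own. The sketch below only builds a standard MongoDB connection string from such a list. The function name, the database name pulp_database, and the hosts and set name are assumptions for illustration, not Pulp's actual code or configuration.

```python
from urllib.parse import quote


def build_replica_set_uri(seeds, replica_set, database="pulp_database"):
    """Build a MongoDB connection string that names every seed.

    All names and defaults here are illustrative assumptions.
    """
    if not seeds:
        raise ValueError("at least one seed host is required")
    # Seeds are joined as given; a replica-set-aware client treats them as
    # starting points for discovery, not as "the primary comes first".
    hosts = ",".join(seed.strip() for seed in seeds)
    return "mongodb://%s/%s?replicaSet=%s" % (
        hosts,
        quote(database, safe=""),
        quote(replica_set, safe=""),
    )


uri = build_replica_set_uri(
    ["localhost:27017", "localhost:27018", "localhost:27019"], "rs0"
)
print(uri)
# mongodb://localhost:27017,localhost:27018,localhost:27019/pulp_database?replicaSet=rs0
```

With a connection string like this, shuffling the seeds changes the string but not the set of members the client can discover.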

I've filed this against 2.4.0, as it affects every version of Pulp that has used Celery.

QE instructions

You're actually verifying things that were done in #1084, but we're doing the verification on this issue.

  • Verify that the migration removes the celery_taskmeta collection
  • Verify the release notes
  • Verify that the fix, which includes refactor #1084, passes a full regression test

[0] https://github.com/pulp/pulp/blob/01fcf261c38f9b4b057839980f892f85a8697a27/server/pulp/server/async/celery_instance.py#L48-L53


Related issues

Related to Pulp - Refactor #1084: Stop Pulp from using the Celery results backend (CLOSED - CURRENTRELEASE, dkliban@redhat.com)

Actions #1

Updated by rbarlow almost 9 years ago

A workaround is to make sure the current primary is always the first in the list of seeds in server.conf. If the primary changes, the seed order will need to be adjusted and Pulp (all services) will need to be restarted.
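
A minimal sketch of the reordering step in this workaround, assuming the seeds are kept as a comma-separated list of host:port strings and that the current primary has already been identified by other means (for example from the mongo shell). The helper name primary_first and the hosts are hypothetical. Writing the result back to server.conf and restarting Pulp's services are left out.

```python
def primary_first(seeds, primary):
    """Return the seed list reordered so that `primary` comes first.

    The relative order of the remaining seeds is preserved.
    """
    if primary not in seeds:
        raise ValueError("primary %r is not in the seed list" % (primary,))
    return [primary] + [seed for seed in seeds if seed != primary]


seeds = ["localhost:27017", "localhost:27018", "localhost:27019"]
# Suppose localhost:27018 became primary after a failover (illustrative).
print(",".join(primary_first(seeds, "localhost:27018")))
# localhost:27018,localhost:27017,localhost:27019
```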

Actions #2

Updated by jortel@redhat.com almost 9 years ago

  • Priority changed from Normal to High
  • Triaged changed from No to Yes
Actions #4

Updated by dkliban@redhat.com almost 9 years ago

  • Parent issue set to #1014
Actions #5

Updated by dkliban@redhat.com almost 9 years ago

  • Platform Release set to 2.7.0
Actions #6

Updated by dkliban@redhat.com almost 9 years ago

  • Status changed from NEW to ASSIGNED
  • Assignee set to dkliban@redhat.com
Actions #7

Updated by dkliban@redhat.com almost 9 years ago

  • Status changed from ASSIGNED to NEW
Actions #8

Updated by dkliban@redhat.com almost 9 years ago

  • Assignee deleted (dkliban@redhat.com)
Actions #9

Updated by bmbouter almost 9 years ago

  • Related to Refactor #1084: Stop Pulp from using the Celery results backend added
Actions #10

Updated by bmbouter almost 9 years ago

  • Description updated (diff)
Actions #11

Updated by bmbouter almost 9 years ago

  • Platform Release deleted (2.7.0)
Actions #12

Updated by dkliban@redhat.com over 8 years ago

  • Status changed from NEW to POST

Added by dkliban@redhat.com over 8 years ago

Revision aa4d57df | View on GitHub

Removes MongoDB as celery result backend

This patch also removes the FailureHandler, which relied on checking the results backend to determine whether a scheduled task needs to have its schedule disabled after reaching a failure threshold. The logic is moved to the on_success and on_failure methods for Task defined in Pulp.

https://pulp.plan.io/issues/956 fixes #956 https://pulp.plan.io/issues/1084 fixes #1084
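
The idea in the commit message can be sketched roughly as follows. This is not the code from the revision: the classes ScheduledCall and ScheduleAwareTask and their attributes are hypothetical stand-ins, the sketch assumes the threshold counts consecutive failures, and it keeps state in memory rather than in the database. Only the hook signatures follow Celery's Task.on_success and Task.on_failure.

```python
class ScheduledCall(object):
    """Hypothetical stand-in for a stored schedule entry."""

    def __init__(self, failure_threshold=None):
        self.failure_threshold = failure_threshold  # None means never disable
        self.consecutive_failures = 0
        self.enabled = True


class ScheduleAwareTask(object):
    """Hypothetical task base; hook signatures mirror Celery's Task hooks."""

    def __init__(self, schedule):
        self.schedule = schedule

    def on_success(self, retval, task_id, args, kwargs):
        # A successful run resets the failure streak.
        self.schedule.consecutive_failures = 0

    def on_failure(self, exc, task_id, args, kwargs, einfo):
        # Count the failure and disable the schedule once the threshold is hit,
        # without consulting any results backend.
        entry = self.schedule
        entry.consecutive_failures += 1
        threshold = entry.failure_threshold
        if threshold is not None and entry.consecutive_failures >= threshold:
            entry.enabled = False


entry = ScheduledCall(failure_threshold=2)
task = ScheduleAwareTask(entry)
task.on_failure(RuntimeError("boom"), "task-1", (), {}, None)
task.on_success(None, "task-2", (), {})
task.on_failure(RuntimeError("boom"), "task-3", (), {}, None)
print(entry.enabled)  # True: the success in between reset the streak
task.on_failure(RuntimeError("boom"), "task-4", (), {}, None)
print(entry.enabled)  # False: two failures in a row reached the threshold
```

Because the hooks run in the worker that executed the task, the decision no longer depends on reading task state back from MongoDB.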

Actions #13

Updated by dkliban@redhat.com over 8 years ago

  • Status changed from POST to MODIFIED
  • % Done changed from 0 to 100
Actions #14

Updated by dkliban@redhat.com over 8 years ago

  • Platform Release set to 2.7.0
Actions #15

Updated by dkliban@redhat.com over 8 years ago

  • Assignee set to dkliban@redhat.com
Actions #16

Updated by dkliban@redhat.com over 8 years ago

  • Status changed from MODIFIED to 5
Actions #17

Updated by pthomas@redhat.com over 8 years ago

  • Status changed from 5 to 6

verified

[root@sparks ~]# 
[root@sparks ~]# ps -awx |grep mongo
 9730 ?        Sl     0:31 mongod --fork --nojournal --syslog --port 27017 --dbpath /root/rs0-0 --replSet rs0
10209 ?        Sl     0:32 mongod --fork --nojournal --syslog --port 27018 --dbpath /root/rs0-1 --replSet rs0
10421 ?        Sl     0:27 mongod --fork --nojournal --syslog --port 27019 --dbpath /root/rs0-2 --replSet rs0
17825 pts/0    S+     0:00 grep --color=auto mongo
[root@sparks ~]# 
[root@sparks ~]# 
[root@sparks ~]# kill -9 10209
[root@sparks ~]# pulp-admin rpm repo sync run --repo-id zoo
+----------------------------------------------------------------------+
                     Synchronizing Repository [zoo]
+----------------------------------------------------------------------+

This command may be exited via ctrl+c without affecting the request.

Downloading metadata...
[\]
... completed

Downloading repository content...
[==================================================] 100%
RPMs:       0/0 items
Delta RPMs: 0/0 items

... completed

Downloading distribution files...
[==================================================] 100%
Distributions: 0/0 items
... completed

Importing errata...
[-]
... completed

Importing package groups/categories...
[-]
... completed

Task Succeeded

Copying files
[-]
... completed

Initializing repo metadata
[-]
... completed

Publishing Distribution files
[-]
... completed

Publishing RPMs
[-]
... completed

Publishing Delta RPMs
... skipped

Publishing Errata
[==================================================] 100%
4 of 4 items
... completed

Publishing Comps file
[==================================================] 100%
3 of 3 items
... completed

Publishing Metadata.
[-]
... completed

Closing repo metadata
[-]
... completed

Generating sqlite files
... skipped

Publishing files to web
[-]
... completed

Writing Listings File
[-]
... completed

Task Succeeded
Actions #18

Updated by amacdona@redhat.com over 8 years ago

  • Status changed from 6 to CLOSED - CURRENTRELEASE
Actions #20

Updated by bmbouter about 5 years ago

  • Tags Pulp 2 added
