Task #1014

closed

Short Term Improvements for Pulp's use of MongoDB

Added by dkliban@redhat.com almost 9 years ago. Updated almost 5 years ago.

Status: CLOSED - CURRENTRELEASE
Priority: High
Assignee: -
Category: -
Sprint/Milestone: -
Start date:
Due date:
% Done: 100%
Estimated time: (Total: 0:00 h)
Platform Release: 2.7.1
Groomed: Yes
Sprint Candidate: No
Tags: Pulp 2
Sprint:
Quarter:

Description

This is a tracker bug to make sure that we get all the issues related to MongoDB and replica sets resolved in a timely manner.


Sub-issues: 8 (0 open, 8 closed)

  • Issue #974: GET of a task sometimes 404s on first request (CLOSED - CURRENTRELEASE, dkliban@redhat.com)
  • Issue #1012: If the first database seed listed in server.conf is not the current Primary replica, Pulp will not be able to write to the database (CLOSED - CURRENTRELEASE, dkliban@redhat.com)
  • Issue #956: Pulp's Celery result backend connection cannot use Mongo replica sets with automatic failover (CLOSED - CURRENTRELEASE, dkliban@redhat.com)
  • Story #1: As a user, I can have Pulp attempt use auto_retry application wide using the 'unsafe_autoretry' parameter (CLOSED - CURRENTRELEASE, amacdona@redhat.com)
  • Story #934: Update mongoengine dependency to 0.9 (CLOSED - CURRENTRELEASE, bmbouter)
  • Refactor #1084: Stop Pulp from using the Celery results backend (CLOSED - CURRENTRELEASE, dkliban@redhat.com)
  • Issue #1065: DeprecationWarning on pulp-manage-db (CLOSED - CURRENTRELEASE, amacdona@redhat.com)
  • Issue #1139: Fix Pulp's use of replica sets (CLOSED - CURRENTRELEASE)

#1

Updated by dkliban@redhat.com almost 9 years ago

  • Related to Issue #974: GET of a task sometimes 404s on first request added
#2

Updated by dkliban@redhat.com almost 9 years ago

  • Related to deleted (Issue #974: GET of a task sometimes 404s on first request)
#3

Updated by dkliban@redhat.com almost 9 years ago

  • Blocked by Issue #974: GET of a task sometimes 404s on first request added
#4

Updated by dkliban@redhat.com almost 9 years ago

  • Blocked by deleted (Issue #974: GET of a task sometimes 404s on first request)
#5

Updated by jortel@redhat.com almost 9 years ago

  • Triaged changed from No to Yes
#6

Updated by rbarlow almost 9 years ago

  • Status changed from ASSIGNED to NEW
  • Assignee deleted (rbarlow)

I've spent a lot of time working on this issue, and I am sad to report that I think fixing it would take too great an effort to justify at its current priority. The details of this recommendation follow.

Converting our connections to use pymongo's replica set settings is straightforward enough. It turns out that we had simply misspelled the setting in our code (we had "replica_set" instead of "replicaSet"). Similarly, Celery's MongoDB result backend needed to be adjusted to use this setting, which was also straightforward.
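
For illustration, here is a minimal sketch (not Pulp's actual code) of the correctly spelled option in a MongoDB connection string. The helper build_mongo_uri, the hosts db1/db2, and the replica set name rs0 are hypothetical. Listing several seeds together with replicaSet is what lets a driver such as pymongo locate the current primary instead of depending on whichever seed is listed first.

```python
from urllib.parse import urlencode


def build_mongo_uri(seeds, database, replica_set=None):
    """Build a MongoDB connection string from "host:port" seeds (hypothetical helper).

    The connection-string option is spelled ``replicaSet``, not ``replica_set``.
    """
    if not seeds:
        raise ValueError("at least one seed is required")
    uri = "mongodb://" + ",".join(seeds) + "/" + database
    if replica_set:
        # With every seed listed plus the set name, the driver can discover
        # the current primary rather than assuming the first seed is primary.
        uri += "?" + urlencode({"replicaSet": replica_set})
    return uri


if __name__ == "__main__":
    print(build_mongo_uri(["db1:27017", "db2:27017"], "pulp_database", "rs0"))
```

The resulting string could then be passed to pymongo.MongoClient. That call is left out so the sketch runs without a MongoDB server.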

The problem arises when testing Celery with MongoDB replica sets configured. When a MongoDB primary fails, there are a few moments during which the replica set has no primary. During this window, any write query will raise pymongo.errors.AutoReconnect. Read queries may also raise this exception if the read_preference is set to PRIMARY (the default, and also the value Pulp currently uses). Once a new primary is elected, queries begin to work again. Unfortunately, I was not able to determine how to stop Celery from destroying its MongoDB and qpidd connections when the primary was disabled. As a result, Celery opens a whole new MongoDB connection instead of reusing the existing one and taking advantage of pymongo's ability to autodiscover the new primary. This problem is solvable, but to continue testing Pulp I worked around it with some dirty hacks.
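
Here is a minimal sketch of riding out the election window described above, assuming a bounded wait is acceptable. It retries only on AutoReconnect, backs off between attempts, and gives up after a fixed number of tries. AutoReconnect here is a local stand-in for pymongo.errors.AutoReconnect so the sketch runs without pymongo. The function name call_with_failover_retry is hypothetical. As discussed below, blindly retrying like this is not safe for every query.

```python
import time


class AutoReconnect(Exception):
    """Local stand-in for pymongo.errors.AutoReconnect (assumption for the sketch)."""


def call_with_failover_retry(operation, attempts=5, delay=0.5):
    """Call ``operation``, retrying while the replica set has no primary.

    Each AutoReconnect is treated as "election still in progress": sleep,
    double the delay, and try again. After ``attempts`` failures the last
    AutoReconnect is re-raised to the caller.
    """
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except AutoReconnect:
            if attempt == attempts:
                raise
            time.sleep(delay)
            delay *= 2
```

The backoff keeps the client from hammering the set while it elects a primary. The attempt limit keeps a task from hanging forever if no primary ever appears.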

In further testing (with the dirty Celery hacks in place), I encountered another issue: our new Pulp code does not use the auto retry decorator that has been applied to our queries for a while. This means that when AutoReconnect is raised, our code fails. Unfortunately, when this happens in our tasking system, the task is dropped. It is easy to end up with a task that is "Waiting" forever, because the message has been removed from the queue and the Celery task has failed. We could try to work around this by catching the AutoReconnect exception, but handling it correctly isn't always obvious. Simply retrying is no different from the auto retry we used to have, which is not safe in general. To achieve safety, we will need to consider each query together with its data model to determine the correct course of action. Auto retry will be safe in certain situations, but in others it will not be easy to handle an AutoReconnect. As we move more of our models from pymongo to mongoengine, this issue will become increasingly exposed.
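
To make the safety concern concrete, here is a toy simulation, with no real MongoDB involved. It models a primary that applies a write and then fails before the client receives the acknowledgement. LossyCollection and naive_retry are illustrative names, and "$inc"/"$set" are plain strings mimicking MongoDB update operators. Retrying an increment applies it twice, while retrying a set to a fixed value is harmless.

```python
class AutoReconnect(Exception):
    """Local stand-in for pymongo.errors.AutoReconnect (assumption for the sketch)."""


class LossyCollection:
    """Toy single-document store whose first write loses its acknowledgement.

    The update IS applied, then AutoReconnect is raised, mimicking a primary
    that goes down after committing the write but before replying.
    """

    def __init__(self):
        self.doc = {"count": 0, "state": "waiting"}
        self._drop_next_ack = True

    def update(self, op, field, value):
        if op == "$inc":
            self.doc[field] += value
        elif op == "$set":
            self.doc[field] = value
        else:
            raise ValueError("unsupported operator: " + op)
        if self._drop_next_ack:
            self._drop_next_ack = False
            raise AutoReconnect("connection lost before acknowledgement")


def naive_retry(fn, attempts=3):
    """Blindly re-run ``fn`` on AutoReconnect, like a blanket auto-retry decorator."""
    for attempt in range(attempts):
        try:
            return fn()
        except AutoReconnect:
            if attempt == attempts - 1:
                raise


if __name__ == "__main__":
    inc = LossyCollection()
    naive_retry(lambda: inc.update("$inc", "count", 1))
    print(inc.doc["count"])  # prints 2: the increment was applied twice

    st = LossyCollection()
    naive_retry(lambda: st.update("$set", "state", "running"))
    print(st.doc["state"])  # prints running: repeating the set is harmless
```

Whether a retry is correct therefore depends on whether the query is idempotent with respect to its data model. A single decorator applied to every query cannot know that.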

The above issues are solvable, but I estimate that solving them will be a large effort. When I started working on this issue, it seemed that I'd be able to get us back to HA more quickly. I also believe that we would have much more success with a transactional database. Most of the problems with MongoDB (and its performance benefits) stem from its lack of transactions. Without transactions, it is difficult to handle AutoReconnect in all cases. It is hard to determine whether the previous queries in a given task have been "committed" to the replica set, and therefore whether it is safe to continue or cleanup needs to happen. With Pulp, it is also difficult to determine what cleanup would need to happen if that path were chosen.
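
One common way to approximate cleanup without transactions is to record a compensating (undo) action for every write a task makes. On failure, those actions are replayed in reverse order. This is a hypothetical illustration of that technique on an in-memory dict, not something Pulp implemented. TaskJournal and import_units are invented names. In a real replica set the undo writes could themselves hit AutoReconnect, which is part of why a transactional database is the more robust answer.

```python
class TaskJournal:
    """Hypothetical helper: remember an undo action for each completed write."""

    def __init__(self):
        self._undo = []

    def run(self, do, undo):
        do()  # if do() raises, no undo is recorded for this step
        self._undo.append(undo)

    def rollback(self):
        # Undo completed steps newest-first.
        while self._undo:
            self._undo.pop()()


def import_units(db, units, fail_at=None):
    """Hypothetical task: add units to a repo, then bump the repo's unit count."""
    journal = TaskJournal()
    try:
        for i, unit in enumerate(units):
            if i == fail_at:
                # Simulate losing the primary part-way through the task.
                raise RuntimeError("primary lost mid-task")
            journal.run(lambda u=unit: db["units"].append(u),
                        lambda u=unit: db["units"].remove(u))
        journal.run(lambda: db.update(count=db["count"] + len(units)),
                    lambda: db.update(count=db["count"] - len(units)))
    except Exception:
        journal.rollback()
        raise
```

If the task fails after two of three units are written, the rollback removes both, so the repository is left exactly as it was before the task started.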

My recommendation is that we should wait until we have a transactional database to solve our HA database problem.

#7

Updated by mhrivnak almost 9 years ago

  • Version set to 2.6.0
#9

Updated by bmbouter almost 9 years ago

  • Tracker changed from Issue to Task
  • Groomed set to Yes
  • Sprint Candidate set to No

Changing to be a Task which is more appropriate for a tracker.

#10

Updated by bmbouter over 8 years ago

  • Subject changed from Fix support for replica sets to Short Term Improvements for MongoDB
#11

Updated by bmbouter over 8 years ago

  • Subject changed from Short Term Improvements for MongoDB to Short Term Improvements for Pulp's use of MongoDB
#12

Updated by dkliban@redhat.com over 8 years ago

  • Status changed from NEW to 5
#13

Updated by rbarlow about 8 years ago

  • Status changed from 5 to CLOSED - CURRENTRELEASE
  • Platform Release set to 2.7.1
#14

Updated by bmbouter almost 5 years ago

  • Tags Pulp 2 added
