Refactor #888

Refactor #765: Convert Pulp to use MongoEngine

pulp_manage_db needs to run .ensure_indexes() on MongoEngine platform models

Added by bcourt over 5 years ago. Updated over 1 year ago.

Status: CLOSED - CURRENTRELEASE
Priority: Normal
Assignee:
Category: -
Sprint/Milestone: -
Start date:
Due date:
% Done: 100%
Estimated time:
Platform Release: 2.7.0
Groomed: Yes
Sprint Candidate: No
Tags: Pulp 2
Sprint: April 2015
Quarter:

Description

After pulp-manage-db performs all the migrations, it should load all the models for the core collections (repos, tasks, repo_content_units, etc.) and run the ensure_indexes() method on each model so that the indexes Pulp requires are maintained.

Deliverables:

  • Create a list of class paths from which all models for the core collections can be loaded
  • Platform models that have already been converted are included in that list
  • pulp-manage-db is updated to call ensure_indexes() on each class loaded from that list
  • The MongoEngine conversion guide is updated to say that each newly converted model must be added to that list.
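
A minimal sketch of the first and third deliverables, not the code from the actual change. The names MODEL_PATHS, load_class and ensure_all_indexes, the "module:Class" path format, and the example path are all hypothetical. The only thing assumed about each model is that it exposes an ensure_indexes() classmethod, as MongoEngine documents do.

```python
import importlib

# Hypothetical registry of "package.module:ClassName" paths for the
# platform models that have been converted to MongoEngine.
# The path below is a placeholder, not a real Pulp module.
MODEL_PATHS = [
    "example_app.models:Repository",
]


def load_class(path):
    """Import and return the class named by a 'package.module:ClassName' path."""
    module_name, _, class_name = path.partition(":")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


def ensure_all_indexes(paths):
    """Call ensure_indexes() on every model class named in paths.

    Returns the list of paths that were processed, in order.
    """
    ensured = []
    for path in paths:
        model = load_class(path)
        model.ensure_indexes()  # assumed classmethod on each model
        ensured.append(path)
    return ensured
```

Under these assumptions, pulp-manage-db would call ensure_all_indexes(MODEL_PATHS) once, after the migrations have finished.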

History

#1 Updated by bcourt over 5 years ago

  • Assignee set to bcourt
  • Tags Groomed added

#2 Updated by bmbouter over 5 years ago

  • Description updated (diff)

#3 Updated by bmbouter over 5 years ago

  • Description updated (diff)

#4 Updated by bcourt over 5 years ago

  • Status changed from NEW to ASSIGNED
  • Platform Release set to 2.7.0

#5 Updated by rbarlow over 5 years ago

On 04/14/2015 02:57 PM, Pulp wrote:

After pulp-manage-db performs all the migrations it should load all the
models for the core collections (repos, tasks, repo_content_units, etc.)
and run the ensure_indexes() method on the models to make sure that the
Pulp required indexes are maintained.

Is this true? When I used Mongoengine before, I believe it ensured all
the indices for me upon connecting to the DB. However, it has been ~3.5
years since I last used Mongoengine.

--
Randy Barlow

#6 Updated by bcourt over 5 years ago

  • Status changed from ASSIGNED to CLOSED - CURRENTRELEASE

#7 Updated by bcourt over 5 years ago

  • Sprint/Milestone set to 15
  • % Done changed from 0 to 100

#8 Updated by bmbouter over 5 years ago

I just tested this, and I think that mongoengine ensures indexes automatically when accessing the db. I am using mongoengine==0.7.10.

Adapted from the mongoengine getting started guide, I create a simple model in foo.py:

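# foo.py: a minimal model with no explicitly declared indexes
from mongoengine import Document, StringField, connect

# Connect to the 'tumblelog' database on the default local MongoDB
connect('tumblelog')

class User(Document):
    email = StringField(required=True)
    first_name = StringField(max_length=50)
    last_name = StringField(max_length=50)

# Saving a document writes to the 'user' collection
User(email='ross@example.com', first_name='Ross', last_name='Lawley').save()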

I run foo.py. I then list the indexes using the following command and I see 2. One for the primary key and one for some other index that I don't understand.

[bmbouter@server ~]$ mongo tumblelog --eval 'db.user.getIndexes()'
MongoDB shell version: 2.4.6
connecting to: tumblelog
[object Object],[object Object]

Then I add some indexes to foo.py and run it:

# foo.py: the same model, now declaring two indexes in meta
from mongoengine import Document, StringField, connect

connect('tumblelog')

class User(Document):
    # Declare single-field indexes on first_name and last_name
    meta = {'indexes': ['first_name', 'last_name']}

    email = StringField(required=True)
    first_name = StringField(max_length=50)
    last_name = StringField(max_length=50)

User(email='another@example.com', first_name='Some', last_name='User').save()

Then when I list the indexes I see the two additional ones on the collection.

[bmbouter@server ~]$ mongo tumblelog --eval 'db.user.getIndexes()'
MongoDB shell version: 2.4.6
connecting to: tumblelog
[object Object],[object Object],[object Object],[object Object]

I suspect that connect() unconditionally calls createIndex() or ensureIndex() (MongoDB later deprecated ensureIndex() in favor of createIndex()). Checking on each connection is a mongoengine behavior that can be disabled by setting auto_create_index to False in the meta of the model definition. I think we should leave it set to True, remove the pulp-manage-db change, and revert this story.

This would have been much more useful two days ago, but I assumed mongoengine didn't do this correctly then.

#9 Updated by bmbouter over 5 years ago

Disabling auto_create_index could have performance gains. Here is a quick analysis of the overhead of auto_create_index. http://fpaste.org/211905/

#10 Updated by bmbouter over 5 years ago

After some IRC discussion, we are going to leave in place the PR [0] that runs ensure_indexes() on all mongoengine models at the end of pulp-manage-db. This moves index creation on big collections to pulp-manage-db time instead of runtime. We are also choosing to leave auto_create_index at its default (enabled), so that if Pulp developers fail to add a model to the pulp-manage-db codepath, mongoengine will still autocreate the missing indexes at runtime.

[0]: https://github.com/pulp/pulp/pull/1784/files

#11 Updated by bmbouter over 5 years ago

  • Groomed set to Yes
  • Sprint Candidate set to No
  • Tags deleted (Groomed)

#12 Updated by bmbouter over 2 years ago

  • Sprint set to April 2015

#13 Updated by bmbouter over 2 years ago

  • Sprint/Milestone deleted (15)

#14 Updated by bmbouter over 1 year ago

  • Tags Pulp 2 added
