Refactor #888
Refactor #765: Convert Pulp to use MongoEngine (closed)
pulp_manage_db needs to run .ensure_indexes() on MongoEngine platform models
100%
Description
After pulp-manage-db performs all the migrations it should load all the models for the core collections (repos, tasks, repo_content_units, etc.) and run the ensure_indexes() method on the models to make sure that the Pulp required indexes are maintained.
Deliverables:
- Create a list of class paths that is used to load all models for the core collections
- Platform models that have already been converted are included in that list
- pulp-manage-db is updated to call ensure_indexes() on each loaded class in the list
- The MongoEngine conversion guide is updated to include adding each converted model to this list
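The deliverables above could be sketched roughly as follows. This is only an illustration, not the code that was merged: the names MODEL_CLASS_PATHS, load_class and ensure_database_indexes, and the example class path, are assumptions made for the sketch. The only thing taken from this ticket is that each loaded model class has ensure_indexes() called on it.

```python
import importlib

# Hypothetical class paths; the real list would name the platform models
# that have already been converted to MongoEngine.
MODEL_CLASS_PATHS = [
    "pulp.server.db.model.Repository",
]


def load_class(class_path):
    """Import and return the class named by a dotted 'package.module.Class' path."""
    module_path, _, class_name = class_path.rpartition(".")
    module = importlib.import_module(module_path)
    return getattr(module, class_name)


def ensure_database_indexes(class_paths=MODEL_CLASS_PATHS):
    """Call ensure_indexes() on every listed model class.

    Intended to run after all migrations have finished. Returns the class
    paths that were processed, in order.
    """
    ensured = []
    for class_path in class_paths:
        model_class = load_class(class_path)
        model_class.ensure_indexes()
        ensured.append(class_path)
    return ensured
```

Keeping the models as a list of dotted paths means converting a new model only requires appending one string, which is the step the conversion guide would need to document.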
Updated by bcourt over 9 years ago
- Status changed from NEW to ASSIGNED
- Platform Release set to 2.7.0
Updated by rbarlow over 9 years ago
On 04/14/2015 02:57 PM, Pulp wrote:
> After pulp-manage-db performs all the migrations it should load all the
> models for the core collections (repos, tasks, repo_content_units, etc.)
> and run the ensure_indexes() method on the models to make sure that the
> Pulp required indexes are maintained.
Is this true? When I used Mongoengine before, I believe it ensured all
the indices for me upon connecting to the DB. However, it has been ~3.5
years since I last used Mongoengine.
--
Randy Barlow
Updated by bcourt over 9 years ago
- Status changed from ASSIGNED to CLOSED - CURRENTRELEASE
Updated by bcourt over 9 years ago
- Sprint/Milestone set to 15
- % Done changed from 0 to 100
Updated by bmbouter over 9 years ago
I just tested this, and I think that mongoengine ensures the indexes automatically when accessing the db. I am using mongoengine==0.7.10
Adapting the mongoengine getting started guide, I created a simple model in foo.py:
from mongoengine import *

connect('tumblelog')

class User(Document):
    email = StringField(required=True)
    first_name = StringField(max_length=50)
    last_name = StringField(max_length=50)

User(email='ross@example.com', first_name='Ross', last_name='Lawley').save()
I ran foo.py. I then listed the indexes using the following command and saw two: one for the primary key and one other index that I don't understand.
[bmbouter@server ~]$ mongo tumblelog --eval 'db.user.getIndexes()'
MongoDB shell version: 2.4.6
connecting to: tumblelog
[object Object],[object Object]
Then I added some indexes to foo.py and ran it again:
from mongoengine import *

connect('tumblelog')

class User(Document):
    meta = {'indexes': ['first_name', 'last_name']}
    email = StringField(required=True)
    first_name = StringField(max_length=50)
    last_name = StringField(max_length=50)

User(email='another@example.com', first_name='Some', last_name='User').save()
When I listed the indexes again, I saw the two additional ones on the collection.
[bmbouter@server ~]$ mongo tumblelog --eval 'db.user.getIndexes()'
MongoDB shell version: 2.4.6
connecting to: tumblelog
[object Object],[object Object],[object Object],[object Object]
I suspect that connect() automatically calls createIndex() or ensureIndex() (MongoDB later deprecated ensureIndex() in favor of createIndex()). Checking the indexes on each connection is a mongoengine behavior, which can be disabled by setting auto_create_index to False in the meta of the model definition. I think we should leave it set to True, and remove the pulp-db and revert this story.
This would have been much more useful two days ago, but I assumed mongoengine didn't do this correctly then.
Updated by bmbouter over 9 years ago
Disabling auto_create_index could have performance gains. Here is a quick analysis of the overhead of auto_create_index. http://fpaste.org/211905/
Updated by bmbouter over 9 years ago
After some IRC discussion, we are going to leave in place the PR [0] that runs ensure_indexes() on all mongoengine models at the end of pulp-manage-db. This moves any index creation on big collections to pulp-manage-db time instead of runtime. We are also choosing to leave auto_create_index at its default (enabled) so that if Pulp developers fail to add a model to the pulp-manage-db codepath, mongoengine will still auto-create the missing indexes at runtime.
Updated by bmbouter over 9 years ago
- Groomed set to Yes
- Sprint Candidate set to No
- Tags deleted (Groomed)