Issue #2483
closed
pulp_rpm migration 0031 fails: cursor not found
Description
I am migrating our production pulp instance from 2.7 to 2.10.
The data migration (triggered by pulp-manage-db) failed:
Applying migration pulp_rpm.plugins.migrations.0031_regenerate_repo_unit_counts failed.
Halting migrations due to a migration failure.
Cursor not found, cursor id: 102945255089
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/pulp/server/db/manage.py", line 194, in main
    return _auto_manage_db(options)
  File "/usr/lib/python2.7/site-packages/pulp/server/db/manage.py", line 257, in _auto_manage_db
    migrate_database(options)
  File "/usr/lib/python2.7/site-packages/pulp/server/db/manage.py", line 124, in migrate_database
    update_current_version=not options.test)
  File "/usr/lib/python2.7/site-packages/pulp/server/db/migrate/models.py", line 186, in apply_migration
    migration.migrate()
  File "/usr/lib/python2.7/site-packages/pulp_rpm/plugins/migrations/0031_regenerate_repo_unit_counts.py", line 48, in migrate
    for repo in repos_collection.find():
  File "/usr/lib64/python2.7/site-packages/pymongo/cursor.py", line 1114, in next
    if len(self.__data) or self._refresh():
  File "/usr/lib64/python2.7/site-packages/pymongo/cursor.py", line 1056, in _refresh
    self.__max_await_time_ms))
  File "/usr/lib64/python2.7/site-packages/pymongo/cursor.py", line 926, in __send_message
    codec_options=self.__codec_options)
  File "/usr/lib64/python2.7/site-packages/pymongo/helpers.py", line 123, in _unpack_response
    raise CursorNotFound(msg, 43, errobj)
CursorNotFound: Cursor not found, cursor id: 102945255089
The code for the migration:
def migrate(*args, **kwargs):
    """
    Perform the migration as described in this module's docblock.

    :param args: unused
    :type args: list
    :param kwargs: unused
    :type kwargs: dict
    """
    db = connection.get_database()
    repos_collection = db['repos']
    for repo in repos_collection.find():
        repo_id = repo['repo_id']
        rebuild_content_unit_counts(db, repo_id)
Some observations:
- The code iterates over all repositories in Pulp. This is a bit strange: since the migration is part of pulp_rpm, one would expect it to affect only RPM repositories.
- The production instance has over 100k repositories, and the migration fails after about 30 minutes. I believe the outer cursor that iterates over the repos collection times out.
I was able to make it work by pre-computing the repo_ids (which is fast) and then iterating over that Python list instead of a Mongo cursor:
def migrate(*args, **kwargs):
    """
    Perform the migration as described in this module's docblock.

    :param args: unused
    :type args: list
    :param kwargs: unused
    :type kwargs: dict
    """
    db = connection.get_database()
    repos_collection = db['repos']
    repo_ids = [x['repo_id'] for x in repos_collection.find()]
    for repo_id in sorted(repo_ids):
        rebuild_content_unit_counts(db, repo_id)
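A possible refinement of the workaround above, as a sketch only: since only repo_id is needed, the query can ask for just that field, so the ID list is built from small documents. The helper names collect_repo_ids and rebuild_all are made up for illustration and are not part of pulp_rpm; the rebuild argument stands in for a call to rebuild_content_unit_counts(db, repo_id). The only pymongo call assumed is find(filter, projection).

```python
def collect_repo_ids(repos_collection):
    """Return every repo_id as a sorted list, draining the cursor up front.

    Hypothetical helper, not part of pulp_rpm.
    """
    # Projection: request only the repo_id field and drop _id, so the
    # documents sent back by the server stay small.
    cursor = repos_collection.find({}, {'repo_id': 1, '_id': 0})
    return sorted(doc['repo_id'] for doc in cursor)


def rebuild_all(repos_collection, rebuild):
    """Call rebuild(repo_id) for each repository and return how many were processed.

    The cursor is fully consumed before the slow per-repository work
    starts, so no server-side cursor sits idle during that work.
    """
    repo_ids = collect_repo_ids(repos_collection)
    for repo_id in repo_ids:
        rebuild(repo_id)
    return len(repo_ids)
```

Inside the migration this might be called as rebuild_all(db['repos'], lambda repo_id: rebuild_content_unit_counts(db, repo_id)). The trade-off is that all IDs are held in memory at once, which for around 100k short strings should be modest.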
I think the moral of the story is to be mindful of timeouts when iterating over large data collections.
I am not afraid of touching Python code, but I think this could be a serious problem for an admin with higher standards.
Updated by mhrivnak over 7 years ago
The migration was added here in 2.8.5: https://pulp.plan.io/issues/1979
Updated by dkliban@redhat.com over 7 years ago
When we faced a similar problem in the past, we reduced the number of documents the mongo cursor returns at a time. This way the database is queried more often, but the cursor's timeout limit is not reached. Here[0] is the initial fix we tried. We ended up reducing the batch size down to 5 eventually.
[0] https://github.com/pulp/pulp/commit/f8644708e1ed15dc2d4b04f4edd77eb7bc873963
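For illustration, the batch-size approach described above might look roughly like the sketch below. It is not the code from the linked commit; the function name iter_repo_ids is hypothetical, and the default of 5 simply mirrors the value mentioned in the comment. It relies on pymongo's Cursor.batch_size(), which limits how many documents the server returns per round trip, so the client goes back to the server more often and the cursor spends less time idle between fetches.

```python
def iter_repo_ids(repos_collection, batch_size=5):
    """Yield repo_id values while fetching only a few documents per round trip.

    Hypothetical helper for illustration; not taken from the Pulp code base.
    """
    # Cursor.batch_size() returns the cursor itself, so it can be chained.
    # Only the repo_id field is requested, since nothing else is used.
    cursor = repos_collection.find({}, {'repo_id': 1}).batch_size(batch_size)
    for doc in cursor:
        yield doc['repo_id']
```

This only helps if processing one batch finishes before the cursor's idle timeout expires. With very slow per-repository work, even a small batch could take too long, which is presumably why the batch size had to be reduced over time.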
Updated by dkliban@redhat.com over 7 years ago
- Sprint/Milestone set to 31
- Triaged changed from No to Yes
Updated by bizhang over 7 years ago
- Priority changed from Normal to High
- Severity changed from 2. Medium to 3. High
Updated by daviddavis over 7 years ago
- Status changed from NEW to ASSIGNED
- Assignee set to daviddavis
Updated by daviddavis over 7 years ago
- Status changed from ASSIGNED to POST
Added by daviddavis over 7 years ago
Updated by daviddavis over 7 years ago
- Status changed from POST to MODIFIED
Applied in changeset pulp_rpm:60aad3930af15721b6fd9769e4fc72680c579d8e.
Updated by semyers over 7 years ago
- Status changed from 5 to CLOSED - CURRENTRELEASE
Fix failing migration
This migration fails with the error "CursorNotFound: Cursor not found" if there are too many records. The cause is that MongoDB closes idle cursors after a timeout (10 minutes by default). This change fetches all ids up front and iterates over them as a Python list.
fixes #2483 https://pulp.plan.io/issues/2483