Issue #2483

closed

pulp_rpm migration 0031 fails: cursor not found

Added by mihai.ibanescu@gmail.com over 7 years ago. Updated about 5 years ago.

Status: CLOSED - CURRENTRELEASE
Priority: High
Assignee:
Category: -
Sprint/Milestone: -
Start date:
Due date:
Estimated time:
Severity: 3. High
Version: 2.10.3
Platform Release: 2.11.1
OS:
Triaged: Yes
Groomed: No
Sprint Candidate: No
Tags: Pulp 2
Sprint: Sprint 13
Quarter:

Description

I am migrating our production Pulp instance from 2.7 to 2.10.

The data migration (triggered by pulp-manage-db) failed:

Applying migration pulp_rpm.plugins.migrations.0031_regenerate_repo_unit_counts failed.

Halting migrations due to a migration failure.
Cursor not found, cursor id: 102945255089
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/pulp/server/db/manage.py", line 194, in main
    return _auto_manage_db(options)
  File "/usr/lib/python2.7/site-packages/pulp/server/db/manage.py", line 257, in _auto_manage_db
    migrate_database(options)
  File "/usr/lib/python2.7/site-packages/pulp/server/db/manage.py", line 124, in migrate_database
    update_current_version=not options.test)
  File "/usr/lib/python2.7/site-packages/pulp/server/db/migrate/models.py", line 186, in apply_migration
    migration.migrate()
  File "/usr/lib/python2.7/site-packages/pulp_rpm/plugins/migrations/0031_regenerate_repo_unit_counts.py", line 48, in migrate
    for repo in repos_collection.find():
  File "/usr/lib64/python2.7/site-packages/pymongo/cursor.py", line 1114, in next
    if len(self.__data) or self._refresh():
  File "/usr/lib64/python2.7/site-packages/pymongo/cursor.py", line 1056, in _refresh
    self.__max_await_time_ms))
  File "/usr/lib64/python2.7/site-packages/pymongo/cursor.py", line 926, in __send_message
    codec_options=self.__codec_options)
  File "/usr/lib64/python2.7/site-packages/pymongo/helpers.py", line 123, in _unpack_response
    raise CursorNotFound(msg, 43, errobj)
CursorNotFound: Cursor not found, cursor id: 102945255089

The code for the migration:

def migrate(*args, **kwargs):
    """
    Perform the migration as described in this module's docblock.

    :param args:   unused
    :type  args:   list
    :param kwargs: unused
    :type  kwargs: dict
    """
    db = connection.get_database()
    repos_collection = db['repos']
    for repo in repos_collection.find():
        repo_id = repo['repo_id']
        rebuild_content_unit_counts(db, repo_id)

Some observations:

  • The code iterates over all repositories in Pulp. This is a bit strange: since the migration is part of pulp_rpm, one would expect it to affect only RPM repositories.
  • The production instance has over 100k repositories, and the migration fails after about 30 minutes. I believe the outer cursor that iterates over the repos collection times out on the server while the slow per-repo rebuilds are running.
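
To illustrate the suspected failure mode without a real MongoDB, here is a minimal sketch using a stand-in cursor. Everything in it (FakeClock, FakeCursor, CursorExpired, the batch size, the per-repo timings) is hypothetical and only mimics the behaviour: the "server" drops a cursor when no new batch is requested within the timeout. The 600 second default mirrors MongoDB's default idle cursor timeout of 10 minutes.

```python
class CursorExpired(Exception):
    """Stand-in for pymongo.errors.CursorNotFound (hypothetical)."""


class FakeClock(object):
    """Manually advanced clock, so the sketch runs instantly."""

    def __init__(self):
        self.now = 0

    def __call__(self):
        return self.now

    def advance(self, seconds):
        self.now += seconds


class FakeCursor(object):
    """Toy cursor: expires if more than `timeout` seconds pass between batch fetches."""

    def __init__(self, docs, clock, timeout=600, batch_size=2):
        self._docs = list(docs)
        self._clock = clock
        self._timeout = timeout
        self._batch_size = batch_size
        self._pos = 0
        self._buffer = []
        self._last_fetch = clock()

    def __iter__(self):
        return self

    def __next__(self):
        if not self._buffer:
            if self._pos >= len(self._docs):
                raise StopIteration
            now = self._clock()
            if now - self._last_fetch > self._timeout:
                # The "server" has already forgotten this cursor.
                raise CursorExpired("cursor not found")
            self._buffer = self._docs[self._pos:self._pos + self._batch_size]
            self._pos += self._batch_size
            self._last_fetch = now
        return self._buffer.pop(0)

    next = __next__  # Python 2 spelling of the iterator protocol


def process_streaming(cursor, clock, seconds_per_repo):
    """Slow work inside the cursor loop, as in the original migration."""
    done = []
    for repo in cursor:
        clock.advance(seconds_per_repo)  # simulates a slow per-repo rebuild
        done.append(repo['repo_id'])
    return done


def process_materialized(cursor, clock, seconds_per_repo):
    """Drain the cursor first, then do the slow work on a plain list."""
    repo_ids = [repo['repo_id'] for repo in cursor]  # fast: no work between fetches
    done = []
    for repo_id in sorted(repo_ids):
        clock.advance(seconds_per_repo)
        done.append(repo_id)
    return done
```

With six repos, a batch size of 2 and 400 simulated seconds per repo, process_streaming raises CursorExpired when it asks for the second batch, while process_materialized finishes all six.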

I was able to make it work by pre-computing the repo_ids (which is fast) and then iterating over that Python list instead of a Mongo cursor:

def migrate(*args, **kwargs):
    """
    Perform the migration as described in this module's docblock.

    :param args:   unused
    :type  args:   list
    :param kwargs: unused
    :type  kwargs: dict
    """
    db = connection.get_database()
    repos_collection = db['repos']
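    # Read every repo_id up front (fast) so that no Mongo cursor stays open
    # during the slow per-repo rebuilds below.
    repo_ids = [x['repo_id'] for x in repos_collection.find()]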
    for repo_id in sorted(repo_ids):
        rebuild_content_unit_counts(db, repo_id)

I think the moral of the story is to be mindful of timeouts when iterating over large data collections.
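
For completeness, another way to guard against this kind of timeout might be to ask the server not to reap the idle cursor. This is only a sketch, not what I actually ran: it assumes pymongo 3.x, where find() accepts no_cursor_timeout, and the names rebuild_all and rebuild are hypothetical. The collection is passed in, so nothing here depends on Pulp itself.

```python
def rebuild_all(repos_collection, rebuild):
    """Call rebuild(repo_id) for every repo while keeping the cursor alive.

    repos_collection is expected to behave like a pymongo 3.x Collection;
    rebuild is any callable taking a repo_id (hypothetical helper).
    """
    cursor = repos_collection.find(
        {},                        # no filter: every repo, as in the migration
        {'repo_id': 1, '_id': 0},  # projection: fetch only the field we need
        no_cursor_timeout=True     # pymongo 3.x option: do not reap when idle
    )
    try:
        for repo in cursor:
            rebuild(repo['repo_id'])
    finally:
        # A no-timeout cursor is not cleaned up by the idle timeout,
        # so it has to be closed explicitly.
        cursor.close()
```

Inside the migration this would be called as rebuild_all(db['repos'], lambda repo_id: rebuild_content_unit_counts(db, repo_id)). Pre-computing the ids is simpler, though, and does not leave a long-lived cursor on the server if the process dies halfway.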

I am not afraid of touching Python code, but I think this could be a serious problem for an admin with higher standards.
