Issue #2483

closed

pulp_rpm migration 0031 fails: cursor not found

Added by mihai.ibanescu@gmail.com over 7 years ago. Updated about 5 years ago.

Status: CLOSED - CURRENTRELEASE
Priority: High
Assignee:
Category: -
Sprint/Milestone: -
Start date:
Due date:
Estimated time:
Severity: 3. High
Version: 2.10.3
Platform Release: 2.11.1
OS:
Triaged: Yes
Groomed: No
Sprint Candidate: No
Tags: Pulp 2
Sprint: Sprint 13
Quarter:

Description

I am migrating our production pulp instance from 2.7 to 2.10.

The data migration (triggered by pulp-manage-db) failed:

Applying migration pulp_rpm.plugins.migrations.0031_regenerate_repo_unit_counts failed.

Halting migrations due to a migration failure.
Cursor not found, cursor id: 102945255089
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/pulp/server/db/manage.py", line 194, in main
    return _auto_manage_db(options)
  File "/usr/lib/python2.7/site-packages/pulp/server/db/manage.py", line 257, in _auto_manage_db
    migrate_database(options)
  File "/usr/lib/python2.7/site-packages/pulp/server/db/manage.py", line 124, in migrate_database
    update_current_version=not options.test)
  File "/usr/lib/python2.7/site-packages/pulp/server/db/migrate/models.py", line 186, in apply_migration
    migration.migrate()
  File "/usr/lib/python2.7/site-packages/pulp_rpm/plugins/migrations/0031_regenerate_repo_unit_counts.py", line 48, in migrate
    for repo in repos_collection.find():
  File "/usr/lib64/python2.7/site-packages/pymongo/cursor.py", line 1114, in next
    if len(self.__data) or self._refresh():
  File "/usr/lib64/python2.7/site-packages/pymongo/cursor.py", line 1056, in _refresh
    self.__max_await_time_ms))
  File "/usr/lib64/python2.7/site-packages/pymongo/cursor.py", line 926, in __send_message
    codec_options=self.__codec_options)
  File "/usr/lib64/python2.7/site-packages/pymongo/helpers.py", line 123, in _unpack_response
    raise CursorNotFound(msg, 43, errobj)
CursorNotFound: Cursor not found, cursor id: 102945255089

The code for the migration:

def migrate(*args, **kwargs):
    """
    Perform the migration as described in this module's docblock.

    :param args:   unused
    :type  args:   list
    :param kwargs: unused
    :type  kwargs: dict
    """
    db = connection.get_database()
    repos_collection = db['repos']
    for repo in repos_collection.find():
        repo_id = repo['repo_id']
        rebuild_content_unit_counts(db, repo_id)

Some observations:

  • The code iterates over all repositories in Pulp. This is a bit strange: since the migration is part of pulp_rpm, one would expect it to affect only RPM repositories.
  • The production instance has over 100k repositories, and the migration fails after about 30 minutes. I believe the outer cursor that iterates over the repos collection times out.

I was able to make it work by pre-computing the repo_ids (which is fast) and then iterating over that Python list instead of a Mongo cursor:

def migrate(*args, **kwargs):
    """
    Perform the migration as described in this module's docblock.

    :param args:   unused
    :type  args:   list
    :param kwargs: unused
    :type  kwargs: dict
    """
    db = connection.get_database()
    repos_collection = db['repos']
    repo_ids = [x['repo_id'] for x in repos_collection.find()]
    for repo_id in sorted(repo_ids):
        rebuild_content_unit_counts(db, repo_id)

I think the moral of the story is to be mindful of timeouts when iterating over large data collections.
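
A minimal sketch of the workaround above, under a few assumptions: only the `repo_id` field is needed, so a projection is passed to `find()` (this is an addition, not part of the original fix), and `FakeReposCollection` is a hypothetical in-memory stand-in so the snippet runs without a MongoDB server. Against a real deployment, `repos_collection` would be the collection obtained from `connection.get_database()`.

```python
def collect_repo_ids(repos_collection):
    """Drain the cursor up front and return every repo_id as a sorted list."""
    # Request only repo_id so the documents are small and the cursor is
    # exhausted quickly, well before any server-side idle timeout.
    cursor = repos_collection.find({}, {'repo_id': 1, '_id': 0})
    return sorted(doc['repo_id'] for doc in cursor)


class FakeReposCollection(object):
    """Hypothetical stand-in for a pymongo collection (illustration only)."""

    def __init__(self, docs):
        self._docs = docs

    def find(self, spec=None, fields=None):
        # Mimic a projection: keep only keys whose projection value is truthy.
        wanted = [key for key, keep in (fields or {}).items() if keep]
        for doc in self._docs:
            yield ({k: doc[k] for k in wanted} if wanted else dict(doc))


repos = FakeReposCollection([
    {'_id': 1, 'repo_id': 'zoo', 'notes': {}},
    {'_id': 2, 'repo_id': 'alpha', 'notes': {}},
])
repo_ids = collect_repo_ids(repos)

# The slow per-repository work then runs over a plain list, with no open cursor.
for repo_id in repo_ids:
    pass  # rebuild_content_unit_counts(db, repo_id) in the real migration
```

With 100k repositories the list of ids is small enough to hold in memory, which is what makes this approach practical here.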

I am not afraid of touching Python code, but I think this could be a serious problem for an admin with higher standards.

Actions #1

Updated by mhrivnak over 7 years ago

The migration was added here in 2.8.5: https://pulp.plan.io/issues/1979

Actions #2

Updated by dkliban@redhat.com over 7 years ago

When we faced a similar problem in the past, we reduced the number of documents the Mongo cursor returns at a time. This way the database is queried more often, but the cursor's timeout limit is not reached. Here [0] is the initial fix we tried. We eventually reduced the batch size to 5.

[0] https://github.com/pulp/pulp/commit/f8644708e1ed15dc2d4b04f4edd77eb7bc873963
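
The batch-size approach described above could look roughly like the sketch below. `Cursor.batch_size()` is the pymongo call. Everything else (`process_in_small_batches`, `FakeCursor`, `FakeCollection`) is a hypothetical name introduced so the snippet runs without a database, and this is not the code from the linked commit. The value 5 is the batch size mentioned in the comment.

```python
def process_in_small_batches(repos_collection, handle_repo, batch_size=5):
    """Iterate the cursor in small batches so each batch is consumed quickly."""
    # A smaller batch means less work between round trips to the server,
    # so the cursor is less likely to sit idle past its timeout.
    cursor = repos_collection.find().batch_size(batch_size)
    for repo in cursor:
        handle_repo(repo['repo_id'])


class FakeCursor(object):
    """Hypothetical cursor that counts how many batches it hands out."""

    def __init__(self, docs):
        self._docs = docs
        self._batch_size = None
        self.fetches = 0

    def batch_size(self, size):
        self._batch_size = size
        return self  # pymongo's batch_size() also returns the cursor itself

    def __iter__(self):
        size = self._batch_size or len(self._docs) or 1
        for start in range(0, len(self._docs), size):
            self.fetches += 1  # one simulated round trip per batch
            for doc in self._docs[start:start + size]:
                yield doc


class FakeCollection(object):
    """Hypothetical collection returning a single shared FakeCursor."""

    def __init__(self, docs):
        self.cursor = FakeCursor(docs)

    def find(self):
        return self.cursor


collection = FakeCollection([{'repo_id': 'repo-%02d' % i} for i in range(12)])
handled = []
process_in_small_batches(collection, handled.append, batch_size=5)
```

The trade-off is the one noted in the comment: more queries against the database in exchange for shorter gaps between them.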

Actions #3

Updated by dkliban@redhat.com over 7 years ago

  • Sprint/Milestone set to 31
  • Triaged changed from No to Yes
Actions #4

Updated by bizhang over 7 years ago

  • Priority changed from Normal to High
  • Severity changed from 2. Medium to 3. High
Actions #5

Updated by daviddavis over 7 years ago

  • Status changed from NEW to ASSIGNED
  • Assignee set to daviddavis
Actions #6

Updated by daviddavis over 7 years ago

  • Status changed from ASSIGNED to POST

Added by daviddavis over 7 years ago

Revision 60aad393

Fix failing migration

This migration fails if there are too many records with the error: "CursorNotFound: Cursor not found". This is because there's a timeout in mongodb (by default the cursor timeout is 10 minutes). This change fetches all ids and iterates over them as a python list.

fixes #2483 https://pulp.plan.io/issues/2483

Actions #7

Updated by daviddavis over 7 years ago

  • Status changed from POST to MODIFIED
Actions #8

Updated by semyers over 7 years ago

  • Platform Release set to 2.11.1
Actions #9

Updated by semyers over 7 years ago

  • Status changed from MODIFIED to 5
Actions #10

Updated by semyers over 7 years ago

  • Status changed from 5 to CLOSED - CURRENTRELEASE
Actions #12

Updated by bmbouter about 6 years ago

  • Sprint set to Sprint 13
Actions #13

Updated by bmbouter about 6 years ago

  • Sprint/Milestone deleted (31)
Actions #14

Updated by bmbouter about 5 years ago

  • Tags Pulp 2 added
