Project

Profile

Help

Issue #2584

closed

Mongo cursor times out during task pulp.server.managers.content.orphan.delete_all_orphans

Added by mhrivnak about 7 years ago. Updated about 5 years ago.

Status:
CLOSED - CURRENTRELEASE
Priority:
Normal
Assignee:
Category:
-
Sprint/Milestone:
-
Start date:
Due date:
Estimated time:
Severity:
2. Medium
Version:
2.8.7
Platform Release:
2.12.2
OS:
RHEL 7
Triaged:
Yes
Groomed:
No
Sprint Candidate:
No
Tags:
Pulp 2
Sprint:
Sprint 15
Quarter:

Description

On a very busy system, a user saw the cursor timeout:


Feb  2 04:13:52 satellite1 pulp: py.warnings:WARNING: (1955-62752) /usr/lib/python2.7/site-packages/pulp/server/managers/content/orphan.py:228: DeprecationWarning: remove is deprecated. Use delete_one or delete_many instead.
Feb  2 04:13:52 satellite1 pulp: py.warnings:WARNING: (1955-62752)   content_units_collection.remove(content_unit['_id'])
Feb  2 04:13:52 satellite1 pulp: py.warnings:WARNING: (1955-62752)
Feb  2 04:28:43 satellite1 pulp: celery.worker.job:ERROR: (1358-62752) Task pulp.server.managers.content.orphan.delete_all_orphans[ece93a94-907b-4c1a-b4a4-474a99893be2] raised unexpected: CursorNotFound('Cursor not found, cursor id: 163452381930',)
Feb  2 04:28:43 satellite1 pulp: celery.worker.job:ERROR: (1358-62752) Traceback (most recent call last):
Feb  2 04:28:43 satellite1 pulp: celery.worker.job:ERROR: (1358-62752)   File "/usr/lib/python2.7/site-packages/celery/app/trace.py", line 240, in trace_task
Feb  2 04:28:43 satellite1 pulp: celery.worker.job:ERROR: (1358-62752)     R = retval = fun(*args, **kwargs)
Feb  2 04:28:43 satellite1 pulp: celery.worker.job:ERROR: (1358-62752)   File "/usr/lib/python2.7/site-packages/pulp/server/async/tasks.py", line 473, in __call__
Feb  2 04:28:43 satellite1 pulp: celery.worker.job:ERROR: (1358-62752)     return super(Task, self).__call__(*args, **kwargs)
Feb  2 04:28:43 satellite1 pulp: celery.worker.job:ERROR: (1358-62752)   File "/usr/lib/python2.7/site-packages/pulp/server/async/tasks.py", line 103, in __call__
Feb  2 04:28:43 satellite1 pulp: celery.worker.job:ERROR: (1358-62752)     return super(PulpTask, self).__call__(*args, **kwargs)
Feb  2 04:28:43 satellite1 pulp: celery.worker.job:ERROR: (1358-62752)   File "/usr/lib/python2.7/site-packages/celery/app/trace.py", line 437, in __protected_call__
Feb  2 04:28:43 satellite1 pulp: celery.worker.job:ERROR: (1358-62752)     return self.run(*args, **kwargs)
Feb  2 04:28:43 satellite1 pulp: celery.worker.job:ERROR: (1358-62752)   File "/usr/lib/python2.7/site-packages/pulp/server/managers/content/orphan.py", line 170, in delete_all_orphans
Feb  2 04:28:43 satellite1 pulp: celery.worker.job:ERROR: (1358-62752)     OrphanManager.delete_orphans_by_type(content_type_id)
Feb  2 04:28:43 satellite1 pulp: celery.worker.job:ERROR: (1358-62752)   File "/usr/lib/python2.7/site-packages/pulp/server/managers/content/orphan.py", line 219, in delete_orphans_by_type
Feb  2 04:28:43 satellite1 pulp: celery.worker.job:ERROR: (1358-62752)     fields=['_id', '_storage_path']):
Feb  2 04:28:43 satellite1 pulp: celery.worker.job:ERROR: (1358-62752)   File "/usr/lib/python2.7/site-packages/pulp/server/managers/content/orphan.py", line 106, in generate_orphans_by_type
Feb  2 04:28:43 satellite1 pulp: celery.worker.job:ERROR: (1358-62752)     for content_unit in content_units_collection.find({}, projection=fields):
Feb  2 04:28:43 satellite1 pulp: celery.worker.job:ERROR: (1358-62752)   File "/usr/lib64/python2.7/site-packages/pymongo/cursor.py", line 1097, in next
Feb  2 04:28:43 satellite1 pulp: celery.worker.job:ERROR: (1358-62752)     if len(self.__data) or self._refresh():
Feb  2 04:28:43 satellite1 pulp: celery.worker.job:ERROR: (1358-62752)   File "/usr/lib64/python2.7/site-packages/pymongo/cursor.py", line 1039, in _refresh
Feb  2 04:28:43 satellite1 pulp: celery.worker.job:ERROR: (1358-62752)     self.__max_await_time_ms))
Feb  2 04:28:43 satellite1 pulp: celery.worker.job:ERROR: (1358-62752)   File "/usr/lib64/python2.7/site-packages/pymongo/cursor.py", line 903, in __send_message
Feb  2 04:28:43 satellite1 pulp: celery.worker.job:ERROR: (1358-62752)     codec_options=self.__codec_options)
Feb  2 04:28:43 satellite1 pulp: celery.worker.job:ERROR: (1358-62752)   File "/usr/lib64/python2.7/site-packages/pymongo/helpers.py", line 114, in _unpack_response
Feb  2 04:28:43 satellite1 pulp: celery.worker.job:ERROR: (1358-62752)     raise CursorNotFound(msg, 43, errobj)
Feb  2 04:28:43 satellite1 pulp: celery.worker.job:ERROR: (1358-62752) CursorNotFound: Cursor not found, cursor id: 163452381930

Related issues

Related to Pulp - Issue #3854: Mongo cursor times out during task pulp.server.managers.content.orphan.delete_all_orphansCLOSED - CURRENTRELEASEActions
Actions #1

Updated by mhrivnak about 7 years ago

While investigating options, I found that setting a batch size of 20 caused about a 10% slowdown of the task when iterating over about 50,000 RPMs, and not finding any orphans. That performance penalty might be worth it for the safety.

Otherwise, we could run with no batch size by default, catch the exception, and re-start the iteration with a batch size in case the cursor is lost. That would be more complex, but still fairly straight-forward. It would also mean the task taking a lot more time when encountering the error, but less time in the usual case when there is no error.

Actions #2

Updated by bizhang about 7 years ago

  • Sprint/Milestone set to 33
  • Triaged changed from No to Yes
Actions #3

Updated by mhrivnak about 7 years ago

Using a batch size of 100, I see virtually no measurable performance impact vs. not setting a batch size. Assuming the default 10 minute cursor timeout, that allows 6 seconds per loop iteration, which should be plenty. That would be an easy way to resolve this issue.

Actions #4

Updated by ttereshc about 7 years ago

+1 to batch size of 100. I had the similar experience in terms of performance while iterating over unit collections cursor with such batch size.

Actions #5

Updated by ttereshc about 7 years ago

  • Status changed from NEW to ASSIGNED
  • Assignee set to ttereshc

Added by ttereshc about 7 years ago

Revision d85f6647 | View on GitHub

Set batch_size for the cursor in orphan removal tasks

On a busy system a cursor for iterating over all units in a collection could be timed out during orphan removal. The batch size is set to 100 to make sure cursor is active, it also should not introduce any noticable performance impact.

closes #2584 https://pulp.plan.io/issues/2584

Added by ttereshc about 7 years ago

Revision d85f6647 | View on GitHub

Set batch_size for the cursor in orphan removal tasks

On a busy system a cursor for iterating over all units in a collection could be timed out during orphan removal. The batch size is set to 100 to make sure cursor is active, it also should not introduce any noticable performance impact.

closes #2584 https://pulp.plan.io/issues/2584

Actions #6

Updated by ttereshc about 7 years ago

  • Status changed from ASSIGNED to POST
Actions #7

Updated by ttereshc about 7 years ago

  • Status changed from POST to MODIFIED
Actions #8

Updated by semyers about 7 years ago

  • Platform Release set to 2.12.2
Actions #9

Updated by semyers about 7 years ago

  • Status changed from MODIFIED to 5
Actions #10

Updated by bizhang about 7 years ago

  • Status changed from 5 to CLOSED - CURRENTRELEASE
Actions #11

Updated by bmbouter about 6 years ago

  • Sprint set to Sprint 15
Actions #12

Updated by bmbouter about 6 years ago

  • Sprint/Milestone deleted (33)
Actions #13

Updated by daviddavis almost 6 years ago

  • Related to Issue #3854: Mongo cursor times out during task pulp.server.managers.content.orphan.delete_all_orphans added
Actions #14

Updated by bmbouter about 5 years ago

  • Tags Pulp 2 added

Also available in: Atom PDF