Issue #6806
[pulp2] "BSON too large" error when unassociating from large repo (closed)
Description
When a repository contains a very large number of units (hundreds of thousands) and a user tries to remove content from it, MongoDB fails with a "BSON document too large" error.
Here's the reported traceback:

Apr 28 04:20:11 pulp-03 pulp: pulp.server.async.tasks:INFO: [e8e1b784] Task failed : [e8e1b784-c07e-4df4-a687-df9f858dea77]
Apr 28 04:20:11 pulp-03 pulp: celery.app.trace:ERROR: [e8e1b784] (43211-12032) Task pulp.server.managers.repo.unit_association.unassociate_by_criteria[e8e1b784-c07e-4df4-a687-df9f858dea77] raised unexpected: DocumentTooLarge('BSON document too large (17039341 bytes) - the connected serversupports BSON document sizes up to 16777216 bytes.',)
Apr 28 04:20:11 pulp-03 pulp: celery.app.trace:ERROR: [e8e1b784] (43211-12032) Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/celery/app/trace.py", line 367, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/pulp/server/async/tasks.py", line 529, in __call__
    return super(Task, self).__call__(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/pulp/server/async/tasks.py", line 107, in __call__
    return super(PulpTask, self).__call__(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/celery/app/trace.py", line 622, in __protected_call__
    return self.run(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/pulp/server/managers/repo/unit_association.py", line 359, in unassociate_by_criteria
    unassociate_units = load_associated_units(repo_id, criteria)
  File "/usr/lib/python2.7/site-packages/pulp/server/managers/repo/unit_association.py", line 443, in load_associated_units
    associate_us = association_query_manager.get_units(source_repo_id, criteria=criteria)
  File "/usr/lib/python2.7/site-packages/pulp/server/managers/repo/unit_association_query.py", line 160, in get_units
    return list(units_generator)
  File "/usr/lib/python2.7/site-packages/pulp/server/managers/repo/unit_association_query.py", line 530, in _merged_units_unique_units
    for unit in associated_units:
  File "/usr/lib64/python2.7/site-packages/pymongo/cursor.py", line 1097, in next
    if len(self.__data) or self._refresh():
  File "/usr/lib64/python2.7/site-packages/pymongo/cursor.py", line 1019, in _refresh
    self.__read_concern))
  File "/usr/lib64/python2.7/site-packages/pymongo/cursor.py", line 850, in __send_message
    **kwargs)
  File "/usr/lib64/python2.7/site-packages/pymongo/mongo_client.py", line 794, in _send_message_with_response
    exhaust)
  File "/usr/lib64/python2.7/site-packages/pymongo/mongo_client.py", line 805, in _reset_on_error
    return func(*args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/pymongo/server.py", line 119, in send_message_with_response
    sock_info.send_message(data, max_doc_size)
  File "/usr/lib64/python2.7/site-packages/pymongo/pool.py", line 234, in send_message
    (max_doc_size, self.max_bson_size))
DocumentTooLarge: BSON document too large (17039341 bytes) - the connected serversupports BSON document sizes up to 16777216 bytes.
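To put the sizes in the traceback in context, the sketch below estimates, following the BSON specification, how many bytes an array of ID strings takes once encoded, which is roughly what an _id $in clause carries. It assumes each unit ID is a 36-character UUID string and ignores the other fields of the query; both are assumptions for illustration, and bson_string_array_size is a hypothetical helper, not part of Pulp or pymongo.

```python
# Max BSON document size reported by the server in the traceback (16 MiB).
MAX_BSON_SIZE = 16 * 1024 * 1024  # 16777216


def bson_string_array_size(values):
    """Return the encoded size in bytes of a BSON array of UTF-8 strings.

    Per the BSON spec, an array is a document whose keys are "0", "1", ...
    Each string element is: 1 type byte (0x02) + key as a NUL-terminated
    cstring + 4-byte length + UTF-8 bytes + trailing NUL. The enclosing
    document adds a 4-byte length prefix and a terminating 0x00 byte.
    """
    size = 4 + 1  # int32 document length + terminating 0x00
    for index, value in enumerate(values):
        key = str(index).encode("ascii")
        data = value.encode("utf-8")
        size += 1 + (len(key) + 1) + 4 + (len(data) + 1)
    return size


# Assumption: unit IDs are 36-character UUID strings.
print(bson_string_array_size(["x" * 36] * 350000))  # 17038895
```

Under these assumptions, 350,000 IDs already encode to 17,038,895 bytes, above the 16,777,216-byte limit, while 300,000 IDs stay below it, which is consistent with the failure appearing only for repositories holding hundreds of thousands of units.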
When unassociating from large repos, Pulp matches repository units to their content units by first querying all units associated with the repository and then querying the content units that match the unit filters plus an _id $in clause listing every unit ID returned by the repository-units query. If that list is large enough, the query exceeds MongoDB's maximum BSON document size and fails with a "BSON document too large" error. This commit changes how the database is queried: the unit IDs are sent in batches, so no single query grows too large. Because the resulting units are yielded from the method, there is no noticeable difference outside of it.
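The batching approach described above can be sketched as a generator. This is a minimal illustration, not the code from the commit: chunked and find_units_in_batches are hypothetical names, find stands for any callable that accepts a MongoDB query document and returns matching documents (for example a pymongo Collection.find bound method), and batch_size=10000 is an arbitrary default. The ID clause is combined with the unit filters through $and, so an _id condition already present in the filters is not overwritten.

```python
from itertools import islice


def chunked(iterable, size):
    """Yield successive lists of at most `size` items from `iterable`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def find_units_in_batches(find, unit_filters, unit_ids, batch_size=10000):
    """Yield units matching `unit_filters` whose _id is in `unit_ids`.

    Hypothetical helper: each query carries at most `batch_size` IDs, so its
    encoded size stays bounded no matter how many units the repo holds.
    """
    for batch in chunked(unit_ids, batch_size):
        query = {"$and": [unit_filters, {"_id": {"$in": batch}}]}
        # Yielding keeps callers unchanged: they still just iterate over units.
        for unit in find(query):
            yield unit


# In-memory stand-in for Collection.find; it only understands the query shape
# built above, which is enough to exercise the batching logic.
UNITS = [{"_id": i, "arch": "noarch" if i % 2 == 0 else "x86_64"} for i in range(25)]
QUERIES = []


def fake_find(query):
    QUERIES.append(query)
    filters, id_clause = query["$and"]
    wanted = set(id_clause["_id"]["$in"])
    return [
        unit for unit in UNITS
        if unit["_id"] in wanted
        and all(unit.get(key) == value for key, value in filters.items())
    ]


found = list(find_units_in_batches(fake_find, {"arch": "noarch"}, range(25), batch_size=10))
print([unit["_id"] for unit in found])  # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24]
print(len(QUERIES))  # 3
```

Here 25 IDs with batch_size=10 produce three queries of 10, 10 and 5 IDs. The trade-off is more round trips to MongoDB in exchange for a hard upper bound on the size of each query.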
closes #6806