Issue #2220
closed
Copying units between repositories hits DocumentTooLarge: BSON document too large, if source repo contains > 345,000 units of same type
Description
If a repo contains more than approximately 345,000 units of the same type, then attempting to associate any units of that type through the API is likely to fail with an error such as:
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/celery/app/trace.py", line 240, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/usr/lib/python2.6/site-packages/pulp/server/async/tasks.py", line 393, in __call__
    return super(Task, self).__call__(*args, **kwargs)
  File "/usr/lib/python2.6/site-packages/celery/app/trace.py", line 437, in __protected_call__
    return self.run(*args, **kwargs)
  File "/usr/lib/python2.6/site-packages/pulp/server/managers/repo/unit_association.py", line 204, in associate_from_repo
    associate_us = load_associated_units(source_repo_id, criteria)
  File "/usr/lib/python2.6/site-packages/pulp/server/managers/repo/unit_association.py", line 408, in load_associated_units
    associate_us = association_query_manager.get_units(source_repo_id, criteria=criteria)
  File "/usr/lib/python2.6/site-packages/pulp/server/managers/repo/unit_association_query.py", line 205, in get_units
    return list(units_generator)
  File "/usr/lib/python2.6/site-packages/pulp/server/managers/repo/unit_association_query.py", line 572, in _merged_units_unique_units
    for unit in associated_units:
  File "/usr/lib/python2.6/site-packages/pulp/server/managers/repo/unit_association_query.py", line 498, in _units_from_chained_cursors
    for element in cursor:
  File "/usr/lib64/python2.6/site-packages/pymongo/cursor.py", line 1058, in next
    if len(self.__data) or self._refresh():
  File "/usr/lib64/python2.6/site-packages/pymongo/cursor.py", line 1002, in _refresh
    self.__uuid_subtype))
  File "/usr/lib64/python2.6/site-packages/pymongo/cursor.py", line 915, in __send_message
    res = client._send_message_with_response(message, **kwargs)
  File "/usr/lib64/python2.6/site-packages/pymongo/mongo_replica_set_client.py", line 1676, in _send_message_with_response
    response = self.__try_read(member, msg, **kwargs)
  File "/usr/lib64/python2.6/site-packages/pymongo/mongo_replica_set_client.py", line 1561, in __try_read
    return self.__send_and_receive(member, msg, **kwargs)
  File "/usr/lib64/python2.6/site-packages/pymongo/mongo_replica_set_client.py", line 1534, in __send_and_receive
    rqst_id, data = self.__check_bson_size(msg, member.max_bson_size)
  File "/usr/lib64/python2.6/site-packages/pymongo/mongo_replica_set_client.py", line 1469, in __check_bson_size
    (max_doc_size, max_size))
DocumentTooLarge: BSON document too large (16820289 bytes) - the connected server supports BSON document sizes up to 16777216 bytes.
This occurs because the code in the traceback above generates a query containing the ID of every unit of the requested type in the source repo. BSON-encoded, that query exceeds the 16 MB limit (16,777,216 bytes) once the source repo holds more than roughly 345,000 units of that type, e.g.
$ python
Python 2.7.5 (default, Aug 9 2016, 05:27:46)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import bson
>>> len(bson.BSON.encode({"_id":{"$in":["24ec9b1a-d9fa-4f7d-a5d2-71dc6755a7e9"]*345000}}))
16793915
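The figure above can also be reproduced without the bson package by adding up the bytes BSON spends on each part of the document. The sketch below is only an estimate under stated assumptions: every ID is a 36-character ASCII string, the query consists of nothing but the `_id`/`$in` clause, and the function names are made up for illustration (they are not part of Pulp or PyMongo).

```python
MAX_BSON_SIZE = 16777216  # limit reported by the server in the traceback

def array_keys_size(n):
    """Bytes used by the array keys "0".."n-1", each NUL-terminated."""
    total, digits, start = 0, 1, 0
    while start < n:
        end = min(n, 10 ** digits)  # indices in [start, end) have `digits` digits
        total += (end - start) * (digits + 1)
        start = end
        digits += 1
    return total

def in_query_bson_size(n_ids, id_len=36):
    """Encoded size of {"_id": {"$in": [<id>] * n_ids}} (hypothetical helper)."""
    # Each string value: type byte + int32 length + UTF-8 bytes + NUL.
    per_value = 1 + 4 + id_len + 1
    # Each (sub)document: int32 size + elements + trailing NUL.
    array = 4 + n_ids * per_value + array_keys_size(n_ids) + 1
    inner = 4 + (1 + len("$in") + 1 + array) + 1
    outer = 4 + (1 + len("_id") + 1 + inner) + 1
    return outer

def max_ids_that_fit(limit=MAX_BSON_SIZE, id_len=36):
    """Largest ID count whose query document still fits under `limit`."""
    lo, hi = 0, limit  # a list can never hold more entries than bytes
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if in_query_bson_size(mid, id_len) <= limit:
            lo = mid
        else:
            hi = mid - 1
    return lo

print(in_query_bson_size(345000))  # 16793915, matching the session above
print(max_ids_that_fit())
```

The real query sent by Pulp carries more than this one clause (the traceback reports 16820289 bytes), so the actual threshold will differ somewhat from what this estimate gives.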
To reproduce:
- Create a repo with yum importer, e.g. "all-rpm-content"
- Create another empty repo with yum importer, e.g. "target"
- Import many RPMs into "all-rpm-content" (about 346,000, to be safely over the threshold)
- POST to /pulp/api/v2/repositories/target/actions/associate/ with the body:
  { 'source_repo_id' : 'all-rpm-content', 'criteria': { 'type_ids' : ['rpm'], 'filters' : { 'unit' : { 'filename': 'test-rpm.rpm' } } } }
Expected result: test-rpm.rpm is associated with 'target' repo.
Actual result: association task fails with: DocumentTooLarge: BSON document too large
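The request body in the reproduction steps is written with single quotes, which is Python dict notation rather than strict JSON. A minimal sketch of producing the equivalent JSON body is shown here; how it is sent (host, credentials, HTTP client) depends on the deployment and is deliberately left out.

```python
import json

# Same content as the body in the reproduction steps.
payload = {
    "source_repo_id": "all-rpm-content",
    "criteria": {
        "type_ids": ["rpm"],
        "filters": {"unit": {"filename": "test-rpm.rpm"}},
    },
}

# json.dumps emits double-quoted keys and strings, as JSON requires.
body = json.dumps(payload)
print(body)
```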
Although this was observed in Pulp 2.7, a review of the Pulp 2.10 code suggests it is likely to hit the same problem.
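One general way to keep an `$in` query under the document limit is to split the ID list into batches and issue one query per batch. The sketch below only illustrates that idea; it is not taken from the Pulp code base and makes no claim about how the problem is or should be fixed there. The helper names and the batch size of 50,000 are assumptions chosen for illustration.

```python
def chunked(ids, batch_size=50000):
    """Yield consecutive slices of `ids` with at most `batch_size` entries."""
    for start in range(0, len(ids), batch_size):
        yield ids[start:start + batch_size]

def batched_in_filters(unit_ids, batch_size=50000):
    """Build one {"_id": {"$in": [...]}} filter per batch (hypothetical helper)."""
    return [{"_id": {"$in": batch}} for batch in chunked(unit_ids, batch_size)]

# Example with placeholder IDs; each filter would be run as its own query
# and the results combined by the caller.
unit_ids = ["24ec9b1a-d9fa-4f7d-a5d2-71dc6755a7e9"] * 120000
filters = batched_in_filters(unit_ids)
print(len(filters), [len(f["_id"]["$in"]) for f in filters])
```

With 36-character IDs, a batch of 50,000 stays comfortably under the 16,777,216-byte limit, at the cost of several round trips instead of one.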