Issue #2220

closed

Copying units between repositories hits DocumentTooLarge: BSON document too large, if source repo contains > 345,000 units of same type

Added by rmcgover almost 6 years ago. Updated over 2 years ago.

Status: CLOSED - CURRENTRELEASE
Priority: High
Assignee: -
Category: -
Sprint/Milestone: -
Start date:
Due date:
Estimated time:
Severity: 3. High
Version: 2.7.1
Platform Release: 2.21.1
OS: RHEL 6
Triaged: Yes
Groomed: Yes
Sprint Candidate: Yes
Tags: Pulp 2
Sprint: Sprint 63
Quarter:

Description

If a repo contains more than approximately 345,000 units of the same type, attempting to associate any units of that type through the API is likely to fail with an error such as:

Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/celery/app/trace.py", line 240, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/usr/lib/python2.6/site-packages/pulp/server/async/tasks.py", line 393, in __call__
    return super(Task, self).__call__(*args, **kwargs)
  File "/usr/lib/python2.6/site-packages/celery/app/trace.py", line 437, in __protected_call__
    return self.run(*args, **kwargs)
  File "/usr/lib/python2.6/site-packages/pulp/server/managers/repo/unit_association.py", line 204, in associate_from_repo
    associate_us = load_associated_units(source_repo_id, criteria)
  File "/usr/lib/python2.6/site-packages/pulp/server/managers/repo/unit_association.py", line 408, in load_associated_units
    associate_us = association_query_manager.get_units(source_repo_id, criteria=criteria)
  File "/usr/lib/python2.6/site-packages/pulp/server/managers/repo/unit_association_query.py", line 205, in get_units
    return list(units_generator)
  File "/usr/lib/python2.6/site-packages/pulp/server/managers/repo/unit_association_query.py", line 572, in _merged_units_unique_units
    for unit in associated_units:
  File "/usr/lib/python2.6/site-packages/pulp/server/managers/repo/unit_association_query.py", line 498, in _units_from_chained_cursors
    for element in cursor:
  File "/usr/lib64/python2.6/site-packages/pymongo/cursor.py", line 1058, in next
    if len(self.__data) or self._refresh():
  File "/usr/lib64/python2.6/site-packages/pymongo/cursor.py", line 1002, in _refresh
    self.__uuid_subtype))
  File "/usr/lib64/python2.6/site-packages/pymongo/cursor.py", line 915, in __send_message
    res = client._send_message_with_response(message, **kwargs)
  File "/usr/lib64/python2.6/site-packages/pymongo/mongo_replica_set_client.py", line 1676, in _send_message_with_response
    response = self.__try_read(member, msg, **kwargs)
  File "/usr/lib64/python2.6/site-packages/pymongo/mongo_replica_set_client.py", line 1561, in __try_read
    return self.__send_and_receive(member, msg, **kwargs)
  File "/usr/lib64/python2.6/site-packages/pymongo/mongo_replica_set_client.py", line 1534, in __send_and_receive
    rqst_id, data = self.__check_bson_size(msg, member.max_bson_size)
  File "/usr/lib64/python2.6/site-packages/pymongo/mongo_replica_set_client.py", line 1469, in __check_bson_size
    (max_doc_size, max_size))
DocumentTooLarge: BSON document too large (16820289 bytes) - the connected server supports BSON document sizes up to 16777216 bytes.

This occurs because the code in the traceback above generates a query containing the ID of every unit of the requested type in the source repo. BSON-encoded, this query exceeds the 16 MB limit (16,777,216 bytes) once the source repo holds more than approximately 345,000 units of that type, e.g.

$ python
Python 2.7.5 (default, Aug  9 2016, 05:27:46) 
[GCC 4.8.5 20150623 (Red Hat 4.8.5-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import bson
>>> len(bson.BSON.encode({"_id":{"$in":["24ec9b1a-d9fa-4f7d-a5d2-71dc6755a7e9"]*345000}}))
16793915
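
The figure above can be cross-checked without pymongo installed. The following is a minimal sketch that adds up the bytes according to the BSON layout rules (type byte, NUL-terminated key, int32 length prefix, trailing NUL); the function name in_query_bson_size is made up for illustration. It only models the {"_id": {"$in": [...]}} document from the interpreter session, so the real Pulp query and wire message (16820289 bytes in the traceback) will be somewhat larger.

```python
# Standalone estimate of the BSON size of {"_id": {"$in": [<id>, <id>, ...]}}.
# Hypothetical helper for illustration; not part of Pulp or pymongo.

MAX_BSON_SIZE = 16777216  # server limit quoted in the error message


def in_query_bson_size(count, id_length=36):
    """Return the encoded size in bytes for `count` string IDs of `id_length` bytes."""
    # A BSON array is a document whose keys are the decimal indexes "0", "1", ...
    # Fixed cost per element: type byte + key NUL + int32 length + string + NUL.
    per_element_fixed = 1 + 1 + 4 + id_length + 1
    # Variable cost per element: the digits of its index used as the key.
    key_bytes = sum(len(str(i)) for i in range(count))
    # Array document: int32 length + elements + trailing NUL.
    array_size = 4 + count * per_element_fixed + key_bytes + 1
    # Each wrapping document: int32 length + type byte + key + NUL + value + NUL.
    in_doc_size = 4 + 1 + len("$in") + 1 + array_size + 1
    return 4 + 1 + len("_id") + 1 + in_doc_size + 1


size = in_query_bson_size(345000)
print(size, size > MAX_BSON_SIZE)
```

With 36-character IDs, each element past index 99,999 costs 49 bytes (43 fixed plus a 6-digit key), which is why the limit is reached at roughly 345,000 units.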

To reproduce:

  • Create a repo with yum importer, e.g. "all-rpm-content"

  • Create another empty repo with yum importer, e.g. "target"

  • Import many RPMs into "all-rpm-content" (around 346,000, to be safely over the limit)

  • POST to:

    /pulp/api/v2/repositories/target/actions/associate/
    {
      "source_repo_id": "all-rpm-content",
      "criteria": {
        "type_ids": ["rpm"],
        "filters": {
          "unit": {
            "filename": "test-rpm.rpm"
          }
        }
      }
    }
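
For scripting the reproduction, the request path and body from the last step could be generated along these lines. This is only a sketch: build_associate_request is a hypothetical helper name, and actually sending the request (server host, credentials, HTTP client) is left out because none of that is specified in this report.

```python
import json


def build_associate_request(target_repo_id, source_repo_id, filename, type_id="rpm"):
    """Return (path, JSON body) for the associate call described in the steps above."""
    # Hypothetical helper: only assembles the request, it does not send it.
    path = "/pulp/api/v2/repositories/%s/actions/associate/" % target_repo_id
    body = {
        "source_repo_id": source_repo_id,
        "criteria": {
            "type_ids": [type_id],
            "filters": {"unit": {"filename": filename}},
        },
    }
    return path, json.dumps(body)


path, body = build_associate_request("target", "all-rpm-content", "test-rpm.rpm")
print(path)
print(body)
```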
    

Expected result: test-rpm.rpm is associated with the 'target' repo.

Actual result: association task fails with: DocumentTooLarge: BSON document too large

Although this was observed in Pulp 2.7, a review of the Pulp 2.10 code suggests it is likely to hit the same problem.
