Issue #2540

closed

Syncing repo with 200,000+ RPMs causes a BSONObj size limit exception

Added by semyers about 7 years ago. Updated almost 5 years ago.

Status: CLOSED - WONTFIX
Priority: High
Sprint/Milestone: -
Start date:
Due date:
Estimated time:
Severity: 3. High
Version:
Platform Release:
OS:
Triaged: Yes
Groomed: No
Sprint Candidate: No
Tags: Pulp 2
Sprint: Sprint 16
Quarter:

Description

Description as copied from Bugzilla:

Running on RHEL 6 with MongoDB 2.4 on a very large Satellite installation (~200,000 RPMs), a repository sync can fail with the following error:

    Traceback (most recent call last):
      File "/usr/lib/python2.6/site-packages/celery/app/trace.py", line 240, in trace_task
        R = retval = fun(*args, **kwargs)
      File "/usr/lib/python2.6/site-packages/pulp/server/async/tasks.py", line 473, in __call__
        return super(Task, self).__call__(*args, **kwargs)
      File "/usr/lib/python2.6/site-packages/pulp/server/async/tasks.py", line 103, in __call__
        return super(PulpTask, self).__call__(*args, **kwargs)
      File "/usr/lib/python2.6/site-packages/celery/app/trace.py", line 437, in __protected_call__
        return self.run(*args, **kwargs)
      File "/usr/lib/python2.6/site-packages/pulp/server/controllers/repository.py", line 810, in sync
        raise pulp_exceptions.PulpExecutionException(_('Importer indicated a failed response'))
    PulpExecutionException: Importer indicated a failed response

    command SON([('mapreduce', u'units_rpm'), ('map', Code("
        function() {
            var key_fields = [this.name, this.epoch, this.version, this.release, this.arch]
            emit(key_fields.join('-'), {ids: [this._id]});
        }
        ", {})), ('reduce', Code("
        function (key, values) {
          // collect mapped values into the first value to build the list of ids for this key/nevra
          var collector = values[0]
          // since collector is values[0] start this loop at index 1
          // reduce isn't called if map only emits one result for key,
          // so there is at least one value to collect
          for(var i = 1; i < values.length; i++) {
            collector.ids = collector.ids.concat(values[i].ids)
          }
          return collector
        }
        ", {})), ('out', {'inline': 1}), ('query', {}), ('finalize', Code("
        function (key, reduced) {
            if (reduced.ids.length > 1) {
                return reduced;
            }
            // if there's only one value after reduction, this key is useless
            // undefined is implicitly returned here, which saves space
        }
        ", {}))]) on namespace pulp_database.$cmd failed: exception: BSONObj size: 18210078 (0x1EDD1501) is invalid. Size must be between 0 and 16793600(16MB) First element: 0: { _id: "0ad-0-0.0.20-4.el7-x86_64", value: null }
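The numbers in this error can be checked against the BSON encoding rules. The sketch below uses hypothetical helpers that are not part of Pulp or MongoDB. It estimates the size of a mapReduce inline results array built from `{_id: <key>, value: null}` entries, which is the shape of the "First element" in the message. Entries for real duplicates carry an `ids` list instead of null and are larger, so treat the result as a lower bound. The 60-character keys in the usage are made-up illustration data, not measurements from this install.

```python
# Sketch: estimate the BSON size of a mapReduce inline "results" array whose
# entries look like {_id: "<nevra>", value: null}, following the BSON spec.
# Duplicate keys carry an ids list instead of null, so this is a lower bound.

MAX_BSON_SIZE = 16793600  # limit quoted in the error message


def cstring_size(name: str) -> int:
    # e_name is a NUL-terminated UTF-8 string
    return len(name.encode("utf-8")) + 1


def string_element_size(name: str, value: str) -> int:
    # type byte + e_name + int32 length + UTF-8 bytes + trailing NUL
    return 1 + cstring_size(name) + 4 + len(value.encode("utf-8")) + 1


def null_element_size(name: str) -> int:
    # type byte + e_name; a BSON null has no payload
    return 1 + cstring_size(name)


def entry_size(key: str) -> int:
    # document = int32 total length + elements + trailing NUL
    return 4 + string_element_size("_id", key) + null_element_size("value") + 1


def results_array_size(keys) -> int:
    # A BSON array is a document whose field names are "0", "1", "2", ...
    body = 0
    for index, key in enumerate(keys):
        body += 1 + cstring_size(str(index)) + entry_size(key)
    return 4 + body + 1


if __name__ == "__main__":
    print(entry_size("0ad-0-0.0.20-4.el7-x86_64"))  # 47
    hypothetical_keys = ["x" * 60] * 200000  # illustration data only
    size = results_array_size(hypothetical_keys)
    print(size, size > MAX_BSON_SIZE)  # 17888895 True
```

With these hypothetical 60-character keys, 200,000 null-valued entries alone come to about 17.9 MB. That is above the 16793600-byte limit, which suggests why a repository of this size can hit the error even though only duplicate NEVRAs carry useful data.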

The mapreduce code shown in that error message comes from:
https://github.com/pulp/pulp_rpm/blob/2.12-dev/plugins/pulp_rpm/plugins/importers/yum/purge.py#L466-L479
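The following pure-Python sketch makes the linked code's behavior concrete by showing what its map, reduce, and finalize functions compute. The function names are hypothetical, and this is an illustration, not the pulp_rpm implementation. Note that finalize cannot drop a key. A NEVRA seen only once still appears in the inline output with a null value, matching the `value: null` entry in the error message. As a result, the output grows with the total number of units rather than with the number of duplicates.

```python
# Pure-Python sketch of what the mapreduce functions compute (illustration
# only; hypothetical names, not the pulp_rpm implementation).

NEVRA_FIELDS = ("name", "epoch", "version", "release", "arch")


def map_unit(unit):
    # Like map(): emit the NEVRA joined with '-' as the key
    key = "-".join(str(unit[field]) for field in NEVRA_FIELDS)
    return key, {"ids": [unit["_id"]]}


def reduce_values(key, values):
    # Like reduce(): concatenate every emitted ids list into the first value
    collector = {"ids": list(values[0]["ids"])}
    for value in values[1:]:
        collector["ids"].extend(value["ids"])
    return collector


def finalize(key, reduced):
    # Like finalize(): keep only keys shared by more than one unit;
    # JavaScript's implicit undefined is represented here as None
    if len(reduced["ids"]) > 1:
        return reduced
    return None


def map_reduce_inline(units):
    emitted = {}  # insertion-ordered (Python 3.7+)
    for unit in units:
        key, value = map_unit(unit)
        emitted.setdefault(key, []).append(value)
    results = []
    for key, values in emitted.items():
        # reduce is skipped when a key was emitted only once
        reduced = values[0] if len(values) == 1 else reduce_values(key, values)
        # every key gets an entry, even when finalize returns None
        results.append({"_id": key, "value": finalize(key, reduced)})
    return results
```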

The mapreduce method should only be used when cursor-returning "aggregate" is unavailable in MongoDB, which indicates MongoDB 2.4 or older, as is the case on EL6. The problem should not occur when "aggregate" is available. The most notable difference between the two methods is that with "aggregate", MongoDB can return a cursor from which results are gathered iteratively. In contrast, "mapreduce" with inline output returns all of the mapped/reduced data in a single document, which is subject to the 16MB BSON size limit reported in the error above.
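Assuming the cursor-returning method referred to here is MongoDB's aggregation framework (the `aggregate` command), a duplicate-NEVRA search could be sketched as the pipeline below. The helper names are hypothetical and not taken from pulp_rpm. Unlike the mapreduce version, the `$match` stage discards non-duplicates entirely, and results are consumed one document at a time.

```python
# Sketch of an aggregation-based duplicate NEVRA search (hypothetical helper
# names; not taken from pulp_rpm). Only the pipeline is built here, so no
# MongoDB server is needed to run it.

NEVRA_FIELDS = ("name", "epoch", "version", "release", "arch")


def duplicate_nevra_pipeline(fields=NEVRA_FIELDS):
    return [
        # One output document per distinct NEVRA, with the matching unit ids
        {"$group": {
            "_id": {field: "$" + field for field in fields},
            "count": {"$sum": 1},
            "ids": {"$push": "$_id"},
        }},
        # Unlike the mapreduce finalize step, drop non-duplicates entirely
        {"$match": {"count": {"$gt": 1}}},
    ]


def iter_duplicate_id_lists(results):
    # 'results' can be any iterable of result documents, e.g. a cursor;
    # each document is consumed one at a time instead of as one big reply
    for doc in results:
        yield doc["ids"]
```

With PyMongo 3.x against MongoDB 2.6 or newer, `collection.aggregate(duplicate_nevra_pipeline(), allowDiskUse=True)` returns a `CommandCursor` that could be passed to `iter_duplicate_id_lists`. Each group is still a single BSON document, but the reply is no longer one document holding every result.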
