Issue #2540
closed
Syncing repo with 200,000+ RPMs causes a BSONObj size limit exception
Description
Description as copied from Bugzilla:
Running on RHEL 6 with MongoDB 2.4 on a very large Satellite install (~200,000 RPMs), a sync can fail with this error:
Traceback (most recent call last):
File "/usr/lib/python2.6/site-packages/celery/app/trace.py", line 240, in trace_task
R = retval = fun(*args, **kwargs)
File "/usr/lib/python2.6/site-packages/pulp/server/async/tasks.py", line 473, in __call__
return super(Task, self).__call__(*args, **kwargs)
File "/usr/lib/python2.6/site-packages/pulp/server/async/tasks.py", line 103, in __call__
return super(PulpTask, self).__call__(*args, **kwargs)
File "/usr/lib/python2.6/site-packages/celery/app/trace.py", line 437, in __protected_call__
return self.run(*args, **kwargs)
File "/usr/lib/python2.6/site-packages/pulp/server/controllers/repository.py", line 810, in sync
raise pulp_exceptions.PulpExecutionException(_('Importer indicated a failed response'))
PulpExecutionException: Importer indicated a failed response
'command SON([(''mapreduce'', u''units_rpm''), (''map'', Code("
function() {
var key_fields = [this.name, this.epoch, this.version, this.release, this.arch]
emit(key_fields.join(''-''), {ids: [this._id]});
}
", {})), (''reduce'', Code("
function (key, values) {
// collect mapped values into the first value to build the list of ids for this key/nevra
var collector = values[0]
// since collector is values[0] start this loop at index 1
// reduce isn''t called if map only emits one result for key,
// so there is at least one value to collect
for(var i = 1; i < values.length; i++) {
collector.ids = collector.ids.concat(values[i].ids)
}
return collector
}
", {})), (''out'', {''inline'': 1}), (''query'', {}), (''finalize'', Code("
function (key, reduced) {
if (reduced.ids.length > 1) {
return reduced;
}
// if there''s only one value after reduction, this key is useless
// undefined is implicitly returned here, which saves space
}
", {}))]) on namespace pulp_database.$cmd failed: exception: BSONObj size: 18210078 (0x1EDD1501) is invalid. Size must be between 0 and 16793600(16MB) First element: 0:
{ _id: "0ad-0-0.0.20-4.el7-x86_64", value: null }'
The mapreduce code seen in that traceback appears here:
https://github.com/pulp/pulp_rpm/blob/2.12-dev/plugins/pulp_rpm/plugins/importers/yum/purge.py#L466-L479
It should only run when "annotate" is unavailable in MongoDB, which indicates MongoDB 2.4 or lower, as is the case on EL6. This problem shouldn't occur when "annotate" is available. The most notable difference between the two methods is that with "annotate" MongoDB can return a cursor for gathering results iteratively, whereas the "mapreduce" method returns all of the mapped/reduced data in a single document. That single document is subject to the BSON size limit reported in the error above (18210078 bytes against a maximum of 16793600).
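For illustration, here is a minimal pure-Python sketch of what the map, reduce and finalize functions in the traceback compute together: group unit ids by NEVRA and keep only the keys that have more than one id. It is not the code from purge.py. The names `nevra_key` and `iter_duplicate_nevra_ids` and the sample units are hypothetical, and the sketch works on plain dicts rather than a MongoDB collection. It is written as a generator to show the iterative style of consuming results, one duplicate group at a time, instead of receiving every key in one response.

```python
from collections import defaultdict

# Same fields, in the same order, as key_fields in the map function.
NEVRA_FIELDS = ("name", "epoch", "version", "release", "arch")


def nevra_key(unit):
    """Build the grouping key the way the map function does: fields joined by '-'."""
    return "-".join(str(unit[field]) for field in NEVRA_FIELDS)


def iter_duplicate_nevra_ids(units):
    """Yield (key, ids) for every NEVRA shared by more than one unit.

    Grouping corresponds to map + reduce; skipping single-id keys
    corresponds to finalize.
    """
    ids_by_key = defaultdict(list)
    for unit in units:
        ids_by_key[nevra_key(unit)].append(unit["_id"])
    for key, ids in ids_by_key.items():
        if len(ids) > 1:
            yield key, ids


# Hypothetical sample data: two units share a NEVRA, one does not.
units = [
    {"_id": 1, "name": "0ad", "epoch": "0", "version": "0.0.20",
     "release": "4.el7", "arch": "x86_64"},
    {"_id": 2, "name": "0ad", "epoch": "0", "version": "0.0.20",
     "release": "4.el7", "arch": "x86_64"},
    {"_id": 3, "name": "bash", "epoch": "0", "version": "4.2.46",
     "release": "20.el7", "arch": "x86_64"},
]

duplicates = list(iter_duplicate_nevra_ids(units))
print(duplicates)  # [('0ad-0-0.0.20-4.el7-x86_64', [1, 2])]
```

Note that this sketch still holds the whole grouping in memory on the client. It only illustrates the logic and the one-group-at-a-time consumption pattern, not a server-side fix for the size limit.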