Issue #5131

Updated by dalley about 1 year ago

The recursive depsolving code needs to load the contents of an entire repository into libsolv, and it needs to do so quickly. At this scale, the overhead of creating a Python model object for every single content unit becomes significant, so instead we dump the data into Python dictionaries using .as_pymongo(). Additionally, we only need a subset of the fields for the depsolving process, so we use .only() to select just those fields, which reduces memory use and database load.

The version of mongoengine currently being used is 0.10.5, released in late 2015. It has a bug wherein combining .only() with .as_pymongo() on a model that has nested fields (e.g. a ListField) returns a list containing a single empty dictionary instead of the data it was supposed to return.
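The symptom can be checked for in the dictionaries that come back. A minimal sketch of such a guard, assuming the bad value shows up as the value of the nested field itself, and assuming a real value for that field is never a list holding exactly one empty dictionary. The helper name and the field names (requires, provides) are hypothetical, and the sample data is hand-written, not output from a real query.

```python
def find_mangled_fields(unit, nested_fields):
    """Return the names of nested fields whose value is exactly [{}].

    `unit` is a plain dict such as the ones .as_pymongo() yields;
    `nested_fields` names the list-valued fields to check.
    """
    return [name for name in nested_fields if unit.get(name) == [{}]]


# Hand-written sample units with hypothetical field names.
good = {"name": "foo", "requires": [{"name": "bar", "flags": "GE"}]}
bad = {"name": "foo", "requires": [{}]}

print(find_mangled_fields(good, ["requires", "provides"]))  # []
print(find_mangled_fields(bad, ["requires", "provides"]))   # ['requires']
```

A check like this in a test run on CentOS or RHEL would flag the problem without depending on which mongoengine version happens to be installed.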

The net result is that we cannot load that data properly using the current version of libsolv and the .only() method with the version of mongoengine currently being shipped. The workaround, which was heretofore undocumented, was to use .exclude() to blacklist the fields we don't want instead of .only() to whitelist the fields we do want. I removed this workaround accidentally during a refactor, but will re-introduce it now that I have discovered this problem.
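One way to keep the intent of a whitelist while still calling the blacklist method is to derive the excluded names from the wanted ones. A minimal sketch, not the code in Pulp: the helper fields_to_exclude and all field names are hypothetical, and the mongoengine call in the trailing comment is an assumed usage with a placeholder model.

```python
def fields_to_exclude(all_fields, wanted_fields):
    """Return the fields to blacklist so that only wanted_fields remain.

    Raises ValueError if a wanted field is not in all_fields, so a typo
    does not silently load every field.
    """
    all_fields = list(all_fields)
    wanted = set(wanted_fields)
    unknown = wanted - set(all_fields)
    if unknown:
        raise ValueError("unknown fields: %s" % ", ".join(sorted(unknown)))
    # Preserve the original field order in the result.
    return [name for name in all_fields if name not in wanted]


# Hypothetical field names for an RPM-like content unit.
ALL_FIELDS = ["name", "version", "release", "requires", "provides",
              "changelog", "files"]
WANTED = ["name", "version", "release", "requires", "provides"]

excluded = fields_to_exclude(ALL_FIELDS, WANTED)
print(excluded)  # ['changelog', 'files']

# Assumed usage with a mongoengine model (Model is a placeholder):
#   Model.objects.exclude(*excluded).as_pymongo()
```

Keeping the whitelist as the source of truth means that switching back to .only() after a mongoengine upgrade is a one-line change.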

The regression went unnoticed because, in our Fedora 28 dev environment, the version of mongoengine provided by Pulp is overridden by a newer version that does not contain this defect. Only CentOS and RHEL installations exhibit it.