Issue #5058
ISO publish fails with BSON document too large
Status: CLOSED - CURRENTRELEASE
Priority: Normal
Assignee: -
Category: -
Sprint/Milestone: -
Start date:
Due date:
Estimated time:
Severity: 2. Medium
Version:
Platform Release: 2.21.0
OS:
Triaged: Yes
Groomed: No
Sprint Candidate: No
Tags: Pulp 2
Sprint:
Quarter:
Description
A scaling issue has been discovered when publishing ISOs via the fast-forward publish path:
BSON document too large (20946918 bytes) - the connected serversupports BSON document sizes up to 16777216 bytes.
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/pulp/plugins/file/distributor.py", line 181, in publish_repo_fast_forward
    unit_absent_set = publish_conduit.get_units(criteria=criteria)
  File "/usr/lib/python2.7/site-packages/pulp/plugins/conduits/mixins.py", line 173, in get_units
    return do_get_repo_units(self.repo_id, criteria, self.exception_class, as_generator)
  File "/usr/lib/python2.7/site-packages/pulp/plugins/conduits/mixins.py", line 704, in do_get_repo_units
    return list(_transfer_object_generator())
  File "/usr/lib/python2.7/site-packages/pulp/plugins/conduits/mixins.py", line 691, in _transfer_object_generator
    for u in units:
  File "/usr/lib/python2.7/site-packages/pulp/server/managers/repo/unit_association_query.py", line 530, in _merged_units_unique_units
    for unit in associated_units:
  File "/usr/lib64/python2.7/site-packages/pymongo/cursor.py", line 1097, in next
    if len(self.__data) or self._refresh():
  File "/usr/lib64/python2.7/site-packages/pymongo/cursor.py", line 1019, in _refresh
    self.__read_concern))
  File "/usr/lib64/python2.7/site-packages/pymongo/cursor.py", line 850, in __send_message
    **kwargs)
  File "/usr/lib64/python2.7/site-packages/pymongo/mongo_client.py", line 794, in _send_message_with_response
    exhaust)
  File "/usr/lib64/python2.7/site-packages/pymongo/mongo_client.py", line 805, in _reset_on_error
    return func(*args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/pymongo/server.py", line 119, in send_message_with_response
    sock_info.send_message(data, max_doc_size)
  File "/usr/lib64/python2.7/site-packages/pymongo/pool.py", line 228, in send_message
    (max_doc_size, self.max_bson_size))
DistributorConduitException: BSON document too large (20946918 bytes) - the connected serversupports BSON document sizes up to 16777216 bytes.
As the Content Delivery team noticed, the problem originates here:
/usr/lib/python2.7/site-packages/pulp/plugins/file/distributor.py
    # Copy incremental files into publishing directories
    checksum_absent_set = unit_checksum_set - unit_checksum_old_set
    criteria = UnitAssociationCriteria(
        unit_filters={'checksum': {"$in": list(checksum_absent_set)}})
    unit_absent_set = publish_conduit.get_units(criteria=criteria)
    for unit in unit_absent_set:
        links_to_create = self.get_paths_for_unit(unit)
        self._symlink_unit(build_dir, unit, links_to_create)
There's a limit to how large a single MongoDB query can be. If checksum_absent_set contains too many elements, the query in the code above exceeds that limit and crashes. We apparently have enough items in redhat-sigstore to hit this limit.
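To get a rough feel for how quickly the `$in` list grows, the sketch below estimates the encoded size of an array of strings using the array layout from the BSON specification. It is an illustration only: `estimate_in_array_bson_size` and `MAX_BSON_SIZE` are hypothetical names, the 64-character checksums are an assumption, and the real query carries additional fields, so the actual message is larger than this estimate.

```python
# Limit reported in the error message above (bytes).
MAX_BSON_SIZE = 16777216


def estimate_in_array_bson_size(values):
    """Estimate the BSON-encoded size, in bytes, of an array of strings.

    A BSON array is a document whose keys are the decimal indexes
    "0", "1", "2", ... so every element also pays for its own key.
    """
    size = 4 + 1  # int32 length prefix + trailing NUL of the array document
    for index, value in enumerate(values):
        key = str(index)
        size += 1                                    # element type byte (string)
        size += len(key) + 1                         # key as NUL-terminated cstring
        size += 4 + len(value.encode("utf-8")) + 1   # int32 length + bytes + NUL
    return size


# Assumed input: 1000 made-up 64-character hex checksums.
checksums = ["%064x" % i for i in range(1000)]
estimate = estimate_in_array_bson_size(checksums)
print("estimated bytes: %d, under limit: %s" % (estimate, estimate < MAX_BSON_SIZE))
```

The per-element overhead (type byte, index key, length prefix, terminators) is why the list reaches the limit well before the raw checksum text alone would.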
Fixing ISO publish fails with BSON document too large
There's a limit to how large a single MongoDB query can be (16 MiB, i.e. 16777216 bytes). If too many (>134k) units are published, the query to fetch the units exceeds that limit and crashes.
This fix sets a threshold to avoid generating a query that is too large and crashing.
If the threshold is exceeded, fall back to a regular full publish, since fast-forward would offer little benefit anyway.
ref #5058 https://pulp.plan.io/issues/5058
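The fallback described in this commit message can be sketched roughly as follows. This is not the code from the actual fix: `choose_publish_mode` and `FAST_FORWARD_MAX_UNITS` are hypothetical names, the threshold value is made up, and whether the real fix counts only the newly added units or all published units is not stated here.

```python
# Hypothetical threshold; the value used by the actual fix is not given above.
FAST_FORWARD_MAX_UNITS = 100000


def choose_publish_mode(checksums_now, checksums_previous,
                        max_units=FAST_FORWARD_MAX_UNITS):
    """Decide between an incremental (fast-forward) and a full publish.

    Returns ("fast_forward", absent_checksums) when the set of new
    checksums is small enough to query safely, else ("full", None).
    """
    absent = set(checksums_now) - set(checksums_previous)
    if len(absent) > max_units:
        # Too many units for a single $in query: skip fast-forward.
        return "full", None
    return "fast_forward", absent


mode, absent = choose_publish_mode({"a", "b"}, {"a"})
print(mode)
```

Checking the count before building the criteria means the oversized query is never constructed, so the failure is avoided rather than caught after the fact.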
Limit criteria to fields as needed
Limit the criteria to only the fields that are needed, to reduce the size of the result returned by the query.