Issue #5058
Status: closed
ISO publish fails with BSON document too large
Description
A scaling issue has been discovered when publishing ISOs via the fast-forward path:
BSON document too large (20946918 bytes) - the connected server supports BSON document sizes up to 16777216 bytes.
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/pulp/plugins/file/distributor.py", line 181, in publish_repo_fast_forward
unit_absent_set = publish_conduit.get_units(criteria=criteria)
File "/usr/lib/python2.7/site-packages/pulp/plugins/conduits/mixins.py", line 173, in get_units
return do_get_repo_units(self.repo_id, criteria, self.exception_class, as_generator)
File "/usr/lib/python2.7/site-packages/pulp/plugins/conduits/mixins.py", line 704, in do_get_repo_units
return list(_transfer_object_generator())
File "/usr/lib/python2.7/site-packages/pulp/plugins/conduits/mixins.py", line 691, in _transfer_object_generator
for u in units:
File "/usr/lib/python2.7/site-packages/pulp/server/managers/repo/unit_association_query.py", line 530, in _merged_units_unique_units
for unit in associated_units:
File "/usr/lib64/python2.7/site-packages/pymongo/cursor.py", line 1097, in next
if len(self.__data) or self._refresh():
File "/usr/lib64/python2.7/site-packages/pymongo/cursor.py", line 1019, in _refresh
self.__read_concern))
File "/usr/lib64/python2.7/site-packages/pymongo/cursor.py", line 850, in __send_message
**kwargs)
File "/usr/lib64/python2.7/site-packages/pymongo/mongo_client.py", line 794, in _send_message_with_response
exhaust)
File "/usr/lib64/python2.7/site-packages/pymongo/mongo_client.py", line 805, in _reset_on_error
return func(*args, **kwargs)
File "/usr/lib64/python2.7/site-packages/pymongo/server.py", line 119, in send_message_with_response
sock_info.send_message(data, max_doc_size)
File "/usr/lib64/python2.7/site-packages/pymongo/pool.py", line 228, in send_message
(max_doc_size, self.max_bson_size))
DistributorConduitException: BSON document too large (20946918 bytes) - the connected server supports BSON document sizes up to 16777216 bytes.
As noticed by the Content Delivery team, the problem originates here:
/usr/lib/python2.7/site-packages/pulp/plugins/file/distributor.py
# Copy incremental files into publishing directories
checksum_absent_set = unit_checksum_set - unit_checksum_old_set
criteria = UnitAssociationCriteria(
    unit_filters={'checksum': {"$in": list(checksum_absent_set)}})
unit_absent_set = publish_conduit.get_units(criteria=criteria)
for unit in unit_absent_set:
    links_to_create = self.get_paths_for_unit(unit)
    self._symlink_unit(build_dir, unit, links_to_create)
There's a limit to how large a single MongoDB query can be. If checksum_absent_set contains too many elements, the query in the code above will exceed that limit and crash. The redhat-sigstore repository apparently contains enough items to hit this limit.
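One way to stay under the limit would be to split the oversized `$in` clause into several bounded queries. The sketch below is a hypothetical reworking, not the actual Pulp patch; `chunked` is a helper introduced here for illustration.

```python
def chunked(values, size):
    """Yield successive slices of `values` with at most `size` elements each."""
    for start in range(0, len(values), size):
        yield values[start:start + size]

# Hypothetical reworking of the loop above: issue several bounded
# $in queries instead of one query that exceeds the BSON size cap.
#
# for batch in chunked(list(checksum_absent_set), 10000):
#     criteria = UnitAssociationCriteria(
#         unit_filters={'checksum': {"$in": batch}})
#     for unit in publish_conduit.get_units(criteria=criteria):
#         links_to_create = self.get_paths_for_unit(unit)
#         self._symlink_unit(build_dir, unit, links_to_create)
```

Each batch produces a query comfortably below MongoDB's 16 MiB document limit, at the cost of a few extra round trips.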
Updated by amacdona@redhat.com almost 5 years ago
- Triaged changed from No to Yes
Updated by ipanova@redhat.com almost 5 years ago
- Status changed from NEW to POST
Added by Zhiming almost 5 years ago
Updated by rchan over 4 years ago
@ipanova - The PR looks like it was merged to master but the issue is still in post & the issue does not indicate that it is in 2.21.0 (still in POST/no 2.21.0 tag) - is that accurate? Should it be included in the next release?
Updated by ipanova@redhat.com over 4 years ago
- Status changed from POST to MODIFIED
Thanks for noticing, it should be in modified state and included in the next upcoming release.
Updated by Zhiming over 4 years ago
I checked the case that reported the exception in the description. The task tried to publish around 1-2k units, which means the size of checksum_absent_set is around 1-2k, so I don't think the issue is caused by the size of checksum_absent_set.
Perhaps "publish_conduit.get_units(...)" should be improved: it generates a long query including every unit_id in the repo. In this case the repo has ~346K units, and the query size can be roughly estimated in the way described in https://pulp.plan.io/issues/2220.
>>> import bson
>>> len(bson.BSON.encode({"_id":{"$in":["24ec9b1a-d9fa-4f7d-a5d2-71dc6755a7e9"]*346000}}))
16842915
Since "get_units(...)" generates a long query, it could be improved by limiting the query length, i.e. splitting the long query into smaller queries to fetch data from MongoDB.
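The 16842915-byte figure above can also be reproduced without the bson library, by summing the per-element layout of a BSON string array (1 type byte, the decimal array-index key plus NUL, a 4-byte length prefix, the string, and a trailing NUL) plus 25 bytes of fixed document overhead. This is a back-of-the-envelope check, assuming standard BSON encoding:

```python
def bson_in_query_size(ids):
    """Estimate len(bson.BSON.encode({"_id": {"$in": ids}})) for string ids.

    Per array element: 1 type byte + len(str(index)) key bytes + 1 NUL
    + 4-byte string length prefix + the string itself + 1 NUL.
    Fixed 25 bytes: outer doc header/terminator, "_id" key, embedded
    {"$in": ...} doc header/terminator, "$in" key, array header/terminator.
    """
    fixed = 25
    per_elem = sum(1 + len(str(i)) + 1 + 4 + len(s) + 1
                   for i, s in enumerate(ids))
    return fixed + per_elem

uuid = "24ec9b1a-d9fa-4f7d-a5d2-71dc6755a7e9"
print(bson_in_query_size([uuid] * 346000))  # 16842915
```

So ~346K UUID-length ids already push the query past the 16777216-byte cap.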
Updated by Zhiming over 4 years ago
I also captured some exceptions from other cases.
"traceback": "Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/pulp/plugins/file/distributor.py", line 144, in publish_repo_fast_forward
units = publish_conduit.get_units()
File "/usr/lib/python2.7/site-packages/pulp/plugins/conduits/mixins.py", line 173, in get_units
return do_get_repo_units(self.repo_id, criteria, self.exception_class, as_generator)
......
File "/usr/lib64/python2.7/site-packages/pymongo/pool.py", line 228, in send_message
(max_doc_size, self.max_bson_size))
DistributorConduitException: BSON document too large (16909537 bytes) - the connected server supports BSON document sizes up to 16777216 bytes.
This exception is reported by "publish_conduit.get_units()"; it will also occur when fast-forward is disabled and the publish switches to "force_full". So I guess "get_units()" itself needs to be improved.
Updated by ipanova@redhat.com about 4 years ago
- Status changed from MODIFIED to CLOSED - CURRENTRELEASE
- Platform Release changed from 2.21.1 to 2.21.0
Fixing ISO publish fails with BSON document too large
There's a limit to how large a single mongo query can be (16 Mbytes). If too many (>134k) units are published, the query to fetch units will exceed that limit and crash.
This fix sets a threshold to avoid generating a query that is too large and crashing.
If the threshold is exceeded, fall back to a regular full publish, since fast-forward offers little benefit at that scale anyway.
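The threshold-and-fallback logic described in the commit message can be sketched as follows. The names (`MAX_UNITS_FAST_FORWARD`, `publish`) and the callback-style interface are hypothetical, chosen for illustration; the 134000 value echoes the ">134k" figure above but is not taken from the actual patch:

```python
# Hypothetical threshold kept safely below MongoDB's 16 MiB BSON cap.
MAX_UNITS_FAST_FORWARD = 134000

def publish(absent_checksums, fast_forward_publish, full_publish):
    """Run a fast-forward publish unless the $in list would be too large.

    If the checksum list exceeds the threshold, the resulting query
    would breach the max BSON document size, so fall back to a full
    publish (fast-forward saves little at that scale anyway).
    """
    if len(absent_checksums) > MAX_UNITS_FAST_FORWARD:
        return full_publish()
    return fast_forward_publish(absent_checksums)
```

With this guard, the distributor never builds an `$in` query large enough to trigger the DistributorConduitException.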
ref #5058 https://pulp.plan.io/issues/5058
Limit criteria to fields as needed
Limiting the criteria to only the fields that are needed reduces the size of the result returned by the query.