Issue #5058

ISO publish fails with BSON document too large

Added by ipanova@redhat.com 4 months ago. Updated 4 months ago.

Status: POST
Priority: Normal
Assignee: -
Category: -
Sprint/Milestone: -
Start date:
Due date:
Severity: 2. Medium
Version:
Platform Release:
Blocks Release:
OS:
Backwards Incompatible: No
Triaged: Yes
Groomed: No
Sprint Candidate: No
Tags: Pulp 2
QA Contact:
Complexity:
Smash Test:
Verified: No
Verification Required: No
Sprint:

Description

A scaling issue has been discovered when publishing ISOs with the fast-forward method:

BSON document too large (20946918 bytes) - the connected serversupports BSON document sizes up to 16777216 bytes.

Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/pulp/plugins/file/distributor.py", line 181, in publish_repo_fast_forward
    unit_absent_set = publish_conduit.get_units(criteria=criteria)
  File "/usr/lib/python2.7/site-packages/pulp/plugins/conduits/mixins.py", line 173, in get_units
    return do_get_repo_units(self.repo_id, criteria, self.exception_class, as_generator)
  File "/usr/lib/python2.7/site-packages/pulp/plugins/conduits/mixins.py", line 704, in do_get_repo_units
    return list(_transfer_object_generator())
  File "/usr/lib/python2.7/site-packages/pulp/plugins/conduits/mixins.py", line 691, in _transfer_object_generator
    for u in units:
  File "/usr/lib/python2.7/site-packages/pulp/server/managers/repo/unit_association_query.py", line 530, in _merged_units_unique_units
    for unit in associated_units:
  File "/usr/lib64/python2.7/site-packages/pymongo/cursor.py", line 1097, in next
    if len(self.__data) or self._refresh():
  File "/usr/lib64/python2.7/site-packages/pymongo/cursor.py", line 1019, in _refresh
    self.__read_concern))
  File "/usr/lib64/python2.7/site-packages/pymongo/cursor.py", line 850, in __send_message
    **kwargs)
  File "/usr/lib64/python2.7/site-packages/pymongo/mongo_client.py", line 794, in _send_message_with_response
    exhaust)
  File "/usr/lib64/python2.7/site-packages/pymongo/mongo_client.py", line 805, in _reset_on_error
    return func(*args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/pymongo/server.py", line 119, in send_message_with_response
    sock_info.send_message(data, max_doc_size)
  File "/usr/lib64/python2.7/site-packages/pymongo/pool.py", line 228, in send_message
    (max_doc_size, self.max_bson_size))
DistributorConduitException: BSON document too large (20946918 bytes) - the connected serversupports BSON document sizes up to 16777216 bytes.

As noticed by the Content Delivery team, the problem comes from the following code in:

/usr/lib/python2.7/site-packages/pulp/plugins/file/distributor.py

                # Copy incremental files into publishing directories
                checksum_absent_set = unit_checksum_set - unit_checksum_old_set
                criteria = UnitAssociationCriteria(
                    unit_filters={'checksum': {"$in": list(checksum_absent_set)}})
                unit_absent_set = publish_conduit.get_units(criteria=criteria)
                for unit in unit_absent_set:
                    links_to_create = self.get_paths_for_unit(unit)
                    self._symlink_unit(build_dir, unit, links_to_create)

There is a limit to how large a single MongoDB query document can be (16777216 bytes, per the error above). If checksum_absent_set contains too many elements, the $in query in the code above exceeds that limit and the publish crashes. redhat-sigstore apparently contains enough items to hit this limit.
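To see how quickly the $in filter grows, the sketch below estimates the BSON size of a filter shaped like {'checksum': {'$in': [...]}} from the BSON encoding rules (an array is stored as an embedded document keyed "0", "1", "2", ...). It is an illustration only. It assumes the checksums are 64-character SHA-256 hex digests and that the checksum list dominates the query. The query Pulp actually sends to MongoDB may carry additional fields. The function names are hypothetical and are not part of Pulp.

```python
MAX_BSON_SIZE = 16777216  # limit reported by the server in the error above


def string_element_size(key, value):
    """Bytes for one BSON string element: type byte, key cstring,
    int32 length prefix, UTF-8 bytes and trailing NUL."""
    return 1 + len(key.encode("utf-8")) + 1 + 4 + len(value.encode("utf-8")) + 1


def in_filter_size(checksums):
    """Estimate the BSON size of {'checksum': {'$in': checksums}}."""
    # The array is encoded as a document keyed "0", "1", "2", ...
    array_body = sum(string_element_size(str(i), c) for i, c in enumerate(checksums))
    array_doc = 4 + array_body + 1                      # int32 size + elements + NUL
    in_doc = 4 + (1 + len("$in") + 1 + array_doc) + 1   # {'$in': [...]}
    return 4 + (1 + len("checksum") + 1 + in_doc) + 1   # {'checksum': {...}}


def max_checksums_that_fit(checksum_len=64, limit=MAX_BSON_SIZE):
    """Count how many checksums of the given length fit under the limit."""
    size = in_filter_size([])
    n = 0
    while True:
        # Each new element adds exactly its own size to the enclosing documents.
        elem = string_element_size(str(n), "0" * checksum_len)
        if size + elem > limit:
            return n
        size += elem
        n += 1


if __name__ == "__main__":
    print("SHA-256 checksums that fit in one $in filter: %d" % max_checksums_that_fit())
```

Once the array indices reach six digits, each 64-character checksum costs 77 bytes. On the order of 200k new units is therefore enough to cross the limit. The estimate covers only this filter document, so it is not the exact threshold used by the fix.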

Associated revisions

Revision 51e65fe4
Added by Zhiming 3 months ago

Fixing ISO publish fails with BSON document too large

There's a limit to how large a single mongo query can be (16 Mbytes).
If too many (>134k) units are published, the query to fetch units
will exceed that limit and crash.

This fix sets a threshold to avoid generating a query that is too large
and crashing.

If the threshold is exceeded, fall back to a regular full publish, since
fast-forward would offer little benefit anyway.

ref #5058
https://pulp.plan.io/issues/5058

Limit criteria to fields as needed

Limit the criteria to only the needed fields, reducing the size of the
results returned by the query.
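The fallback described in the commit message can be sketched as a simple decision function. This is a hedged illustration, not Pulp's code. plan_publish and its threshold parameter are hypothetical. The sketch assumes the threshold is compared against the number of new checksums that would go into the $in filter. The set names mirror unit_checksum_set and unit_checksum_old_set from the distributor excerpt above.

```python
def plan_publish(unit_checksum_set, unit_checksum_old_set, threshold):
    """Decide between a fast-forward and a full publish.

    Returns ("fast_forward", new_checksums) when the new checksums are few
    enough to fetch with one $in query, otherwise ("full", None).
    """
    # Checksums present now but not in the previous publish.
    checksum_absent_set = unit_checksum_set - unit_checksum_old_set
    if len(checksum_absent_set) > threshold:
        # Too many new units: the $in query could exceed MongoDB's 16 MB
        # document limit, and fast-forward would save little anyway.
        return "full", None
    # Sorted only to make the result deterministic.
    return "fast_forward", sorted(checksum_absent_set)


if __name__ == "__main__":
    old = {"a", "b"}
    new = {"a", "b", "c", "d"}
    print(plan_publish(new, old, threshold=10))  # ('fast_forward', ['c', 'd'])
    print(plan_publish(new, old, threshold=1))   # ('full', None)
```

Choosing the threshold is a judgment call. The commit message cites roughly 134k units as the point where the query outgrew the limit, so any value should stay comfortably below that.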

History

#2 Updated by daviddavis 4 months ago

  • Tags Pulp 2 added

#3 Updated by amacdona@redhat.com 4 months ago

  • Triaged changed from No to Yes

#4 Updated by ipanova@redhat.com 4 months ago

  • Status changed from NEW to POST
