Issue #5058

closed

ISO publish fails with BSON document too large

Added by ipanova@redhat.com almost 5 years ago. Updated about 4 years ago.

Status: CLOSED - CURRENTRELEASE
Priority: Normal
Assignee: -
Category: -
Sprint/Milestone: -
Start date:
Due date:
Estimated time:
Severity: 2. Medium
Version:
Platform Release: 2.21.0
OS:
Triaged: Yes
Groomed: No
Sprint Candidate: No
Tags: Pulp 2
Sprint:
Quarter:

Description

A scaling issue has been discovered when publishing ISOs in fast-forward mode.

BSON document too large (20946918 bytes) - the connected serversupports BSON document sizes up to 16777216 bytes.

Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/pulp/plugins/file/distributor.py", line 181, in publish_repo_fast_forward
    unit_absent_set = publish_conduit.get_units(criteria=criteria)
  File "/usr/lib/python2.7/site-packages/pulp/plugins/conduits/mixins.py", line 173, in get_units
    return do_get_repo_units(self.repo_id, criteria, self.exception_class, as_generator)
  File "/usr/lib/python2.7/site-packages/pulp/plugins/conduits/mixins.py", line 704, in do_get_repo_units
    return list(_transfer_object_generator())
  File "/usr/lib/python2.7/site-packages/pulp/plugins/conduits/mixins.py", line 691, in _transfer_object_generator
    for u in units:
  File "/usr/lib/python2.7/site-packages/pulp/server/managers/repo/unit_association_query.py", line 530, in _merged_units_unique_units
    for unit in associated_units:
  File "/usr/lib64/python2.7/site-packages/pymongo/cursor.py", line 1097, in next
    if len(self.__data) or self._refresh():
  File "/usr/lib64/python2.7/site-packages/pymongo/cursor.py", line 1019, in _refresh
    self.__read_concern))
  File "/usr/lib64/python2.7/site-packages/pymongo/cursor.py", line 850, in __send_message
    **kwargs)
  File "/usr/lib64/python2.7/site-packages/pymongo/mongo_client.py", line 794, in _send_message_with_response
    exhaust)
  File "/usr/lib64/python2.7/site-packages/pymongo/mongo_client.py", line 805, in _reset_on_error
    return func(*args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/pymongo/server.py", line 119, in send_message_with_response
    sock_info.send_message(data, max_doc_size)
  File "/usr/lib64/python2.7/site-packages/pymongo/pool.py", line 228, in send_message
    (max_doc_size, self.max_bson_size))
DistributorConduitException: BSON document too large (20946918 bytes) - the connected serversupports BSON document sizes up to 16777216 bytes.

As the Content Delivery team noticed, the problem comes from this code:

/usr/lib/python2.7/site-packages/pulp/plugins/file/distributor.py

                # Copy incremental files into publishing directories
                checksum_absent_set = unit_checksum_set - unit_checksum_old_set
                criteria = UnitAssociationCriteria(
                    unit_filters={'checksum': {"$in": list(checksum_absent_set)}})
                unit_absent_set = publish_conduit.get_units(criteria=criteria)
                for unit in unit_absent_set:
                    links_to_create = self.get_paths_for_unit(unit)
                    self._symlink_unit(build_dir, unit, links_to_create)

There's a limit to how large a single MongoDB query can be (16777216 bytes, per the error above). If checksum_absent_set contains too many elements, the query in the code above will exceed that limit and crash. We apparently have enough items in redhat-sigstore to hit this limit.

Actions #2

Updated by daviddavis almost 5 years ago

  • Tags Pulp 2 added
Actions #3

Updated by amacdona@redhat.com almost 5 years ago

  • Triaged changed from No to Yes
Actions #4

Updated by ipanova@redhat.com almost 5 years ago

  • Status changed from NEW to POST

Added by Zhiming almost 5 years ago

Revision 51e65fe4 | View on GitHub

Fixing ISO publish fails with BSON document too large

There's a limit to how large a single MongoDB query can be (16 MB). If too many (>134k) units are published, the query to fetch units will exceed that limit and crash.

This fix sets a threshold to avoid generating a query that is too large and crashes.

If the threshold is exceeded, fall back to a regular full publish, since fast-forward would offer little benefit anyway.
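
The fallback described above can be sketched roughly as follows. This is only an illustration, not the code from the commit: the function name, the threshold value, and the choice to count the units that would go into the `$in` query are all assumptions.

```python
# Hypothetical threshold; the value used by the actual fix is not stated here.
FAST_FORWARD_MAX_UNITS = 50000


def choose_publish_mode(new_checksums, old_checksums,
                        max_units=FAST_FORWARD_MAX_UNITS):
    """Decide between fast-forward and full publish.

    Returns ("fast_forward", absent_checksums) when the incremental set is
    small enough to query safely, otherwise ("full", None).
    """
    absent = set(new_checksums) - set(old_checksums)
    if len(absent) > max_units:
        # Too many units for one "$in" query; a full publish is safer and
        # fast-forward would save little work anyway.
        return "full", None
    return "fast_forward", absent
```

A caller would branch on the returned mode before building any criteria, so the oversized query is never constructed.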

ref #5058 https://pulp.plan.io/issues/5058

Limit criteria to fields as needed

Limit the criteria to only the fields that are needed, to reduce the size of the result returned by the query.

Actions #5

Updated by rchan over 4 years ago

@ipanova - The PR looks like it was merged to master, but the issue is still in POST and does not indicate that it is in 2.21.0 (no 2.21.0 tag). Is that accurate? Should it be included in the next release?

Actions #6

Updated by ipanova@redhat.com over 4 years ago

  • Status changed from POST to MODIFIED

Thanks for noticing. It should be in the MODIFIED state and included in the next upcoming release.

Actions #7

Updated by Zhiming over 4 years ago

I checked the case that reported the exception in the description. The task tried to publish around 1K to 2K units, which means the size of checksum_absent_set is around 1K to 2K, so I don't think the issue is caused by the size of checksum_absent_set.

Perhaps "publish_conduit.get_units(...)" should be improved: it generates a long query that includes every unit_id in the repo. In this case the repo has ~346K units, and the size of that query can be estimated roughly in the way described in https://pulp.plan.io/issues/2220.

>>> import bson
>>> len(bson.BSON.encode({"_id":{"$in":["24ec9b1a-d9fa-4f7d-a5d2-71dc6755a7e9"]*346000}}))
16842915
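
The same figure can be cross-checked without the bson package by adding up the BSON framing bytes by hand. This is only a sketch: the function name is made up for illustration, and it handles string values only.

```python
def in_query_bson_size(field, values):
    """Size in bytes of the BSON encoding of {field: {"$in": values}}.

    Only string values are supported in this sketch.
    """
    def cstring(s):
        # UTF-8 bytes plus the NUL terminator
        return len(s.encode("utf-8")) + 1

    def document(elements_size):
        # int32 length prefix + elements + trailing NUL byte
        return 4 + elements_size + 1

    # Each array element: type byte, decimal index as key,
    # int32 string length, string bytes with NUL terminator.
    array_elements = sum(
        1 + cstring(str(index)) + 4 + cstring(value)
        for index, value in enumerate(values)
    )
    array = document(array_elements)
    inner = document(1 + cstring("$in") + array)
    return document(1 + cstring(field) + inner)


unit_id = "24ec9b1a-d9fa-4f7d-a5d2-71dc6755a7e9"
print(in_query_bson_size("_id", [unit_id] * 346000))  # 16842915
```

Each 36-character unit id costs 43 bytes plus the digits of its array index, so roughly 346K ids are enough to pass the 16777216-byte limit.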

Since "get_units(...)" generates such a long query, it could be improved by limiting the query length, i.e. splitting the long query into smaller queries when fetching data from MongoDB.
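
A minimal sketch of that splitting idea, assuming the ids can be queried in independent batches. `run_query` is a hypothetical callable standing in for whatever actually executes the MongoDB query; it is not a Pulp or pymongo API, and the batch size is an arbitrary example value.

```python
def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    items = list(items)
    for start in range(0, len(items), size):
        yield items[start:start + size]


def find_in_batches(run_query, field, values, batch_size=10000):
    """Run one small "$in" query per batch instead of one huge query.

    run_query is a hypothetical callable that takes a query dict and
    returns an iterable of matching documents.
    """
    for batch in chunked(values, batch_size):
        for result in run_query({field: {"$in": batch}}):
            yield result
```

With batches of 10000 ids, each query stays far below the document size limit, at the cost of more round trips to the database.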

Actions #8

Updated by Zhiming over 4 years ago

I also captured some exceptions from other cases.

"traceback": "Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/pulp/plugins/file/distributor.py", line 144, in publish_repo_fast_forward
    units = publish_conduit.get_units()
  File "/usr/lib/python2.7/site-packages/pulp/plugins/conduits/mixins.py", line 173, in get_units
    return do_get_repo_units(self.repo_id, criteria, self.exception_class, as_generator)
  ......
  File "/usr/lib64/python2.7/site-packages/pymongo/pool.py", line 228, in send_message
    (max_doc_size, self.max_bson_size))
DistributorConduitException: BSON document too large (16909537 bytes) - the connected serversupports BSON document sizes up to 16777216 bytes.

The exception is reported by "publish_conduit.get_units()", so it will still occur if fast-forward is disabled and the publish is switched to "force_full". So I guess we need to improve "get_units()".

Actions #9

Updated by ipanova@redhat.com over 4 years ago

  • Platform Release set to 2.21.1
Actions #10

Updated by ipanova@redhat.com about 4 years ago

  • Status changed from MODIFIED to CLOSED - CURRENTRELEASE
  • Platform Release changed from 2.21.1 to 2.21.0
