Issue #7480
Status: closed
Pulp worker can consume high memory when publishing a repository with large metadata files
Description
Clone from bugzilla 1876782
Description of problem: When publishing a repository with a large metadata file (such as the others.xml.gz file in rhel-7-server-rpms), the Pulp worker can consume more than 3 GB of RAM for a few minutes. After that, the memory is freed and usage returns to normal, which is fine.
When calculating the open-size of a metadata file, Pulp opens the gzip file and reads the entire decompressed content into memory.
plugins/distributors/yum/metadata/repomd.py
if file_path.endswith('.gz'):
    open_size_element = ElementTree.SubElement(data_element, 'open-size')
    open_checksum_attributes = {'type': self.checksum_type}
    open_checksum_element = ElementTree.SubElement(data_element, 'open-checksum',
                                                  open_checksum_attributes)
    try:
        file_handle = gzip.open(file_path, 'r')  # <============= Here
    except:
        # cannot have an else clause to the try without an except clause
        raise
    else:
        try:
            content = file_handle.read()
            open_size_element.text = str(len(content))
            open_checksum_element.text = self.checksum_constructor(content).hexdigest()
        finally:
            file_handle.close()
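The whole-file `read()` above is what drives the memory spike: the decompressed contents of others.xml.gz are held in RAM at once. A minimal sketch of the obvious remedy (not Pulp's actual patch) is to stream the decompressed data in fixed-size chunks, accumulating the size and checksum incrementally; the helper name `open_size_and_checksum` and the chunk size are illustrative assumptions:

```python
import gzip
import hashlib


def open_size_and_checksum(file_path, checksum_constructor=hashlib.sha256,
                           chunk_size=1024 * 1024):
    """Compute decompressed size and checksum of a .gz file in bounded memory.

    Reads the decompressed stream chunk_size bytes at a time, so peak memory
    stays around chunk_size instead of the full decompressed size.
    """
    digest = checksum_constructor()
    size = 0
    with gzip.open(file_path, 'rb') as fh:
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:
                break
            size += len(chunk)
            digest.update(chunk)
    return size, digest.hexdigest()
```

With this shape, `open_size_element.text` and `open_checksum_element.text` could be filled from the returned pair without ever materializing the full decompressed content.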
This is not much of an issue if the user is syncing only a few repos. In the case of Satellite, users may sync many large repositories at the same time, such as during an Optimized Capsule sync. If one Capsule has 8 workers and each worker consumes 4 GB+ of memory, the Capsule will run out of memory.
Steps to Reproduce:
- Set Pulp to use only 1 worker so that we can monitor the progress easily.
- Force full publish a rhel-7-server-rpms repository.
- Use the following command to monitor the memory usage.
watch 'ps -aux | grep reserved_resource_worker-0'
- The high memory consumption happens when Pulp is finalizing the others.xml.gz file. You can use the following command to monitor the Pulp working directory.
cd /var/cache/pulp/reserved_resource_worker-0@//
watch 'ls -alrth'
Fix high memory issue when calculating metadata open checksum
closes #7480 https://pulp.plan.io/issues/7480