Issue #7480

closed

Pulp worker can consume high memory when publishing a repository with large metadata files

Added by hyu about 4 years ago. Updated about 4 years ago.

Status: CLOSED - CURRENTRELEASE
Priority: Normal
Assignee: -
Sprint/Milestone: -
Start date:
Due date:
Estimated time:
Severity: 2. Medium
Version:
Platform Release: 2.21.4
OS:
Triaged: No
Groomed: No
Sprint Candidate: No
Tags: Katello, Performance, Pulp 2
Sprint:
Quarter:

Description

Cloned from Bugzilla 1876782

Description of problem: When publishing a repository with a large metadata file (such as the others.xml.gz file in rhel-7-server-rpms), the Pulp worker can consume more than 3 GB of RAM for a few minutes. After that, the memory is freed and usage returns to normal, which is OK.

When calculating the open-size of a metadata file, Pulp opens the gzip file and reads its entire decompressed content into memory at once.

plugins/distributors/yum/metadata/repomd.py

    if file_path.endswith('.gz'):

        open_size_element = ElementTree.SubElement(data_element, 'open-size')

        open_checksum_attributes = {'type': self.checksum_type}
        open_checksum_element = ElementTree.SubElement(data_element, 'open-checksum',
                                                       open_checksum_attributes)

        try:
            file_handle = gzip.open(file_path, 'r')  # <== here

        except:
            # cannot have an else clause to the try without an except clause
            raise

        else:
            try:
                content = file_handle.read()
                open_size_element.text = str(len(content))
                open_checksum_element.text = self.checksum_constructor(content).hexdigest()

            finally:
                file_handle.close()

This is not much of an issue if the user is syncing only a few repos. With Satellite, however, a user may sync several large repositories at the same time, for example during an Optimized Capsule sync. If a Capsule has 8 workers and each worker consumes 4 GB+ of memory, the Capsule will run out of memory.
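
One way to avoid the whole-file read is to decompress in bounded chunks and update the size and checksum incrementally. The following is only a minimal sketch of that idea, not the fix that was actually shipped: the function name `open_size_and_checksum` and the 1 MiB chunk size are hypothetical, and it assumes the checksum constructor is a hashlib-style callable (such as `hashlib.sha256`) that can be called with no arguments and then fed data through `update()`.

```python
import gzip
import hashlib

# Hypothetical chunk size: 1 MiB of decompressed data per read.
CHUNK_SIZE = 1024 * 1024


def open_size_and_checksum(file_path, checksum_constructor=hashlib.sha256,
                           chunk_size=CHUNK_SIZE):
    """Return (decompressed size, hex digest) of a gzip file.

    Only one chunk of decompressed data is held in memory at a time.
    """
    hasher = checksum_constructor()
    size = 0
    with gzip.open(file_path, 'rb') as file_handle:
        # iter() with a sentinel keeps calling read() until it returns b''.
        for chunk in iter(lambda: file_handle.read(chunk_size), b''):
            size += len(chunk)
            hasher.update(chunk)
    return size, hasher.hexdigest()
```

In terms of the snippet above, the two return values would be written to `open_size_element.text` (as a string) and `open_checksum_element.text`. Memory use is then bounded by the chunk size rather than by the decompressed size of the metadata file.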

Steps to Reproduce:

  1. Set Pulp to use only 1 worker so that we can monitor the progress easily.
  2. Force a full publish of the rhel-7-server-rpms repository.
  3. Use the following command to monitor the memory usage.

watch 'ps -aux | grep reserved_resource_worker-0'

  4. The high memory consumption happens while Pulp is finalizing the others.xml.gz file. You can use the following commands to monitor the Pulp working directory.

cd /var/cache/pulp/reserved_resource_worker-0@//
watch 'ls -alrth'
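
As an alternative to watching `ps` output by eye, the worker's resident memory can be read from `/proc` and logged. This is a rough sketch that assumes a Linux host with `/proc` mounted; the function names are hypothetical, and the worker's PID has to be looked up separately (for example from the `ps` output above).

```python
import re


def rss_kib_from_status(status_text):
    """Extract VmRSS (in KiB) from the text of /proc/<pid>/status.

    Returns None if the field is absent (for example, for kernel threads).
    """
    match = re.search(r'^VmRSS:\s+(\d+)\s+kB', status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def read_rss_kib(pid):
    """Read the current resident set size of a process in KiB."""
    with open('/proc/{}/status'.format(pid)) as status_file:
        return rss_kib_from_status(status_file.read())
```

Calling `read_rss_kib(pid)` once a second during the publish and keeping the maximum gives the peak memory of the worker.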
