Issue #9553

closed

Publishing repository with large metadata may consume high memory when calculating checksum of the metadata.

Added by hyu over 2 years ago. Updated over 2 years ago.

Status: MODIFIED
Priority: Normal
Assignee: -
Sprint/Milestone: -
Start date:
Due date:
Estimated time:
Severity: 2. Medium
Version: 2.21.1
Platform Release:
OS:
Triaged: Yes
Groomed: No
Sprint Candidate: No
Tags: Katello, Performance, Pulp 2
Sprint:
Quarter:

Description

Pulp consumes a large amount of memory when publishing the RHEL 7 repository. This happens while Pulp calculates the checksum of the metadata: it reads the whole metadata file into memory at once to calculate the checksum. For example, other.xml.gz (compressed) for the RHEL 7 repository is about 837MB. Reading the entire file into memory causes the Pulp worker to consume more than 1GB of RAM.

https://github.com/pulp/pulp/blob/2-master/server/pulp/plugins/util/metadata_writer.py#L99-L101
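One common way to avoid holding the whole file in memory is to feed it to the hash object in fixed-size chunks. The following is a minimal sketch of that idea, not the actual Pulp code or its fix; the function name `chunked_checksum`, the default algorithm, and the 1 MiB chunk size are all assumptions made for illustration.

```python
import hashlib


def chunked_checksum(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Return the hex digest of the file at `path`, reading it in chunks.

    Hypothetical helper: the name, default algorithm, and chunk size are
    illustrative assumptions, not taken from Pulp.
    """
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling handle.read(chunk_size)
        # until it returns an empty bytes object (end of file).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()
```

With this approach only one chunk is held in memory at a time, so the memory needed for the checksum depends on the chunk size rather than on the size of the metadata file.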

How to reproduce:

  1. Sync the RHEL 7 repository.
  2. Then manually force a full publish of the repository and run the command below to observe the memory usage.

watch -n 1 'ps -aux | grep resource_worker'

  3. Memory usage stays between 200MB and 350MB most of the time, but suddenly rises to about 1.1GB for about 3 seconds (around the step that finalizes publishing the RPMs) and then drops back to 200MB+.
