Publishing a repository with large metadata may consume a lot of memory when calculating the checksum of the metadata.
Katello, Performance, Pulp 2
Pulp consumes a large amount of memory when publishing the RHEL 7 repository. This happens while Pulp is calculating the checksum of the metadata: it reads the whole metadata file into memory at once to calculate the checksum. For example, other.xml.gz (compressed) for the RHEL 7 repository is about 837MB. Reading the entire file into memory causes the Pulp worker to consume more than 1GB of RAM.
How to reproduce:
- Sync the RHEL 7 repository.
- Then manually force a full publish of it and run the command below to observe the memory usage.
watch -n 1 'ps -aux | grep resource_worker'
- The memory usage stays between 200MB and 350MB most of the time, but suddenly climbs to about 1.1GB for about 3 seconds (around the time the publish rpms step is finalizing) and then drops back to 200MB+.
Reduce memory usage when calculating the metadata checksum
Read the metadata file in chunks when calculating its checksum to save memory.
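
A minimal sketch of the chunked approach, for illustration only. It is not the actual Pulp patch: the function name file_checksum, the CHUNK_SIZE value and the default checksum type are assumptions made for this example. It uses only the standard hashlib module and never holds more than one chunk of the file in memory.

```python
import hashlib

# Hypothetical chunk size for this sketch: 1 MiB per read.
CHUNK_SIZE = 1024 * 1024


def file_checksum(path, checksum_type="sha256", chunk_size=CHUNK_SIZE):
    """Return the hex digest of the file at `path`, reading it in chunks.

    `file_checksum` is an illustrative helper name, not a Pulp function.
    """
    hasher = hashlib.new(checksum_type)
    with open(path, "rb") as f:
        # iter() with a sentinel calls f.read(chunk_size) repeatedly
        # until it returns b"", which signals end of file.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()
```

With this pattern, peak memory for the checksum step is bounded by the chunk size rather than by the size of the metadata file, so a file the size of other.xml.gz should no longer cause the spike described above.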
closes: #9553 https://pulp.plan.io/issues/9553