Issue #4151

Improve MetadataStep performance

Added by quba42 over 2 years ago. Updated 7 months ago.

Status: CLOSED - CURRENTRELEASE
Priority: Normal
Assignee:
Sprint/Milestone: -
Start date:
Due date:
Estimated time:
Severity: 3. High
Version - Debian:
Platform Release:
Target Release - Debian:
OS:
Triaged: Yes
Groomed: No
Sprint Candidate: No
Tags: Pulp 2
Sprint:
Quarter:

Description

There appears to be a memory leak in the way pulp_deb interacts with python-debpkgr during the MetadataStep.
The current implementation of the MetadataStep loads the metadata associated with a repository from the db into memory and then processes it to obtain a list of all packages for each architecture of each component of each release in the repository. In particular, packages with architecture = "all" are appended to every other architecture's list (as well as to a list of their own), so these packages appear in multiple lists and are therefore parsed multiple times.
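
The duplication described above can be illustrated with a minimal sketch. This is not the pulp_deb code: the function name `build_package_lists` and the dict-based unit layout are hypothetical, chosen only to show how "all" packages multiply across lists.

```python
from collections import defaultdict


def build_package_lists(units, architectures):
    """Group units into one list per architecture (hypothetical helper).

    Units with architecture "all" are appended to every other
    architecture's list and also to a list of their own.
    """
    lists = defaultdict(list)
    for unit in units:
        if unit["architecture"] == "all":
            for arch in architectures:
                lists[arch].append(unit)
            lists["all"].append(unit)
        else:
            lists[unit["architecture"]].append(unit)
    return dict(lists)


# Illustrative data only: three packages, one of them architecture "all".
units = [
    {"name": "foo", "architecture": "amd64"},
    {"name": "bar", "architecture": "i386"},
    {"name": "docs", "architecture": "all"},
]
lists = build_package_lists(units, ["amd64", "i386"])

# Number of list entries that would each be handed on for parsing.
total_entries = sum(len(entries) for entries in lists.values())
```

With two real architectures, the single "all" package ends up in three lists, so three packages produce five list entries. In a repository with many architectures and many "all" packages the multiplier grows accordingly.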

Each list is then passed to python-debpkgr, which accesses every package file in the list, mostly to regenerate the metadata we have already loaded into memory from our db. For repositories containing thousands or tens of thousands of packages (often within a single debpkgr call), this is a severe drain on resources, particularly memory.

For large Debian repositories (like Ubuntu Xenial or Debian Stretch) this routinely leads to failures because the kernel kills the Celery worker. (This has been observed on systems with 32 GiB of memory or more, and typically happens after several hours of waiting for a sync.)

In practice this makes pulp_deb unusable (or at least painfully slow) for large repositories on all but the most powerful systems.

See https://community.theforeman.org/t/pulp-deb-with-celery-cannot-allocate-memory-and-out-of-memory/11789 for an example.

History

#1 Updated by quba42 over 2 years ago

The following pull request attempts to fix this problem by removing the python-debpkgr dependency from the MetadataStep.
Instead, some of the functionality is handled directly in pulp, while some of it uses deb822 from python-debian (which python-debpkgr also uses).

https://github.com/pulp/pulp_deb/pull/57
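
The general idea of writing index paragraphs from metadata that is already in memory, without opening any package file, can be sketched as follows. This is only an illustration and not the code from the pull request: the pull request uses deb822 from python-debian, whereas this sketch uses plain string formatting to stay dependency-free, and the function names, field selection and sample values are all hypothetical.

```python
def render_stanza(unit):
    """Render one "Field: value" paragraph from a metadata dict.

    No package file is read; everything comes from the dict.
    The field list and its order are illustrative assumptions.
    """
    fields = ["Package", "Version", "Architecture", "Filename", "Size"]
    lines = ["{}: {}".format(name, unit[name]) for name in fields if name in unit]
    return "\n".join(lines) + "\n"


def render_packages_index(units):
    """Join paragraphs with a blank line between them (hypothetical helper)."""
    return "\n".join(render_stanza(unit) for unit in units)


# Illustrative metadata only, standing in for records loaded from the db.
units = [
    {"Package": "foo", "Version": "1.0", "Architecture": "amd64"},
    {"Package": "docs", "Version": "2.1", "Architecture": "all"},
]
index_text = render_packages_index(units)
```

Because each paragraph is produced from a small dict and can be written out immediately, memory use in a scheme like this depends on the size of one record at a time rather than on the contents of every package file.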

#2 Updated by quba42 over 2 years ago

#3 Updated by mdellweg about 2 years ago

  • Status changed from NEW to POST
  • Assignee set to quba42

#4 Updated by bmbouter about 2 years ago

  • Tags Pulp 2 added

#5 Updated by mdellweg almost 2 years ago

  • Triaged changed from No to Yes

#6 Updated by quba42 7 months ago

  • Status changed from POST to CLOSED - CURRENTRELEASE

This should have been closed "CURRENTRELEASE" some time ago.
