Issue #3172

Celery worker consumes a large amount of memory when regenerating applicability for a consumer bound to many repositories with many errata.

Added by hyu over 6 years ago. Updated almost 5 years ago.

Status: CLOSED - CURRENTRELEASE
Priority: Normal
Assignee: -
Sprint/Milestone: -
Start date: -
Due date: -
Estimated time: -
Severity: 2. Medium
Version: 2.8.7
Platform Release: 2.16.2
OS: RHEL 7
Triaged: Yes
Groomed: No
Sprint Candidate: Yes
Tags: Pulp 2
Sprint: Sprint 37
Quarter: -

Description

The celery worker initially consumes about 60MB of RAM. After regenerating applicability for a consumer bound to 9 repositories, its usage increases to about 350MB+, and the RAM is never freed.

I think the following is the reason for the high memory consumption.

Pulp fetches the pkglist from all of the repositories that a particular erratum is associated with. This is expensive, and the results may contain many duplicate pkglists.

For example, Pulp makes this query:

db.erratum_pkglists.find({"errata_id": "RHBA-2016:1886"}).count()

3

Instead of doing the following:

db.erratum_pkglists.find({"errata_id": "RHBA-2016:1886", "repo_id" : "my_org-Red_Hat_Enterprise_Linux_Server-Red_Hat_Satellite_Tools_6_2_for_RHEL_7_Server_RPMs_x86_64"}).count()

1
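
For reference, here is a minimal pymongo sketch of the two queries above. The client setup, the database name pulp_database, and the example IDs are assumptions and may differ on a given installation:

from pymongo import MongoClient

# Assumed connection details; Pulp 2 keeps its data in MongoDB,
# conventionally in a database named "pulp_database".
client = MongoClient("localhost", 27017)
db = client["pulp_database"]

errata_id = "RHBA-2016:1886"
repo_id = ("my_org-Red_Hat_Enterprise_Linux_Server-"
           "Red_Hat_Satellite_Tools_6_2_for_RHEL_7_Server_RPMs_x86_64")

# Unfiltered: one pkglist document per repository holding the erratum.
unfiltered = db.erratum_pkglists.count_documents({"errata_id": errata_id})

# Filtered: only the pkglist entry recorded for the repository at hand.
filtered = db.erratum_pkglists.count_documents(
    {"errata_id": errata_id, "repo_id": repo_id})

print(unfiltered, filtered)  # e.g. 3 vs. 1, as in the counts above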

After amending the "erratum_pkglists" query to filter the errata by repository, memory consumption and runtime are both reduced by about 80%.

I think I understand why Pulp doesn't filter the pkglist by repository when regenerating applicability: a single entry may not contain the complete pkglist, since an erratum can be copied across repositories.
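
To illustrate why all entries are fetched, here is a hypothetical sketch of merging the per-repo pkglist entries and deduplicating packages by NEVRA. The document layout, with a "collections" list of {"packages": [...]} dicts, is a simplification of Pulp's actual schema:

def merged_pkglist_nevras(db, errata_id):
    # Hypothetical helper: collect the union of packages across every
    # per-repo pkglist entry for one erratum, deduplicated by NEVRA,
    # so no package is missed when entries differ between repositories.
    seen = set()
    for entry in db.erratum_pkglists.find({"errata_id": errata_id}):
        for collection in entry.get("collections", []):
            for pkg in collection.get("packages", []):
                seen.add((pkg["name"], pkg.get("epoch", "0"),
                          pkg["version"], pkg["release"], pkg["arch"]))
    return seen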

I made the following change to retrieve only the "nevra" of the errata pkglist when regenerating applicability for a consumer. This patch reduces memory consumption by ~50% (from 350MB to 150MB) for a consumer bound to 9 repositories.

https://github.com/hao-yu/pulp_rpm/commit/9f5a52823afee80b31c1e3aa14f4f65fc85f9be9
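
The idea behind the patch can be sketched as a MongoDB projection (this is not the patch itself; the field paths follow the simplified layout above): return only the five NEVRA fields per package, so the worker never materializes checksums, filenames, and other per-package data it does not need for applicability.

NEVRA_PROJECTION = {
    "collections.packages.name": 1,
    "collections.packages.epoch": 1,
    "collections.packages.version": 1,
    "collections.packages.release": 1,
    "collections.packages.arch": 1,
    "_id": 0,
}

def pkglist_nevras_only(db, errata_id):
    # The projection trims each returned document down to the NEVRA
    # fields, shrinking what the worker must hold in memory while
    # computing applicability.
    return db.erratum_pkglists.find({"errata_id": errata_id},
                                    NEVRA_PROJECTION)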
