Issue #3172

Celery worker consumes a large amount of memory when regenerating applicability for a consumer that is bound to many repositories with many errata.

Added by hyu about 2 years ago. Updated 11 months ago.

Status:
CLOSED - CURRENTRELEASE
Priority:
Normal
Assignee:
Category:
-
Sprint/Milestone:
-
Start date:
Due date:
Severity:
2. Medium
Version:
2.8.7
Platform Release:
2.16.2
Blocks Release:
OS:
RHEL 7
Backwards Incompatible:
No
Triaged:
Yes
Groomed:
No
Sprint Candidate:
Yes
Tags:
Pulp 2
QA Contact:
Complexity:
Smash Test:
Verified:
No
Verification Required:
No
Sprint:
Sprint 37

Description

The celery worker initially consumes about 60MB of RAM. After regenerating applicability for a consumer that is bound to 9 repositories, usage increases to 350MB+ and the memory is never freed.

I think the following is the reason for the high memory consumption.

Pulp fetches the pkglists from all the repositories that a particular erratum is associated with. This is expensive, and the results may contain many duplicate pkglists.

For example, Pulp makes this query:

db.erratum_pkglists.find({"errata_id": "RHBA-2016:1886"}).count()

3

Instead of doing the following:

db.erratum_pkglists.find({"errata_id": "RHBA-2016:1886", "repo_id" : "my_org-Red_Hat_Enterprise_Linux_Server-Red_Hat_Satellite_Tools_6_2_for_RHEL_7_Server_RPMs_x86_64"}).count()

1

After amending the "erratum_pkglists" query to filter the errata by repository, memory consumption and run time are both reduced by 80%.
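To make the difference between the two queries concrete, here is a minimal Python sketch that mimics them on an in-memory list instead of MongoDB. The documents, the shortened repository IDs, the second erratum ID, and the `find` helper are all hypothetical stand-ins; only the field names `errata_id` and `repo_id` come from the queries above.

```python
# Hypothetical stand-in for the erratum_pkglists collection: the same
# erratum has one pkglist document per repository it is associated with.
ERRATUM_PKGLISTS = [
    {"errata_id": "RHBA-2016:1886", "repo_id": "org1-tools-rhel7"},
    {"errata_id": "RHBA-2016:1886", "repo_id": "org2-tools-rhel7"},
    {"errata_id": "RHBA-2016:1886", "repo_id": "org3-tools-rhel7"},
    {"errata_id": "EXAMPLE-0000:0001", "repo_id": "org1-tools-rhel7"},
]


def find(collection, **criteria):
    """Return the documents whose fields equal every given criterion."""
    return [
        doc for doc in collection
        if all(doc.get(key) == value for key, value in criteria.items())
    ]


# Unfiltered: every repository's copy of the pkglist is loaded.
all_copies = find(ERRATUM_PKGLISTS, errata_id="RHBA-2016:1886")

# Filtered by repository: only the copy for the bound repository is loaded.
one_copy = find(
    ERRATUM_PKGLISTS,
    errata_id="RHBA-2016:1886",
    repo_id="org1-tools-rhel7",
)

print(len(all_copies), len(one_copy))
```

With more organizations syncing the same repositories, the unfiltered result grows with the number of copies, while the filtered result stays at one document per erratum.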

I think I understand why Pulp doesn't filter the pkglist by repository when regenerating applicability: a single entry may not contain all of the pkglist, since an erratum can be copied across repositories.

I made the following change to retrieve only the "nevra" of the errata pkglist when regenerating applicability for a consumer. This patch can reduce the memory consumption by ~50% (350MB to 150MB) for a consumer with 9 repositories.

https://github.com/hao-yu/pulp_rpm/commit/9f5a52823afee80b31c1e3aa14f4f65fc85f9be9
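The idea behind that change can be sketched roughly as follows. This is not the code from the linked commit; it is a plain-Python illustration that assumes a pkglist is a list of collections, each holding a "packages" list of dicts, and the `Nevra` tuple, the `nevras_from_pkglist` helper, and the sample data are hypothetical.

```python
from collections import namedtuple

# NEVRA = name, epoch, version, release, arch.
Nevra = namedtuple("Nevra", ["name", "epoch", "version", "release", "arch"])


def nevras_from_pkglist(pkglist):
    """Yield one Nevra per package, dropping every other package field.

    Assumes (not confirmed by this issue) that a pkglist is a list of
    collections, each holding a "packages" list of dicts.
    """
    for collection in pkglist:
        for package in collection.get("packages", []):
            yield Nevra(*(package.get(field) for field in Nevra._fields))


# Hypothetical pkglist entry carrying fields beyond the NEVRA.
example_pkglist = [
    {
        "name": "example-collection",
        "packages": [
            {
                "name": "foo", "epoch": "0", "version": "1.2",
                "release": "3.el7", "arch": "x86_64",
                "filename": "foo-1.2-3.el7.x86_64.rpm",
                "sum": ["sha256", "0" * 64],
            },
        ],
    },
]

nevras = list(nevras_from_pkglist(example_pkglist))
print(nevras)
```

Keeping only five short fields per package means the worker holds far less data per erratum than it would with the full pkglist documents.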

Associated revisions

Revision 6a400906
Added by ttereshc almost 2 years ago

Use aggregation to identify unique errata pkglists

To improve both performance and memory consumption of celery workers during applicability regeneration. Serializer for Errata now deals with unique pkglists as well.

closes #3172 https://pulp.plan.io/issues/3172

Revision 8c852a12
Added by ttereshc almost 2 years ago

Use aggregation for recursive copy of errata + API response fix

Revision a679f26e
Added by ttereshc almost 2 years ago

Respect search criteria in the erratum serializer

Serializer should not add a pkglist to an erratum if it was excluded from fields in search criteria. Serializer handles the case when errata_id is absent in fields in search criteria while pkglist is not.

re #3172 https://pulp.plan.io/issues/3172

Revision 759ff1c0
Added by ttereshc over 1 year ago

Use aggregation to identify unique errata pkglists

To improve both performance and memory consumption of celery workers during applicability regeneration. Serializer for Errata now deals with unique pkglists as well.

closes #3172 https://pulp.plan.io/issues/3172

(cherry picked from commit 6a40090685f8295d42419a1641351998fa75e709)

Revision 8456d136
Added by ttereshc over 1 year ago

Respect search criteria in the erratum serializer

Serializer should not add a pkglist to an erratum if it was excluded from fields in search criteria. Serializer handles the case when errata_id is absent in fields in search criteria while pkglist is not.

re #3172 https://pulp.plan.io/issues/3172

(cherry picked from commit a679f26eb7d7fe2ec006bf4e6db1d0841715a107)
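The "Use aggregation to identify unique errata pkglists" revisions above rely on collapsing identical pkglists that differ only by repository. The actual aggregation pipeline is not shown in this issue, so the following is only a plain-Python sketch of the deduplication idea; the `unique_pkglists` helper, the "collections" key, and the sample documents are hypothetical.

```python
import json


def unique_pkglists(documents):
    """Keep one pkglist per distinct (errata_id, collections) pair.

    `documents` is an iterable of dicts with the hypothetical keys
    "errata_id", "repo_id" and "collections".
    """
    seen = set()
    unique = []
    for doc in documents:
        # Serialize with sorted keys so equal content gives an equal key.
        key = (doc["errata_id"], json.dumps(doc["collections"], sort_keys=True))
        if key not in seen:
            seen.add(key)
            unique.append(
                {"errata_id": doc["errata_id"], "collections": doc["collections"]}
            )
    return unique


# Three repositories carry an identical pkglist; a fourth copy differs.
same = [{"packages": [{"name": "foo", "version": "1.2"}]}]
other = [{"packages": [{"name": "bar", "version": "2.0"}]}]
docs = [
    {"errata_id": "RHBA-2016:1886", "repo_id": "org1-repo", "collections": same},
    {"errata_id": "RHBA-2016:1886", "repo_id": "org2-repo", "collections": same},
    {"errata_id": "RHBA-2016:1886", "repo_id": "org3-repo", "collections": same},
    {"errata_id": "RHBA-2016:1886", "repo_id": "org4-repo", "collections": other},
]

deduplicated = unique_pkglists(docs)
print(len(docs), len(deduplicated))
```

Unlike filtering by a single repository, this approach keeps every distinct pkglist, so an erratum whose package list is split across copied repositories is still seen in full.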

History

#2 Updated by dalley about 2 years ago

  • Sprint/Milestone set to 48
  • Triaged changed from No to Yes

#4 Updated by rchan about 2 years ago

  • Sprint/Milestone changed from 48 to 52

#5 Updated by hyu about 2 years ago

The issue happened in Satellite 6 when I had several organizations with the same repositories.

To reproduce the issue, I created 2 additional organizations (3 orgs in total) and synced the following repositories for each organization.

Red Hat Enterprise Linux 6 Server RPMs x86_64 6Server
Red Hat Enterprise Linux 6 Server - RH Common RPMs x86_64 6Server
Red Hat Enterprise Linux 6 Server - Optional RPMs x86_64 6Server
Red Hat Enterprise Linux 6 Server - Fastrack RPMs x86_64
Red Hat Enterprise Linux 6 Server - Extras RPMs x86_64

Result of generate applicability without my patch:

It consumed ~500MB RSS. Before I created the 2 additional organizations it was consuming ~350MB. This proves that the memory will

apache 1811 34.5 6.2 1017124 487380 ? Sl 12:48 0:58 /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-0@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-0.pid --heartbeat-interval=30 --maxtasksperchild=20

Result of generate applicability with my patch from upstream bug:

The memory is stable at ~150MB RSS before and after creating the additional organizations.

apache 2629 11.0 1.9 684440 150632 ? Sl 12:58 0:54 /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-0@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-0.pid --heartbeat-interval=30 --maxtasksperchild=20

Although memory usage is significantly reduced with my patch, there is little or no performance improvement. It took about 1 minute to run the "generate applicability" task.

#6 Updated by ttereshc about 2 years ago

  • Status changed from NEW to ASSIGNED
  • Assignee set to ttereshc

#7 Updated by rchan about 2 years ago

  • Sprint/Milestone changed from 52 to 53

#8 Updated by jortel@redhat.com about 2 years ago

  • Sprint/Milestone changed from 53 to 54

#9 Updated by rchan about 2 years ago

  • Sprint/Milestone changed from 54 to 56

#10 Updated by bmbouter almost 2 years ago

  • Sprint set to Sprint 33

#11 Updated by bmbouter almost 2 years ago

  • Sprint/Milestone deleted (56)

#12 Updated by jortel@redhat.com almost 2 years ago

  • Sprint Candidate changed from No to Yes
  • Sprint deleted (Sprint 33)

#13 Updated by ttereshc almost 2 years ago

  • Status changed from ASSIGNED to POST

#14 Updated by ttereshc almost 2 years ago

  • Sprint set to Sprint 36

#15 Updated by rchan almost 2 years ago

  • Sprint changed from Sprint 36 to Sprint 37

#16 Updated by ttereshc almost 2 years ago

  • Status changed from POST to MODIFIED

#18 Updated by dkliban@redhat.com over 1 year ago

  • Platform Release set to 2.16.2

#19 Updated by ttereshc over 1 year ago

#22 Updated by ipanova@redhat.com over 1 year ago

  • Status changed from MODIFIED to CLOSED - CURRENTRELEASE

#23 Updated by bmbouter 11 months ago

  • Tags Pulp 2 added
