Issue #3172

closed

Celery worker consumes a large amount of memory when regenerating applicability for a consumer that binds to many repositories with many errata.

Added by hyu almost 7 years ago. Updated over 5 years ago.

Status: CLOSED - CURRENTRELEASE
Priority: Normal
Assignee:
Sprint/Milestone: -
Start date:
Due date:
Estimated time:
Severity: 2. Medium
Version: 2.8.7
Platform Release: 2.16.2
OS: RHEL 7
Triaged: Yes
Groomed: No
Sprint Candidate: Yes
Tags: Pulp 2
Sprint: Sprint 37
Quarter:

Description

The celery worker initially consumes about 60MB of RAM. After regenerating applicability for a consumer that binds to 9 repositories, usage increases to 350MB or more, and the memory is never freed.

I think the following is the reason for the high memory consumption.

Pulp fetches the pkglists from all the repositories that a particular erratum is associated with. This is expensive, and the results may contain many duplicate pkglists.

For example, Pulp makes this query:

db.erratum_pkglists.find({"errata_id": "RHBA-2016:1886"}).count()

3

Instead of doing the following:

db.erratum_pkglists.find({"errata_id": "RHBA-2016:1886", "repo_id" : "my_org-Red_Hat_Enterprise_Linux_Server-Red_Hat_Satellite_Tools_6_2_for_RHEL_7_Server_RPMs_x86_64"}).count()

1

After amending the "erratum_pkglists" query to filter the errata by repository, both the memory consumption and the run time are reduced by 80%.
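The difference between the two queries can be illustrated without a MongoDB instance. The sketch below is a minimal in-memory stand-in, assuming three documents for the same erratum; the `find` helper and the two `org_b`/`org_c` repository ids are hypothetical and only mimic an equality match.

```python
# Hypothetical in-memory stand-in for the erratum_pkglists collection.
# Only the first repo_id comes from this issue; the other two are made up.
ERRATUM_PKGLISTS = [
    {"errata_id": "RHBA-2016:1886",
     "repo_id": "my_org-Red_Hat_Enterprise_Linux_Server-Red_Hat_Satellite_Tools_6_2_for_RHEL_7_Server_RPMs_x86_64"},
    {"errata_id": "RHBA-2016:1886", "repo_id": "org_b-tools"},
    {"errata_id": "RHBA-2016:1886", "repo_id": "org_c-tools"},
]


def find(collection, **criteria):
    """Return the documents whose fields equal every given criterion."""
    return [doc for doc in collection
            if all(doc.get(key) == value for key, value in criteria.items())]


# Query without a repository filter: one document per repository.
all_repos = find(ERRATUM_PKGLISTS, errata_id="RHBA-2016:1886")

# Query restricted to the single repository the consumer is bound to.
one_repo = find(
    ERRATUM_PKGLISTS,
    errata_id="RHBA-2016:1886",
    repo_id="my_org-Red_Hat_Enterprise_Linux_Server-Red_Hat_Satellite_Tools_6_2_for_RHEL_7_Server_RPMs_x86_64",
)

print(len(all_repos), len(one_repo))
```

The unfiltered lookup grows with the number of repositories (and organizations) that carry the erratum, while the filtered one stays at one document per bound repository.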

I think I understand why Pulp doesn't filter the pkglist by repository when regenerating applicability: a single entry may not contain the whole pkglist, since an erratum can be copied across repositories.

I made the following change to retrieve only the "nevra" of the errata pkglist when regenerating applicability for a consumer. This patch can reduce the memory consumption by ~50% (350MB to 150MB) for a consumer with 9 repositories.
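The idea of keeping only the NEVRA can be sketched roughly as follows. This is not the code from the linked commit; the document layout (a `collections` list whose entries hold `packages` dicts with extra fields such as `filename` and `sum`) and the sample package are assumptions made for illustration.

```python
from collections import namedtuple

# name, epoch, version, release, arch: the fields needed for applicability.
NEVRA = namedtuple("NEVRA", ["name", "epoch", "version", "release", "arch"])


def extract_nevras(pkglist_docs):
    """Collect the unique NEVRA tuples, dropping every other package field."""
    nevras = set()
    for doc in pkglist_docs:
        for collection in doc.get("collections", []):
            for pkg in collection.get("packages", []):
                nevras.add(NEVRA(*(pkg.get(field) for field in NEVRA._fields)))
    return nevras


# Hypothetical sample: the same package listed for two repositories.
pkg = {"name": "pkg-a", "epoch": "0", "version": "1.0", "release": "1.el7",
       "arch": "x86_64", "filename": "pkg-a-1.0-1.el7.x86_64.rpm", "sum": ["sha256", "..."]}
docs = [
    {"errata_id": "RHBA-2016:1886", "repo_id": "repo-1", "collections": [{"packages": [pkg]}]},
    {"errata_id": "RHBA-2016:1886", "repo_id": "repo-2", "collections": [{"packages": [dict(pkg)]}]},
]

nevras = extract_nevras(docs)
print(len(nevras))
```

Because the tuples are stored in a set, duplicates coming from different repositories collapse into a single entry, and the bulky per-package fields are never retained.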

https://github.com/hao-yu/pulp_rpm/commit/9f5a52823afee80b31c1e3aa14f4f65fc85f9be9

Actions #2

Updated by dalley almost 7 years ago

  • Sprint/Milestone set to 48
  • Triaged changed from No to Yes
Actions #4

Updated by rchan almost 7 years ago

  • Sprint/Milestone changed from 48 to 52
Actions #5

Updated by hyu almost 7 years ago

The issue happened in Satellite 6 when I have some organizations with the same repositories.

To reproduce the issue, I created 2 additional organizations (3 orgs in total) and synced the following repositories for each organization.

Red Hat Enterprise Linux 6 Server RPMs x86_64 6Server
Red Hat Enterprise Linux 6 Server - RH Common RPMs x86_64 6Server
Red Hat Enterprise Linux 6 Server - Optional RPMs x86_64 6Server
Red Hat Enterprise Linux 6 Server - Fastrack RPMs x86_64
Red Hat Enterprise Linux 6 Server - Extras RPMs x86_64

Result of generate applicability without my patch:

It consumed ~500MB RSS. Before I created the 2 additional organizations, it was consuming ~350MB. This proves that the memory will

apache 1811 34.5 6.2 1017124 487380 ? Sl 12:48 0:58 /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-0@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-0.pid --heartbeat-interval=30 --maxtasksperchild=20

Result of generate applicability with my patch from upstream bug:

The memory is stable at ~150MB RSS before and after creating the additional organizations.

apache 2629 11.0 1.9 684440 150632 ? Sl 12:58 0:54 /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-0@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-0.pid --heartbeat-interval=30 --maxtasksperchild=20

Although memory usage is significantly reduced with my patch, there is little or no performance improvement. It took about 1 minute to run the "generate applicability" task.
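For reference, the RSS figures can be read from the process lines quoted above. The helper below is a small sketch, assuming the lines are in `ps aux` column order (USER PID %CPU %MEM VSZ RSS ...) with RSS reported in KiB; the function name is hypothetical and the command column is shortened here.

```python
def rss_mib(ps_aux_line):
    """Return the RSS column of a `ps aux` line converted from KiB to MiB."""
    # Split into at most 11 fields so the command (last column) stays intact.
    fields = ps_aux_line.split(None, 10)
    return int(fields[5]) / 1024.0


# The two worker lines from this comment, with the command shortened.
without_patch = "apache 1811 34.5 6.2 1017124 487380 ? Sl 12:48 0:58 /usr/bin/python /usr/bin/celery worker"
with_patch = "apache 2629 11.0 1.9 684440 150632 ? Sl 12:58 0:54 /usr/bin/python /usr/bin/celery worker"

print(round(rss_mib(without_patch), 1))  # 476.0
print(round(rss_mib(with_patch), 1))     # 147.1
```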

Actions #6

Updated by ttereshc almost 7 years ago

  • Status changed from NEW to ASSIGNED
  • Assignee set to ttereshc
Actions #7

Updated by rchan almost 7 years ago

  • Sprint/Milestone changed from 52 to 53
Actions #8

Updated by jortel@redhat.com almost 7 years ago

  • Sprint/Milestone changed from 53 to 54
Actions #9

Updated by rchan almost 7 years ago

  • Sprint/Milestone changed from 54 to 56
Actions #10

Updated by bmbouter over 6 years ago

  • Sprint set to Sprint 33
Actions #11

Updated by bmbouter over 6 years ago

  • Sprint/Milestone deleted (56)
Actions #12

Updated by jortel@redhat.com over 6 years ago

  • Sprint Candidate changed from No to Yes
  • Sprint deleted (Sprint 33)

Added by ttereshc over 6 years ago

Revision 6a400906 | View on GitHub

Use aggregation to identify unique errata pkglists

To improve both performance and memory consumption of celery workers during applicability regeneration. Serializer for Errata now deals with unique pkglists as well.

closes #3172 https://pulp.plan.io/issues/3172
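As a rough illustration of what "using aggregation to identify unique pkglists" can look like, the sketch below builds a MongoDB aggregation pipeline as plain data and pairs it with an approximate pure-Python equivalent. It is not the code from this revision; the `collections` field name, both function names, and the sample documents are assumptions.

```python
import json


def unique_collections_pipeline(errata_id):
    """Build a pipeline that groups an erratum's pkglist collections and
    lets $addToSet drop exact duplicates (field names are assumed)."""
    return [
        {"$match": {"errata_id": errata_id}},
        {"$unwind": "$collections"},
        {"$group": {"_id": "$errata_id",
                    "collections": {"$addToSet": "$collections"}}},
    ]


def unique_collections(docs, errata_id):
    """Approximate in-memory equivalent of the pipeline above."""
    seen = set()
    unique = []
    for doc in docs:
        if doc.get("errata_id") != errata_id:
            continue
        for collection in doc.get("collections", []):
            # Serialize with sorted keys so equal dicts produce the same key.
            key = json.dumps(collection, sort_keys=True)
            if key not in seen:
                seen.add(key)
                unique.append(collection)
    return unique


# Hypothetical data: three repositories, two of them with an identical pkglist.
docs = [
    {"errata_id": "RHBA-2016:1886", "repo_id": "repo-1", "collections": [{"packages": ["pkg-a"]}]},
    {"errata_id": "RHBA-2016:1886", "repo_id": "repo-2", "collections": [{"packages": ["pkg-a"]}]},
    {"errata_id": "RHBA-2016:1886", "repo_id": "repo-3", "collections": [{"packages": ["pkg-a", "pkg-b"]}]},
]

result = unique_collections(docs, "RHBA-2016:1886")
print(len(result))
```

Deduplicating on the database side means the worker only ever loads one copy of each distinct pkglist, while still seeing entries that exist in only one of the repositories.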

Actions #13

Updated by ttereshc over 6 years ago

  • Status changed from ASSIGNED to POST
Actions #14

Updated by ttereshc over 6 years ago

  • Sprint set to Sprint 36
Actions #15

Updated by rchan over 6 years ago

  • Sprint changed from Sprint 36 to Sprint 37

Added by ttereshc over 6 years ago

Revision 8c852a12 | View on GitHub

Use aggregation for recursive copy of errata + API response fix

Added by ttereshc over 6 years ago

Revision f36ee620 | View on GitHub

Fix unittests

Actions #16

Updated by ttereshc over 6 years ago

  • Status changed from POST to MODIFIED

Added by ttereshc over 6 years ago

Revision a679f26e | View on GitHub

Respect search criteria in the erratum serializer

Serializer should not add a pkglist to an erratum if it was excluded from fields in search criteria. Serializer handles the case when errata_id is absent in fields in search criteria while pkglist is not.

re #3172 https://pulp.plan.io/issues/3172

Actions #18

Updated by dkliban@redhat.com over 6 years ago

  • Platform Release set to 2.16.2

Added by ttereshc over 6 years ago

Revision 759ff1c0 | View on GitHub

Use aggregation to identify unique errata pkglists

To improve both performance and memory consumption of celery workers during applicability regeneration. Serializer for Errata now deals with unique pkglists as well.

closes #3172 https://pulp.plan.io/issues/3172

(cherry picked from commit 6a40090685f8295d42419a1641351998fa75e709)

Added by ttereshc over 6 years ago

Revision 8456d136 | View on GitHub

Respect search criteria in the erratum serializer

Serializer should not add a pkglist to an erratum if it was excluded from fields in search criteria. Serializer handles the case when errata_id is absent in fields in search criteria while pkglist is not.

re #3172 https://pulp.plan.io/issues/3172

(cherry picked from commit a679f26eb7d7fe2ec006bf4e6db1d0841715a107)

Actions #19

Updated by ttereshc over 6 years ago

Actions #22

Updated by ipanova@redhat.com over 6 years ago

  • Status changed from MODIFIED to CLOSED - CURRENTRELEASE
Actions #23

Updated by bmbouter over 5 years ago

  • Tags Pulp 2 added
