Issue #3172 (closed)
Celery worker consumes a large amount of memory when regenerating applicability for a consumer that is bound to many repositories with many errata.
Description
The celery worker consumes about 60MB of RAM initially. After regenerating applicability for a consumer that is bound to 9 repositories, usage increased to 350MB+ and the memory is never freed.
I think the following is the reason for the high memory consumption.
Pulp fetches the pkglists from all the repositories that a particular erratum is associated with. This is expensive and the results may contain many duplicate pkglists.
For example, Pulp makes this query:
db.erratum_pkglists.find({"errata_id": "RHBA-2016:1886"}).count()
3
Instead of doing the following:
db.erratum_pkglists.find({"errata_id": "RHBA-2016:1886", "repo_id" : "my_org-Red_Hat_Enterprise_Linux_Server-Red_Hat_Satellite_Tools_6_2_for_RHEL_7_Server_RPMs_x86_64"}).count()
1
After amending the "erratum_pkglists" query to filter the errata by repository, both the memory consumption and the run time are reduced by about 80%.
I think I understand why Pulp doesn't filter the pkglists by repository when regenerating applicability: a single entry may not contain the whole pkglist, since an erratum can be copied across repositories.
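The trade-off between the two queries can be sketched in plain Python. This is only an illustration: the in-memory list stands in for the "erratum_pkglists" collection, the document layout is simplified, and the repository ids, package names and helper functions are all made up for the example.

```python
# Hypothetical, simplified stand-in for the erratum_pkglists collection.
ERRATUM_PKGLISTS = [
    {"errata_id": "RHBA-2016:1886", "repo_id": "repo-a",
     "packages": ["foo-1.0-1.x86_64", "bar-2.0-1.x86_64"]},
    {"errata_id": "RHBA-2016:1886", "repo_id": "repo-b",
     "packages": ["foo-1.0-1.x86_64", "bar-2.0-1.x86_64"]},
    # A copied erratum whose entry holds only part of the pkglist.
    {"errata_id": "RHBA-2016:1886", "repo_id": "repo-c",
     "packages": ["foo-1.0-1.x86_64"]},
]


def find_pkglists(errata_id, repo_id=None):
    """Mimic find() on the collection, with an optional repo_id filter."""
    return [
        doc for doc in ERRATUM_PKGLISTS
        if doc["errata_id"] == errata_id
        and (repo_id is None or doc["repo_id"] == repo_id)
    ]


def package_names(docs):
    """Collect the distinct package names found in a list of documents."""
    return {pkg for doc in docs for pkg in doc["packages"]}


all_docs = find_pkglists("RHBA-2016:1886")
one_repo = find_pkglists("RHBA-2016:1886", repo_id="repo-c")

# The unfiltered query loads every copy, duplicates included.
print(len(all_docs), len(one_repo))
# The filtered query is cheaper but can miss packages listed elsewhere.
missing = package_names(all_docs) - package_names(one_repo)
print(sorted(missing))
```

With this sample data the unfiltered lookup returns three documents, two of them identical, while the filtered lookup returns one document that lacks a package present in the other copies.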
I made the following change to retrieve only the "nevra" of the errata pkglist when regenerating applicability for a consumer. This patch can reduce the memory consumption by ~50% (350MB to 150MB) for a consumer with 9 repositories.
https://github.com/hao-yu/pulp_rpm/commit/9f5a52823afee80b31c1e3aa14f4f65fc85f9be9
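The idea of keeping only the "nevra" can be illustrated as follows. This is not the code from the linked commit, just a minimal sketch of the general approach; the field list, the helper name and the sample package entry are assumptions made for the example.

```python
# Assumed NEVRA field names: name, epoch, version, release, arch.
NEVRA_FIELDS = ("name", "epoch", "version", "release", "arch")


def nevra_only(package):
    """Return a copy of a pkglist package entry reduced to its NEVRA fields."""
    return {field: package.get(field) for field in NEVRA_FIELDS}


# Hypothetical package entry with extra metadata that applicability
# matching does not need.
package = {
    "name": "example-pkg",
    "epoch": "0",
    "version": "1.2.3",
    "release": "4.el7",
    "arch": "x86_64",
    "filename": "example-pkg-1.2.3-4.el7.x86_64.rpm",
    "src": "example-pkg-1.2.3-4.el7.src.rpm",
    "sum": ["sha256", "0" * 64],
}

slim = nevra_only(package)
print(sorted(slim))
```

Dropping the unused fields before the entries are held in memory is what shrinks the footprint; the number of entries loaded stays the same, which would be consistent with the memory going down while the run time does not.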
Updated by dalley almost 7 years ago
- Sprint/Milestone set to 48
- Triaged changed from No to Yes
Updated by hyu almost 7 years ago
The issue happened in Satellite 6 when I had several organizations with the same repositories.
To reproduce the issue I created 2 additional organizations (3 orgs in total) and synced the following repositories for each organization.
Red Hat Enterprise Linux 6 Server RPMs x86_64 6Server
Red Hat Enterprise Linux 6 Server - RH Common RPMs x86_64 6Server
Red Hat Enterprise Linux 6 Server - Optional RPMs x86_64 6Server
Red Hat Enterprise Linux 6 Server - Fastrack RPMs x86_64
Red Hat Enterprise Linux 6 Server - Extras RPMs x86_64
Result of generate applicability without my patch:
It consumed ~500MB RSS. Before I created the 2 additional organizations it was consuming ~350MB. This proves that the memory will
apache 1811 34.5 6.2 1017124 487380 ? Sl 12:48 0:58 /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-0@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-0.pid --heartbeat-interval=30 --maxtasksperchild=20
Result of generate applicability with my patch from upstream bug:
The memory is stable at ~150MB RSS before and after creating the additional organizations.
apache 2629 11.0 1.9 684440 150632 ? Sl 12:58 0:54 /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-0@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-0.pid --heartbeat-interval=30 --maxtasksperchild=20
Although the memory consumption is significantly reduced with my patch, there is little or no performance improvement. It took about 1 minute to run the "generate applicability" task.
Updated by ttereshc almost 7 years ago
- Status changed from NEW to ASSIGNED
- Assignee set to ttereshc
Updated by jortel@redhat.com almost 7 years ago
- Sprint/Milestone changed from 53 to 54
Updated by jortel@redhat.com over 6 years ago
- Sprint Candidate changed from No to Yes
- Sprint deleted (Sprint 33)
Added by ttereshc over 6 years ago
Updated by ttereshc over 6 years ago
- Status changed from ASSIGNED to POST
Added by ttereshc over 6 years ago
Revision 8c852a12 | View on GitHub
Use aggregation for recursive copy of errata + API response fix
Updated by ttereshc over 6 years ago
- Status changed from POST to MODIFIED
Applied in changeset 6a40090685f8295d42419a1641351998fa75e709.
Added by ttereshc over 6 years ago
Revision a679f26e | View on GitHub
Respect search criteria in the erratum serializer
Serializer should not add a pkglist to an erratum if it was excluded from fields in search criteria. Serializer handles the case when errata_id is absent in fields in search criteria while pkglist is not.
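A rough sketch of the behaviour described in this commit message, in plain Python. It is not the actual Pulp serializer: the function name, the two lookup callables and the sample data are hypothetical and only show the two rules (skip the pkglist when it was excluded, and recover errata_id when it was not requested but the pkglist was).

```python
def serialize_erratum(unit, fields, load_errata_id, load_pkglist):
    """Attach a pkglist to an erratum only if the search criteria allow it.

    unit:           dict holding the fields returned by the search
    fields:         list of requested field names, or None for all fields
    load_errata_id: hypothetical callable mapping a unit _id to its errata_id
    load_pkglist:   hypothetical callable mapping an errata_id to its pkglist
    """
    if fields is not None and "pkglist" not in fields:
        # pkglist was excluded from the search criteria: leave the unit as is.
        return dict(unit)

    errata_id = unit.get("errata_id")
    if errata_id is None:
        # errata_id was not requested, but it is needed to find the pkglist.
        errata_id = load_errata_id(unit["_id"])

    result = dict(unit)
    result["pkglist"] = load_pkglist(errata_id)
    return result


# Hypothetical lookup tables standing in for database queries.
IDS = {"unit-1": "RHBA-2016:1886"}
PKGLISTS = {"RHBA-2016:1886": [{"packages": ["foo-1.0-1.x86_64"]}]}

without_pkglist = serialize_erratum(
    {"_id": "unit-1", "title": "example"}, ["title"], IDS.get, PKGLISTS.get)
with_pkglist = serialize_erratum(
    {"_id": "unit-1"}, ["pkglist"], IDS.get, PKGLISTS.get)
print(sorted(without_pkglist), sorted(with_pkglist))
```

In the second call errata_id is looked up internally but is not added to the result, so the response still contains only what the caller asked for.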
Added by ttereshc over 6 years ago
Revision 759ff1c0 | View on GitHub
Use aggregation to identify unique errata pkglists
To improve both performance and memory consumption of celery workers during applicability regeneration. Serializer for Errata now deals with unique pkglists as well.
closes #3172 https://pulp.plan.io/issues/3172
(cherry picked from commit 6a40090685f8295d42419a1641351998fa75e709)
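What "identify unique errata pkglists" means can be sketched in plain Python. The real change uses a MongoDB aggregation so that the grouping happens on the database side; the code below only emulates that grouping in memory, and the simplified document layout, the grouping key and the function name are assumptions for illustration.

```python
def unique_pkglists(docs):
    """Keep one pkglist per distinct (errata_id, package set) combination."""
    seen = set()
    unique = []
    for doc in docs:
        # Grouping key: the erratum plus the content of its pkglist,
        # ignoring which repository the copy came from.
        key = (doc["errata_id"], tuple(sorted(doc["packages"])))
        if key not in seen:
            seen.add(key)
            unique.append(
                {"errata_id": doc["errata_id"], "packages": doc["packages"]})
    return unique


# Hypothetical documents: the same erratum copied into three repositories.
docs = [
    {"errata_id": "RHBA-2016:1886", "repo_id": "repo-a",
     "packages": ["foo-1.0-1.x86_64", "bar-2.0-1.x86_64"]},
    {"errata_id": "RHBA-2016:1886", "repo_id": "repo-b",
     "packages": ["bar-2.0-1.x86_64", "foo-1.0-1.x86_64"]},
    {"errata_id": "RHBA-2016:1886", "repo_id": "repo-c",
     "packages": ["foo-1.0-1.x86_64"]},
]

deduplicated = unique_pkglists(docs)
print(len(docs), len(deduplicated))
```

Unlike filtering by repo_id, this keeps every distinct pkglist of the erratum, so packages that appear only in one copy are not lost, while identical copies are loaded once.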
Added by ttereshc over 6 years ago
Revision 8456d136 | View on GitHub
Respect search criteria in the erratum serializer
Serializer should not add a pkglist to an erratum if it was excluded from fields in search criteria. Serializer handles the case when errata_id is absent in fields in search criteria while pkglist is not.
re #3172 https://pulp.plan.io/issues/3172
(cherry picked from commit a679f26eb7d7fe2ec006bf4e6db1d0841715a107)
Updated by ttereshc over 6 years ago
Applied in changeset 759ff1c0f48b07881dcbb0c10b97f0dbfc9b2701.
Updated by ipanova@redhat.com over 6 years ago
- Status changed from MODIFIED to CLOSED - CURRENTRELEASE