Project

Profile

Help

Issue #1662

closed

RepoUnitAssociationManager._units_from_criteria does not use unit_fields from the criteria

Added by jsherril@redhat.com about 8 years ago. Updated about 5 years ago.

Status:
CLOSED - CURRENTRELEASE
Priority:
Normal
Assignee:
Category:
-
Sprint/Milestone:
-
Start date:
Due date:
Estimated time:
Severity:
2. Medium
Version:
Master
Platform Release:
2.8.0
OS:
RHEL 7
Triaged:
Yes
Groomed:
No
Sprint Candidate:
No
Tags:
Pulp 2
Sprint:
Quarter:

Description

Performing a unit copy to copy all the rpms a large repo (10,000 rpms) to another repo causes the pulp worker to consumer a large amount of memory. In my case it got up to 2.5 GB.

Steps to reproduce:

1) Create and sync a large repo, such as rhel 7 or rhel 6
2) Create a 2nd repo that is empty
3) unit copy all the rpms from one repo to the other.

I can provide the exact api call if needed, but was able to reproduce with this pulp-admin command:

pulp-admin -u admin -p password rpm repo copy rpm --from-repo-id="large-repo" --to-repo-id=empty-repo

On my box this causes the OOM killer to kick in and kill the worker, so i'm not entirely sure how much memory it will eventually consume.

Actions #1

Updated by jortel@redhat.com about 8 years ago

  • Platform Release set to 2.8.0
Actions #2

Updated by rbarlow about 8 years ago

  • Status changed from NEW to ASSIGNED
  • Assignee set to rbarlow
  • Triaged changed from No to Yes

Justin has unreasonably refused to buy several more tens of GBs of RAM, so I guess I'll see if I can figure out what is going on.

Actions #3

Updated by rbarlow about 8 years ago

I've determined that this line is where the OOM killer dies on my dev box:

https://github.com/pulp/pulp_rpm/blob/c9f246401513efb2877f303fd01643701d2c47a1/plugins/pulp_rpm/plugins/importers/yum/associate.py#L53

Fixing it may or may not be easy, as it was using a set for de-duplication, which may be difficult to accomplish with a generator.

Actions #4

Updated by rbarlow about 8 years ago

I've got an experimental fix that keeps the RHEL 7 copy under 190 MB on my dev box. It needs more testing, but I may be able to have a PR on Monday or Tuesday.

Actions #5

Updated by rbarlow about 8 years ago

  • Status changed from ASSIGNED to POST

I've submitted a WIP pull request with my experimental fix. I will perform more testing before I will mark it as ready for real review, but I would appreciate eyes on the code that I am considering for this fix:

https://github.com/pulp/pulp_rpm/pull/807

Actions #6

Updated by mhrivnak about 8 years ago

pulp-admin is still sending the list of "fields" in the criteria when it makes the associate call, so it's likely that the platform is ignoring that. Assuming that's the case, we should try to fix this in the platform.

$ pulp-admin -vv rpm repo copy rpm -f foo -t bar
2016-02-15 19:03:00,641 - DEBUG - sending POST request to /pulp/api/v2/repositories/bar/actions/associate/
2016-02-15 19:03:00,646 - INFO - POST request to /pulp/api/v2/repositories/bar/actions/associate/ with parameters {"source_repo_id": "foo", "override_config": {}, "criteria": {"fields": {"unit": ["name", "epoch", "version", "release", "arch", "checksum", "checksumtype"]}, "type_ids": ["rpm"], "filters": {"unit": {}}}}
Actions #7

Updated by bmbouter about 8 years ago

If we fix it in platform, we could chain a fields() call onto the queryset which would be easy and achieve the goal (if I understand this correctly).

Added by rbarlow about 8 years ago

Revision 41f4d255 | View on GitHub

Use the unit_fields from criteria in unit_association.

pulp.server.managers.repo.unit_association.RepoUnitAssociationManager._units_from_criteria() was not passing the criteria's unit_fields list along to the next callee. This caused significant memory usage in certian Pulp operations. This commit corrects that oversight.

https://pulp.plan.io/issues/1662

fixes #1662

Added by rbarlow about 8 years ago

Revision 41f4d255 | View on GitHub

Use the unit_fields from criteria in unit_association.

pulp.server.managers.repo.unit_association.RepoUnitAssociationManager._units_from_criteria() was not passing the criteria's unit_fields list along to the next callee. This caused significant memory usage in certian Pulp operations. This commit corrects that oversight.

https://pulp.plan.io/issues/1662

fixes #1662

Actions #8

Updated by rbarlow about 8 years ago

  • Project changed from RPM Support to Pulp
  • Subject changed from Unit copy on a large repo consumes large amounts of memory to RepoUnitAssociationManager._units_from_criteria does not use unit_fields from the criteria

When I started out on this, I had forgotten the hacky requirement for API callers to tell us a list of fields when they perform a copy. I still think that's hacky and is truly a bug in itself, but it turned out that that mongoengine conversion had stopped paying attention to that field inside pulp.server.managers.repo.unit_association.RepoUnitAssociationManager._units_from_criteria().

We definitely need to fix that code path as it is used by other things too, but I still think users shouldn't have to tell us a list of fields when copying units. We may be able to slap an if statement around the code in my other pull request to make it OK when users do this with RPMs, but it won't be a general fix. Switching to postgres will make fixing this the right way easy.

This PR addresses the issue in platform:

https://github.com/pulp/pulp/pull/2425

Actions #9

Updated by rbarlow about 8 years ago

  • Status changed from POST to MODIFIED
  • % Done changed from 0 to 100
Actions #10

Updated by dkliban@redhat.com about 8 years ago

  • Status changed from MODIFIED to 5
Actions #11

Updated by dkliban@redhat.com about 8 years ago

  • Status changed from 5 to CLOSED - CURRENTRELEASE
Actions #12

Updated by bmbouter about 5 years ago

  • Tags Pulp 2 added

Also available in: Atom PDF