Issue #596
closed
Multiple unit_types in association call causes to fetch everything from source repo
Description
When I run two nearly identical copy calls, they have very different performance. The first call (a) generates a correct query against the all-content repo, meaning the query contains the filters. With the second call (b), Pulp somehow leaves out the filters and queries everything in all-content: it fetches all units and then issues queries file by file. Because my repo contains quite a lot of files, Apache soon eats all the memory (30 GB) and crashes. The only difference is that more unit types are queried in one call. There is a workaround, splitting it into multiple calls, but this is not the expected behaviour in the first place.
This behaviour is present in the 2.3 release; I am not sure whether it is fixed in newer releases.
from pulp_rpm.common.ids import UNIT_KEY_RPM
filters = {'checksum': {'$in': ['abc', 'def', 'ghi']}, 'checksumtype': 'sha256'}
fields = list(UNIT_KEY_RPM) + ['filename', 'signature']
a = repo_unit_api.copy('all-content', 'test-rpm', type_ids=['rpm'], filters=filters, fields=fields).response_body.task_id
b = repo_unit_api.copy('all-content', 'test-rpm', type_ids=['rpm', 'iso'], filters=filters, fields=fields).response_body.task_id
+ This bug was cloned from Bugzilla Bug #1158545 +
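The workaround mentioned in the description (one copy call per unit type instead of one call for all types) could be sketched roughly as follows. This is only an illustration: `copy_per_type`, `fake_copy`, and the per-type field lists are hypothetical names and placeholder values, and `fake_copy` stands in for a wrapper around the `repo_unit_api.copy(...)` call from the report so that the snippet runs without a Pulp server.

```python
def copy_per_type(copy_fn, source_repo, dest_repo, filters, fields_by_type):
    """Issue one copy call per unit type and collect the returned task ids.

    copy_fn is assumed to accept the same arguments as the copy call in the
    report (source, destination, type_ids, filters, fields).
    """
    task_ids = {}
    for type_id, fields in fields_by_type.items():
        task_ids[type_id] = copy_fn(
            source_repo,
            dest_repo,
            type_ids=[type_id],  # exactly one type per call
            filters=filters,
            fields=fields,       # fields chosen for this type only
        )
    return task_ids


# Stand-in for the real client call; it only records what it was asked to do.
calls = []


def fake_copy(source_repo, dest_repo, type_ids, filters, fields):
    calls.append({"type_ids": type_ids, "filters": filters, "fields": fields})
    return "task-%d" % len(calls)


filters = {"checksum": {"$in": ["abc", "def", "ghi"]}, "checksumtype": "sha256"}

# Placeholder field lists; with pulp_rpm installed, the rpm entry would be
# list(UNIT_KEY_RPM) + ['filename', 'signature'] as in the report.
fields_by_type = {
    "rpm": ["name", "version", "release", "arch", "filename", "signature"],
    "iso": ["name", "checksum"],
}

task_ids = copy_per_type(fake_copy, "all-content", "test-rpm", filters, fields_by_type)
for type_id, task_id in task_ids.items():
    print(type_id, task_id)
```

Keeping the field list per type also avoids asking for RPM-only fields when copying ISOs.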
Updated by cduryee almost 10 years ago
This appears to work OK for me in pulp 2.4.
My test was to create a RHEL7 repo and sync it down, then create a new "copyrepo" repo. "pulp-admin rpm repo list" had the following output:
[devel@245erratatest-net0 ~]$ pulp-admin rpm repo list
--------------------------------------------------------------------
RPM Repositories
--------------------------------------------------------------------
Id: rhel7
Display Name: rhel7
Description: None
Content Unit Counts:
Distribution: 1
Erratum: 167
Package Category: 9
Package Environment: 6
Package Group: 70
Rpm: 9591
Yum Repo Metadata File: 1
Id: copyrepo
Display Name: copyrepo
Description: None
Content Unit Counts:
I then created a JSON file (copy.json) with the following contents:
{"source_repo_id": "rhel7",
"override_config": {},
"criteria": {"type_ids": ["erratum", "distribution"],
"filters": {"unit": {}}}}
I ran: curl -k -X POST -d @./copy.json "https://admin:admin@localhost/pulp/api/v2/repositories/copyrepo/actions/associate/"
This appears to have copied the correct units over:
$ pulp-admin rpm repo list
--------------------------------------------------------------------
RPM Repositories
--------------------------------------------------------------------
Id: rhel7
Display Name: rhel7
Description: None
Content Unit Counts:
Distribution: 1
Erratum: 167
Package Category: 9
Package Environment: 6
Package Group: 70
Rpm: 9591
Yum Repo Metadata File: 1
Id: copyrepo
Display Name: copyrepo
Description: None
Content Unit Counts:
Distribution: 1
Erratum: 167
+ This comment was cloned from Bugzilla #1158545 comment 1 +
Updated by cduryee almost 10 years ago
Marking as CLOSED/WORKSFORME but feel free to re-open if you hit this issue in Pulp 2.4.
+ This comment was cloned from Bugzilla #1158545 comment 2 +
Updated by tkopecek@redhat.com almost 10 years ago
The difference is not in the result, which really is correct, but in performance. Running associate per unit_type works appropriately. Running your JSON will drastically affect performance. My worst case is importing a few new units into a repo that already has 150k units. It shouldn't eat more than a few kilobytes of memory, but it uses tens of gigabytes, because it fetches everything from that repo into memory (probably to check whether the unit_key is already present). So the problem is somewhere at the ORM level, which constructs many queries in this case (and later filters the results on the Pulp side instead of the Mongo side) instead of one query as in the first case.
Have you checked this difference? I'm not sure whether it will be clearly visible on a repo with 10,000 RPMs, but it should also produce a (smaller) peak in memory consumption.
+ This comment was cloned from Bugzilla #1158545 comment 3 +
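For the REST variant, "associate per unit_type" would mean sending one request body per type rather than one body listing several types. A minimal sketch, reusing the body layout from the copy.json example earlier in this thread; `associate_bodies` is a hypothetical helper, and the actual POST to the associate endpoint is deliberately left out:

```python
import json


def associate_bodies(source_repo_id, type_ids, unit_filter=None):
    """Yield one associate request body per unit type."""
    for type_id in type_ids:
        yield {
            "source_repo_id": source_repo_id,
            "override_config": {},
            "criteria": {
                "type_ids": [type_id],  # a single type per request
                "filters": {"unit": unit_filter or {}},
            },
        }


bodies = list(associate_bodies("rhel7", ["erratum", "distribution"]))
for body in bodies:
    # Each of these would be sent as a separate POST.
    print(json.dumps(body))
```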
Updated by cduryee almost 10 years ago
I was able to repro this behavior. I am working on a fix now.
+ This comment was cloned from Bugzilla #1158545 comment 4 +
Updated by cduryee almost 10 years ago
I am not sure if the way you set the fields is going to work as expected. It looks like you are setting "fields = list(UNIT_KEY_RPM) + ['filename', 'signature']" but then trying to associate both RPMs and ISOs.
Unfortunately, I think that if you are specifying unit fields (which is needed for memory considerations) you will need to copy items one type_id at a time.
+ This comment was cloned from Bugzilla #1158545 comment 5 +
Updated by tkopecek@redhat.com almost 10 years ago
Doesn't it just ignore non-existent fields? If that is the root of the problem, let's close it as NOTABUG. I'm already using the per-unit-type approach, but I was thinking that we could get to a lower number of queries with this.
+ This comment was cloned from Bugzilla #1158545 comment 6 +
Updated by cduryee almost 10 years ago
Sounds good, I will close as CLOSED/NOTABUG since there is a workaround.
However, your point is still valid :) I have created a redmine task at https://pulp.plan.io/issues/105 to track the OOM issue I saw.
Thanks for the bug report!
+ This comment was cloned from Bugzilla #1158545 comment 7 +
Updated by cduryee almost 10 years ago
closing redmine issue and re-opening bz for now.
Triage team: there are a few places in unit copying that use lists instead of generators which is the cause of this bug.
+ This comment was cloned from Bugzilla #1158545 comment 8 +
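To illustrate why lists versus generators matters for memory here, the sketch below compares peak memory when the same stream of units is consumed lazily and when it is first materialised into a list. It is not Pulp code: `fake_units`, `count_matching`, and `peak_bytes` are hypothetical, the units are made-up dictionaries, and `tracemalloc` requires Python 3.

```python
import tracemalloc


def fake_units(n):
    """Lazily produce n made-up unit documents, one at a time."""
    for i in range(n):
        yield {"unit_key": "unit-%d" % i, "payload": "x" * 100}


def count_matching(units, wanted):
    """Count units whose key is in wanted, consuming the iterable once."""
    return sum(1 for unit in units if unit["unit_key"] in wanted)


def peak_bytes(make_units, wanted):
    """Return (match count, peak traced memory in bytes) for one pass."""
    tracemalloc.start()
    matches = count_matching(make_units(), wanted)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return matches, peak


wanted = {"unit-1", "unit-500"}
n = 20000

# Generator: only one unit is alive at any moment.
gen_matches, gen_peak = peak_bytes(lambda: fake_units(n), wanted)

# List: every unit is held in memory before the first one is examined.
list_matches, list_peak = peak_bytes(lambda: list(fake_units(n)), wanted)

print("generator peak bytes:", gen_peak)
print("list peak bytes:", list_peak)
```

Both passes find the same matches; only the list variant's peak grows with the number of units in the source repo.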
Updated by cduryee almost 10 years ago
This is a good bug to examine after Units are converted to mongoengine.
Updated by bmbouter almost 10 years ago
- Has duplicate Issue #105: repo unit association does not always use generators, leading to OOMs added
Updated by bmbouter almost 10 years ago
- Severity changed from Medium to 2. Medium
Updated by bmbouter over 5 years ago
- Status changed from NEW to CLOSED - WONTFIX
Updated by bmbouter over 5 years ago
Pulp 2 is approaching maintenance mode, and this Pulp 2 ticket is not being actively worked on. As such, it is being closed as WONTFIX. Pulp 2 is still accepting contributions though, so if you want to contribute a fix for this ticket, please reopen or comment on it. If you don't have permissions to reopen this ticket, or you want to discuss an issue, please reach out via the developer mailing list.
Updated by bmbouter over 4 years ago
- Category deleted (14)
We are removing the 'API' category per open floor discussion June 16, 2020.