Issue #4247

improve performance of uploading ISO

Added by Zhiming almost 2 years ago. Updated about 2 months ago.

Status: CLOSED - WONTFIX
Priority: Normal
Assignee: -
Sprint/Milestone: -
Start date:
Due date:
Estimated time:
Severity: 3. High
Version: 2.15.1
Platform Release:
OS: RHEL 7
Triaged: Yes
Groomed: No
Sprint Candidate: No
Tags: Pulp 2
Sprint:
Quarter:

Description

Add a search criterion that fetches only the needed units, to improve the performance of the "find_repo_content_units" call in pulp_rpm/plugins/importers/iso/importer.py.

We experienced a serious performance problem when uploading units (ISOs) to a repository as the volume of data grew. After applying the change [1], performance improved. Our internal functional tests also pass with the change applied. We use Pulp 2.15.

[1]https://github.com/pulp/pulp_rpm/pull/1236

Associated revisions

Revision 648b021f
Added by Zhiming almost 2 years ago

Improve performance of uploading ISO

Add a search criterion that fetches only the needed units, to improve the performance of "find_repo_content_units".

ref #4247 https://pulp.plan.io/issues/4247

History

#1 Updated by Zhiming almost 2 years ago

Steps to reproduce the issue:

1. Prepare test data: upload 30,000 small ISOs to Pulp.
2. Upload one more ISO to Pulp and record the elapsed time. (In my test environment this takes around 16 seconds; after applying the fix, it takes only about 2 to 3 seconds.)

#2 Updated by Zhiming almost 2 years ago

Analysis of the performance issue.

Consider the following code from "pulp/server/controllers/repository.py":
def find_repo_content_units(..., repo_content_unit_q=None, ...):
    ......
    # repo_content_unit_q defaults to None, so only repo_id restricts the query
    qs = model.RepositoryContentUnit.objects(q_obj=repo_content_unit_q,
                                             repo_id=repository.repo_id)
    ......
    # every returned association is indexed by content type and unit id
    for repo_content_unit in qs:
        id_set = type_map.setdefault(repo_content_unit.unit_type_id, set())
        id_set.add(repo_content_unit.unit_id)
        content_unit_set = content_units.setdefault(repo_content_unit.unit_type_id, dict())
        content_unit_set[repo_content_unit.unit_id] = repo_content_unit
    ......

Because the caller does not pass a value for "repo_content_unit_q", it stays None, so "qs = model.RepositoryContentUnit.objects(q_obj=repo_content_unit_q, repo_id=repository.repo_id)" fetches every record for the repository from the repo_content_units collection in MongoDB, and the loop then stores all of them in Python dictionaries and sets.
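To make the cost concrete, the loop above can be mimicked in plain Python. This is a minimal sketch, not Pulp code: it assumes plain dicts stand in for RepositoryContentUnit documents (the real loop uses attribute access on mongoengine documents), and the helper name group_units, the repository id "iso-repo", and the "unit-N" ids are hypothetical. With no filter, every association in the repository ends up in the two dictionaries.

```python
from typing import Dict, List, Set, Tuple

Row = Dict[str, str]


def group_units(qs: List[Row]) -> Tuple[Dict[str, Set[str]], Dict[str, Dict[str, Row]]]:
    """Hypothetical mirror of the loop in the excerpt: index every returned row."""
    type_map: Dict[str, Set[str]] = {}
    content_units: Dict[str, Dict[str, Row]] = {}
    for rcu in qs:
        # one set of unit ids per content type
        type_map.setdefault(rcu["unit_type_id"], set()).add(rcu["unit_id"])
        # one dict per content type, mapping unit id -> association row
        content_units.setdefault(rcu["unit_type_id"], {})[rcu["unit_id"]] = rcu
    return type_map, content_units


# Hypothetical repository with 30,000 ISO associations. With
# repo_content_unit_q=None the query returns all of them, so qs is the full list.
qs = [
    {"repo_id": "iso-repo", "unit_type_id": "iso", "unit_id": f"unit-{i}"}
    for i in range(30000)
]
type_map, content_units = group_units(qs)
print(len(type_map["iso"]), len(content_units["iso"]))  # 30000 30000
```

Every row is copied into Python structures even when the upload only needs to look at one or a few units.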

Profiling shows that this code accounts for more than 80% of the total time needed to upload a unit when the repository already contains more than 30,000 units. The main reason is that the result set "qs" is too large. Worse, "qs" grows with every uploaded unit, so each upload is slower than the previous one.
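A small calculation shows how this compounds. It assumes that each upload triggers one unfiltered call and that the call returns every unit already in the repository; the helper names are hypothetical, and the model counts returned rows only, not wall-clock time. Under that assumption the per-upload work grows linearly with repository size, and the total work to fill a repository grows quadratically.

```python
def rows_returned_by_upload(existing_units: int) -> int:
    """Rows an unfiltered query returns when the repo already holds existing_units."""
    return existing_units


def total_rows_for_uploads(n_uploads: int) -> int:
    """Rows returned over n_uploads uploads into an initially empty repository.

    Upload k (1-based) sees k - 1 existing units, so the total is
    0 + 1 + ... + (n - 1) = n * (n - 1) / 2.
    """
    return sum(rows_returned_by_upload(k - 1) for k in range(1, n_uploads + 1))


print(rows_returned_by_upload(30000))  # 30000 rows for the 30,001st upload
print(total_rows_for_uploads(30000))   # 449985000 rows to build a 30,000-unit repo
```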

Therefore, passing a search criterion [1] through "repo_content_unit_q", so that only the needed units are fetched, improves performance.
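The effect of the criterion on the size of "qs" can be sketched as follows. This is not the code from the pull request: it models repo_content_unit_q as a Python predicate (in Pulp 2 it is a mongoengine Q object evaluated by MongoDB), and the function name query_repo_content_units, the choice to filter on unit_id, and the ids used are all hypothetical.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, str]


def query_repo_content_units(
    rows: List[Row],
    repo_id: str,
    repo_content_unit_q: Optional[Callable[[Row], bool]] = None,
) -> List[Row]:
    """Hypothetical in-memory stand-in for
    RepositoryContentUnit.objects(q_obj=repo_content_unit_q, repo_id=repo_id)."""
    return [
        r
        for r in rows
        if r["repo_id"] == repo_id
        and (repo_content_unit_q is None or repo_content_unit_q(r))
    ]


rows = [
    {"repo_id": "iso-repo", "unit_type_id": "iso", "unit_id": f"unit-{i}"}
    for i in range(30000)
]

# No criterion: every association of the repository comes back.
unfiltered = query_repo_content_units(rows, "iso-repo")

# Hypothetical criterion: only the unit(s) this upload actually needs.
wanted = {"unit-29999"}
filtered = query_repo_content_units(rows, "iso-repo", lambda r: r["unit_id"] in wanted)

print(len(unfiltered), len(filtered))  # 30000 1
```

In this in-memory model Python still scans the whole list; in Pulp the benefit comes from MongoDB applying the criterion server-side, so only matching documents are sent to Python and stored in the dictionaries.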

[1]https://github.com/pulp/pulp_rpm/pull/1236

#3 Updated by ttereshc almost 2 years ago

  • Status changed from NEW to POST
  • Triaged changed from No to Yes

#4 Updated by bmbouter over 1 year ago

  • Status changed from POST to CLOSED - WONTFIX

Pulp 2 is approaching maintenance mode, and this Pulp 2 ticket is not being actively worked on. As such, it is being closed as WONTFIX. Pulp 2 is still accepting contributions though, so if you want to contribute a fix for this ticket, please reopen or comment on it. If you don't have permissions to reopen this ticket, or you want to discuss an issue, please reach out via the developer mailing list.

#5 Updated by bmbouter over 1 year ago

  • Tags Pulp 2 added
