
Issue #2362

applicability calculation wastes time scanning a list

Added by mhrivnak over 5 years ago. Updated almost 3 years ago.

Status:
CLOSED - CURRENTRELEASE
Priority:
High
Assignee:
Sprint/Milestone:
-
Start date:
Due date:
Estimated time:
Severity:
2. Medium
Version:
2.8.7
Platform Release:
2.10.3
OS:
Triaged:
Yes
Groomed:
No
Sprint Candidate:
No
Tags:
Pulp 2
Sprint:
Sprint 9
Quarter:

Description

Applicability calculation for errata spends most of its time scanning a large list of dicts to see whether it contains a particular dict. Profiling shows that a normal applicability task spends about 95% of its time scanning this list. As a result, calculation takes much longer than it reasonably should.

For a RHEL 6 repo with approximately 18,000 RPMs and 3,700 errata, Pulp does this:

make a list of dicts, each representing one RPM
for each erratum:
  for each RPM listed in the erratum:
    check whether the RPM is in the list of dicts
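The loop above can be sketched in plain Python. This is a simplified, hypothetical stand-in, not Pulp's profiler code. It assumes each RPM is a dict of NEVRA fields and each erratum maps to a list of such dicts. It also reduces "applicable" to "lists at least one RPM present in the repo." The package data is made up. The point is that `in` on a list compares the probe against the elements one at a time, so every lookup costs time proportional to the repo size.

```python
# Hypothetical sketch of the list-based pattern (assumed data shapes).
NEVRA_FIELDS = ("name", "epoch", "version", "release", "arch")


def to_nevra_dict(unit_key):
    """Build a dict of NEVRA fields with every value coerced to str."""
    return {k: str(unit_key[k]) for k in NEVRA_FIELDS}


def errata_with_repo_rpms_list(errata, repo_rpms):
    """Return IDs of errata listing at least one RPM present in the repo.

    `available` is a plain list, so each `in` test compares the probe
    dict against the elements one by one: O(len(repo_rpms)) per lookup,
    repeated for every package of every erratum.
    """
    available = [to_nevra_dict(r) for r in repo_rpms]
    matches = []
    for erratum_id, packages in errata.items():
        for pkg in packages:
            if to_nevra_dict(pkg) in available:  # linear scan of the list
                matches.append(erratum_id)
                break
    return matches


# Made-up sample data for illustration only.
repo_rpms = [
    {"name": "bash", "epoch": 0, "version": "4.1.2", "release": "1.el6", "arch": "x86_64"},
    {"name": "zlib", "epoch": 0, "version": "1.2.3", "release": "1.el6", "arch": "x86_64"},
]
errata = {
    "ERRATUM-A": [{"name": "bash", "epoch": "0", "version": "4.1.2", "release": "1.el6", "arch": "x86_64"}],
    "ERRATUM-B": [{"name": "curl", "epoch": "0", "version": "7.19.7", "release": "1.el6", "arch": "x86_64"}],
}
print(errata_with_repo_rpms_list(errata, repo_rpms))  # ['ERRATUM-A']
```

With about 18,000 RPMs and tens of thousands of lookups, this becomes a quadratic amount of work.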

The membership check happens here:

https://github.com/pulp/pulp_rpm/blob/pulp-rpm-2.8.7-1/plugins/pulp_rpm/plugins/profilers/yum.py#L321

Profiling with cProfile showed that, of the 81 seconds spent calculating applicability (both RPM and errata) for one consumer, 77 seconds were spent on this activity. The list was scanned more than 80,000 times.
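cProfile ships with the Python standard library. Below is a minimal sketch of profiling a single call and printing the top entries by cumulative time. The function `scan_list` is a hypothetical stand-in for the hot loop, not Pulp code.

List membership is implemented in C and is not recorded as a separate call. Its cost therefore shows up in the own time (`tottime`) of the Python code performing the check, here the generator expression.

```python
import cProfile
import io
import pstats


def scan_list(haystack, needles):
    # Hypothetical hot loop: repeated membership checks against a list.
    return sum(1 for n in needles if n in haystack)


def profile_call(func, *args):
    """Run func under cProfile; return (result, text report)."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = func(*args)
    profiler.disable()
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(5)  # top 5 rows
    return result, out.getvalue()


haystack = [{"name": "pkg%d" % i} for i in range(2000)]
needles = [{"name": "pkg%d" % i} for i in range(0, 4000, 2)]
result, report = profile_call(scan_list, haystack, needles)
print(result)  # 1000
print(report)
```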

It should instead use a data structure such as a set, which performs membership checks in constant time on average.
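Here is a sketch of the same simplified check backed by a set. Dicts are unhashable and cannot be set members, so this version represents each NEVRA as a tuple of strings in a fixed field order. The helper names and sample data are hypothetical. The set is built once, and each lookup then hashes the probe instead of scanning every element.

```python
# Hypothetical sketch: set of hashable NEVRA tuples (assumed data shapes).
NEVRA_FIELDS = ("name", "epoch", "version", "release", "arch")


def to_nevra_tuple(unit_key):
    """Hashable NEVRA key: str values in a fixed field order."""
    return tuple(str(unit_key[k]) for k in NEVRA_FIELDS)


def errata_with_repo_rpms_set(errata, repo_rpms):
    """Return IDs of errata listing at least one RPM present in the repo.

    Building the set is O(n) once; each membership test is then O(1)
    on average instead of a scan over the whole collection.
    """
    available = {to_nevra_tuple(r) for r in repo_rpms}
    return [
        erratum_id
        for erratum_id, packages in errata.items()
        if any(to_nevra_tuple(pkg) in available for pkg in packages)
    ]


# Made-up sample data for illustration only.
repo_rpms = [
    {"name": "bash", "epoch": 0, "version": "4.1.2", "release": "1.el6", "arch": "x86_64"},
    {"name": "zlib", "epoch": 0, "version": "1.2.3", "release": "1.el6", "arch": "x86_64"},
]
errata = {
    "ERRATUM-A": [{"name": "bash", "epoch": "0", "version": "4.1.2", "release": "1.el6", "arch": "x86_64"}],
    "ERRATUM-B": [{"name": "curl", "epoch": "0", "version": "7.19.7", "release": "1.el6", "arch": "x86_64"}],
}
print(errata_with_repo_rpms_set(errata, repo_rpms))  # ['ERRATUM-A']

# A dict cannot be added to a set, which is why the key becomes a tuple.
try:
    set().add({"name": "bash"})
except TypeError as exc:
    print("dicts are unhashable:", exc)
```

Tuples compare element by element, so the field order must be identical wherever keys are built. A single shared field-order constant enforces that.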

Associated revisions

Revision d2a22a5a View on GitHub
Added by mhrivnak over 5 years ago

errata applicability no longer spends most of its time scanning a list

https://pulp.plan.io/issues/2362 fixes #2362

History

#1 Updated by mhrivnak over 5 years ago

This patch appears to fix the issue:

diff --git a/plugins/pulp_rpm/plugins/profilers/yum.py b/plugins/pulp_rpm/plugins/profilers/yum.py
index f390357..f6f60ce 100644
--- a/plugins/pulp_rpm/plugins/profilers/yum.py
+++ b/plugins/pulp_rpm/plugins/profilers/yum.py
@@ -199,8 +199,8 @@ class YumProfiler(Profiler):

         # this needs to be fetched outside of the units loop :)
         if content_type == TYPE_ID_ERRATA:
-            available_rpm_nevras = [YumProfiler._create_nevra(r.unit_key) for r in
-                                    conduit.get_repo_units(bound_repo_id, TYPE_ID_RPM)]
+            available_rpm_nevras = set([YumProfiler._create_nevra(r.unit_key) for r in
+                                        conduit.get_repo_units(bound_repo_id, TYPE_ID_RPM)])

         applicable_unit_ids = []
         # Check applicability for each unit
@@ -478,7 +478,4 @@ class YumProfiler(Profiler):
         testing with real data.

         """
-        nevra = {'name': str(r['name']), 'epoch': str(r['epoch']),
-                 'version': str(r['version']), 'release': str(r['release']),
-                 'arch': str(r['arch'])}
-        return nevra
+        return tuple(str(r[k]) for k in ('name', 'epoch', 'version', 'release', 'arch'))
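The effect of the patch above can be sanity-checked outside Pulp with a rough micro-benchmark. This sketch mirrors the two helper shapes (the pre-patch dict and the patched tuple) under hypothetical names. It uses synthetic unit keys sized like the 18,000-RPM repo in the description. Absolute timings depend on the machine, so none are claimed here.

```python
import timeit

FIELDS = ("name", "epoch", "version", "release", "arch")


def create_nevra_dict(r):
    # Shape of the pre-patch helper: an unhashable dict.
    return {k: str(r[k]) for k in FIELDS}


def create_nevra_tuple(r):
    # Shape of the patched helper: a hashable tuple in fixed field order.
    return tuple(str(r[k]) for k in FIELDS)


# Synthetic unit keys with hypothetical package names.
unit_keys = [
    {"name": "pkg%d" % i, "epoch": 0, "version": "1.0", "release": "1", "arch": "x86_64"}
    for i in range(18000)
]

nevra_list = [create_nevra_dict(r) for r in unit_keys]
nevra_set = {create_nevra_tuple(r) for r in unit_keys}

# Probe with the last key: the worst case for a front-to-back list scan.
probe_dict = create_nevra_dict(unit_keys[-1])
probe_tuple = create_nevra_tuple(unit_keys[-1])

list_seconds = timeit.timeit(lambda: probe_dict in nevra_list, number=100)
set_seconds = timeit.timeit(lambda: probe_tuple in nevra_set, number=100)
print("list: %.4f s, set: %.6f s" % (list_seconds, set_seconds))
```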

#3 Updated by mhrivnak over 5 years ago

  • Status changed from NEW to ASSIGNED
  • Assignee set to mhrivnak
  • Priority changed from Normal to High
  • Sprint/Milestone set to 27
  • Triaged changed from No to Yes

I'm adding this to the sprint since it's nearly complete anyway. Turning the proposed fix into a PR is only a small part of the work.

#4 Updated by mhrivnak over 5 years ago

  • Status changed from ASSIGNED to POST

#5 Updated by mhrivnak over 5 years ago

  • Status changed from POST to MODIFIED

#6 Updated by semyers over 5 years ago

  • Platform Release set to 2.10.1

#7 Updated by semyers over 5 years ago

  • Platform Release deleted (2.10.1)

#9 Updated by semyers about 5 years ago

  • Platform Release set to 2.10.3

#10 Updated by semyers about 5 years ago

  • Status changed from MODIFIED to 5

#11 Updated by semyers about 5 years ago

  • Status changed from 5 to CLOSED - CURRENTRELEASE

#13 Updated by bmbouter almost 4 years ago

  • Sprint set to Sprint 9

#14 Updated by bmbouter almost 4 years ago

  • Sprint/Milestone deleted (27)

#15 Updated by bmbouter almost 3 years ago

  • Tags Pulp 2 added
