Issue #1702

closed

Fetching repo info with details takes much longer on 2.8

Added by jsherril@redhat.com about 8 years ago. Updated almost 4 years ago.

Status:
CLOSED - CURRENTRELEASE
Priority:
Normal
Category:
-
Sprint/Milestone:
-
Start date:
Due date:
Estimated time:
Severity:
2. Medium
Version:
Master
Platform Release:
2.8.0
OS:
CentOS 7
Triaged:
Yes
Groomed:
No
Sprint Candidate:
No
Tags:
Pulp 2
Sprint:
Quarter:

Description

Hitting this API:

/pulp/api/v2/repositories/REPO_ID/?details=true

seems to take much longer on 2.8 than it did on 2.6.

On 2.8 I'm seeing it take 6 seconds on a repo with 20K units, 4 seconds on a repo with 11K units. On pulp 2.6 this seemed almost instant.
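A minimal sketch of how these timings could be reproduced for comparison between versions. The base URL, the repo ID, and the `fetch` callable are assumptions for illustration; authentication is left to whatever `fetch` you supply, since it is not described in this report.

```python
import time
from urllib.parse import quote


def details_url(base_url, repo_id):
    """Build the repository URL from the report, with details enabled."""
    return "{}/pulp/api/v2/repositories/{}/?details=true".format(
        base_url.rstrip("/"), quote(repo_id, safe=""))


def time_call(fetch, url, repeats=3):
    """Return the best wall-clock time in seconds over `repeats` calls of fetch(url)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fetch(url)  # fetch is any callable that performs the HTTP GET
        best = min(best, time.perf_counter() - start)
    return best
```

Passing the same `fetch` callable against a 2.6 and a 2.8 server, with a repo of similar size on each, should make the difference directly comparable.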

For Katello, the result is that some operations take much longer than before. Some page views make this API call several times, once per repo.

Actions #1

Updated by mhrivnak about 8 years ago

I strongly suspect this is the repository controller's "missing_unit_count" function taking up the time, which determines how many units in the repo do not have their files downloaded.

Actions #2

Updated by mhrivnak about 8 years ago

  • Priority changed from Normal to High
  • Platform Release set to 2.8.0
  • Triaged changed from No to Yes
Actions #3

Updated by jortel@redhat.com about 8 years ago

  • Priority changed from High to Normal
  • Platform Release deleted (2.8.0)
  • Triaged changed from Yes to No

The functions used by missing_unit_count() use the default pagination size of 1000. In this case we query only by unit_id and downloaded=False, so each query has a consistent, predictable size. Perhaps the pagination size for this query can safely be increased to improve performance? Rough calculations suggest that the JSON representation of a query containing a list of 100K UUIDs has about a 4MB memory footprint. Even allowing for the query syntax overhead that this estimate leaves out, the data suggests we can safely increase the pagination size used when obtaining the count to something closer to 100K than 1K. I would hope that even 25K would be 25x faster. This should be easy to confirm with some experimentation.
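The rough calculation above can be checked with a short sketch. It measures only the JSON list of UUID strings (no query syntax around it), and the helper names are made up for illustration. With Python's default JSON separators each UUID costs 40 bytes, so 100K of them come to 4,000,000 bytes, which agrees with the 4MB estimate.

```python
import json
import math
import uuid


def json_size_bytes(n_ids):
    """Size in bytes of a JSON list holding n_ids UUID strings."""
    ids = [str(uuid.uuid4()) for _ in range(n_ids)]
    return len(json.dumps(ids).encode("utf-8"))


def page_count(total_units, page_size):
    """Number of paged queries needed to cover total_units."""
    return math.ceil(total_units / page_size)


if __name__ == "__main__":
    print(json_size_bytes(100000))    # bytes for a list of 100K UUIDs
    print(page_count(20000, 1000))    # queries at the default page size
    print(page_count(20000, 25000))   # queries at a 25K page size
```

Fewer queries only helps if the per-query round trip dominates the total time, which is the part that needs the experiment.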

Actions #4

Updated by jortel@redhat.com about 8 years ago

  • Status changed from NEW to ASSIGNED
  • Assignee set to jortel@redhat.com
Actions #5

Updated by jortel@redhat.com about 8 years ago

Tested with a repository containing 12k units. Two findings:

  • get_associated_unit_ids() confirmed as responsible. The root cause is mongoengine creating Document objects for queried unit associations.
  • Changing the page size to 25k has no impact.
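
The first finding can be illustrated without a database. This is only a sketch: `UnitAssociation` is a hypothetical stand-in for a per-row document object (it is not mongoengine), and the field names and row count are illustrative. It compares building one object per row against reading the ID straight from the raw row.

```python
import timeit
import uuid


class UnitAssociation:
    """Hypothetical stand-in for a per-row document object (not mongoengine)."""

    def __init__(self, **fields):
        # Copy every field onto the instance, as an object mapper would.
        for name, value in fields.items():
            setattr(self, name, value)


def ids_via_objects(rows):
    """Build one object per row, then read unit_id from it."""
    return [UnitAssociation(**row).unit_id for row in rows]


def ids_via_raw(rows):
    """Read unit_id straight from the raw row dict."""
    return [row["unit_id"] for row in rows]


# Illustrative rows; the field names are assumptions, not the real schema.
rows = [
    {"repo_id": "demo", "unit_id": str(uuid.uuid4()), "unit_type_id": "rpm"}
    for _ in range(12000)
]

if __name__ == "__main__":
    for fn in (ids_via_objects, ids_via_raw):
        seconds = timeit.timeit(lambda: fn(rows), number=10)
        print(fn.__name__, round(seconds, 3))
```

Both functions return the same IDs; only the per-row cost differs. A real document class does considerably more work per instance than this stand-in, so the gap in practice would be expected to be larger.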
Actions #6

Updated by mhrivnak about 8 years ago

  • Platform Release set to 2.8.0
Actions #7

Updated by jortel@redhat.com about 8 years ago

  • Status changed from ASSIGNED to POST

Added by jortel@redhat.com about 8 years ago

Revision 3fbb132b | View on GitHub

Fix performance regression in associated unit queries. The mongoengine object instantiation is too slow. closes #1702

Actions #8

Updated by jortel@redhat.com about 8 years ago

  • Status changed from POST to MODIFIED
  • % Done changed from 0 to 100
Actions #9

Updated by amacdona@redhat.com about 8 years ago

  • Triaged changed from No to Yes
Actions #10

Updated by dkliban@redhat.com about 8 years ago

  • Status changed from MODIFIED to 5
Actions #11

Updated by dkliban@redhat.com about 8 years ago

  • Status changed from 5 to CLOSED - CURRENTRELEASE
Actions #12

Updated by bmbouter about 5 years ago

  • Tags Pulp 2 added
Actions #13

Updated by bmbouter almost 4 years ago

  • Category deleted (14)

We are removing the 'API' category per open floor discussion June 16, 2020.
