Issue #3814

File Support - Issue #3770: Pulp 3 is about 2x slower than pulp 2 in syncing a large file repo

RepositoryVersion's add_content and remove_content do not perform bulk operations

Added by bmbouter over 1 year ago. Updated 6 months ago.

Status: MODIFIED
Priority: Normal
Assignee:
Category: -
Sprint/Milestone:
Start date:
Due date:
Severity: 2. Medium
Version:
Platform Release:
Blocks Release:
OS:
Backwards Incompatible: No
Triaged: Yes
Groomed: No
Sprint Candidate: No
Tags: Sync Performance
QA Contact:
Complexity:
Smash Test:
Verified: No
Verification Required: No
Sprint:

Description

Motivation

A cProfile report shows that a lot of time is spent in RepositoryVersion.add_content(), which interacts with the database one content unit at a time and is therefore slow. We need to improve the interface so that add_content and remove_content perform bulk operations.

Solution

1. Create a test in Python that adds/removes X number of content units to a repo version
2. Benchmark the test
3. Update add_content and remove_content to support lists of content
4. Benchmark the change

I'll probably use different values of X, starting with 1,000 and increasing by factors of 10.
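The benchmarking steps above could be sketched roughly as follows. This is only an illustration, not the test that was written for this issue: it uses the standard library's sqlite3 and time modules rather than Pulp's models, and the names make_db, add_one_by_one, benchmark, and the repo_content table are all hypothetical stand-ins.

```python
import sqlite3
import time


def make_db():
    """Create an in-memory table standing in for repo-version content rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE repo_content (content_id INTEGER PRIMARY KEY)")
    return conn


def add_one_by_one(conn, ids):
    """Hypothetical per-unit add: one INSERT statement per content unit."""
    for content_id in ids:
        conn.execute("INSERT INTO repo_content VALUES (?)", (content_id,))
    conn.commit()


def benchmark(workload, sizes):
    """Run `workload` once per size X on a fresh database; return {X: seconds}."""
    results = {}
    for size in sizes:
        conn = make_db()
        start = time.perf_counter()
        workload(conn, range(size))
        results[size] = time.perf_counter() - start
        conn.close()
    return results


if __name__ == "__main__":
    # X starts at 1,000 and grows by factors of 10, as planned above.
    timings = benchmark(add_one_by_one, [1_000, 10_000, 100_000])
    for size, seconds in timings.items():
        print(f"{size:>7} units: {seconds:.3f}s")
```

Running the same harness before and after changing the workload gives directly comparable numbers for each X.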

Associated revisions

Revision 9bfc50d9
Added by daviddavis over 1 year ago

Using querysets for add/remove_content methods

fixes #3814
https://pulp.plan.io/issues/3814

History

#1 Updated by CodeHeeler over 1 year ago

  • Triaged changed from No to Yes

#2 Updated by daviddavis over 1 year ago

  • Description updated (diff)
  • Status changed from NEW to ASSIGNED
  • Assignee set to daviddavis

#3 Updated by daviddavis over 1 year ago

I modified add_content() and remove_content() to accept querysets. Here are the initial results for adding and removing 1,000 content units on a repo version:

add_content currently: 43.9s
add_content with bulk_create: 4.3s

remove_content currently: 44.4s
remove_content with a queryset: 0.5s
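To illustrate why the set-based versions are so much faster, here is a minimal sketch of the same idea in plain sqlite3. It is not the code from the PR, which works with Django querysets and bulk_create; the table layout and the helper names make_db, add_content_bulk, and remove_content_bulk are assumptions made for this example, as is modelling removal as a DELETE.

```python
import sqlite3


def make_db(num_units):
    """Create hypothetical content and repo_content tables in memory."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE content (id INTEGER PRIMARY KEY)")
    conn.execute("CREATE TABLE repo_content (content_id INTEGER NOT NULL)")
    conn.executemany(
        "INSERT INTO content VALUES (?)", ((i,) for i in range(num_units))
    )
    return conn


def add_content_bulk(conn, content_ids):
    """Add many units with one batched call instead of a loop of single inserts."""
    conn.executemany(
        "INSERT INTO repo_content (content_id) VALUES (?)",
        ((content_id,) for content_id in content_ids),
    )
    conn.commit()


def remove_content_bulk(conn, max_id):
    """Remove every unit matched by a subquery with a single DELETE statement."""
    cursor = conn.execute(
        "DELETE FROM repo_content WHERE content_id IN "
        "(SELECT id FROM content WHERE id < ?)",
        (max_id,),
    )
    conn.commit()
    return cursor.rowcount  # number of rows removed


if __name__ == "__main__":
    conn = make_db(1_000)
    add_content_bulk(conn, range(1_000))
    print("removed:", remove_content_bulk(conn, 400))
```

In both helpers the set of units is described once and handed to the database, so the number of statements issued no longer grows with the number of content units.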

#4 Updated by daviddavis over 1 year ago

  • Status changed from ASSIGNED to POST

Went ahead and opened a PR with the performance improvements:

https://github.com/pulp/pulp/pull/3548

#5 Updated by daviddavis over 1 year ago

  • Status changed from POST to MODIFIED

#6 Updated by daviddavis 6 months ago

  • Sprint/Milestone set to 3.0

#7 Updated by bmbouter 6 months ago

  • Tags deleted (Pulp 3)
