Issue #6581

Pulpcore delete_orphans took too long

Added by iballou 5 months ago. Updated 7 days ago.

Status:
CLOSED - CURRENTRELEASE
Priority:
Normal
Assignee:
Category:
-
Sprint/Milestone:
Start date:
Due date:
Estimated time:
Severity:
2. Medium
Version:
Platform Release:
OS:
Triaged:
Yes
Groomed:
No
Sprint Candidate:
No
Tags:
Sprint:
Quarter:

Description

Recently I ran delete_orphans on pulpcore-3.0.1-2 and it took 15 hours, which is vastly longer than Pulp 2: the same cleanup there took just under 3 minutes.

My Katello setup is as follows:

Repos:
  • 72 BusyBox (Docker)
  • 51 Alpine (Docker)
  • 51 Bash (Docker)
  • 72 Large File (70,000 synced units)
  • 1 Very Large File (150,000 synced units)
  • 5 File with 10 uploaded files each

Content views:
  • 1 Large View with 100 versions:
      • 90 versions with all Docker repos
      • 5 versions with all Docker repos plus Large File
      • 5 versions with only Large File
  • Another view with 10 versions: all BusyBox only
  • 100 Little Views with 1 version each: all Docker repos
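Conceptually, orphan cleanup boils down to finding content units that no repository version references and deleting them, which is why performance scales with the sizes above. A toy sketch of that set logic (hypothetical names, not Pulp's actual implementation):

```python
# Toy illustration (not Pulp code): an orphan is a content unit that
# no repository version references.

def find_orphans(all_units, repo_versions):
    """Return the set of unit ids not referenced by any repository version."""
    referenced = set()
    for version_units in repo_versions.values():
        referenced.update(version_units)
    return set(all_units) - referenced

# Example: three units, only two are still referenced somewhere.
units = {"u1", "u2", "u3"}
versions = {"repo-a/v1": {"u1"}, "repo-a/v2": {"u1", "u2"}}
print(sorted(find_orphans(units, versions)))  # ['u3']
```

The set difference itself is cheap; in practice the cost is in the database queries and the per-object deletion work, which is what this issue is about.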

Attachment: orphan_profile2.png (816 KB), added by dalley, 09/11/2020 03:43 AM

Associated revisions

Revision d89a488d View on GitHub
Added by dalley 18 days ago

Improve performance of orphan cleanup slightly

Also memory consumption.

re: #6581 https://pulp.plan.io/issues/6581

History

#1 Updated by fao89 5 months ago

  • Triaged changed from No to Yes
  • Sprint set to Sprint 71

#2 Updated by rchan 5 months ago

  • Sprint changed from Sprint 71 to Sprint 72

#3 Updated by rchan 5 months ago

  • Sprint changed from Sprint 72 to Sprint 73

#4 Updated by rchan 4 months ago

  • Sprint changed from Sprint 73 to Sprint 74

#5 Updated by rchan 4 months ago

  • Sprint changed from Sprint 74 to Sprint 75

#6 Updated by rchan 3 months ago

  • Sprint changed from Sprint 75 to Sprint 76

#7 Updated by rchan 3 months ago

  • Sprint changed from Sprint 76 to Sprint 77

#8 Updated by iballou 3 months ago

When I hit this issue, I had just migrated from Pulp 2. Pulp 2 - 3 migration is still sort of a tech preview feature, so I think we'll be safe by ignoring the migration. It might cause unneeded headaches considering the content syncing takes a while too.

Instructions for creating a reproducer environment:

  1. Set up Forklift: https://github.com/theforeman/forklift, along with the DNS steps (if you want).

  2. In your Forklift 99-local.yaml, add the following:

centos7-katello-3.15:
  box: centos7-katello-3.15
  cpus: 8
  memory: 32768
  3. Run vagrant up centos7-katello-3.15 and then vagrant ssh centos7-katello-3.15

  4. Follow these steps to improve the Katello server's tuning: https://projects.theforeman.org/issues/29370#note-5

  5. Navigate to https://<your VM's hostname> and make sure the Katello server is up and running

  6. Clone my scripts repo: https://github.com/ianballou/katello-performance-scripts

  7. Run migration_performance_test --setup. It will likely take a number of hours due to the amount of content. Note that I had trouble syncing many repositories at the same time with Pulp 2, so the syncs are mostly serial. You might want to run this overnight.

  8. Run migration_performance_test_tier_2 --setup. This script won't take nearly as long as the first one.

  9. Check the foreman tasks monitor to ensure all the tasks completed successfully. Double-check that all the content is properly synced and that all of the content views exist.

  10. To try orphan cleanup, run foreman-rake katello:delete_orphaned_content or trigger it through the Pulp 3 API.
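For the API route in the last step, pulpcore versions of that era triggered cleanup with an HTTP DELETE to /pulp/api/v3/orphans/ (check your version's API docs), which returns a task href to poll. A small hedged polling helper; the fetch_state callable stands in for a real GET against the task href, and all names here are illustrative, not an official Pulp client API:

```python
import time

def wait_for_task(fetch_state, poll_interval=1.0, max_polls=10_000):
    """Poll a task until it reaches a terminal state.

    fetch_state: callable returning the task's current state string,
    e.g. the "state" field from GET <task_href> on the Pulp 3 API.
    """
    for _ in range(max_polls):
        state = fetch_state()
        if state in ("completed", "failed", "canceled"):
            return state
        time.sleep(poll_interval)
    raise TimeoutError("task did not reach a terminal state")

# Simulated usage: a task observed in three states before finishing.
states = iter(["waiting", "running", "completed"])
print(wait_for_task(lambda: next(states), poll_interval=0))  # completed
```

With a 15-hour cleanup, a generous poll interval (and max_polls budget) matters more than it would for typical tasks.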

#9 Updated by dalley 3 months ago

That is excellent information, Ian; it should be extremely helpful in reproducing this.

I had a couple of different theories about why this might be slow, and possibly they're all true at once:

#10 Updated by gerrod 3 months ago

  • Assignee set to gerrod

#11 Updated by rchan 2 months ago

  • Sprint changed from Sprint 77 to Sprint 78

#12 Updated by rchan about 2 months ago

  • Sprint changed from Sprint 78 to Sprint 79

#13 Updated by daviddavis about 1 month ago

  • Assignee deleted (gerrod)

#14 Updated by rchan about 1 month ago

  • Sprint changed from Sprint 79 to Sprint 80

#15 Updated by ttereshc 27 days ago

Just to confirm that even on a small setup it takes longer than expected. I had 2 RPM repositories: a CentOS 8 BaseOS kickstart and Fedora 32. That's it, no other repositories, no other content. Both repos were synced with the on_demand policy, so only metadata files are on the filesystem. I removed both repos and ran orphan cleanup, which took 3.5 minutes (removed 57230 content units and 13 artifacts).
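For scale, the numbers above work out to a rough deletion rate (plain arithmetic; this assumes the 3.5 minutes was dominated by content-unit deletion, which the comment does not break down):

```python
# Back-of-the-envelope rate from the figures in the comment above.
units = 57230
seconds = 3.5 * 60
rate = units / seconds  # content units deleted per second
print(round(rate))  # 273
```

Even a few hundred units per second becomes hours when a deployment holds millions of orphaned units, which matches the 15-hour report in the description.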

#16 Updated by rchan 25 days ago

  • Sprint changed from Sprint 80 to Sprint 81

#17 Updated by dalley 21 days ago

  • Status changed from NEW to ASSIGNED

#18 Updated by dalley 21 days ago

  • Assignee set to dalley

#19 Updated by dalley 19 days ago

  • Status changed from ASSIGNED to POST

#20 Updated by dalley 19 days ago

[attached image: orphan_profile2.png — profiling output]

The linked PR improves performance by roughly 40% and also reduces memory consumption. I don't really think that is enough of an improvement to close this issue, but I also don't see an easy way to improve it further. It looks like django-lifecycle, used by RBAC, is incurring some of the overhead here (65%?), but I'm not sure that can be easily rectified (Brian agreed).

Brian has some ideas about preventing it from blocking other tasks, which should help mitigate the practical issues of it taking a long time, but that's technically a separate issue. I'm going to send this one back to NEW and remove it from the sprint since it was mitigated without a full fix (and without one forthcoming).
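The django-lifecycle overhead mentioned above is per-object hook dispatch. A toy illustration (pure Python, not Pulp or Django code; all names are hypothetical) of why per-object hooks dominate at scale compared to one bulk operation:

```python
# Hypothetical sketch: per-object deletes fire a hook for every unit,
# while a bulk delete pays the bookkeeping cost once per batch.
hook_calls = 0

def lifecycle_hook():
    # Stand-in for the work a lifecycle framework does around each delete.
    global hook_calls
    hook_calls += 1

def delete_per_object(units):
    for _ in units:
        lifecycle_hook()  # O(n) hook invocations

def delete_bulk(units):
    lifecycle_hook()  # one invocation for the whole batch

units = range(10_000)
delete_per_object(units)
per_object_calls = hook_calls  # 10000

hook_calls = 0
delete_bulk(units)
bulk_calls = hook_calls  # 1
print(per_object_calls, bulk_calls)  # 10000 1
```

The trade-off, of course, is that bulk paths skip per-object side effects, which is exactly why this overhead is hard to remove when those hooks carry real logic.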

#21 Updated by dalley 19 days ago

  • Status changed from POST to NEW
  • Assignee deleted (dalley)
  • Sprint deleted (Sprint 81)

#22 Updated by dalley 7 days ago

  • Status changed from NEW to MODIFIED
  • Assignee set to dalley

#23 Updated by bmbouter 7 days ago

  • Sprint/Milestone set to 3.7.0

#24 Updated by pulpbot 7 days ago

  • Status changed from MODIFIED to CLOSED - CURRENTRELEASE
