Issue #7540

workers and resource-manager go missing during large migration

Added by jsherril@redhat.com about 1 year ago. Updated 12 months ago.

Status:
CLOSED - CURRENTRELEASE
Priority:
High
Assignee:
Sprint/Milestone:
-
Start date:
Due date:
Estimated time:
Severity:
2. Medium
Platform Release:
OS:
Triaged:
Yes
Groomed:
No
Sprint Candidate:
No
Tags:
Katello
Sprint:
Sprint 85
Quarter:

Description

During a large migration of ~300K RPMs, my workers and resource-manager went missing. On further investigation, PostgreSQL appeared to be stuck in a long I/O wait for ~10-15 minutes while trying to commit a large transaction.

My guess is that a very large transaction needs to be broken up into smaller ones, probably around saving artifacts (although this is only a guess).
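To illustrate what "breaking up into smaller ones" could look like, here is a minimal sketch, not the migration's actual code. The names `chunked`, `save_in_batches`, and `save_batch` are hypothetical; `save_batch` stands in for whatever would save one chunk inside its own database transaction.

```python
from itertools import islice


def chunked(items, size):
    """Yield successive lists of at most `size` items from any iterable."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def save_in_batches(items, save_batch, size=1000):
    """Call `save_batch` once per chunk and return the number of chunks.

    `save_batch` is a hypothetical callable; in a real migration it would
    open one transaction, save the chunk, and commit before returning.
    """
    count = 0
    for batch in chunked(items, size):
        save_batch(batch)
        count += 1
    return count
```

With this shape, each commit covers at most `size` rows, so no single commit has to flush the whole migration at once. The batch size of 1000 is an arbitrary placeholder.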

History

#1 Updated by jsherril@redhat.com about 1 year ago

  • Description updated (diff)

#2 Updated by ttereshc about 1 year ago

  • Triaged changed from No to Yes
  • Sprint set to Sprint 82

#3 Updated by jsherril@redhat.com about 1 year ago

  • Priority changed from Normal to High

#4 Updated by ttereshc about 1 year ago

  • Status changed from NEW to ASSIGNED
  • Assignee set to ttereshc

#5 Updated by rchan about 1 year ago

  • Sprint changed from Sprint 82 to Sprint 83

#6 Updated by rchan about 1 year ago

  • Sprint changed from Sprint 83 to Sprint 84

#7 Updated by rchan 12 months ago

  • Sprint changed from Sprint 84 to Sprint 85

#8 Updated by ttereshc 12 months ago

  • Status changed from ASSIGNED to CLOSED - CURRENTRELEASE

Resolved by multiple fixes released in 0.5.0 and 0.5.1.

The main problem was a memory leak in createrepo_c, which caused the system to use swap and slowed everything down. Workers were going missing because their heartbeat updates were far too slow. createrepo_c fixes (dalley++):
