Issue #7540

closed

workers and resource-manager go missing during large migration

Added by jsherril@redhat.com about 2 years ago. Updated almost 2 years ago.

Status: CLOSED - CURRENTRELEASE
Priority: High
Assignee:
Sprint/Milestone: -
Start date:
Due date:
Estimated time:
Severity: 2. Medium
Platform Release:
OS:
Triaged: Yes
Groomed: No
Sprint Candidate: No
Tags: Katello
Sprint: Sprint 85
Quarter:

Description

When doing a large migration of ~300K RPMs, my workers and resource-manager went missing. On further investigation, PostgreSQL appeared to be stuck in a long I/O wait for ~10-15 minutes while trying to commit a large transaction.

My guess is that a very large transaction needs to be broken up into smaller ones, probably around saving artifacts (although this is just a guess).
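
To illustrate what "breaking up into smaller ones" could look like, here is a minimal sketch that commits once per batch instead of once for the whole set. It is not the migration code: the `artifact` table, its columns, and the `batched` and `save_in_batches` helpers are hypothetical, and it uses the standard-library `sqlite3` module only so that it runs on its own.

```python
import sqlite3
from itertools import islice


def batched(iterable, size):
    """Yield lists of at most `size` items from `iterable`."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def save_in_batches(conn, rows, batch_size=1000):
    """Insert rows into the hypothetical `artifact` table, one commit per batch.

    Returns the number of commits issued.
    """
    commits = 0
    for chunk in batched(rows, batch_size):
        # The connection context manager commits on success and
        # rolls back if the block raises.
        with conn:
            conn.executemany(
                "INSERT INTO artifact (sha256, path) VALUES (?, ?)", chunk
            )
        commits += 1
    return commits


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE artifact (sha256 TEXT PRIMARY KEY, path TEXT)")
    # Made-up rows standing in for migrated artifacts.
    rows = ((f"{i:064x}", f"/artifacts/{i}.rpm") for i in range(2500))
    print(save_in_batches(conn, rows, batch_size=1000))  # prints 3
```

The trade-off is that a failure partway through leaves earlier batches committed, so a real migration would need to be safe to resume or re-run.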

Actions #1

Updated by jsherril@redhat.com about 2 years ago

  • Description updated (diff)
Actions #2

Updated by ttereshc about 2 years ago

  • Triaged changed from No to Yes
  • Sprint set to Sprint 82
Actions #3

Updated by jsherril@redhat.com about 2 years ago

  • Priority changed from Normal to High
Actions #4

Updated by ttereshc about 2 years ago

  • Status changed from NEW to ASSIGNED
  • Assignee set to ttereshc
Actions #5

Updated by rchan almost 2 years ago

  • Sprint changed from Sprint 82 to Sprint 83
Actions #6

Updated by rchan almost 2 years ago

  • Sprint changed from Sprint 83 to Sprint 84
Actions #7

Updated by rchan almost 2 years ago

  • Sprint changed from Sprint 84 to Sprint 85
Actions #8

Updated by ttereshc almost 2 years ago

  • Status changed from ASSIGNED to CLOSED - CURRENTRELEASE

Resolved by multiple fixes released in 0.5.0 and 0.5.1.

The main problem was a memory leak in createrepo_c, which caused the system to use swap and slowed everything down. Workers were going missing because the heartbeat update was far too slow. createrepo_c fixes (dalley++):
