workers and resource-manager go missing during large migration
During a large migration with ~300K RPMs, my workers and resource-manager went missing. On further investigation, it appeared that PostgreSQL was stuck in a long IO wait for ~10-15 minutes while trying to commit a large transaction.
My guess is that a very large transaction, probably the one that saves artifacts, needs to be broken up into smaller ones.
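If one oversized transaction is indeed the culprit, the batching idea could look roughly like the sketch below. It uses the standard-library sqlite3 module rather than the real PostgreSQL-backed migration code, and the artifact table, the save_in_batches helper, and the batch size of 1000 are all hypothetical names and values chosen only for illustration.

```python
import sqlite3
from itertools import islice


def batched(iterable, size):
    """Yield lists of at most `size` items from `iterable`."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def save_in_batches(conn, rows, batch_size=1000):
    """Insert rows, committing once per batch. Returns the number of commits."""
    commits = 0
    for chunk in batched(rows, batch_size):
        # `with conn` commits on success and rolls back on an exception,
        # so each batch is its own small transaction.
        with conn:
            conn.executemany("INSERT INTO artifact (sha256) VALUES (?)", chunk)
        commits += 1
    return commits


# Hypothetical stand-in for the artifacts produced by a migration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE artifact (sha256 TEXT)")
rows = ((f"hash-{i}",) for i in range(2500))

n_commits = save_in_batches(conn, rows, batch_size=1000)
count = conn.execute("SELECT COUNT(*) FROM artifact").fetchone()[0]
```

The trade-off is that a failure partway through leaves earlier batches committed, so the caller would need to make the save idempotent or clean up afterwards.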
Updated by ttereshc almost 2 years ago
- Status changed from ASSIGNED to CLOSED - CURRENTRELEASE
Resolved by multiple fixes released in 0.5.0 and 0.5.1.
The main problem was a memory leak in createrepo_c, which caused the system to use swap and slowed everything down. Workers were going missing because heartbeat updates were far too slow. createrepo_c fixes (dalley++):