Story #2509

Updated by bizhang over 7 years ago

Currently Pulp heartbeats are slow:
 * worker heartbeat is 30 seconds  
 * celerybeat heartbeat is 90 seconds 
 * worker ageout time is 300 seconds 
 * celerybeat lock ageout time is 200 seconds 
 * resource manager lock check is 60 seconds 

 This means that when a worker process dies it could take 300-390s for it to be considered dead.
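
 The 300-390s range can be reproduced with a little arithmetic, assuming a dead worker is only noticed on the first celerybeat check after its ageout time has elapsed. The function name and parameters below are illustrative and are not part of Pulp:

```python
def detection_window(ageout_s, check_interval_s):
    """Return (best, worst) seconds until a dead worker is noticed.

    Assumption: the checker runs every check_interval_s seconds and flags
    a worker once its last heartbeat is at least ageout_s seconds old.
    """
    best = ageout_s                      # a check lands right at ageout
    worst = ageout_s + check_interval_s  # a check ran just before ageout
    return best, worst


# Current settings: 300s worker ageout, 90s celerybeat heartbeat.
print(detection_window(300, 90))  # (300, 390)
```

 Under the same assumption, a 25s ageout with a 5s check interval gives a 25-30s window.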

 This time should be shorter. We should consider Pulp workers to be dead if they have been missing for 30 seconds. 

 We can do this by updating the worker heartbeat from 30s to 5 seconds, the celerybeat heartbeat from 90s to 5 seconds, and the worker ageout time from 300s to 25 seconds. This follows up on the celery heartbeat changes made as part of https://pulp.plan.io/issues/2186. The timeout is set here [0].

 This change would mean that a worker that has not checked in for 5 heartbeats (25s) would be considered missing the next time celerybeat checks (25s-30s after the last time the worker checked in). The wait time for pulp-manage-db, currently up to 5 minutes, would also shorten accordingly, which is a much better user experience.

 The proposed timings are: 
 * worker heartbeat 5s  
 * celerybeat heartbeat 5s 
 * worker ageout time 25s 
 * celerybeat lock ageout time 25s
 * resource manager lock heartbeat 5s 

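 As a rough sketch of how the proposed timings fit together, the snippet below expresses them as constants and checks whether a worker has aged out. The constant and function names are hypothetical and are not taken from constants.py [0]; only the numbers come from the list above.

```python
from datetime import datetime, timedelta

# Proposed timings in seconds (names are illustrative only).
WORKER_HEARTBEAT = 5
CELERYBEAT_HEARTBEAT = 5
WORKER_AGEOUT = 25
CELERYBEAT_LOCK_AGEOUT = 25
RESOURCE_MANAGER_LOCK_HEARTBEAT = 5


def is_missing(last_heartbeat, now, ageout=WORKER_AGEOUT):
    """True if the worker's last heartbeat is at least `ageout` seconds old."""
    return now - last_heartbeat >= timedelta(seconds=ageout)


now = datetime(2017, 1, 1, 12, 0, 0)
print(is_missing(now - timedelta(seconds=10), now))  # False
print(is_missing(now - timedelta(seconds=25), now))  # True
print(WORKER_AGEOUT // WORKER_HEARTBEAT)  # 5
```

 Keeping the ageout at a whole multiple of the heartbeat (here 5 missed heartbeats) leaves some tolerance for a heartbeat that is merely delayed rather than lost.
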

 [0] https://github.com/pulp/pulp/blob/master/common/pulp/common/constants.py#L62
