Task #3121
Updated by dalley about 7 years ago
The "online" state of a worker currently depends on two factors: the "online" field on the Worker model, and whether the last recorded heartbeat is within the timeout interval (30 seconds) of the present time. These two conditions are set in multiple different places, and the online status of the worker based on those two values is evaluated in multiple different places. I propose a few changes to DRY this up.

The canonical reference for whether a worker is online (based on the multiple criteria) should be a property on the worker model named "online". The field currently named "online" should be renamed to "gracefully_shutdown". The "online" property should then check both the current-ness of the timestamp and the "gracefully_shutdown" field. The worker serializer should be updated to include this value as the representation of worker state instead of the field currently exposed from the model directly.

The "gracefully_shutdown" field should be True only when a worker is shut down via normal procedures. During operation, and after a hard-kill (e.g. OOM), this value should be False. The default value of this field should be "True", and it should be updated to "False" whenever "save_heartbeat()" is called for that worker, so that the value will be set to "False" during normal operation. It should be set to True as part of the shutdown sequence for the worker.
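
A rough sketch of how the proposed property and field could fit together. This is not the actual model code: it uses a plain dataclass instead of the ORM model so it runs on its own, and the names `HEARTBEAT_TIMEOUT`, `last_heartbeat`, `is_online_at()` and `shutdown()` are assumptions made for illustration. Only "online", "gracefully_shutdown", "save_heartbeat()" and the 30 second timeout come from the description.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

# Timeout interval mentioned in the task description (30 seconds).
HEARTBEAT_TIMEOUT = timedelta(seconds=30)


@dataclass
class Worker:
    """Plain-Python stand-in for the Worker model (hypothetical, not the ORM class)."""

    name: str
    # Assumed field name for the last recorded heartbeat timestamp.
    last_heartbeat: Optional[datetime] = None
    # Defaults to True, so a worker that has never sent a heartbeat is not online.
    gracefully_shutdown: bool = True

    @property
    def online(self) -> bool:
        """Single canonical answer to 'is this worker online?'."""
        return self.is_online_at(datetime.now(timezone.utc))

    def is_online_at(self, now: datetime) -> bool:
        """Hypothetical helper: evaluate the online check at an explicit time."""
        if self.gracefully_shutdown or self.last_heartbeat is None:
            return False
        return now - self.last_heartbeat < HEARTBEAT_TIMEOUT

    def save_heartbeat(self, now: Optional[datetime] = None) -> None:
        """Record a heartbeat; the only place the flag is reset to False."""
        self.last_heartbeat = now or datetime.now(timezone.utc)
        self.gracefully_shutdown = False

    def shutdown(self) -> None:
        """Hypothetical hook for the normal shutdown sequence."""
        self.gracefully_shutdown = True
```

With this shape, a hard-killed worker never reaches `shutdown()`, so its flag stays False and it drops out of the online set only once its heartbeat goes stale. Callers read `worker.online` and never set any state themselves.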