Project

Profile

Help

Task #3121

closed

Simplify tracking of worker 'online' state

Added by dalley over 6 years ago. Updated over 4 years ago.

Status:
CLOSED - CURRENTRELEASE
Priority:
Normal
Assignee:
Category:
-
Sprint/Milestone:
Start date:
Due date:
% Done:

100%

Estimated time:
Platform Release:
Groomed:
Yes
Sprint Candidate:
Yes
Tags:
Sprint:
Sprint 34
Quarter:

Description

The "online" state of worker is currently dependent on two factors. The 'online' field, present on the Worker model, and also whether the last recorded heartbeat is within the timeout interval (30 seconds) of the present time. These two conditions are set in multiple different places, and the online status of the worker based upon those two values is evaluated in multiple different places.

I propose a few different changes to DRY this up.

The canonical reference for whether a worker is online (based on the multiple different criteria) should be a property on the worker model named "online". The field currently named "online" should be renamed to "gracefully_stopped". The "online" property should then check both the current-ness of the timestamp as well as the "gracefully_stopped" field The worker serializer should be updated to include both of these. Currently there exist several parts of the code which set or reference the "online" field, and these should be removed.

The "gracefully_stopped" field should be True only when the worker is shutdown via normal procedures. During operation, and after a hard-kill (e.g. OOM), this value should be false. The default value of this field should be "True", and it should be updated to "False" whenever "save_heartbeat()" is called for that worker, so that the value will be set to "False" during normal operation. It should be set to True as part of the shutdown sequence for the worker.


Related issues

Related to Pulp - Story #3143: As an authenticated user, I can filter workers.CLOSED - CURRENTRELEASEdalley

Actions
Actions #1

Updated by dalley over 6 years ago

  • Tags Pulp 3 added
Actions #2

Updated by dalley over 6 years ago

  • Description updated (diff)
Actions #3

Updated by bmbouter over 6 years ago

This story is well written, and the 'is_online' property changes all make perfect sense.

In terms of keeping the 'online' boolean itself that is really just for the graceful shutdown case right? If we got rid of it entirely the only issue users would have is that when shutting down a worker gracefully is_online would show True for an additional 30 seconds right?

Actions #4

Updated by dalley over 6 years ago

That's my understanding, yes

Actions #5

Updated by dalley over 6 years ago

Actually, there is another scenario. When a worker first goes offline we probably want to log some message about it. If this happens in a non-gentle way, we need some way to distinguish the workers which are supposed to be online (but not heartbeating) apart from the others so that we can print a message once, and only once, since we are keeping the old worker records around instead of deleting them. This would be accomplished by setting online=False after logging the message.

Actions #6

Updated by bmbouter over 6 years ago

Having the 'is_online' and 'online' will be confusing for developers. What about renaming 'online' to be 'gracefully_shutdown' which we could record to True during graceful shutdown and set to False in all other places? It would also be included in the viewset and would be user read-only. I like that better because it creates more value (knowledge) for the user and solves the user case.

What do you think about ^?

Actions #7

Updated by dalley over 6 years ago

That sounds good to me

Actions #8

Updated by dalley over 6 years ago

  • Description updated (diff)
Actions #9

Updated by dkliban@redhat.com over 6 years ago

bmbouter wrote:

Having the 'is_online' and 'online' will be confusing for developers. What about renaming 'online' to be 'gracefully_shutdown' which we could record to True during graceful shutdown and set to False in all other places? It would also be included in the viewset and would be user read-only. I like that better because it creates more value (knowledge) for the user and solves the user case.

What do you think about ^?

I would prefer 'online' and 'gracefully_stopped'.

Actions #10

Updated by bmbouter over 6 years ago

  • Description updated (diff)
  • Groomed changed from No to Yes
  • Sprint Candidate changed from No to Yes

I'm +1 on 'gracefully_stopped' also because it more clearly past tense and seems like simpler language.

I think this story is good to go, so I'm grooming.

Actions #11

Updated by mhrivnak over 6 years ago

  • Sprint/Milestone set to 47
Actions #12

Updated by dalley over 6 years ago

  • Status changed from NEW to ASSIGNED
  • Assignee set to dalley
Actions #13

Updated by daviddavis over 6 years ago

  • Related to Story #3143: As an authenticated user, I can filter workers. added
Actions #14

Updated by rchan over 6 years ago

  • Sprint/Milestone changed from 47 to 48
Actions #15

Updated by daviddavis over 6 years ago

  • Tags Pulp 3 MVP added
Actions #16

Updated by rchan over 6 years ago

  • Sprint/Milestone changed from 48 to 52
Actions #17

Updated by dalley over 6 years ago

  • Status changed from ASSIGNED to NEW

I still plan to do this but it's been moved to the very end of my list of priorities by other work.

Actions #18

Updated by dalley over 6 years ago

  • Assignee deleted (dalley)
Actions #19

Updated by rchan over 6 years ago

  • Sprint/Milestone changed from 52 to 53
Actions #20

Updated by dalley over 6 years ago

  • Status changed from NEW to ASSIGNED
Actions #21

Updated by dalley over 6 years ago

  • Assignee set to dalley
Actions #22

Updated by jortel@redhat.com over 6 years ago

  • Sprint/Milestone changed from 53 to 54
Actions #23

Updated by rchan about 6 years ago

  • Sprint/Milestone changed from 54 to 56
Actions #24

Updated by dalley about 6 years ago

  • Assignee deleted (dalley)
  • Sprint/Milestone deleted (56)
Actions #25

Updated by dalley about 6 years ago

  • Sprint/Milestone set to 56
Actions #26

Updated by dalley about 6 years ago

  • Assignee set to dalley
Actions #27

Updated by daviddavis about 6 years ago

Added a checklist item to create a testcase in pulp-smash ;)

Actions #28

Updated by bmbouter about 6 years ago

  • Sprint set to Sprint 33
Actions #29

Updated by bmbouter about 6 years ago

  • Sprint/Milestone deleted (56)
Actions #30

Updated by jortel@redhat.com about 6 years ago

  • Sprint changed from Sprint 33 to Sprint 34
Actions #31

Updated by dalley about 6 years ago

  • Status changed from ASSIGNED to POST

Added by dalley about 6 years ago

Revision f852da10 | View on GitHub

Simplify worker 'online' state tracking

closes #3121 https://pulp.plan.io/issues/3121

Added by dalley about 6 years ago

Revision f852da10 | View on GitHub

Simplify worker 'online' state tracking

closes #3121 https://pulp.plan.io/issues/3121

Added by dalley about 6 years ago

Revision 501cdd8a | View on GitHub

Additional cleanup and refactoring

re: #3121 https://pulp.plan.io/issues/3121

Added by dalley about 6 years ago

Revision 501cdd8a | View on GitHub

Additional cleanup and refactoring

re: #3121 https://pulp.plan.io/issues/3121

Actions #32

Updated by dalley about 6 years ago

  • Status changed from POST to MODIFIED
  • % Done changed from 0 to 100
Actions #33

Updated by dkliban@redhat.com about 6 years ago

  • Sprint/Milestone set to 3.0.0
Actions #34

Updated by bmbouter about 5 years ago

  • Tags deleted (Pulp 3, Pulp 3 MVP)
Actions #35

Updated by bmbouter over 4 years ago

  • Status changed from MODIFIED to CLOSED - CURRENTRELEASE

Also available in: Atom PDF