Project

Profile

Help

Task #3121

Simplify tracking of worker 'online' state

Added by dalley over 2 years ago. Updated 6 months ago.

Status:
CLOSED - CURRENTRELEASE
Priority:
Normal
Assignee:
Category:
-
Sprint/Milestone:
Start date:
Due date:
% Done:

100%

Platform Release:
Blocks Release:
Groomed:
Yes
Sprint Candidate:
Yes
Tags:
QA Contact:
Complexity:
Smash Test:
Sprint:
Sprint 34

Description

The "online" state of worker is currently dependent on two factors. The 'online' field, present on the Worker model, and also whether the last recorded heartbeat is within the timeout interval (30 seconds) of the present time. These two conditions are set in multiple different places, and the online status of the worker based upon those two values is evaluated in multiple different places.

I propose a few different changes to DRY this up.

The canonical reference for whether a worker is online (based on the multiple different criteria) should be a property on the worker model named "online". The field currently named "online" should be renamed to "gracefully_stopped". The "online" property should then check both the current-ness of the timestamp as well as the "gracefully_stopped" field The worker serializer should be updated to include both of these. Currently there exist several parts of the code which set or reference the "online" field, and these should be removed.

The "gracefully_stopped" field should be True only when the worker is shutdown via normal procedures. During operation, and after a hard-kill (e.g. OOM), this value should be false. The default value of this field should be "True", and it should be updated to "False" whenever "save_heartbeat()" is called for that worker, so that the value will be set to "False" during normal operation. It should be set to True as part of the shutdown sequence for the worker.


Checklist


Related issues

Related to Pulp - Story #3143: As an authenticated user, I can filter workers. CLOSED - CURRENTRELEASE Actions

Associated revisions

Revision f852da10 View on GitHub
Added by dalley about 2 years ago

Simplify worker 'online' state tracking

closes #3121 https://pulp.plan.io/issues/3121

Revision f852da10 View on GitHub
Added by dalley about 2 years ago

Simplify worker 'online' state tracking

closes #3121 https://pulp.plan.io/issues/3121

Revision 501cdd8a View on GitHub
Added by dalley about 2 years ago

Additional cleanup and refactoring

re: #3121 https://pulp.plan.io/issues/3121

Revision 501cdd8a View on GitHub
Added by dalley about 2 years ago

Additional cleanup and refactoring

re: #3121 https://pulp.plan.io/issues/3121

History

#1 Updated by dalley over 2 years ago

  • Tags Pulp 3 added

#2 Updated by dalley over 2 years ago

  • Description updated (diff)

#3 Updated by bmbouter over 2 years ago

This story is well written, and the 'is_online' property changes all make perfect sense.

In terms of keeping the 'online' boolean itself that is really just for the graceful shutdown case right? If we got rid of it entirely the only issue users would have is that when shutting down a worker gracefully is_online would show True for an additional 30 seconds right?

#4 Updated by dalley over 2 years ago

That's my understanding, yes

#5 Updated by dalley over 2 years ago

Actually, there is another scenario. When a worker first goes offline we probably want to log some message about it. If this happens in a non-gentle way, we need some way to distinguish the workers which are supposed to be online (but not heartbeating) apart from the others so that we can print a message once, and only once, since we are keeping the old worker records around instead of deleting them. This would be accomplished by setting online=False after logging the message.

#6 Updated by bmbouter over 2 years ago

Having the 'is_online' and 'online' will be confusing for developers. What about renaming 'online' to be 'gracefully_shutdown' which we could record to True during graceful shutdown and set to False in all other places? It would also be included in the viewset and would be user read-only. I like that better because it creates more value (knowledge) for the user and solves the user case.

What do you think about ^?

#7 Updated by dalley over 2 years ago

That sounds good to me

#8 Updated by dalley over 2 years ago

  • Description updated (diff)

#9 Updated by dkliban@redhat.com over 2 years ago

bmbouter wrote:

Having the 'is_online' and 'online' will be confusing for developers. What about renaming 'online' to be 'gracefully_shutdown' which we could record to True during graceful shutdown and set to False in all other places? It would also be included in the viewset and would be user read-only. I like that better because it creates more value (knowledge) for the user and solves the user case.

What do you think about ^?

I would prefer 'online' and 'gracefully_stopped'.

#10 Updated by bmbouter over 2 years ago

  • Description updated (diff)
  • Groomed changed from No to Yes
  • Sprint Candidate changed from No to Yes

I'm +1 on 'gracefully_stopped' also because it more clearly past tense and seems like simpler language.

I think this story is good to go, so I'm grooming.

#11 Updated by mhrivnak over 2 years ago

  • Sprint/Milestone set to 47

#12 Updated by dalley over 2 years ago

  • Status changed from NEW to ASSIGNED
  • Assignee set to dalley

#13 Updated by daviddavis over 2 years ago

  • Related to Story #3143: As an authenticated user, I can filter workers. added

#14 Updated by rchan over 2 years ago

  • Sprint/Milestone changed from 47 to 48

#15 Updated by daviddavis over 2 years ago

  • Tags Pulp 3 MVP added

#16 Updated by rchan over 2 years ago

  • Sprint/Milestone changed from 48 to 52

#17 Updated by dalley over 2 years ago

  • Status changed from ASSIGNED to NEW

I still plan to do this but it's been moved to the very end of my list of priorities by other work.

#18 Updated by dalley over 2 years ago

  • Assignee deleted (dalley)

#19 Updated by rchan over 2 years ago

  • Sprint/Milestone changed from 52 to 53

#20 Updated by dalley over 2 years ago

  • Status changed from NEW to ASSIGNED

#21 Updated by dalley over 2 years ago

  • Assignee set to dalley

#22 Updated by jortel@redhat.com over 2 years ago

  • Sprint/Milestone changed from 53 to 54

#23 Updated by rchan over 2 years ago

  • Sprint/Milestone changed from 54 to 56

#24 Updated by dalley over 2 years ago

  • Assignee deleted (dalley)
  • Sprint/Milestone deleted (56)

#25 Updated by dalley about 2 years ago

  • Sprint/Milestone set to 56

#26 Updated by dalley about 2 years ago

  • Assignee set to dalley

#27 Updated by daviddavis about 2 years ago

Added a checklist item to create a testcase in pulp-smash ;)

#28 Updated by bmbouter about 2 years ago

  • Sprint set to Sprint 33

#29 Updated by bmbouter about 2 years ago

  • Sprint/Milestone deleted (56)

#30 Updated by jortel@redhat.com about 2 years ago

  • Sprint changed from Sprint 33 to Sprint 34

#31 Updated by dalley about 2 years ago

  • Status changed from ASSIGNED to POST

#32 Updated by dalley about 2 years ago

  • Status changed from POST to MODIFIED
  • % Done changed from 0 to 100

#33 Updated by dkliban@redhat.com about 2 years ago

  • Sprint/Milestone set to 3.0.0

#34 Updated by bmbouter about 1 year ago

  • Tags deleted (Pulp 3, Pulp 3 MVP)

#35 Updated by bmbouter 6 months ago

  • Status changed from MODIFIED to CLOSED - CURRENTRELEASE

Please register to edit this issue

Also available in: Atom PDF