Project

Profile

Help

Task #3121

Simplify tracking of worker 'online' state

Added by dalley almost 2 years ago. Updated 6 months ago.

Status:
MODIFIED
Priority:
Normal
Assignee:
Category:
-
Sprint/Milestone:
Start date:
Due date:
% Done:

100%

Platform Release:
Blocks Release:
Backwards Incompatible:
No
Groomed:
Yes
Sprint Candidate:
Yes
Tags:
QA Contact:
Complexity:
Smash Test:
Verified:
No
Verification Required:
No
Sprint:
Sprint 34

Description

The "online" state of worker is currently dependent on two factors. The 'online' field, present on the Worker model, and also whether the last recorded heartbeat is within the timeout interval (30 seconds) of the present time. These two conditions are set in multiple different places, and the online status of the worker based upon those two values is evaluated in multiple different places.

I propose a few different changes to DRY this up.

The canonical reference for whether a worker is online (based on the multiple different criteria) should be a property on the worker model named "online". The field currently named "online" should be renamed to "gracefully_stopped". The "online" property should then check both the current-ness of the timestamp as well as the "gracefully_stopped" field The worker serializer should be updated to include both of these. Currently there exist several parts of the code which set or reference the "online" field, and these should be removed.

The "gracefully_stopped" field should be True only when the worker is shutdown via normal procedures. During operation, and after a hard-kill (e.g. OOM), this value should be false. The default value of this field should be "True", and it should be updated to "False" whenever "save_heartbeat()" is called for that worker, so that the value will be set to "False" during normal operation. It should be set to True as part of the shutdown sequence for the worker.


Checklist


Related issues

Related to Pulp - Story #3143: As an authenticated user, I can filter workers. MODIFIED Actions

Associated revisions

Revision f852da10 View on GitHub
Added by dalley over 1 year ago

Simplify worker 'online' state tracking

closes #3121
https://pulp.plan.io/issues/3121

Revision f852da10 View on GitHub
Added by dalley over 1 year ago

Simplify worker 'online' state tracking

closes #3121
https://pulp.plan.io/issues/3121

Revision f852da10 View on GitHub
Added by dalley over 1 year ago

Simplify worker 'online' state tracking

closes #3121
https://pulp.plan.io/issues/3121

Revision 501cdd8a View on GitHub
Added by dalley over 1 year ago

Additional cleanup and refactoring

re: #3121
https://pulp.plan.io/issues/3121

Revision 501cdd8a View on GitHub
Added by dalley over 1 year ago

Additional cleanup and refactoring

re: #3121
https://pulp.plan.io/issues/3121

Revision 501cdd8a View on GitHub
Added by dalley over 1 year ago

Additional cleanup and refactoring

re: #3121
https://pulp.plan.io/issues/3121

History

#1 Updated by dalley almost 2 years ago

  • Tags Pulp 3 added

#2 Updated by dalley almost 2 years ago

  • Description updated (diff)

#3 Updated by bmbouter almost 2 years ago

This story is well written, and the 'is_online' property changes all make perfect sense.

In terms of keeping the 'online' boolean itself that is really just for the graceful shutdown case right? If we got rid of it entirely the only issue users would have is that when shutting down a worker gracefully is_online would show True for an additional 30 seconds right?

#4 Updated by dalley almost 2 years ago

That's my understanding, yes

#5 Updated by dalley almost 2 years ago

Actually, there is another scenario. When a worker first goes offline we probably want to log some message about it. If this happens in a non-gentle way, we need some way to distinguish the workers which are supposed to be online (but not heartbeating) apart from the others so that we can print a message once, and only once, since we are keeping the old worker records around instead of deleting them. This would be accomplished by setting online=False after logging the message.

#6 Updated by bmbouter almost 2 years ago

Having the 'is_online' and 'online' will be confusing for developers. What about renaming 'online' to be 'gracefully_shutdown' which we could record to True during graceful shutdown and set to False in all other places? It would also be included in the viewset and would be user read-only. I like that better because it creates more value (knowledge) for the user and solves the user case.

What do you think about ^?

#7 Updated by dalley almost 2 years ago

That sounds good to me

#8 Updated by dalley almost 2 years ago

  • Description updated (diff)

#9 Updated by dkliban@redhat.com almost 2 years ago

bmbouter wrote:

Having the 'is_online' and 'online' will be confusing for developers. What about renaming 'online' to be 'gracefully_shutdown' which we could record to True during graceful shutdown and set to False in all other places? It would also be included in the viewset and would be user read-only. I like that better because it creates more value (knowledge) for the user and solves the user case.

What do you think about ^?

I would prefer 'online' and 'gracefully_stopped'.

#10 Updated by bmbouter almost 2 years ago

  • Description updated (diff)
  • Groomed changed from No to Yes
  • Sprint Candidate changed from No to Yes

I'm +1 on 'gracefully_stopped' also because it more clearly past tense and seems like simpler language.

I think this story is good to go, so I'm grooming.

#11 Updated by mhrivnak almost 2 years ago

  • Sprint/Milestone set to 47

#12 Updated by dalley almost 2 years ago

  • Status changed from NEW to ASSIGNED
  • Assignee set to dalley

#13 Updated by daviddavis almost 2 years ago

  • Related to Story #3143: As an authenticated user, I can filter workers. added

#14 Updated by rchan almost 2 years ago

  • Sprint/Milestone changed from 47 to 48

#15 Updated by daviddavis almost 2 years ago

  • Tags Pulp 3 MVP added

#16 Updated by rchan almost 2 years ago

  • Sprint/Milestone changed from 48 to 52

#17 Updated by dalley almost 2 years ago

  • Status changed from ASSIGNED to NEW

I still plan to do this but it's been moved to the very end of my list of priorities by other work.

#18 Updated by dalley almost 2 years ago

  • Assignee deleted (dalley)

#19 Updated by rchan almost 2 years ago

  • Sprint/Milestone changed from 52 to 53

#20 Updated by dalley over 1 year ago

  • Status changed from NEW to ASSIGNED

#21 Updated by dalley over 1 year ago

  • Assignee set to dalley

#22 Updated by jortel@redhat.com over 1 year ago

  • Sprint/Milestone changed from 53 to 54

#23 Updated by rchan over 1 year ago

  • Sprint/Milestone changed from 54 to 56

#24 Updated by dalley over 1 year ago

  • Assignee deleted (dalley)
  • Sprint/Milestone deleted (56)

#25 Updated by dalley over 1 year ago

  • Sprint/Milestone set to 56

#26 Updated by dalley over 1 year ago

  • Assignee set to dalley

#27 Updated by daviddavis over 1 year ago

Added a checklist item to create a testcase in pulp-smash ;)

#28 Updated by bmbouter over 1 year ago

  • Sprint set to Sprint 33

#29 Updated by bmbouter over 1 year ago

  • Sprint/Milestone deleted (56)

#30 Updated by jortel@redhat.com over 1 year ago

  • Sprint changed from Sprint 33 to Sprint 34

#31 Updated by dalley over 1 year ago

  • Status changed from ASSIGNED to POST

#32 Updated by dalley over 1 year ago

  • Status changed from POST to MODIFIED
  • % Done changed from 0 to 100

#33 Updated by dkliban@redhat.com over 1 year ago

  • Sprint/Milestone set to 3.0

#34 Updated by bmbouter 6 months ago

  • Tags deleted (Pulp 3, Pulp 3 MVP)

Please register to edit this issue

Also available in: Atom PDF