Story #6858

closed

As a user I can track progress of the task group with a task group progress report

Added by ipanova@redhat.com almost 4 years ago. Updated almost 4 years ago.

Status: CLOSED - CURRENTRELEASE
Priority: Normal
Category: -
Sprint/Milestone:
Start date:
Due date:
% Done: 100%
Estimated time:
Platform Release:
Groomed: Yes
Sprint Candidate: No
Tags:
Sprint: Sprint 76
Quarter:

Description

GroupProgressReport will be a model similar to ProgressReport.

Each group progress report will have a message, a code, 'done' and 'total' counts, and a relation to its TaskGroup. Plugin writers will create these objects to show the progress of the work that the tasks in the group are expected to complete.
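To make the fields above concrete, here is a minimal sketch of the data model as a SQLite table, using only Python's standard library. The table and column names (group_progress_report, task_group_id) and the helper create_group_with_reports are hypothetical; pulpcore's actual Django model may differ.

```python
import sqlite3

# Hypothetical schema mirroring the fields named above; the real pulpcore
# tables and column names may differ.
SCHEMA = """
CREATE TABLE task_group (id INTEGER PRIMARY KEY, description TEXT);
CREATE TABLE group_progress_report (
    id INTEGER PRIMARY KEY,
    message TEXT NOT NULL,
    code TEXT NOT NULL,
    done INTEGER NOT NULL DEFAULT 0,
    total INTEGER,
    task_group_id INTEGER NOT NULL REFERENCES task_group(id)
);
"""


def create_group_with_reports(conn, description, reports):
    """Create a task group plus its progress reports, given (message, code, total) tuples."""
    cur = conn.execute("INSERT INTO task_group (description) VALUES (?)", (description,))
    group_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO group_progress_report (message, code, total, task_group_id)"
        " VALUES (?, ?, ?, ?)",
        [(message, code, total, group_id) for message, code, total in reports],
    )
    conn.commit()
    return group_id


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
group_id = create_group_with_reports(
    conn, "Migration", [("Migrating repositories", "migrate.repos", 4)]
)
rows = conn.execute(
    "SELECT message, code, done, total FROM group_progress_report WHERE task_group_id = ?",
    (group_id,),
).fetchall()
print(rows)  # [('Migrating repositories', 'migrate.repos', 0, 4)]
```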

Tasks that belong to the TaskGroup will update the group progress reports. All group progress reports need to be created in advance so that a task can find the appropriate one by code or message and update it. Tasks will need to handle the logic of figuring out exactly what they need to update in the group progress reports. For example, a task in the migration plugin called complex repo migration will create a repo version, a publication, and a distribution, which means it will update three group progress reports.
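The lookup-and-update flow for the complex repo migration example might look roughly like this. It assumes the reports were created in advance and are found by code; the report codes, the simplified table, and the helpers bump and complex_repo_migration are all hypothetical, and raw SQL stands in for the Django ORM.

```python
import sqlite3

# Hypothetical, simplified table; column names follow the description above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE group_progress_report ("
    " message TEXT, code TEXT, done INTEGER DEFAULT 0, total INTEGER,"
    " task_group_id INTEGER)"
)

GROUP_ID = 1
# Reports are created up front, before any task in the group runs.
# The codes are illustrative, not the migration plugin's real codes.
conn.executemany(
    "INSERT INTO group_progress_report (message, code, total, task_group_id)"
    " VALUES (?, ?, ?, ?)",
    [
        ("Creating repository versions", "create.repo_version", 2, GROUP_ID),
        ("Creating publications", "create.publication", 2, GROUP_ID),
        ("Creating distributions", "create.distribution", 2, GROUP_ID),
    ],
)


def bump(conn, group_id, code):
    """Increment 'done' on the report with this code, inside the database."""
    cur = conn.execute(
        "UPDATE group_progress_report SET done = done + 1"
        " WHERE task_group_id = ? AND code = ?",
        (group_id, code),
    )
    if cur.rowcount != 1:
        raise LookupError(f"no progress report with code {code!r}")
    conn.commit()


def complex_repo_migration(conn, group_id):
    """Hypothetical task: each resource it creates advances a different report."""
    # ... create the repository version ...
    bump(conn, group_id, "create.repo_version")
    # ... create the publication ...
    bump(conn, group_id, "create.publication")
    # ... create the distribution ...
    bump(conn, group_id, "create.distribution")


complex_repo_migration(conn, GROUP_ID)
rows = conn.execute(
    "SELECT code, done, total FROM group_progress_report ORDER BY code"
).fetchall()
print(rows)
```

Raising when no row matches makes a missing, not-yet-created report fail loudly instead of silently dropping progress.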

To avoid race conditions and cache invalidation issues, this pattern needs to be used so that the operations are performed directly inside the database:

from django.db.models import F
.update(done=F('done') + 1)

See: https://docs.djangoproject.com/en/3.0/ref/models/expressions/#f-expressions
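Why the F() pattern matters can be shown with a small, deterministic simulation: two tasks are interleaved by hand (not real concurrent processes) against a hypothetical one-row SQLite table. The first half mimics reading the value into Python and saving it back; the second runs the increment inside the database, which is roughly the SQL that .update(done=F('done') + 1) sends.

```python
import sqlite3

# Hypothetical one-row table standing in for a group progress report.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE report (id INTEGER PRIMARY KEY, done INTEGER NOT NULL)")
conn.execute("INSERT INTO report (id, done) VALUES (1, 0)")


def read_done(conn):
    return conn.execute("SELECT done FROM report WHERE id = 1").fetchone()[0]


# Read-modify-write in application code (like report.done += 1; report.save()):
# both tasks read the same value before either writes, so one update is lost.
seen_by_task_a = read_done(conn)
seen_by_task_b = read_done(conn)
conn.execute("UPDATE report SET done = ? WHERE id = 1", (seen_by_task_a + 1,))
conn.execute("UPDATE report SET done = ? WHERE id = 1", (seen_by_task_b + 1,))
lost = read_done(conn)

# Increment evaluated by the database: neither task works from a stale value.
conn.execute("UPDATE report SET done = 0 WHERE id = 1")
conn.execute("UPDATE report SET done = done + 1 WHERE id = 1")  # task A
conn.execute("UPDATE report SET done = done + 1 WHERE id = 1")  # task B
atomic = read_done(conn)

print(lost, atomic)  # 1 2
```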

Question: How do we figure out 'total' for each group report?

=====================

Alternative Solution (a modification of the earlier "progress report aggregation" strategy): add another field called progress_report to the TaskGroup serializer. It will query the database for the tasks that belong to the group and aggregate the results by task 'code'. For example, if a group has 4 syncing tasks, each with code 'sync', the aggregated report will show a total of 4 syncing repos, and 'done' will be calculated from each task's status. This implementation is limited to the name of the task, and it offers little flexibility when a task creates more than one resource. For example, a task in the migration plugin called complex repo migration will create a repo version, a publication, and a distribution, but the progress report field will only record '1 complex repo migration done'.
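A rough sketch of the aggregation alternative, assuming each task exposes a 'code' and a state and that only tasks in a 'completed' state count as done. The Task class and aggregate_by_code function are hypothetical, and a real implementation would do the grouping as a database query in the serializer. Note that the complex repo migration counts as a single unit of work no matter how many resources it creates, which is the limitation described above.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Task:
    code: str   # e.g. 'sync'
    state: str  # e.g. 'waiting', 'running', 'completed', 'failed'


def aggregate_by_code(tasks):
    """Summarize a group's tasks as {code: {'total': n, 'done': m}}."""
    summary = defaultdict(lambda: {"total": 0, "done": 0})
    for task in tasks:
        entry = summary[task.code]
        entry["total"] += 1
        if task.state == "completed":  # assumption: only completed tasks count as done
            entry["done"] += 1
    return dict(summary)


group = [
    Task("sync", "completed"),
    Task("sync", "completed"),
    Task("sync", "running"),
    Task("sync", "waiting"),
    Task("complex_repo_migration", "completed"),
]
print(aggregate_by_code(group))
# {'sync': {'total': 4, 'done': 2}, 'complex_repo_migration': {'total': 1, 'done': 1}}
```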


Related issues

Blocks Migration Plugin - Story #6769: As a user, I can track the progress of pulp2->pulp3 migrations (CLOSED - CURRENTRELEASE, ipanova@redhat.com)

#1

Updated by ipanova@redhat.com almost 4 years ago

  • Description updated
#2

Updated by ipanova@redhat.com almost 4 years ago

  • Description updated
#3

Updated by ipanova@redhat.com almost 4 years ago

  • Description updated
#4

Updated by ipanova@redhat.com almost 4 years ago

  • Description updated
#5

Updated by dalley almost 4 years ago

  • Description updated
#6

Updated by bmbouter almost 4 years ago

I believe users need to understand the overall progress of a group of tasks and also to see the progress of each single task distinctly. To the extent that is true, a design where TaskGroupProgress objects are entirely new objects that don't aggregate will force plugin writers to think intentionally about what progress is reported at each level. This is an application of the "explicit is better than implicit" design principle. The aggregation design would work in most cases and be easier for plugin writers, but I don't think it is as strong for users.

For the question of how to figure out 'total': generally the plugin writer sets it based on the workload they understand. I don't expect GroupProgressReport to make this race-condition-free in all cases. Here are three scenarios for how different workloads could handle this, depending on their needs.

  • There is no race condition on 'total'. The 'total' is known at some point and is only ever set once. No one but the user ever reads it.

  • There is only a write-read race condition on 'total'. In this case the writer sets it and calls save(), and other tasks that read it can't be guaranteed that 'total' is up to date unless it's set only once. If it's set multiple times, they would need some sort of synchronization, but TaskGroupProgress would not handle this for them in any way. I think this is also unlikely to be needed.

  • There is a write-write race condition. In this case multiple processes write to 'total' based on the portions of the work they discover. Here F() expressions are the way to go. Alternatively, someone can use a database transaction and handle the transaction-failed errors when one process's save succeeds and the other's doesn't.
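For the write-write case in the last bullet, the sketch below contrasts the F()-style increment with an optimistic compare-and-set retry. The compare-and-set loop is a swapped-in stand-in for "use a database transaction and handle the failure", not the exact mechanism proposed; the table, the helpers, and the retry limit are hypothetical, and the two workers are interleaved by hand on one SQLite connection.

```python
import sqlite3

# Hypothetical one-row table standing in for a group progress report.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE report (id INTEGER PRIMARY KEY, total INTEGER NOT NULL)")
conn.execute("INSERT INTO report (id, total) VALUES (1, 0)")


def read_total(conn):
    return conn.execute("SELECT total FROM report WHERE id = 1").fetchone()[0]


def add_to_total_atomic(conn, n):
    """F()-style: the database adds n to whatever 'total' holds right now."""
    conn.execute("UPDATE report SET total = total + ? WHERE id = 1", (n,))


def add_to_total_cas(conn, n, max_attempts=5):
    """Optimistic variant: write only if 'total' is unchanged since it was read."""
    for _ in range(max_attempts):
        seen = read_total(conn)
        cur = conn.execute(
            "UPDATE report SET total = ? WHERE id = 1 AND total = ?", (seen + n, seen)
        )
        if cur.rowcount == 1:  # no other writer got in between
            return
    raise RuntimeError("gave up after repeated write conflicts")


# Worker A reads 'total', then worker B records 5 newly discovered items
# before A gets to write its own 3.
seen_by_a = read_total(conn)
add_to_total_atomic(conn, 5)
stale = conn.execute(
    "UPDATE report SET total = ? WHERE id = 1 AND total = ?", (seen_by_a + 3, seen_by_a)
)
stale_rows = stale.rowcount  # 0: A's stale write is rejected instead of overwriting B's
add_to_total_cas(conn, 3)    # A retries from a fresh read
final_total = read_total(conn)
print(stale_rows, final_total)  # 0 8
```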

Overall I don't think TaskGroupProgress needs to provide much beyond an F()-based implementation of the 'done' count, because 'done' is the value most likely to be incremented across multiple processes.

#7

Updated by daviddavis almost 4 years ago

The TaskGroupProgress seems reasonable to me. I'm guessing we'll probably have to use F() to update totals.

> The 'total' is known at some point and is only ever set once.

Could you give more information about how this would work?

#8

Updated by bmbouter almost 4 years ago

daviddavis wrote:

> The TaskGroupProgress seems reasonable to me. I'm guessing we'll probably have to use F() to update totals.
>
> > The 'total' is known at some point and is only ever set once.
>
> Could you give more information about how this would work?

Sure. I think the import/export example is actually this case. IIRC, for imports, the first task to run reads the archive being imported, determines how many (and which) repos need updating, and dispatches one "sub-task" for each of them. Once it has read the number of repos that need updating, that first task sets 'total'. The sub-tasks then work through the work but never modify 'total' again.
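The import flow described above could be sketched as follows, with 'total' written exactly once by the first task and 'done' advanced by each sub-task inside the database. The report code, repo names, table layout, and functions are hypothetical, and the sub-tasks run inline here, whereas a real import would dispatch them to workers.

```python
import sqlite3

# Hypothetical, simplified report table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE group_progress_report ("
    " code TEXT PRIMARY KEY, message TEXT, done INTEGER NOT NULL DEFAULT 0, total INTEGER)"
)
# Created in advance, while the total is still unknown (NULL).
conn.execute(
    "INSERT INTO group_progress_report (code, message)"
    " VALUES ('import.repos', 'Importing repositories')"
)


def import_repo_subtask(conn, repo_name):
    """Hypothetical sub-task: import one repo, then advance 'done'; never touches 'total'."""
    # ... import repo_name ...
    conn.execute("UPDATE group_progress_report SET done = done + 1 WHERE code = 'import.repos'")


def import_archive(conn, repos_in_archive):
    """Hypothetical first task: learn the workload, set 'total' once, then dispatch."""
    conn.execute(
        "UPDATE group_progress_report SET total = ? WHERE code = 'import.repos'",
        (len(repos_in_archive),),
    )
    for repo in repos_in_archive:
        # A real import would dispatch each sub-task to a worker; here they run inline.
        import_repo_subtask(conn, repo)


import_archive(conn, ["fedora", "epel", "centos"])
progress = conn.execute("SELECT done, total FROM group_progress_report").fetchone()
print(progress)  # (3, 3)
```

Because only the first task writes 'total', and it does so before any sub-task exists, readers never see it change afterwards; only 'done' needs the in-database increment.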

Actions #9

Updated by ttereshc almost 4 years ago

I agree that we might find a way to set the total only once for the current use cases. I'm not entirely sure about the migration plugin, because some of its work can be identified only during the migration itself, but there is a chance that all of that happens before the subtasks with a more well-defined scope are dispatched.

+1 to starting with F() for 'done'. If we find a use case, we can add it for 'total' later.

#10

Updated by ipanova@redhat.com almost 4 years ago

  • Description updated
#11

Updated by ipanova@redhat.com almost 4 years ago

  • Blocks Story #6769: As a user, I can track the progress of pulp2->pulp3 migrations added
#12

Updated by ipanova@redhat.com almost 4 years ago

  • Description updated
#13

Updated by ipanova@redhat.com almost 4 years ago

  • Groomed changed from No to Yes
  • Sprint set to Sprint 74
#14

Updated by rchan almost 4 years ago

  • Sprint changed from Sprint 74 to Sprint 75
#15

Updated by ipanova@redhat.com almost 4 years ago

  • Status changed from NEW to ASSIGNED
  • Assignee set to ipanova@redhat.com

Added by ipanova@redhat.com almost 4 years ago

Revision 104aed7b

Add GroupProgressReport model and serializer.

closes #6858

https://pulp.plan.io/issues/6858

#16

Updated by rchan almost 4 years ago

  • Sprint changed from Sprint 75 to Sprint 76
#18

Updated by ipanova@redhat.com almost 4 years ago

  • Status changed from ASSIGNED to MODIFIED
  • % Done changed from 0 to 100
#19

Updated by fao89 almost 4 years ago

  • Sprint/Milestone set to 3.5.0
#20

Updated by pulpbot almost 4 years ago

  • Status changed from MODIFIED to CLOSED - CURRENTRELEASE
