Story #6374

closed

[Epic] As a user, performance can be improved by increasing number of workers

Added by ttereshc over 4 years ago. Updated over 4 years ago.

Status: CLOSED - CURRENTRELEASE
Priority: Normal
Assignee:
Sprint/Milestone:
Start date:
Due date:
% Done: 100%
Estimated time:
Platform Release:
Groomed: No
Sprint Candidate: No
Tags:
Sprint:
Quarter:

Description

Motivation

Some parts of the migration process can be run in parallel in multiple tasks. Running different parts in parallel in coroutines within a single task is not enough, because database requests will still be sequential. The parts that can run in parallel (content migration and the creation of repo versions, publications, and distributions) currently consume 70% of re-migration time.

Proposal

Dispatch multiple tasks for the parts that can be run in parallel.

Workflow

  1. User creates a MigrationPlan and triggers a migration task.
  2. A migration task:
    • performs any migration validations if needed
    • creates a migration report
    • dispatches a poll task
    • finishes and is marked as complete
  3. User tracks the migration progress by looking at the migration report
  4. The poll task:
    • checks the progress of the migration by looking at the migration report and the status of the tasks listed in it, and acts accordingly:
      • if a migration step is complete, it dispatches one or more tasks for the next step
      • if the migration step is not complete, it re-dispatches itself
      • if the overall status of the migration report is set to 'cancelled', the poll task should finish and not re-dispatch itself
      • if the last step of the migration is finished, the poll task should finish and not re-dispatch itself
  5. To cancel the migration, the user needs to update the state in the migration report to 'cancel'.
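
The poll task's decision rules from step 4 can be sketched as a small pure function. This is only an illustration under stated assumptions, not the plugin's implementation: the `Action` enum, the function name `decide`, and the status value "running" are hypothetical; only 'cancelled' comes from the workflow above.

```python
from enum import Enum


class Action(Enum):
    FINISH = "finish"                # stop polling, do not re-dispatch
    DISPATCH_NEXT = "dispatch_next"  # start tasks for the next step, keep polling
    REDISPATCH = "redispatch"        # current step still running, poll again


def decide(report_status, step_complete, is_last_step):
    """Return what the poll task should do next.

    report_status -- overall status string stored in the migration report
    step_complete -- True if every task of the current step has finished
    is_last_step  -- True if the current step is the final migration step
    """
    if report_status == "cancelled":
        return Action.FINISH
    if not step_complete:
        return Action.REDISPATCH
    if is_last_step:
        return Action.FINISH
    return Action.DISPATCH_NEXT
```

Keeping the decision separate from the dispatching code makes each of the four rules easy to check in isolation.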

MigrationReport

There needs to be a MigrationReport object which will track the migration progress, including the tasks being dispatched. It should be accessible at the dedicated endpoint /pulp/api/v3/migrationreport/. It is created by the task which initiates the migration process.

To prevent a user from triggering multiple migrations at once, no MigrationReport can be created while another MigrationReport is in an unfinished state.
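
A minimal sketch of such a guard, using plain dictionaries instead of database models. The status names in `UNFINISHED`, the exception class, and the `create_report` helper are all assumptions made for the example; the story does not define them.

```python
# Assumed set of statuses that count as "unfinished"; not defined in this story.
UNFINISHED = {"waiting", "running"}


class MigrationInProgress(Exception):
    """Raised when a new report is requested while another is unfinished."""


def create_report(existing_statuses, plan_href):
    """Create a new report dict unless an unfinished report already exists.

    existing_statuses -- iterable of status strings of all known reports
    plan_href         -- href of the MigrationPlan being run
    """
    if any(status in UNFINISHED for status in existing_statuses):
        raise MigrationInProgress("a migration is already in progress")
    return {"migration_plan": plan_href, "status": "waiting"}
```

With a real database, the check and the insert would have to happen atomically (for example inside one transaction), otherwise two concurrent requests could both pass the check.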

At a minimum, a migration report has:

  • migration plan href
  • status of the migration
  • start/finish timestamps
  • for each progress step:
    • task href
    • status
    • code (string, id of the step)
    • total/done counters
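
The minimum set of fields listed above could be modelled roughly as follows. This is a sketch with plain dataclasses rather than the actual model; the class and attribute names and the `counters` helper are assumptions chosen to mirror the list.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional, Tuple


@dataclass
class ProgressStep:
    code: str                        # string id of the step
    status: str
    task_href: Optional[str] = None  # task working on this step, once dispatched
    total: int = 0
    done: int = 0


@dataclass
class MigrationReport:
    migration_plan_href: str
    status: str
    started_at: Optional[datetime] = None
    finished_at: Optional[datetime] = None
    steps: List[ProgressStep] = field(default_factory=list)

    def counters(self) -> Tuple[int, int]:
        """Return (done, total) summed over all recorded progress steps."""
        return (
            sum(step.done for step in self.steps),
            sum(step.total for step in self.steps),
        )
```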

GET /pulp/api/v3/migrationreport//

  • to query a specific MigrationReport

PATCH /pulp/api/v3/migrationreport//

  • allows updating only the status of the migration, in order to cancel any unfinished tasks related to this migration
  • before the status of the migration report is updated to 'cancel', all the tasks for the migration are cancelled
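
The ordering described for PATCH (cancel the tasks first, then update the report) might look like the sketch below. The dictionary layout, the `FINAL_STATES` set, and the `cancel_task` callable are hypothetical stand-ins; only the report status 'cancel' is taken from this story.

```python
# Assumed task states that need no cancellation; not defined in this story.
FINAL_STATES = {"completed", "failed", "canceled"}


def cancel_migration(report, cancel_task):
    """Cancel every unfinished task, then mark the report itself.

    report      -- dict with "status" and a "steps" list of
                   {"task_href": ..., "status": ...} dicts
    cancel_task -- callable taking a task href; stands in for the
                   real task-cancellation call
    """
    for step in report["steps"]:
        if step["status"] not in FINAL_STATES:
            cancel_task(step["task_href"])
    # The report status changes only after all tasks were asked to cancel.
    report["status"] = "cancel"
    return report
```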

Poll task

There needs to be a poll task which serves as a synchronisation mechanism for multiple tasks between different steps of the migration process. It checks the progress of the migration, dispatches a new batch of tasks for the next step when the current step is complete, and re-dispatches itself until the migration is finished or cancelled.

It needs to be aware of:

  • the order of the migration steps
  • which tasks to trigger for each of them
  • when the migration step is done
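
The three things the poll task needs to know could be kept in one ordered table plus two helpers. This is a sketch only: the step codes, the task names, and the rule that a step is done when all of its tasks are 'completed' are assumptions, not part of this story.

```python
# Hypothetical step codes mapped to the (hypothetical) task names dispatched
# for each. A plain dict keeps insertion order on Python 3.7+.
STEP_TASKS = {
    "pre_migration": ["pre_migrate_all"],
    "content_migration": ["migrate_content"],
    "repo_versions": ["create_repo_versions"],
}


def next_step(current):
    """Return the code of the step after `current`, or None after the last one."""
    codes = list(STEP_TASKS)
    index = codes.index(current)
    return codes[index + 1] if index + 1 < len(codes) else None


def step_done(task_states):
    """Treat a step as done when it has tasks and all of them are 'completed'."""
    return bool(task_states) and all(state == "completed" for state in task_states)
```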

Tasks to run in parallel

  • Pre-migration is run as a single task.
  • Content migration can be run in multiple tasks, but the order of content migration has to be preserved.
  • Creation of repo versions, publications, and distributions can be run in multiple tasks.
