Story #20

As a user, my applicability data is calculated in parallel

Added by Anonymous over 5 years ago. Updated over 1 year ago.

Status:
CLOSED - CURRENTRELEASE
Priority:
High
Sprint/Milestone:
-
Start date:
Due date:
% Done:

100%

Estimated time:
Platform Release:
2.8.0
Groomed:
Yes
Sprint Candidate:
Yes
Tags:
Pulp 2
Sprint:

Description

Our applicability algorithm would be straightforward to convert into a parallel operation, wherein each consumer's or each repo's applicability calculation could be done as an independent Celery task. This would allow Pulp to calculate applicability up to n times faster, where n is the number of Celery workers available.


Related issues

Blocked by Pulp - Story #1206: As an API user, I can get summary status for a task group (CLOSED - CURRENTRELEASE)

Associated revisions

Revision 0ecc2dfd View on GitHub
Added by dkliban@redhat.com over 4 years ago

Parallelizes applicability regeneration for updated repository

This patch provides a new Celery task for performing applicability regeneration for a batch of applicability profiles. The ApplicabilityRegenerationManager dispatches a series of tasks with the same group id. Each task is dispatched with a list of up to 10 RepoProfileApplicabilities to reevaluate.

The API endpoint for generating content applicability for updated repositories changes as part of this patch. Instead of returning 202 with a call report, the server returns 202 with a group call report.

This patch does not make any changes to the algorithm used to calculate content applicability.

https://pulp.plan.io/issues/20 closes #20
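
The batching described in the commit message can be illustrated with a minimal sketch. This is not the code from the patch: `batches`, `dispatch_group`, and the `dispatch` callback are hypothetical names, and the callback stands in for whatever actually enqueues the Celery task. Only the batch size of 10 and the shared group id come from the commit message.

```python
import uuid
from typing import Callable, Iterator, List, Sequence

BATCH_SIZE = 10  # the commit message says "up to 10" profiles per task


def batches(items: Sequence[str], size: int = BATCH_SIZE) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def dispatch_group(profile_ids: Sequence[str],
                   dispatch: Callable[[str, List[str]], None]) -> str:
    """Dispatch one task per batch, all sharing a single group id.

    `dispatch` is a hypothetical stand-in for enqueueing a Celery task.
    """
    group_id = str(uuid.uuid4())
    for batch in batches(profile_ids):
        dispatch(group_id, batch)
    return group_id
```

With 25 profile ids this produces three dispatches of 10, 10, and 5 ids, all carrying the same group id that a client could later use to look up the group's status.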

History

#2 Updated by rbarlow about 5 years ago

  • Groomed set to No
  • Sprint Candidate set to Yes

#3 Updated by rbarlow about 5 years ago

It might be worth thinking about whether we can make a patch that will apply cleanly against 2.4 since there are users who are having problems with DB cursor timeouts. Patching against 2.6 might also be fine if we are comfortable requiring users to upgrade to a newer Pulp to fix this.

#4 Updated by rbarlow about 5 years ago

On 06/04/2015 11:00 AM, Pulp wrote:

It might be worth thinking about whether we can make a patch that will
apply cleanly against 2.4 since there are users who are having problems
with DB cursor timeouts. Patching against 2.6 might also be fine if we
are comfortable requiring users to upgrade to a newer Pulp to fix this.

On second thought, this might have to be done with "spawned tasks" which
would change the API to the task. One way to work around this being
backwards-incompatible would be to add an optional boolean to the API
call that lets the user state whether they want to do the calculation in
parallel or not, and if the bool isn't provided we default to the
current behavior. Then, with Pulp 3.0 we can just change to always doing
it in parallel and drop the boolean.

--
Randy Barlow
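
The opt-in flag suggested in this comment could look roughly like the sketch below. The key name `parallel`, the function name, and the returned labels are assumptions made for illustration; the comment only proposes "an optional boolean" that defaults to the current behavior.

```python
from typing import Any, Dict


def choose_report_type(request_body: Dict[str, Any]) -> str:
    """Pick the response type based on a hypothetical opt-in flag.

    When the flag is absent, the existing (serial) behavior is kept, so
    clients that do not know about the flag see no change.
    """
    parallel = request_body.get("parallel", False)  # "parallel" is an assumed key
    if not isinstance(parallel, bool):
        raise ValueError("'parallel' must be a boolean")
    return "group_call_report" if parallel else "call_report"
```

Defaulting to `False` is what keeps the change backwards-compatible: only callers that explicitly ask for parallel calculation receive the new response shape.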

#5 Updated by mhrivnak about 5 years ago

  • Priority changed from Normal to High

#6 Updated by dkliban@redhat.com about 5 years ago

Here is a possible implementation:

Define TaskMonitorTask as a regular celery task that takes two parameters: 'parent_task_id' and 'tasks'. 'tasks' is a list of task IDs for tasks that need to be monitored. The task will check the status of all tasks in the list and then update the status of the parent task. If not all of the tasks are in a final state, the task dispatches itself again with a list of the remaining tasks and the same parent task id. Each time, this task is dispatched with a delay of 5 minutes or another configurable value.

Define RepoProfileApplicabilityCalculation task as a celery task that takes an existing repo profile applicability and performs the work here [0]

Create a new RepoApplicabilityCalculationTask as a Pulp Task that will dispatch 1 RepoProfileApplicabilityCalculation task for each repo applicability profile that needs to be updated. Then it dispatches TaskMonitorTask and passes it the list of RepoProfileApplicabilityCalculation tasks that were dispatched and the id of itself (RepoApplicabilityCalculationTask).

[0] https://github.com/pulp/pulp/blob/2.6-dev/server/pulp/server/managers/consumer/applicability.py#L141:L158
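
The monitoring step proposed above can be sketched without Celery as a pure function. The state names in `FINAL_STATES`, the function name, and the `states` mapping are assumptions for illustration; in the proposal the states would come from the tasks' status records.

```python
from typing import Dict, List, Tuple

# Assumed set of terminal states; the real state names may differ.
FINAL_STATES = frozenset({"finished", "error", "canceled"})


def monitor_step(task_ids: List[str],
                 states: Dict[str, str]) -> Tuple[List[str], bool]:
    """One pass of the monitor.

    Returns the task ids that are still pending and a flag saying whether
    the parent task can be marked as done. A task with no recorded state
    is treated as still pending.
    """
    pending = [t for t in task_ids if states.get(t) not in FINAL_STATES]
    return pending, not pending
```

In the proposed design, a non-empty `pending` list would be passed to the next, delayed run of the monitor together with the same parent task id. In Celery, such a delay could be expressed with the `countdown` argument of `apply_async`.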

#7 Updated by mhrivnak about 5 years ago

  • Groomed changed from No to Yes

Please review the final plan with the team before implementing.

This diff must apply cleanly on 2.6, but may have to be released with 2.7.

#8 Updated by bmbouter about 5 years ago

@dkliban When you say RepoApplicabilityCalculationTask is a Pulp task, do you mean it inherits from Pulp's base Task? If so, then it will be auto-marked as completed as soon as it is finished because of the on_success or on_failure handlers that the base Task provides.

That would need to be somehow disabled and the final call to TaskMonitorTask would need to set it specifically. What can we do in that area?

Also one other important point to consider is using apply_async versus apply_async_with_reservation. Does the task RepoApplicabilityCalculationTask need a reservation to ensure a repo operation doesn't happen underneath it? What do you think?

#9 Updated by dkliban@redhat.com about 5 years ago

  • Status changed from NEW to ASSIGNED
  • Assignee set to dkliban@redhat.com

I looked into using Celery chords to do this work; however, I discovered that Celery chords rely on the results backend [0]. Since we are trying to move away from depending on the results backend, Brian and I have come up with a plan to introduce an implementation of ParallelTasks using the TaskStatus in the database. I'll update this story once I have the plan fully written out.

[0] http://blog.untrod.com/2015/03/how-celery-chord-synchronization-works.html
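
As a rough illustration of tracking a group of tasks through status records rather than a results backend, the sketch below counts states for one group. The record shape (dictionaries with `group_id` and `state` keys) and the function name are assumptions; the plan itself had not been written out at this point in the thread.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping


def summarize_group(statuses: Iterable[Mapping[str, str]],
                    group_id: str) -> Dict[str, int]:
    """Count task states for one group from stored status records.

    Each record is assumed to carry "group_id" and "state" keys.
    """
    counts = Counter(s["state"] for s in statuses if s["group_id"] == group_id)
    summary = {"total": sum(counts.values())}
    summary.update(counts)
    return summary
```

Because the summary is computed from records that are already stored, nothing needs to wait on task results; a caller can poll the summary until no tasks remain in a non-final state.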

#10 Updated by rbarlow about 5 years ago

On 07/21/2015 02:41 PM, Pulp wrote:

Since we are trying to move away from depending on the results backend

IMO, it's OK to use the broker as a results backend for this purpose.
Have you considered that since it may be easier?

#11 Updated by dkliban@redhat.com almost 5 years ago

  • Blocked by Story #1205: As a developer I can dispatch a task that can dispatch a group of tasks added

#12 Updated by dkliban@redhat.com almost 5 years ago

  • Blocks Story #1206: As an API user, I can get summary status for a task group added

#13 Updated by dkliban@redhat.com almost 5 years ago

  • Blocks deleted (Story #1206: As an API user, I can get summary status for a task group)

#14 Updated by dkliban@redhat.com almost 5 years ago

  • Blocked by Story #1206: As an API user, I can get summary status for a task group added

#15 Updated by dkliban@redhat.com almost 5 years ago

  • Status changed from ASSIGNED to NEW

#16 Updated by mhrivnak almost 5 years ago

  • Assignee deleted (dkliban@redhat.com)
  • Platform Release set to 2.8.0

#18 Updated by mhrivnak over 4 years ago

  • Status changed from NEW to ASSIGNED
  • Assignee set to dkliban@redhat.com

#19 Updated by dkliban@redhat.com over 4 years ago

  • Status changed from ASSIGNED to POST

#20 Updated by dkliban@redhat.com over 4 years ago

  • Status changed from POST to MODIFIED
  • % Done changed from 0 to 100

#21 Updated by dkliban@redhat.com over 4 years ago

  • Blocked by deleted (Story #1205: As a developer I can dispatch a task that can dispatch a group of tasks)

#22 Updated by rbarlow over 4 years ago

  • Status changed from MODIFIED to 5

#23 Updated by dkliban@redhat.com over 4 years ago

  • Status changed from 5 to CLOSED - CURRENTRELEASE

#25 Updated by bmbouter over 1 year ago

  • Tags Pulp 2 added
