
Issue #4296

Stages API could deadlock when "discovering" content due to minsize

Added by bmbouter 10 months ago. Updated 6 months ago.

Status: MODIFIED
Priority: Normal
Assignee:
Category: -
Sprint/Milestone:
Start date:
Due date:
Severity: 2. Medium
Version:
Platform Release:
Blocks Release:
OS:
Backwards Incompatible: No
Triaged: Yes
Groomed: No
Sprint Candidate: No
Tags:
QA Contact:
Complexity:
Smash Test:
Verified: No
Verification Required: No
Sprint:

Description

This was originally described by @mdellweg in this comment. What he is describing is a pipeline where content is "discovered" as it goes, similar to what issue 4294 will support.

In general, there are situations where the pipeline holds content in a batch while that very content is needed, in turn, to produce the items that would fill the batch to minsize.

The Solution

We add a field `does_batch` to `DeclarativeContent` that defaults to `True`.
This value can be overridden by an argument to `__init__`, and it is cleared when a future is requested.
When `does_batch` is `False`, the batching mechanism does not wait for further content that is not immediately available, which prevents blocking and thereby deadlocking.
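The idea can be sketched as follows. This is a minimal illustration only: the real pulpcore-plugin `DeclarativeContent` carries more state, and the method name used here is an assumption.

```python
import asyncio


class DeclarativeContent:
    """Illustrative sketch of the proposed `does_batch` flag;
    not the actual pulpcore-plugin class."""

    def __init__(self, content, does_batch=True):
        self.content = content
        self.does_batch = does_batch
        self._future = None

    def get_or_create_future(self):
        # Requesting a future means another stage will wait on this unit,
        # so it must never be held back in a half-filled batch.
        self.does_batch = False
        if self._future is None:
            self._future = asyncio.get_event_loop().create_future()
        return self._future
```

A batching stage would then check `does_batch` on held units and emit the current batch immediately when any of them is flagged, instead of waiting for minsize.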


Related issues

Blocks Docker Support - Refactor #4173: Change the multilayered design to use Futures to handle nested content (MODIFIED)

Associated revisions

Revision 42f17e72
Added by mdellweg 8 months ago

Prevent content with a future from deadlocking

DeclarativeContent with an attached future is meant to result in more
DeclarativeContent from the first stage. Therefore, blocking such content
while waiting for batches to fill can result in deadlocks.

closes: #4296
https://pulp.plan.io/issues/4296

History

#1 Updated by bmbouter 10 months ago

  • Subject changed from Stages API could deadlock when "discovering" content to Stages API could deadlock when "discovering" content due to minsize
  • Description updated (diff)

#2 Updated by bmbouter 10 months ago

@mdellweg had an idea that the DeclarativeContent could be marked somehow for it to not be batched. This solves the issue by disabling batching without having to pass options to all of the stages.

Another option is to introduce a timer to Stage.batches. Overall I think that would be an important safety feature for avoiding deadlock. For example, we could add a max_timer=1000, where max_timer is the maximum number of milliseconds to hold > 0 items before returning the batch even if minsize is not met. Maybe plugin writers would also need to override this option?
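The timer idea can be sketched as a batching loop that flushes a non-empty batch once the timer expires. The `max_timer` parameter and the shape of the loop are hypothetical here, not an existing `Stage.batches` option; `None` is assumed to mark the end of the stream, as in the Stages API.

```python
import asyncio


async def batches(in_q, minsize=4, max_timer=1000):
    """Illustrative batching loop (not the real Stage.batches): collect
    items until `minsize` is reached, but never hold a non-empty batch
    longer than `max_timer` milliseconds; `None` ends the stream."""
    batch = []
    shutdown = False
    while not shutdown:
        try:
            item = await asyncio.wait_for(in_q.get(), timeout=max_timer / 1000)
        except asyncio.TimeoutError:
            if batch:  # timer expired with items held: flush early
                yield batch
                batch = []
            continue
        if item is None:
            shutdown = True
        else:
            batch.append(item)
        if len(batch) >= minsize or (shutdown and batch):
            yield batch
            batch = []
```

With this shape, a deadlock turns into at most a `max_timer` delay, at the cost of occasionally emitting undersized batches.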

#3 Updated by mdellweg 9 months ago

In my last version of a proposed solution, I used the tagged DeclarativeContent objects to stop waiting for the current batch to fill. So they would not circumvent batching completely, just cut the batches short.

And there is another solution i can think of:
We could introduce, analogous to `None` as the pipeline-end marker, a `flush` marker that thaws all pending batches along the way but does not finish the loop. This would, however, require that the DCs are not reordered from stage to stage.
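The flush-marker idea can be sketched like this. `FLUSH` and the loop below are purely illustrative; in a real stage the marker would also have to be forwarded downstream so every later stage thaws its batch as well.

```python
import asyncio

FLUSH = object()  # hypothetical sentinel, analogous to the `None` end marker


async def batches_with_flush(in_q, minsize=4):
    """Sketch of the proposed `flush` marker: it thaws the pending batch
    at this stage but does not end the loop; `None` still ends it."""
    batch = []
    while True:
        item = await in_q.get()
        if item is None:  # end-of-pipeline marker
            if batch:
                yield batch
            return
        if item is FLUSH:  # flush marker: emit the held batch early
            if batch:
                yield batch
                batch = []
            continue  # a real stage would also put FLUSH on its out queue
        batch.append(item)
        if len(batch) >= minsize:
            yield batch
            batch = []
```

Note how this sketch also makes the ordering requirement visible: if a DC were reordered past the `FLUSH` marker, it could again be held in a batch the marker was meant to thaw.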

#4 Updated by bmbouter 9 months ago

mdellweg wrote:

In my last version of a proposed solution, I used the tagged DeclarativeContent objects to stop waiting for the current batch to fill. So they would not circumvent batching completely, just cut the batches short.

I imagined tagging all DeclarativeContent, but you're saying you would tag only some of them. Can you describe that in more detail? How do you know which DeclarativeContent items could cause deadlock?

And there is another solution i can think of:
We could introduce, analogous to `None` as the pipeline-end marker, a `flush` marker that thaws all pending batches along the way but does not finish the loop. This would, however, require that the DCs are not reordered from stage to stage.

We have some stages that do reordering now, so the Stages API already assumes reordering can happen.

#5 Updated by CodeHeeler 9 months ago

  • Triaged changed from No to Yes

#6 Updated by mdellweg 9 months ago

bmbouter wrote:

I imagined tagging all DeclarativeContent, but you're saying you would tag only some of them. Can you describe that in more detail? How do you know which DeclarativeContent items could cause deadlock?

I think it is as simple as: "Can this content unit give rise to more content units? Then make it prioritized."
In the case of Debian repositories, the discovery chain has three levels: Release -> PackageIndex -> Package. So I give the priority flag to all but the last level. Since the overwhelming majority of content units are leaves in this picture, this is not a big performance concern.

If you only have one content type and never know in advance whether a unit can produce more, you are out of luck here. Then I could think of a kind of barrier (flush marker) that gets inserted whenever the first stage has nothing more to add, and you must make sure no content unit gets overtaken by that barrier (it might help performance a little if the barrier were semipermeable).
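For the three-level Debian case, the tagging rule can be sketched like this. The `does_batch` flag, the helper, and all names are illustrative stand-ins, not the real plugin API.

```python
class DeclarativeContent:
    """Minimal stand-in for the real class: only the batching flag."""

    def __init__(self, content, does_batch=True):
        self.content = content
        self.does_batch = does_batch


def batch_ready(batch, minsize):
    # A batch may be emitted early when it holds any unit that can spawn
    # further content (does_batch=False): waiting on it risks deadlock.
    return len(batch) >= minsize or any(not dc.does_batch for dc in batch)


# Non-leaf levels are flagged; the many leaf Packages batch normally.
release = DeclarativeContent("Release", does_batch=False)
index = DeclarativeContent("PackageIndex", does_batch=False)
packages = [DeclarativeContent(f"pkg-{i}") for i in range(3)]
```

Because only the two non-leaf levels are flagged and almost all units are leaf Packages, the early-emit path stays rare, which is why the performance impact is small.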

#7 Updated by amacdona@redhat.com 9 months ago

  • Blocks Refactor #4173: Change the multilayered design to use Futures to handle nested content added

#8 Updated by mdellweg 9 months ago

Before changing the flow control of the pipeline, I would like to change the plugin API in a way that abstracts the interface to the queues away from plugin writers.

RFC: https://github.com/mdellweg/pulpcore-plugin/tree/refactor_stages

#9 Updated by bherring 9 months ago

#10 Updated by bherring 9 months ago

#11 Updated by bmbouter 8 months ago

  • Status changed from NEW to POST
  • Assignee set to mdellweg

#12 Updated by mdellweg 8 months ago

  • Description updated (diff)

#13 Updated by mdellweg 8 months ago

  • Status changed from POST to MODIFIED

#14 Updated by daviddavis 6 months ago

  • Sprint/Milestone set to 3.0

#15 Updated by bmbouter 6 months ago

  • Tags deleted (Pulp 3)
