Story #3844

closed

As a plugin writer, I can use and customize a declarative, concurrent pipeline

Added by bmbouter over 6 years ago. Updated almost 5 years ago.

Status:
CLOSED - CURRENTRELEASE
Priority:
Normal
Assignee:
Category:
-
Sprint/Milestone:
Start date:
Due date:
% Done:

0%

Estimated time:
Platform Release:
Groomed:
Yes
Sprint Candidate:
Yes
Tags:
Sprint:
Sprint 40
Quarter:

Description

Motivation

There are several use cases that plugin writers have a hard time fulfilling with the current plugin API. These are distinct issues, but together they could represent a collective opportunity for resolution.

Customization Use Cases

In PR discussion, gmbnomis brought up an example where he needs to make additional HTTP calls for content units that are about to be newly created during sync. When using a declarative interface, he specifically is not trying to determine on his own which content units need to be created versus those that already exist. Making these calls later is inefficient, and it could lead to correctness problems if a fatal exception is encountered after units have been saved with partial information.

The ideal functionality would be akin to adding a "step" in the middle.

Related data Use Cases

We've seen several examples of Content units, e.g. AnsibleRoleVersion, that have ForeignKey relationships to other non-content-unit data, e.g. AnsibleRole. During saving, newly created AnsibleRoleVersion data may need to be related to existing AnsibleRole data, and the generic core machinery doesn't know how to do that. Different plugins may also want different behaviors.

Validation Use Cases

Plugin writers may want to prevent saving of new content units or Artifacts if they fail certain validation. For example, when adding a new Content Unit, e.g. AnsibleRoleVersion, lint checks could be run on it to ensure its quality meets the requirements.

Stream based end-to-end Use Case

The plugin writer wants to be able to start processing units (downloading, querying the db, saving, etc.) without waiting for "all units" to be available. This should include the downloading and fetching of initial metadata.

Declarative Use Case

To make plugin writer code as easy as possible, the ideal is to have them declare the state of the remote repository and have the core code do the rest.

Concurrency Use Case

Each of the stream processing steps should be able to run concurrently and efficiently. This concurrency should also mix well with the concurrency already used by the downloaders (asyncio).

Possible Resolution

Use asyncio's producer-consumer pattern to build a linear pipeline of stages that creates a RepositoryVersion from a stream of unsaved content units and unsaved Artifacts. Plugin writers can inject new, custom stages, reorganize or reuse existing ones, or remove stages to get the stream processing they need.
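As a rough illustration of the producer-consumer idea, the sketch below chains stages with bounded asyncio queues. It is not the code from the PR: the names source_stage, map_stage, and run_pipeline, the sentinel convention, and the stage signature are all assumptions made for this example.

```python
import asyncio

# End-of-stream marker. Using None as the sentinel is an assumption of this sketch.
_DONE = None


async def source_stage(items, out_q):
    """First stage: emit every item, then signal the end of the stream."""
    for item in items:
        await out_q.put(item)
    await out_q.put(_DONE)


def map_stage(func):
    """Build a middle stage (hypothetical helper) that applies func to each item."""
    async def stage(in_q, out_q):
        while True:
            item = await in_q.get()
            if item is _DONE:
                # Pass the sentinel downstream so later stages can stop too.
                await out_q.put(_DONE)
                break
            await out_q.put(func(item))
    return stage


async def run_pipeline(items, stages, maxsize=100):
    """Connect the stages with bounded queues and collect the final output."""
    queues = [asyncio.Queue(maxsize=maxsize) for _ in range(len(stages) + 1)]
    tasks = [asyncio.ensure_future(source_stage(items, queues[0]))]
    for i, stage in enumerate(stages):
        tasks.append(asyncio.ensure_future(stage(queues[i], queues[i + 1])))
    results = []
    while True:
        item = await queues[-1].get()
        if item is _DONE:
            break
        results.append(item)
    await asyncio.gather(*tasks)  # surface any exception raised inside a stage
    return results
```

Because every stage runs as its own task and the queues are bounded, a slow stage applies backpressure to the stages before it. Injecting a custom step "in the middle" amounts to adding one more entry to the stages list.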

Overall Design Diagram: https://i.imgur.com/7cEXC5e.png

The design has three parts in the pulp/pulp PR:

a) The Stages API itself, which is effectively the make_pipeline() method.
b) All of the stages that are already compatible with the Stages API. This is most of the code.
c) DeclarativeVersion, an object which assembles a specific pipeline that can provide both sync_mode='additive/mirror' and lazy mode support without customization.
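The difference between the two sync modes can be sketched with plain sets. This is only an illustration of the intended semantics as understood here (additive keeps what is already in the repository, mirror makes the repository match the remote); plan_sync is a hypothetical helper, not part of DeclarativeVersion, and units are represented by simple hashable keys.

```python
def plan_sync(declared, existing, sync_mode="additive"):
    """Return (to_add, to_remove) for a new repository version.

    declared: unit keys the plugin declared as present in the remote.
    existing: unit keys already in the current repository version.
    """
    declared, existing = set(declared), set(existing)
    to_add = declared - existing
    if sync_mode == "mirror":
        # Mirror: anything no longer in the remote is removed.
        to_remove = existing - declared
    elif sync_mode == "additive":
        # Additive: nothing is ever removed.
        to_remove = set()
    else:
        raise ValueError("unknown sync_mode: %r" % (sync_mode,))
    return to_add, to_remove
```

With existing units {"a", "b"} and declared units {"b", "c"}, both modes add "c", but only mirror mode removes "a".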

Core code is here: https://github.com/pulp/pulp/compare/master...bmbouter:introducing-asyncio-stages

The pulp_file code is here: https://github.com/pulp/pulp_file/compare/master...bmbouter:introducing-asyncio-stages

Todo list

  • Tune the pipeline; the Queue maxsize=100 may be too small.
  • Add a limiter to the Artifact download stage that restricts the number of Artifact downloads in-flight.
  • Update to use the bulk-create updates from https://pulp.plan.io/issues/3814
  • Update to use the bulk-create updates from https://pulp.plan.io/issues/3813
  • Add some docs
  • Use aiofiles to move Artifacts into place just before saving, since bulk_create won't call save() for each one
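One common way to implement the in-flight limiter from the todo list is an asyncio.Semaphore. The sketch below shows that pattern only; download_all, the fetch callable, and the max_in_flight default are hypothetical names and values, not the ones used in the PR.

```python
import asyncio


async def download_all(urls, fetch, max_in_flight=5):
    """Run fetch(url) for every URL with at most max_in_flight running at once.

    fetch is any coroutine function taking a URL; it stands in for the real
    Artifact downloader, which this sketch does not model.
    """
    semaphore = asyncio.Semaphore(max_in_flight)

    async def limited(url):
        # A slot is held for the whole download and released afterwards.
        async with semaphore:
            return await fetch(url)

    # gather() returns results in the same order as the input URLs.
    return await asyncio.gather(*(limited(url) for url in urls))
```

The semaphore caps concurrency without serializing the work: as soon as one download finishes, the next waiting one starts.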

Related issues

Blocks File Support - Task #3890: Port pulp_file to use DeclarativeVersion (CLOSED - CURRENTRELEASE, bmbouter)
