Story #3844
closedAs a plugin writer, I can use and customize a declarative, concurrent pipeline
0%
Description
Motivation¶
There are several use cases plugin writers have a hard time fulfilling easily with the current plguin API. There are distinct issues, but could represent a collective opportunity for resolution.
Customization Use Cases¶
In PR discussion, gmbnomis brought up an example where he needs to do make additional http calls for content units to be newly created during sync. If he is using a declarative interface he specifically isn't trying to determine on his own which content units need to be created versus those that exist. Making these calls later is both inefficient and could lead to correctness problems if a fatal exception is encountered after being saved with partial information.
The ideal functionality would be akin to adding a "step" in the middle.
Related data Use Cases¶
We've seen several examples of a Content units, e.g. AnsibleRoleVersion, that have ForeignKey relationships to other non-content unit data, e.g. AnsibleRole. During saving newly created AnsibleRoleVersion data may need to be related to existing AnsibleRole data and the generic core machinery doesn't know how to do that. Also different plugins may want different behaviors.
Validation Use Cases¶
Plugin writer's may want to prevent saving of new content units or Artifacts if they fail certain validation. For example when adding a new Content Unit, e.g. AnsibleRoleVersion, lint checks could be run on it to ensure it's quality meets the requirements.
Stream based end-to-end Use Case¶
The plugin writer wants to be able to start processing units (downloading, querying the db, saving, etc) without "all units" being available. This should include the downloading and fetching of initial metadata.
Declarative Use Case¶
To make plugin writer code as easy as possible, having them declare that state of the remote repository and having the core code do the rest is ideal.
Concurrency Use Case¶
Each of the stream processing steps should be able to be efficiently run concurrently. Also we want this concurrency to mix well with the concurrency already used by the downloaders (asyncio).
Possible Resolution¶
Use the producer consumer pattern of asyncio to create a linear pipeline of asyncio stages to create a RepositoryVersion from a stream of unsaved content units and unsaved Artifacts. Plugin writers can inject new, custom stages, reorganize/reuse existing ones, or remove stages to get the stream processing they need.
Overall Design Diagram: https://i.imgur.com/7cEXC5e.png
The design has 3 parts in the pulp/pulp PR.
a) The Stages API itself which is effectively the make_pipeline()
method
b) All of the stages that are already compatible with the Stages API. This is most of the code
c) DeclarativeVersion, an object which assembles a specific pipeline that can provide both sync_mode='additive/mirror' and lazy mode support without customization.
Core code is here: https://github.com/pulp/pulp/compare/master...bmbouter:introducing-asyncio-stages
The pulp_file code is here: https://github.com/pulp/pulp_file/compare/master...bmbouter:introducing-asyncio-stages
Todo list¶
- Tune the pipeline some. the Queue maxsize=100 may be too small.
- Add a limiter to the Artifact download stage that restricts the number of Artifact downloads in-flight.
- Update to use the bulk-create updates from https://pulp.plan.io/issues/3814
- Update to use the bulk-create updates from https://pulp.plan.io/issues/3813
- Add some docs
- use aiofiles to move Artifacts into place just before saving since bulk_create won't call save() for each
Related issues
Updated by bmbouter almost 5 years ago
- Status changed from NEW to ASSIGNED
- Assignee set to bmbouter
Updated by bmbouter almost 5 years ago
- Status changed from ASSIGNED to POST
Updated by bmbouter almost 5 years ago
- Blocks Task #3890: Port pulp_file to use DeclarativeVersion added
Added by bmbouter almost 5 years ago
Added by bmbouter almost 5 years ago
half-working
Added by bmbouter almost 5 years ago
Added by bmbouter almost 5 years ago
Added by bmbouter almost 5 years ago
Working, but not bulk_create
Added by bmbouter almost 5 years ago
Working, but not bulk_create
Added by bmbouter almost 5 years ago
bulk_create does not work
Added by bmbouter almost 5 years ago
bulk_create does not work
Added by bmbouter almost 5 years ago
Working, but doesn't remove on mirror
Added by bmbouter almost 5 years ago
Working, but doesn't remove on mirror
Added by bmbouter almost 5 years ago
Last Stage working
Added by bmbouter almost 5 years ago
Last Stage working
Added by bmbouter almost 5 years ago
Moving docs fixes to its own PR
Added by bmbouter almost 5 years ago
Moving docs fixes to its own PR
Added by bmbouter almost 5 years ago
sync_mode works and new docstrings
Added by bmbouter almost 5 years ago
sync_mode works and new docstrings
Added by bmbouter almost 5 years ago
Adds some docs, committing before code changes
Added by bmbouter almost 5 years ago
Adds some docs, committing before code changes
Added by bmbouter almost 5 years ago
switched last stage to use QuerSet
Added by bmbouter almost 5 years ago
switched last stage to use QuerSet
Added by bmbouter almost 5 years ago
more docstrings
Added by bmbouter almost 5 years ago
more docstrings
Added by bmbouter almost 5 years ago
DeclarativeArtiface and DeclarativeContent objects
They were using namedtuples which were not ideal in two ways:
-
They could not be modified so updating them with new results from the database was problematic.
-
They couldn't provide validation for required versus optional fields.
Added by bmbouter almost 5 years ago
DeclarativeArtiface and DeclarativeContent objects
They were using namedtuples which were not ideal in two ways:
-
They could not be modified so updating them with new results from the database was problematic.
-
They couldn't provide validation for required versus optional fields.
Added by bmbouter almost 5 years ago
Add transactions and progress reporting
Added by bmbouter almost 5 years ago
Add transactions and progress reporting
Added by bmbouter almost 5 years ago
finished up docstrings
Added by bmbouter almost 5 years ago
finished up docstrings
Added by bmbouter almost 5 years ago
Replace sleep() with efficient get()
Added by bmbouter almost 5 years ago
Replace sleep() with efficient get()
Added by bmbouter almost 5 years ago
docstring the progress reporting
Added by bmbouter almost 5 years ago
docstring the progress reporting
Added by bmbouter almost 5 years ago
Make content_association a batch operation
With https://pulp.plan.io/issues/3852 the add_content and remove_content interfaces were switched to using querysets instead of item-by-item usage. This updates DeclarativeVersion and its stages to work with that change.
Overall this makes the content_association stage much more efficient.
This PR also adjusts the data type arriving into the content unit association stage to be DeclarativeVersion.
Added by bmbouter almost 5 years ago
Make content_association a batch operation
With https://pulp.plan.io/issues/3852 the add_content and remove_content interfaces were switched to using querysets instead of item-by-item usage. This updates DeclarativeVersion and its stages to work with that change.
Overall this makes the content_association stage much more efficient.
This PR also adjusts the data type arriving into the content unit association stage to be DeclarativeVersion.
Added by bmbouter almost 5 years ago
Adds max_downloads feature to downloader stage
Added by bmbouter almost 5 years ago
Adds max_downloads feature to downloader stage
Added by bmbouter almost 5 years ago
Introducing FirstStage
gmbnomis suggested some changes, so this PR implements them.
https://github.com/pulp/pulp_file/commit/c23bec069e12a485000f048e486e1d89cce65beb#r29701942
It has a bug in it, which I need to resolve. It also doesn't save the files correctly. I am still making Artifact save compatible with bulk_create()
Added by bmbouter almost 5 years ago
Introducing FirstStage
gmbnomis suggested some changes, so this PR implements them.
https://github.com/pulp/pulp_file/commit/c23bec069e12a485000f048e486e1d89cce65beb#r29701942
It has a bug in it, which I need to resolve. It also doesn't save the files correctly. I am still making Artifact save compatible with bulk_create()
Added by bmbouter almost 5 years ago
Fixes flake8 errors
Added by bmbouter almost 5 years ago
Fixes flake8 errors
Added by bmbouter almost 5 years ago
files now save in the backend with bulk_create
Added by bmbouter almost 5 years ago
files now save in the backend with bulk_create
Added by bmbouter almost 5 years ago
Fixed bug, it works again!
Added by bmbouter almost 5 years ago
Fixed bug, it works again!
Added by bmbouter almost 5 years ago
Ok now it's really working for the second sync too.
Added by bmbouter almost 5 years ago
Ok now it's really working for the second sync too.
Added by bmbouter almost 5 years ago
Removing accidental docs change
Added by bmbouter almost 5 years ago
Removing accidental docs change
Added by bmbouter almost 5 years ago
Fixes a bug caused by using ints not uuids
autoincrement fields cause you to have to get the db-assigned ids from the database after the first bulk_create(). Otherwise the RemoteContent.content_artifact attributes will be None.
Added by bmbouter almost 5 years ago
Fixes a bug caused by using ints not uuids
autoincrement fields cause you to have to get the db-assigned ids from the database after the first bulk_create(). Otherwise the RemoteContent.content_artifact attributes will be None.
Added by bmbouter almost 5 years ago
flake8 fixes
Added by bmbouter almost 5 years ago
flake8 fixes
Added by bmbouter almost 5 years ago
Do away with the closures
This replaces the closures with classes that can hold the required values. Closure's aren't entirely needed here.
Added by bmbouter almost 5 years ago
Do away with the closures
This replaces the closures with classes that can hold the required values. Closure's aren't entirely needed here.
Added by bmbouter almost 5 years ago
Breaking up huge module into a package of modules
All of the user facing plugin-API things are importable from pulpcore.plugin.stages.
Added by bmbouter almost 5 years ago
Breaking up huge module into a package of modules
All of the user facing plugin-API things are importable from pulpcore.plugin.stages.
Added by bmbouter almost 5 years ago
Some fixes related to file handling
- redo save() to inspect .pks instead
- replace the usage of shutil.move() with django's storage backend save() method.
Added by bmbouter almost 5 years ago
Some fixes related to file handling
- redo save() to inspect .pks instead
- replace the usage of shutil.move() with django's storage backend save() method.
Added by bmbouter almost 5 years ago
Adds docs for the stages package
Added by bmbouter almost 5 years ago
Adds docs for the stages package
Added by bmbouter almost 5 years ago
Lots of docs fixes
Added by bmbouter almost 5 years ago
Lots of docs fixes
Added by bmbouter almost 5 years ago
Updating stages w/ some clearer names
Added by bmbouter almost 5 years ago
Updating stages w/ some clearer names
Added by bmbouter almost 5 years ago
Various fixes from review
Mostly fixes class names for pep8 compliance
Added by bmbouter almost 5 years ago
Various fixes from review
Mostly fixes class names for pep8 compliance
Added by bmbouter almost 5 years ago
All stages are classes now
This is a big diff, but mainly due only to indention
Added by bmbouter almost 5 years ago
All stages are classes now
This is a big diff, but mainly due only to indention
Added by bmbouter almost 5 years ago
flake8 fix
Added by bmbouter almost 5 years ago
flake8 fix
Added by bmbouter almost 5 years ago
Final round of fixes
- switch storage backend to DefaultStorage()
- super() is now called first in anytime it's used
- FirstStage is removed
- BaseStage -> Stage
- the relative path now handles duplicate rel_paths correctly
- documenting args and kwargs
I did some final hand testing, and I also ensure the docs build and look good.
Added by bmbouter almost 5 years ago
Final round of fixes
- switch storage backend to DefaultStorage()
- super() is now called first in anytime it's used
- FirstStage is removed
- BaseStage -> Stage
- the relative path now handles duplicate rel_paths correctly
- documenting args and kwargs
I did some final hand testing, and I also ensure the docs build and look good.
Added by bmbouter almost 5 years ago
Fixing a bug where 2 artifacts are used
Without this commit, an exception will occur when one content unit has two or more Artifacts are associated with it.
Added by bmbouter almost 5 years ago
Fixing a bug where 2 artifacts are used
Without this commit, an exception will occur when one content unit has two or more Artifacts are associated with it.
Added by vdusek almost 5 years ago
Fixes string comparison
Added by vdusek almost 5 years ago
Fixes string comparison
Added by bmbouter almost 5 years ago
Have DeclarativeVersion validate download digests
The downloaders created by DeclarativeVersion didn't specify any digest or size information to use for validation even if it was available. I've tested this code and it worked for me.
Added by bmbouter almost 5 years ago
Have DeclarativeVersion validate download digests
The downloaders created by DeclarativeVersion didn't specify any digest or size information to use for validation even if it was available. I've tested this code and it worked for me.
Updated by bmbouter over 3 years ago
- Status changed from MODIFIED to CLOSED - CURRENTRELEASE
half-working
https://pulp.plan.io/issues/3844 re #3844