Story #4016
closedAs a plugin writer I have the "batches" interface for the Stages API
100%
Description
This story should:
1. Add the "batches() interface" to the plugin API
2. Refactor the existing Stages API stages to use it
As quoted from the email link below:
https://www.redhat.com/archives/pulp-dev/2018-September/msg00013.html
The trigger for the discussion was to get rid of boilerplate code like
this [0], [1] to handle batches in the stages API. This becomes a
single line [2] when using an asynchronous generator [3]. Adding the
`batches()` async generator to Pulp core would simplify existing
stages and ease implementation of stages in plugins.
[0] https://github.com/pulp/pulp/blob/631031e38270c5c7c2b2289ff4ab87a058447c5e/plugin/pulpcore/plugin/stages/content_unit_stages.py#L47-L59
[1] https://github.com/pulp/pulp/blob/631031e38270c5c7c2b2289ff4ab87a058447c5e/plugin/pulpcore/plugin/stages/artifact_stages.py#L48-L60
[2] https://github.com/gmbnomis/pulp_cookbook/blob/ca4882cecab16995c5713d27131da8112a5f5a0c/pulp_cookbook/app/tasks/synchronizing.py#L98
[3] https://github.com/gmbnomis/pulp_cookbook/blob/d44ed593925b78c046e1b568810b15acbdad5ac4/pulp_cookbook/app/tasks/synchronizing.py#L26
Updated by bmbouter over 6 years ago
- Subject changed from As a plugin writer I have the "batcher interface" fr to As a plugin writer I have the "batches" interface for the Stages API
Updated by gmbnomis over 6 years ago
I could not find a way to assign this to myself(?)
I can work on this topic.
One question on ContentUnitSaver._pre_save
and ContentUnitSaver._post_save
: The documentation mentions that "... one item in batch is None indicating it is the end of the pipeline". Is this just a side-effect of the current implementation or is this "None" needed (for finalization)?
Updated by bmbouter over 6 years ago
- Status changed from NEW to ASSIGNED
- Assignee set to gmbnomis
I updated your user permission so you can do a lot more in Redmine now (including take issues as assigned).
It's a side-effect of the Stages API finalizing the pipeline with None. As batches are passed into the hooks, this None value is not filtered out. It doesn't benefit the hook in my usage in pulp_rpm [0], but if None is filtered out there isn't a way for a batch to know it's the last one. I have no use case for that need though.
I'm open to changes here. How could this be better?
[0]: https://github.com/pulp/pulp_rpm/blob/master/pulp_rpm/app/tasks/synchronizing.py#L275-L314
Updated by gmbnomis over 6 years ago
bmbouter wrote:
I updated your user permission so you can do a lot more in Redmine now (including take issues as assigned).
Thanks a lot!
It's a side-effect of the Stages API finalizing the pipeline with None. As batches are passed into the hooks, this None value is not filtered out. It doesn't benefit the hook in my usage in pulp_rpm [0], but if None is filtered out there isn't a way for a batch to know it's the last one. I have no use case for that need though.
I'm open to changes here. How could this be better?
Currently, the None is not necessarily part of the last batch with declarative content. It may also happen that the None is the only element of the last batch. A subclass overriding these functions which needs some sort of "finalizing" must cope with this situation as well.
If we ever need such a functionality, we could mimic this and implement a _finalize() hook that is called when all batches have been processed. Currently, I would like to just drop the "None" part from the documentation of _pre_save
and _post_save
.
Updated by bmbouter over 6 years ago
gmbnomis wrote:
Currently, the None is not necessarily part of the last batch with declarative content. It may also happen that the None is the only element of the last batch. A subclass overriding these functions which needs some sort of "finalizing" must cope with this situation as well.
Agreed
If we ever need such a functionality, we could mimic this and implement a _finalize() hook that is called when all batches have been processed. Currently, I would like to just drop the "None" part from the documentation of
_pre_save
and_post_save
.
Agreed and +1 that sounds fine.
Updated by gmbnomis over 6 years ago
First PR (implementation) is at: https://github.com/pulp/pulp/pull/3643
Updated by gmbnomis over 6 years ago
PR now contains the refactoring of the existing stages as well
Added by gmbnomis over 6 years ago
Added by gmbnomis over 6 years ago
Revision 28ed6351 | View on GitHub
Add unit tests for plugin module, add Stages.batches unit tests
Added by gmbnomis over 6 years ago
Revision 93f4ac2b | View on GitHub
Refactor existing stages to use the async batches iterator
Added by gmbnomis over 6 years ago
Revision c6d64895 | View on GitHub
Add "batches" iterators to Stages API
This will simplify stages working on batches of DeclarativeContent a lot.
Added by gmbnomis over 6 years ago
Revision 28ed6351 | View on GitHub
Add unit tests for plugin module, add Stages.batches unit tests
Added by gmbnomis over 6 years ago
Revision 93f4ac2b | View on GitHub
Refactor existing stages to use the async batches iterator
Updated by gmbnomis over 6 years ago
- Status changed from ASSIGNED to MODIFIED
- % Done changed from 0 to 100
Applied in changeset pulp|93f4ac2ba9eeba8516b23873c47853505a4a9bfb.
Updated by bmbouter about 5 years ago
- Status changed from MODIFIED to CLOSED - CURRENTRELEASE
Add "batches" iterators to Stages API
This will simplify stages working on batches of DeclarativeContent a lot.
ref #4016 https://pulp.plan.io/issues/4016