Refactor #2316
closedBatch save() calls to ProgressReport
100%
Description
This is a performance improvement story. As a Plugin developer when using an ancestor of pulp.app.models.ProgressReport every call to save() causes a round-trip write to the database.
For example as the documentation says:
>>> with ProgressSpinner('Publishing Metadata'):
>>> publish_metadata()
After discussion on the ticket the following functionality is needed:
- Have a call to increment() or save() skipped if it has been saved within the last 0.5 seconds. This implementation needs to also be for increment() in addition to save() because increment() will likely not use save() anymore after Issue #2318 is implemented. I believe update() does not call save().
- Only perform this behavior when we know the context manager is being used. This should be set by enter() as an instance attribute on the in-memory ProgressReport object.
- Update the class docstrings to identify this behavior for plugin writers
Updated by mhrivnak over 8 years ago
- Sprint Candidate changed from No to Yes
- Tags Pulp 3 added
The biggest problem will be with incrementing progress. We have lots of tasks already that do at least dozens, if not hundreds or more operations per second, and we definitely do not want to save the progress object that frequently. That would certainly impact performance, so I think we should prioritize this as part of the pulp 3 MVP. I see it less as an optimization and more of an effort to avoid creating a problem. Especially since it only takes a few lines of simple code , let's just do it.
As an example, even a yum repo sync can download 50+ rpms per second on a decent connection. A publish can go through several hundred rpms per second.
Here is a simple implementation from pulp 2:
Similar logic exists in the pulp 2 step system.
If we just put similar logic to that in the increment method, and ensure exit does one last update, that should take care of it.
Updated by bmbouter over 8 years ago
- Sprint Candidate changed from Yes to No
- Tags deleted (
Pulp 3)
A correct design is not that simple. We aren't guaranteed that users will use the increment() or context manager. If we skipping calls to save() and the plugin writer is using the model directly how can we be guaranteed that we aren't missing important/necessary calls to save().
I went to #django to ask about this problem because surely others have had this issue. The response 2 users gave to me was to not worry about it. They also cautioned me about ignoring save() calls. We don't have a design that provides correctness and we don't know that we need it yet. That is why I'm proposing we do not do this work for the Pulp3 MVP.
Updated by mhrivnak over 8 years ago
I'll take a closer look and make a more specific proposal. I don't think we should allow an individual task to do hundreds of DB writes per second when one or two would suffice. It's hard to imagine that not impacting performance, especially with multiple tasks running concurrently.
Updated by bmbouter over 8 years ago
Sounds good. Skipping the writes is easy enough, I think the crux of the issue is ensuring that a save() will occur later if some of the most recent saves (or just the very last one) was skipped. I don't know how to do that without a separate thread, or epoll callback loop, etc.
Maybe where we are going wrong is that we are trying to hide this from the plugin writer. Note they could just save less often and they can ensure correctness.
What about this as an idea, we could make it do this fancy behavior only if it is being used inside the context manager.
Updated by mhrivnak over 8 years ago
To be clear, I do not want to hide this behavior at all. I do want to choose default behaviors that make the most common use cases easy, but we should be very clear about this behavior and possibly allow the plugin writer to influence it.
I think we can wrap this save call in a little bit of logic that only saves at most once per some interval of time:
As you point out, we could only do the batching in the case where the object is being used as a context manager. That's a reasonable approach that I would be happy with.
To allow more control, we could optionally have a boolean property of the object, "batch_increments" or something like that, which defaults to False. A user could set it to True if they want, and it would also be set to True in "__enter__". This is probably overkill for a first cut, but it's an option for the future in case people want to do incrementing without using the object as a context manager.
I also think it would be fine to just do the batching in the increment() method, make sure that if "done == total" a save always happens, and clearly document that the user is responsible for doing a final save on the object when they are done incrementing it.
Any of these approaches is fine with me, but I do think we need to solve this for pulp 3. Can we add the pulp 3 and sprint candidate options back to this refactor?
Updated by bmbouter over 8 years ago
- Description updated (diff)
- Sprint Candidate changed from No to Yes
- Tags Pulp 3 added
I think having the "skip if saved recently" behavior when running within the context manager is a good compromise. I added the sprint candidate and Pulp 3 flags again. I updated the story too.
Updated by bizhang about 8 years ago
- Status changed from NEW to ASSIGNED
- Assignee set to bizhang
Updated by bizhang almost 8 years ago
PR: https://github.com/pulp/pulp/pull/2986
I removed `increment() skip-save logic` from the checklist, since increment() still calls save() because https://pulp.plan.io/issues/2318 was resolved via documentation.
Added by werwty almost 8 years ago
Added by werwty almost 8 years ago
Revision 54b0a6a1 | View on GitHub
Batch save() calls to ProgressReport
Updated by werwty almost 8 years ago
- Status changed from POST to MODIFIED
- % Done changed from 0 to 100
Applied in changeset pulp|54b0a6a193641c2d75fa6cdd37193ded771e7f4c.
Updated by bmbouter about 5 years ago
- Status changed from MODIFIED to CLOSED - CURRENTRELEASE
Batch save() calls to ProgressReport
closes #2316 https://pulp.plan.io/issues/2316