https://pulp.plan.io/https://pulp.plan.io/favicon.ico2016-10-04T14:26:50ZPulpPulp - Refactor #2316: Batch save() calls to ProgressReporthttps://pulp.plan.io/issues/2316?journal_id=148572016-10-04T14:26:50Zmhrivnakmhrivnak@redhat.com
<ul><li><strong>Sprint Candidate</strong> changed from <i>No</i> to <i>Yes</i></li><li><strong>Tags</strong> <i>Pulp 3</i> added</li></ul><p>The biggest problem will be with incrementing progress. We have lots of tasks already that do at least dozens, if not hundreds or more operations per second, and we definitely do not want to save the progress object that frequently. That would certainly impact performance, so I think we should prioritize this as part of the pulp 3 MVP. I see it less as an optimization and more of an effort to avoid creating a problem. Especially since it only takes a few lines of simple code , let's just do it.</p>
<p>As an example, even a yum repo sync can download 50+ rpms per second on a decent connection. A publish can go through several hundred rpms per second.</p>
<p>Here is a simple implementation from pulp 2:</p>
<p><a href="https://github.com/pulp/nectar/blob/4fe7327c4ad1bb2f8c040b75d80306ee047de615/nectar/downloaders/threaded.py#L301-L303" class="external">https://github.com/pulp/nectar/blob/4fe7327c4ad1bb2f8c040b75d80306ee047de615/nectar/downloaders/threaded.py#L301-L303</a></p>
<p>Similar logic exists in the pulp 2 step system.</p>
<p>If we just put similar logic to that in the increment method, and ensure <em>exit</em> does one last update, that should take care of it.</p> Pulp - Refactor #2316: Batch save() calls to ProgressReporthttps://pulp.plan.io/issues/2316?journal_id=148592016-10-04T14:46:19Zbmbouterbmbouter@redhat.com
<ul><li><strong>Sprint Candidate</strong> changed from <i>Yes</i> to <i>No</i></li><li><strong>Tags</strong> deleted (<del><i>Pulp 3</i></del>)</li></ul><p>A correct design is not that simple. We aren't guaranteed that users will use the increment() or context manager. If we skipping calls to save() and the plugin writer is using the model directly how can we be guaranteed that we aren't missing important/necessary calls to save().</p>
<p>I went to #django to ask about this problem because surely others have had this issue. The response 2 users gave to me was to not worry about it. They also cautioned me about ignoring save() calls. We don't have a design that provides correctness and we don't know that we need it yet. That is why I'm proposing we do not do this work for the Pulp3 MVP.</p> Pulp - Refactor #2316: Batch save() calls to ProgressReporthttps://pulp.plan.io/issues/2316?journal_id=148672016-10-04T15:16:00Zmhrivnakmhrivnak@redhat.com
<ul></ul><p>I'll take a closer look and make a more specific proposal. I don't think we should allow an individual task to do hundreds of DB writes per second when one or two would suffice. It's hard to imagine that not impacting performance, especially with multiple tasks running concurrently.</p> Pulp - Refactor #2316: Batch save() calls to ProgressReporthttps://pulp.plan.io/issues/2316?journal_id=148682016-10-04T15:23:17Zbmbouterbmbouter@redhat.com
<ul></ul><p>Sounds good. Skipping the writes is easy enough, I think the crux of the issue is ensuring that a save() will occur later if some of the most recent saves (or just the very last one) was skipped. I don't know how to do that without a separate thread, or epoll callback loop, etc.</p>
<p>Maybe where we are going wrong is that we are trying to hide this from the plugin writer. Note they could just save less often and they can ensure correctness.</p>
<p>What about this as an idea, we could make it do this fancy behavior only if it is being used inside the context manager.</p> Pulp - Refactor #2316: Batch save() calls to ProgressReporthttps://pulp.plan.io/issues/2316?journal_id=148732016-10-04T15:38:46Zmhrivnakmhrivnak@redhat.com
<ul></ul><p>To be clear, I do not want to hide this behavior at all. I do want to choose default behaviors that make the most common use cases easy, but we should be very clear about this behavior and possibly allow the plugin writer to influence it.</p>
<p>I think we can wrap this save call in a little bit of logic that only saves at most once per some interval of time:</p>
<p><a href="https://github.com/pulp/pulp/blob/f44599980274ec4cf69594f84882ca90273041cf/app/pulp/app/models/progress.py#L208" class="external">https://github.com/pulp/pulp/blob/f44599980274ec4cf69594f84882ca90273041cf/app/pulp/app/models/progress.py#L208</a></p>
<p>As you point out, we could only do the batching in the case where the object is being used as a context manager. That's a reasonable approach that I would be happy with.</p>
<p>To allow more control, we could optionally have a boolean property of the object, "batch_increments" or something like that, which defaults to False. A user could set it to True if they want, and it would also be set to True in "__enter__". This is probably overkill for a first cut, but it's an option for the future in case people want to do incrementing without using the object as a context manager.</p>
<p>I also think it would be fine to just do the batching in the increment() method, make sure that if "done == total" a save always happens, and clearly document that the user is responsible for doing a final save on the object when they are done incrementing it.</p>
<p>Any of these approaches is fine with me, but I do think we need to solve this for pulp 3. Can we add the pulp 3 and sprint candidate options back to this refactor?</p> Pulp - Refactor #2316: Batch save() calls to ProgressReporthttps://pulp.plan.io/issues/2316?journal_id=148772016-10-04T16:37:18Zbmbouterbmbouter@redhat.com
<ul><li><strong>Description</strong> updated (<a title="View differences" href="/journals/14877/diff?detail_id=15634">diff</a>)</li><li><strong>Sprint Candidate</strong> changed from <i>No</i> to <i>Yes</i></li><li><strong>Tags</strong> <i>Pulp 3</i> added</li></ul><p>I think having the "skip if saved recently" behavior when running within the context manager is a good compromise. I added the sprint candidate and Pulp 3 flags again. I updated the story too.</p> Pulp - Refactor #2316: Batch save() calls to ProgressReporthttps://pulp.plan.io/issues/2316?journal_id=151812016-10-20T13:47:02Zmhrivnakmhrivnak@redhat.com
<ul><li><strong>Groomed</strong> changed from <i>No</i> to <i>Yes</i></li></ul> Pulp - Refactor #2316: Batch save() calls to ProgressReporthttps://pulp.plan.io/issues/2316?journal_id=159412016-11-21T16:06:22Zmhrivnakmhrivnak@redhat.com
<ul><li><strong>Sprint/Milestone</strong> set to <i>29</i></li></ul> Pulp - Refactor #2316: Batch save() calls to ProgressReporthttps://pulp.plan.io/issues/2316?journal_id=160832016-11-29T21:30:58Zbizhangbizhang@redhat.com
<ul><li><strong>Status</strong> changed from <i>NEW</i> to <i>ASSIGNED</i></li><li><strong>Assignee</strong> set to <i>bizhang</i></li></ul> Pulp - Refactor #2316: Batch save() calls to ProgressReporthttps://pulp.plan.io/issues/2316?journal_id=162282016-12-05T17:57:38Zmhrivnakmhrivnak@redhat.com
<ul><li><strong>Sprint/Milestone</strong> changed from <i>29</i> to <i>30</i></li></ul> Pulp - Refactor #2316: Batch save() calls to ProgressReporthttps://pulp.plan.io/issues/2316?journal_id=185532017-03-16T15:06:23Zmhrivnakmhrivnak@redhat.com
<ul><li><strong>Sprint/Milestone</strong> changed from <i>30</i> to <i>36</i></li></ul> Pulp - Refactor #2316: Batch save() calls to ProgressReporthttps://pulp.plan.io/issues/2316?journal_id=189412017-04-05T02:23:02Zbizhangbizhang@redhat.com
<ul><li><strong>Status</strong> changed from <i>ASSIGNED</i> to <i>POST</i></li></ul> Pulp - Refactor #2316: Batch save() calls to ProgressReporthttps://pulp.plan.io/issues/2316?journal_id=190492017-04-10T12:59:27Zmhrivnakmhrivnak@redhat.com
<ul><li><strong>Sprint/Milestone</strong> changed from <i>36</i> to <i>37</i></li></ul> Pulp - Refactor #2316: Batch save() calls to ProgressReporthttps://pulp.plan.io/issues/2316?journal_id=190942017-04-11T18:00:16Zbizhangbizhang@redhat.com
<ul></ul><p>PR: <a href="https://github.com/pulp/pulp/pull/2986" class="external">https://github.com/pulp/pulp/pull/2986</a></p>
<p>I removed `increment() skip-save logic` from the checklist, since increment() still calls save() because <a href="https://pulp.plan.io/issues/2318" class="external">https://pulp.plan.io/issues/2318</a> was resolved via documentation.</p> Pulp - Refactor #2316: Batch save() calls to ProgressReporthttps://pulp.plan.io/issues/2316?journal_id=191652017-04-13T13:06:08Zwerwtybihan.zh@gmail.com
<ul><li><strong>Status</strong> changed from <i>POST</i> to <i>MODIFIED</i></li><li><strong>% Done</strong> changed from <i>0</i> to <i>100</i></li></ul><p>Applied in changeset <a class="changeset" title="Batch save() calls to ProgressReport closes #2316 https://pulp.plan.io/issues/2316" href="https://pulp.plan.io/projects/pulp/repository/pulp/revisions/54b0a6a193641c2d75fa6cdd37193ded771e7f4c">pulp|54b0a6a193641c2d75fa6cdd37193ded771e7f4c</a>.</p> Pulp - Refactor #2316: Batch save() calls to ProgressReporthttps://pulp.plan.io/issues/2316?journal_id=259122018-03-08T23:09:06Zbmbouterbmbouter@redhat.com
<ul><li><strong>Sprint</strong> set to <i>Sprint 18</i></li></ul> Pulp - Refactor #2316: Batch save() calls to ProgressReporthttps://pulp.plan.io/issues/2316?journal_id=259292018-03-08T23:12:43Zbmbouterbmbouter@redhat.com
<ul><li><strong>Sprint/Milestone</strong> deleted (<del><i>37</i></del>)</li></ul> Pulp - Refactor #2316: Batch save() calls to ProgressReporthttps://pulp.plan.io/issues/2316?journal_id=417352019-04-25T16:47:07Zdaviddavis
<ul><li><strong>Sprint/Milestone</strong> set to <i>3.0.0</i></li></ul> Pulp - Refactor #2316: Batch save() calls to ProgressReporthttps://pulp.plan.io/issues/2316?journal_id=430482019-04-26T20:39:27Zbmbouterbmbouter@redhat.com
<ul><li><strong>Tags</strong> deleted (<del><i>Pulp 3</i></del>)</li></ul> Pulp - Refactor #2316: Batch save() calls to ProgressReporthttps://pulp.plan.io/issues/2316?journal_id=506332019-12-13T17:16:26Zbmbouterbmbouter@redhat.com
<ul><li><strong>Status</strong> changed from <i>MODIFIED</i> to <i>CLOSED - CURRENTRELEASE</i></li></ul>