https://pulp.plan.io/https://pulp.plan.io/favicon.ico2018-07-03T14:37:52ZPulpPulp - Issue #3812: Pulp3 Content models are not compatible with bulk_savehttps://pulp.plan.io/issues/3812?journal_id=292422018-07-03T14:37:52Zamacdona@redhat.comaustin@redhat.com
<ul><li><strong>Related to</strong> <i><a class="issue tracker-1 status-8 priority-6 priority-default closed" href="/issues/3767">Issue #3767</a>: Unable to save models with relation to Content with changeset</i> added</li></ul> Pulp - Issue #3812: Pulp3 Content models are not compatible with bulk_savehttps://pulp.plan.io/issues/3812?journal_id=292442018-07-03T14:37:57Zamacdona@redhat.comaustin@redhat.com
<ul><li><strong>Triaged</strong> changed from <i>No</i> to <i>Yes</i></li></ul><p>Whatever we come up with here, it might be relevant to take a look at this one also: <a href="https://pulp.plan.io/issues/3767" class="external">https://pulp.plan.io/issues/3767</a></p> Pulp - Issue #3812: Pulp3 Content models are not compatible with bulk_savehttps://pulp.plan.io/issues/3812?journal_id=292602018-07-05T20:10:07Zdalleydalley@redhat.com
<ul><li><strong>Status</strong> changed from <i>NEW</i> to <i>ASSIGNED</i></li><li><strong>Assignee</strong> set to <i>dalley</i></li></ul><p>As discussed with Brian and David, first step is to make a flat version of a ContentUnit table and benchmark save() vs bulk_create() to see how much faster it actually is.</p> Pulp - Issue #3812: Pulp3 Content models are not compatible with bulk_savehttps://pulp.plan.io/issues/3812?journal_id=294062018-07-10T19:20:41Zdalleydalley@redhat.com
<ul></ul><p>Results:</p>
<pre><code>(bench) [vagrant@pulp3 models_benchmark]$ python3 benchmark.py --num 1000
1000 units: indidvidual save in seconds: 5.763491153717041
1000 units: bulk save in seconds: 0.1316215991973877
= 43.8x speedup
(bench) [vagrant@pulp3 models_benchmark]$ python3 benchmark.py --num 5000
5000 units: indidvidual save in seconds: 28.695824146270752
5000 units: bulk save in seconds: 0.41101503372192383
= 69.8x speedup
(bench) [vagrant@pulp3 models_benchmark]$ python3 benchmark.py --num 5000
5000 units: indidvidual save in seconds: 25.684953689575195
5000 units: bulk save in seconds: 0.4922950267791748
= 52.2x speedup
(bench) [vagrant@pulp3 models_benchmark]$ python3 benchmark.py --num 5000
5000 units: indidvidual save in seconds: 25.778226375579834
5000 units: bulk save in seconds: 0.5030674934387207
= 51.2x speedup
(bench) [vagrant@pulp3 models_benchmark]$ python3 benchmark.py --num 5000
5000 units: indidvidual save in seconds: 28.895294189453125
5000 units: bulk save in seconds: 0.4287230968475342
= 67.4x speedup
(bench) [vagrant@pulp3 models_benchmark]$ python3 benchmark.py --num 10000
10000 units: indidvidual save in seconds: 51.1158390045166
10000 units: bulk save in seconds: 0.9658312797546387
= 52.9x speedup
</code></pre>
<p>Code:</p>
<p><a href="https://github.com/dralley/pulp_content_benchmarks/" class="external">https://github.com/dralley/pulp_content_benchmarks/</a></p> Pulp - Issue #3812: Pulp3 Content models are not compatible with bulk_savehttps://pulp.plan.io/issues/3812?journal_id=294352018-07-11T18:32:50Zbmbouterbmbouter@redhat.com
<ul><li><strong>Status</strong> changed from <i>ASSIGNED</i> to <i>NEW</i></li><li><strong>Assignee</strong> deleted (<del><i>dalley</i></del>)</li></ul><p>Setting back to new because I'm not actively working on it.</p> Pulp - Issue #3812: Pulp3 Content models are not compatible with bulk_savehttps://pulp.plan.io/issues/3812?journal_id=294362018-07-11T18:35:23Zbmbouterbmbouter@redhat.com
<ul><li><strong>Sprint Candidate</strong> changed from <i>No</i> to <i>Yes</i></li></ul><p>I updated the wrong issue, whoops. If it's in the NEW state, though, I think it will show up at sprint planning, so maybe this is good.</p> Pulp - Issue #3812: Pulp3 Content models are not compatible with bulk_savehttps://pulp.plan.io/issues/3812?journal_id=294382018-07-11T18:45:35Zdaviddavis
<ul><li><strong>Groomed</strong> changed from <i>No</i> to <i>Yes</i></li></ul> Pulp - Issue #3812: Pulp3 Content models are not compatible with bulk_savehttps://pulp.plan.io/issues/3812?journal_id=294392018-07-11T19:05:31Zjortel@redhat.comjortel@redhat.com
<ul></ul><p>Looking at the code, the <em>individual saves</em> test is being done with a commit per insert which is the slowest way possible. The bulk create by nature will be doing a single commit. Each commit is very expensive. For the metric to be useful, both tests need to do the same number of commits. Suggest modifying the <em>individual saves</em> test to run in a single transaction.</p> Pulp - Issue #3812: Pulp3 Content models are not compatible with bulk_savehttps://pulp.plan.io/issues/3812?journal_id=294632018-07-12T15:26:32Zdalleydalley@redhat.com
<ul></ul><p>New content benchmarks:</p>
<pre><code>(env) [vagrant@pulp3 models_benchmark]$ python3 benchmark.py --num=1000
1000 multi-table content: individual save in seconds: 6.193058490753174
1000 multi-table content: individual save w/ transaction in seconds: 1.801978588104248
1000 single-table content: individual save w/ transaction in seconds: 1.0102925300598145
1000 single-table content: bulk save in seconds: 0.06991720199584961
(env) [vagrant@pulp3 models_benchmark]$ python3 benchmark.py --num=1000
1000 multi-table content: individual save in seconds: 5.5913708209991455
1000 multi-table content: individual save w/ transaction in seconds: 1.853820562362671
1000 single-table content: individual save w/ transaction in seconds: 0.9765522480010986
1000 single-table content: bulk save in seconds: 0.07139015197753906
(env) [vagrant@pulp3 models_benchmark]$ python3 benchmark.py --num=5000
5000 multi-table content: individual save in seconds: 25.336352109909058
5000 multi-table content: individual save w/ transaction in seconds: 8.87453031539917
5000 single-table content: individual save w/ transaction in seconds: 4.718692064285278
5000 single-table content: bulk save in seconds: 0.39882731437683105
(env) [vagrant@pulp3 models_benchmark]$ python3 benchmark.py --num=5000
5000 multi-table content: individual save in seconds: 28.781572580337524
5000 multi-table content: individual save w/ transaction in seconds: 8.198915958404541
5000 single-table content: individual save w/ transaction in seconds: 4.499735116958618
5000 single-table content: bulk save in seconds: 0.4010334014892578
(env) [vagrant@pulp3 models_benchmark]$ python3 benchmark.py --num=10000
10000 multi-table content: individual save in seconds: 57.38842058181763
10000 multi-table content: individual save w/ transaction in seconds: 16.86572265625
10000 single-table content: individual save w/ transaction in seconds: 9.358201503753662
10000 single-table content: bulk save in seconds: 0.8168251514434814
(env) [vagrant@pulp3 models_benchmark]$ python3 benchmark.py --num=10000
10000 multi-table content: individual save in seconds: 58.31086325645447
10000 multi-table content: individual save w/ transaction in seconds: 16.490723609924316
10000 single-table content: individual save w/ transaction in seconds: 9.385928392410278
10000 single-table content: bulk save in seconds: 0.827233076095581
(env) [vagrant@pulp3 models_benchmark]$ python3 benchmark.py --num=20000
20000 multi-table content: individual save in seconds: 114.19883108139038
20000 multi-table content: individual save w/ transaction in seconds: 34.82098698616028
20000 single-table content: individual save w/ transaction in seconds: 17.453803062438965
20000 single-table content: bulk save in seconds: 1.6993021965026855
(env) [vagrant@pulp3 models_benchmark]$ python3 benchmark.py --num=20000
20000 multi-table content: individual save in seconds: 115.88576149940491
20000 multi-table content: individual save w/ transaction in seconds: 51.81009030342102
20000 single-table content: individual save w/ transaction in seconds: 24.00934410095215
20000 single-table content: bulk save in seconds: 2.081272840499878
</code></pre>
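<p>For readers without the benchmark repository at hand, the four configurations measured above can be sketched roughly as follows. This is not the actual benchmark.py: it swaps Django's ORM for Python's standard-library sqlite3 module, and the table names, column names, and helper functions are all hypothetical. <code>executemany()</code> only stands in for <code>bulk_create()</code>, which generally inserts many rows per query. An in-memory database will not reproduce the timings above; the sketch only shows where the commits fall and why the multi-table case has to insert row by row.</p>

```python
import sqlite3
import time


def make_db():
    """Create an in-memory database with a parent/child pair and a flat table."""
    # isolation_level=None puts the connection in autocommit mode, so the
    # BEGIN/COMMIT statements below are the only transaction boundaries.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.executescript(
        """
        CREATE TABLE content (id INTEGER PRIMARY KEY, type TEXT);
        CREATE TABLE file_content (content_id INTEGER REFERENCES content(id), path TEXT);
        CREATE TABLE flat_content (id INTEGER PRIMARY KEY, type TEXT, path TEXT);
        """
    )
    return conn


def multi_table_save(conn, n, one_transaction):
    """Insert n parent+child row pairs, mimicking multi-table inheritance."""
    if one_transaction:
        conn.execute("BEGIN")
    for i in range(n):
        if not one_transaction:
            conn.execute("BEGIN")  # one commit per unit: the slow case
        cur = conn.execute("INSERT INTO content (type) VALUES ('file')")
        # The child row needs the parent's generated key, so the rows
        # cannot simply be handed over as one batch.
        conn.execute(
            "INSERT INTO file_content (content_id, path) VALUES (?, ?)",
            (cur.lastrowid, f"path-{i}"),
        )
        if not one_transaction:
            conn.execute("COMMIT")
    if one_transaction:
        conn.execute("COMMIT")


def single_table_save(conn, n):
    """Insert n flat rows one statement at a time inside a single transaction."""
    conn.execute("BEGIN")
    for i in range(n):
        conn.execute(
            "INSERT INTO flat_content (type, path) VALUES ('file', ?)",
            (f"path-{i}",),
        )
    conn.execute("COMMIT")


def single_table_bulk(conn, n):
    """Insert n flat rows as one batch inside a single transaction."""
    conn.execute("BEGIN")
    conn.executemany(
        "INSERT INTO flat_content (type, path) VALUES ('file', ?)",
        ((f"path-{i}",) for i in range(n)),
    )
    conn.execute("COMMIT")


def timed(label, func, *args):
    """Run func(*args) and print the elapsed wall-clock time."""
    start = time.perf_counter()
    func(*args)
    print(f"{label}: {time.perf_counter() - start:.4f} s")


def run(n=1000):
    """Run all four configurations against a fresh database and return it."""
    conn = make_db()
    timed(f"{n} multi-table: individual save", multi_table_save, conn, n, False)
    timed(f"{n} multi-table: individual save w/ transaction", multi_table_save, conn, n, True)
    timed(f"{n} single-table: individual save w/ transaction", single_table_save, conn, n)
    timed(f"{n} single-table: bulk save", single_table_bulk, conn, n)
    return conn
```

<p>Calling <code>run(1000)</code> prints one timing line per configuration. To see the commit cost at all, the database would need to live on disk rather than in memory.</p>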
<p>benchmark.py code also updated.</p> Pulp - Issue #3812: Pulp3 Content models are not compatible with bulk_savehttps://pulp.plan.io/issues/3812?journal_id=295372018-07-16T00:20:47Zdkliban@redhat.com
<ul><li><strong>Sprint</strong> set to <i>Sprint 40</i></li></ul> Pulp - Issue #3812: Pulp3 Content models are not compatible with bulk_savehttps://pulp.plan.io/issues/3812?journal_id=297172018-07-25T21:19:17Zbmbouterbmbouter@redhat.com
<ul><li><strong>Status</strong> changed from <i>NEW</i> to <i>ASSIGNED</i></li><li><strong>Assignee</strong> set to <i>bmbouter</i></li></ul> Pulp - Issue #3812: Pulp3 Content models are not compatible with bulk_savehttps://pulp.plan.io/issues/3812?journal_id=298762018-08-06T13:16:57Zrchan
<ul><li><strong>Sprint</strong> changed from <i>Sprint 40</i> to <i>Sprint 41</i></li></ul> Pulp - Issue #3812: Pulp3 Content models are not compatible with bulk_savehttps://pulp.plan.io/issues/3812?journal_id=299332018-08-07T16:08:16Zjortel@redhat.comjortel@redhat.com
<ul><li><strong>Subject</strong> changed from <i>Pulp3 Content Units are not compatible with bulk_save</i> to <i>Pulp3 Content models are not compatible with bulk_save</i></li></ul> Pulp - Issue #3812: Pulp3 Content models are not compatible with bulk_savehttps://pulp.plan.io/issues/3812?journal_id=299352018-08-07T16:18:43Zjortel@redhat.comjortel@redhat.com
<ul></ul><p>Please include a design (change) proposal that specifies:</p>
<ul>
<li>Proposed Content model hierarchy.</li>
<li>How content within a repository-version will be queried.</li>
<li>How content will be associated to a repository-version.</li>
</ul>
<p>so that it can be vetted before implementation.</p> Pulp - Issue #3812: Pulp3 Content models are not compatible with bulk_savehttps://pulp.plan.io/issues/3812?journal_id=301132018-08-13T21:35:48Zbmbouterbmbouter@redhat.com
<ul></ul><p>This is a good article on multi-table inheritance: <a href="https://godjango.com/blog/django-abstract-base-class-multi-table-inheritance/" class="external">https://godjango.com/blog/django-abstract-base-class-multi-table-inheritance/</a>. It points out that any time a concrete Django model inherits from another concrete Django model, you get multi-table inheritance. That is specifically the case that won't work with bulk_create(), according to the Django docs: <a href="https://docs.djangoproject.com/en/dev/ref/models/querysets/#bulk-create" class="external">https://docs.djangoproject.com/en/dev/ref/models/querysets/#bulk-create</a></p>
<p>So we have to make Content an abstract Django model and have its subclasses, e.g. FileContent, be the concrete models. That way they will be compatible with bulk_create(). With this change you can no longer make calls like <code>Content.objects</code>, because an abstract model does not have a Manager, e.g. <code>objects</code>.</p>
<p>There are a few places where users or plugin writers treat mixed-type content as one type, e.g. the add_content() and remove_content() calls. These will need to move toward something with a distinct type any time we deal with QuerySet objects, since a single QuerySet cannot span multiple tables.</p>
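<p>A minimal sketch of what per-type handling could look like, in plain Python rather than Django: mixed-type content is split into one collection per type before it is handed to a call such as add_content(). Everything here is hypothetical. The classes are bare stand-ins for content models, the lists stand in for QuerySets, and the <code>app_label.ModelName</code> key format is only an assumed naming scheme.</p>

```python
from collections import defaultdict


class AnsibleContent:
    """Hypothetical stand-in for a concrete content model."""
    app_label = "pulp_ansible"

    def __init__(self, name):
        self.name = name


class UpdateContent:
    """Hypothetical stand-in for another concrete content model."""
    app_label = "pulp_rpm"

    def __init__(self, name):
        self.name = name


def type_label(unit):
    """Build an 'app_label.ModelName' key for a content unit."""
    return f"{unit.app_label}.{type(unit).__name__}"


def group_by_type(units):
    """Split mixed-type content into one list per type, keyed by type label."""
    grouped = defaultdict(list)
    for unit in units:
        grouped[type_label(unit)].append(unit)
    return dict(grouped)


def add_content(repo_version, content_by_type):
    """Associate content with a (dict-based, hypothetical) repository version.

    Each type is processed separately, as it would have to be if every
    type lived in its own table.
    """
    for label, units in content_by_type.items():
        repo_version.setdefault(label, []).extend(units)
    return repo_version
```

<p>For example, <code>group_by_type([AnsibleContent("role-a"), UpdateContent("erratum-1")])</code> returns a dictionary with one key per type, and that dictionary is what <code>add_content()</code> would receive instead of a single mixed collection.</p>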
<p>This diff shows the model changes that were necessary. The next step is to port the QuerySet areas to use a dictionary of QuerySets, one per type. For example, the content argument to <a href="https://github.com/pulp/pulp/blob/master/pulpcore/pulpcore/app/models/repository.py#L337" class="external">https://github.com/pulp/pulp/blob/master/pulpcore/pulpcore/app/models/repository.py#L337</a> would become:</p>
<pre><code>{
'pulp_ansible.AnsibleContent': 'QuerySetobjects',
'pulp_rpm.UpdateContent': 'QuerySetobjects'
}
</code></pre> Pulp - Issue #3812: Pulp3 Content models are not compatible with bulk_savehttps://pulp.plan.io/issues/3812?journal_id=303082018-08-27T13:03:19Zrchan
<ul><li><strong>Sprint</strong> changed from <i>Sprint 41</i> to <i>Sprint 42</i></li></ul> Pulp - Issue #3812: Pulp3 Content models are not compatible with bulk_savehttps://pulp.plan.io/issues/3812?journal_id=308562018-09-17T17:18:58Zrchan
<ul><li><strong>Sprint</strong> changed from <i>Sprint 42</i> to <i>Sprint 43</i></li></ul> Pulp - Issue #3812: Pulp3 Content models are not compatible with bulk_savehttps://pulp.plan.io/issues/3812?journal_id=312032018-10-03T16:38:44Zdalleydalley@redhat.com
<ul><li><strong>Assignee</strong> changed from <i>bmbouter</i> to <i>dalley</i></li></ul> Pulp - Issue #3812: Pulp3 Content models are not compatible with bulk_savehttps://pulp.plan.io/issues/3812?journal_id=312902018-10-08T13:07:24Zrchan
<ul><li><strong>Sprint</strong> changed from <i>Sprint 43</i> to <i>Sprint 44</i></li></ul> Pulp - Issue #3812: Pulp3 Content models are not compatible with bulk_savehttps://pulp.plan.io/issues/3812?journal_id=317282018-11-06T18:51:52Zrchan
<ul><li><strong>Sprint</strong> changed from <i>Sprint 44</i> to <i>Sprint 45</i></li></ul> Pulp - Issue #3812: Pulp3 Content models are not compatible with bulk_savehttps://pulp.plan.io/issues/3812?journal_id=322062018-11-28T16:19:49Zdalleydalley@redhat.com
<ul><li><strong>Status</strong> changed from <i>ASSIGNED</i> to <i>CLOSED - WONTFIX</i></li><li><strong>Sprint</strong> deleted (<del><i>Sprint 45</i></del>)</li></ul><p>Closing as it looks like the model changes are unlikely to move forwards. The API changes will be added in a new story.</p> Pulp - Issue #3812: Pulp3 Content models are not compatible with bulk_savehttps://pulp.plan.io/issues/3812?journal_id=414632019-04-25T16:45:29Zdaviddavis
<ul><li><strong>Sprint/Milestone</strong> set to <i>3.0.0</i></li></ul> Pulp - Issue #3812: Pulp3 Content models are not compatible with bulk_savehttps://pulp.plan.io/issues/3812?journal_id=424272019-04-26T20:34:49Zbmbouterbmbouter@redhat.com
<ul><li><strong>Tags</strong> deleted (<del><i>Pulp 3</i></del>)</li></ul> Pulp - Issue #3812: Pulp3 Content models are not compatible with bulk_savehttps://pulp.plan.io/issues/3812?journal_id=583992020-06-16T20:59:51Zbmbouterbmbouter@redhat.com
<ul><li><strong>Tags</strong> <i>Performance</i> added</li><li><strong>Tags</strong> deleted (<del><i>Sync Performance</i></del>)</li></ul>