Issue #3812: Pulp3 Content models are not compatible with bulk_save - Pulp

Actions

Send by e-mail Copy link

Issue #3812

closed

File Support - Issue #3770: Pulp 3 is about 2x slower than pulp 2 in syncing a large file repo

Pulp3 Content models are not compatible with bulk_save

Added by bmbouter about 6 years ago. Updated about 4 years ago.

Status:

CLOSED - WONTFIX

Priority:

Normal

Assignee:

dalley

Category:

Sprint/Milestone:

3.0.0

Start date:

Due date:

Estimated time:

Severity:

2. Medium

Version:

Platform Release:

OS:

Triaged:

Yes

Groomed:

Yes

Sprint Candidate:

Yes

Tags:

Performance

Sprint:

Quarter:

Description

Motivation¶

Content Units are not compatible with bulk_save() because they use Master/Detail and therefore multi-table inheritance which fails by raising a: ValueError: Can't bulk create a multi-table inherited model Exception.

Solution¶

TBD, write ideas in comments

Related issues

Actions

Copy link

Updated by amacdona@redhat.com about 6 years ago

Related to Issue #3767: Unable to save models with relation to Content with changeset added

Actions

Copy link

Updated by amacdona@redhat.com about 6 years ago

Triaged changed from No to Yes

Whatever we come up with here, it might be relevant to take a look at this one also: https://pulp.plan.io/issues/3767

Actions

Copy link

Updated by dalley about 6 years ago

Status changed from NEW to ASSIGNED
Assignee set to dalley

As discussed with Brian and David, first step is to make a flat version of a ContentUnit table and benchmark save() vs bulk_create() to see how much faster it actually is.

Actions

Copy link

Updated by dalley about 6 years ago

Results:

(bench) [vagrant@pulp3 models_benchmark]$ python3 benchmark.py --num 1000
1000 units: indidvidual save in seconds: 5.763491153717041
1000 units: bulk save in seconds: 0.1316215991973877

= 43.7x speedup

(bench) [vagrant@pulp3 models_benchmark]$ python3 benchmark.py --num 5000
5000 units: indidvidual save in seconds: 28.695824146270752
5000 units: bulk save in seconds: 0.41101503372192383

= 69.8x speedup

(bench) [vagrant@pulp3 models_benchmark]$ python3 benchmark.py --num 5000
5000 units: indidvidual save in seconds: 25.684953689575195
5000 units: bulk save in seconds: 0.4922950267791748

= 52.2x speedup

(bench) [vagrant@pulp3 models_benchmark]$ python3 benchmark.py --num 5000
5000 units: indidvidual save in seconds: 25.778226375579834
5000 units: bulk save in seconds: 0.5030674934387207

=51.2x speedup

(bench) [vagrant@pulp3 models_benchmark]$ python3 benchmark.py --num 5000
5000 units: indidvidual save in seconds: 28.895294189453125
5000 units: bulk save in seconds: 0.4287230968475342

=67.4x speedup

(bench) [vagrant@pulp3 models_benchmark]$ python3 benchmark.py --num 10000
10000 units: indidvidual save in seconds: 51.1158390045166
10000 units: bulk save in seconds: 0.9658312797546387

= 52.9x speedup

Code:

https://github.com/dralley/pulp_content_benchmarks/

Actions

Copy link

Updated by bmbouter about 6 years ago

Status changed from ASSIGNED to NEW
Assignee deleted (~~dalley~~)

Setting back to new because I'm not actively working on it.

Actions

Copy link

Updated by bmbouter about 6 years ago

Sprint Candidate changed from No to Yes

I updated the wrong issue, whoops. I think if its in new state though it will show up at sprint planning so maybe this is good.

Actions

Copy link

Updated by daviddavis about 6 years ago

Groomed changed from No to Yes

Actions

Copy link

Updated by jortel@redhat.com about 6 years ago

Looking at the code, the individual saves test is being done with a commit per insert which is the slowest way possible. The bulk create by nature will be doing a single commit. Each commit is very expensive. For the metric to be useful, both tests need to do the same number of commits. Suggest modifying the individual saves test to run in a single transaction.

Actions

Copy link

Updated by dalley about 6 years ago

New content benchmarks:

(env) [vagrant@pulp3 models_benchmark]$ python3 benchmark.py  --num=1000
1000 multi-table content: individual save in seconds: 6.193058490753174
1000 multi-table content: individual save w/ transaction in seconds: 1.801978588104248
1000 single-table content: individual save w/ transaction in seconds: 1.0102925300598145
1000 single-table content: bulk save in seconds: 0.06991720199584961

(env) [vagrant@pulp3 models_benchmark]$ python3 benchmark.py  --num=1000
1000 multi-table content: individual save in seconds: 5.5913708209991455
1000 multi-table content: individual save w/ transaction in seconds: 1.853820562362671
1000 single-table content: individual save w/ transaction in seconds: 0.9765522480010986
1000 single-table content: bulk save in seconds: 0.07139015197753906

(env) [vagrant@pulp3 models_benchmark]$ python3 benchmark.py  --num=5000
5000 multi-table content: individual save in seconds: 25.336352109909058
5000 multi-table content: individual save w/ transaction in seconds: 8.87453031539917
5000 single-table content: individual save w/ transaction in seconds: 4.718692064285278
5000 single-table content: bulk save in seconds: 0.39882731437683105

(env) [vagrant@pulp3 models_benchmark]$ python3 benchmark.py  --num=5000
5000 multi-table content: individual save in seconds: 28.781572580337524
5000 multi-table content: individual save w/ transaction in seconds: 8.198915958404541
5000 single-table content: individual save w/ transaction in seconds: 4.499735116958618
5000 single-table content: bulk save in seconds: 0.4010334014892578

(env) [vagrant@pulp3 models_benchmark]$ python3 benchmark.py  --num=10000
10000 multi-table content: individual save in seconds: 57.38842058181763
10000 multi-table content: individual save w/ transaction in seconds: 16.86572265625
10000 single-table content: individual save w/ transaction in seconds: 9.358201503753662
10000 single-table content: bulk save in seconds: 0.8168251514434814

(env) [vagrant@pulp3 models_benchmark]$ python3 benchmark.py  --num=10000
10000 multi-table content: individual save in seconds: 58.31086325645447
10000 multi-table content: individual save w/ transaction in seconds: 16.490723609924316
10000 single-table content: individual save w/ transaction in seconds: 9.385928392410278
10000 single-table content: bulk save in seconds: 0.827233076095581

(env) [vagrant@pulp3 models_benchmark]$ python3 benchmark.py  --num=20000
20000 multi-table content: individual save in seconds: 114.19883108139038
20000 multi-table content: individual save w/ transaction in seconds: 34.82098698616028
20000 single-table content: individual save w/ transaction in seconds: 17.453803062438965
20000 single-table content: bulk save in seconds: 1.6993021965026855

(env) [vagrant@pulp3 models_benchmark]$ python3 benchmark.py  --num=20000
20000 multi-table content: individual save in seconds: 115.88576149940491
20000 multi-table content: individual save w/ transaction in seconds: 51.81009030342102
20000 single-table content: individual save w/ transaction in seconds: 24.00934410095215
20000 single-table content: bulk save in seconds: 2.081272840499878

benchmark.py code also updated.

Actions

Copy link

#10

Updated by dkliban@redhat.com about 6 years ago

Sprint set to Sprint 40

Actions

Copy link

#11

Updated by bmbouter about 6 years ago

Status changed from NEW to ASSIGNED
Assignee set to bmbouter

Actions

Copy link

#12

Updated by rchan about 6 years ago

Sprint changed from Sprint 40 to Sprint 41

Actions

Copy link

#13

Updated by jortel@redhat.com about 6 years ago

Subject changed from Pulp3 Content Units are not compatible with bulk_save to Pulp3 Content models are not compatible with bulk_save

Actions

Copy link

#14

Updated by jortel@redhat.com about 6 years ago

Please include a design (change) proposal that specifies:

Proposed Content model hierarchy.
How content within a repository-version will be queried.
How content will be associated to a repository-version.

so that it can be vetted before implementation.

Actions

Copy link

#15

Updated by bmbouter about 6 years ago

This is a good article on MultiTable inheritance https://godjango.com/blog/django-abstract-base-class-multi-table-inheritance/. It identifies that anytime you have a Concrete model inheriting from another concrete Django model you'll get multi-tale inheritance. That is specifically the thing that won't work with bulk_create according to their docs: https://docs.djangoproject.com/en/dev/ref/models/querysets/#bulk-create

So we have to make Content an abstract django model and have any subclasses, e.g. FileContent be the concrete model. This way you will be compatible with bulk_create(). With this change you can no longer make calls like Content.objects. becuase an abstract class does not have a Manager method, e.g. objects.

There are a few places where users or plugin writers treat mixed-type content as one type, e.g. the add_content(), remove_content() calls. These will need to move toward something that has distinct type anytime we deal with Queryset objects which cannot span single-tables.

This diff show the model changes that were necessary. The next step is to port the Querset areas to use a dictionary of Querysets, one per type. For example the content argument to https://github.com/pulp/pulp/blob/master/pulpcore/pulpcore/app/models/repository.py#L337 would become:

{
    'pulp_ansible.AnsibleContent': 'QuerySetobjects',
    'pulp_rpm.UpdateContent': 'QuerySetobjects'
}

Actions

Copy link

#16

Updated by rchan almost 6 years ago

Sprint changed from Sprint 41 to Sprint 42

Actions

Copy link

#17

Updated by rchan almost 6 years ago

Sprint changed from Sprint 42 to Sprint 43

Actions

Copy link

#18

Updated by dalley almost 6 years ago

Assignee changed from bmbouter to dalley

Actions

Copy link

#19

Updated by rchan almost 6 years ago

Sprint changed from Sprint 43 to Sprint 44

Actions

Copy link

#20

Updated by rchan almost 6 years ago

Sprint changed from Sprint 44 to Sprint 45

Actions

Copy link

#21

Updated by dalley over 5 years ago

Status changed from ASSIGNED to CLOSED - WONTFIX
Sprint deleted (~~Sprint 45~~)

Closing as it looks like the model changes are unlikely to move forwards. The API changes will be added in a new story.

Actions

Copy link

#22

Updated by daviddavis over 5 years ago

Sprint/Milestone set to 3.0.0

Actions

Copy link

#23

Updated by bmbouter over 5 years ago

Tags deleted (~~Pulp 3~~)

Actions

Copy link

#24

Updated by bmbouter about 4 years ago

Tags Performance added
Tags deleted (~~Sync Performance~~)

Actions

Send by e-mail Copy link

Also available in: Atom PDF

Project

Profile

Help

Pulp

Agile boards

Custom queries

Issue #3812

Pulp3 Content models are not compatible with bulk_save

Motivation¶

Solution¶

Updated by amacdona@redhat.com about 6 years ago

Updated by amacdona@redhat.com about 6 years ago

Updated by dalley about 6 years ago

Updated by dalley about 6 years ago

Updated by bmbouter about 6 years ago

Updated by bmbouter about 6 years ago

Updated by daviddavis about 6 years ago

Updated by jortel@redhat.com about 6 years ago

Updated by dalley about 6 years ago

Updated by dkliban@redhat.com about 6 years ago

Updated by bmbouter about 6 years ago

Updated by rchan about 6 years ago

Updated by jortel@redhat.com about 6 years ago

Updated by jortel@redhat.com about 6 years ago

Updated by bmbouter about 6 years ago

Updated by rchan almost 6 years ago

Updated by rchan almost 6 years ago

Updated by dalley almost 6 years ago

Updated by rchan almost 6 years ago

Updated by rchan almost 6 years ago

Updated by dalley over 5 years ago

Updated by daviddavis over 5 years ago

Updated by bmbouter over 5 years ago

Updated by bmbouter about 4 years ago