Issue #3813

File Support - Issue #3770: Pulp 3 is about 2x slower than pulp 2 in syncing a large file repo

Pulp3 Artifacts are not compatible with bulk_save

Added by bmbouter over 1 year ago. Updated 6 months ago.

Status: MODIFIED
Priority: Normal
Assignee:
Category: -
Sprint/Milestone:
Start date:
Due date:
Severity: 2. Medium
Version:
Platform Release:
Blocks Release:
OS:
Backwards Incompatible: No
Triaged: Yes
Groomed: Yes
Sprint Candidate: Yes
Tags: Documentation, Sync Performance
QA Contact:
Complexity:
Smash Test:
Verified: No
Verification Required: No
Sprint: Sprint 40

Description

Motivation

Artifacts are not compatible with bulk_save() (Django's bulk_create()) because Artifact defines a custom save() method, and a bulk insert never calls save().

Solution

Document that plugin writers can use bulk_save with Artifacts, provided the file field of each Artifact is a path to a file that is already in place in the storage backend (content-addressable storage).
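
A minimal sketch of the "already in place" requirement, using only the standard library. The helper name `place_in_storage` and the layout `artifact/<first two hex digits>/<remaining digits>` under a storage root are assumptions for illustration, not taken from this issue; a plugin should use whatever path its storage backend expects. The returned relative path is the value that would be assigned to each Artifact's file field before the batch is handed to the bulk call.

```python
import hashlib
import os
import shutil


def place_in_storage(src_path, storage_root):
    """Move src_path to a content-addressed location under storage_root.

    Hypothetical helper: the directory layout below is an assumption.
    Returns (relative_path, sha256_hexdigest, size_in_bytes).
    """
    sha256 = hashlib.sha256()
    size = 0
    # Hash the file in chunks so large files are not read into memory at once.
    with open(src_path, "rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            sha256.update(chunk)
            size += len(chunk)
    digest = sha256.hexdigest()

    rel_path = os.path.join("artifact", digest[:2], digest[2:])
    dest = os.path.join(storage_root, rel_path)
    os.makedirs(os.path.dirname(dest), exist_ok=True)

    if os.path.exists(dest):
        # Identical content is already stored; drop the duplicate source file.
        os.remove(src_path)
    else:
        shutil.move(src_path, dest)
    return rel_path, digest, size
```

Because the path is derived from the digest, calling the helper twice on identical content yields the same location, so nothing further has to happen to the file when the database rows are created in bulk.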

artifact_test.py (1.73 KB): generate and save many Artifacts using bulk-create and individual save(). Added by bmbouter, 07/09/2018 10:59 PM

Associated revisions

Revision 80bfd474 View on GitHub
Added by bmbouter about 1 year ago

Adds docs on using Artifact with bulk_create()

https://pulp.plan.io/issues/3813
closes #3813

History

#1 Updated by CodeHeeler over 1 year ago

  • Triaged changed from No to Yes

#2 Updated by bmbouter over 1 year ago

  • Status changed from NEW to ASSIGNED
  • Assignee set to bmbouter

As discussed with Daniel and David, the first step is to manually move the files into place, call bulk_save on a batch of 1K Artifacts, and see how much faster that is than saving 1K Artifacts sequentially.

#3 Updated by bmbouter over 1 year ago

The bulk-create option is 15-20x faster. I uploaded the script I used to test it. Here are some results:

(pulp) [vagrant@pulp3 devel]$ python artifact_test.py --num 10000
/home/vagrant/.virtualenvs/pulp/lib64/python3.6/site-packages/psycopg2/__init__.py:144: UserWarning: The psycopg2 wheel package will be renamed from release 2.8; in order to keep installing from binary please use "pip install psycopg2-binary" instead. For details see: <http://initd.org/psycopg/docs/install.html#binary-install-from-pypi>.
  """)
10000 units: indidvidual save in seconds: 47.14567446708679
10000 units: bulk save in seconds: 3.3910677433013916
(pulp) [vagrant@pulp3 devel]$ python artifact_test.py --num 1000
/home/vagrant/.virtualenvs/pulp/lib64/python3.6/site-packages/psycopg2/__init__.py:144: UserWarning: The psycopg2 wheel package will be renamed from release 2.8; in order to keep installing from binary please use "pip install psycopg2-binary" instead. For details see: <http://initd.org/psycopg/docs/install.html#binary-install-from-pypi>.
  """)
1000 units: indidvidual save in seconds: 4.659780025482178
1000 units: bulk save in seconds: 0.25438880920410156
(pulp) [vagrant@pulp3 devel]$ python artifact_test.py --num 1000
/home/vagrant/.virtualenvs/pulp/lib64/python3.6/site-packages/psycopg2/__init__.py:144: UserWarning: The psycopg2 wheel package will be renamed from release 2.8; in order to keep installing from binary please use "pip install psycopg2-binary" instead. For details see: <http://initd.org/psycopg/docs/install.html#binary-install-from-pypi>.
  """)
1000 units: indidvidual save in seconds: 4.8165528774261475
1000 units: bulk save in seconds: 0.22277045249938965
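
The speedup for each run can be recomputed from the timings above. This is a small sketch with the numbers copied from the three runs; the `speedup` helper is only for illustration. The ratios come out to roughly 14x to 22x.

```python
# Timings copied from the output above:
# (units, individual save seconds, bulk save seconds)
runs = [
    (10000, 47.14567446708679, 3.3910677433013916),
    (1000, 4.659780025482178, 0.25438880920410156),
    (1000, 4.8165528774261475, 0.22277045249938965),
]


def speedup(individual_seconds, bulk_seconds):
    """How many times faster the bulk path is than individual saves."""
    return individual_seconds / bulk_seconds


speedups = [speedup(individual, bulk) for _, individual, bulk in runs]

for (units, _, _), ratio in zip(runs, speedups):
    print(f"{units} units: bulk save is {ratio:.1f}x faster")
```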

#4 Updated by bmbouter over 1 year ago

  • Description updated (diff)

#5 Updated by bmbouter over 1 year ago

  • Tags Documentation added

#6 Updated by daviddavis over 1 year ago

  • Groomed changed from No to Yes

#7 Updated by bmbouter over 1 year ago

  • Sprint Candidate changed from No to Yes

#8 Updated by bmbouter over 1 year ago

  • Status changed from ASSIGNED to NEW
  • Assignee deleted (bmbouter)

Setting back to NEW so it can go through sprint planning.

#9 Updated by bmbouter over 1 year ago

In talking with @jortel, we wanted to see what individual saves do when they are all in one database transaction. Here is an updated output:

1000 units: indidvidual save in seconds: 5.1227333545684814
1000 units: indidvidual save, 1 transaction in seconds: 2.4544191360473633
1000 units: bulk save in seconds: 0.39525651931762695

This shows bulk_save providing a 6.2x speedup over individual saves in a single transaction.
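
The three strategies compared above can be reproduced in miniature with the standard library's sqlite3 module. This is not artifact_test.py (which is attached but not shown here), and it uses neither Django nor PostgreSQL; the table, the row shape, and all function names are made up for illustration. `executemany` stands in for the bulk path, and absolute timings will differ from those reported in this issue.

```python
import os
import sqlite3
import tempfile
import time


def make_rows(n):
    # Fake artifact rows: (relative file path, size). Purely illustrative data.
    return [(f"artifact/{i:08x}", i) for i in range(n)]


def fresh_db(path):
    # isolation_level=None puts sqlite3 in autocommit mode, so transactions
    # only exist where BEGIN/COMMIT are issued explicitly below.
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute("DROP TABLE IF EXISTS artifact")
    conn.execute("CREATE TABLE artifact (file TEXT, size INTEGER)")
    return conn


def individual_save(conn, rows):
    # One INSERT per row, each committed on its own.
    for row in rows:
        conn.execute("INSERT INTO artifact VALUES (?, ?)", row)


def individual_save_one_transaction(conn, rows):
    # One INSERT per row, but a single commit at the end.
    conn.execute("BEGIN")
    for row in rows:
        conn.execute("INSERT INTO artifact VALUES (?, ?)", row)
    conn.execute("COMMIT")


def bulk_save(conn, rows):
    # All rows handed to the driver in one call, inside one transaction.
    conn.execute("BEGIN")
    conn.executemany("INSERT INTO artifact VALUES (?, ?)", rows)
    conn.execute("COMMIT")


def timed(strategy, n, db_path):
    """Run one strategy against an empty table; return (seconds, row count)."""
    conn = fresh_db(db_path)
    start = time.perf_counter()
    strategy(conn, make_rows(n))
    elapsed = time.perf_counter() - start
    count = conn.execute("SELECT COUNT(*) FROM artifact").fetchone()[0]
    conn.close()
    return elapsed, count


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        db_path = os.path.join(tmp, "bench.sqlite3")
        for strategy in (individual_save, individual_save_one_transaction, bulk_save):
            elapsed, count = timed(strategy, 1000, db_path)
            print(f"{count} rows: {strategy.__name__} in seconds: {elapsed}")
```

An on-disk database file is used rather than `:memory:` so that the per-row commit cost of the first strategy is actually paid.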

#10 Updated by dalley over 1 year ago

My results:

(env) [vagrant@pulp3 models_benchmark]$ python3 artifact_test.py  --num=1000
1000 artifacts: individual save in seconds: 4.238337755203247
1000 artifacts: individual save w/ single transaction in seconds: 1.3964695930480957
1000 artifacts: bulk save in seconds: 0.17695045471191406

(env) [vagrant@pulp3 models_benchmark]$ python3 artifact_test.py  --num=1000
1000 artifacts: individual save in seconds: 5.350154876708984
1000 artifacts: individual save w/ single transaction in seconds: 1.6045520305633545
1000 artifacts: bulk save in seconds: 0.22559666633605957

(env) [vagrant@pulp3 models_benchmark]$ python3 artifact_test.py  --num=5000
5000 artifacts: individual save in seconds: 24.522949934005737
5000 artifacts: individual save w/ single transaction in seconds: 7.073785781860352
5000 artifacts: bulk save in seconds: 0.9437673091888428

(env) [vagrant@pulp3 models_benchmark]$ python3 artifact_test.py  --num=5000
5000 artifacts: individual save in seconds: 23.55579137802124
5000 artifacts: individual save w/ single transaction in seconds: 7.318872451782227
5000 artifacts: bulk save in seconds: 1.001615285873413

(env) [vagrant@pulp3 models_benchmark]$ python3 artifact_test.py  --num=10000
10000 artifacts: individual save in seconds: 58.14913988113403
10000 artifacts: individual save w/ single transaction in seconds: 14.900865077972412
10000 artifacts: bulk save in seconds: 2.214111566543579

(env) [vagrant@pulp3 models_benchmark]$ python3 artifact_test.py  --num=10000
10000 artifacts: individual save in seconds: 57.241849422454834
10000 artifacts: individual save w/ single transaction in seconds: 15.238527297973633
10000 artifacts: bulk save in seconds: 2.1400439739227295

(env) [vagrant@pulp3 models_benchmark]$ python3 artifact_test.py  --num=20000
20000 artifacts: individual save in seconds: 97.95429253578186
20000 artifacts: individual save w/ single transaction in seconds: 32.283968925476074
20000 artifacts: bulk save in seconds: 4.602828502655029

(env) [vagrant@pulp3 models_benchmark]$ python3 artifact_test.py  --num=20000
20000 artifacts: individual save in seconds: 90.98556280136108
20000 artifacts: individual save w/ single transaction in seconds: 28.93546438217163
20000 artifacts: bulk save in seconds: 4.149221420288086

All runs use the same model that is currently in Pulp. Django doesn't treat it as a "multi-table" model even though it uses inheritance; I guess it's a bit more subtle than that.

#11 Updated by dkliban@redhat.com over 1 year ago

  • Sprint set to Sprint 40

#12 Updated by bmbouter about 1 year ago

  • Status changed from NEW to ASSIGNED
  • Assignee set to bmbouter

#13 Updated by bmbouter about 1 year ago

  • Status changed from ASSIGNED to POST

#14 Updated by bmbouter about 1 year ago

  • Status changed from POST to MODIFIED

#15 Updated by daviddavis 6 months ago

  • Sprint/Milestone set to 3.0

#16 Updated by bmbouter 6 months ago

  • Tags deleted (Pulp 3)
