Issue #3813

closed

File Support - Issue #3770: Pulp 3 is about 2x slower than pulp 2 in syncing a large file repo

Pulp3 Artifacts are not compatible with bulk_save

Added by bmbouter over 5 years ago. Updated almost 4 years ago.

Status: CLOSED - CURRENTRELEASE
Priority: Normal
Assignee:
Category: -
Sprint/Milestone:
Start date:
Due date:
Estimated time:
Severity: 2. Medium
Version:
Platform Release:
OS:
Triaged: Yes
Groomed: Yes
Sprint Candidate: Yes
Tags: Documentation, Performance
Sprint: Sprint 40
Quarter:

Description

Motivation

Artifacts are not compatible with bulk_save() (Django's bulk_create()) because they define a custom save() method, and a bulk save never calls save() on the individual model instances.

Solution

Document that plugin writers can use bulk_save with Artifacts, provided the file field of each Artifact is set to the path of a file that is already in place in the storage backend (content addressable storage).
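
As a rough illustration of what "already in place" could mean, the sketch below moves a downloaded file to a location derived from its SHA-256 digest and returns that final path, which is the value a plugin would then assign to the file field before a bulk save. The directory layout and the names `cas_path` and `place_in_storage` are hypothetical and chosen for this example; they are not Pulp's API.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def cas_path(storage_root: Path, sha256_hex: str) -> Path:
    """Hypothetical content-addressable layout: <root>/artifact/<2 chars>/<rest>."""
    return storage_root / "artifact" / sha256_hex[:2] / sha256_hex[2:]


def place_in_storage(src: Path, storage_root: Path) -> Path:
    """Move src to its content-addressed location and return the final path."""
    digest = hashlib.sha256(src.read_bytes()).hexdigest()
    dest = cas_path(storage_root, digest)
    dest.parent.mkdir(parents=True, exist_ok=True)
    if dest.exists():
        # Identical content is already stored; drop the duplicate download.
        src.unlink()
    else:
        shutil.move(str(src), str(dest))
    return dest


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        download = Path(tmp) / "download.bin"
        download.write_bytes(b"example content")
        final_path = place_in_storage(download, Path(tmp) / "storage")
        # A bulk save never runs save(), so nothing moves the file later:
        # the file field must already point at final_path.
        print(final_path)
```

Because the move happens before the model instances are created, nothing has to happen inside save(), which is why skipping it becomes safe.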


Files

artifact_test.py (1.73 KB): generate and save many Artifacts using bulk-create and individual save(). bmbouter, 07/09/2018 10:59 PM
Actions #1

Updated by CodeHeeler over 5 years ago

  • Triaged changed from No to Yes
Actions #2

Updated by bmbouter over 5 years ago

  • Status changed from NEW to ASSIGNED
  • Assignee set to bmbouter

As discussed with Daniel and David, the first step is to manually move the files into place, call bulk_save on a batch of 1K Artifacts, and see how much faster it is than saving the same 1K Artifacts sequentially.

Actions #3

Updated by bmbouter over 5 years ago

The bulk-create option is roughly 15-20x faster. I uploaded the script I used to test it. Here are some results:

(pulp) [vagrant@pulp3 devel]$ python artifact_test.py --num 10000
/home/vagrant/.virtualenvs/pulp/lib64/python3.6/site-packages/psycopg2/__init__.py:144: UserWarning: The psycopg2 wheel package will be renamed from release 2.8; in order to keep installing from binary please use "pip install psycopg2-binary" instead. For details see: <http://initd.org/psycopg/docs/install.html#binary-install-from-pypi>.
  """)
10000 units: indidvidual save in seconds: 47.14567446708679
10000 units: bulk save in seconds: 3.3910677433013916
(pulp) [vagrant@pulp3 devel]$ python artifact_test.py --num 1000
/home/vagrant/.virtualenvs/pulp/lib64/python3.6/site-packages/psycopg2/__init__.py:144: UserWarning: The psycopg2 wheel package will be renamed from release 2.8; in order to keep installing from binary please use "pip install psycopg2-binary" instead. For details see: <http://initd.org/psycopg/docs/install.html#binary-install-from-pypi>.
  """)
1000 units: indidvidual save in seconds: 4.659780025482178
1000 units: bulk save in seconds: 0.25438880920410156
(pulp) [vagrant@pulp3 devel]$ python artifact_test.py --num 1000
/home/vagrant/.virtualenvs/pulp/lib64/python3.6/site-packages/psycopg2/__init__.py:144: UserWarning: The psycopg2 wheel package will be renamed from release 2.8; in order to keep installing from binary please use "pip install psycopg2-binary" instead. For details see: <http://initd.org/psycopg/docs/install.html#binary-install-from-pypi>.
  """)
1000 units: indidvidual save in seconds: 4.8165528774261475
1000 units: bulk save in seconds: 0.22277045249938965
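
For reference, the speedup in each run can be recomputed directly from the timings printed above. The snippet below only divides the reported numbers; the variable names are illustrative, and the result works out to roughly 14x to 22x across the three runs.

```python
# Timings in seconds, copied from the three runs above:
# (units, individual save, bulk save)
runs = [
    (10000, 47.14567446708679, 3.3910677433013916),
    (1000, 4.659780025482178, 0.25438880920410156),
    (1000, 4.8165528774261475, 0.22277045249938965),
]


def speedup(individual: float, bulk: float) -> float:
    """How many times faster the bulk save was than the individual saves."""
    return individual / bulk


speedups = [speedup(individual, bulk) for _, individual, bulk in runs]

for (units, _, _), factor in zip(runs, speedups):
    print(f"{units} units: bulk save is {factor:.1f}x faster")
```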
Actions #4

Updated by bmbouter over 5 years ago

  • Description updated (diff)
Actions #5

Updated by bmbouter over 5 years ago

  • Tags Documentation added
Actions #6

Updated by daviddavis over 5 years ago

  • Groomed changed from No to Yes
Actions #7

Updated by bmbouter over 5 years ago

  • Sprint Candidate changed from No to Yes
Actions #8

Updated by bmbouter over 5 years ago

  • Status changed from ASSIGNED to NEW
  • Assignee deleted (bmbouter)

Setting back to NEW so it can go through sprint planning.

Actions #9

Updated by bmbouter over 5 years ago

In talking with @jortel, we wanted to see what individual saves do when they are all in one database transaction. Here is an updated output:

1000 units: indidvidual save in seconds: 5.1227333545684814
1000 units: indidvidual save, 1 transaction in seconds: 2.4544191360473633
1000 units: bulk save in seconds: 0.39525651931762695

This shows bulk_save providing a 6.2x speedup over individual saves wrapped in a single transaction.
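
The three strategies being compared can be sketched without Django. The example below is a stand-in that uses Python's built-in sqlite3 module rather than the Django ORM and PostgreSQL, so it is not the attached artifact_test.py and its absolute timings will not match the numbers above (an in-memory database in particular hides most of the per-commit cost). The table and function names are assumptions made for illustration.

```python
import sqlite3
import time


def make_db() -> sqlite3.Connection:
    # isolation_level=None puts the connection in autocommit mode, so
    # transactions are controlled explicitly with BEGIN/COMMIT below.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE artifact (sha256 TEXT PRIMARY KEY, size INTEGER)")
    return conn


def rows(n):
    # Fake digests: 64 hex characters, unique per row.
    return [(f"{i:064x}", i) for i in range(n)]


def individual_saves(conn, data):
    # Each INSERT is committed on its own.
    for row in data:
        conn.execute("INSERT INTO artifact VALUES (?, ?)", row)


def individual_saves_one_transaction(conn, data):
    # Still one statement per row, but a single commit at the end.
    conn.execute("BEGIN")
    for row in data:
        conn.execute("INSERT INTO artifact VALUES (?, ?)", row)
    conn.execute("COMMIT")


def bulk_save(conn, data):
    # All rows are handed to the driver in one call, inside one transaction.
    conn.execute("BEGIN")
    conn.executemany("INSERT INTO artifact VALUES (?, ?)", data)
    conn.execute("COMMIT")


def benchmark(n=1000):
    results = {}
    for strategy in (individual_saves, individual_saves_one_transaction, bulk_save):
        conn = make_db()
        data = rows(n)
        start = time.perf_counter()
        strategy(conn, data)
        elapsed = time.perf_counter() - start
        count = conn.execute("SELECT COUNT(*) FROM artifact").fetchone()[0]
        conn.close()
        results[strategy.__name__] = (count, elapsed)
    return results


if __name__ == "__main__":
    for name, (count, elapsed) in benchmark().items():
        print(f"{count} rows: {name} in seconds: {elapsed}")
```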

Actions #10

Updated by dalley over 5 years ago

My results:

(env) [vagrant@pulp3 models_benchmark]$ python3 artifact_test.py  --num=1000
1000 artifacts: individual save in seconds: 4.238337755203247
1000 artifacts: individual save w/ single transaction in seconds: 1.3964695930480957
1000 artifacts: bulk save in seconds: 0.17695045471191406

(env) [vagrant@pulp3 models_benchmark]$ python3 artifact_test.py  --num=1000
1000 artifacts: individual save in seconds: 5.350154876708984
1000 artifacts: individual save w/ single transaction in seconds: 1.6045520305633545
1000 artifacts: bulk save in seconds: 0.22559666633605957

(env) [vagrant@pulp3 models_benchmark]$ python3 artifact_test.py  --num=5000
5000 artifacts: individual save in seconds: 24.522949934005737
5000 artifacts: individual save w/ single transaction in seconds: 7.073785781860352
5000 artifacts: bulk save in seconds: 0.9437673091888428

(env) [vagrant@pulp3 models_benchmark]$ python3 artifact_test.py  --num=5000
5000 artifacts: individual save in seconds: 23.55579137802124
5000 artifacts: individual save w/ single transaction in seconds: 7.318872451782227
5000 artifacts: bulk save in seconds: 1.001615285873413

(env) [vagrant@pulp3 models_benchmark]$ python3 artifact_test.py  --num=10000
10000 artifacts: individual save in seconds: 58.14913988113403
10000 artifacts: individual save w/ single transaction in seconds: 14.900865077972412
10000 artifacts: bulk save in seconds: 2.214111566543579

(env) [vagrant@pulp3 models_benchmark]$ python3 artifact_test.py  --num=10000
10000 artifacts: individual save in seconds: 57.241849422454834
10000 artifacts: individual save w/ single transaction in seconds: 15.238527297973633
10000 artifacts: bulk save in seconds: 2.1400439739227295

(env) [vagrant@pulp3 models_benchmark]$ python3 artifact_test.py  --num=20000
20000 artifacts: individual save in seconds: 97.95429253578186
20000 artifacts: individual save w/ single transaction in seconds: 32.283968925476074
20000 artifacts: bulk save in seconds: 4.602828502655029

(env) [vagrant@pulp3 models_benchmark]$ python3 artifact_test.py  --num=20000
20000 artifacts: individual save in seconds: 90.98556280136108
20000 artifacts: individual save w/ single transaction in seconds: 28.93546438217163
20000 artifacts: bulk save in seconds: 4.149221420288086

All runs use the same model that is currently in Pulp. Django doesn't treat it as a "multi-table" model even though it uses inheritance; I guess it's a bit more subtle than that.

Actions #11

Updated by dkliban@redhat.com over 5 years ago

  • Sprint set to Sprint 40
Actions #12

Updated by bmbouter over 5 years ago

  • Status changed from NEW to ASSIGNED
  • Assignee set to bmbouter
Actions #13

Updated by bmbouter over 5 years ago

  • Status changed from ASSIGNED to POST

Added by bmbouter over 5 years ago

Revision 80bfd474 | View on GitHub

Adds docs on using Artifact with bulk_create()

https://pulp.plan.io/issues/3813 closes #3813

Actions #14

Updated by bmbouter over 5 years ago

  • Status changed from POST to MODIFIED
Actions #15

Updated by daviddavis almost 5 years ago

  • Sprint/Milestone set to 3.0.0
Actions #16

Updated by bmbouter almost 5 years ago

  • Tags deleted (Pulp 3)
Actions #17

Updated by bmbouter over 4 years ago

  • Status changed from MODIFIED to CLOSED - CURRENTRELEASE
Actions #18

Updated by bmbouter almost 4 years ago

  • Tags Performance added
  • Tags deleted (Sync Performance)