Issue #3813
closed
File Support - Issue #3770: Pulp 3 is about 2x slower than pulp 2 in syncing a large file repo
Pulp3 Artifacts are not compatible with bulk_save
Description
Motivation
Artifacts are not compatible with bulk_save() because they have a save() method which does not get called.
Solution
Document that plugin writers can use bulk_save with Artifacts, provided the file field of each
Artifact is a path to a file that is already in place in the storage backend (content addressable storage).
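A minimal sketch of what "already in place" could look like, using only the standard library. The layout `artifact/<first two hex chars>/<rest of the sha256>` and the helper name `place_in_cas` are assumptions for illustration, not something this issue specifies; the real layout is whatever the storage backend expects.

```python
import hashlib
import os
import shutil


def place_in_cas(src_path, storage_root):
    """Move src_path into a content-addressable layout.

    Returns (relative_path, sha256). The layout used here is an
    assumption for illustration, not a documented Pulp contract.
    """
    digest = hashlib.sha256()
    with open(src_path, "rb") as fh:
        # Hash in chunks so large files are not read into memory at once.
        for chunk in iter(lambda: fh.read(1024 * 1024), b""):
            digest.update(chunk)
    sha256 = digest.hexdigest()

    # Assumed layout: artifact/<first 2 hex chars>/<remaining hex chars>
    relative_path = os.path.join("artifact", sha256[:2], sha256[2:])
    dest = os.path.join(storage_root, relative_path)
    os.makedirs(os.path.dirname(dest), exist_ok=True)

    if os.path.exists(dest):
        # Identical content is already stored; drop the incoming copy.
        os.remove(src_path)
    else:
        shutil.move(src_path, dest)
    return relative_path, sha256


# With the files in place, the returned relative paths would be assigned to
# the file field of unsaved Artifact instances before the bulk call, along
# with whatever other fields the model requires. Because save() is skipped,
# nothing moves the file afterwards.
```

The point of doing the move up front is that the bulk call skips save(), so anything save() would normally do to the file has to have happened already.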
Files
Updated by bmbouter over 6 years ago
- Status changed from NEW to ASSIGNED
- Assignee set to bmbouter
As discussed with Daniel and David, the first step is to manually move the files into place, call bulk_save on a batch of 1K Artifacts, and see how much faster it is than sequentially saving 1K Artifacts.
Updated by bmbouter over 6 years ago
- File artifact_test.py artifact_test.py added
The bulk-create option is 15-20x faster. I uploaded the script I used to test it. Here are some results:
(pulp) [vagrant@pulp3 devel]$ python artifact_test.py --num 10000
/home/vagrant/.virtualenvs/pulp/lib64/python3.6/site-packages/psycopg2/__init__.py:144: UserWarning: The psycopg2 wheel package will be renamed from release 2.8; in order to keep installing from binary please use "pip install psycopg2-binary" instead. For details see: <http://initd.org/psycopg/docs/install.html#binary-install-from-pypi>.
""")
10000 units: indidvidual save in seconds: 47.14567446708679
10000 units: bulk save in seconds: 3.3910677433013916
(pulp) [vagrant@pulp3 devel]$ python artifact_test.py --num 1000
/home/vagrant/.virtualenvs/pulp/lib64/python3.6/site-packages/psycopg2/__init__.py:144: UserWarning: The psycopg2 wheel package will be renamed from release 2.8; in order to keep installing from binary please use "pip install psycopg2-binary" instead. For details see: <http://initd.org/psycopg/docs/install.html#binary-install-from-pypi>.
""")
1000 units: indidvidual save in seconds: 4.659780025482178
1000 units: bulk save in seconds: 0.25438880920410156
(pulp) [vagrant@pulp3 devel]$ python artifact_test.py --num 1000
/home/vagrant/.virtualenvs/pulp/lib64/python3.6/site-packages/psycopg2/__init__.py:144: UserWarning: The psycopg2 wheel package will be renamed from release 2.8; in order to keep installing from binary please use "pip install psycopg2-binary" instead. For details see: <http://initd.org/psycopg/docs/install.html#binary-install-from-pypi>.
""")
1000 units: indidvidual save in seconds: 4.8165528774261475
1000 units: bulk save in seconds: 0.22277045249938965
Updated by bmbouter over 6 years ago
- Status changed from ASSIGNED to NEW
- Assignee deleted (bmbouter)
Setting back to NEW so it can go through sprint planning.
Updated by bmbouter over 6 years ago
In talking with @jortel, we wanted to see what individual saves do when they are all in one database transaction. Here is an updated output:
1000 units: indidvidual save in seconds: 5.1227333545684814
1000 units: indidvidual save, 1 transaction in seconds: 2.4544191360473633
1000 units: bulk save in seconds: 0.39525651931762695
This shows bulk_save providing a 6.2x speedup over individual saves in a single transaction.
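For readers without the attached script, the three strategies being compared can be sketched roughly as follows. This is not artifact_test.py: it uses the standard library's sqlite3 with an in-memory database instead of Django and PostgreSQL, the table and function names are made up for illustration, and `executemany` only stands in for a bulk insert. Absolute timings and ratios will differ from the numbers above.

```python
import sqlite3
import time

INSERT = "INSERT INTO artifact (file, sha256) VALUES (?, ?)"


def make_db():
    # isolation_level=None puts the connection in autocommit mode, so
    # transactions only exist where BEGIN/COMMIT are issued explicitly.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(
        "CREATE TABLE artifact (id INTEGER PRIMARY KEY, file TEXT, sha256 TEXT UNIQUE)"
    )
    return conn


def rows(n):
    # Fake digests; real ones would come from hashing the files.
    shas = [f"{i:064x}" for i in range(n)]
    return [(f"artifact/{s[:2]}/{s[2:]}", s) for s in shas]


def save_individually(conn, data):
    # One statement per row, each committed on its own.
    for row in data:
        conn.execute(INSERT, row)


def save_in_one_transaction(conn, data):
    # Still one statement per row, but a single commit at the end.
    conn.execute("BEGIN")
    for row in data:
        conn.execute(INSERT, row)
    conn.execute("COMMIT")


def save_in_bulk(conn, data):
    # Hand the whole batch to the driver in one call.
    conn.execute("BEGIN")
    conn.executemany(INSERT, data)
    conn.execute("COMMIT")


def timed(strategy, n):
    conn = make_db()
    data = rows(n)
    start = time.perf_counter()
    strategy(conn, data)
    elapsed = time.perf_counter() - start
    count = conn.execute("SELECT COUNT(*) FROM artifact").fetchone()[0]
    conn.close()
    return elapsed, count


if __name__ == "__main__":
    for strategy in (save_individually, save_in_one_transaction, save_in_bulk):
        elapsed, count = timed(strategy, 1000)
        print(f"{count} rows: {strategy.__name__} in seconds: {elapsed}")
```

Separating the single-transaction case from the bulk case is what lets the comparison distinguish commit overhead from per-statement overhead.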
Updated by dalley over 6 years ago
My results:
(env) [vagrant@pulp3 models_benchmark]$ python3 artifact_test.py --num=1000
1000 artifacts: individual save in seconds: 4.238337755203247
1000 artifacts: individual save w/ single transaction in seconds: 1.3964695930480957
1000 artifacts: bulk save in seconds: 0.17695045471191406
(env) [vagrant@pulp3 models_benchmark]$ python3 artifact_test.py --num=1000
1000 artifacts: individual save in seconds: 5.350154876708984
1000 artifacts: individual save w/ single transaction in seconds: 1.6045520305633545
1000 artifacts: bulk save in seconds: 0.22559666633605957
(env) [vagrant@pulp3 models_benchmark]$ python3 artifact_test.py --num=5000
5000 artifacts: individual save in seconds: 24.522949934005737
5000 artifacts: individual save w/ single transaction in seconds: 7.073785781860352
5000 artifacts: bulk save in seconds: 0.9437673091888428
(env) [vagrant@pulp3 models_benchmark]$ python3 artifact_test.py --num=5000
5000 artifacts: individual save in seconds: 23.55579137802124
5000 artifacts: individual save w/ single transaction in seconds: 7.318872451782227
5000 artifacts: bulk save in seconds: 1.001615285873413
(env) [vagrant@pulp3 models_benchmark]$ python3 artifact_test.py --num=10000
10000 artifacts: individual save in seconds: 58.14913988113403
10000 artifacts: individual save w/ single transaction in seconds: 14.900865077972412
10000 artifacts: bulk save in seconds: 2.214111566543579
(env) [vagrant@pulp3 models_benchmark]$ python3 artifact_test.py --num=10000
10000 artifacts: individual save in seconds: 57.241849422454834
10000 artifacts: individual save w/ single transaction in seconds: 15.238527297973633
10000 artifacts: bulk save in seconds: 2.1400439739227295
(env) [vagrant@pulp3 models_benchmark]$ python3 artifact_test.py --num=20000
20000 artifacts: individual save in seconds: 97.95429253578186
20000 artifacts: individual save w/ single transaction in seconds: 32.283968925476074
20000 artifacts: bulk save in seconds: 4.602828502655029
(env) [vagrant@pulp3 models_benchmark]$ python3 artifact_test.py --num=20000
20000 artifacts: individual save in seconds: 90.98556280136108
20000 artifacts: individual save w/ single transaction in seconds: 28.93546438217163
20000 artifacts: bulk save in seconds: 4.149221420288086
All runs use the same model as currently in Pulp. Django doesn't treat it as a "multi-table" model even though it uses inheritance; I guess it's a bit more subtle than that.
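To make the runs above easier to compare, here is a small sketch that averages the two runs at each size and computes the speedup of bulk save over the other two strategies. The timings are copied from the output above, rounded to three decimals; the names `RESULTS` and `speedups` are just for this example.

```python
from statistics import mean

# size -> (individual, single transaction, bulk), two runs each, in seconds.
# Values are the timings posted above, rounded to three decimals.
RESULTS = {
    1000: ((4.238, 5.350), (1.396, 1.605), (0.177, 0.226)),
    5000: ((24.523, 23.556), (7.074, 7.319), (0.944, 1.002)),
    10000: ((58.149, 57.242), (14.901, 15.239), (2.214, 2.140)),
    20000: ((97.954, 90.986), (32.284, 28.935), (4.603, 4.149)),
}


def speedups(results):
    """Return {size: (bulk vs individual, bulk vs single transaction)}."""
    out = {}
    for size, (individual, one_txn, bulk) in results.items():
        bulk_mean = mean(bulk)
        out[size] = (mean(individual) / bulk_mean, mean(one_txn) / bulk_mean)
    return out


for size, (vs_individual, vs_txn) in speedups(RESULTS).items():
    print(f"{size} artifacts: bulk is {vs_individual:.1f}x faster than individual, "
          f"{vs_txn:.1f}x faster than single transaction")
```

Averaging two runs is a rough summary only; with so few samples the ratios should be read as approximate.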
Updated by bmbouter over 6 years ago
- Status changed from NEW to ASSIGNED
- Assignee set to bmbouter
Updated by bmbouter over 6 years ago
- Status changed from ASSIGNED to POST
PR available at: https://github.com/pulp/pulp/pull/3568
Added by bmbouter over 6 years ago
Revision 80bfd474 | View on GitHub
Adds docs on using Artifact with bulk_create()
Updated by bmbouter over 6 years ago
- Status changed from POST to MODIFIED
Applied in changeset pulp|80bfd474c9160ac193db3f6f745d153bac0592fe.
Updated by bmbouter almost 5 years ago
- Status changed from MODIFIED to CLOSED - CURRENTRELEASE
Updated by bmbouter over 4 years ago
- Tags Performance added
- Tags deleted (Sync Performance)
https://pulp.plan.io/issues/3813 closes #3813