Task #2466
Remove unnecessary `deepcopy` calls for sync
Status: closed
Start date:
Due date:
% Done: 100%
Estimated time:
Platform Release: 2.11.1
Groomed: No
Sprint Candidate: No
Tags: Pulp 2
Sprint: Sprint 12
Quarter:
Description
After profiling the sync operation, the following possible performance improvements were identified:
- do not add units to the catalog or re-associate them unless necessary, to reduce the number of writes to the db
- avoid additional writes to the db when no new errata collections are introduced (all errata will still be updated on every sync)
- do not use `deepcopy` during primary.xml processing
The first two are fixed by issue #2457.
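The first two improvements share one idea: compare what the sync brings in against what is already stored, and write only the difference. The sketch below illustrates that idea under loose assumptions: `FakeRepoState`, `associate_new_units` and `update_errata_collections` are hypothetical names, an in-memory object stands in for the database, and this is not the code from issue #2457.

```python
from dataclasses import dataclass, field


@dataclass
class FakeRepoState:
    # Hypothetical in-memory stand-in for data already stored in the database.
    associated_unit_keys: set = field(default_factory=set)
    # erratum id -> frozenset of collection names already stored
    errata_collections: dict = field(default_factory=dict)
    writes: int = 0  # number of simulated database writes


def associate_new_units(state, incoming_unit_keys):
    """Associate only the units that are not already associated."""
    new_keys = set(incoming_unit_keys) - state.associated_unit_keys
    for key in sorted(new_keys):
        state.associated_unit_keys.add(key)
        state.writes += 1  # one simulated write per newly associated unit
    return new_keys


def update_errata_collections(state, erratum_id, incoming_collections):
    """Write an erratum's collections only if new ones were introduced."""
    existing = state.errata_collections.get(erratum_id, frozenset())
    incoming = frozenset(incoming_collections)
    if incoming <= existing:
        return False  # no new collections: skip the extra write
    state.errata_collections[erratum_id] = existing | incoming
    state.writes += 1
    return True
```

Re-running the same input against this state performs no further simulated writes, which is the behavior the first two items aim for.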
Results vary by setup, but on one setup, syncing a repo with 14K RPMs and 3.5K errata took about 20% less time than before with the improvements above: roughly 5% comes from the first improvement, another 5% from the second, and 10% from removing `deepcopy` (the third).
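The third improvement can be illustrated with a minimal sketch, assuming per-package data is built from a shared template dict. `PACKAGE_TEMPLATE` and both `process_*` functions are hypothetical and do not reproduce Pulp's primary.xml parser or the actual patch. The point is that `copy.deepcopy` recursively walks the whole nested structure for every package, while building a fresh dict with new lists gives the same per-package isolation with less work.

```python
import copy
import timeit

# Hypothetical template for fields parsed from one <package> element.
PACKAGE_TEMPLATE = {
    "name": None,
    "version": None,
    "files": [],
    "provides": [],
    "requires": [],
}


def process_with_deepcopy(name, version, files):
    # Deep-copy the whole template for every package so that the nested
    # lists are not shared between packages.
    package = copy.deepcopy(PACKAGE_TEMPLATE)
    package["name"] = name
    package["version"] = version
    package["files"].extend(files)
    return package


def process_without_deepcopy(name, version, files):
    # Build a fresh dict directly; each call creates new lists, so no state
    # is shared between packages and no deep copy is needed.
    return {
        "name": name,
        "version": version,
        "files": list(files),
        "provides": [],
        "requires": [],
    }


if __name__ == "__main__":
    args = ("bash", "4.2", ["/bin/bash", "/etc/skel/.bashrc"])
    # Both approaches produce the same package data.
    assert process_with_deepcopy(*args) == process_without_deepcopy(*args)
    for func in (process_with_deepcopy, process_without_deepcopy):
        seconds = timeit.timeit(lambda: func(*args), number=10_000)
        print(f"{func.__name__}: {seconds:.3f}s")
```

The printed timings depend on the machine and the data shape; they are not a reproduction of the 10% figure measured above.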
NOTE: When a sync is triggered via an API call or the CLI, the following happens:
- The sync task is scheduled and later retrieved from a queue by a worker.
- The sync task is executed.
- Auto-publish is enabled by default, so in most cases a publish task is scheduled and later retrieved from a queue by a worker.
- The publish task is executed.
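The flow in the NOTE can be modeled with a plain in-process `queue.Queue` standing in for the real task queue and workers. All function names here are hypothetical and this is not how Pulp dispatches tasks; the sketch only shows the order of the four steps and how auto-publish adds the publish task after the sync task runs.

```python
import queue


def run_worker(task_queue, log):
    """Drain the queue, executing tasks in order (stand-in for a worker)."""
    while not task_queue.empty():
        task = task_queue.get()
        task(task_queue, log)


def publish_task(task_queue, log):
    # Step 4: the publish task is executed.
    log.append("publish executed")


def make_sync_task(auto_publish=True):
    def sync_task(task_queue, log):
        # Step 2: the sync task is executed.
        log.append("sync executed")
        if auto_publish:
            # Step 3: auto-publish schedules a publish task on the queue.
            log.append("publish scheduled")
            task_queue.put(publish_task)
    return sync_task


def trigger_sync(task_queue, log, auto_publish=True):
    """Step 1: an API call or CLI command schedules the sync task."""
    log.append("sync scheduled")
    task_queue.put(make_sync_task(auto_publish))
```

For example, `trigger_sync(q, log)` followed by `run_worker(q, log)` records the scheduled and executed steps for sync and then publish.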
All the improvements described in this issue affect only the second step above, the execution of the sync task.
Related issues
Stop using deepcopy in primary.xml processing to speed up sync
closes #2466 https://pulp.plan.io/issues/2466