Issue #4549

Removing docker manifests from a docker repository takes a long time

Added by jsherril@redhat.com 8 months ago. Updated about 2 months ago.

Status: CLOSED - CURRENTRELEASE
Priority: Normal
Category: -
Sprint/Milestone: -
Severity: 2. Medium
Platform Release: 2.21.0
Backwards Incompatible: No
Triaged: Yes
Groomed: No
Sprint Candidate: No
Tags: Pulp 2
Verified: No
Verification Required: No
Sprint: Sprint 56

Description

Removing all docker manifests from a large docker repo seems to take a long time:

~300 manifests takes ~2 minutes
~2000 manifests takes ~30-40 minutes

Reproducer:

1. Create and sync a docker repo such as: https://quay.io datawire/ambassador
2. Remove all docker manifests from the repo: pulp-admin docker repo remove manifest --repo-id=1-docker-dev-7915f7d0-7a98-4131-9c41-1be7b578d442 --not id=foo

Attachment: almost-all-man-lists-busybox (210 KB), cProfile: Removal of all manifest lists in busybox repository. Added by amacdona@redhat.com, 07/10/2019 12:00 AM.

Related issues

Related to Docker Support - Issue #5161: Removing manifest_lists from a repository does not purge all newly unlinked manifests (CLOSED - CURRENTRELEASE)
Copied to Docker Support - Test #5181: Removing docker manifests from a docker repository takes a long time (CLOSED - COMPLETE)

Associated revisions

Revision 76f5894b View on GitHub
Added by amacdona@redhat.com 4 months ago

Flatten queries for content unit removal

https://pulp.plan.io/issues/4549

Replace the recursive pattern with a fixed number of larger queries.

Additionally, reorder the content removal to "top down". This will fail
more gracefully; failure leaves orphans (safe) rather than user-facing
unlinked content (unsafe). This requires the additional plugin step of
removing the explicitly given content, which is normally handled by pulp
platform.

A side effect of this change is the correction of a bug that did not
remove shared content, even if all linked content is removed.
https://pulp.plan.io/issues/5161

fixes #5161
fixes #4549
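
For illustration, here is a minimal, self-contained sketch of the flattened pattern this commit describes. The data structures and names are hypothetical; the real pulp_docker importer operates on MongoDB collections, not Python dicts, but the shape of the change is the same: remove the requested units top down, then purge newly unlinked children with a fixed number of bulk set operations instead of one query per content unit.

```python
def remove_manifest_lists(repo, ids_to_remove):
    """Remove manifest lists first (top down), then purge manifests and blobs
    that nothing in the repo references any more, using bulk set operations
    rather than recursing one content unit at a time."""
    # 1. Unassociate the explicitly requested units. If anything fails later,
    #    only orphans are left behind, never user-visible broken links.
    removed = {i: repo["manifest_lists"].pop(i) for i in ids_to_remove}

    # 2. One pass over the surviving manifest lists to find manifests that are
    #    still referenced, then drop the newly unlinked ones in bulk.
    still_needed = {m for ml in repo["manifest_lists"].values() for m in ml}
    unlinked = {m for ml in removed.values() for m in ml} - still_needed
    removed_manifests = {m: repo["manifests"].pop(m) for m in unlinked}

    # 3. Same flattened pattern one level down, for blobs.
    needed_blobs = {b for mf in repo["manifests"].values() for b in mf}
    for blob in {b for mf in removed_manifests.values() for b in mf} - needed_blobs:
        repo["blobs"].pop(blob)


# Toy repository: manifest lists reference manifests, manifests reference blobs.
repo = {
    "manifest_lists": {"ml1": ["m1", "m2"], "ml2": ["m2", "m3"]},
    "manifests": {"m1": ["b1"], "m2": ["b2"], "m3": ["b3"]},
    "blobs": {"b1": None, "b2": None, "b3": None},
}
remove_manifest_lists(repo, ["ml1"])
# "m2"/"b2" survive because "ml2" still references them; "m1"/"b1" are purged.
```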

Revision ecba1db4 View on GitHub
Added by bherring 3 months ago

Refactor of test_remove.py with changes from 4549

With the refactor of the docker importer's remove function to
increase performance, content removal needs to be functionally verified.
The cases covered, with post-removal content counts verified for all units:
1. Remove all manifest_lists sequentially.
2. Remove all manifests sequentially.
3. Remove all blobs sequentially.
4. Remove all manifest_lists batch.
5. Remove all manifests batch.
6. Remove all blobs batch.
7. Remove some non-shared manifest lists.
8. Remove some non-shared manifests.
9. Remove some shared manifest lists and verify shared units are not
recursively removed.
10. Remove some shared manifests and verify shared units are not
recursively removed.
The fixture includes:
  • 2 relatively independent manifest lists (no shared manifests,
    no shared blobs between them)
  • 2 manifest lists that share some (but not all) manifests, and those
    manifests share some (but not all) blobs. This only requires the creation
    of 1 manifest list that shares some content with one of the first
    “independent manifest lists”.
  • 2 relatively independent manifests
  • 2 manifests that share (some but not all) blobs
In order to sync the content, each content unit must be recursively related
to at least 1 tag.

refs #4549 #5161

closes #5181
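
A rough sketch of what this post-removal count verification can look like against Pulp 2's REST API. The actual tests live in the test suite and use pulp-smash helpers; the endpoint paths, unit type ids, and default credentials below are assumptions based on the Pulp 2 API, not copied from the test code.

```python
import requests

BASE = "https://localhost/pulp/api/v2"   # assumption: local Pulp 2 server
AUTH = ("admin", "admin")                # assumption: default credentials


def unit_count(repo_id, type_id):
    """Count units of one type currently associated with a repository."""
    resp = requests.post(
        f"{BASE}/repositories/{repo_id}/search/units/",
        json={"criteria": {"type_ids": [type_id]}},
        auth=AUTH, verify=False,
    )
    resp.raise_for_status()
    return len(resp.json())


def remove_manifest_lists(repo_id, digests):
    """Unassociate the selected manifest lists; recursive cleanup of newly
    unlinked manifests and blobs is the docker importer's job."""
    resp = requests.post(
        f"{BASE}/repositories/{repo_id}/actions/unassociate/",
        json={"criteria": {
            "type_ids": ["docker_manifest_list"],
            "filters": {"unit": {"digest": {"$in": digests}}},
        }},
        auth=AUTH, verify=False,
    )
    resp.raise_for_status()


# Case 9-style check: remove one of two manifest lists that share a manifest,
# then assert the shared manifest (and its blobs) are still in the repository.
```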

Revision fbc90740 View on GitHub
Added by amacdona@redhat.com 2 months ago

Flatten queries for content unit removal

https://pulp.plan.io/issues/4549

Replace the recursive pattern with a fixed number of larger queries.

Additionally, reorder the content removal to "top down". This will fail
more gracefully; failure leaves orphans (safe) rather than user-facing
unlinked content (unsafe). This requires the additional plugin step of
removing the explicitly given content, which is normally handled by pulp
platform.

A side effect of this change is the correction of a bug that did not
remove shared content, even if all linked content is removed.
https://pulp.plan.io/issues/5161

fixes #5161
fixes #4549

(cherry picked from commit 76f5894b7c593eafc8498d5215cb7e517cd4624b)

History

#2 Updated by ipanova@redhat.com 8 months ago

  • Triaged changed from No to Yes
  • Sprint set to Sprint 50

We need to investigate this.

#3 Updated by dkliban@redhat.com 8 months ago

  • Status changed from NEW to ASSIGNED
  • Assignee set to dkliban@redhat.com

#4 Updated by rchan 8 months ago

  • Sprint changed from Sprint 50 to Sprint 51

#5 Updated by dkliban@redhat.com 8 months ago

  • Status changed from ASSIGNED to NEW
  • Assignee deleted (dkliban@redhat.com)

#6 Updated by bmbouter 7 months ago

  • Tags Pulp 2 added

#7 Updated by daviddavis 7 months ago

  • Sprint changed from Sprint 51 to Sprint 52

#8 Updated by rchan 6 months ago

  • Sprint changed from Sprint 52 to Sprint 53

#9 Updated by dkliban@redhat.com 6 months ago

  • Sprint deleted (Sprint 53)

#10 Updated by dkliban@redhat.com 5 months ago

  • Sprint set to Sprint 55

#11 Updated by amacdona@redhat.com 4 months ago

  • Status changed from NEW to ASSIGNED
  • Assignee set to amacdona@redhat.com

#12 Updated by amacdona@redhat.com 4 months ago

I was able to reproduce this with the busybox repository (I'll make a bigger VM to do this again with datawire/ambassador next week), and the slowdown is already apparent: it takes about 2 minutes to remove all the manifest lists.

I've attached the cProfile of that task. How to view: https://docs.pulpproject.org/dev-guide/debugging.html#analyzing-profiles

Potential workaround:

The queries for removing content are much more strenuous than those for adding content. A potential workaround could be to create a new repository and add only the manifest lists that should remain. If for some reason we can't work out a big improvement, I'll profile the workaround to compare.
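
A sketch of that workaround using the repository associate ("copy") action of Pulp 2's REST API. The endpoint, criteria fields, and credentials are assumptions and may need adjusting; repo ids and digests are placeholders, and the destination repo is assumed to already exist with a docker importer attached.

```python
import requests

BASE = "https://localhost/pulp/api/v2"   # assumption: local Pulp 2 server
AUTH = ("admin", "admin")                # assumption: default credentials


def copy_manifest_lists(source_repo, dest_repo, keep_digests):
    """Associate only the manifest lists we want to keep (by digest) from
    source_repo into dest_repo, instead of removing everything else from
    source_repo. The docker importer is expected to bring along the child
    manifests and blobs of each copied manifest list."""
    resp = requests.post(
        f"{BASE}/repositories/{dest_repo}/actions/associate/",
        json={
            "source_repo_id": source_repo,
            "criteria": {
                "type_ids": ["docker_manifest_list"],
                "filters": {"unit": {"digest": {"$in": keep_digests}}},
            },
        },
        auth=AUTH, verify=False,
    )
    resp.raise_for_status()
    return resp.json()  # call report for the spawned copy task
```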

#13 Updated by dkliban@redhat.com 4 months ago

  • Sprint changed from Sprint 55 to Sprint 56

#14 Updated by amacdona@redhat.com 4 months ago

  • Related to Issue #5161: Removing manifest_lists from a repository does not purge all newly unlinked manifests added

#15 Updated by amacdona@redhat.com 4 months ago

There is a minor bug in current Pulp2 removal which will be fixed by this change.
https://pulp.plan.io/issues/5161

Because of that bug, this fix will include a very minor behavior change: more manifests and blobs will be recursively removed when removing manifest_lists that share manifests. The manifests and blobs that will now be removed would have been inaccessible via tags, so this should not affect users.

#16 Updated by bherring 4 months ago

  • Copied to Test #5181: Removing docker manifests from a docker repository takes a long time added

#17 Updated by amacdona@redhat.com 4 months ago

Time improvement

Removal of 210 Manifests (unpatched): 84 minutes

Removal of 210 Manifests (patched): 45 seconds
Removal of 2997 Manifests (patched): 1 minute 19 seconds
Removal of 4381 Manifests (patched): 1 minute 30 seconds

Memory Usage

This speed improvement comes as a tradeoff for memory usage. Unpatched, the memory usage of the celery process was fixed and did not increase with the size of the call. (This is because pulp worked through the task one content unit at a time.)

With the patch, memory usage will increase with the number of content units specified in the removal call.

Patched removal peak RES memory usage of celery process:

Removal: peak RES (kB)
30 manifests: 234444
145 manifests: 251908
210 manifests: 260616
2996 manifests: 792260
4381 manifests: 902488

(Note: these numbers can vary based on the specific content. Manifests that reference a higher number of blobs will use more memory, etc.)

Reference tasks (other high-memory docker tasks, peak RES in kB):
Sync datawire/ambassador (2926 tags): 317572
Resting celery: 75260
COPY 2926 tags: 443820

To reproduce:
Big repos from quay.io: datawire/ambassador, calico/typha
Start (or restart) 1 pulp worker between each task
Use `grep VmHWM /proc/$pid/status` to determine peak RES usage
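
For convenience, the same measurement as the grep above, as a small Python helper (the pid is whatever your celery worker process reports):

```python
def peak_rss_kb(pid):
    """Return the process's peak resident set size (VmHWM, in kB), or None."""
    with open(f"/proc/{pid}/status") as status:
        for line in status:
            if line.startswith("VmHWM:"):
                return int(line.split()[1])  # line looks like "VmHWM:  260616 kB"
    return None
```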

Conclusion:

Because memory usage scales up with removal size (limited only by repo size), there is a theoretical memory problem for arbitrarily large docker repositories. However, even with a very large repo (~50,000 content units), the peak RES memory stayed below 1 GB. Therefore, my best guess is that this problem will not be a practical concern for real docker repositories in the wild.

#18 Updated by amacdona@redhat.com 4 months ago

  • Status changed from ASSIGNED to POST

#19 Updated by mmccune@redhat.com 4 months ago

Nice work on the optimization. IMO, the memory consumption for this call is a worthy tradeoff for the performance increases we are getting. If you were quoting numbers in the 2-10 GB range, I'd be concerned.

#20 Updated by amacdona@redhat.com 4 months ago

  • Status changed from POST to MODIFIED

#22 Updated by dalley 3 months ago

  • Platform Release set to 2.21.0

#24 Updated by dalley about 2 months ago

  • Status changed from MODIFIED to CLOSED - CURRENTRELEASE
