Project

Profile

Help

Issue #8151

return toomanyrequesrs error code during podman push/pull if the resource is locked

Added by ipanova@redhat.com 9 months ago. Updated 6 months ago.

Status:
CLOSED - CURRENTRELEASE
Priority:
Normal
Assignee:
Sprint/Milestone:
Start date:
Due date:
Estimated time:
Severity:
2. Medium
Platform Release:
OS:
Triaged:
Yes
Groomed:
Yes
Sprint Candidate:
No
Tags:
GalaxyNG
Sprint:
Sprint 93
Quarter:

Description

https://docs.docker.com/registry/spec/api/#errors-2

It can happen that resources we are referencing during pull/push calls are locked by a pulp task- Namespace, Distribution, Repo.

In that case we should return a tomanyrequests 429 error code.

If this won't work out - then we should look into how to acquire locks/release them outside of task during pull/push operations.

Associated revisions

Revision af01b442 View on GitHub
Added by mdellweg 7 months ago

Add 429 polling to BlobUpload commit

fixes #8151 https://pulp.plan.io/issues/8151

Revision af01b442 View on GitHub
Added by mdellweg 7 months ago

Add 429 polling to BlobUpload commit

fixes #8151 https://pulp.plan.io/issues/8151

History

#1 Updated by ipanova@redhat.com 9 months ago

  • Subject changed from return DENIED error code during podman push/pull if the resource is locked to return toomanyrequesrs error code during podman push/pull if the resource is locked
  • Description updated (diff)

https://github.com/pulp/pulp_container/pull/210 https://github.com/ipanova/pulp_container/commit/fda2c568151acb95973d36b5a018ca818f321023 this poc is still error prone. During podman push, we should try creating a task when creating new repo_version or other objects like distribution and issue a 429 to the client so it will come back later. If the task is not finished by that time issue another 429

#2 Updated by ipanova@redhat.com 9 months ago

  • Sprint/Milestone changed from 2.3.0 to 2.4.0

#3 Updated by ipanova@redhat.com 8 months ago

  • Triaged changed from No to Yes

#4 Updated by ipanova@redhat.com 8 months ago

  • Tags GalaxyNG added

#5 Updated by mdellweg 8 months ago

The currently decided upon approach is to:

  1. Schedule a task
  2. Wait a short (ideally < 1 sec) amount of time
  3. If the task finished, return the result to the client
  4. If the task didn't finish, issue 429 with a reference to the task uuid to the client
  5. When the client comes back with a task id, restart at 3.

Design foundations: It is important to not wait long, because this will stall the pulp-api workers. It is valuable to not always, issue 429, because this will result in the client sending everything at least twice.

We should look into types of redirects, to see if we can find a way to trick the client into lightweight task polling.

As a second objective, we will try to find a way to acquire non-blocking now-or-never locks on resources, and maybe keep the solution outlined above as a fallback.

#6 Updated by bmbouter 8 months ago

mdellweg, this all looks good. Good writeup.

The only addition I have is that while 429 may let you specify a backoff and that's great, to have the client come back with a different URL I believe you'll have to use a HTTP 302. I think a 429 won't have the client come back to the same url.

#7 Updated by mdellweg 8 months ago

So we could redirect the client from the POST/PUT call to a task polling url, that returns 429 until it's done, right?

#8 Updated by ipanova@redhat.com 8 months ago

bmbouter wrote:

mdellweg, this all looks good. Good writeup.

The only addition I have is that while 429 may let you specify a backoff and that's great, to have the client come back with a different URL I believe you'll have to use a HTTP 302. I think a 429 won't have the client come back to the same url.

It's the retry-after header that tells to the client how long to wait before issuing following requests to the server. This header can be present both on 429 responses as well as on redirects.

https://tools.ietf.org/html/rfc6585#section-4 https://tools.ietf.org/html/rfc7231#section-7.1.3

The only caveat in this header is not all the clients may take it into account so we should check whether podman/docker do. https://github.com/containers/podman/blob/master/vendor/github.com/containers/image/v5/docker/docker_client.go#L463 https://github.com/containers/image/blob/master/docker/docker_client.go#L463 https://www.openshift.com/blog/mitigate-impact-of-docker-hub-pull-request-limits

#9 Updated by mdellweg 8 months ago

  • Status changed from NEW to ASSIGNED
  • Assignee set to mdellweg

#10 Updated by ipanova@redhat.com 8 months ago

  • Sprint set to Sprint 91

#11 Updated by ipanova@redhat.com 8 months ago

  • Groomed changed from No to Yes

#12 Updated by mdellweg 8 months ago

My experiments so far:

  • We can redirect the client to the same endpoint, which will result in a GET call after the PUT.
  • "Retry-After" is only defined for 301, 429 and 503
  • podman ignores the "Retry-After" on 301 and 302 responses, but plays nice with 429
  • when issued 429, podman goes all the way back and repeats the PUT call

Together, these results do not allow us to proceed in the way outlined.

#13 Updated by ipanova@redhat.com 8 months ago

mdellweg wrote:

My experiments so far:

Based on RFC https://tools.ietf.org/html/rfc7231#section-7.1.3 any 3xx redirect call can contain this header but this does not really matter if podman ignores it on redirects

  • podman ignores the "Retry-After" on 301 and 302 responses, but plays nice with 429
  • when issued 429, podman goes all the way back and repeats the PUT call

how do we pass the task uuid reference then?

Together, these results do not allow us to proceed in the way outlined.

#14 Updated by mdellweg 8 months ago

wrote:

mdellweg wrote:

My experiments so far:

Based on RFC https://tools.ietf.org/html/rfc7231#section-7.1.3 any 3xx redirect call can contain this header but this does not really matter if podman ignores it on redirects

  • podman ignores the "Retry-After" on 301 and 302 responses, but plays nice with 429
  • when issued 429, podman goes all the way back and repeats the PUT call

how do we pass the task uuid reference then? That is the big problem, as 429 is not a redirect.

Together, these results do not allow us to proceed in the way outlined.

Update; 303 (See other) has the same result as the other redirects

#15 Updated by pulpbot 8 months ago

  • Status changed from ASSIGNED to POST

#16 Updated by rchan 8 months ago

  • Sprint changed from Sprint 91 to Sprint 92

#17 Updated by ipanova@redhat.com 7 months ago

  • Sprint/Milestone changed from 2.4.0 to 2.5.0

#18 Updated by rchan 7 months ago

  • Sprint changed from Sprint 92 to Sprint 93

#19 Updated by mdellweg 7 months ago

New update: It turns out, the podman only retried for the blob uploads, and not the tagging operation. And docker never retries at all. So this approach does not seem to work out.

#20 Updated by mdellweg 7 months ago

  • Status changed from POST to MODIFIED

#21 Updated by pulpbot 6 months ago

  • Status changed from MODIFIED to CLOSED - CURRENTRELEASE

Please register to edit this issue

Also available in: Atom PDF