Issue #8151

return toomanyrequests error code during podman push/pull if the resource is locked

Added by ipanova@redhat.com about 1 month ago. Updated 2 days ago.

Status:
ASSIGNED
Priority:
Normal
Assignee:
Sprint/Milestone:
Start date:
Due date:
Estimated time:
Severity:
2. Medium
Platform Release:
OS:
Triaged:
Yes
Groomed:
Yes
Sprint Candidate:
No
Tags:
GalaxyNG
Sprint:
Sprint 91
Quarter:

Description

https://docs.docker.com/registry/spec/api/#errors-2

It can happen that resources we reference during pull/push calls (Namespace, Distribution, Repository) are locked by a Pulp task.

In that case we should return a TOOMANYREQUESTS 429 error code.

If this does not work out, we should look into how to acquire and release locks outside of a task during pull/push operations.
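For reference, the Docker registry API spec linked above defines a TOOMANYREQUESTS error code. A 429 body in that format could look like this sketch (the message text and detail payload are illustrative, not Pulp's actual wording):

```python
import json

def toomanyrequests_body(detail=None):
    """Build a 429 body in the Docker registry v2 error format (see the spec link above).

    The message and detail here are illustrative placeholders.
    """
    return json.dumps({
        "errors": [
            {
                "code": "TOOMANYREQUESTS",
                "message": "resource is locked by a task, retry later",
                "detail": detail or {},
            }
        ]
    })

print(toomanyrequests_body({"resource": "repository"}))
```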

History

#1 Updated by ipanova@redhat.com about 1 month ago

  • Subject changed from return DENIED error code during podman push/pull if the resource is locked to return toomanyrequests error code during podman push/pull if the resource is locked
  • Description updated (diff)

https://github.com/pulp/pulp_container/pull/210 https://github.com/ipanova/pulp_container/commit/fda2c568151acb95973d36b5a018ca818f321023 This PoC is still error-prone. During podman push, we should try creating a task when creating a new repo_version or other objects like a distribution, and issue a 429 to the client so it will come back later. If the task is not finished by that time, issue another 429.

#2 Updated by ipanova@redhat.com about 1 month ago

  • Sprint/Milestone changed from 2.3.0 to 2.4.0

#3 Updated by ipanova@redhat.com 17 days ago

  • Triaged changed from No to Yes

#4 Updated by ipanova@redhat.com 16 days ago

  • Tags GalaxyNG added

#5 Updated by mdellweg 7 days ago

The currently decided-upon approach is to:

  1. Schedule a task
  2. Wait a short (ideally < 1 sec) amount of time
  3. If the task finished, return the result to the client
  4. If the task didn't finish, issue 429 with a reference to the task uuid to the client
  5. When the client comes back with a task id, restart at 3.

Design foundations: It is important not to wait long, because waiting stalls the pulp-api workers. It is also valuable not to always issue 429, because that would result in the client sending everything at least twice.
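The five steps above can be sketched, outside any web framework, as a minimal simulation; the in-memory task registry, the function names, and the wait budget are illustrative stand-ins, not Pulp's actual tasking API:

```python
import threading
import time
import uuid

# In-memory stand-in for Pulp's task table (illustrative only).
TASKS = {}

def dispatch(func):
    """Step 1: run ``func`` in the background and record its state by uuid."""
    task_id = str(uuid.uuid4())
    TASKS[task_id] = {"state": "running", "result": None}

    def runner():
        TASKS[task_id]["result"] = func()
        TASKS[task_id]["state"] = "completed"

    threading.Thread(target=runner, daemon=True).start()
    return task_id

def handle_request(work=None, task_id=None, max_wait=1.0, poll=0.05):
    """Steps 2-5: wait briefly; return (status, body), where a 429 body is the task uuid."""
    if task_id is None:
        task_id = dispatch(work)                   # step 1: schedule the task
    waited = 0.0
    while TASKS[task_id]["state"] != "completed" and waited < max_wait:
        time.sleep(poll)                           # step 2: short wait, never stall long
        waited += poll
    if TASKS[task_id]["state"] == "completed":
        return 201, TASKS[task_id]["result"]       # step 3: finished, return the result
    return 429, task_id                            # step 4: not done, reference the uuid

# A task slower than the wait budget triggers the 429 path; the client then
# comes back with the task uuid and restarts at step 3 (step 5).
status, body = handle_request(work=lambda: time.sleep(0.3) or "pushed", max_wait=0.1)
while status == 429:
    time.sleep(0.2)
    status, body = handle_request(task_id=body)
print(status, body)  # 201 pushed
```

The demo at the bottom shows the intended round trip: a first 429 carrying the uuid, then a successful retry once the task finishes.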

We should look into the types of redirects to see if we can find a way to trick the client into lightweight task polling.

As a second objective, we will try to find a way to acquire non-blocking now-or-never locks on resources, and maybe keep the solution outlined above as a fallback.

#6 Updated by bmbouter 7 days ago

mdellweg, this all looks good. Good writeup.

The only addition I have is that while 429 may let you specify a backoff, and that's great, to have the client come back with a different URL I believe you'll have to use an HTTP 302. I think a 429 won't have the client come back to the same URL.

#7 Updated by mdellweg 3 days ago

So we could redirect the client from the POST/PUT call to a task polling URL that returns 429 until it's done, right?

#8 Updated by ipanova@redhat.com 3 days ago

bmbouter wrote:

mdellweg, this all looks good. Good writeup.

The only addition I have is that while 429 may let you specify a backoff, and that's great, to have the client come back with a different URL I believe you'll have to use an HTTP 302. I think a 429 won't have the client come back to the same URL.

It's the Retry-After header that tells the client how long to wait before issuing subsequent requests to the server. This header can be present on 429 responses as well as on redirects.

https://tools.ietf.org/html/rfc6585#section-4 https://tools.ietf.org/html/rfc7231#section-7.1.3

The only caveat with this header is that not all clients take it into account, so we should check whether podman/docker do. https://github.com/containers/podman/blob/master/vendor/github.com/containers/image/v5/docker/docker_client.go#L463 https://github.com/containers/image/blob/master/docker/docker_client.go#L463 https://www.openshift.com/blog/mitigate-impact-of-docker-hub-pull-request-limits
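As a sketch of where the header would sit, assuming we control response assembly on the server side (the function name and return shape are hypothetical): Retry-After on a 429 is an integer number of delay-seconds per RFC 6585 / RFC 7231, and whether it is honored is entirely up to the client.

```python
def throttled_response(task_uuid, retry_after_seconds=2):
    """Assemble a 429 response whose Retry-After header is delay-seconds
    (RFC 6585 / RFC 7231). Returns a (status, headers, body) triple;
    honoring the header is up to the client."""
    headers = {
        "Content-Type": "application/json",
        "Retry-After": str(retry_after_seconds),
    }
    body = {"task": task_uuid}
    return 429, headers, body

status, headers, body = throttled_response("some-task-uuid")
print(status, headers["Retry-After"], body["task"])
```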

#9 Updated by mdellweg 3 days ago

  • Status changed from NEW to ASSIGNED
  • Assignee set to mdellweg

#10 Updated by ipanova@redhat.com 3 days ago

  • Sprint set to Sprint 91

#11 Updated by ipanova@redhat.com 3 days ago

  • Groomed changed from No to Yes

#12 Updated by mdellweg 2 days ago

My experiments so far:

  • We can redirect the client to the same endpoint, which will result in a GET call after the PUT.
  • "Retry-After" is only defined for 301, 429 and 503
  • podman ignores the "Retry-After" on 301 and 302 responses, but plays nice with 429
  • when issued 429, podman goes all the way back and repeats the PUT call

Together, these results do not allow us to proceed in the way outlined.

#13 Updated by ipanova@redhat.com 2 days ago

mdellweg wrote:

My experiments so far:

Based on RFC https://tools.ietf.org/html/rfc7231#section-7.1.3, any 3xx redirect response can contain this header, but this does not really matter if podman ignores it on redirects.

  • podman ignores the "Retry-After" on 301 and 302 responses, but plays nice with 429
  • when issued 429, podman goes all the way back and repeats the PUT call

how do we pass the task uuid reference then?

Together, these results do not allow us to proceed in the way outlined.

#14 Updated by mdellweg 2 days ago

ipanova@redhat.com wrote:

mdellweg wrote:

My experiments so far:

Based on RFC https://tools.ietf.org/html/rfc7231#section-7.1.3, any 3xx redirect response can contain this header, but this does not really matter if podman ignores it on redirects.

  • podman ignores the "Retry-After" on 301 and 302 responses, but plays nice with 429
  • when issued 429, podman goes all the way back and repeats the PUT call

how do we pass the task uuid reference then? That is the big problem, as 429 is not a redirect.

Together, these results do not allow us to proceed in the way outlined.

Update: 303 (See Other) has the same result as the other redirects.
