Project

Profile

Help

Issue #8151

closed

return toomanyrequesrs error code during podman push/pull if the resource is locked

Added by ipanova@redhat.com over 3 years ago. Updated about 3 years ago.

Status:
CLOSED - CURRENTRELEASE
Priority:
Normal
Assignee:
Sprint/Milestone:
Start date:
Due date:
Estimated time:
Severity:
2. Medium
Platform Release:
OS:
Triaged:
Yes
Groomed:
Yes
Sprint Candidate:
No
Tags:
GalaxyNG
Sprint:
Sprint 93
Quarter:

Description

https://docs.docker.com/registry/spec/api/#errors-2

It can happen that resources we are referencing during pull/push calls are locked by a pulp task- Namespace, Distribution, Repo.

In that case we should return a tomanyrequests 429 error code.

If this won't work out - then we should look into how to acquire locks/release them outside of task during pull/push operations.

Actions #1

Updated by ipanova@redhat.com about 3 years ago

  • Subject changed from return DENIED error code during podman push/pull if the resource is locked to return toomanyrequesrs error code during podman push/pull if the resource is locked
  • Description updated (diff)

https://github.com/pulp/pulp_container/pull/210 https://github.com/ipanova/pulp_container/commit/fda2c568151acb95973d36b5a018ca818f321023 this poc is still error prone. During podman push, we should try creating a task when creating new repo_version or other objects like distribution and issue a 429 to the client so it will come back later. If the task is not finished by that time issue another 429

Actions #2

Updated by ipanova@redhat.com about 3 years ago

  • Sprint/Milestone changed from 2.3.0 to 2.4.0
Actions #3

Updated by ipanova@redhat.com about 3 years ago

  • Triaged changed from No to Yes
Actions #4

Updated by ipanova@redhat.com about 3 years ago

  • Tags GalaxyNG added
Actions #5

Updated by mdellweg about 3 years ago

The currently decided upon approach is to:

  1. Schedule a task
  2. Wait a short (ideally < 1 sec) amount of time
  3. If the task finished, return the result to the client
  4. If the task didn't finish, issue 429 with a reference to the task uuid to the client
  5. When the client comes back with a task id, restart at 3.

Design foundations: It is important to not wait long, because this will stall the pulp-api workers. It is valuable to not always, issue 429, because this will result in the client sending everything at least twice.

We should look into types of redirects, to see if we can find a way to trick the client into lightweight task polling.

As a second objective, we will try to find a way to acquire non-blocking now-or-never locks on resources, and maybe keep the solution outlined above as a fallback.

Actions #6

Updated by bmbouter about 3 years ago

mdellweg, this all looks good. Good writeup.

The only addition I have is that while 429 may let you specify a backoff and that's great, to have the client come back with a different URL I believe you'll have to use a HTTP 302. I think a 429 won't have the client come back to the same url.

Actions #7

Updated by mdellweg about 3 years ago

So we could redirect the client from the POST/PUT call to a task polling url, that returns 429 until it's done, right?

Actions #8

Updated by ipanova@redhat.com about 3 years ago

bmbouter wrote:

mdellweg, this all looks good. Good writeup.

The only addition I have is that while 429 may let you specify a backoff and that's great, to have the client come back with a different URL I believe you'll have to use a HTTP 302. I think a 429 won't have the client come back to the same url.

It's the retry-after header that tells to the client how long to wait before issuing following requests to the server. This header can be present both on 429 responses as well as on redirects.

https://tools.ietf.org/html/rfc6585#section-4 https://tools.ietf.org/html/rfc7231#section-7.1.3

The only caveat in this header is not all the clients may take it into account so we should check whether podman/docker do. https://github.com/containers/podman/blob/master/vendor/github.com/containers/image/v5/docker/docker_client.go#L463 https://github.com/containers/image/blob/master/docker/docker_client.go#L463 https://www.openshift.com/blog/mitigate-impact-of-docker-hub-pull-request-limits

Actions #9

Updated by mdellweg about 3 years ago

  • Status changed from NEW to ASSIGNED
  • Assignee set to mdellweg
Actions #10

Updated by ipanova@redhat.com about 3 years ago

  • Sprint set to Sprint 91
Actions #11

Updated by ipanova@redhat.com about 3 years ago

  • Groomed changed from No to Yes
Actions #12

Updated by mdellweg about 3 years ago

My experiments so far:

  • We can redirect the client to the same endpoint, which will result in a GET call after the PUT.
  • "Retry-After" is only defined for 301, 429 and 503
  • podman ignores the "Retry-After" on 301 and 302 responses, but plays nice with 429
  • when issued 429, podman goes all the way back and repeats the PUT call

Together, these results do not allow us to proceed in the way outlined.

Actions #13

Updated by ipanova@redhat.com about 3 years ago

mdellweg wrote:

My experiments so far:

Based on RFC https://tools.ietf.org/html/rfc7231#section-7.1.3 any 3xx redirect call can contain this header but this does not really matter if podman ignores it on redirects

  • podman ignores the "Retry-After" on 301 and 302 responses, but plays nice with 429
  • when issued 429, podman goes all the way back and repeats the PUT call

how do we pass the task uuid reference then?

Together, these results do not allow us to proceed in the way outlined.

Actions #14

Updated by mdellweg about 3 years ago

wrote:

mdellweg wrote:

My experiments so far:

Based on RFC https://tools.ietf.org/html/rfc7231#section-7.1.3 any 3xx redirect call can contain this header but this does not really matter if podman ignores it on redirects

  • podman ignores the "Retry-After" on 301 and 302 responses, but plays nice with 429
  • when issued 429, podman goes all the way back and repeats the PUT call

how do we pass the task uuid reference then? That is the big problem, as 429 is not a redirect.

Together, these results do not allow us to proceed in the way outlined.

Update; 303 (See other) has the same result as the other redirects

Actions #15

Updated by pulpbot about 3 years ago

  • Status changed from ASSIGNED to POST
Actions #16

Updated by rchan about 3 years ago

  • Sprint changed from Sprint 91 to Sprint 92
Actions #17

Updated by ipanova@redhat.com about 3 years ago

  • Sprint/Milestone changed from 2.4.0 to 2.5.0
Actions #18

Updated by rchan about 3 years ago

  • Sprint changed from Sprint 92 to Sprint 93
Actions #19

Updated by mdellweg about 3 years ago

New update: It turns out, the podman only retried for the blob uploads, and not the tagging operation. And docker never retries at all. So this approach does not seem to work out.

Added by mdellweg about 3 years ago

Revision af01b442 | View on GitHub

Add 429 polling to BlobUpload commit

fixes #8151 https://pulp.plan.io/issues/8151

Added by mdellweg about 3 years ago

Revision af01b442 | View on GitHub

Add 429 polling to BlobUpload commit

fixes #8151 https://pulp.plan.io/issues/8151

Actions #20

Updated by mdellweg about 3 years ago

  • Status changed from POST to MODIFIED
Actions #21

Updated by pulpbot about 3 years ago

  • Status changed from MODIFIED to CLOSED - CURRENTRELEASE

Also available in: Atom PDF