Issue #8151
closedreturn toomanyrequesrs error code during podman push/pull if the resource is locked
Added by ipanova@redhat.com almost 4 years ago. Updated over 3 years ago.
Description
https://docs.docker.com/registry/spec/api/#errors-2
It can happen that resources we are referencing during pull/push calls are locked by a pulp task- Namespace, Distribution, Repo.
In that case we should return a tomanyrequests 429 error code.
If this won't work out - then we should look into how to acquire locks/release them outside of task during pull/push operations.
Updated by ipanova@redhat.com almost 4 years ago
- Subject changed from return DENIED error code during podman push/pull if the resource is locked to return toomanyrequesrs error code during podman push/pull if the resource is locked
- Description updated (diff)
https://github.com/pulp/pulp_container/pull/210 https://github.com/ipanova/pulp_container/commit/fda2c568151acb95973d36b5a018ca818f321023 this poc is still error prone. During podman push, we should try creating a task when creating new repo_version or other objects like distribution and issue a 429 to the client so it will come back later. If the task is not finished by that time issue another 429
Updated by ipanova@redhat.com almost 4 years ago
- Sprint/Milestone changed from 2.3.0 to 2.4.0
Updated by mdellweg almost 4 years ago
The currently decided upon approach is to:
- Schedule a task
- Wait a short (ideally < 1 sec) amount of time
- If the task finished, return the result to the client
- If the task didn't finish, issue 429 with a reference to the task uuid to the client
- When the client comes back with a task id, restart at 3.
Design foundations: It is important to not wait long, because this will stall the pulp-api workers. It is valuable to not always, issue 429, because this will result in the client sending everything at least twice.
We should look into types of redirects, to see if we can find a way to trick the client into lightweight task polling.
As a second objective, we will try to find a way to acquire non-blocking now-or-never locks on resources, and maybe keep the solution outlined above as a fallback.
Updated by bmbouter almost 4 years ago
mdellweg, this all looks good. Good writeup.
The only addition I have is that while 429 may let you specify a backoff and that's great, to have the client come back with a different URL I believe you'll have to use a HTTP 302. I think a 429 won't have the client come back to the same url.
Updated by mdellweg almost 4 years ago
So we could redirect the client from the POST/PUT call to a task polling url, that returns 429 until it's done, right?
Updated by ipanova@redhat.com almost 4 years ago
bmbouter wrote:
mdellweg, this all looks good. Good writeup.
The only addition I have is that while 429 may let you specify a backoff and that's great, to have the client come back with a different URL I believe you'll have to use a HTTP 302. I think a 429 won't have the client come back to the same url.
It's the retry-after
header that tells to the client how long to wait before issuing following requests to the server. This header can be present both on 429 responses as well as on redirects.
https://tools.ietf.org/html/rfc6585#section-4 https://tools.ietf.org/html/rfc7231#section-7.1.3
The only caveat in this header is not all the clients may take it into account so we should check whether podman/docker do. https://github.com/containers/podman/blob/master/vendor/github.com/containers/image/v5/docker/docker_client.go#L463 https://github.com/containers/image/blob/master/docker/docker_client.go#L463 https://www.openshift.com/blog/mitigate-impact-of-docker-hub-pull-request-limits
Updated by mdellweg almost 4 years ago
- Status changed from NEW to ASSIGNED
- Assignee set to mdellweg
Updated by ipanova@redhat.com almost 4 years ago
- Groomed changed from No to Yes
Updated by mdellweg almost 4 years ago
My experiments so far:
- We can redirect the client to the same endpoint, which will result in a GET call after the PUT.
- "Retry-After" is only defined for 301, 429 and 503
- podman ignores the "Retry-After" on 301 and 302 responses, but plays nice with 429
- when issued 429, podman goes all the way back and repeats the PUT call
Together, these results do not allow us to proceed in the way outlined.
Updated by ipanova@redhat.com almost 4 years ago
mdellweg wrote:
My experiments so far:
- We can redirect the client to the same endpoint, which will result in a GET call after the PUT.
- "Retry-After" is only defined for 301, 429 and 503
Based on RFC https://tools.ietf.org/html/rfc7231#section-7.1.3 any 3xx redirect call can contain this header but this does not really matter if podman ignores it on redirects
- podman ignores the "Retry-After" on 301 and 302 responses, but plays nice with 429
- when issued 429, podman goes all the way back and repeats the PUT call
how do we pass the task uuid reference then?
Together, these results do not allow us to proceed in the way outlined.
Updated by mdellweg almost 4 years ago
ipanova@redhat.com wrote:
mdellweg wrote:
My experiments so far:
- We can redirect the client to the same endpoint, which will result in a GET call after the PUT.
- "Retry-After" is only defined for 301, 429 and 503
Based on RFC https://tools.ietf.org/html/rfc7231#section-7.1.3 any 3xx redirect call can contain this header but this does not really matter if podman ignores it on redirects
- podman ignores the "Retry-After" on 301 and 302 responses, but plays nice with 429
- when issued 429, podman goes all the way back and repeats the PUT call
how do we pass the task uuid reference then? That is the big problem, as 429 is not a redirect.
Together, these results do not allow us to proceed in the way outlined.
Update; 303 (See other) has the same result as the other redirects
Updated by pulpbot over 3 years ago
- Status changed from ASSIGNED to POST
Updated by ipanova@redhat.com over 3 years ago
- Sprint/Milestone changed from 2.4.0 to 2.5.0
Updated by mdellweg over 3 years ago
New update: It turns out, the podman only retried for the blob uploads, and not the tagging operation. And docker never retries at all. So this approach does not seem to work out.
Added by mdellweg over 3 years ago
Added by mdellweg over 3 years ago
Revision af01b442 | View on GitHub
Add 429 polling to BlobUpload commit
Updated by mdellweg over 3 years ago
- Status changed from POST to MODIFIED
Applied in changeset af01b442daa7a47b2ffc4674eafbc7a0331e2fa0.
Updated by pulpbot over 3 years ago
- Status changed from MODIFIED to CLOSED - CURRENTRELEASE
Add 429 polling to BlobUpload commit
fixes #8151 https://pulp.plan.io/issues/8151