Issue #8151: return toomanyrequesrs error code during podman push/pull if the resource is locked - Container Support - Pulp

Actions

Send by e-mail Copy link

Issue #8151

closed

return toomanyrequesrs error code during podman push/pull if the resource is locked

Added by ipanova@redhat.com almost 4 years ago. Updated almost 4 years ago.

Status:

CLOSED - CURRENTRELEASE

Priority:

Normal

Assignee:

mdellweg

Sprint/Milestone:

2.5.0

Start date:

Due date:

Estimated time:

Severity:

2. Medium

Platform Release:

OS:

Triaged:

Yes

Groomed:

Yes

Sprint Candidate:

Tags:

GalaxyNG

Sprint:

Sprint 93

Quarter:

Description

https://docs.docker.com/registry/spec/api/#errors-2

It can happen that resources we are referencing during pull/push calls are locked by a pulp task- Namespace, Distribution, Repo.

In that case we should return a tomanyrequests 429 error code.

If this won't work out - then we should look into how to acquire locks/release them outside of task during pull/push operations.

Actions

Copy link

Updated by ipanova@redhat.com almost 4 years ago

Subject changed from return DENIED error code during podman push/pull if the resource is locked to return toomanyrequesrs error code during podman push/pull if the resource is locked
Description updated (diff)

https://github.com/pulp/pulp_container/pull/210 https://github.com/ipanova/pulp_container/commit/fda2c568151acb95973d36b5a018ca818f321023 this poc is still error prone. During podman push, we should try creating a task when creating new repo_version or other objects like distribution and issue a 429 to the client so it will come back later. If the task is not finished by that time issue another 429

Actions

Copy link

Updated by ipanova@redhat.com almost 4 years ago

Sprint/Milestone changed from 2.3.0 to 2.4.0

Actions

Copy link

Updated by ipanova@redhat.com almost 4 years ago

Triaged changed from No to Yes

Actions

Copy link

Updated by ipanova@redhat.com almost 4 years ago

Tags GalaxyNG added

Actions

Copy link

Updated by mdellweg almost 4 years ago

The currently decided upon approach is to:

Schedule a task
Wait a short (ideally < 1 sec) amount of time
If the task finished, return the result to the client
If the task didn't finish, issue 429 with a reference to the task uuid to the client
When the client comes back with a task id, restart at 3.

Design foundations: It is important to not wait long, because this will stall the pulp-api workers. It is valuable to not always, issue 429, because this will result in the client sending everything at least twice.

We should look into types of redirects, to see if we can find a way to trick the client into lightweight task polling.

As a second objective, we will try to find a way to acquire non-blocking now-or-never locks on resources, and maybe keep the solution outlined above as a fallback.

Actions

Copy link

Updated by bmbouter almost 4 years ago

mdellweg, this all looks good. Good writeup.

The only addition I have is that while 429 may let you specify a backoff and that's great, to have the client come back with a different URL I believe you'll have to use a HTTP 302. I think a 429 won't have the client come back to the same url.

Actions

Copy link

Updated by mdellweg almost 4 years ago

So we could redirect the client from the POST/PUT call to a task polling url, that returns 429 until it's done, right?

Actions

Copy link

Updated by ipanova@redhat.com almost 4 years ago

bmbouter wrote:

mdellweg, this all looks good. Good writeup.

The only addition I have is that while 429 may let you specify a backoff and that's great, to have the client come back with a different URL I believe you'll have to use a HTTP 302. I think a 429 won't have the client come back to the same url.

It's the retry-after header that tells to the client how long to wait before issuing following requests to the server. This header can be present both on 429 responses as well as on redirects.

https://tools.ietf.org/html/rfc6585#section-4 https://tools.ietf.org/html/rfc7231#section-7.1.3

The only caveat in this header is not all the clients may take it into account so we should check whether podman/docker do. https://github.com/containers/podman/blob/master/vendor/github.com/containers/image/v5/docker/docker_client.go#L463 https://github.com/containers/image/blob/master/docker/docker_client.go#L463 https://www.openshift.com/blog/mitigate-impact-of-docker-hub-pull-request-limits

Actions

Copy link

Updated by mdellweg almost 4 years ago

Status changed from NEW to ASSIGNED
Assignee set to mdellweg

Actions

Copy link

#10

Updated by ipanova@redhat.com almost 4 years ago

Sprint set to Sprint 91

Actions

Copy link

#11

Updated by ipanova@redhat.com almost 4 years ago

Groomed changed from No to Yes

Actions

Copy link

#12

Updated by mdellweg almost 4 years ago

My experiments so far:

We can redirect the client to the same endpoint, which will result in a GET call after the PUT.
"Retry-After" is only defined for 301, 429 and 503
- https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Retry-After
podman ignores the "Retry-After" on 301 and 302 responses, but plays nice with 429
when issued 429, podman goes all the way back and repeats the PUT call

Together, these results do not allow us to proceed in the way outlined.

Actions

Copy link

#13

Updated by ipanova@redhat.com almost 4 years ago

mdellweg wrote:

My experiments so far:

We can redirect the client to the same endpoint, which will result in a GET call after the PUT.

"Retry-After" is only defined for 301, 429 and 503

https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Retry-After

Based on RFC https://tools.ietf.org/html/rfc7231#section-7.1.3 any 3xx redirect call can contain this header but this does not really matter if podman ignores it on redirects

podman ignores the "Retry-After" on 301 and 302 responses, but plays nice with 429

when issued 429, podman goes all the way back and repeats the PUT call

how do we pass the task uuid reference then?

Together, these results do not allow us to proceed in the way outlined.

Actions

Copy link

#14

Updated by mdellweg almost 4 years ago

ipanova@redhat.com wrote:

mdellweg wrote:

My experiments so far:

We can redirect the client to the same endpoint, which will result in a GET call after the PUT.

"Retry-After" is only defined for 301, 429 and 503

https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Retry-After

Based on RFC https://tools.ietf.org/html/rfc7231#section-7.1.3 any 3xx redirect call can contain this header but this does not really matter if podman ignores it on redirects

podman ignores the "Retry-After" on 301 and 302 responses, but plays nice with 429

when issued 429, podman goes all the way back and repeats the PUT call

how do we pass the task uuid reference then? That is the big problem, as 429 is not a redirect.

Together, these results do not allow us to proceed in the way outlined.

Update; 303 (See other) has the same result as the other redirects

Actions

Copy link

#15

Updated by pulpbot almost 4 years ago

Status changed from ASSIGNED to POST

PR: https://github.com/pulp/pulp_container/pull/259

Actions

Copy link

#16

Updated by rchan almost 4 years ago

Sprint changed from Sprint 91 to Sprint 92

Actions

Copy link

#17

Updated by ipanova@redhat.com almost 4 years ago

Sprint/Milestone changed from 2.4.0 to 2.5.0

Actions

Copy link

#18

Updated by rchan almost 4 years ago

Sprint changed from Sprint 92 to Sprint 93

Actions

Copy link

#19

Updated by mdellweg almost 4 years ago

New update: It turns out, the podman only retried for the blob uploads, and not the tagging operation. And docker never retries at all. So this approach does not seem to work out.