Story #8881
Updated by bmbouter over 3 years ago
## Background

Currently Pulp retries only on specific HTTP error codes:

* 429 - Too Many Requests
* 502 - Bad Gateway
* 503 - Service Unavailable
* 504 - Gateway Timeout

That happens here: https://github.com/pulp/pulpcore/blob/master/pulpcore/download/http.py#L15-L32

## The need to retry

1. Users expect it. Almost all (if not all) download tools, e.g. curl and wget, have retry behavior built in.
2. A sync performs many downloads, and as it is now a single error halts the entire process, so Pulp is not very reliable.

## What to retry on

With the implementation of this feature, Pulp will retry in the following situations:

* All HTTP 5xx response codes
* HTTP 429
* Socket timeouts and TCP disconnects

This is a simplified set of cases that mimics the retry behaviors outlined by [AWS](https://docs.aws.amazon.com/general/latest/gr/api-retries.html) and [Google Cloud](https://cloud.google.com/storage/docs/retry-strategy).

## Exponential Backoff and Jitter

Retries will continue to use the backoff and jitter [already used today](https://github.com/pulp/pulpcore/blob/master/pulpcore/download/http.py#L197-L199).

## Number of Retries

The default will be 5, which was the Pulp2 default.

## User configuration

The [`Remote`](https://github.com/pulp/pulpcore/blob/e2ad45731e42d07a09a41aed254565baa153e535/pulpcore/app/models/repository.py#L209) will get a new field:

`retry_count = models.PositiveIntegerField(default=5)`
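
## Illustrative sketch

The retry rules proposed in this story (all 5xx, 429, socket timeouts and TCP disconnects, with a configurable retry count) could be sketched roughly as follows. This is a standalone, standard-library illustration and not the pulpcore implementation. `HTTPStatusError`, `is_retryable`, `download_with_retries`, `base_delay`, and `max_delay` are hypothetical names, and the "full jitter" backoff is an assumption (the scheme Pulp actually uses is the one linked under "Exponential Backoff and Jitter"). The sketch also assumes `retry_count` counts retries after the first attempt, which the story does not spell out.

```python
import random
import socket
import time


class HTTPStatusError(Exception):
    """Hypothetical error carrying an HTTP status code (not a pulpcore class)."""

    def __init__(self, status):
        super().__init__(f"HTTP {status}")
        self.status = status


def is_retryable(exc):
    """Return True for the cases listed under 'What to retry on'."""
    if isinstance(exc, HTTPStatusError):
        # HTTP 429 plus every 5xx response code.
        return exc.status == 429 or 500 <= exc.status <= 599
    # Socket timeouts and TCP disconnects (reset, aborted, broken pipe, ...).
    return isinstance(exc, (socket.timeout, TimeoutError, ConnectionError))


def download_with_retries(fetch, retry_count=5, base_delay=1.0,
                          max_delay=60.0, sleep=time.sleep):
    """Call fetch() and retry up to retry_count times on retryable errors.

    Assumption: retry_count is the number of retries after the first
    attempt, so fetch() runs at most retry_count + 1 times.
    """
    attempt = 0
    while True:
        try:
            return fetch()
        except Exception as exc:
            if attempt >= retry_count or not is_retryable(exc):
                raise
            # Exponential backoff with full jitter (assumed scheme):
            # wait a random time in [0, min(max_delay, base_delay * 2**attempt)].
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, ceiling))
            attempt += 1
```

With the default `retry_count=5`, a download that keeps failing with a retryable error is attempted six times in total before the error propagates. A non-retryable error such as HTTP 404 is raised immediately.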