Story #8881

Updated by bmbouter over 2 years ago

## Background 

 Currently, Pulp retries only on specific HTTP error codes: 

 429 - Too Many Requests 
 502 - Bad Gateway 
 503 - Service Unavailable 
 504 - Gateway Timeout 

 That happens here: https://github.com/pulp/pulpcore/blob/master/pulpcore/download/http.py#L15-L32 

 ## The need to retry 

 1. Users expect it. Almost all (if not all) download tools, e.g. curl and wget, have built-in retry behavior. 
 2. A sync performs many downloads, and as it stands a single error halts the entire process, which makes Pulp unreliable. 

 ## What to retry on 

 Once this feature is implemented, Pulp will retry in the following situations: 

 * All HTTP 5xx response codes 
 * HTTP 429 
 * Socket timeouts and TCP disconnects 
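
 The three cases above could be expressed as a single predicate. This is a minimal sketch, not the pulpcore implementation: `HttpStatusError` and `should_retry` are hypothetical names, and the choice of standard-library exceptions used to represent socket timeouts and TCP disconnects is an assumption.

```python
import asyncio
import socket


class HttpStatusError(Exception):
    """Hypothetical stand-in for an HTTP client error that carries a status code."""

    def __init__(self, status):
        super().__init__(f"HTTP {status}")
        self.status = status


def should_retry(exc):
    """Return True if the failure falls into one of the retryable cases."""
    if isinstance(exc, HttpStatusError):
        # HTTP 429 plus every 5xx response code.
        return exc.status == 429 or 500 <= exc.status <= 599
    # Assumed mapping of "socket timeouts and TCP disconnects" to stdlib exceptions.
    return isinstance(
        exc,
        (
            asyncio.TimeoutError,
            socket.timeout,
            ConnectionResetError,
            ConnectionAbortedError,
            BrokenPipeError,
        ),
    )
```

 With this predicate, a 404 or an unrelated exception such as `ValueError` is not retried, while a 500, a 429, or a reset connection is.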

 This is a simplified set of cases that mimics the retry behaviors outlined by [AWS](https://docs.aws.amazon.com/general/latest/gr/api-retries.html) and [Google Cloud](https://cloud.google.com/storage/docs/retry-strategy). 

 ## Exponential Backoff and Jitter 

 Retries will continue to use the backoff and jitter [already used today](https://github.com/pulp/pulpcore/blob/master/pulpcore/download/http.py#L197-L199). 
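
 For readers unfamiliar with the technique, here is a generic sketch of exponential backoff with "full" jitter. It illustrates the general idea only; the linked pulpcore code may compute its delays differently, and `backoff_delay`, `base`, and `cap` are hypothetical names and values.

```python
import random


def backoff_delay(attempt, base=1.0, cap=60.0, rng=random):
    """Return a randomized delay in seconds for a zero-based retry attempt."""
    # The ceiling doubles with each attempt, up to a fixed cap.
    ceiling = min(cap, base * (2 ** attempt))
    # Full jitter: pick uniformly between 0 and the ceiling so that many
    # clients failing at the same moment do not all retry in lockstep.
    return rng.uniform(0, ceiling)
```

 The `rng` parameter exists only so that a seeded `random.Random` can be passed in for reproducible tests.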

 ## Number of Retries 

 The default will be 5, which was the Pulp2 default. 
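
 One way a retry limit like this could bound a download is sketched below. It assumes the value counts retries after the initial attempt (so 5 retries means at most 6 attempts), which this story does not spell out. `run_with_retries`, `is_retryable`, and the placeholder delay are hypothetical.

```python
import asyncio


async def run_with_retries(
    operation,
    retry_count=5,
    is_retryable=lambda exc: True,
    sleep=asyncio.sleep,
):
    """Await operation(), retrying up to retry_count times on retryable errors."""
    attempt = 0
    while True:
        try:
            return await operation()
        except Exception as exc:
            # Give up once the retry budget is spent or the error is not retryable.
            if attempt >= retry_count or not is_retryable(exc):
                raise
            await sleep(2 ** attempt)  # placeholder delay, no jitter
            attempt += 1
```

 The `sleep` parameter is injectable so that tests can substitute a no-op coroutine instead of actually waiting.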

 ## User configuration 

 The [`Remote`](https://github.com/pulp/pulpcore/blob/e2ad45731e42d07a09a41aed254565baa153e535/pulpcore/app/models/repository.py#L209) will get a new field: 

 `retry_count = models.PositiveIntegerField(default=5)` 



