Story #8881
closed
As a user, Pulp will retry downloads by default 3 times and I can configure this value on my Remote
Status:
CLOSED - CURRENTRELEASE
Description
Background
Currently, Pulp retries only on specific HTTP error codes:
429 - Too Many Requests
502 - Bad Gateway
503 - Service Unavailable
504 - Gateway Timeout
That happens here: https://github.com/pulp/pulpcore/blob/master/pulpcore/download/http.py#L15-L32
The need to retry
- Users expect it. Almost all (if not all) download tools, e.g. curl and wget, have retry behavior built in.
- A sync performs many downloads, and as it stands a single error halts the entire process, so Pulp is not very reliable.
What to retry on
With the implementation of this feature Pulp will retry in the following situations:
- All HTTP 5xx response codes
- HTTP 429
- Socket timeouts and TCP disconnects
This is a simplified set of cases that mimics the retry behaviors outlined by AWS and Google Cloud.
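The retry conditions listed above can be sketched as a small predicate. This is only an illustration, not the pulpcore implementation: the function name `should_retry` and the choice of standard-library exception types standing in for "socket timeouts and TCP disconnects" are assumptions.

```python
import asyncio
import socket
from typing import Optional

# Assumption: these stdlib exceptions stand in for socket timeouts and
# TCP disconnects; a real downloader would also list its HTTP client's own
# timeout and disconnect exception types.
RETRYABLE_EXCEPTIONS = (asyncio.TimeoutError, socket.timeout, ConnectionError)


def should_retry(status: Optional[int] = None,
                 exc: Optional[BaseException] = None) -> bool:
    """Hypothetical helper: decide whether a failed download is retried."""
    if exc is not None:
        # Timeouts and dropped connections are retried.
        return isinstance(exc, RETRYABLE_EXCEPTIONS)
    if status is None:
        return False
    # HTTP 429 and every 5xx response are retried; other codes are not.
    return status == 429 or 500 <= status <= 599
```

Under this sketch a 403 or 404 still fails immediately, while a 500 or a reset connection is retried.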
Exponential Backoff and Jitter
Retries will continue to use the backoff and jitter already used today.
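For readers unfamiliar with the technique, here is a minimal sketch of a retry loop with exponential backoff and full jitter. It is written with the standard library only and is not the code Pulp uses; the name `call_with_retries`, its parameters, and the defaults (base 2, factor 1) are assumptions chosen for illustration.

```python
import random
import time


def call_with_retries(func, max_retries=3, base=2, factor=1,
                      sleep=time.sleep, rng=random.uniform):
    """Hypothetical helper: call func(), retrying on ConnectionError.

    Before retry number n (counting from 0) it sleeps for a random time
    between 0 and factor * base**n seconds ("full jitter").
    """
    attempt = 0
    while True:
        try:
            return func()
        except ConnectionError:
            if attempt >= max_retries:
                # Retries exhausted: let the last error propagate.
                raise
            upper_bound = factor * base ** attempt
            sleep(rng(0, upper_bound))
            attempt += 1
```

The `sleep` and `rng` arguments exist only so the waiting can be replaced in tests; with the defaults the function really sleeps.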
Number of Retries
The default will be 5, which was the Pulp2 default.
User configuration
The Remote will get a new field:
retry_count = models.PositiveIntegerField(default=5)
"Configurable per-remote, with a new Remote field" means a data migration, which has an impact on backporting. I'd like to see the code with a hardcoded 5 as one commit, and the migration plus the per-remote check as a separate commit. They could be in the same PR, or even in different ones.
I volunteer for this if I still have time whenever we decide it needs to be picked up.
- Description updated (diff)
- Has duplicate Story #8873: pulp 3 stops when encounters 403 error added
I would suggest lowering the default value to 3.
If we plan to retry on all of these codes, I don't think retrying 5 times on, for example, a 500 is a good idea: with the default base and factor of backoff.expo we will lose ~30 seconds.
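The ~30 second figure can be checked with a short calculation. This sketch assumes the waits follow factor * base**n with base 2 and factor 1 and ignores jitter, so it gives an upper bound; the function name `worst_case_wait` is hypothetical.

```python
def worst_case_wait(retries, base=2, factor=1):
    """Sum of the un-jittered backoff waits before each retry, in seconds."""
    return sum(factor * base ** n for n in range(retries))


print(worst_case_wait(5))  # 1 + 2 + 4 + 8 + 16 = 31
print(worst_case_wait(3))  # 1 + 2 + 4 = 7
```

Under these assumptions, lowering the default from 5 to 3 cuts the worst-case wait on a persistently failing URL from 31 seconds to 7.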
- Status changed from NEW to ASSIGNED
- Assignee set to dalley
- Status changed from ASSIGNED to POST
- Has duplicate deleted (Story #8873: pulp 3 stops when encounters 403 error)
- Has duplicate Issue #8878: default Download Concurrency is too high for many centos mirrors added
- Blocked by Issue #8899: IntegerFields cannot be set null after being set to a value even if allow_null=True added
- Status changed from POST to MODIFIED
- % Done changed from 0 to 100
- Sprint changed from Sprint 98 to Sprint 99
- Subject changed from As a user, Pulp will retry downloads by default 5 times and I can configure this value on my Remote to As a user, Pulp will retry downloads by default 3 times and I can configure this value on my Remote
- Sprint/Milestone set to 3.14.0
- Status changed from MODIFIED to CLOSED - CURRENTRELEASE
- Related to Issue #6589: Issues synching large Redhat repos added
Retry file downloads on more types of errors
closes: #8881 https://pulp.plan.io/issues/8881
closes: #8867 https://pulp.plan.io/issues/8867