Story #8881

As a user, Pulp will retry downloads by default 3 times and I can configure this value on my Remote

Added by bmbouter 4 months ago. Updated 3 months ago.

Status: CLOSED - CURRENTRELEASE
Priority: Normal
Assignee:
Category: -
Sprint/Milestone:
Start date:
Due date:
% Done: 100%
Estimated time:
Platform Release:
Groomed: No
Sprint Candidate: No
Tags: Katello
Sprint: Sprint 99
Quarter:

Description

Background

Currently Pulp retries only on specific HTTP error codes:

  • 429 - Too Many Requests
  • 502 - Bad Gateway
  • 503 - Service Unavailable
  • 504 - Gateway Timeout

That happens here: https://github.com/pulp/pulpcore/blob/master/pulpcore/download/http.py#L15-L32

The need to retry

  1. Users expect it. Almost all (if not all) download tools, e.g. curl and wget, support retries.
  2. A sync performs a large number of downloads, and currently a single error halts the entire process, so Pulp is not very reliable.

What to retry on

With the implementation of this feature Pulp will retry in the following situations:

  • All HTTP 5xx response codes
  • HTTP 429
  • Socket timeouts and TCP disconnects

This is a simplified set of cases that mimics the retry behaviors outlined by AWS and Google Cloud.
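The retry conditions above can be expressed as a single predicate. This is a minimal sketch, not pulpcore's actual code: `HTTPStatusError` is a hypothetical stand-in for whatever exception the HTTP client raises on a bad status (assumed to carry a `status` attribute), and Python's built-in `TimeoutError` and `ConnectionError` stand in for socket timeouts and TCP disconnects.

```python
class HTTPStatusError(Exception):
    """Hypothetical stand-in for the HTTP client's bad-status exception."""

    def __init__(self, status: int):
        super().__init__(f"HTTP {status}")
        self.status = status


def should_retry(exc: BaseException) -> bool:
    """Return True if a failed download should be retried."""
    if isinstance(exc, HTTPStatusError):
        # All 5xx responses, plus 429 Too Many Requests.
        return 500 <= exc.status <= 599 or exc.status == 429
    # Socket timeouts and TCP disconnects (ConnectionResetError,
    # ConnectionAbortedError, BrokenPipeError, ... subclass ConnectionError).
    return isinstance(exc, (TimeoutError, ConnectionError))
```

Under this sketch a 500 or 503 is retried, while client errors such as 403 or 404 fail immediately, since retrying them would not change the outcome.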

Exponential Backoff and Jitter

Retries will continue to use the exponential backoff and jitter that Pulp already applies today.
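For reference, exponential backoff with jitter can be sketched as below. The sketch assumes the defaults of the `backoff` library's `expo` generator mentioned later in this issue (base 2, factor 1, so nominal waits of 1, 2, 4, ... seconds) combined with "full jitter", where each actual wait is drawn uniformly between 0 and the nominal value. The function name `jittered_delays` is hypothetical.

```python
import random
from typing import Iterator, Optional


def jittered_delays(
    retries: int,
    base: float = 2,
    factor: float = 1,
    rng: Optional[random.Random] = None,
) -> Iterator[float]:
    """Yield one randomized wait (in seconds) per retry.

    The nominal delay before retry n (0-based) is factor * base**n;
    full jitter picks the actual wait uniformly from [0, nominal).
    """
    rng = rng or random.Random()
    for n in range(retries):
        nominal = factor * base ** n
        yield rng.random() * nominal
```

The jitter matters because a sync runs many downloads concurrently: without it, every download that failed at the same moment would retry at the same moment too, hitting an already struggling server in lockstep.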

Number of Retries

The default will be 5, which was the Pulp2 default.

User configuration

The Remote will get a new field:

retry_count = models.PositiveIntegerField(default=5)
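A sketch of how a per-Remote `retry_count` could bound the retries of a single download. It is synchronous for brevity (an asyncio-based downloader would `await asyncio.sleep(...)` instead), and `download_with_retries`, `fetch`, and `is_retryable` are hypothetical names; `sleep` is injectable so the loop can be exercised without real waits.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def download_with_retries(
    fetch: Callable[[], T],
    retry_count: int,
    is_retryable: Callable[[BaseException], bool],
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch(), retrying up to retry_count times on retryable errors.

    retry_count counts retries, so fetch() runs at most retry_count + 1 times.
    """
    attempt = 0
    while True:
        try:
            return fetch()
        except Exception as exc:
            # Give up on non-retryable errors or once retries are exhausted.
            if not is_retryable(exc) or attempt >= retry_count:
                raise
            # Exponential backoff (nominal 1, 2, 4, ... s) with full jitter.
            sleep(random.random() * 2 ** attempt)
            attempt += 1
```

Treating the value as a count of retries rather than total attempts means `retry_count = 0` cleanly disables retrying.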


Related issues

Related to RPM Support - Issue #6589: Issues synching large Redhat repos (CLOSED - DUPLICATE)
Has duplicate Pulp - Issue #8878: default Download Concurrency is too high for many centos mirrors (CLOSED - DUPLICATE)
Blocked by Pulp - Issue #8899: IntegerFields cannot be set null after being set to a value even if allow_null=True (CLOSED - NOTABUG)

Associated revisions

Revision 39b44a4a View on GitHub
Added by dalley 3 months ago

Retry file downloads on more types of errors

closes: #8881 https://pulp.plan.io/issues/8881
closes: #8867 https://pulp.plan.io/issues/8867

History

#1 Updated by ggainey 4 months ago

"configurable per-remote, w/a new Remote field" means a data migration, which has implications for backporting. I'd like to see the code with a hardcoded 5 as one commit, and the migration/per-remote check as a separate commit. They could be in the same PR, or even in different ones.
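One way to stage the work as suggested, sketched with hypothetical names: the first commit introduces only a module-level default (no schema change, so it backports easily), and the follow-up commit carrying the migration adds an optional per-Remote value that overrides the default when set. `RemoteStub` is a stand-in for the real Remote model.

```python
from dataclasses import dataclass
from typing import Optional

# Commit 1: a hardcoded default, no schema change, easy to backport.
DEFAULT_RETRY_COUNT = 5


@dataclass
class RemoteStub:
    """Hypothetical stand-in for a Remote; commit 2 would add the real field."""

    retry_count: Optional[int] = None


def effective_retry_count(remote: RemoteStub) -> int:
    """Use the Remote's own value when set, else the hardcoded default."""
    if remote.retry_count is not None:
        return remote.retry_count
    return DEFAULT_RETRY_COUNT
```

Checking `is not None` rather than truthiness keeps "unset" distinct from an explicit 0, so a user can still disable retries on a particular Remote.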

#2 Updated by dalley 4 months ago

I volunteer for this if I still have time whenever we decide it needs to be picked up.

#3 Updated by bmbouter 4 months ago

  • Description updated (diff)

#4 Updated by ipanova@redhat.com 4 months ago

  • Has duplicate Story #8873: pulp 3 stops when encounters 403 error added

#5 Updated by ipanova@redhat.com 4 months ago

I would suggest lowering the default value to 3. If we plan to retry on all of these codes, I don't think retrying 5 times on, for example, code 500 is a good idea: with the default base and factor of backoff.expo we will lose ~30 seconds.
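The ~30-second figure can be checked with a little arithmetic, assuming `backoff.expo`'s defaults (base 2, factor 1) and no maximum delay: the nominal waits before successive retries are 1, 2, 4, 8 and 16 seconds, so 5 retries add up to at most 31 seconds, versus 1 + 2 + 4 = 7 seconds for 3 retries. If full jitter is applied, each actual wait falls below its nominal value, so these sums are upper bounds. `worst_case_wait` is a hypothetical helper.

```python
def worst_case_wait(retries: int, base: float = 2, factor: float = 1) -> float:
    """Sum of nominal backoff.expo-style delays over the given number of retries."""
    # Geometric series: factor * (base**retries - 1) / (base - 1).
    return sum(factor * base ** n for n in range(retries))


for retries in (3, 5):
    print(retries, worst_case_wait(retries))
# 3 7
# 5 31
```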

#6 Updated by dalley 4 months ago

  • Status changed from NEW to ASSIGNED
  • Assignee set to dalley

#7 Updated by dalley 4 months ago

  • Status changed from ASSIGNED to POST

#8 Updated by ipanova@redhat.com 4 months ago

  • Sprint set to Sprint 98

#9 Updated by ipanova@redhat.com 4 months ago

  • Has duplicate deleted (Story #8873: pulp 3 stops when encounters 403 error)

#10 Updated by bmbouter 3 months ago

  • Has duplicate Issue #8878: default Download Concurrency is too high for many centos mirrors added

#11 Updated by dalley 3 months ago

  • Blocked by Issue #8899: IntegerFields cannot be set null after being set to a value even if allow_null=True added

#12 Updated by dalley 3 months ago

  • Status changed from POST to MODIFIED
  • % Done changed from 0 to 100

#13 Updated by rchan 3 months ago

  • Sprint changed from Sprint 98 to Sprint 99

#15 Updated by bmbouter 3 months ago

  • Subject changed from As a user, Pulp will retry downloads by default 5 times and I can configure this value on my Remote to As a user, Pulp will retry downloads by default 3 times and I can configure this value on my Remote

#16 Updated by pulpbot 3 months ago

  • Sprint/Milestone set to 3.14.0

#17 Updated by pulpbot 3 months ago

  • Status changed from MODIFIED to CLOSED - CURRENTRELEASE

#18 Updated by dalley 3 months ago

  • Related to Issue #6589: Issues synching large Redhat repos added
