Story #3421
As a plugin writer I have HTTPDownloaders which provide exponential backoff for HTTP 429 errors
Status:
CLOSED - CURRENTRELEASE
Description
When HTTP downloads fail with 429 responses (Too Many Requests), the HttpDownloader could wait and retry later with possible success. This would allow downloads to complete over time instead of the first few completing and then all subsequent requests being rejected. For example, this is how GitHub rate limits you if you try to download too many files from them at once.
Use Cases
- The downloaders will exponentially back off and retry when an HTTP 429 response is encountered
- As a plugin writer, I can disable the exponential backoff and retry feature
Implementation Options
Having a coordinated rate limiting implementation is straightforward, but it would then need to be configured, probably site-by-site, by either users or plugin writers. This is a pain and prone to many problems. Instead, a dumb exponential backoff behavior can cause rejected requests to retry automatically, up to 10 times, when the server is overloaded.
We could do something like use backoff-async and pass those options through to the plugin writer, but I'm hoping for something really simple, hence the hardcoded 10 times.
When to finally fail?
After 10 tries. If it backs off exponentially, it would back off for roughly 17 minutes in total before finally failing.
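A rough sanity check on that 17 minute figure (a sketch assuming base-2 delays of 1, 2, 4, ... seconds and no jitter; the real delays depend on the backoff implementation):

# Cumulative wait across 10 retries with doubling delays (1s, 2s, 4s, ... 512s)
total_seconds = sum(2 ** n for n in range(10))
print(total_seconds)        # 1023 seconds
print(total_seconds / 60)   # ~17 minutes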
Yes, we need to add some sort of rate limiting to the HttpDownloaders.
What about having the downloaders do an exponential backoff and retry with the 429 errors?
+1. That's a good option.
- Tracker changed from Issue to Story
- % Done set to 0
- Project changed from Ansible Plugin to Pulp
- Subject changed from Github rate limits requests to As a plugin writer I have HTTPDownloaders which provide exponential backoff for HTTP 429 errors
- Description updated (diff)
Rewriting to be a core story for the downloaders since all plugins would benefit from this feature.
- Sprint Candidate changed from No to Yes
- Description updated (diff)
- Groomed changed from No to Yes
- Status changed from NEW to ASSIGNED
- Assignee set to bmbouter
- Sprint set to Sprint 34
Moving onto the sprint since it's groomed and a core beta deliverable. I'm also taking it as assigned and plan to post something on it tomorrow.
- Status changed from ASSIGNED to CLOSED - NOTABUG
- Assignee deleted (bmbouter)
- Sprint Candidate changed from Yes to No
- Sprint deleted (Sprint 34)
- Tags deleted (Pulp 3 MVP)
I couldn't reproduce this issue, and it's possible that the remote servers were having some issues the day it was a problem. In those situations, maybe the right thing to do is to fail rather than keep retrying anyway?
Since we can't reproduce the problem, I think we should hold off on adding retry support for now. I'm removing this from the sprint and closing as NOTABUG. We should reopen if the need for retry support of 429 errors comes up again.
- Status changed from CLOSED - NOTABUG to ASSIGNED
- Assignee set to bmbouter
- Sprint set to Sprint 35
- Tags Pulp 3 MVP added
After working through some pulp_ansible bugs I can now reproduce this regularly. I'm reopening to add the 429 retry support. Currently the tasks get a lot of errors like this one:
{
    "code": null,
    "description": "429, message='Too Many Requests'",
    "traceback": null
},
- Description updated (diff)
- Status changed from ASSIGNED to POST
Since we went with the 'backoff' library, really we just had to choose how many times we wanted to retry. Since we had already said 10, I just went with that. I updated the ticket description with this.
It got pulp_ansible syncing everything out of GitHub with no failures!
PR available at: https://github.com/pulp/pulp/pull/3425
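For anyone curious what this looks like in practice, here is a minimal sketch of retry-on-429 using the 'backoff' library around an aiohttp request. The fetch coroutine, its arguments, and the giveup helper are illustrative only, not the actual Pulp HttpDownloader code:

import aiohttp
import backoff


def giveup_unless_429(exc):
    # Keep retrying only for HTTP 429 (Too Many Requests); any other
    # client response error should fail immediately.
    return not (isinstance(exc, aiohttp.ClientResponseError) and exc.status == 429)


@backoff.on_exception(backoff.expo, aiohttp.ClientResponseError,
                      max_tries=10, giveup=giveup_unless_429)
async def fetch(session, url):
    # raise_for_status=True turns a 429 response into a ClientResponseError,
    # which the decorator above catches and retries with exponential backoff.
    async with session.get(url, raise_for_status=True) as response:
        return await response.read()

With max_tries=10 this lines up with the hardcoded retry count discussed in the description.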
- Status changed from POST to MODIFIED
- % Done changed from 0 to 100
- Sprint/Milestone set to 3.0.0
- Tags deleted (Pulp 3, Pulp 3 MVP)
- Status changed from MODIFIED to CLOSED - CURRENTRELEASE
Adds 429 Retry Support
All retry support is provided by the 'backoff' library, which also provides an asyncio-compatible interface.
This updates the docs and docstrings to call out the new retry functionality.
https://pulp.plan.io/issues/3421 closes #3421