Story #3421

Updated by bmbouter about 6 years ago

When HTTP downloads fail with 429 (Too Many Requests) responses, the HttpDownloader could wait and retry later, possibly with success. This would let downloads complete over time, instead of the first few succeeding and every subsequent request being rejected. For example, this is how GitHub rate-limits you if you try to download too many files from it at once.

 h3. Use Cases 

 * The downloaders back off exponentially and retry when an HTTP 429 response is encountered
 * As a plugin writer, I can disable the exponential backoff and retry feature 

 h3. Implementation Options 

 Implementing "coordinated rate limiting":https://quentin.pradet.me/blog/how-do-you-rate-limit-calls-with-aiohttp.html is straightforward, but it would then probably need to be configured site by site, by either users or plugin writers. This is a pain and prone to many problems. Instead, a dumb exponential backoff can make rejected requests retry automatically, up to 10 times, while the server is overloaded.
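A minimal sketch of the dumb backoff described above, assuming the download is an async callable that raises an exception carrying the HTTP status code. `HttpError`, `download_with_backoff`, `retry_on_429` and the injectable `sleep` parameter are hypothetical names for illustration, not the actual HttpDownloader API.

<pre><code class="python">
import asyncio


class HttpError(Exception):
    """Hypothetical stand-in for the error raised on a non-2xx response."""

    def __init__(self, status):
        super().__init__(f"HTTP {status}")
        self.status = status


async def download_with_backoff(fetch, max_retries=10, retry_on_429=True,
                                sleep=asyncio.sleep):
    """Await fetch(), retrying on HTTP 429 with exponential backoff.

    Sleeps 1, 2, 4, ... seconds between attempts. Once max_retries retries
    have been used, the last 429 is re-raised. Other errors are never retried.
    """
    attempt = 0
    while True:
        try:
            return await fetch()
        except HttpError as exc:
            # Give up on anything that is not a 429, when retrying is
            # disabled, or when the retry budget is exhausted.
            if exc.status != 429 or not retry_on_429 or attempt >= max_retries:
                raise
            await sleep(2 ** attempt)  # 1, 2, 4, ..., 512 seconds
            attempt += 1
</code></pre>

A plugin writer who wants fail-fast behavior would pass `retry_on_429=False`. Injecting `sleep` keeps the function testable without actually waiting 17 minutes.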

 We could use something like "backoff-async":https://pypi.python.org/pypi/backoff-async/2.0.0 and pass its options through to plugin writers, but I'm hoping for something really simple, hence the hardcoded limit of 10 retries.

 h3. When to finally fail? 

 After 10 retries. When the HttpDownloader gets a 429 over and over, it backs off exponentially, sleeping for roughly X seconds before each retry, where X takes the exponential values 1, 2, 4, 8, 16, 32, ...

 <pre><code class="python"> 
 >>> delays = [2 ** i for i in range(10)]  # seconds
 >>> delays
 [1, 2, 4, 8, 16, 32, 64, 128, 256, 512]
 >>> sum(delays) / 60  # total wait in minutes
 17.05
 </code></pre> 

 This gives up after 10 retries, so each downloader waits at most about 17 minutes in total before letting the 429 exception be raised uncaught.
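The schedule says the downloader sleeps for *roughly* 2^i seconds. One common way to get "roughly" (an assumption here; this story does not specify it) is to scale each delay by a random factor, so downloaders that were rejected at the same moment spread their retries out instead of hitting the server in lockstep. With a factor in [0.5, 1.0], the 17-minute figure stays a hard upper bound. `backoff_delays` and its parameters are hypothetical.

<pre><code class="python">
import random


def backoff_delays(max_retries=10, base=1.0, rng=random):
    """Return the sleep before each retry: roughly base * 2**i seconds.

    Each delay is scaled by a random factor in [0.5, 1.0], so the total
    never exceeds the un-jittered worst case (1023 s for 10 retries).
    """
    return [base * 2 ** i * rng.uniform(0.5, 1.0) for i in range(max_retries)]


delays = backoff_delays(rng=random.Random(0))  # seeded for reproducibility
print(f"{len(delays)} retries, {sum(delays) / 60:.1f} minutes of waiting in total")
</code></pre>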
