pulpcore.plugin.download.asyncio has surprising timeout handling
aiohttp defines two timeouts in the ClientSession: read_timeout and conn_timeout.
read_timeout is cumulative for the entire connection, i.e. all data must be read within this timeout.
For long-lived connections used for downloads (e.g. connections using the streaming API to fetch the data in smaller chunks), this can trigger a timeout on a connection that is perfectly fine, because the timeout is global rather than per read operation.
async def single_download():
    async with aiohttp.ClientSession(read_timeout=3, conn_timeout=3) as session:
        downloader = HttpDownloader('http://127.0.0.1:8000/large_file.txt', session=session)
        return await downloader.run()

loop = asyncio.get_event_loop()
result = loop.run_until_complete(single_download())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python3.6/asyncio/base_events.py", line 467, in run_until_complete
    return future.result()
  File "<stdin>", line 4, in single_download
  File "/home/user/github/pulp-3-from-source/venv/src/pulpcore/plugin/pulpcore/plugin/download/asyncio/base.py", line 50, in wrapper
    raise error
  File "/home/user/github/pulp-3-from-source/venv/src/pulpcore/plugin/pulpcore/plugin/download/asyncio/base.py", line 47, in wrapper
    return await func(downloader)
  File "/home/user/github/pulp-3-from-source/venv/src/pulpcore/plugin/pulpcore/plugin/download/asyncio/http.py", line 130, in run
    to_return = await self._handle_response(response)
  File "/home/user/github/pulp-3-from-source/venv/src/pulpcore/plugin/pulpcore/plugin/download/asyncio/http.py", line 112, in _handle_response
    chunk = await response.content.read(1024 * 1024)
  File "/home/user/github/pulp-3-from-source/venv/lib64/python3.6/site-packages/aiohttp/streams.py", line 607, in read
    return (yield from super().read(n))
  File "/home/user/github/pulp-3-from-source/venv/lib64/python3.6/site-packages/aiohttp/streams.py", line 330, in read
    yield from self._wait('read')
  File "/home/user/github/pulp-3-from-source/venv/lib64/python3.6/site-packages/aiohttp/streams.py", line 259, in _wait
    yield from waiter
  File "/home/user/github/pulp-3-from-source/venv/lib64/python3.6/site-packages/aiohttp/helpers.py", line 727, in __exit__
    raise asyncio.TimeoutError from None
concurrent.futures._base.TimeoutError
after approx. 3 seconds.
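The root cause can be reproduced without aiohttp or Pulp at all. The following stdlib-only sketch (all names are illustrative) simulates a healthy but slow stream: every individual read completes quickly, yet a cumulative timeout over the whole transfer still fires, just like read_timeout above:

```python
import asyncio


async def slow_stream(n_chunks, delay):
    # Simulates a healthy but slow source: each chunk arrives after `delay` seconds.
    for _ in range(n_chunks):
        await asyncio.sleep(delay)
        yield b"data"


async def read_all_with_global_timeout(timeout):
    # One cumulative timeout over the *entire* transfer, like read_timeout.
    async def consume():
        chunks = []
        async for chunk in slow_stream(5, 0.05):
            chunks.append(chunk)
        return chunks

    return await asyncio.wait_for(consume(), timeout)


# Five chunks at 0.05 s each take ~0.25 s in total; a 0.1 s cumulative
# timeout fires even though each individual read finishes well within it.
try:
    asyncio.run(read_all_with_global_timeout(0.1))
    result = "ok"
except asyncio.TimeoutError:
    result = "timed out"
print(result)  # timed out
```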
One could use a high per-connection timeout plus an additional timeout around each read() for long-running connections. Something along these lines:
async with client.get(download_url, timeout=long_timeout) as resp:
    try:
        ...
        new_data = await asyncio.wait_for(resp.content.read(self.CHUNK_SIZE), per_read_timeout)
        ...
    except asyncio.TimeoutError:
        ...
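As a self-contained illustration of that pattern (stdlib only, all names illustrative): wrapping each individual read in asyncio.wait_for lets an arbitrarily long transfer succeed, as long as every single chunk arrives within the per-read budget:

```python
import asyncio


async def slow_stream(n_chunks, delay):
    # Healthy but slow source: each chunk arrives after `delay` seconds.
    for _ in range(n_chunks):
        await asyncio.sleep(delay)
        yield b"data"


async def read_all_per_read_timeout(per_read_timeout):
    # The timeout applies to each read separately, not to the whole transfer.
    chunks = []
    stream = slow_stream(10, 0.05)
    while True:
        try:
            chunk = await asyncio.wait_for(stream.__anext__(), per_read_timeout)
        except StopAsyncIteration:
            break
        chunks.append(chunk)
    return chunks


# Ten chunks take ~0.5 s in total, far longer than the 0.2 s per-read budget,
# yet no timeout fires because each single read completes in ~0.05 s.
chunks = asyncio.run(read_all_per_read_timeout(0.2))
print(len(chunks))  # 10
```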
#16 Updated by bmbouter over 1 year ago
- Status changed from NEW to CLOSED - WONTFIX
FYI, in the aiohttp docs now (3.3.2) I see additional options named ['total', 'connect', 'sock_connect', 'sock_read']: https://aiohttp.readthedocs.io/en/stable/client_quickstart.html#timeouts
@gmbnomis I can see how the plugin writer can do things that make no sense with those types of options. I think they are aiohttp options, so there isn't much Pulp can do to resolve them. aiohttp could have some sanity validation added though, if it doesn't already do that. I recently filed an issue in upstream aiohttp and it was a good experience. Their tracker is here: https://github.com/aio-libs/aiohttp/issues/
In terms of what Pulp can do for this, I can only see closing this as WONTFIX since we aren't in control of those options. It's good to leave here as a reference though and WONTFIX will do that. @gmbnomis, if you have other ideas of what should happen please post. I just don't see another option.
#17 Updated by gmbnomis over 1 year ago
@bmbouter This issue is not about preventing plugin writers from doing stupid things (life experience tells me that this would be futile...). It was intended as a kind of "heads up" that aiohttp timeouts seem not to be made for long downloads (at that time at least). And apparently that is what happened in #3918.
According to the documentation you linked, the sock_read timeout seems to implement the per-read timeout that the wait_for in the code I posted above provides. Thus, there should be no need for that workaround anymore.
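For reference, with aiohttp 3.3+ the equivalent configuration would look roughly like this (the values are illustrative, not Pulp's actual defaults):

```python
import aiohttp

# No overall deadline for the transfer, but abort if establishing the
# connection takes more than 5 s or a single socket read stalls for 30 s.
timeout = aiohttp.ClientTimeout(total=None, connect=5, sock_read=30)

# The timeout object is passed to the session, e.g.:
#
# async with aiohttp.ClientSession(timeout=timeout) as session:
#     async with session.get(url) as resp:
#         async for chunk in resp.content.iter_chunked(1024 * 1024):
#             ...
print(timeout.sock_read)  # 30
```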
Nevertheless, I think Pulp should have sane default timeout definitions in the downloader for the typical use case (in order to ease the life of plugin writers). Probably, a very long or no total timeout. Connection and read timeouts should probably be in the range of seconds (or minutes?). Are there proven nectar default timeout settings you could reuse?