Issue #8865
closed
Range requests for on demand content return the full file (Kickstarts fail for on_demand repos)
Description
When fetching an rpm from the content app and specifying the RANGE header, like so:
curl -k http://foreman-nuc2.usersys.redhat.com/pulp/content/Demo/Library/custom/CentOS7/main/Packages/s/sg3_utils-1.37-19.el7.x86_64.rpm -H "Range: bytes=1384-44339" > foo.rpm
if the package is being lazily downloaded, the entire file is returned and not JUST the range:
$ ls -l foo.rpm -h
-rw-rw-r--. 1 jlsherri jlsherri 646K Jun 4 09:03 foo.rpm
Once the package is downloaded, it behaves as you'd expect:
$ curl -k http://foreman-nuc2.usersys.redhat.com/pulp/content/Demo/Library/custom/CentOS7/main/Packages/s/sg3_utils-1.37-19.el7.x86_64.rpm -H "Range: bytes=1384-44339" > foo2.rpm
$ ls -l foo2.rpm
-rw-rw-r--. 1 jlsherri jlsherri 42956 Jun 4 09:04 foo2.rpm
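For reference, the 42956-byte size of the correct response follows directly from the Range header, since both offsets are inclusive. A minimal sketch of that arithmetic (the helper names `parse_byte_range` and `expected_length` are made up for illustration and are not part of pulpcore; only a single closed `bytes=start-end` range is handled):

```python
import re


def parse_byte_range(header_value):
    """Parse a single closed range such as 'bytes=1384-44339' into (start, end)."""
    match = re.fullmatch(r"bytes=(\d+)-(\d+)", header_value.strip())
    if match is None:
        raise ValueError(f"unsupported Range header: {header_value!r}")
    start, end = int(match.group(1)), int(match.group(2))
    if end < start:
        raise ValueError("range end precedes range start")
    return start, end


def expected_length(header_value):
    """Number of bytes a correct partial response should carry."""
    start, end = parse_byte_range(header_value)
    # Both offsets are inclusive, hence the + 1.
    return end - start + 1


print(expected_length("bytes=1384-44339"))
```

Any response body larger than this value for the request above means the range was ignored.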
For el7 (at least, probably more), this causes yum/anaconda to hang up the connection as soon as it gets the amount of requested data, which makes the content app really unhappy and leads to this error:
[2021-06-04 11:57:02 +0000] [27275] [ERROR] Error handling request
Traceback (most recent call last):
File "/usr/lib64/python3.6/site-packages/aiohttp/web_protocol.py", line 422, in _handle_request
resp = await self._request_handler(request)
File "/usr/lib64/python3.6/site-packages/aiohttp/web_app.py", line 499, in _handle
resp = await handler(request)
File "/usr/lib/python3.6/site-packages/pulpcore/content/handler.py", line 138, in stream_content
return await self._match_and_stream(path, request)
File "/usr/lib/python3.6/site-packages/pulpcore/content/handler.py", line 387, in _match_and_stream
request, StreamResponse(headers=headers), ca
File "/usr/lib/python3.6/site-packages/pulpcore/content/handler.py", line 501, in _stream_content_artifact
response = await self._stream_remote_artifact(request, response, remote_artifact)
File "/usr/lib/python3.6/site-packages/pulpcore/content/handler.py", line 651, in _stream_remote_artifact
download_result = await downloader.run()
File "/usr/lib/python3.6/site-packages/pulpcore/download/base.py", line 227, in run
return await self._run(extra_data=extra_data)
File "/usr/lib/python3.6/site-packages/pulp_rpm/app/downloaders.py", line 90, in _run
to_return = await self._handle_response(response)
File "/usr/lib/python3.6/site-packages/pulpcore/download/http.py", line 189, in _handle_response
await self.handle_data(chunk)
File "/usr/lib/python3.6/site-packages/pulpcore/content/handler.py", line 636, in handle_data
await response.write(data)
File "/usr/lib64/python3.6/site-packages/aiohttp/web_response.py", line 470, in write
await self._payload_writer.write(data)
File "/usr/lib64/python3.6/site-packages/aiohttp/http_writer.py", line 107, in write
self._write(chunk)
File "/usr/lib64/python3.6/site-packages/aiohttp/http_writer.py", line 67, in _write
raise ConnectionResetError("Cannot write to closing transport")
ConnectionResetError: Cannot write to closing transport
[04/Jun/2021:11:57:02 +0000] "GET /pulp/content/Demo/Library/custom/CentOS7/main/Packages/s/sg3_utils-1.37-19.el7.x86_64.rpm HTTP/1.1" 500 0 "-" "urlgrabber/3.10 yum/3.4.3"
and since anaconda receives the entire rpm instead of just the range it requested (the rpm header), it retries the request, and pulp keeps trying to return the entire file.
Updated by dalley over 3 years ago
I'm not able to reproduce this on latest master. I will try with 3.11.
In [1]: Artifact.objects.all()
Out[1]: <QuerySet [<Artifact: pk=864fb941-43c8-4ff6-b747-0c8e755881c4>]>
In [2]: exit
## A completely different file from ^^ one, this one has never been downloaded before
(pulp) [vagrant@pulp3-source-centos7 pulpcore]$ curl -k http://pulp3-source-centos7.localhost.example.com/pulp/content/fixture/duck-0.8-1.noarch.rpm -H "Range: bytes=1-200" > duck-0.8-1.noarch.rpm
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 200 100 200 0 0 464 0 --:--:-- --:--:-- --:--:-- 464
(pulp) [vagrant@pulp3-source-centos7 pulpcore]$ ls -al
total 240
... snip ...
-rw-rw-r--. 1 vagrant vagrant 200 Jun 4 16:16 duck-0.8-1.noarch.rpm
-rw-rw-r--. 1 vagrant vagrant 200 Jun 4 16:09 fox-1.1-2.noarch.rpm
... snip ...
(pulp) [vagrant@pulp3-source-centos7 pulpcore]$ python manage.py shell_plus
... snip ...
In [1]: Artifact.objects.all()
Out[1]: <QuerySet [<Artifact: pk=864fb941-43c8-4ff6-b747-0c8e755881c4>, <Artifact: pk=47e95aca-8bf2-4268-8a74-2238df24eeb7>]>
Both files were streamed but both provided 200 bytes back as requested by curl.
Updated by dalley over 3 years ago
I'm not entirely sure if ^^ was a fluke or not, and maybe it is reproducible on other versions, but in any case I did verify this behavior on 3.11
Updated by dkliban@redhat.com over 3 years ago
- Status changed from NEW to ASSIGNED
- Assignee set to dalley
- Triaged changed from No to Yes
- Sprint set to Sprint 98
Updated by jsherril@redhat.com over 3 years ago
I was also able to reproduce this (on 3.11), by
- syncing an on_demand repo
- changing the repo's policy to immediate
- syncing it again
The behavior was exactly the same, which surprised me
Updated by dalley over 3 years ago
It was a fluke, I can reproduce this on master.
Updated by pulpbot over 3 years ago
Updated by dalley over 3 years ago
- Status changed from POST to NEW
- Assignee deleted (dalley)
I won't have time to work on this, but I'll go ahead and get the test merged in https://github.com/pulp/pulpcore/pull/1399
Updated by bmbouter over 3 years ago
- Status changed from NEW to ASSIGNED
- Assignee set to bmbouter
Updated by bmbouter over 3 years ago
So what's the recommended behavior here?
1. Have it stream and save all of the file even though the client asked for only a portion of it (and serve just that portion to the client)?
2. Have it fetch only what the client is asking for every time (basically ignoring policy=immediate)?
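The first option can be sketched roughly as follows: every upstream chunk is written to local storage, but only the bytes inside the requested range are passed on to the client. This is an illustration under stated assumptions, not the pulpcore implementation; `tee_range` is a hypothetical name, `sink` stands in for wherever the artifact would be saved, and the range is assumed to be a single inclusive `start`-`end` pair.

```python
import io


def tee_range(chunks, start, end, sink):
    """Write every chunk to sink; yield only the bytes in inclusive [start, end]."""
    offset = 0  # absolute offset of the first byte of the current chunk
    for chunk in chunks:
        sink.write(chunk)  # always keep the complete file
        lo = max(start - offset, 0)
        hi = min(end + 1 - offset, len(chunk))
        if lo < hi:
            # Part of this chunk overlaps the requested range.
            yield chunk[lo:hi]
        offset += len(chunk)


data = bytes(range(256)) * 4
upstream = [data[i:i + 100] for i in range(0, len(data), 100)]
saved = io.BytesIO()
served = b"".join(tee_range(upstream, 150, 420, saved))
print(len(served), len(saved.getvalue()))
```

Note that the generator has to be drained to the end for the saved copy to be complete, so a real handler would need to keep consuming the download even after the last requested byte has been sent.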
Updated by dalley over 3 years ago
IMO, option 1 is the better one. But then, maybe it depends on why exactly Anaconda is making the range requests. Does it make range requests against every RPM to read the headers, or just ones that it thinks it may need to install?
Updated by jsherril@redhat.com over 3 years ago
I think it only fetches the ones it thinks it may need to install. I agree that option 1 is probably preferred, but I don't think that option 2 is that terrible (assuming the header request will use the on-disk rpm if available; that was a little unclear).
Most likely the header will be requested and then the rpm later on.
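For the case where the rpm is already on disk, serving a range reduces to a seek and a bounded read. A minimal sketch, assuming a plain local file and a single inclusive byte range (`read_range` is a hypothetical helper, not a pulpcore function):

```python
import os
import tempfile


def read_range(path, start, end):
    """Return the bytes at inclusive offsets [start, end] of the file at path."""
    with open(path, "rb") as f:
        f.seek(start)
        # Inclusive end offset, hence the + 1.
        return f.read(end - start + 1)


# Small demonstration against a throwaway file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(bytes(range(256)) * 2)
    demo_path = tmp.name

print(len(read_range(demo_path, 1, 200)))
os.remove(demo_path)
```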
Updated by bmbouter over 3 years ago
I'm going to pursue option 1 as it will result in fewer requests to external servers over time.
Also, I think we need to get the response headers right, so I'm going to mimic what an official CentOS mirror returns, for example:
$ curl -i https://packages.oit.ncsu.edu/centos/7/os/x86_64/Packages/sg3_utils-1.37-19.el7.x86_64.rpm -H "Range: bytes=1384-44339"
HTTP/1.1 206 Partial Content
Date: Thu, 08 Jul 2021 14:35:58 GMT
Server: Apache
Last-Modified: Fri, 03 Apr 2020 21:08:05 GMT
ETag: "a16b8-5a269504aa76b"
Accept-Ranges: bytes
Content-Length: 42956
Content-Range: bytes 1384-44339/661176
Content-Type: application/x-rpm
Warning: Binary output can mess up your terminal. Use "--output -" to tell
Warning: curl to output it to your terminal anyway, or consider "--output
Warning: <FILE>" to save to a file.
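The range-specific headers in that response can be derived from the requested offsets and the total file size. A minimal sketch of the calculation (the function name `partial_content_headers` is an assumption for illustration; it covers only a single inclusive byte range and leaves out Date, ETag, Last-Modified and Content-Type):

```python
def partial_content_headers(start, end, total_size):
    """Build the range-related headers for a 206 Partial Content response."""
    if not 0 <= start <= end < total_size:
        raise ValueError("range not satisfiable")
    return {
        "Accept-Ranges": "bytes",
        # Inclusive offsets, so the body length is end - start + 1.
        "Content-Length": str(end - start + 1),
        "Content-Range": f"bytes {start}-{end}/{total_size}",
    }


for name, value in partial_content_headers(1384, 44339, 661176).items():
    print(f"{name}: {value}")
```

With the values from the mirror response above, this gives a Content-Length of 42956 and a Content-Range of bytes 1384-44339/661176.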
Updated by pulpbot over 3 years ago
- Status changed from ASSIGNED to POST
Added by bmbouter over 3 years ago
Updated by bmbouter over 3 years ago
- Status changed from POST to MODIFIED
Applied in changeset pulpcore|df2c8f109ec14e6bdb204058eb575d62fb58ad1a.
Updated by dalley over 3 years ago
- Related to Backport #9057: Backport 8865 "incorrect responses to range requests for on_demand content" to 3.14.z added
Updated by ipanova@redhat.com over 3 years ago
- Sprint/Milestone changed from 3.14.2 to 3.15.0
Updated by pulpbot about 3 years ago
- Status changed from MODIFIED to CLOSED - CURRENTRELEASE
Adds Range header support to content app
This unskips the Range header and adds support for it to the content app.
closes #8865