Issue #8865

Range requests for on demand content return the full file (Kickstarts fail for on_demand repos)

Added by 12 days ago. Updated 2 days ago.

Start date:
Due date:
Estimated time:
2. Medium
Platform Release:
Sprint Candidate:
Sprint 98


When fetching an rpm from the content app and specifying the RANGE header, like so:

curl -k  -H "Range: bytes=1384-44339"   > foo.rpm 

if the package is being lazily downloaded, the entire file is returned and not JUST the range:

$ ls -l foo.rpm -h
-rw-rw-r--. 1 jlsherri jlsherri 646K Jun  4 09:03 foo.rpm

Once the package is downloaded, it behaves as you'd expect:

$ curl -k  -H "Range: bytes=1384-44339"   > foo2.rpm 

$ ls -l foo2.rpm 
-rw-rw-r--. 1 jlsherri jlsherri 42956 Jun  4 09:04 foo2.rpm
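
For reference, the 42956-byte size above is exactly what the requested range should produce, because both ends of a byte range are inclusive. A minimal sketch of that arithmetic follows; the helper names `parse_byte_range` and `expected_length` are made up for illustration and are not part of pulpcore or aiohttp:

```python
def parse_byte_range(header_value):
    """Parse a single 'bytes=START-END' Range value into (start, end), both inclusive.

    Hypothetical helper for illustration only. Open-ended ('bytes=100-'),
    suffix ('bytes=-500') and multi-range values are deliberately not handled.
    """
    unit, _, spec = header_value.partition("=")
    if unit.strip() != "bytes" or "," in spec:
        raise ValueError("unsupported Range value: %r" % header_value)
    start_text, _, end_text = spec.partition("-")
    if not start_text.strip() or not end_text.strip():
        raise ValueError("only closed ranges are handled: %r" % header_value)
    start, end = int(start_text), int(end_text)
    if start > end:
        raise ValueError("range start is after range end: %r" % header_value)
    return start, end


def expected_length(header_value):
    """Number of bytes a correct partial response should carry for this range."""
    start, end = parse_byte_range(header_value)
    return end - start + 1  # both offsets are inclusive


print(expected_length("bytes=1384-44339"))  # 42956, matching foo2.rpm above
```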

On el7 (at least, and probably other releases), yum/anaconda closes the connection as soon as it has received the number of bytes it requested. The content app is still writing the rest of the file at that point, which leads to this error:

[2021-06-04 11:57:02 +0000] [27275] [ERROR] Error handling request
Traceback (most recent call last):
File "/usr/lib64/python3.6/site-packages/aiohttp/", line 422, in _handle_request
resp = await self._request_handler(request)
File "/usr/lib64/python3.6/site-packages/aiohttp/", line 499, in _handle
resp = await handler(request)
File "/usr/lib/python3.6/site-packages/pulpcore/content/", line 138, in stream_content
return await self._match_and_stream(path, request)
File "/usr/lib/python3.6/site-packages/pulpcore/content/", line 387, in _match_and_stream
request, StreamResponse(headers=headers), ca
File "/usr/lib/python3.6/site-packages/pulpcore/content/", line 501, in _stream_content_artifact
response = await self._stream_remote_artifact(request, response, remote_artifact)
File "/usr/lib/python3.6/site-packages/pulpcore/content/", line 651, in _stream_remote_artifact
download_result = await
File "/usr/lib/python3.6/site-packages/pulpcore/download/", line 227, in run
return await self._run(extra_data=extra_data)
File "/usr/lib/python3.6/site-packages/pulp_rpm/app/", line 90, in _run
to_return = await self._handle_response(response)
File "/usr/lib/python3.6/site-packages/pulpcore/download/", line 189, in _handle_response
await self.handle_data(chunk)
File "/usr/lib/python3.6/site-packages/pulpcore/content/", line 636, in handle_data
await response.write(data)
File "/usr/lib64/python3.6/site-packages/aiohttp/", line 470, in write
await self._payload_writer.write(data)
File "/usr/lib64/python3.6/site-packages/aiohttp/", line 107, in write
File "/usr/lib64/python3.6/site-packages/aiohttp/", line 67, in _write
raise ConnectionResetError("Cannot write to closing transport")
ConnectionResetError: Cannot write to closing transport
[04/Jun/2021:11:57:02 +0000] "GET /pulp/content/Demo/Library/custom/CentOS7/main/Packages/s/sg3_utils-1.37-19.el7.x86_64.rpm HTTP/1.1" 500 0 "-" "urlgrabber/3.10 yum/3.4.3"

Because anaconda receives the entire rpm instead of just the range it requested (the rpm header), it retries the request, and Pulp keeps trying to return the entire file.
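
To illustrate the behavior the client expects here, this is a rough sketch of trimming a chunked stream down to the requested range and stopping once the range is satisfied. It is not pulpcore's code and not a proposed patch. `slice_chunks` is a hypothetical name, and it works on a plain iterable of byte strings rather than on the real downloader or aiohttp response objects:

```python
def slice_chunks(chunks, start, end):
    """Yield only the bytes at offsets start..end (inclusive) from an iterable of chunks.

    Hypothetical helper: `chunks` stands in for the data a streaming download
    hands over piece by piece.
    """
    offset = 0  # absolute offset of the first byte of the current chunk
    for chunk in chunks:
        chunk_start = offset
        chunk_end = offset + len(chunk)  # exclusive
        offset = chunk_end
        if chunk_end <= start:
            continue  # chunk lies entirely before the requested range
        lo = max(start - chunk_start, 0)
        hi = min(end + 1 - chunk_start, len(chunk))
        piece = chunk[lo:hi]
        if piece:
            yield piece
        if chunk_end > end:
            return  # range satisfied, nothing more to send


# Three 4-byte chunks, range 2-9 inclusive -> 8 bytes
parts = list(slice_chunks([b"abcd", b"efgh", b"ijkl"], 2, 9))
print(b"".join(parts))  # b'cdefghij'
```

Whether the remaining upstream bytes should still be read (for example, to save the complete artifact) is a separate design question that this sketch does not address.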


#2 Updated by dalley 12 days ago

I'm not able to reproduce this on latest master, I will try with 3.11

In [1]: Artifact.objects.all() 
Out[1]: <QuerySet [<Artifact: pk=864fb941-43c8-4ff6-b747-0c8e755881c4>]>

In [2]: exit

## A completely different file from ^^ one, this one has never been downloaded before

(pulp) [vagrant@pulp3-source-centos7 pulpcore]$ curl -k   -H "Range: bytes=1-200" > duck-0.8-1.noarch.rpm                                                                                                                        
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   200  100   200    0     0    464      0 --:--:-- --:--:-- --:--:--   464
(pulp) [vagrant@pulp3-source-centos7 pulpcore]$ ls -al
total 240
... snip ...
-rw-rw-r--. 1 vagrant vagrant   200 Jun  4 16:16 duck-0.8-1.noarch.rpm
-rw-rw-r--. 1 vagrant vagrant   200 Jun  4 16:09 fox-1.1-2.noarch.rpm
... snip ...

(pulp) [vagrant@pulp3-source-centos7 pulpcore]$ python shell_plus                                          
... snip ...

In [1]: Artifact.objects.all()
Out[1]: <QuerySet [<Artifact: pk=864fb941-43c8-4ff6-b747-0c8e755881c4>, <Artifact: pk=47e95aca-8bf2-4268-8a74-2238df24eeb7>]>

Both files were streamed but both provided 200 bytes back as requested by curl.
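
Checking the file size alone can hide a wrong status code. A small sketch of a stricter check: a correct partial response carries status 206 and a `Content-Range` header of the form `bytes START-END/TOTAL`. The function name `is_expected_partial` and the total size used in the example are made up for illustration:

```python
import re


def is_expected_partial(status, content_range, body_length, start, end):
    """Return True if a response looks like a correct reply to 'bytes=start-end'.

    Hypothetical helper: the caller passes in the status code, the
    Content-Range header value (or None) and the body length it observed.
    """
    match = re.fullmatch(r"bytes (\d+)-(\d+)/(\d+|\*)", content_range or "")
    if status != 206 or match is None:
        return False
    got_start, got_end = int(match.group(1)), int(match.group(2))
    return (got_start, got_end) == (start, end) and body_length == end - start + 1


# 2488 is an invented total size, used only to build a well-formed header.
print(is_expected_partial(206, "bytes 1-200/2488", 200, 1, 200))  # True
print(is_expected_partial(200, None, 2488, 1, 200))               # False
```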

#3 Updated by dalley 11 days ago

I'm not entirely sure if ^^ was a fluke or not, and maybe it is reproducible on other versions, but in any case I did verify this behavior on 3.11

#4 Updated by 8 days ago

  • Status changed from NEW to ASSIGNED
  • Assignee set to dalley
  • Triaged changed from No to Yes
  • Sprint set to Sprint 98

#5 Updated by 8 days ago

I was also able to reproduce this (on 3.11), by

  1. syncing an on_demand repo
  2. changing the repo to immediate
  3. syncing it again

The behavior was exactly the same, which surprised me

#6 Updated by dalley 8 days ago

It was a fluke, I can reproduce this on master.

#7 Updated by dalley 8 days ago

  • Status changed from ASSIGNED to POST

#9 Updated by dalley 2 days ago

  • Status changed from POST to NEW
  • Assignee deleted (dalley)

I won't have time to work on this, but I'll go ahead and get the test merged in
