Issue #8865

closed

Range requests for on demand content return the full file (Kickstarts fail for on_demand repos)

Added by jsherril@redhat.com almost 3 years ago. Updated over 2 years ago.

Status: CLOSED - CURRENTRELEASE
Priority: High
Assignee:
Category: -
Sprint/Milestone:
Start date:
Due date:
Estimated time:
Severity: 2. Medium
Version:
Platform Release:
OS:
Triaged: Yes
Groomed: No
Sprint Candidate: No
Tags: Katello
Sprint: Sprint 100
Quarter:

Description

When fetching an rpm from the content app and specifying the RANGE header, like so:

curl -k  http://foreman-nuc2.usersys.redhat.com/pulp/content/Demo/Library/custom/CentOS7/main/Packages/s/sg3_utils-1.37-19.el7.x86_64.rpm  -H "Range: bytes=1384-44339"   > foo.rpm 

if the package is being lazily downloaded, the entire file is returned and not JUST the range:

$ ls -l foo.rpm -h
-rw-rw-r--. 1 jlsherri jlsherri 646K Jun  4 09:03 foo.rpm

Once the package is downloaded, it behaves as you'd expect:

$ curl -k  http://foreman-nuc2.usersys.redhat.com/pulp/content/Demo/Library/custom/CentOS7/main/Packages/s/sg3_utils-1.37-19.el7.x86_64.rpm  -H "Range: bytes=1384-44339"   > foo2.rpm 


$ ls -l foo2.rpm 
-rw-rw-r--. 1 jlsherri jlsherri 42956 Jun  4 09:04 foo2.rpm
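
As a quick check on the sizes above: a byte range is inclusive at both ends, so "bytes=1384-44339" should yield 44339 - 1384 + 1 = 42956 bytes, which matches foo2.rpm. The sketch below shows that arithmetic; the helper names are hypothetical and it handles only the simple single-range "bytes=START-END" form used in this report.

```python
import re


def parse_byte_range(header_value):
    """Parse a single-range header such as 'bytes=1384-44339' into (start, end).

    Hypothetical helper for illustration; suffix ranges ('bytes=-500'),
    open-ended ranges ('bytes=100-') and multi-range headers are not handled.
    """
    match = re.fullmatch(r"bytes=(\d+)-(\d+)", header_value.strip())
    if match is None:
        raise ValueError(f"unsupported Range header: {header_value!r}")
    start, end = int(match.group(1)), int(match.group(2))
    if end < start:
        raise ValueError("range end precedes range start")
    return start, end


def expected_length(header_value):
    """Number of bytes a correct partial response should carry."""
    start, end = parse_byte_range(header_value)
    return end - start + 1  # both ends are inclusive


print(expected_length("bytes=1384-44339"))  # 42956
```

Any response body larger than this (such as the 646K foo.rpm) means the Range header was ignored.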

For el7 (at least, probably more), this causes yum/anaconda to hang up the connection as soon as it gets the amount of requested data, which makes the content app really unhappy and leads to this error:

[2021-06-04 11:57:02 +0000] [27275] [ERROR] Error handling request
Traceback (most recent call last):
File "/usr/lib64/python3.6/site-packages/aiohttp/web_protocol.py", line 422, in _handle_request
resp = await self._request_handler(request)
File "/usr/lib64/python3.6/site-packages/aiohttp/web_app.py", line 499, in _handle
resp = await handler(request)
File "/usr/lib/python3.6/site-packages/pulpcore/content/handler.py", line 138, in stream_content
return await self._match_and_stream(path, request)
File "/usr/lib/python3.6/site-packages/pulpcore/content/handler.py", line 387, in _match_and_stream
request, StreamResponse(headers=headers), ca
File "/usr/lib/python3.6/site-packages/pulpcore/content/handler.py", line 501, in _stream_content_artifact
response = await self._stream_remote_artifact(request, response, remote_artifact)
File "/usr/lib/python3.6/site-packages/pulpcore/content/handler.py", line 651, in _stream_remote_artifact
download_result = await downloader.run()
File "/usr/lib/python3.6/site-packages/pulpcore/download/base.py", line 227, in run
return await self._run(extra_data=extra_data)
File "/usr/lib/python3.6/site-packages/pulp_rpm/app/downloaders.py", line 90, in _run
to_return = await self._handle_response(response)
File "/usr/lib/python3.6/site-packages/pulpcore/download/http.py", line 189, in _handle_response
await self.handle_data(chunk)
File "/usr/lib/python3.6/site-packages/pulpcore/content/handler.py", line 636, in handle_data
await response.write(data)
File "/usr/lib64/python3.6/site-packages/aiohttp/web_response.py", line 470, in write
await self._payload_writer.write(data)
File "/usr/lib64/python3.6/site-packages/aiohttp/http_writer.py", line 107, in write
self._write(chunk)
File "/usr/lib64/python3.6/site-packages/aiohttp/http_writer.py", line 67, in _write
raise ConnectionResetError("Cannot write to closing transport")
ConnectionResetError: Cannot write to closing transport
[04/Jun/2021:11:57:02 +0000] "GET /pulp/content/Demo/Library/custom/CentOS7/main/Packages/s/sg3_utils-1.37-19.el7.x86_64.rpm HTTP/1.1" 500 0 "-" "urlgrabber/3.10 yum/3.4.3"

Since anaconda receives the entire rpm instead of just the range it requested (the rpm header), it retries the request, and Pulp keeps trying to return the entire file.


Related issues

Related to Pulp - Backport #9057: Backport 8865 "incorrect responses to range requests for on_demand content" to 3.14.z | CLOSED - CURRENTRELEASE | daviddavis

Actions #2

Updated by dalley almost 3 years ago

I'm not able to reproduce this on latest master, I will try with 3.11

In [1]: Artifact.objects.all() 
Out[1]: <QuerySet [<Artifact: pk=864fb941-43c8-4ff6-b747-0c8e755881c4>]>

In [2]: exit


## A completely different file from ^^ one, this one has never been downloaded before

(pulp) [vagrant@pulp3-source-centos7 pulpcore]$ curl -k http://pulp3-source-centos7.localhost.example.com/pulp/content/fixture/duck-0.8-1.noarch.rpm   -H "Range: bytes=1-200" > duck-0.8-1.noarch.rpm                                                                                                                        
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   200  100   200    0     0    464      0 --:--:-- --:--:-- --:--:--   464
(pulp) [vagrant@pulp3-source-centos7 pulpcore]$ ls -al
total 240
... snip ...
-rw-rw-r--. 1 vagrant vagrant   200 Jun  4 16:16 duck-0.8-1.noarch.rpm
-rw-rw-r--. 1 vagrant vagrant   200 Jun  4 16:09 fox-1.1-2.noarch.rpm
... snip ...

(pulp) [vagrant@pulp3-source-centos7 pulpcore]$ python manage.py shell_plus                                          
... snip ...

In [1]: Artifact.objects.all()
Out[1]: <QuerySet [<Artifact: pk=864fb941-43c8-4ff6-b747-0c8e755881c4>, <Artifact: pk=47e95aca-8bf2-4268-8a74-2238df24eeb7>]>

Both files were streamed but both provided 200 bytes back as requested by curl.

Actions #3

Updated by dalley almost 3 years ago

I'm not entirely sure if ^^ was a fluke or not, and maybe it is reproducible on other versions, but in any case I did verify this behavior on 3.11

Actions #4

Updated by dkliban@redhat.com almost 3 years ago

  • Status changed from NEW to ASSIGNED
  • Assignee set to dalley
  • Triaged changed from No to Yes
  • Sprint set to Sprint 98
Actions #5

Updated by jsherril@redhat.com almost 3 years ago

I was also able to reproduce this (on 3.11), by

  1. syncing an on_demand repo
  2. changing the repo to immediate
  3. syncing it again

The behavior was exactly the same, which surprised me

Actions #6

Updated by dalley almost 3 years ago

It was a fluke, I can reproduce this on master.

https://github.com/pulp/pulpcore/pull/1399

Actions #7

Updated by dalley almost 3 years ago

  • Status changed from ASSIGNED to POST
Actions #9

Updated by dalley almost 3 years ago

  • Status changed from POST to NEW
  • Assignee deleted (dalley)

I won't have time to work on this, but I'll go ahead and get the test merged in https://github.com/pulp/pulpcore/pull/1399

Actions #10

Updated by rchan almost 3 years ago

  • Sprint changed from Sprint 98 to Sprint 99
Actions #11

Updated by dalley over 2 years ago

  • Priority changed from Normal to High
Actions #12

Updated by rchan over 2 years ago

  • Sprint changed from Sprint 99 to Sprint 100
Actions #13

Updated by bmbouter over 2 years ago

  • Status changed from NEW to ASSIGNED
  • Assignee set to bmbouter
Actions #14

Updated by bmbouter over 2 years ago

So what's the recommended behavior here?

  1. Have it stream and save all of the file even though the client asked for a portion of it (and serve just that to the client)?
  2. Have it fetch what the client is asking for every time (basically ignoring policy=immediate)?
Actions #15

Updated by dalley over 2 years ago

IMO, #1 is the better option. But then, maybe it depends on why exactly Anaconda is making the range requests. Does it make range requests against every RPM to read the headers, or just ones that it thinks it may need to install?

Actions #16

Updated by jsherril@redhat.com over 2 years ago

I think it only fetches the ones it thinks it may need to install. I agree that option 1 is probably preferred, but I don't think option 2 is that terrible (assuming the header request will use the on-disk rpm if available; that was a little unclear).

Most likely the header will be requested and then the rpm later on.

Actions #17

Updated by bmbouter over 2 years ago

I'm going to pursue option 1 as it will result in fewer requests to external servers over time.
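
A minimal sketch of what option 1 means in practice, assuming the upstream file arrives as a sequence of byte chunks: every chunk is written to local storage, but only the slice that overlaps the requested range is passed on to the client. This is not the code from the actual fix; the function name, the synchronous generator style, and the `sink` parameter are all assumptions made for illustration.

```python
import io


def stream_and_save(chunks, start, end, sink):
    """Save every upstream chunk to `sink`; yield only bytes start..end (inclusive).

    Hypothetical, synchronous illustration of option 1. `chunks` is any
    iterable of bytes objects and `sink` is any object with a write() method.
    """
    offset = 0  # absolute position of the first byte of the current chunk
    for chunk in chunks:
        sink.write(chunk)  # the full file is always stored
        chunk_end = offset + len(chunk)  # exclusive end of this chunk
        lo = max(start, offset)
        hi = min(end + 1, chunk_end)
        if lo < hi:  # this chunk overlaps the requested range
            yield chunk[lo - offset:hi - offset]
        offset = chunk_end


# Usage: 1024 bytes arriving in 100-byte chunks, client asks for bytes 150-420.
data = bytes(range(256)) * 4
upstream = (data[i:i + 100] for i in range(0, len(data), 100))
saved = io.BytesIO()
served = b"".join(stream_and_save(upstream, 150, 420, saved))
print(len(served), len(saved.getvalue()))  # 271 1024
```

The loop keeps reading after the range has been satisfied so that the saved copy is complete, which is the trade-off option 1 accepts in exchange for fewer upstream requests later.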

Also I think we need to get the response headers right, so I'm going to mimic what an official CentOS mirror returns, for example:

$ curl -i https://packages.oit.ncsu.edu/centos/7/os/x86_64/Packages/sg3_utils-1.37-19.el7.x86_64.rpm  -H "Range: bytes=1384-44339"
HTTP/1.1 206 Partial Content
Date: Thu, 08 Jul 2021 14:35:58 GMT
Server: Apache
Last-Modified: Fri, 03 Apr 2020 21:08:05 GMT
ETag: "a16b8-5a269504aa76b"
Accept-Ranges: bytes
Content-Length: 42956
Content-Range: bytes 1384-44339/661176
Content-Type: application/x-rpm

Warning: Binary output can mess up your terminal. Use "--output -" to tell 
Warning: curl to output it to your terminal anyway, or consider "--output 
Warning: <FILE>" to save to a file.
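
The range-specific headers in the mirror's response follow directly from the requested range and the total file size. The sketch below derives them; the function name is hypothetical and it only covers a single, already-validated byte range, not the full set of headers a real server sends.

```python
def partial_content_headers(start, end, total_size):
    """Build the range-related headers for a 206 Partial Content response.

    Hypothetical helper for illustration; `start` and `end` are inclusive
    byte offsets and `total_size` is the full size of the file in bytes.
    """
    if not (0 <= start <= end < total_size):
        raise ValueError("range cannot be satisfied for this file size")
    return {
        "Accept-Ranges": "bytes",
        "Content-Length": str(end - start + 1),
        "Content-Range": f"bytes {start}-{end}/{total_size}",
    }


# Values taken from the mirror response above.
for name, value in partial_content_headers(1384, 44339, 661176).items():
    print(f"{name}: {value}")
```

With the values from the example request, this produces the same Content-Length (42956) and Content-Range (bytes 1384-44339/661176) as the mirror.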
Actions #18

Updated by pulpbot over 2 years ago

  • Status changed from ASSIGNED to POST

Added by bmbouter over 2 years ago

Revision df2c8f10 | View on GitHub

Adds Range header support to content app

This unskips the Range header and adds support for it to the content app.

closes #8865

Actions #19

Updated by bmbouter over 2 years ago

  • Status changed from POST to MODIFIED
Actions #20

Updated by dalley over 2 years ago

  • Related to Backport #9057: Backport 8865 "incorrect responses to range requests for on_demand content" to 3.14.z added
Actions #21

Updated by dalley over 2 years ago

  • Sprint/Milestone set to 3.14.2
Actions #22

Updated by ipanova@redhat.com over 2 years ago

  • Sprint/Milestone changed from 3.14.2 to 3.15.0
Actions #24

Updated by pulpbot over 2 years ago

  • Status changed from MODIFIED to CLOSED - CURRENTRELEASE
