Issue #4603
closed
pulp_streamer streams decoded responses, but sends the 'gzip' Content-Encoding header
Description
This does not occur 100% of the time, but often enough that it can be reproduced consistently.
1) Create an 'on_demand' repository pointing to a RHEL 7 kickstart repository.
2) Sync the repository.
3) Download the repository:
pulp-admin repo download --verify-all --repo-id ks
As the pulp_worker was downloading the file the following was emitted by pulp_streamer:
Mar 28 21:03:26 pulp2.dev pulp_streamer[17983]: [-] Unhandled Error
Mar 28 21:03:26 pulp2.dev pulp_streamer[17983]: [-] Traceback (most recent call last):
Mar 28 21:03:26 pulp2.dev pulp_streamer[17983]: [-] File "/usr/lib64/python2.7/site-packages/twisted/application/app.py", line 399, in startReactor
Mar 28 21:03:26 pulp2.dev pulp_streamer[17983]: [-] self.config, oldstdout, oldstderr, self.profiler, reactor)
Mar 28 21:03:26 pulp2.dev pulp_streamer[17983]: [-] File "/usr/lib64/python2.7/site-packages/twisted/application/app.py", line 312, in runReactorWithLogging
Mar 28 21:03:26 pulp2.dev pulp_streamer[17983]: [-] reactor.run()
Mar 28 21:03:26 pulp2.dev pulp_streamer[17983]: [-] File "/usr/lib64/python2.7/site-packages/twisted/internet/base.py", line 1261, in run
Mar 28 21:03:26 pulp2.dev pulp_streamer[17983]: [-] self.mainLoop()
Mar 28 21:03:26 pulp2.dev pulp_streamer[17983]: [-] File "/usr/lib64/python2.7/site-packages/twisted/internet/base.py", line 1270, in mainLoop
Mar 28 21:03:26 pulp2.dev pulp_streamer[17983]: [-] self.runUntilCurrent()
Mar 28 21:03:26 pulp2.dev pulp_streamer[17983]: [-] --- <exception caught here> ---
Mar 28 21:03:26 pulp2.dev pulp_streamer[17983]: [-] File "/usr/lib64/python2.7/site-packages/twisted/internet/base.py", line 869, in runUntilCurrent
Mar 28 21:03:26 pulp2.dev pulp_streamer[17983]: [-] f(*a, **kw)
Mar 28 21:03:26 pulp2.dev pulp_streamer[17983]: [-] File "/usr/lib64/python2.7/site-packages/twisted/web/server.py", line 236, in write
Mar 28 21:03:26 pulp2.dev pulp_streamer[17983]: [-] http.Request.write(self, data)
Mar 28 21:03:26 pulp2.dev pulp_streamer[17983]: [-] File "/usr/lib64/python2.7/site-packages/twisted/web/http.py", line 1110, in write
Mar 28 21:03:26 pulp2.dev pulp_streamer[17983]: [-] self.channel.writeSequence(toChunk(data))
Mar 28 21:03:26 pulp2.dev pulp_streamer[17983]: [-] exceptions.AttributeError: 'NoneType' object has no attribute 'writeSequence'
Then the pulp_worker emits the following:
Mar 28 21:03:26 pulp2.dev pulp[19562]: nectar.downloaders.threaded:ERROR: (19562-49088) ('Received response with content-encoding: gzip, but failed to decode it.', error('Error -3 while decompressing: incorrect>
Mar 28 21:03:26 pulp2.dev pulp[19562]: nectar.downloaders.threaded:ERROR: (19562-49088) Traceback (most recent call last):
Mar 28 21:03:26 pulp2.dev pulp[19562]: nectar.downloaders.threaded:ERROR: (19562-49088) File "/usr/lib/python2.7/site-packages/nectar/downloaders/threaded.py", line 292, in _fetch
Mar 28 21:03:26 pulp2.dev pulp[19562]: nectar.downloaders.threaded:ERROR: (19562-49088) for chunk in chunks:
Mar 28 21:03:26 pulp2.dev pulp[19562]: nectar.downloaders.threaded:ERROR: (19562-49088) File "/usr/lib/python2.7/site-packages/requests/models.py", line 755, in generate
Mar 28 21:03:26 pulp2.dev pulp[19562]: nectar.downloaders.threaded:ERROR: (19562-49088) raise ContentDecodingError(e)
Mar 28 21:03:26 pulp2.dev pulp[19562]: nectar.downloaders.threaded:ERROR: (19562-49088) ContentDecodingError: ('Received response with content-encoding: gzip, but failed to decode it.', error('Error -3 while de>
Mar 28 21:03:26 pulp2.dev pulp[19562]: pulp.server.controllers.repository:INFO: Download of /var/lib/pulp/content/units/distribution/ea/91381d46b89bcb5fd5f595b7bc0fc7802404b3ae4899901f8fb076af0c00b2/images/pxeboot/initrd.img failed:
('Received response with content-encoding: gzip, but failed to decode it.', error('Error -3 while decompressing: incorrect header check',)).
I was also able to cause this by simply requesting the initrd.img from the on_demand repository.
This can also be reproduced with this repo[0].
[0] https://repos.fedorapeople.org/pulp/pulp/fixtures/rpm-kickstart/
Updated by dkliban@redhat.com over 5 years ago
- Status changed from NEW to ASSIGNED
- Assignee set to dkliban@redhat.com
Updated by CodeHeeler over 5 years ago
- Triaged changed from No to Yes
- Sprint set to Sprint 51
Updated by dkliban@redhat.com over 5 years ago
- Subject changed from pulp_streamer fails to properly stream response to pulp_streamer streams decodes responses, but sends the 'gzip' Content-Encoding header
- Description updated
The problem occurs only for content that can be compressed with gzip. nectar, the download library used by Pulp workers, advertises that it accepts gzip-compressed content. When a worker asks the web server for one of the distribution files, the web server responds with the file compressed using gzip; the worker uses nectar to decode it and then writes it to disk. The streamer uses the same nectar code, so it passes the file along already decoded while still sending the 'gzip' Content-Encoding header. When the worker receives this data during the 'deferred_download' task or the 'download' task, it tries to decompress it again and fails, because the streamer has already done that.
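The double-decode failure described above can be illustrated in isolation. The following is a minimal sketch in Python 3 (the logs above come from Python 2.7). It does not use nectar or the streamer, and `decode_gzip_body` is a hypothetical helper standing in for what a gzip-aware HTTP client does when it sees 'Content-Encoding: gzip'.

```python
import gzip
import zlib


def decode_gzip_body(body):
    """Hypothetical stand-in for a client honouring 'Content-Encoding: gzip'."""
    # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    decoder = zlib.decompressobj(16 + zlib.MAX_WBITS)
    return decoder.decompress(body) + decoder.flush()


# Made-up payload for illustration only.
original = b"initrd image bytes " * 64

# The web server compresses the file on the wire.
upstream_body = gzip.compress(original)

# The streamer decodes the body but still forwards the gzip header.
streamer_body = decode_gzip_body(upstream_body)

# The worker trusts the header and tries to decode a second time.
try:
    decode_gzip_body(streamer_body)
    worker_error = None
except zlib.error as exc:
    worker_error = str(exc)
```

Under these assumptions the second decode fails with zlib's "incorrect header check" error, the same message that appears in the worker log.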
Updated by daviddavis over 5 years ago
- Has duplicate Issue #4524: Broken symlinks for subrepos are created during on_demand sync of kickstart trees added
Updated by dkliban@redhat.com over 5 years ago
While python-requests lets the caller choose whether automatic decoding occurs when reading the response data, nectar does not expose that choice. nectar's behaviour is hard-coded: it skips decoding only for files that have a .gz extension[0]. To change that, the DownloaderConfig[1] needs to accept another option called 'stream'. When it is set to True, the HttpThreadedDownloader will stream raw bytes without decoding them. The streamer then needs to be updated to pass 'stream=True' to the DownloaderConfig that it creates for its downloaders.
[0] https://github.com/pulp/nectar/blob/master/nectar/downloaders/threaded.py#L255
[1] https://github.com/pulp/nectar/blob/master/nectar/config.py#L9
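The proposed option can be sketched as follows. This is only an illustration of the decision described above, not nectar's actual code: `should_decode` and `iter_body` are hypothetical names, and `response` is assumed to be a python-requests response opened with `stream=True`, whose `raw` attribute is the underlying urllib3 response.

```python
from urllib.parse import urlparse


def should_decode(url, stream):
    """Hypothetical helper: decide whether the downloader decodes the body."""
    if stream:
        # Streaming mode: pass the bytes through exactly as received.
        return False
    # Otherwise decode, except for files that are gzip archives themselves.
    return not urlparse(url).path.endswith(".gz")


def iter_body(response, url, stream, chunk_size=8192):
    """Return an iterator over body chunks, decoded or raw."""
    decode = should_decode(url, stream)
    # urllib3's stream() undoes Content-Encoding only when decode_content
    # is true, so the header and the bytes stay consistent when it is False.
    return response.raw.stream(chunk_size, decode_content=decode)
```

With `stream=True` the bytes forwarded by the streamer would still match the 'gzip' Content-Encoding header it sends, which is the mismatch this issue is about.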
Updated by dkliban@redhat.com over 5 years ago
- Status changed from ASSIGNED to NEW
Updated by dkliban@redhat.com over 5 years ago
- Status changed from NEW to ASSIGNED
Updated by dkliban@redhat.com over 5 years ago
The easiest way to reproduce the problem is to sync a kickstart repo with the on_demand download policy and then request an image file with the 'Accept-Encoding' header set to 'gzip'. The response should be gzip-compressed, but as the output below shows, gunzip does not recognize it.
pulp-admin rpm repo create --download-policy on_demand --feed https://repos.fedorapeople.org/pulp/pulp/fixtures/rpm-kickstart/ --repo-id ks
pulp-admin rpm repo sync run --repo-id ks
curl -sH 'Accept-encoding: gzip' -k -L https://localhost/pulp/repos/pulp/pulp/fixtures/rpm-kickstart/images/pxeboot/vmlinuz | gunzip -
gzip: stdin: not in gzip format
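The same check can be automated without gunzip by looking at the first two bytes of the response body, which are 0x1f 0x8b for any gzip stream. A minimal sketch, where `looks_like_gzip` is a hypothetical helper and the sample bodies are made up for illustration rather than fetched from a real repository:

```python
import gzip

# Every gzip stream starts with these two magic bytes.
GZIP_MAGIC = b"\x1f\x8b"


def looks_like_gzip(body):
    """Return True if the body starts with the gzip magic number."""
    return body[:2] == GZIP_MAGIC


payload = b"vmlinuz bytes " * 32

# What a response labelled 'Content-Encoding: gzip' should carry.
compressed_body = gzip.compress(payload)

# What the streamer sent before the fix: already decoded bytes.
decoded_body = payload
```

A test along these lines would read the raw, undecoded body from the server and assert that `looks_like_gzip` holds whenever the response carries the 'gzip' Content-Encoding header.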
Updated by kersom over 5 years ago
- Related to Test #4628: Test pulp_streamer stream decodes responses, but sends the 'gzip' Content-Encoding header added
Updated by dkliban@redhat.com over 5 years ago
- Status changed from ASSIGNED to POST
Added by dkliban@redhat.com over 5 years ago
Revision 48d365f1
Problem: streamer decodes data while streaming
Solution: configure nectar downloaders to not decode responses
This patch relies on a new config option for the nectar downloaders.
Updated by dkliban@redhat.com over 5 years ago
- Related to Issue #4649: Pulp 2 Nightly fails test_package_paths and test_download_policies added
Updated by dkliban@redhat.com over 5 years ago
- Status changed from POST to MODIFIED
Added by dkliban@redhat.com over 5 years ago
Revision f064a14c
Problem: streamer decodes data while streaming
Solution: configure nectar downloaders to not decode responses
This patch relies on a new config option for the nectar downloaders.
re: #4603 https://pulp.plan.io/issues/4603 (cherry picked from commit 48d365f157e41b4b3ce4f0f33a0eab73ced8eea1)
Updated by dkliban@redhat.com over 5 years ago
- Status changed from MODIFIED to 5
Added by bherring over 5 years ago
Revision 7e1b98e9
Adding pulp_streamer compression check tests
The problem occurs only for content that can be compressed with gzip.
Nectar, the download library inside pulp workers, advertises that it can
receive content compressed using gzip. When the worker asks the web server
for one of the distribution files, the web server responds with the files
compressed using gzip. The worker uses nectar to decode the file and then
writes it to disk. The same nectar code is used in the streamer. When
the streamer is passing along the file, it is passing it along in the
decoded form. When the worker gets this data during the
deferred_download task or the download task, it is attempting to
decompress it, but the streamer had already done that.
Adding a test to verify the content being published is compressed
closes #4603
Updated by dkliban@redhat.com over 5 years ago
- Status changed from 5 to CLOSED - CURRENTRELEASE
Updated by dkliban@redhat.com over 5 years ago
- Has duplicate deleted (Issue #4524: Broken symlinks for subrepos are created during on_demand sync of kickstart trees)
Problem: downloader can't be configured to NOT decode
Solution: add downloader config option 'stream'
When 'stream' is True, the downloader does not decode the response. When 'stream' is False, the downloader decodes the response unless it's a file ending in .gz.
re: #4603 https://pulp.plan.io/issues/4603