Issue #4603

pulp_streamer streams decoded responses, but sends the 'gzip' Content-Encoding header

Added by dkliban@redhat.com 8 months ago. Updated about 2 months ago.

Status: CLOSED - CURRENTRELEASE
Priority: Normal
Severity: 2. Medium
Platform Release: 2.19.1
Backwards Incompatible: No
Triaged: Yes
Groomed: No
Sprint Candidate: No
Tags: Pulp 2
Verified: No
Verification Required: No
Sprint: Sprint 51

Description

This does not occur 100% of the time, but often enough that it can be reproduced consistently.

1) Create an 'on_demand' repository pointing to a RHEL 7 kickstart repository.
2) Sync the repository.
3) Download the repository:

pulp-admin repo download --verify-all --repo-id ks

While the pulp_worker was downloading the file, pulp_streamer emitted the following:

Mar 28 21:03:26 pulp2.dev pulp_streamer[17983]: [-] Unhandled Error
Mar 28 21:03:26 pulp2.dev pulp_streamer[17983]: [-]     Traceback (most recent call last):
Mar 28 21:03:26 pulp2.dev pulp_streamer[17983]: [-]       File "/usr/lib64/python2.7/site-packages/twisted/application/app.py", line 399, in startReactor
Mar 28 21:03:26 pulp2.dev pulp_streamer[17983]: [-]         self.config, oldstdout, oldstderr, self.profiler, reactor)
Mar 28 21:03:26 pulp2.dev pulp_streamer[17983]: [-]       File "/usr/lib64/python2.7/site-packages/twisted/application/app.py", line 312, in runReactorWithLogging
Mar 28 21:03:26 pulp2.dev pulp_streamer[17983]: [-]         reactor.run()
Mar 28 21:03:26 pulp2.dev pulp_streamer[17983]: [-]       File "/usr/lib64/python2.7/site-packages/twisted/internet/base.py", line 1261, in run
Mar 28 21:03:26 pulp2.dev pulp_streamer[17983]: [-]         self.mainLoop()
Mar 28 21:03:26 pulp2.dev pulp_streamer[17983]: [-]       File "/usr/lib64/python2.7/site-packages/twisted/internet/base.py", line 1270, in mainLoop
Mar 28 21:03:26 pulp2.dev pulp_streamer[17983]: [-]         self.runUntilCurrent()
Mar 28 21:03:26 pulp2.dev pulp_streamer[17983]: [-]     --- <exception caught here> ---
Mar 28 21:03:26 pulp2.dev pulp_streamer[17983]: [-]       File "/usr/lib64/python2.7/site-packages/twisted/internet/base.py", line 869, in runUntilCurrent
Mar 28 21:03:26 pulp2.dev pulp_streamer[17983]: [-]         f(*a, **kw)
Mar 28 21:03:26 pulp2.dev pulp_streamer[17983]: [-]       File "/usr/lib64/python2.7/site-packages/twisted/web/server.py", line 236, in write
Mar 28 21:03:26 pulp2.dev pulp_streamer[17983]: [-]         http.Request.write(self, data)
Mar 28 21:03:26 pulp2.dev pulp_streamer[17983]: [-]       File "/usr/lib64/python2.7/site-packages/twisted/web/http.py", line 1110, in write
Mar 28 21:03:26 pulp2.dev pulp_streamer[17983]: [-]         self.channel.writeSequence(toChunk(data))
Mar 28 21:03:26 pulp2.dev pulp_streamer[17983]: [-]     exceptions.AttributeError: 'NoneType' object has no attribute 'writeSequence'

Then the pulp_worker emitted the following:

Mar 28 21:03:26 pulp2.dev pulp[19562]: nectar.downloaders.threaded:ERROR: (19562-49088) ('Received response with content-encoding: gzip, but failed to decode it.', error('Error -3 while decompressing: incorrect>
Mar 28 21:03:26 pulp2.dev pulp[19562]: nectar.downloaders.threaded:ERROR: (19562-49088) Traceback (most recent call last):
Mar 28 21:03:26 pulp2.dev pulp[19562]: nectar.downloaders.threaded:ERROR: (19562-49088)   File "/usr/lib/python2.7/site-packages/nectar/downloaders/threaded.py", line 292, in _fetch
Mar 28 21:03:26 pulp2.dev pulp[19562]: nectar.downloaders.threaded:ERROR: (19562-49088)     for chunk in chunks:
Mar 28 21:03:26 pulp2.dev pulp[19562]: nectar.downloaders.threaded:ERROR: (19562-49088)   File "/usr/lib/python2.7/site-packages/requests/models.py", line 755, in generate
Mar 28 21:03:26 pulp2.dev pulp[19562]: nectar.downloaders.threaded:ERROR: (19562-49088)     raise ContentDecodingError(e)
Mar 28 21:03:26 pulp2.dev pulp[19562]: nectar.downloaders.threaded:ERROR: (19562-49088) ContentDecodingError: ('Received response with content-encoding: gzip, but failed to decode it.', error('Error -3 while de>
Mar 28 21:03:26 pulp2.dev pulp[19562]: pulp.server.controllers.repository:INFO: Download of /var/lib/pulp/content/units/distribution/ea/91381d46b89bcb5fd5f595b7bc0fc7802404b3ae4899901f8fb076af0c00b2/images/pxeboot/initrd.img failed: ('Received response with content-encoding: gzip, but failed to decode it.', error('Error -3 while decompressing: incorrect header check',)).

I was also able to cause this by simply requesting the initrd.img from the on_demand repository.

This can also be reproduced with this repo[0].

[0] https://repos.fedorapeople.org/pulp/pulp/fixtures/rpm-kickstart/


Related issues

Related to Pulp - Test #4628: Test pulp_streamer stream decodes responses, but sends the 'gzip' Content-Encoding header CLOSED - COMPLETE
Related to Pulp - Issue #4649: Pulp 2 Nightly fails test_package_paths and test_download_policies CLOSED - WONTFIX

Associated revisions

Revision 00a21c8d
Added by dkliban@redhat.com 8 months ago

Problem: downloader can't be configured to NOT decode

Solution: add downloader config option 'stream'

When 'stream' is True, the downloader does not decode the response.
When 'stream' is False, the downloader decodes the response unless it's a file ending in .gz.

re: #4603
https://pulp.plan.io/issues/4603
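The decoding rule stated in this commit message can be sketched in plain Python. This is not nectar's actual code: `should_decode` and `iter_body` are hypothetical names, and the only assumptions carried over from the commit are that `stream=True` disables decoding and that, with `stream=False`, files ending in .gz are never decoded.

```python
import zlib


def should_decode(url, stream):
    """Decide whether a gzip-encoded response body should be decoded.

    Hypothetical helper mirroring the rule in revision 00a21c8d:
    stream=True never decodes; stream=False decodes unless the URL ends in .gz.
    """
    if stream:
        return False
    return not url.endswith(".gz")


def iter_body(chunks, content_encoding, decode):
    """Yield body chunks, gzip-decoding them incrementally when requested.

    `chunks` stands in for the raw bytes read off the socket (an assumption).
    """
    if not decode or content_encoding != "gzip":
        # Relay the raw bytes untouched, exactly as the server sent them.
        for chunk in chunks:
            yield chunk
        return
    # 16 + MAX_WBITS makes zlib expect a gzip header and trailer.
    decompressor = zlib.decompressobj(16 + zlib.MAX_WBITS)
    for chunk in chunks:
        data = decompressor.decompress(chunk)
        if data:
            yield data
    tail = decompressor.flush()
    if tail:
        yield tail
```

With `stream=True`, a relay such as the streamer would get back the exact bytes the upstream server sent, so a forwarded 'Content-Encoding: gzip' header stays truthful.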

Revision 48d365f1
Added by dkliban@redhat.com 8 months ago

Problem: streamer decodes data while streaming

Solution: configure nectar downloaders to not decode responses

This patch relies on a new config option for the nectar downloaders.

re: #4603
https://pulp.plan.io/issues/4603

Revision f064a14c
Added by dkliban@redhat.com 6 months ago

Problem: streamer decodes data while streaming

Solution: configure nectar downloaders to not decode responses

This patch relies on a new config option for the nectar downloaders.

re: #4603
https://pulp.plan.io/issues/4603
(cherry picked from commit 48d365f157e41b4b3ce4f0f33a0eab73ced8eea1)

Revision 7e1b98e9
Added by bherring 6 months ago

Adding pulp_streamer compression check tests

The problem occurs only for content that can be compressed with gzip.
Nectar, the download library inside pulp workers, advertises that it can
receive content compressed using gzip. When the worker asks the web server
for one of the distribution files, the web server responds with the files
compressed using gzip. The worker uses nectar to decode the file and then
writes it to disk. The same nectar code is used in the streamer, so
the streamer passes the file along in decoded form. When the worker
gets this data during the ``deferred_download`` task or the
``download`` task, it attempts to decompress it again, but the
streamer has already done that.

Adding a test to verify the content being published is compressed

closes #4603

History

#2 Updated by dkliban@redhat.com 8 months ago

  • Status changed from NEW to ASSIGNED
  • Assignee set to dkliban@redhat.com

#3 Updated by CodeHeeler 8 months ago

  • Triaged changed from No to Yes
  • Sprint set to Sprint 51

#4 Updated by dkliban@redhat.com 8 months ago

  • Subject changed from pulp_streamer fails to properly stream response to pulp_streamer streams decodes responses, but sends the 'gzip' Content-Encoding header
  • Description updated (diff)

The problem occurs only for content that can be compressed with gzip. Nectar, the download library inside pulp workers, advertises that it can receive content compressed using gzip. When the worker asks the web server for one of the distribution files, the web server responds with the file compressed using gzip. The worker uses nectar to decode the file and then writes it to disk. The streamer uses the same nectar code, so it passes the file along in decoded form while still sending the upstream 'Content-Encoding: gzip' header. When the worker receives this data during the 'deferred_download' task or the 'download' task, it tries to decompress it again, but the streamer has already done that, so decoding fails.
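The header/body mismatch described in this comment can be reproduced without Pulp, using only the Python standard library. This is a simplified model rather than Pulp's streamer code: `buggy_streamer`, `fixed_streamer` and `client_read` are hypothetical names, and a dict of headers plus a bytes body stand in for a real HTTP response.

```python
import gzip
import zlib

# zlib window-bits value that makes zlib expect a gzip header and trailer.
GZIP_WBITS = 16 + zlib.MAX_WBITS


def client_read(headers, body):
    """Decode a body as a gzip-aware HTTP client would: trust Content-Encoding."""
    if headers.get("Content-Encoding") == "gzip":
        return zlib.decompressobj(GZIP_WBITS).decompress(body)
    return body


def buggy_streamer(headers, body):
    """Decode the upstream body, but forward the upstream headers unchanged."""
    decoded = zlib.decompressobj(GZIP_WBITS).decompress(body)
    return dict(headers), decoded


def fixed_streamer(headers, body):
    """Relay the raw upstream bytes, so the forwarded headers still match them."""
    return dict(headers), body


# Upstream web server: gzip-compressed body plus a matching header.
original = b"initrd.img contents " * 200
upstream = ({"Content-Encoding": "gzip"}, gzip.compress(original))

# Fixed path: the worker decodes exactly once and gets the original bytes.
recovered = client_read(*fixed_streamer(*upstream))

# Buggy path: the worker tries to gunzip bytes that are no longer gzip data.
error_message = None
try:
    client_read(*buggy_streamer(*upstream))
except zlib.error as exc:
    error_message = str(exc)  # mentions "incorrect header check", as in the worker log
```

The fixed variant corresponds to the approach taken in the associated revisions: configure the downloaders not to decode, so the relayed bytes and the forwarded 'Content-Encoding' header stay consistent.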

#5 Updated by daviddavis 8 months ago

  • Duplicated by Issue #4524: Broken symlinks for subrepos are created during on_demand sync of kickstart trees added

#6 Updated by dkliban@redhat.com 8 months ago

While python-requests lets the caller choose whether response data is decoded automatically when it is read, nectar does not expose this functionality. Nectar's behaviour is hard-coded to skip decoding only for files with a .gz extension[0]. To change that, the DownloaderConfig[1] needs to accept another option called 'stream'. When it is set to True, the HttpThreadedDownloader will stream raw bytes without decoding them. The streamer then needs to be updated to pass 'stream=True' to the DownloaderConfig that it creates for its downloaders.

[0] https://github.com/pulp/nectar/blob/master/nectar/downloaders/threaded.py#L255
[1] https://github.com/pulp/nectar/blob/master/nectar/config.py#L9

#8 Updated by dkliban@redhat.com 8 months ago

  • Status changed from ASSIGNED to NEW

#9 Updated by dkliban@redhat.com 8 months ago

  • Status changed from NEW to ASSIGNED

#10 Updated by dkliban@redhat.com 8 months ago

The easiest way to reproduce the problem is to on_demand sync a kickstart repo and then request an image file with the 'Accept-Encoding' header set to 'gzip'. The response should then be decompressible with gzip, but as shown below, gzip doesn't recognize it.


pulp-admin rpm repo create --download-policy on_demand --feed https://repos.fedorapeople.org/pulp/pulp/fixtures/rpm-kickstart/ --repo-id ks
pulp-admin rpm repo sync run --repo-id ks
curl -sH 'Accept-encoding: gzip' -k -L https://localhost/pulp/repos/pulp/pulp/fixtures/rpm-kickstart/images/pxeboot/vmlinuz | gunzip -

gzip: stdin: not in gzip format
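The check that `curl ... | gunzip -` performs by hand can also be expressed as a small Python predicate, for example inside an automated test. This is a sketch: `encoding_mismatch` is a hypothetical helper, and it assumes the response headers are available as a plain dict and at least the first two bytes of the body (the gzip magic number) have been read.

```python
# Every gzip stream starts with these two bytes (RFC 1952 magic number).
GZIP_MAGIC = b"\x1f\x8b"


def encoding_mismatch(headers, body):
    """Return True if the headers claim gzip but the body is not gzip data.

    Header names are matched case-insensitively; only the body's first two
    bytes are inspected, so this cannot detect a truncated gzip stream.
    """
    encoding = ""
    for name, value in headers.items():
        if name.lower() == "content-encoding":
            encoding = value.strip().lower()
    return encoding == "gzip" and not body.startswith(GZIP_MAGIC)
```

Given the headers and body from the request above, an affected streamer would be expected to trip this check, matching gunzip's "not in gzip format" message.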

#11 Updated by kersom 8 months ago

  • Related to Test #4628: Test pulp_streamer stream decodes responses, but sends the 'gzip' Content-Encoding header added

#13 Updated by dkliban@redhat.com 7 months ago

  • Related to Issue #4649: Pulp 2 Nightly fails test_package_paths and test_download_policies added

#14 Updated by dkliban@redhat.com 7 months ago

  • Status changed from POST to MODIFIED

#15 Updated by bmbouter 7 months ago

  • Tags Pulp 2 added

#16 Updated by dkliban@redhat.com 7 months ago

  • Platform Release set to 2.19.1

#17 Updated by dkliban@redhat.com 7 months ago

  • Sprint/Milestone set to 2.19.1

#18 Updated by dkliban@redhat.com 6 months ago

  • Status changed from MODIFIED to ON_QA

#19 Updated by dkliban@redhat.com 6 months ago

  • Status changed from ON_QA to CLOSED - CURRENTRELEASE

#20 Updated by dkliban@redhat.com 4 months ago

  • Duplicated by deleted (Issue #4524: Broken symlinks for subrepos are created during on_demand sync of kickstart trees)
