Issue #4603

closed

pulp_streamer streams decodes responses, but sends the 'gzip' Content-Encoding header

Added by dkliban@redhat.com almost 5 years ago. Updated over 4 years ago.

Status: CLOSED - CURRENTRELEASE
Priority: Normal
Category: -
Sprint/Milestone:
Start date:
Due date:
Estimated time:
Severity: 2. Medium
Version:
Platform Release: 2.19.1
OS:
Triaged: Yes
Groomed: No
Sprint Candidate: No
Tags: Pulp 2
Sprint: Sprint 51
Quarter:

Description

This does not occur 100% of the time, but often enough that it can be reproduced consistently.

1) Create an 'on_demand' repository pointing to a RHEL 7 kickstart repository.
2) Sync the repository.
3) Download the repository:

pulp-admin repo download --verify-all --repo-id ks

As the pulp_worker was downloading the file the following was emitted by pulp_streamer:

Mar 28 21:03:26 pulp2.dev pulp_streamer[17983]: [-] Unhandled Error
Mar 28 21:03:26 pulp2.dev pulp_streamer[17983]: [-]     Traceback (most recent call last):
Mar 28 21:03:26 pulp2.dev pulp_streamer[17983]: [-]       File "/usr/lib64/python2.7/site-packages/twisted/application/app.py", line 399, in startReactor
Mar 28 21:03:26 pulp2.dev pulp_streamer[17983]: [-]         self.config, oldstdout, oldstderr, self.profiler, reactor)
Mar 28 21:03:26 pulp2.dev pulp_streamer[17983]: [-]       File "/usr/lib64/python2.7/site-packages/twisted/application/app.py", line 312, in runReactorWithLogging
Mar 28 21:03:26 pulp2.dev pulp_streamer[17983]: [-]         reactor.run()
Mar 28 21:03:26 pulp2.dev pulp_streamer[17983]: [-]       File "/usr/lib64/python2.7/site-packages/twisted/internet/base.py", line 1261, in run
Mar 28 21:03:26 pulp2.dev pulp_streamer[17983]: [-]         self.mainLoop()
Mar 28 21:03:26 pulp2.dev pulp_streamer[17983]: [-]       File "/usr/lib64/python2.7/site-packages/twisted/internet/base.py", line 1270, in mainLoop
Mar 28 21:03:26 pulp2.dev pulp_streamer[17983]: [-]         self.runUntilCurrent()
Mar 28 21:03:26 pulp2.dev pulp_streamer[17983]: [-]     --- <exception caught here> ---
Mar 28 21:03:26 pulp2.dev pulp_streamer[17983]: [-]       File "/usr/lib64/python2.7/site-packages/twisted/internet/base.py", line 869, in runUntilCurrent
Mar 28 21:03:26 pulp2.dev pulp_streamer[17983]: [-]         f(*a, **kw)
Mar 28 21:03:26 pulp2.dev pulp_streamer[17983]: [-]       File "/usr/lib64/python2.7/site-packages/twisted/web/server.py", line 236, in write
Mar 28 21:03:26 pulp2.dev pulp_streamer[17983]: [-]         http.Request.write(self, data)
Mar 28 21:03:26 pulp2.dev pulp_streamer[17983]: [-]       File "/usr/lib64/python2.7/site-packages/twisted/web/http.py", line 1110, in write
Mar 28 21:03:26 pulp2.dev pulp_streamer[17983]: [-]         self.channel.writeSequence(toChunk(data))
Mar 28 21:03:26 pulp2.dev pulp_streamer[17983]: [-]     exceptions.AttributeError: 'NoneType' object has no attribute 'writeSequence'

Then the pulp_worker emits the following:

Mar 28 21:03:26 pulp2.dev pulp[19562]: nectar.downloaders.threaded:ERROR: (19562-49088) ('Received response with content-encoding: gzip, but failed to decode it.', error('Error -3 while decompressing: incorrect>
Mar 28 21:03:26 pulp2.dev pulp[19562]: nectar.downloaders.threaded:ERROR: (19562-49088) Traceback (most recent call last):
Mar 28 21:03:26 pulp2.dev pulp[19562]: nectar.downloaders.threaded:ERROR: (19562-49088)   File "/usr/lib/python2.7/site-packages/nectar/downloaders/threaded.py", line 292, in _fetch
Mar 28 21:03:26 pulp2.dev pulp[19562]: nectar.downloaders.threaded:ERROR: (19562-49088)     for chunk in chunks:
Mar 28 21:03:26 pulp2.dev pulp[19562]: nectar.downloaders.threaded:ERROR: (19562-49088)   File "/usr/lib/python2.7/site-packages/requests/models.py", line 755, in generate
Mar 28 21:03:26 pulp2.dev pulp[19562]: nectar.downloaders.threaded:ERROR: (19562-49088)     raise ContentDecodingError(e)
Mar 28 21:03:26 pulp2.dev pulp[19562]: nectar.downloaders.threaded:ERROR: (19562-49088) ContentDecodingError: ('Received response with content-encoding: gzip, but failed to decode it.', error('Error -3 while de>
Mar 28 21:03:26 pulp2.dev pulp[19562]: pulp.server.controllers.repository:INFO: Download of /var/lib/pulp/content/units/distribution/ea/91381d46b89bcb5fd5f595b7bc0fc7802404b3ae4899901f8fb076af0c00b2/images/pxeboot/initrd.img failed: 
('Received response with content-encoding: gzip, but failed to decode it.', error('Error -3 while decompressing: incorrect header check',)).

I was also able to cause this by simply requesting the initrd.img from the on_demand repository.

This can also be reproduced with this repo[0].

[0] https://repos.fedorapeople.org/pulp/pulp/fixtures/rpm-kickstart/


Related issues

Related to Pulp - Test #4628: Test pulp_streamer stream decodes responses, but sends the 'gzip' Content-Encoding header (CLOSED - COMPLETE, bherring)
Related to Pulp - Issue #4649: Pulp 2 Nightly fails test_package_paths and test_download_policies (CLOSED - WONTFIX, bherring)
Actions #2

Updated by dkliban@redhat.com almost 5 years ago

  • Status changed from NEW to ASSIGNED
  • Assignee set to dkliban@redhat.com
Actions #3

Updated by CodeHeeler almost 5 years ago

  • Triaged changed from No to Yes
  • Sprint set to Sprint 51
Actions #4

Updated by dkliban@redhat.com almost 5 years ago

  • Subject changed from pulp_streamer fails to properly stream response to pulp_streamer streams decodes responses, but sends the 'gzip' Content-Encoding header
  • Description updated (diff)

The problem occurs only for content that the web server can compress with gzip. nectar, the download library used by pulp workers, advertises that it accepts gzip-compressed content. When a worker asks the web server for one of the distribution files, the web server responds with the file compressed using gzip. The worker uses nectar to decode the file and then writes it to disk. The streamer uses the same nectar code, so when it passes a file along, it passes the decoded bytes while still sending the 'gzip' Content-Encoding header. When the worker receives this data during the 'deferred_download' task or the 'download' task, it attempts to decompress it, but the streamer has already done that.
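
The double decode described above can be reproduced in isolation. The following is a minimal sketch in Python 3 using only the standard library (Pulp 2 itself ran on Python 2.7). The helper `decode_gzip` and the sample payload are hypothetical stand-ins for what an HTTP client does with a gzip-encoded body; they are not code from nectar or python-requests.

```python
import gzip
import zlib


def decode_gzip(body):
    """Decode a body labelled 'Content-Encoding: gzip'.

    wbits = 16 + MAX_WBITS tells zlib to expect a gzip header,
    so input without one is rejected.
    """
    decoder = zlib.decompressobj(16 + zlib.MAX_WBITS)
    return decoder.decompress(body) + decoder.flush()


# Hypothetical payload standing in for a distribution file such as initrd.img.
payload = b"distribution file bytes " * 64

# What the upstream web server sends when the client accepts gzip.
on_the_wire = gzip.compress(payload)

# The streamer decodes the body but still labels it as gzip.
streamer_output = decode_gzip(on_the_wire)

# The worker trusts the header and tries to decode a second time.
try:
    decode_gzip(streamer_output)
    second_decode_error = None
except zlib.error as exc:
    second_decode_error = str(exc)

print(second_decode_error)
```

The second call fails because the bytes it receives no longer begin with a gzip header, which is consistent with the 'incorrect header check' error the worker logs.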

Actions #5

Updated by daviddavis almost 5 years ago

  • Has duplicate Issue #4524: Broken symlinks for subrepos are created during on_demand sync of kickstart trees added
Actions #6

Updated by dkliban@redhat.com almost 5 years ago

While python-requests allows the user to specify whether automatic decoding should occur when reading the response data, nectar does not expose such functionality. Nectar's behaviour is hard-coded to skip decoding only for files that have a .gz extension[0]. To change that, the DownloaderConfig[1] needs to accept another option called 'stream'. When it is set to true, the HttpThreadedDownloader will stream raw bytes without decoding them. The streamer then needs to be updated to pass 'stream=True' to the DownloaderConfig that it creates for the downloaders.

[0] https://github.com/pulp/nectar/blob/master/nectar/downloaders/threaded.py#L255
[1] https://github.com/pulp/nectar/blob/master/nectar/config.py#L9

Actions #8

Updated by dkliban@redhat.com almost 5 years ago

  • Status changed from ASSIGNED to NEW
Actions #9

Updated by dkliban@redhat.com almost 5 years ago

  • Status changed from NEW to ASSIGNED
Actions #10

Updated by dkliban@redhat.com almost 5 years ago

The easiest way to reproduce the problem is to on_demand sync a kickstart repo and then request an image file with the 'Accept-Encoding' header set to 'gzip'. The response should be gzip-compressed and therefore readable by gzip, but as can be seen below, gzip doesn't recognize it.


pulp-admin rpm repo create --download-policy on_demand --feed https://repos.fedorapeople.org/pulp/pulp/fixtures/rpm-kickstart/ --repo-id ks
pulp-admin rpm repo sync run --repo-id ks
curl -sH 'Accept-encoding: gzip' -k -L  https://localhost/pulp/repos/pulp/pulp/fixtures/rpm-kickstart/images/pxeboot/vmlinuz  | gunzip -

gzip: stdin: not in gzip format
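
The same check that gunzip performs can be approximated in a test by inspecting the first two bytes of a body that was fetched without automatic decoding. This is a minimal sketch under that assumption; `looks_like_gzip` and both sample bodies are hypothetical and only illustrate the check.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of a gzip stream (RFC 1952)


def looks_like_gzip(body):
    """Return True if the body starts with the gzip magic number."""
    return body[:2] == GZIP_MAGIC


# Hypothetical sample standing in for an image file such as vmlinuz.
raw_image = b"\x00uncompressed kernel image bytes"
compressed_image = gzip.compress(raw_image)

print(looks_like_gzip(compressed_image))
print(looks_like_gzip(raw_image))
```

A response that carries the 'gzip' Content-Encoding header but fails this check corresponds to the "not in gzip format" message above.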
Actions #11

Updated by kersom almost 5 years ago

  • Related to Test #4628: Test pulp_streamer stream decodes responses, but sends the 'gzip' Content-Encoding header added

Added by dkliban@redhat.com almost 5 years ago

Revision 00a21c8d | View on GitHub

Problem: downloader can't be configured to NOT decode

Solution: add downloader config option 'stream'

When 'stream' is True, the downloader does not decode the response. When 'stream' is False, the downloader decodes the response unless it's a file ending in .gz.

re: #4603 https://pulp.plan.io/issues/4603
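
The rule stated in this commit message can be written as a small decision function. This is only a sketch of the described behaviour; `should_decode` is a hypothetical name and is not taken from nectar's code.

```python
def should_decode(stream, path):
    """Decide whether the downloader decodes a response body.

    Follows the rule from the commit message:
    - stream=True: never decode, pass the bytes through as received.
    - stream=False: decode, unless the file name ends in .gz.
    """
    if stream:
        return False
    return not path.endswith(".gz")


# The streamer would use stream=True, so nothing is decoded.
print(should_decode(True, "images/pxeboot/initrd.img"))
# A regular download decodes everything except .gz files.
print(should_decode(False, "images/pxeboot/initrd.img"))
print(should_decode(False, "repodata/primary.xml.gz"))
```

With 'stream' set to True for the streamer's downloaders, the body reaches the worker still compressed, so the 'gzip' Content-Encoding header is accurate again.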

Added by dkliban@redhat.com almost 5 years ago

Revision 48d365f1 | View on GitHub

Problem: streamer decodes data while streaming

Solution: configure nectar downloaders to not decode responses

This patch relies on a new config option for the nectar downloaders.

re: #4603 https://pulp.plan.io/issues/4603

Actions #13

Updated by dkliban@redhat.com almost 5 years ago

  • Related to Issue #4649: Pulp 2 Nightly fails test_package_paths and test_download_policies added
Actions #14

Updated by dkliban@redhat.com almost 5 years ago

  • Status changed from POST to MODIFIED
Actions #15

Updated by bmbouter almost 5 years ago

  • Tags Pulp 2 added
Actions #16

Updated by dkliban@redhat.com almost 5 years ago

  • Platform Release set to 2.19.1
Actions #17

Updated by dkliban@redhat.com almost 5 years ago

  • Sprint/Milestone set to 2.19.1

Added by dkliban@redhat.com almost 5 years ago

Revision f064a14c | View on GitHub

Problem: streamer decodes data while streaming

Solution: configure nectar downloaders to not decode responses

This patch relies on a new config option for the nectar downloaders.

re: #4603 https://pulp.plan.io/issues/4603 (cherry picked from commit 48d365f157e41b4b3ce4f0f33a0eab73ced8eea1)

Actions #18

Updated by dkliban@redhat.com almost 5 years ago

  • Status changed from MODIFIED to 5

Added by bherring almost 5 years ago

Revision 7e1b98e9 | View on GitHub

Adding pulp_streamer compression check tests

The problem occurs only for content that can be compressed with gzip. Nectar, the download library inside pulp workers, advertises that it can receive content compressed using gzip. When the worker asks the web server for one of the distribution files, the web server responds with the files compressed using gzip. The worker uses nectar to decode the file and then writes it to disk. The same nectar code is used in the streamer. When the streamer is passing along the file, it is passing it along in the decoded form. When the worker gets this data during the deferred_download task or the download task, it is attempting to decompress it, but the streamer had already done that.

Adding a test to verify the content being published is compressed

closes #4603

Actions #19

Updated by dkliban@redhat.com almost 5 years ago

  • Status changed from 5 to CLOSED - CURRENTRELEASE
Actions #20

Updated by dkliban@redhat.com over 4 years ago

  • Has duplicate deleted (Issue #4524: Broken symlinks for subrepos are created during on_demand sync of kickstart trees)
