Project

Profile

Help

Issue #1788

closed

The pulp_streamer is subject to requests issue #2863

Added by jcline@redhat.com about 8 years ago. Updated about 5 years ago.

Status:
CLOSED - CURRENTRELEASE
Priority:
Urgent
Category:
-
Sprint/Milestone:
-
Start date:
Due date:
Estimated time:
Severity:
3. High
Version:
Master
Platform Release:
2.8.7
OS:
Triaged:
Yes
Groomed:
No
Sprint Candidate:
No
Tags:
Pulp 2
Sprint:
Sprint 5
Quarter:

Description

Here is a broad overview of the issue [0] [1] in question:

When the streamer downloads a file, it does so with the Python `requests` library. The `requests` library provides connection pooling by making use of the `urllib3` library. This library has a bug. The bug is that each connection has TLS context associated with it, and the way the connections are pooled does not take this into account. To fully understand how and why this is the case I recommend reading the `urllib3` source, specifically the `PoolManager` class.

The correct place to fix this is in the `urllib3` and `requests` libraries. There is a pull request[2] open on `urllib3` which introduces a way to consider the TLS context of a connection when determining the pool it is placed in. I believe once this is merged, a pull request will need to be made against `requests` that makes use of this new feature. Once that is done, these patches must be backported to the EL6 and EL7 packages. `python-urllib3` is in the base OS repository so, from what I've been told, we shouldn't package a version ourselves. An errata would need to be issued.

The above solution will not be trivial. It has been suggested that we attempt to work around the issue in the streamer until such time as the fixes land in `urllib3`, `requests`, and the downstream packages of those projects. The following is a list of suggested solutions:

  • Stop connection pooling in the streamer. The problem with this is that we will create a large number of TCP connections with the upstream servers (one per file). It is likely the servers will throttle us by either returning HTTP 503s or refusing the TCP connection. This will work slightly better than not at all, but it is not something that should be used in production (in my opinion).
  • Stop using Nectar in the streamer. Nectar is a wrapper around `requests`, so if we switched to a different library we might avoid this problem. The only other serious candidate is the Twisted web client[3]. This has two downsides. The first is that we would no longer support alternate content sources with lazy sync. The second is that there may be problems with the Twisted web client we are currently unaware of and we will find ourselves in a similar situation after doing this. We also have a large range of versions to support (8 through 15) whereas the version of `requests` in RHEL is quite new due to a recent errata (2.6).
  • Try to manage a 'pool' of sessions in the streamer. I investigated this and decided it would be very difficult to get working at all, and if we did manage to get it functional it would take much longer. It would also be very risky. The reason for this is that we need to use locking because all this is happening inside of Twisted, and when this is combined with the need to clean up our 'pool' it is difficult to avoid either deadlocking or killing connections suddenly.

Most of these have a significant amount of work associated with them.

[0] https://github.com/kennethreitz/requests/issues/3058
[1] https://github.com/kennethreitz/requests/issues/2863
[2] https://github.com/shazow/urllib3/pull/751
[3] https://twistedmatrix.com/documents/current/web/howto/client.html

Also available in: Atom PDF