Project

Profile

Help

Issue #1788

Updated by jcline@redhat.com about 8 years ago

Here is a broad overview of the The issue [0] [1] in question: 

 When the streamer downloads a file, it does so with the Python `requests` library. The `requests` library provides connection pooling by making use of the `urllib3` library. This library has a bug. The bug (https://github.com/kennethreitz/requests/issues/2863 and its duplicate https://github.com/kennethreitz/requests/issues/3058) is that each connection has TLS context associated with it, and the way the connections are pooled does not take this into account. To fully understand how with stale CAs, client certificates, client keys, and why this is the case I recommend reading the `urllib3` source, specifically the `PoolManager` class. 

 The correct place to fix this is in the `urllib3` and `requests` libraries. There is a pull request[2] open on `urllib3` which introduces a way to consider the other TLS context of a connection when determining the pool it is placed in. I believe once this is merged, a pull request will need to be made against `requests` that makes use of this new feature. Once that is done, these patches must be backported to the EL6 and EL7 packages. `python-urllib3` is in the base OS repository so, from what I've been told, we shouldn't package a version ourselves. An errata would need to be issued. 

 The above solution will not be trivial. It has been suggested that we attempt to work around the issue in the streamer until settings. Until such time as the fixes land in `urllib3`, `requests`, and the downstream packages of those projects. The following #2863 is a list of suggested solutions: 

 * Stop connection pooling in the streamer. The problem with this is that we will create a large number of TCP connections with the upstream servers (one per file). It is likely the servers will throttle us by either returning HTTP 503s or refusing the TCP connection. This will work slightly better than not at all, but fixed, it is not something has been proposed (in https://pulp.plan.io/issues/1771) that should be used in production (in my opinion). 

 * Stop using Nectar in the streamer. Nectar is a wrapper around `requests`, so if we switched to keep a different library we might avoid this problem. The only other serious candidate is the Twisted web client[3]. This has two downsides. The first is that we would no longer support alternate content sources with lazy sync. The second is that there may be problems with the Twisted web client we are currently unaware collection of and we will find ourselves in a similar situation after doing this. We also Sessions (which have a large range collection of versions to support (8 through 15) whereas the version pools) for each set of `requests` in RHEL is quite new due to a recent errata (2.6). 

 * Try to manage a 'pool' of sessions in the streamer. I investigated this and decided it would be very difficult to get working at all, and if TLS settings we did manage to get it functional it would take much longer. It would also be very risky. The reason for this is that we need to use locking because all this is happening inside of Twisted, and when this is combined with the need to clean up our 'pool' it is difficult to avoid either deadlocking or killing connections suddenly.  

 Most of these have a significant amount of work associated with them. 


 [0] https://github.com/kennethreitz/requests/issues/3058 
 [1] https://github.com/kennethreitz/requests/issues/2863 
 [2] https://github.com/shazow/urllib3/pull/751 
 [3] https://twistedmatrix.com/documents/current/web/howto/client.html use.

Back