Occasional HTTP 403 errors when downloading rpms from on_demand / lazy sync repositories
The streamer is resulting in occasional 403s.
The patched version of the HTTPAdaptor included in the streamer package basically replicates the upstream patch that has been accepted and merged to python-requests proposed/3.0.0. However, the patch to python-requests only addresses part of the problem. It propagates settings to the urllib3 PoolManager. However, the criteria (pool-key) used by the PoolManager to select a connection pool only includes (scheme, host, port). This means connections will be reused even when different certificates are specified. As of this PR, urllib3 was patched to have custom pool keys which (for ssl) includes the certificate path. Good effort. As a result, even when using the latest patched urllib3 (in F25) connections with different certificate paths will not reuse connections when different certificates are specified.
The latest patches to python-requests and urllib3 cannot be used as is with pulp2 because of how Pulp/Nectar handles certificates. Pulp stores certificates in the DB and passes to nectar as PEM strings. Nectar writes them to a temporary file and passes the path to requests. The certificate path will be different every time. This will cause the urllib3 PoolManager to create a new pool for every request which results in never sharing connections.
EL7 has urllib3 1.10
F25 has urllib3 1.15
#1 Updated by firstname.lastname@example.org almost 5 years ago
The urllib3 (as found in 1.15) PoolManager may already be doing a reasonable thing by comparing the cert/key paths so not convinced submitting an patch to urllib3 changing the HTTPSPoolKey to read the file and using the actual certificates when compared would be an improvement (but might be). Even if we did and they accepted, getting all the upstream patches on EL6/7 won't happen anytime soon. This would work for pulp3 assuming the PR for storing certificates on disk instead of DB and PR for download lib is merged because we would no longer be passing temporary file paths to python-requests.
A possible solution:
Create a SessionPool that caches sessions based on:
- proxy settings
This seems cleaner and less brittle than patching things with an adapter subclass.
This would compensate for the current flaws in both python-requests and urllib3. And cert handling behavior in nectar and pulp. This would eliminate the need for the streamer/requests module which was an attempt to resolve this problem.
To address a separate but related problem, the pool could also store referenced certificates in /var/cache/streamer/certificates to ensure the sessions cached in the pool would not reference a deleted certificate. As would be the case with nectar writing and passing temporary files. Or, in pulp3 an importer being removed. The content of /var/cache/streamer/certificates would be cleared when by the SessionPool on streamer startup to prevent leaking certificate files.
Unused cached sessions should be purged after a specified about of time (expected cache behavior).
Even though, the question of python-requests and urllib3 being thread safe has not been settled, we could start by storing the cached sessions in a SessionPool instance attribute protected by an RLock. If thread safety becomes an issue or cannot be confirmed, this could be easily moved to a thread local. I do not think there would be any significant impact on performance if connections were not reused between threads. But we'd want to think though that. I have already confirmed that Twisted uses a thread pool.
#3 Updated by mhrivnak almost 5 years ago
This generally sounds good to me. Some quick thoughts:
We don't need anything fancy for evicting expired entries on a time period. We could just to an eviction sweep every time an item is being retrieved from the cache. If sessions sit around unused for a long time before they get noticed, that's no problem.
We identified some time ago when exploring this idea that it's ok for Session objects to simply fall out of scope and get garbage collected. We do that all the time already in other places, so there's no need to do anything to "close" or clean up a session. Just remove references to it.
I lean toward allowing Sessions to be shared by threads, since "requests" claims they are thread-safe. We can do further isolation if necessary later.