Downloading » History » Sprint/Milestone 20

jortel@redhat.com, 08/30/2017 10:22 PM

# Downloading

In pulp3, there are two competing technologies and designs being considered. For the purposes of this discussion we'll name them **Jupiter** and **Saturn**. The *Jupiter* solution is based on *concurrent.futures* and the *Saturn* solution is based on *asyncio*. In addition to the underlying technology difference, the solutions meet the requirements in different ways. The *Jupiter* solution includes more classes, provides more abstraction, and supports extension through object composition. The *Saturn* solution meets the requirements with the fewest classes possible and minimal abstraction.

The three actors for our use cases are the *Importer*, the *Streamer*, and the Plugin Writer. The *ChangeSet* shares a subset of the Streamer requirements but is not included in this discussion.

## Use Cases

### Importer

As an importer, I need to download single files.

**jupiter**:

~~~python
download = HttpDownload(
    url=url,
    writer=FileWriter(path),
    timeout=Timeout(connect=10, read=15),
    user=User(name='elmer', password='...'),
    ssl=SSL(ca_certificate='path-to-certificate',
            client_certificate='path-to-certificate',
            client_key='path-to-key',
            validation=True),
    proxy_url='http://user:password@gateway.org')

try:
    download()
except DownloadError:
    ...  # An error occurred.
else:
    ...  # Go read the downloaded file \o/
~~~

**saturn**:

~~~python
ssl_context = aiohttpSSLContext()
ssl_context.load_cert_chain('path-to-CA_certificate')
ssl_context.load_cert_chain('path-to-CLIENT_certificate')
ssl_context.load_cert_chain('path-to-CLIENT_key')

connector = aiohttp.TCPConnector(verify_ssl=True, ssl_context=ssl_context)

session = aiohttp.ClientSession(
    connector=connector,
    read_timeout=15,
    auth=aiohttp.BasicAuth('elmer', password='...', encoding='utf-8'))

downloader_obj = HttpDownloader(
    session,
    url,
    proxy='http://gateway.org',
    proxy_auth=aiohttp.BasicAuth('elmer', password='...', encoding='utf-8'))

downloader_coroutine = downloader_obj.run()
loop = asyncio.get_event_loop()
done, not_done = loop.run_until_complete(asyncio.wait([downloader_coroutine]))
for task in done:
    try:
        result = task.result()  # This is a DownloadResult
    except aiohttp.ClientError:
        ...  # An error occurred.
~~~

question: How can the connect timeout be set in aiohttp?

-----

As an importer, I can leverage all settings supported by the underlying protocol-specific client library.

**jupiter**:

Commonly used settings are supported by the abstraction. Additional settings can be supported by subclassing.
~~~python
class SpecialDownload(HttpDownload):

    def _settings(self):
        settings = super()._settings()
        settings['special'] = ...  # <special value>
        return settings
~~~

**saturn**:

The underlying client library's arguments are directly exposed.

-----

As an importer, I can create an Artifact with a downloaded file using the size and digests calculated during the download.

**jupiter**:

Using the optional *DownloadMonitor* to collect statistics such as size and to calculate digests.
~~~python
download = HttpDownload(...)
monitor = DownloadMonitor(download)
...  # perform download.
artifact = Artifact(**monitor.facts())
artifact.save()
~~~

**saturn**:

The *size* and all *digests* are always calculated.
~~~python
downloader_obj = HttpDownloader(...)
...  # perform download.
result = task.result()  # This is a DownloadResult
artifact = Artifact(**result.artifact_attributes)
artifact.save()
~~~

-----

As an importer, I need to download files concurrently.

**jupiter**:

Using the *Batch* to run the downloads concurrently. Only 3 downloads in memory at once.
~~~python
downloads = (HttpDownload(...) for _ in range(10))

with Batch(downloads, backlog=3) as batch:
    for plan in batch():
        try:
            plan.result()
        except DownloadError:
            ...  # An error occurred.
        else:
            ...  # Use the downloaded file \o/
~~~

**saturn**:

Using the asyncio run loop. This example does not restrict the number of downloads in memory at once.

~~~python
downloaders = (HttpDownloader(...) for _ in range(10))

loop = asyncio.get_event_loop()
done, not_done = loop.run_until_complete(asyncio.wait([d.run() for d in downloaders]))
for task in done:
    try:
        result = task.result()  # This is a DownloadResult
    except aiohttp.ClientError:
        ...  # An error occurred.
~~~

-----

As an importer, I want to validate downloaded files.

**jupiter**:

Supported by adding provided or custom validations to the download. A validation error raises *ValidationError*, which is a *DownloadError*.
~~~python
download = HttpDownload(...)
download.append(DigestValidation('sha256', '0x1234'))

try:
    download()
except DownloadError:
    ...  # An error occurred.
~~~

**saturn**:

Supported by passing the *expected_digests* dictionary and catching *DigestValidationError*.

~~~python
downloader_obj = HttpDownloader(..., expected_digests={'sha256': '0x1234'})

downloader_coroutine = downloader_obj.run()
loop = asyncio.get_event_loop()
done, not_done = loop.run_until_complete(asyncio.wait([downloader_coroutine]))
for task in done:
    try:
        result = task.result()  # This is a DownloadResult
    except (aiohttp.ClientError, DigestValidationError):
        ...  # An error occurred.
~~~

-----

As an importer, I am not required to keep all content (units) and artifacts in memory to support concurrent downloading.

**jupiter**:

~~~python
# Sketch: a generator creates downloads lazily and Batch(backlog=3)
# bounds how many download objects exist in memory at once.
downloads = (HttpDownload(...) for _ in range(10))

with Batch(downloads, backlog=3) as batch:
    for plan in batch():
        plan.result()
~~~

**saturn**:

~~~python
~~~

-----

As an importer, I need a way to link a downloaded file to an artifact without keeping all content units and artifacts in memory.

**jupiter**:

Using the *Batch* to run the downloads concurrently. Only 3 downloads in memory at once.

~~~python
downloads = (HttpDownload(...) for _ in range(10))

with Batch(downloads, backlog=3) as batch:
    for plan in batch():
        try:
            plan.result()
        except DownloadError:
            ...  # An error occurred.
        else:
            ...  # Use the downloaded file \o/
~~~

**saturn**:

Using the GroupDownloader?

~~~python
~~~

-----

As an importer, I can perform concurrent downloading using a synchronous pattern.

**jupiter**:

Using the *Batch*. See other examples.

**saturn**:

Using either the *GroupDownloader* or asyncio loop directly. See other examples.

-----

As an importer, concurrent downloads must share resources such as sessions, connection pools, and auth tokens across individual downloads.

**jupiter**:

The *Download.context* is designed to support this. The *shared* context can be used to safely share anything. This includes python-requests sessions (using a Cache), auth tokens, and resolved mirror lists. The sharing is done through collaboration: when it's appropriate for individual downloads to share things, an external actor like the *Batch* or the *Streamer* will ensure that all of the download objects have the same context.
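
A minimal sketch of that collaboration (assumed names: only the idea of a shared *Download.context* comes from the design above; `Context` and `share_context()` are illustrative):

~~~python
class Context:
    """Shared state: sessions, auth tokens, resolved mirror lists."""

    def __init__(self):
        self.auth_tokens = {}
        self.mirror_lists = {}


class Download:
    """Stand-in for the jupiter Download base class (assumed shape)."""

    def __init__(self, url):
        self.url = url
        self.context = Context()  # private context unless an actor shares one


def share_context(downloads):
    """What an external actor (Batch, Streamer) would do: give every
    download object the same context so sessions and tokens are shared."""
    shared = Context()
    for download in downloads:
        download.context = shared
    return shared
~~~

Sharing stays collaborative and opt-in: downloads never handed to `share_context()` keep their private context.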

**saturn**:

Each downloader could define a class attribute. This global can be used to share anything, including python-requests sessions (using a Cache), auth tokens, and resolved mirror lists. The sharing is done through collaboration. Sharing is global and unconditional.
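
A sketch of the class-attribute approach, with a plain object standing in for the shared aiohttp session (all names here are assumptions):

~~~python
class SharedSession:
    """Stand-in for a shared aiohttp.ClientSession / connection pool."""


class HttpDownloader:
    # Class attribute: one session shared by every downloader instance.
    # Sharing is global and unconditional.
    _session = None

    @classmethod
    def shared_session(cls):
        if cls._session is None:  # lazy init; NOT thread-safe as written
            cls._session = SharedSession()
        return cls._session

    def __init__(self, url):
        self.url = url
        self.session = self.shared_session()
~~~

The unguarded lazy initialization is exactly where the thread-safety question below bites.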
Question: how will thread safety be provided? The streamer will have multiple Twisted threads using these downloaders.

-----

As an importer, I can customize how downloading is performed. For example, to support mirror lists.

**jupiter**:

All download objects can be customized in one of two ways: first, by registering an event handler for well-defined events; second, by subclassing.

~~~python
~~~
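
One possible shape of the event-handler half (entirely illustrative: the `register`/event-name API is an assumption, not the actual jupiter design):

~~~python
class Download:
    """Stand-in for the jupiter Download base with simple event hooks."""

    def __init__(self, url):
        self.url = url
        self._handlers = {}

    def register(self, event, handler):
        self._handlers.setdefault(event, []).append(handler)

    def _fire(self, event, payload):
        # Each handler may transform the payload, e.g. rewrite the URL
        # using a mirror list before the request is made.
        for handler in self._handlers.get(event, []):
            payload = handler(payload)
        return payload

    def __call__(self):
        url = self._fire('resolve-url', self.url)
        return url  # a real download would now fetch this URL
~~~

Subclassing (as in the *SpecialDownload* example above) covers the cases an event hook cannot.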

**saturn**:

~~~python
~~~

-----
As an importer, concurrent downloading must limit the number of simultaneous connections. Downloading 5k artifacts cannot open 5k connections.

**jupiter**:

This is supported by sharing connection pools and limiting the total number of downloads in progress concurrently. See resource sharing and concurrency limiting use cases.

**saturn**:

This is supported by sharing connection pools and limiting the total number of downloads in progress concurrently. See resource sharing and concurrency limiting use cases.
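
In aiohttp the cap would come from the shared connector, e.g. `aiohttp.TCPConnector(limit=10)`. The same effect can be sketched with only the standard library by bounding the tasks with a semaphore:

~~~python
import asyncio


async def download(semaphore, url):
    # The semaphore caps how many downloads run (and hold connections)
    # at once, no matter how many have been scheduled.
    async with semaphore:
        await asyncio.sleep(0)  # stand-in for the actual network I/O
        return url


async def download_all(urls, limit=10):
    semaphore = asyncio.Semaphore(limit)
    return await asyncio.gather(*(download(semaphore, u) for u in urls))


# 5k artifacts scheduled, but at most 10 "downloads" in flight at a time.
results = asyncio.run(download_all(['url-%d' % n for n in range(5000)]))
~~~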

-----

As an importer, I can terminate concurrent downloading at any point and not leak resources.

**jupiter**:

~~~python
~~~

**saturn**:

~~~python
~~~

-----

As an importer, I can download using any protocol. Starting with HTTP/HTTPS and FTP.

**jupiter**:

~~~python
~~~

**saturn**:

~~~python
~~~

-----

### Streamer
As the streamer, I need to download files related to published artifacts and metadata but delegate *the implementation* (protocol, settings, credentials) to the importer. The implementation must be a black-box.

**jupiter**:

~~~python
~~~

**saturn**:

~~~python
~~~

-----

As the streamer, I can download using any protocol supported by the importer.

**jupiter**:

~~~python
~~~

**saturn**:

~~~python
~~~

-----

As the streamer, I want to validate downloaded files.

**jupiter**:

~~~python
~~~

**saturn**:

~~~python
~~~

-----

As the streamer, concurrent downloads must share resources such as sessions, connection pools, and auth tokens across individual downloads without having knowledge of such things.

**jupiter**:

~~~python
~~~

**saturn**:

~~~python
~~~

-----

As the streamer, I need to support complex downloading such as mirror lists. This complexity must be delegated to the importer.

**jupiter**:

~~~python
~~~

**saturn**:

~~~python
~~~

-----

As the streamer, I need to bridge the downloaded bit stream to the Twisted response. The file is not written to disk.

**jupiter**:

~~~python
~~~

**saturn**:

~~~python
~~~

-----

As the streamer, I need to forward HTTP headers from the download response to the Twisted response.

**jupiter**:

~~~python
~~~

**saturn**:

~~~python
~~~

-----

As the streamer, I can download using (the same) custom logic as the importer, such as supporting mirror lists.

**jupiter**:

~~~python
~~~

**saturn**:

~~~python
~~~

-----