Downloading » History » Sprint/Milestone 21

jortel@redhat.com, 09/01/2017 03:06 PM

# Downloading

In pulp3, there are two competing technologies and designs being considered. For the purposes of this discussion we'll name them **Jupiter** and **Saturn**. The *Jupiter* solution is based on *concurrent.futures* and the *Saturn* solution is based on *asyncio*. In addition to the underlying technology difference, the solutions meet the requirements in different ways. The *Jupiter* solution includes more classes, provides more abstraction and supports extension through object composition. The *Saturn* solution meets the requirements with the fewest classes possible and minimum abstraction.
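The difference in concurrency primitives can be sketched with plain standard-library code. This is illustrative only; the `fetch` functions below are stand-ins, not part of either proposal. *Jupiter*-style code schedules blocking callables on a thread pool via *concurrent.futures*, while *Saturn*-style code awaits coroutines on an *asyncio* event loop.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

def fetch(url):
    # Stand-in for a blocking download (Jupiter style).
    return 'content-of-' + url

urls = ['a', 'b', 'c']

# Jupiter: blocking callables run concurrently on a thread pool.
with ThreadPoolExecutor(max_workers=3) as pool:
    jupiter_results = list(pool.map(fetch, urls))

async def fetch_async(url):
    # Stand-in for a non-blocking download (Saturn style).
    return 'content-of-' + url

async def main():
    # Saturn: coroutines run concurrently on the event loop.
    return await asyncio.gather(*(fetch_async(u) for u in urls))

saturn_results = asyncio.run(main())
```

Both produce the same results; the difference is where the waiting happens (worker threads vs. the event loop).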

The three actors for our use cases are the *Importer*, the *Streamer* and the Plugin Writer. The *ChangeSet* shares a subset of the Streamer requirements but is not included in this discussion.

## Design Goals & Constraints

**jupiter**:

  - constraints:
      - object oriented
      - support semantic versioning
  - goals:
      - encapsulate underlying technologies
      - consistent interface across downloads: standard arguments, return values and raised exceptions.
      - delegation pattern for common customization:
          - handling of downloaded bits delegated to *Writers*
          - validation delegated to *Validations*
          - optional digest and size calculation delegated to *DownloadMonitor*
          - error handling delegated to *Event* handlers.
      - external participation in the download process through defined event registration and callbacks.
      - delegate concurrency to the standard lib (*concurrent.futures*).
      - delegate protocol implementation to client libs.

**saturn**:

  - constraints:
      - object oriented
      - support semantic versioning
  - goals:
      - direct exposure of client libs.
      - minimum encapsulation of underlying technologies.
      - minimum # of first class concepts (classes) and abstractions.
      - minimum # of lines of code to maintain.
      - delegate concurrency to the standard lib (*asyncio*).
      - delegate protocol implementation to client libs.

## Use Cases

### Importer

As an importer, I need to download single files.

**jupiter**:

~~~python
download = HttpDownload(
    url=url,
    writer=FileWriter(path),
    timeout=Timeout(connect=10, read=15),
    user=User(name='elmer', password='...'),
    ssl=SSL(ca_certificate='path-to-certificate',
            client_certificate='path-to-certificate',
            client_key='path-to-key',
            validation=True),
    proxy_url='http://user:password@gateway.org')

try:
    download()
except DownloadError:
    # An error occurred.
    ...
else:
    # Go read the downloaded file \o/
    ...
~~~

**saturn**:

~~~python
import ssl

ssl_context = ssl.create_default_context(cafile='path-to-CA_certificate')
ssl_context.load_cert_chain('path-to-CLIENT_certificate', 'path-to-CLIENT_key')

connector = aiohttp.TCPConnector(verify_ssl=True, ssl_context=ssl_context)

session = aiohttp.ClientSession(
    connector=connector,
    read_timeout=15,
    auth=aiohttp.BasicAuth('elmer', password='...', encoding='utf-8'))

downloader_obj = HttpDownloader(
    session,
    url,
    proxy='http://gateway.org',
    proxy_auth=aiohttp.BasicAuth('elmer', password='...', encoding='utf-8'))

downloader_coroutine = downloader_obj.run()
loop = asyncio.get_event_loop()
done, not_done = loop.run_until_complete(asyncio.wait([downloader_coroutine]))
for task in done:
    try:
        result = task.result()  # This is a DownloadResult
    except aiohttp.ClientError:
        # An error occurred.
        ...
~~~

Question: how can the connect timeout be set in aiohttp? (Newer aiohttp releases accept *aiohttp.ClientTimeout(connect=...)* via the session's *timeout* argument; at the time of writing, *ClientSession* took a *conn_timeout* argument.)

-----

As an importer, I can leverage all settings supported by the underlying protocol-specific client lib.

**jupiter**:

Commonly used settings are supported by the abstraction. Additional settings could be supported by subclassing.

~~~python
class SpecialDownload(HttpDownload):

    def _settings(self):
        settings = super()._settings()
        settings['special'] = <special value>
        return settings
~~~

**saturn**:

The underlying client lib arguments are directly exposed.

-----

As an importer, I can create an Artifact with a downloaded file using the size and digests calculated during the download.

**jupiter**:

Using the optional *DownloadMonitor* to collect statistics such as size and to calculate digests.

~~~python
download = HttpDownload(...)
monitor = DownloadMonitor(download)
...  # perform download.
artifact = Artifact(**monitor.facts())
artifact.save()
~~~

**saturn**:

The *size* and all *digests* are always calculated.

~~~python
downloader_obj = HttpDownloader(...)
...  # perform download.
result = task.result()  # This is a DownloadResult
artifact = Artifact(**result.artifact_attributes)
artifact.save()
~~~
-----

As an importer, I need to download files concurrently.

**jupiter**:

Using the *Batch* to run the downloads concurrently. Only 3 downloads are in memory at once.

~~~python
downloads = (HttpDownload(...) for _ in range(10))

with Batch(downloads, backlog=3) as batch:
    for plan in batch():
        try:
            plan.result()
        except DownloadError:
            # An error occurred.
            ...
        else:
            # Use the downloaded file \o/
            ...
~~~
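For comparison, here is a rough sketch of how a backlog-limited batch could be built on *concurrent.futures* alone. `run_batch` and its semaphore are hypothetical, not the actual *Batch* implementation: a semaphore keeps at most *backlog* downloads queued or running, so a large download generator is never fully realized in memory.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, as_completed

def run_batch(downloads, backlog=3):
    # Hypothetical sketch of a backlog-limited batch; not the real Batch.
    slots = threading.Semaphore(backlog)

    def run_one(download):
        try:
            return download()
        finally:
            slots.release()  # free a slot so the next download can be pulled

    results = []
    with ThreadPoolExecutor(max_workers=backlog) as pool:
        futures = []
        for download in downloads:
            slots.acquire()  # block until fewer than `backlog` are in flight
            futures.append(pool.submit(run_one, download))
        for future in as_completed(futures):
            results.append(future.result())
    return results

# Usage with stand-in downloads (each just returns a string):
downloads = ((lambda i=i: 'file-%d' % i) for i in range(10))
results = run_batch(downloads, backlog=3)
```

The generator is only advanced when a slot frees up, which is the property the *Batch* backlog is meant to provide.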

**saturn**:

Using the asyncio run loop. This example does not restrict the number of downloads in memory at once.

~~~python
downloaders = (HttpDownloader(...) for _ in range(10))

loop = asyncio.get_event_loop()
done, not_done = loop.run_until_complete(asyncio.wait([d.run() for d in downloaders]))
for task in done:
    try:
        result = task.result()  # This is a DownloadResult
    except aiohttp.ClientError:
        # An error occurred.
        ...
~~~
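If limiting in-flight downloads is needed on the *Saturn* side, plain *asyncio* already provides the building block. A stand-alone sketch (the `download` coroutine is a stand-in, not the proposed *HttpDownloader*):

```python
import asyncio

async def download(url, semaphore):
    # Stand-in coroutine; a real downloader would do network I/O here.
    async with semaphore:
        # At most `limit` coroutines execute this body at once.
        await asyncio.sleep(0)
        return 'content-of-' + url

async def download_all(urls, limit=3):
    semaphore = asyncio.Semaphore(limit)
    return await asyncio.gather(*(download(u, semaphore) for u in urls))

results = asyncio.run(download_all(['url-%d' % i for i in range(10)]))
```

`asyncio.gather` preserves input order, so results line up with the url list even though completion order varies.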

-----

As an importer, I want to validate downloaded files.

**jupiter**:

Supported by adding provided or custom validations to the download. A validation error raises *ValidationError*, which is a *DownloadError*.

~~~python
download = HttpDownload(...)
download.append(DigestValidation('sha256', '0x1234'))

try:
    download()
except DownloadError:
    # An error occurred.
    ...
~~~

**saturn**:

Supported by passing the *expected_digests* dictionary and catching *DigestValidationError*.

~~~python
downloader_obj = HttpDownloader(..., expected_digests={'sha256': '0x1234'})

downloader_coroutine = downloader_obj.run()
loop = asyncio.get_event_loop()
done, not_done = loop.run_until_complete(asyncio.wait([downloader_coroutine]))
for task in done:
    try:
        result = task.result()  # This is a DownloadResult
    except (aiohttp.ClientError, DigestValidationError):
        # An error occurred.
        ...
~~~
-----

As an importer, I am not required to keep all content (units) and artifacts in memory to support concurrent downloading.

**jupiter**:

~~~python
~~~

**saturn**:

~~~python
~~~

-----

As an importer, I need a way to link a downloaded file to an artifact without keeping all content units and artifacts in memory.

**jupiter**:

Using the *Batch* to run the downloads concurrently. Only 3 downloads are in memory at once.

~~~python
downloads = (HttpDownload(...) for _ in range(10))

with Batch(downloads, backlog=3) as batch:
    for plan in batch():
        try:
            plan.result()
        except DownloadError:
            # An error occurred.
            ...
        else:
            # Use the downloaded file \o/
            ...
~~~

**saturn**:

Using the GroupDownloader?

~~~python
~~~

-----

As an importer, I can perform concurrent downloading using a synchronous pattern.

**jupiter**:

Using the *Batch*. See other examples.

**saturn**:

Using either the *GroupDownloader* or the asyncio loop directly. See other examples.

-----

As an importer, concurrent downloads must share resources such as sessions, connection pools and auth tokens across individual downloads.

**jupiter**:

The *Download.context* is designed to support this. The *shared* context can be used to safely share anything. This includes python-requests sessions (using a Cache), auth tokens and resolved mirror lists. The sharing is done through collaboration. When it's appropriate for individual downloads to share things, an external actor like the *Batch* or the *Streamer* will ensure that all of the download objects have the same context.

**saturn**:

Each downloader could define a class attribute. This global can be used to share anything, including python-requests sessions (using a Cache), auth tokens and resolved mirror lists. The sharing is done through collaboration. Sharing is global and unconditional.

Question: how will thread safety be provided? The streamer will have multiple twisted threads using these downloaders.
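A minimal sketch of the class-attribute approach, with a lock addressing the thread-safety question under the assumption that lazy creation is the racy part. The sharing mechanism shown here is hypothetical, and the session object is a placeholder, not a real client session:

```python
import threading

class HttpDownloader:
    # Hypothetical sharing mechanism: one session-like object for all
    # downloader instances, guarded by a lock so lazy creation is safe
    # even when multiple (e.g. twisted) threads race to create it.
    _session = None
    _session_lock = threading.Lock()

    @classmethod
    def shared_session(cls):
        with cls._session_lock:
            if cls._session is None:
                cls._session = object()  # placeholder for a client session
            return cls._session

# Every instance (and thread) observes the same shared object:
same = HttpDownloader.shared_session() is HttpDownloader.shared_session()
```

Note the lock only makes creation safe; whether the shared session itself may be used from multiple threads is a property of the client lib, not of this pattern.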

-----

As an importer, I can customize how downloading is performed. For example, to support mirror lists.

**jupiter**:

All download objects can be customized in one of two ways: first, by registering an event handler for well defined events; and second, by subclassing.

~~~python
~~~
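As an illustration of the subclassing path, a hypothetical mirror-list download. Neither *MirrorListDownload* nor the stand-in base class below is part of the proposal:

```python
class HttpDownload:
    # Stand-in for the proposed HttpDownload; a real download would
    # fetch self.url when called and raise DownloadError on failure.
    def __init__(self, url):
        self.url = url

    def __call__(self):
        return 'downloaded:' + self.url

class MirrorListDownload(HttpDownload):
    # Hypothetical subclass: try each mirror in turn until one succeeds.
    def __init__(self, urls):
        super().__init__(urls[0])
        self.urls = urls

    def __call__(self):
        error = None
        for url in self.urls:
            self.url = url
            try:
                return super().__call__()
            except Exception as e:  # a real subclass would catch DownloadError
                error = e
        raise error

result = MirrorListDownload(['mirror-1', 'mirror-2'])()
```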

**saturn**:

~~~python
~~~

-----

As an importer, concurrent downloading must limit the number of simultaneous connections. Downloading 5k artifacts cannot open 5k connections.

**jupiter**:

This is supported by sharing connection pools and limiting the total number of downloads in progress concurrently. See resource sharing and concurrency limiting use cases.

**saturn**:

This is supported by sharing connection pools and limiting the total number of downloads in progress concurrently. See resource sharing and concurrency limiting use cases.

-----

As an importer, I can terminate concurrent downloading at any point and not leak resources.

**jupiter**:

~~~python
~~~

**saturn**:

~~~python
~~~

-----

As an importer, I can download using any protocol. Starting with HTTP/HTTPS and FTP.

**jupiter**:

~~~python
~~~

**saturn**:

~~~python
~~~

-----

### Streamer

As the streamer, I need to download files related to published artifacts and metadata but delegate *the implementation* (protocol, settings, credentials) to the importer. The implementation must be a black-box.

**jupiter**:

~~~python
~~~

**saturn**:

~~~python
~~~

-----

As the streamer, I can download using any protocol supported by the importer.

**jupiter**:

~~~python
~~~

**saturn**:

~~~python
~~~

-----

As the streamer, I want to validate downloaded files.

**jupiter**:

~~~python
~~~

**saturn**:

~~~python
~~~

-----

As the streamer, concurrent downloads must share resources such as sessions, connection pools and auth tokens across individual downloads without having knowledge of such things.

**jupiter**:

~~~python
~~~

**saturn**:

~~~python
~~~

-----

As the streamer, I need to support complex downloading such as mirror lists. This complexity must be delegated to the importer.

**jupiter**:

~~~python
~~~

**saturn**:

~~~python
~~~

-----

As the streamer, I need to bridge the downloaded bit stream to the Twisted response. The file is not written to disk.

**jupiter**:

~~~python
~~~

**saturn**:

~~~python
~~~

-----

As the streamer, I need to forward HTTP headers from the download response to the twisted response.

**jupiter**:

~~~python
~~~

**saturn**:

~~~python
~~~

-----

As the streamer, I can download using (the same) custom logic as the importer, such as supporting mirror lists.

**jupiter**:

~~~python
~~~

**saturn**:

~~~python
~~~

-----