# Downloading
In pulp3, there are two competing technologies and designs being considered. For the purposes of this discussion we'll name them **Jupiter** and **Saturn**. The *Jupiter* solution is based on *concurrent.futures* and the *Saturn* solution is based on *asyncio*. Beyond the underlying technology, the solutions meet the requirements in different ways. The *Jupiter* solution includes more classes, provides more abstraction, and supports extension through object composition. The *Saturn* solution meets the requirements with the fewest classes possible and minimal abstraction.

The three actors for our use cases are the *Importer*, the *Streamer*, and the Plugin Writer. The *ChangeSet* shares a subset of the *Streamer* requirements but is not included in this discussion.

## Use Cases

### Importer

As an importer, I need to download single files.

**jupiter**:
~~~python
download = HttpDownload(
    url=url,
    writer=FileWriter(path),
    timeout=Timeout(connect=10, read=15),
    user=User(name='elmer', password='...'),
    ssl=SSL(ca_certificate='path-to-certificate',
            client_certificate='path-to-certificate',
            client_key='path-to-key',
            validation=True),
    proxy_url='http://user:password@gateway.org')

try:
    download()
except DownloadError:
    pass  # An error occurred.
else:
    pass  # Go read the downloaded file \o/
~~~

**saturn**:

~~~python
# Client SSL: trust the CA, present the client certificate and key.
ssl_context = ssl.create_default_context(cafile='path-to-CA_certificate')
ssl_context.load_cert_chain('path-to-CLIENT_certificate', 'path-to-CLIENT_key')

connector = aiohttp.TCPConnector(verify_ssl=True, ssl_context=ssl_context)

session = aiohttp.ClientSession(
    connector=connector,
    read_timeout=15,
    auth=aiohttp.BasicAuth('elmer', password='...', encoding='utf-8'))

downloader_obj = HttpDownloader(
    session,
    url,
    proxy='http://gateway.org',
    proxy_auth=aiohttp.BasicAuth('elmer', password='...', encoding='utf-8'))

downloader_coroutine = downloader_obj.run()
loop = asyncio.get_event_loop()
done, not_done = loop.run_until_complete(asyncio.wait([downloader_coroutine]))
for task in done:
    try:
        result = task.result()  # This is a DownloadResult
    except aiohttp.ClientError:
        pass  # An error occurred.
~~~
question: How can the connect timeout be set in aiohttp?
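One possibility (an assumption, not settled in this proposal): the aiohttp 2.x series used here accepts a `conn_timeout` argument on the session alongside `read_timeout`, e.g.:

~~~python
# Assumes aiohttp 2.x ClientSession conn_timeout/read_timeout arguments.
session = aiohttp.ClientSession(
    conn_timeout=10,
    read_timeout=15)
~~~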
-----

As an importer, I can leverage all settings supported by the underlying protocol-specific client library.

**jupiter**:

Commonly used settings are supported by the abstraction. Additional settings can be supported by subclassing:

~~~python
class SpecialDownload(HttpDownload):

    def _settings(self):
        settings = super()._settings()
        settings['special'] = <special value>
        return settings
~~~

**saturn**:

The underlying client library's arguments are directly exposed.

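A minimal sketch of what that can look like, reusing the *HttpDownloader* constructor from the example above (the particular options shown are ordinary *aiohttp.ClientSession* settings, chosen only for illustration):

~~~python
import aiohttp

# Any aiohttp setting is passed through as-is; there is no wrapping abstraction.
session = aiohttp.ClientSession(
    headers={'User-Agent': 'pulp'},   # default request headers
    cookies={'session': '...'},       # any other ClientSession option
    read_timeout=15)

downloader_obj = HttpDownloader(session, url)
~~~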
-----

As an importer, I can create an Artifact with a downloaded file using the size and digests calculated during the download.

**jupiter**:

Use the optional *DownloadMonitor* to collect statistics such as size and to calculate digests.

~~~python
download = HttpDownload(...)
monitor = DownloadMonitor(download)
...  # perform the download.
artifact = Artifact(**monitor.facts())
artifact.save()
~~~

**saturn**:

The *size* and all *digests* are always calculated.

~~~python
downloader_obj = HttpDownloader(...)
...  # perform the download.
result = task.result()
artifact = Artifact(**result.artifact_attributes)
artifact.save()
~~~

-----

As an importer, I need to download files concurrently.

**jupiter**:

Use the *Batch* to run the downloads concurrently, with only 3 downloads in memory at once.

~~~python
downloads = (HttpDownload(...) for _ in range(10))

with Batch(downloads, backlog=3) as batch:
    for plan in batch():
        try:
            plan.result()
        except DownloadError:
            pass  # An error occurred.
        else:
            pass  # Use the downloaded file \o/
~~~

**saturn**:

Use the asyncio event loop. This example does not restrict the number of downloads in memory at once.

~~~python
downloaders = (HttpDownloader(...) for _ in range(10))

loop = asyncio.get_event_loop()
done, not_done = loop.run_until_complete(asyncio.wait([d.run() for d in downloaders]))
for task in done:
    try:
        result = task.result()  # This is a DownloadResult
    except aiohttp.ClientError:
        pass  # An error occurred.
~~~

-----

As an importer, I want to validate downloaded files.

**jupiter**:

Supported by adding provided or custom validations to the download. A validation error raises *ValidationError*, which is a *DownloadError*.

~~~python
download = HttpDownload(...)
download.append(DigestValidation('sha256', '0x1234'))

try:
    download()
except DownloadError:
    pass  # An error occurred.
~~~

**saturn**:

Supported by passing the *expected_digests* dictionary and catching *DigestValidationError*.

~~~python
downloader_obj = HttpDownloader(..., expected_digests={'sha256': '0x1234'})

downloader_coroutine = downloader_obj.run()
loop = asyncio.get_event_loop()
done, not_done = loop.run_until_complete(asyncio.wait([downloader_coroutine]))
for task in done:
    try:
        result = task.result()  # This is a DownloadResult
    except (aiohttp.ClientError, DigestValidationError):
        pass  # An error occurred.
~~~

-----

As an importer, I am not required to keep all content (units) and artifacts in memory to support concurrent downloading.

**jupiter**:

~~~python
~~~
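A minimal sketch of how the *Batch* example above could extend to this case, assuming a hypothetical `pending_content()` iterator that yields content units lazily (the `unit.url` / `unit.path` attributes are likewise assumed for illustration):

~~~python
# Downloads are created lazily from an iterator; Batch holds at most
# 'backlog' of them in memory at any time.
downloads = (HttpDownload(url=unit.url, writer=FileWriter(unit.path))
             for unit in pending_content())

with Batch(downloads, backlog=3) as batch:
    for plan in batch():
        plan.result()
~~~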

**saturn**:

~~~python
~~~

-----

As an importer, I need a way to link a downloaded file to an artifact without keeping all content units and artifacts in memory.
213
214
**jupiter**:
215
216
~~~python
217
~~~
218
219
**saturn**:
220
221
~~~python
222
~~~
223
224
-----
225
226
As an importer, I can perform concurrent downloading using a synchronous pattern.
227
228
**jupiter**:
229
230
~~~python
231
~~~
232
233
**saturn**:
234
235
~~~python
236
~~~
237
238
-----
239
240
As an importer, concurrent downloads must share resources such as sessions, connection pools, and auth tokens across individual downloads.

**jupiter**:

~~~python
~~~

**saturn**:

~~~python
~~~
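A minimal sketch of how this might look with *saturn*, assuming one shared *aiohttp.ClientSession* (and therefore one connection pool and one set of credentials) handed to every downloader; `urls` is a placeholder iterable:

~~~python
import aiohttp

session = aiohttp.ClientSession(
    auth=aiohttp.BasicAuth('elmer', password='...'))

# Every downloader reuses the same session, connection pool and credentials.
downloaders = [HttpDownloader(session, url) for url in urls]
~~~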

-----

As an importer, I can customize how downloading is performed, for example, to support mirror lists.

**jupiter**:

~~~python
~~~

**saturn**:

~~~python
~~~

-----

As an importer, concurrent downloading must limit the number of simultaneous connections. Downloading 5k artifacts cannot open 5k connections.
269
270
**jupiter**:
271
272
~~~python
273
~~~
274
275
**saturn**:
276
277
~~~python
278
~~~
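With *saturn*, this could presumably be handled by the connector's pool limit (a sketch only; `urls` is a placeholder iterable):

~~~python
import aiohttp

# TCPConnector(limit=N) caps the simultaneous connections regardless of
# how many downloads are scheduled on the loop.
connector = aiohttp.TCPConnector(limit=10)
session = aiohttp.ClientSession(connector=connector)

downloaders = [HttpDownloader(session, url) for url in urls]  # e.g. 5k downloads, at most 10 connections
~~~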

-----

As an importer, I can terminate concurrent downloading at any point and not leak resources.

**jupiter**:

~~~python
~~~

**saturn**:

~~~python
~~~
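One way this might be done with *saturn*, using only stock asyncio/aiohttp calls (a sketch, assuming the shared `session`, `loop` and a `coroutines` list as in the examples above):

~~~python
done, not_done = loop.run_until_complete(asyncio.wait(coroutines, timeout=10))

# Cancel whatever has not finished, then release the connection pool.
for task in not_done:
    task.cancel()
session.close()  # a coroutine in newer aiohttp releases; run it on the loop there
~~~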

-----

As an importer, I can download using any protocol, starting with HTTP/HTTPS and FTP.

**jupiter**:

~~~python
~~~

**saturn**:

~~~python
~~~

-----

### Streamer

As the streamer, I need to download files related to published artifacts and metadata but delegate *the implementation* (protocol, settings, credentials) to the importer. The implementation must be a black-box.

**jupiter**:

~~~python
~~~

**saturn**:

~~~python
~~~

-----

As the streamer, I can download using any protocol supported by the importer.

**jupiter**:

~~~python
~~~

**saturn**:

~~~python
~~~

-----

As the streamer, I want to validate downloaded files.

**jupiter**:

~~~python
~~~

**saturn**:

~~~python
~~~

-----

As the streamer, concurrent downloads must share resources such as sessions, connection pools, and auth tokens across individual downloads without having knowledge of such things.

**jupiter**:

~~~python
~~~

**saturn**:

~~~python
~~~

-----

As the streamer, I need to support complex downloading such as mirror lists. This complexity must be delegated to the importer.

**jupiter**:

~~~python
~~~

**saturn**:

~~~python
~~~

-----

As the streamer, I need to bridge the downloaded bit stream to the Twisted response. The file is not written to disk.

**jupiter**:

~~~python
~~~

**saturn**:

~~~python
~~~

-----

As the streamer, I need to forward HTTP headers from the download response to the Twisted response.

**jupiter**:

~~~python
~~~

**saturn**:

~~~python
~~~

-----

As the streamer, I can download using (the same) custom logic as the importer, such as supporting mirror lists.

**jupiter**:

~~~python
~~~

**saturn**:

~~~python
~~~

-----