Downloading » History » Sprint/Milestone 19
jortel@redhat.com, 08/30/2017 06:15 PM
# Downloading

In pulp3, there are two competing technologies and designs being considered. For the purposes of this discussion, we'll name them **Jupiter** and **Saturn**. The *Jupiter* solution is based on *concurrent.futures* and the *Saturn* solution is based on *asyncio*. In addition to the underlying technology difference, the solutions meet the requirements in different ways. The *Jupiter* solution includes more classes, provides more abstraction, and supports extension through object composition. The *Saturn* solution meets the requirements with the fewest classes possible and minimal abstraction.

The three actors for our use cases are the *Importer*, the *Streamer*, and the Plugin Writer. The *ChangeSet* shares a subset of the *Streamer* requirements but is not included in this discussion.
## Use Cases

### Importer

As an importer, I need to download single files.
12 | |||
13 | 9 | jortel@redhat.com | **jupiter**: |
14 | 5 | jortel@redhat.com | |
15 | 15 | jortel@redhat.com | ~~~python |
16 | 6 | jortel@redhat.com | download = HttpDownload( |
17 | url=url, |
||
18 | writer=FileWriter(path), |
||
19 | timeout=Timeout(connect=10, read=15), |
||
20 | user=User(name='elmer', password='...'), |
||
21 | ssl=SSL(ca_certificate='path-to-certificate', |
||
22 | client_certificate='path-to-certificate', |
||
23 | client_key='path-to-key', |
||
24 | validation=True), |
||
25 | proxy_url='http://user:password@gateway.org') |
||
26 | 5 | jortel@redhat.com | |
27 | try: |
||
28 | download() |
||
29 | except DownloadError: |
||
30 | # An error occurred. |
||
31 | else: |
||
32 | # Go read the downloaded file \o/ |
||
33 | ~~~ |
||
34 | |||
**saturn**:

~~~python
import ssl

ssl_context = ssl.create_default_context(cafile='path-to-CA_certificate')
ssl_context.load_cert_chain('path-to-CLIENT_certificate', 'path-to-CLIENT_key')

connector = aiohttp.TCPConnector(verify_ssl=True, ssl_context=ssl_context)

session = aiohttp.ClientSession(
    connector=connector,
    read_timeout=15,
    auth=aiohttp.BasicAuth('elmer', password='...', encoding='utf-8'))

downloader_obj = HttpDownloader(
    session,
    url,
    proxy='http://gateway.org',
    proxy_auth=aiohttp.BasicAuth('elmer', password='...', encoding='utf-8'))

downloader_coroutine = downloader_obj.run()
loop = asyncio.get_event_loop()
done, not_done = loop.run_until_complete(asyncio.wait([downloader_coroutine]))
for task in done:
    try:
        result = task.result()  # This is a DownloadResult
    except aiohttp.ClientError:
        # An error occurred.
        ...
~~~

question: How can the connect timeout be set in aiohttp?
67 | |||
68 | 1 | jortel@redhat.com | ----- |
69 | |||
70 | 9 | jortel@redhat.com | As an importer, I can leverage all settings supported by underlying protocol specific client lib. |
71 | |||
72 | **jupiter**: |
||
73 | |||
74 | 1 | jortel@redhat.com | Commonly used settings supported by abstraction. Additional settings could be supported by subclassing. |
~~~python
class SpecialDownload(HttpDownload):

    def _settings(self):
        settings = super()._settings()
        settings['special'] = <special value>
        return settings
~~~

**saturn**:

The underlying client lib arguments are directly exposed.

-----
As an importer, I can create an Artifact with a downloaded file using the size and digests calculated during the download.

**jupiter**:

Using the optional *DownloadMonitor* to collect statistics such as size and to calculate digests.

~~~python
download = HttpDownload(...)
monitor = DownloadMonitor(download)
...  # perform download.
artifact = Artifact(**monitor.facts())
artifact.save()
~~~
**saturn**:

The *size* and all *digests* are always calculated.

~~~python
downloader_obj = HttpDownloader(...)
...  # perform download.
result = task.result()
artifact = Artifact(**result.artifact_attributes)
artifact.save()
~~~
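Neither snippet shows how the size and digests are actually computed. As a rough, stdlib-only sketch of the per-chunk bookkeeping behind *DownloadMonitor* / *artifact_attributes* (the `DigestMonitor` name and its methods are illustrative, not the pulp3 API):

```python
import hashlib

class DigestMonitor:
    """Illustrative stand-in for the per-chunk accounting both designs do."""

    def __init__(self, algorithms=('sha256',)):
        self.size = 0
        self.digests = {name: hashlib.new(name) for name in algorithms}

    def update(self, chunk):
        # Called once per downloaded chunk, before it is written.
        self.size += len(chunk)
        for digest in self.digests.values():
            digest.update(chunk)

    def facts(self):
        # Keyword arguments suitable for Artifact(**facts).
        facts = {name: d.hexdigest() for name, d in self.digests.items()}
        facts['size'] = self.size
        return facts

monitor = DigestMonitor()
for chunk in (b'hello ', b'world'):
    monitor.update(chunk)

facts = monitor.facts()
```

Either way, the point is that digests are accumulated incrementally during the transfer, so the file never needs to be re-read to build the Artifact.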
119 | |||
120 | 11 | jortel@redhat.com | ----- |
121 | |||
122 | 1 | jortel@redhat.com | As an importer, I need to download files concurrently. |
123 | |||
124 | 11 | jortel@redhat.com | **jupiter**: |
125 | |||
126 | Using the *Batch* to run the downloads concurrently. Only 3 downloads in memory at once. |
||
127 | |||
128 | 15 | jortel@redhat.com | ~~~python |
129 | |||
130 | 11 | jortel@redhat.com | downloads = (HttpDownload(...) for _ in range(10)) |
131 | |||
132 | with Batch(downloads, backlog=3) as batch: |
||
133 | for plan in batch(): |
||
134 | try: |
||
135 | plan.result() |
||
136 | except DownloadError: |
||
137 | # An error occurred. |
||
138 | else: |
||
139 | 1 | jortel@redhat.com | # Use the downloaded file \o/ |
140 | ~~~ |
||
**saturn**:

Using the asyncio run loop. This example does not restrict the number of downloads in memory at once.

~~~python
downloaders = (HttpDownloader(...) for _ in range(10))

loop = asyncio.get_event_loop()
done, not_done = loop.run_until_complete(asyncio.wait([d.run() for d in downloaders]))
for task in done:
    try:
        result = task.result()  # This is a DownloadResult
    except aiohttp.ClientError:
        # An error occurred.
        ...
~~~
158 | |||
159 | 1 | jortel@redhat.com | ----- |
160 | |||
161 | 16 | jortel@redhat.com | As an importer, I want to validate downloaded files. |
162 | |||
163 | 1 | jortel@redhat.com | **jupiter**: |
164 | |||
165 | 17 | jortel@redhat.com | Supported by adding provided or custom validations to the download. A validation error raises *ValidationError* which IsA *DownloadError*. |
166 | |||
167 | 16 | jortel@redhat.com | ~~~python |
168 | |||
169 | download = HttpDownload(...) |
||
170 | download.append(DigestValidation('sha256', '0x1234')) |
||
171 | |||
172 | try: |
||
173 | download() |
||
174 | except DownloadError: |
||
175 | # An error occurred. |
||
176 | ~~~ |
||
177 | |||
178 | **saturn**: |
||
179 | |||
180 | 17 | jortel@redhat.com | Supported by passing the *expected_digests* dictionary and catching *DigestValidationError*. |
181 | 16 | jortel@redhat.com | |
182 | ~~~python |
||
183 | |||
184 | downloader_obj = HttpDownloader(..., expected_digests={'sha256': '0x1234'}) |
||
185 | |||
186 | downloader_coroutine = downloader_obj.run() |
||
187 | loop = asyncio._get_running_loop() |
||
188 | done, not_done = loop.run_until_complete(asyncio.wait([downloader_coroutine])) |
||
189 | for task in done: |
||
190 | try: |
||
191 | result = task.result() # This is a DownloadResult |
||
192 | except (aiohttp.ClientError, DigestValidationError): |
||
193 | # An error occurred. |
||
194 | ~~~ |
||
195 | |||
196 | ----- |
||
197 | |||
198 | 18 | jortel@redhat.com | As an importer, I am not required to keep all content (units) and artifacts in memory to support concurrent downloading. |
199 | |||
200 | **jupiter**: |
||
201 | |||
202 | ~~~python |
||
203 | ~~~ |
||
204 | |||
205 | **saturn**: |
||
206 | |||
207 | ~~~python |
||
208 | ~~~ |
||
209 | |||
210 | ----- |
||
211 | |||
212 | As an importer, I need a way to link a downloaded file to an artifact without keeping all content units and artifacts in memory. |
||
213 | |||
214 | **jupiter**: |
||
215 | |||
216 | 19 | jortel@redhat.com | Using the *Batch* to run the downloads concurrently. Only 3 downloads in memory at once. |
217 | |||
218 | 18 | jortel@redhat.com | ~~~python |
219 | 19 | jortel@redhat.com | |
220 | downloads = (HttpDownload(...) for _ in range(10)) |
||
221 | |||
222 | with Batch(downloads, backlog=3) as batch: |
||
223 | for plan in batch(): |
||
224 | try: |
||
225 | plan.result() |
||
226 | except DownloadError: |
||
227 | # An error occurred. |
||
228 | else: |
||
229 | # Use the downloaded file \o/ |
||
230 | 18 | jortel@redhat.com | ~~~ |
231 | |||
232 | **saturn**: |
||
233 | |||
234 | 19 | jortel@redhat.com | Using the GroupDownloader? |
235 | |||
236 | 18 | jortel@redhat.com | ~~~python |
237 | ~~~ |
||
238 | |||
239 | ----- |
||
240 | |||
241 | As an importer, I can perform concurrent downloading using a synchronous pattern. |
||
242 | |||
243 | 1 | jortel@redhat.com | **jupiter**: |
244 | 18 | jortel@redhat.com | |
245 | 19 | jortel@redhat.com | Using the *Batch*. See other examples. |
246 | 18 | jortel@redhat.com | |
247 | **saturn**: |
||
248 | |||
249 | 19 | jortel@redhat.com | Using either the *GroupDownloader* or asyncio loop directly. See other examples. |
250 | 18 | jortel@redhat.com | |
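For saturn, the synchronous pattern amounts to driving the coroutines to completion with a blocking loop call. A minimal stdlib sketch (the `fake_download` coroutine is a stand-in for the real `HttpDownloader.run()`):

```python
import asyncio

async def fake_download(url):
    # Stand-in for HttpDownloader.run(); yields control like real I/O would.
    await asyncio.sleep(0)
    return 'downloaded: ' + url

urls = ['http://example.org/a', 'http://example.org/b']

# Synchronous call site: run_until_complete blocks until every task finishes.
loop = asyncio.new_event_loop()
tasks = [loop.create_task(fake_download(u)) for u in urls]
done, not_done = loop.run_until_complete(asyncio.wait(tasks))
loop.close()

results = sorted(task.result() for task in done)
```

The caller never writes `await`; all the asynchrony is contained inside the single `run_until_complete()` call.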
-----

As an importer, concurrent downloads must share resources such as sessions, connection pools, and auth tokens across individual downloads.

**jupiter**:

The *Download.context* is designed to support this. The *shared* context can be used to share anything, including python-requests sessions (using a Cache), auth tokens, and resolved mirror lists. The sharing is done through collaboration. When it's appropriate for individual downloads to share things, an external actor such as the *Batch* or the *Streamer* will ensure that all of the download objects have the same context.

**saturn**:

Each downloader could define a class attribute. This global can be used to share anything, including python-requests sessions (using a Cache), auth tokens, and resolved mirror lists. The sharing is done through collaboration. Sharing is global and unconditional.
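A sketch of the saturn class-attribute idea, using a stand-in session object (in the real design the shared object would be an `aiohttp.ClientSession`; `FakeSession` and `fetch` are invented for illustration):

```python
class FakeSession:
    """Stand-in for a shared aiohttp.ClientSession / requests.Session."""

    def __init__(self):
        self.requests = 0

class HttpDownloader:
    # One session shared, globally and unconditionally, by every instance.
    session = FakeSession()

    def __init__(self, url):
        self.url = url

    def fetch(self):
        # Every downloader hits the same connection pool / auth-token cache.
        HttpDownloader.session.requests += 1
        return self.url

downloaders = [HttpDownloader('http://example.org/%d' % n) for n in range(3)]
for d in downloaders:
    d.fetch()
```

This is the trade-off the prose describes: no collaboration protocol is needed, but the sharing cannot be scoped to one sync run or one importer.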
-----

As an importer, I can customize how downloading is performed. For example, to support mirror lists.

**jupiter**:

~~~python
~~~

**saturn**:

~~~python
~~~
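Neither example is filled in yet. One hedged sketch of the customization seam: a download object that iterates a mirror list, trying each base URL until one succeeds. Every name here (`MirrorListDownload`, `_fetch`) is illustrative; the real hook would live on `HttpDownload` (jupiter) or `HttpDownloader` (saturn):

```python
class DownloadError(Exception):
    pass

class MirrorListDownload:
    """Illustrative only: tries each mirror in order until one works."""

    def __init__(self, mirrors, path):
        self.mirrors = mirrors
        self.path = path

    def _fetch(self, url):
        # Stand-in for the real single-URL download; fails for 'bad' hosts.
        if 'bad' in url:
            raise DownloadError(url)
        return url

    def __call__(self):
        last_error = None
        for base in self.mirrors:
            try:
                return self._fetch(base + self.path)
            except DownloadError as error:
                last_error = error  # try the next mirror
        raise last_error

download = MirrorListDownload(
    ['http://bad.example.org/', 'http://good.example.org/'], 'pkg.rpm')
result = download()
```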
277 | |||
278 | ----- |
||
279 | |||
280 | As an importer, concurrent downloading must limit the number of simultaneous connections. Downloading 5k artifacts cannot open 5k connections. |
||
281 | |||
282 | **jupiter**: |
||
283 | |||
284 | ~~~python |
||
285 | ~~~ |
||
286 | |||
287 | **saturn**: |
||
288 | |||
289 | ~~~python |
||
290 | ~~~ |
||
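Both blocks are empty in this revision. For saturn, `aiohttp.TCPConnector(limit=...)` on the shared session would be the natural place to cap connections; the generic mechanism can be sketched with `asyncio.Semaphore` and a stand-in download coroutine (all names below are illustrative):

```python
import asyncio

MAX_CONNECTIONS = 3
peak = 0    # highest number of concurrently "open connections" observed
active = 0

async def fake_download(url, semaphore):
    # The semaphore caps how many downloads hold a connection at once.
    global peak, active
    async with semaphore:
        active += 1
        peak = max(peak, active)
        await asyncio.sleep(0)  # stand-in for network I/O
        active -= 1
    return url

async def main():
    semaphore = asyncio.Semaphore(MAX_CONNECTIONS)
    urls = ['http://example.org/%d' % n for n in range(10)]
    return await asyncio.gather(*[fake_download(u, semaphore) for u in urls])

results = asyncio.run(main())
```

Ten downloads are scheduled, but at most three ever hold a connection simultaneously.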
291 | |||
292 | ----- |
||
293 | |||
294 | As an importer, I can terminate concurrent downlading at any point and not leak resources. |
||
295 | |||
296 | **jupiter**: |
||
297 | |||
298 | ~~~python |
||
299 | ~~~ |
||
300 | |||
301 | **saturn**: |
||
302 | |||
303 | ~~~python |
||
304 | ~~~ |
||
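For saturn, termination maps onto asyncio task cancellation. A sketch with stand-in coroutines, showing that cancelled downloads still run their cleanup (so sessions and file handles can be closed rather than leaked):

```python
import asyncio

cleaned_up = []

async def fake_download(url):
    try:
        await asyncio.sleep(60)  # stand-in for a long transfer
        return url
    finally:
        # Release per-download resources even when cancelled.
        cleaned_up.append(url)

async def main():
    tasks = [asyncio.ensure_future(fake_download('http://example.org/%d' % n))
             for n in range(3)]
    await asyncio.sleep(0)   # let the downloads start
    for task in tasks:       # terminate everything mid-flight
        task.cancel()
    # Wait for cancellation (and cleanup) to actually complete.
    await asyncio.gather(*tasks, return_exceptions=True)

asyncio.run(main())
```

`CancelledError` is raised inside each coroutine at its `await` point, so `finally` blocks (and `async with` exits) run before the task is discarded.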
305 | |||
306 | ----- |
||
307 | |||
308 | 1 | jortel@redhat.com | As an importer, I can download using any protocol. Starting with HTTP/HTTPS and FTP. |
309 | |||
310 | 18 | jortel@redhat.com | **jupiter**: |
311 | |||
312 | ~~~python |
||
313 | ~~~ |
||
314 | |||
315 | **saturn**: |
||
316 | |||
317 | ~~~python |
||
318 | ~~~ |
||
319 | |||
320 | ----- |
||
321 | |||
322 | 1 | jortel@redhat.com | ### Streamer |
323 | |||
324 | 18 | jortel@redhat.com | As the streamer, I need to download files related to published artifacts and metadata but delegate *the implementation* (protocol, settings, credentials) to the importer. The implementation must be a black-box. |
325 | 1 | jortel@redhat.com | |
326 | 18 | jortel@redhat.com | **jupiter**: |
327 | |||
328 | ~~~python |
||
329 | ~~~ |
||
330 | |||
331 | **saturn**: |
||
332 | |||
333 | ~~~python |
||
334 | ~~~ |
||
335 | |||
336 | 1 | jortel@redhat.com | ----- |
337 | |||
338 | 18 | jortel@redhat.com | As the streamer, I can download using any protocol supported by the importer. |
339 | |||
340 | **jupiter**: |
||
341 | |||
342 | ~~~python |
||
343 | 16 | jortel@redhat.com | ~~~ |
344 | 18 | jortel@redhat.com | |
345 | **saturn**: |
||
346 | |||
347 | ~~~python |
||
348 | 1 | jortel@redhat.com | ~~~ |
349 | |||
350 | 18 | jortel@redhat.com | ----- |
351 | |||
352 | As the streamer, I want to validate downloaded files. |
||
353 | |||
354 | 1 | jortel@redhat.com | **jupiter**: |
355 | |||
356 | ~~~python |
||
357 | ~~~ |
||
358 | |||
359 | **saturn**: |
||
360 | |||
361 | ~~~python |
||
362 | ~~~ |
||
363 | 18 | jortel@redhat.com | |
364 | ----- |
||
365 | |||
366 | As the streamer, concurrent downloads must share resources such as sessions,connection pools and auth tokens across individual downloads without having knowledge of such things. |
||
367 | |||
368 | **jupiter**: |
||
369 | |||
370 | ~~~python |
||
371 | ~~~ |
||
372 | |||
373 | **saturn**: |
||
374 | |||
375 | ~~~python |
||
376 | ~~~ |
||
377 | |||
378 | ----- |
||
379 | |||
380 | As the streamer, I need to support complex downloading such as mirror lists. This complexity must be delegated to the importer. |
||
381 | |||
382 | **jupiter**: |
||
383 | |||
384 | ~~~python |
||
385 | ~~~ |
||
386 | |||
387 | **saturn**: |
||
388 | |||
389 | ~~~python |
||
390 | ~~~ |
||
391 | |||
392 | ----- |
||
393 | |||
394 | As the streamer, I need to bridge the downloaded bit stream to the Twisted response. The file is not written to disk. |
||
395 | |||
396 | **jupiter**: |
||
397 | |||
398 | ~~~python |
||
399 | ~~~ |
||
400 | |||
401 | **saturn**: |
||
402 | |||
403 | ~~~python |
||
404 | ~~~ |
||
405 | |||
406 | ----- |
||
407 | |||
408 | As the streamer, I need to forward HTTP headers from the download response to the twisted response. |
||
409 | |||
410 | **jupiter**: |
||
411 | |||
412 | ~~~python |
||
413 | ~~~ |
||
414 | |||
415 | **saturn**: |
||
416 | |||
417 | ~~~python |
||
418 | ~~~ |
||
419 | |||
420 | ----- |
||
421 | |||
422 | As the streamer, I can download using (the same) custom logic as the importer such as supporting mirror lists |
||
423 | |||
424 | **jupiter**: |
||
425 | |||
426 | ~~~python |
||
427 | ~~~ |
||
428 | |||
429 | **saturn**: |
||
430 | |||
431 | ~~~python |
||
432 | ~~~ |
||
433 | |||
434 | ----- |