# Downloading

In pulp3, there are two competing technologies and designs being considered. For the purposes of this discussion we'll name them **Jupiter** and **Saturn**. The *Jupiter* solution is based on *concurrent.futures* and the *Saturn* solution is based on *asyncio*. In addition to the underlying technology difference, the solutions meet the requirements in different ways. The *Jupiter* solution includes more classes, provides more abstraction, and supports extension through object composition. The *Saturn* solution meets the requirements with the fewest classes possible and minimal abstraction.

The three actors for our use cases are the *Importer*, the *Streamer*, and the Plugin Writer. The *ChangeSet* shares a subset of the Streamer requirements but is not included in this discussion.

## Use Cases

### Importer

As an importer, I need to download single files.

jupiter

~~~
download = HttpDownload(
    url=url,
    writer=FileWriter(path),
    timeout=Timeout(connect=10, read=15),
    user=User(name='elmer', password='...'),
    ssl=SSL(ca_certificate='path-to-certificate',
            client_certificate='path-to-certificate',
            client_key='path-to-key',
            validation=True),
    proxy_url='http://user:password@gateway.org')

try:
    download()
except DownloadError:
    # An error occurred.
    ...
else:
    # Go read the downloaded file \o/
    ...
~~~

saturn

~~~
import asyncio
import ssl

import aiohttp

ssl_context = ssl.create_default_context(cafile='path-to-CA_certificate')
ssl_context.load_cert_chain('path-to-CLIENT_certificate', 'path-to-CLIENT_key')

connector = aiohttp.TCPConnector(verify_ssl=True, ssl_context=ssl_context)

session = aiohttp.ClientSession(
    connector=connector,
    read_timeout=15,
    auth=aiohttp.BasicAuth('elmer', password='...', encoding='utf-8'))

downloader_obj = HttpDownloader(
    session,
    url,
    proxy='http://gateway.org',
    proxy_auth=aiohttp.BasicAuth('elmer', password='...', encoding='utf-8'))

downloader_coroutine = downloader_obj.run()
loop = asyncio.get_event_loop()
done, not_done = loop.run_until_complete(asyncio.wait([downloader_coroutine]))
for task in done:
    try:
        result = task.result()  # This is a DownloadResult
    except aiohttp.ClientError:
        # An error occurred.
        ...
~~~

question: How can the connect timeout be set in aiohttp?
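
One possible answer, depending on the aiohttp version in use: current releases group the timeouts into `aiohttp.ClientTimeout`, which is passed to the session, while 2.x-era releases instead accepted a separate `conn_timeout` keyword on `ClientSession`. A minimal sketch of the `ClientTimeout` form:

~~~
import aiohttp

# connect bounds pool acquisition plus the TCP connect;
# sock_read bounds each read from the socket.
timeout = aiohttp.ClientTimeout(total=None, connect=10, sock_read=15)
session = aiohttp.ClientSession(timeout=timeout)
~~~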

-----

As an importer, I need to download files concurrently.
As an importer, I want to validate downloaded files.
As an importer, I am not required to keep all content (units) and artifacts in memory to support concurrent downloading.
As an importer, I need a way to link a downloaded file to an artifact without keeping all content units and artifacts in memory.
As an importer, I can perform concurrent downloading using a synchronous pattern (see the sketch after this list).
As an importer, concurrent downloads must share resources such as sessions, connection pools and auth tokens across individual downloads.
As an importer, I can customize how downloading is performed, for example, to support mirror lists.
As an importer, concurrent downloading must limit the number of simultaneous connections. Downloading 5k artifacts cannot open 5k connections.
As an importer, I can terminate concurrent downloading at any point and not leak resources.
As an importer, I can download using any protocol, starting with HTTP/HTTPS and FTP.
As an importer, I can create an Artifact with a downloaded file using the size and digests calculated during the download.
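
A minimal sketch of how the concurrency stories above might look with the *Saturn* (asyncio) approach. `HttpDownloader` is the downloader from the earlier example, and `urls` is assumed to be an iterable of download URLs; the shared session, the `TCPConnector(limit=...)` connection cap, and the blocking `run_until_complete()` driver are the points being illustrated, not a settled API.

~~~
import asyncio

import aiohttp

async def download_all(urls, limit=10):
    # One shared session and connection pool; 'limit' caps open connections.
    connector = aiohttp.TCPConnector(limit=limit)
    async with aiohttp.ClientSession(connector=connector) as session:
        downloads = [HttpDownloader(session, url).run() for url in urls]
        results = []
        # Consume results as they complete; only the results are accumulated.
        for coroutine in asyncio.as_completed(downloads):
            try:
                results.append(await coroutine)  # a DownloadResult
            except aiohttp.ClientError:
                pass  # record the failure and continue
        return results

# Synchronous pattern: the importer blocks until everything has finished.
loop = asyncio.get_event_loop()
results = loop.run_until_complete(download_all(urls, limit=10))
~~~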

### Streamer

As the streamer, I need to download files related to published artifacts and metadata but delegate *the implementation* (protocol, settings, credentials) to the importer. The implementation must be a black box (see the sketch after this list).
As the streamer, I can download using any protocol supported by the importer.
As the streamer, I want to validate downloaded files.
As the streamer, concurrent downloads must share resources such as sessions, connection pools and auth tokens across individual downloads without having knowledge of such things.
As the streamer, I need to support complex downloading such as mirror lists. This complexity must be delegated to the importer.
As the streamer, I need to bridge the downloaded bit stream to the Twisted response. The file is not written to disk.
As the streamer, I need to forward HTTP headers from the download response to the Twisted response.
As the streamer, I can download using (the same) custom logic as the importer, such as supporting mirror lists.
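
A minimal sketch of the delegation idea behind these stories, using assumed names (`get_downloader()`, the `writer` attribute, and the writer methods are illustrative, not a settled API): the importer hands the streamer an opaque, fully configured download object, and the streamer only redirects the headers and bits to the Twisted response instead of a file.

~~~
class StreamWriter:
    """Hypothetical writer that relays a download to a Twisted request, not to disk."""

    def __init__(self, request):
        self.request = request

    def headers(self, headers):
        # Forward HTTP headers from the download response to the Twisted response.
        for name, value in headers.items():
            self.request.setHeader(name, value)

    def put(self, chunk):
        self.request.write(chunk)


def stream(importer, url, request):
    # Black box: the importer decides protocol, credentials, proxy and any
    # mirror-list handling; the streamer only says where the bits should go.
    download = importer.get_downloader(url)   # assumed factory on the importer
    download.writer = StreamWriter(request)
    download()
~~~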