# Downloading

In pulp3, there are two competing technologies and designs being considered. For the purposes of this discussion we'll name them **Jupiter** and **Saturn**. The *Jupiter* solution is based on *concurrent.futures* and the *Saturn* solution is based on *asyncio*. In addition to the underlying technology difference, the solutions meet the requirements in different ways. The *Jupiter* solution includes more classes, provides more abstraction, and supports extension through object composition. The *Saturn* solution meets the requirements with the fewest classes possible and minimal abstraction.

The three actors for our use cases are the *Importer*, the *Streamer*, and the Plugin Writer. The *ChangeSet* shares a subset of the Streamer requirements but is not included in this discussion.

## Use Cases

### Importer

As an importer, I need to download single files.

**jupiter**:

~~~
download = HttpDownload(
    url=url,
    writer=FileWriter(path),
    timeout=Timeout(connect=10, read=15),
    user=User(name='elmer', password='...'),
    ssl=SSL(ca_certificate='path-to-certificate',
            client_certificate='path-to-certificate',
            client_key='path-to-key',
            validation=True),
    proxy_url='http://user:password@gateway.org')

try:
    download()
except DownloadError:
    # An error occurred.
else:
    # Go read the downloaded file \o/
~~~

**saturn**:

~~~
# Build an SSL context that validates against the CA and presents the client cert/key.
ssl_context = ssl.create_default_context(cafile='path-to-CA_certificate')
ssl_context.load_cert_chain('path-to-CLIENT_certificate', 'path-to-CLIENT_key')

connector = aiohttp.TCPConnector(verify_ssl=True, ssl_context=ssl_context)

session = aiohttp.ClientSession(
    connector=connector,
    read_timeout=15,
    auth=aiohttp.BasicAuth('elmer', password='...', encoding='utf-8'))

downloader_obj = HttpDownloader(
    session,
    url,
    proxy='http://gateway.org',
    proxy_auth=aiohttp.BasicAuth('elmer', password='...', encoding='utf-8'))

downloader_coroutine = downloader_obj.run()
loop = asyncio.get_event_loop()
done, not_done = loop.run_until_complete(asyncio.wait([downloader_coroutine]))
for task in done:
    try:
        result = task.result()  # This is a DownloadResult
    except aiohttp.ClientError:
        # An error occurred.
~~~
65 | |||
66 | 6 | jortel@redhat.com | question: How can the connect timeout be set in aiohttp? |
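
One possibility (a sketch only, not verified against the aiohttp version targeted here): newer aiohttp releases group per-session timeouts, including the connect timeout, into an `aiohttp.ClientTimeout` object.

~~~
# Assumes aiohttp 3.3+, which provides ClientTimeout; older releases exposed
# conn_timeout/read_timeout keyword arguments on ClientSession instead.
timeout = aiohttp.ClientTimeout(connect=10, sock_read=15)
session = aiohttp.ClientSession(connector=connector, timeout=timeout)
~~~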
67 | |||
68 | 1 | jortel@redhat.com | ----- |
69 | |||
As an importer, I can leverage all settings supported by the underlying protocol-specific client lib.

**jupiter**:

Commonly used settings are supported by the abstraction. Additional settings can be supported by subclassing.

~~~
class SpecialDownload(HttpDownload):

    def _settings(self):
        settings = super()._settings()
        settings['special'] = <special value>
        return settings
~~~

**saturn**:

The underlying client lib arguments are exposed directly.
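
For example (a minimal sketch; the HttpDownloader call mirrors the earlier saturn examples, and the specific keyword arguments shown are only illustrations), any setting accepted by aiohttp is available when the caller builds the session:

~~~
# The caller owns the aiohttp.ClientSession, so any of its settings
# (headers, cookies, timeouts, connector options, ...) can be used
# without a wrapper class.
session = aiohttp.ClientSession(
    headers={'User-Agent': 'pulp'},
    read_timeout=15)
downloader_obj = HttpDownloader(session, url)
~~~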

-----

As an importer, I can create an Artifact with a downloaded file using the size and digests calculated during the download.

**jupiter**:

Using the optional *DownloadMonitor* to collect statistics such as size and to calculate digests.

~~~
download = HttpDownload(...)
facts = DownloadMonitor(download)
...  # perform download.
artifact = Artifact(**facts.dict())
artifact.save()
~~~

**saturn**:

The *size* and all *digests* are always calculated.

~~~
downloader_obj = HttpDownloader(...)
...  # perform download.
result = task.result()  # This is a DownloadResult
artifact = Artifact(**result.artifact_attributes)
artifact.save()
~~~

-----

As an importer, I need to download files concurrently.

**jupiter**:

Using the *Batch* to run the downloads concurrently. Only 3 downloads are in memory at once.

~~~
downloads = (HttpDownload(...) for _ in range(10))

with Batch(downloads, backlog=3) as batch:
    for plan in batch():
        try:
            plan.result()
        except DownloadError:
            # An error occurred.
        else:
            # Use the downloaded file \o/
~~~

**saturn**:

Using the asyncio run loop. This example does not restrict the number of downloads in memory at once (see the bounded sketch below).

~~~
downloads = (HttpDownloader(...) for _ in range(10))

loop = asyncio.get_event_loop()
done, not_done = loop.run_until_complete(asyncio.wait([d.run() for d in downloads]))
for task in done:
    try:
        result = task.result()  # This is a DownloadResult
    except aiohttp.ClientError:
        # An error occurred.
~~~
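
If the number of in-flight downloads needs to be bounded (comparable to the jupiter *backlog*), one possible sketch is to wrap each download in an `asyncio.Semaphore`; the wrapper coroutine below is an illustration, not part of either proposal:

~~~
semaphore = asyncio.Semaphore(3)

async def bounded(download):
    # At most 3 downloads run (and hold their files in memory) at once.
    async with semaphore:
        return await download.run()

downloads = [HttpDownloader(...) for _ in range(10)]
loop = asyncio.get_event_loop()
done, not_done = loop.run_until_complete(asyncio.wait([bounded(d) for d in downloads]))
~~~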

-----

As an importer, I want to validate downloaded files.
As an importer, I am not required to keep all content (units) and artifacts in memory to support concurrent downloading.
As an importer, I need a way to link a downloaded file to an artifact without keeping all content units and artifacts in memory.
As an importer, I can perform concurrent downloading using a synchronous pattern.
As an importer, concurrent downloads must share resources such as sessions, connection pools, and auth tokens across individual downloads.
As an importer, I can customize how downloading is performed. For example, to support mirror lists.
As an importer, concurrent downloading must limit the number of simultaneous connections. Downloading 5k artifacts cannot open 5k connections (see the sketch after this list).
As an importer, I can terminate concurrent downloading at any point and not leak resources.
As an importer, I can download using any protocol. Starting with HTTP/HTTPS and FTP.
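
On the saturn side, one way the connection-limit requirement could be met (a sketch; `limit` is a real aiohttp.TCPConnector argument, while the HttpDownloader usage and the `urls` list only mirror the earlier examples) is to share one session whose connector caps the pool size:

~~~
# A single shared connector bounds the number of simultaneous connections,
# so queuing 5k downloads does not open 5k sockets.
connector = aiohttp.TCPConnector(limit=10)
session = aiohttp.ClientSession(connector=connector)
downloaders = [HttpDownloader(session, url) for url in urls]
~~~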

### Streamer

As the streamer, I need to download files related to published artifacts and metadata but delegate *the implementation* (protocol, settings, credentials) to the importer. The implementation must be a black box.
As the streamer, I can download using any protocol supported by the importer.
As the streamer, I want to validate downloaded files.
As the streamer, concurrent downloads must share resources such as sessions, connection pools, and auth tokens across individual downloads without having knowledge of such things.
As the streamer, I need to support complex downloading such as mirror lists. This complexity must be delegated to the importer.
As the streamer, I need to bridge the downloaded bit stream to the Twisted response. The file is not written to disk.
As the streamer, I need to forward HTTP headers from the download response to the Twisted response.
As the streamer, I can download using (the same) custom logic as the importer, such as supporting mirror lists.