# Downloading

In pulp3, there are two competing technologies and designs being considered. For the purposes of this discussion we'll name them **Jupiter** and **Saturn**. The *Jupiter* solution is based on *concurrent.futures* and the *Saturn* solution is based on *asyncio*. In addition to the underlying technology difference, the solutions meet the requirements in different ways. The *Jupiter* solution includes more classes, provides more abstraction, and supports extension through object composition. The *Saturn* solution meets the requirements with the fewest classes possible and minimal abstraction.

The three actors for our use cases are the *Importer*, the *Streamer*, and the Plugin Writer. The *ChangeSet* shares a subset of the Streamer requirements but is not included in this discussion.

## Use Cases

### Importer

As an importer, I need to download single files.

**jupiter**:

~~~python
download = HttpDownload(
    url=url,
    writer=FileWriter(path),
    timeout=Timeout(connect=10, read=15),
    user=User(name='elmer', password='...'),
    ssl=SSL(ca_certificate='path-to-certificate',
            client_certificate='path-to-certificate',
            client_key='path-to-key',
            validation=True),
    proxy_url='http://user:password@gateway.org')

try:
    download()
except DownloadError:
    # An error occurred.
    ...
else:
    # Go read the downloaded file \o/
    ...
~~~

**saturn**:

~~~python
import asyncio
import ssl

import aiohttp

# Build an SSL context with the CA and the client certificate/key.
ssl_context = ssl.create_default_context(cafile='path-to-CA_certificate')
ssl_context.load_cert_chain('path-to-CLIENT_certificate', 'path-to-CLIENT_key')

connector = aiohttp.TCPConnector(verify_ssl=True, ssl_context=ssl_context)

session = aiohttp.ClientSession(
    connector=connector,
    read_timeout=15,
    auth=aiohttp.BasicAuth('elmer', password='...', encoding='utf-8'))

downloader_obj = HttpDownloader(
    session,
    url,
    proxy='http://gateway.org',
    proxy_auth=aiohttp.BasicAuth('elmer', password='...', encoding='utf-8'))

downloader_coroutine = downloader_obj.run()
loop = asyncio.get_event_loop()
done, not_done = loop.run_until_complete(asyncio.wait([downloader_coroutine]))
for task in done:
    try:
        result = task.result()  # This is a DownloadResult
    except aiohttp.ClientError:
        # An error occurred.
        ...
~~~

question: How can the connect timeout be set in aiohttp?

-----

As an importer, I can leverage all settings supported by the underlying protocol-specific client lib.

**jupiter**:

Commonly used settings are supported by the abstraction. Additional settings could be supported by subclassing.

~~~python
class SpecialDownload(HttpDownload):

    def _settings(self):
        settings = super()._settings()
        settings['special'] = <special value>
        return settings
~~~

**saturn**:

The underlying client lib's arguments are directly exposed.
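
For example (a sketch only, assuming *HttpDownloader* accepts a preconfigured *aiohttp.ClientSession* as shown earlier), any native aiohttp option is simply supplied when building the session:

~~~python
import aiohttp

# Sketch: native aiohttp options (headers, cookies, timeouts, connector tuning, ...)
# are handed straight to the client lib; there is no abstraction layer in between.
session = aiohttp.ClientSession(
    headers={'X-Custom-Header': 'value'},
    cookies={'session-id': '...'},
    read_timeout=15)

downloader_obj = HttpDownloader(session, url)
~~~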

-----

As an importer, I can create an Artifact with a downloaded file using the size and digests calculated during the download.

**jupiter**:

Use the optional *DownloadMonitor* to collect statistics such as size and to calculate digests.

~~~python
download = HttpDownload(...)
monitor = DownloadMonitor(download)
...  # perform the download.
artifact = Artifact(**monitor.facts())
artifact.save()
~~~

**saturn**:

The *size* and all *digests* are always calculated.

~~~python
downloader_obj = HttpDownloader(...)
...  # perform the download.
result = task.result()  # This is a DownloadResult
artifact = Artifact(**result.artifact_attributes)
artifact.save()
~~~

-----

As an importer, I need to download files concurrently.

**jupiter**:

Use the *Batch* to run the downloads concurrently. Only 3 downloads are in memory at once.

~~~python
downloads = (HttpDownload(...) for _ in range(10))

with Batch(downloads, backlog=3) as batch:
    for plan in batch():
        try:
            plan.result()
        except DownloadError:
            # An error occurred.
            ...
        else:
            # Use the downloaded file \o/
            ...
~~~

**saturn**:

Use the asyncio event loop. This example does not restrict the number of downloads in memory at once.

~~~python
downloads = (HttpDownloader(...) for _ in range(10))

loop = asyncio.get_event_loop()
done, not_done = loop.run_until_complete(asyncio.wait([d.run() for d in downloads]))
for task in done:
    try:
        result = task.result()  # This is a DownloadResult
    except aiohttp.ClientError:
        # An error occurred.
        ...
~~~

-----

As an importer, I want to validate downloaded files.
As an importer, I am not required to keep all content (units) and artifacts in memory to support concurrent downloading.
As an importer, I need a way to link a downloaded file to an artifact without keeping all content units and artifacts in memory.
As an importer, I can perform concurrent downloading using a synchronous pattern.
As an importer, concurrent downloads must share resources such as sessions, connection pools and auth tokens across individual downloads.
As an importer, I can customize how downloading is performed. For example, to support mirror lists.
As an importer, concurrent downloading must limit the number of simultaneous connections. Downloading 5k artifacts cannot open 5k connections (see the sketch after this list).
As an importer, I can terminate concurrent downloading at any point and not leak resources.
As an importer, I can download using any protocol. Starting with HTTP/HTTPS and FTP.
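
Below is a minimal sketch of how the connection limit could be met in the *Saturn* design; it assumes the downloads share one *aiohttp.ClientSession* (as above) and uses the connector's `limit` argument to cap simultaneous connections. The `urls` iterable is hypothetical.

~~~python
import aiohttp

# Sketch: cap the shared connection pool at 20 simultaneous connections so that
# downloading 5k artifacts re-uses at most 20 connections.
connector = aiohttp.TCPConnector(limit=20)
session = aiohttp.ClientSession(connector=connector)

downloads = [HttpDownloader(session, url) for url in urls]  # urls: hypothetical
~~~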

### Streamer

As the streamer, I need to download files related to published artifacts and metadata but delegate *the implementation* (protocol, settings, credentials) to the importer. The implementation must be a black box.
As the streamer, I can download using any protocol supported by the importer.
As the streamer, I want to validate downloaded files.
As the streamer, concurrent downloads must share resources such as sessions, connection pools and auth tokens across individual downloads without having knowledge of such things.
As the streamer, I need to support complex downloading such as mirror lists. This complexity must be delegated to the importer.
As the streamer, I need to bridge the downloaded bit stream to the Twisted response. The file is not written to disk (see the sketch after this list).
As the streamer, I need to forward HTTP headers from the download response to the Twisted response.
As the streamer, I can download using (the same) custom logic as the importer, such as supporting mirror lists.
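
A rough sketch of how the bit-stream bridging might look under the *Jupiter* writer abstraction; *Writer*, *ResponseWriter*, `append()`, and `importer.get_downloader()` are hypothetical names, only *FileWriter* appears earlier on this page. The streamer would swap the importer's file writer for one that forwards each chunk to the Twisted request instead of writing it to disk.

~~~python
class ResponseWriter(Writer):  # hypothetical base class from the Jupiter design
    """Forward downloaded chunks to a twisted.web request instead of a file."""

    def __init__(self, request):
        self.request = request

    def append(self, chunk):
        # Hypothetical hook invoked for each downloaded chunk; headers from the
        # download response could be forwarded to the Twisted response here too.
        self.request.write(chunk)


# The importer provides a fully configured download (a black box); the streamer
# only replaces where the bits go.
download = importer.get_downloader(url)  # hypothetical factory on the importer
download.writer = ResponseWriter(request)
download()
~~~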