Revision 17 (jortel@redhat.com, 08/30/2017 05:52 PM) → Revision 18/32 (jortel@redhat.com, 08/30/2017 05:57 PM)
# Downloading

In pulp3, there are two competing technologies and designs being considered. For the purposes of this discussion, we'll name them **Jupiter** and **Saturn**. The *Jupiter* solution is based on *concurrent.futures* and the *Saturn* solution is based on *asyncio*. Beyond the underlying technology difference, the solutions meet the requirements in different ways. The *Jupiter* solution includes more classes, provides more abstraction, and supports extension through object composition. The *Saturn* solution meets the requirements with the fewest classes possible and minimal abstraction.

The three actors for our use cases are the *Importer*, the *Streamer*, and the Plugin Writer. The *ChangeSet* shares a subset of the Streamer requirements but is not included in this discussion.

## Use Cases

### Importer

As an importer, I need to download single files.

**jupiter**:

~~~python
download = HttpDownload(
    url=url,
    writer=FileWriter(path),
    timeout=Timeout(connect=10, read=15),
    user=User(name='elmer', password='...'),
    ssl=SSL(ca_certificate='path-to-certificate',
            client_certificate='path-to-certificate',
            client_key='path-to-key',
            validation=True),
    proxy_url='http://user:password@gateway.org')

try:
    download()
except DownloadError:
    # An error occurred.
else:
    # Go read the downloaded file \o/
~~~

**saturn**:

~~~python
# Build the SSL context with the stdlib ssl module.
ssl_context = ssl.create_default_context(cafile='path-to-CA_certificate')
ssl_context.load_cert_chain('path-to-CLIENT_certificate', 'path-to-CLIENT_key')

connector = aiohttp.TCPConnector(verify_ssl=True, ssl_context=ssl_context)

session = aiohttp.ClientSession(
    connector=connector,
    read_timeout=15,
    auth=aiohttp.BasicAuth('elmer', password='...', encoding='utf-8'))

downloader_obj = HttpDownloader(
    session,
    url,
    proxy='http://gateway.org',
    proxy_auth=aiohttp.BasicAuth('elmer', password='...', encoding='utf-8'))

downloader_coroutine = downloader_obj.run()
loop = asyncio.get_event_loop()
done, not_done = loop.run_until_complete(asyncio.wait([downloader_coroutine]))
for task in done:
    try:
        result = task.result()  # This is a DownloadResult
    except aiohttp.ClientError:
        # An error occurred.
~~~

question: How can the connect timeout be set in aiohttp? (`ClientSession` also accepts a `conn_timeout` argument; newer aiohttp releases take `timeout=aiohttp.ClientTimeout(connect=...)` instead.)

-----

As an importer, I can leverage all settings supported by the underlying protocol-specific client lib.

**jupiter**: Commonly used settings are supported by the abstraction. Additional settings can be supported by subclassing.

~~~python
class SpecialDownload(HttpDownload):

    def _settings(self):
        settings = super()._settings()
        settings['special'] = <special value>
        return settings
~~~

**saturn**: The underlying client lib arguments are directly exposed.

-----

As an importer, I can create an Artifact with a downloaded file using the size and digests calculated during the download.

**jupiter**: Using the optional *DownloadMonitor* to collect statistics such as size and calculate digests.

~~~python
download = HttpDownload(..)
monitor = DownloadMonitor(download)
...  # perform download.
artifact = Artifact(**monitor.facts())
artifact.save()
~~~

**saturn**: The *size* and all *digests* are always calculated.

~~~python
downloader_obj = HttpDownloader(...)
...  # perform download.
result = task.result()
artifact = Artifact(**result.artifact_attributes)
artifact.save()
~~~

-----

As an importer, I need to download files concurrently.

**jupiter**: Using the *Batch* to run the downloads concurrently. Only 3 downloads are in memory at once.

~~~python
downloads = (HttpDownload(...) for _ in range(10))

with Batch(downloads, backlog=3) as batch:
    for plan in batch():
        try:
            plan.result()
        except DownloadError:
            # An error occurred.
        else:
            # Use the downloaded file \o/
~~~

**saturn**: Using the asyncio run loop. This example does not restrict the number of downloads in memory at once.

~~~python
downloaders = (HttpDownloader(...) for _ in range(10))

loop = asyncio.get_event_loop()
done, not_done = loop.run_until_complete(asyncio.wait([d.run() for d in downloaders]))
for task in done:
    try:
        result = task.result()  # This is a DownloadResult
    except aiohttp.ClientError:
        # An error occurred.
~~~

-----

As an importer, I want to validate downloaded files.

**jupiter**: Supported by adding provided or custom validations to the download. A validation error raises a *ValidationError*, which IsA *DownloadError*.

~~~python
download = HttpDownload(...)
download.append(DigestValidation('sha256', '0x1234'))

try:
    download()
except DownloadError:
    # An error occurred.
~~~

**saturn**: Supported by passing the *expected_digests* dictionary and catching *DigestValidationError*.

~~~python
downloader_obj = HttpDownloader(..., expected_digests={'sha256': '0x1234'})

downloader_coroutine = downloader_obj.run()
loop = asyncio.get_event_loop()
done, not_done = loop.run_until_complete(asyncio.wait([downloader_coroutine]))
for task in done:
    try:
        result = task.result()  # This is a DownloadResult
    except (aiohttp.ClientError, DigestValidationError):
        # An error occurred.
~~~

-----

As an importer, I am not required to keep all content (units) and artifacts in memory to support concurrent downloading.
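Neither design is committed to an example here yet; for reference, bounding how many downloads are in flight at once can be sketched with standard *asyncio* primitives. Everything below other than the *asyncio* calls themselves (the `download_one` coroutine and its fake payload) is hypothetical:

~~~python
import asyncio

async def download_one(semaphore, url):
    # Hypothetical downloader; the semaphore ensures only a bounded
    # number of downloads are active (and buffered) at any moment.
    async with semaphore:
        await asyncio.sleep(0)  # stand-in for the real network I/O
        return '{}: done'.format(url)

async def download_all(urls, backlog=3):
    # At most `backlog` coroutines run concurrently; the rest block
    # on the semaphore instead of piling up in memory.
    semaphore = asyncio.Semaphore(backlog)
    return await asyncio.gather(*[download_one(semaphore, u) for u in urls])

loop = asyncio.new_event_loop()
results = loop.run_until_complete(
    download_all(['url-{}'.format(n) for n in range(10)]))
loop.close()
~~~

The same semaphore would also cap the number of simultaneous connections, which relates to the 5k-artifacts use case below.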
**jupiter**:

~~~python
~~~

**saturn**:

~~~python
~~~

-----

As an importer, I need a way to link a downloaded file to an artifact without keeping all content units and artifacts in memory.

**jupiter**:

~~~python
~~~

**saturn**:

~~~python
~~~

-----

As an importer, I can perform concurrent downloading using a synchronous pattern.

**jupiter**:

~~~python
~~~

**saturn**:

~~~python
~~~

-----

As an importer, concurrent downloads must share resources such as sessions, connection pools and auth tokens across individual downloads.

**jupiter**:

~~~python
~~~

**saturn**:

~~~python
~~~

-----

As an importer, I can customize how downloading is performed. For example, to support mirror lists.

**jupiter**:

~~~python
~~~

**saturn**:

~~~python
~~~

-----

As an importer, concurrent downloading must limit the number of simultaneous connections. Downloading 5k artifacts cannot open 5k connections.

**jupiter**:

~~~python
~~~

**saturn**:

~~~python
~~~

-----

As an importer, I can terminate concurrent downloading at any point and not leak resources.

**jupiter**:

~~~python
~~~

**saturn**:

~~~python
~~~

-----

As an importer, I can download using any protocol, starting with HTTP/HTTPS and FTP.

**jupiter**:

~~~python
~~~

**saturn**:

~~~python
~~~

-----

### Streamer

As the streamer, I need to download files related to published artifacts and metadata but delegate *the implementation* (protocol, settings, credentials) to the importer. The implementation must be a black-box.

**jupiter**:

~~~python
~~~

**saturn**:

~~~python
~~~

-----

As the streamer, I can download using any protocol supported by the importer.

**jupiter**:

~~~python
~~~

**saturn**:

~~~python
~~~

-----

As the streamer, I want to validate downloaded files.

**jupiter**:

~~~python
~~~

**saturn**:

~~~python
~~~

-----

As the streamer, concurrent downloads must share resources such as sessions, connection pools and auth tokens across individual downloads without having knowledge of such things.
**jupiter**:

~~~python
~~~

**saturn**:

~~~python
~~~

-----

As the streamer, I need to support complex downloading such as mirror lists. This complexity must be delegated to the importer.

**jupiter**:

~~~python
~~~

**saturn**:

~~~python
~~~

-----

As the streamer, I need to bridge the downloaded bit stream to the Twisted response. The file is not written to disk.

**jupiter**:

~~~python
~~~

**saturn**:

~~~python
~~~

-----

As the streamer, I need to forward HTTP headers from the download response to the Twisted response.

**jupiter**:

~~~python
~~~

**saturn**:

~~~python
~~~

-----

As the streamer, I can download using (the same) custom logic as the importer, such as supporting mirror lists.

-----

~~~
Template: (removed after document is finished)
~~~

**jupiter**:

~~~python
~~~

**saturn**:

~~~python
~~~

-----
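As background for the Jupiter side, *concurrent.futures* already provides the synchronous driving pattern referenced in the importer use cases: a bounded worker pool runs downloads concurrently while the caller iterates results in ordinary blocking code. A minimal sketch, where `fetch` is a hypothetical stand-in for a blocking download:

~~~python
from concurrent.futures import ThreadPoolExecutor, as_completed

def fetch(url):
    # Hypothetical stand-in for a blocking download call.
    return '{}: done'.format(url)

urls = ['url-{}'.format(n) for n in range(10)]

# max_workers bounds both memory and simultaneous connections, while
# as_completed() yields each future synchronously as it finishes.
with ThreadPoolExecutor(max_workers=3) as pool:
    futures = [pool.submit(fetch, url) for url in urls]
    results = [future.result() for future in as_completed(futures)]
~~~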