
Revision 18/32 (jortel@redhat.com, 08/30/2017 05:57 PM)

# Downloading 

In pulp3, two competing technologies and designs are being considered. For the purposes of this discussion we'll name them **Jupiter** and **Saturn**. The *Jupiter* solution is based on *concurrent.futures*; the *Saturn* solution is based on *asyncio*. In addition to the underlying technology difference, the solutions meet the requirements in different ways: *Jupiter* includes more classes, provides more abstraction, and supports extension through object composition, while *Saturn* meets the requirements with the fewest classes possible and minimal abstraction.

The three actors for our use cases are the *Importer*, the *Streamer*, and the Plugin Writer. The *ChangeSet* shares a subset of the Streamer requirements but is not included in this discussion.

 ## Use Cases 

 ### Importer 

 As an importer, I need to download single files. 

 **jupiter**: 

~~~python
download = HttpDownload(
    url=url,
    writer=FileWriter(path),
    timeout=Timeout(connect=10, read=15),
    user=User(name='elmer', password='...'),
    ssl=SSL(ca_certificate='path-to-certificate',
            client_certificate='path-to-certificate',
            client_key='path-to-key',
            validation=True),
    proxy_url='http://user:password@gateway.org')

try:
    download()
except DownloadError:
    pass  # An error occurred.
else:
    pass  # Go read the downloaded file \o/
~~~

 **saturn**: 

~~~python
ssl_context = ssl.create_default_context(cafile='path-to-CA_certificate')
ssl_context.load_cert_chain('path-to-CLIENT_certificate', 'path-to-CLIENT_key')

connector = aiohttp.TCPConnector(verify_ssl=True, ssl_context=ssl_context)

session = aiohttp.ClientSession(
    connector=connector,
    read_timeout=15,
    auth=aiohttp.BasicAuth('elmer', password='...', encoding='utf-8'))

downloader_obj = HttpDownloader(
    session,
    url,
    proxy='http://gateway.org',
    proxy_auth=aiohttp.BasicAuth('elmer', password='...', encoding='utf-8'))

downloader_coroutine = downloader_obj.run()
loop = asyncio.get_event_loop()
done, not_done = loop.run_until_complete(asyncio.wait([downloader_coroutine]))
for task in done:
    try:
        result = task.result()    # This is a DownloadResult
    except aiohttp.ClientError:
        pass  # An error occurred.
~~~

 question: How can the connect timeout be set in aiohttp? 

 ----- 

As an importer, I can leverage all settings supported by the underlying protocol-specific client library.

 **jupiter**: 

Commonly used settings are supported by the abstraction. Additional settings can be supported by subclassing.

~~~python
class SpecialDownload(HttpDownload):

    def _settings(self):
        settings = super()._settings()
        settings['special'] = <special value>
        return settings
~~~

 **saturn**: 

The underlying client library's arguments are exposed directly.

 ----- 

 As an importer, I can create an Artifact with a downloaded file using the size and digests calculated during the download. 

 **jupiter**: 

Using the optional *DownloadMonitor* to collect statistics, such as size, and to calculate digests.

~~~python
download = HttpDownload(...)
monitor = DownloadMonitor(download)
...    # perform download.
artifact = Artifact(**monitor.facts())
artifact.save()
~~~

 **saturn**: 

The *size* and all *digests* are always calculated.

~~~python
downloader_obj = HttpDownloader(...)
...    # perform download.
result = task.result()
artifact = Artifact(**result.artifact_attributes)
artifact.save()
~~~
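For illustration, the size and digests in either design can be accumulated incrementally as chunks arrive; a minimal sketch of that pattern with *hashlib* (the class and attribute names are hypothetical, not the pulp API):

~~~python
import hashlib

class DigestCalculator:
    """Accumulate size and digests as downloaded chunks arrive."""

    def __init__(self, algorithms=('sha256', 'sha512')):
        self.size = 0
        self.hashers = {name: hashlib.new(name) for name in algorithms}

    def update(self, chunk):
        # Feed each chunk to every hasher and track the total size.
        self.size += len(chunk)
        for hasher in self.hashers.values():
            hasher.update(chunk)

    @property
    def artifact_attributes(self):
        attrs = {name: h.hexdigest() for name, h in self.hashers.items()}
        attrs['size'] = self.size
        return attrs

calc = DigestCalculator()
for chunk in (b'hello ', b'world'):
    calc.update(chunk)
print(calc.artifact_attributes['size'])   # 11
~~~

Because the hashers are fed chunk by chunk, the file never has to be re-read after the download to compute the artifact attributes.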

 ----- 

 As an importer, I need to download files concurrently. 

 **jupiter**: 

Using a *Batch* to run the downloads concurrently, with only 3 downloads in memory at once.

~~~python
downloads = (HttpDownload(...) for _ in range(10))

with Batch(downloads, backlog=3) as batch:
    for plan in batch():
        try:
            plan.result()
        except DownloadError:
            pass  # An error occurred.
        else:
            pass  # Use the downloaded file \o/
~~~
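The backlog idea can be sketched with the stdlib *concurrent.futures* primitives that Jupiter builds on (this illustrates the pattern only, not the *Batch* implementation):

~~~python
import threading
from concurrent.futures import ThreadPoolExecutor, as_completed

def run_bounded(tasks, backlog=3, workers=3):
    """Run callables concurrently, with at most `backlog` in flight."""
    semaphore = threading.Semaphore(backlog)
    results = []
    with ThreadPoolExecutor(max_workers=workers) as executor:
        futures = []
        for task in tasks:
            semaphore.acquire()   # blocks while `backlog` tasks are in flight
            future = executor.submit(task)
            # Release the slot as soon as this task finishes.
            future.add_done_callback(lambda f: semaphore.release())
            futures.append(future)
        for future in as_completed(futures):
            results.append(future.result())
    return results

results = run_bounded([lambda n=n: n * n for n in range(10)])
print(sorted(results))   # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
~~~

Gating submission with the semaphore means the generator of downloads is only advanced as slots free up, which is what keeps memory bounded.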

 **saturn**: 

Using the asyncio event loop. This example does not restrict the number of downloads in memory at once.

~~~python
downloaders = (HttpDownloader(...) for _ in range(10))

loop = asyncio.get_event_loop()
done, not_done = loop.run_until_complete(asyncio.wait([d.run() for d in downloaders]))
for task in done:
    try:
        result = task.result()    # This is a DownloadResult
    except aiohttp.ClientError:
        pass  # An error occurred.
~~~
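A common way to add such a restriction in asyncio is a semaphore; the sketch below is an assumption about how it could be layered on top, not part of *HttpDownloader*:

~~~python
import asyncio

async def bounded(coroutine, semaphore):
    # Only `limit` of these run concurrently; the rest wait here.
    async with semaphore:
        return await coroutine

async def download(n):
    await asyncio.sleep(0)    # stand-in for real network I/O
    return n * n

async def main(limit=3):
    semaphore = asyncio.Semaphore(limit)
    tasks = [bounded(download(n), semaphore) for n in range(10)]
    return await asyncio.gather(*tasks)

loop = asyncio.new_event_loop()
results = loop.run_until_complete(main())
loop.close()
print(results)   # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
~~~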

 ----- 

 As an importer, I want to validate downloaded files. 

 **jupiter**: 

Supported by adding provided or custom validations to the download. A validation failure raises *ValidationError*, which is a *DownloadError*.

~~~python
download = HttpDownload(...)
download.append(DigestValidation('sha256', '0x1234'))

try:
    download()
except DownloadError:
    pass  # An error occurred.
~~~

 **saturn**: 

 Supported by passing the *expected_digests* dictionary and catching *DigestValidationError*. 

~~~python
downloader_obj = HttpDownloader(..., expected_digests={'sha256': '0x1234'})

downloader_coroutine = downloader_obj.run()
loop = asyncio.get_event_loop()
done, not_done = loop.run_until_complete(asyncio.wait([downloader_coroutine]))
for task in done:
    try:
        result = task.result()    # This is a DownloadResult
    except (aiohttp.ClientError, DigestValidationError):
        pass  # An error occurred.
~~~
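In either flavor, digest validation boils down to comparing computed digests against expected values; a minimal sketch with *hashlib* (the names are illustrative, not the pulp API):

~~~python
import hashlib

class DigestValidationError(Exception):
    pass

def validate(data, expected_digests):
    """Compare each expected digest with one computed over the data."""
    for algorithm, expected in expected_digests.items():
        actual = hashlib.new(algorithm, data).hexdigest()
        if actual != expected:
            raise DigestValidationError(
                '%s mismatch: expected %s, got %s' % (algorithm, expected, actual))

good = hashlib.sha256(b'content').hexdigest()
validate(b'content', {'sha256': good})    # passes silently
try:
    validate(b'content', {'sha256': '0x1234'})
except DigestValidationError:
    print('validation failed')
~~~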

 ----- 

 As an importer, I am not required to keep all content (units) and artifacts in memory to support concurrent downloading. 

 **jupiter**: 

 ~~~python 
 ~~~ 

 **saturn**: 

 ~~~python 
 ~~~ 

 ----- 

   
 As an importer, I need a way to link a downloaded file to an artifact without keeping all content units and artifacts in memory. 

 **jupiter**: 

 ~~~python 
 ~~~ 

 **saturn**: 

 ~~~python 
 ~~~ 

 ----- 

   
 As an importer, I can perform concurrent downloading using a synchronous pattern. 

 **jupiter**: 

 ~~~python 
 ~~~ 

 **saturn**: 

 ~~~python 
 ~~~ 
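For illustration, a synchronous facade can hide the event loop behind a plain generator; the sketch below (all names are hypothetical, not an existing pulp API) yields results in completion order:

~~~python
import asyncio

async def download(n):
    await asyncio.sleep(0)   # stand-in for real network I/O
    return n * n

def download_all(coroutines):
    """Run coroutines concurrently but expose a synchronous iterator."""
    loop = asyncio.new_event_loop()
    try:
        tasks = [loop.create_task(c) for c in coroutines]
        done, _ = loop.run_until_complete(asyncio.wait(tasks))
        for task in done:
            yield task.result()
    finally:
        loop.close()

results = sorted(download_all([download(n) for n in range(5)]))
print(results)   # [0, 1, 4, 9, 16]
~~~

The caller iterates results with ordinary `for`/`sorted` and never touches asyncio, which is the point of the synchronous pattern.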

 ----- 

   
As an importer, concurrent downloads must share resources such as sessions, connection pools, and auth tokens across individual downloads.

 **jupiter**: 

 ~~~python 
 ~~~ 

 **saturn**: 

 ~~~python 
 ~~~ 

 ----- 

   
As an importer, I can customize how downloading is performed, for example to support mirror lists.

 **jupiter**: 

 ~~~python 
 ~~~ 

 **saturn**: 

 ~~~python 
 ~~~ 

 ----- 

   
 As an importer, concurrent downloading must limit the number of simultaneous connections. Downloading 5k artifacts cannot open 5k connections. 

 **jupiter**: 

 ~~~python 
 ~~~ 

 **saturn**: 

 ~~~python 
 ~~~ 

 ----- 

   
As an importer, I can terminate concurrent downloading at any point and not leak resources.

 **jupiter**: 

 ~~~python 
 ~~~ 

 **saturn**: 

 ~~~python 
 ~~~ 
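One common asyncio pattern for this (an assumption, not an existing pulp API) is to cancel the outstanding tasks and let each run its cleanup before the loop is closed:

~~~python
import asyncio

cleanup_log = []

async def download(n):
    try:
        await asyncio.sleep(60)        # stand-in for a long transfer
        return n
    except asyncio.CancelledError:
        cleanup_log.append(n)          # release connections/files here
        raise

async def main():
    tasks = [asyncio.ensure_future(download(n)) for n in range(3)]
    await asyncio.sleep(0)             # let the downloads start
    for task in tasks:
        task.cancel()                  # terminate at any point
    # Gather with return_exceptions so cancellation isn't re-raised here.
    await asyncio.gather(*tasks, return_exceptions=True)

loop = asyncio.new_event_loop()
loop.run_until_complete(main())
loop.close()
print(sorted(cleanup_log))   # [0, 1, 2]
~~~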

 ----- 

   
 As an importer, I can download using any protocol. Starting with HTTP/HTTPS and FTP. 

 **jupiter**: 

 ~~~python 
 ~~~ 

 **saturn**: 

 ~~~python 
 ~~~ 
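One way this is commonly handled (an illustration only; the class and registry names are hypothetical) is a registry keyed by URL scheme:

~~~python
from urllib.parse import urlparse

class HttpDownloader:
    def __init__(self, url):
        self.url = url

class FtpDownloader:
    def __init__(self, url):
        self.url = url

# New protocols plug in by adding an entry here.
REGISTRY = {
    'http': HttpDownloader,
    'https': HttpDownloader,
    'ftp': FtpDownloader,
}

def downloader_for(url):
    scheme = urlparse(url).scheme
    try:
        return REGISTRY[scheme](url)
    except KeyError:
        raise ValueError('unsupported protocol: %s' % scheme)

print(type(downloader_for('ftp://example.org/file')).__name__)   # FtpDownloader
~~~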

 ----- 

 ### Streamer 

 As the streamer, I need to download files related to published artifacts and metadata but delegate *the implementation* (protocol, settings, credentials) to the importer. The implementation must be a black-box. 

 **jupiter**: 

 ~~~python 
 ~~~ 

 **saturn**: 

 ~~~python 
 ~~~ 
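The delegation could be sketched as a factory owned by the importer (all names here are hypothetical): the streamer calls the factory as a black box and never sees protocol, settings, or credentials.

~~~python
class Importer:
    def __init__(self, credentials):
        self._credentials = credentials   # stays private to the importer

    def get_downloader(self, url):
        # Encapsulates protocol choice and configuration; returns a callable.
        return lambda: 'downloaded %s as %s' % (url, self._credentials['user'])

class Streamer:
    def __init__(self, importer):
        self._factory = importer.get_downloader

    def fetch(self, url):
        download = self._factory(url)   # black box: just call it
        return download()

streamer = Streamer(Importer({'user': 'elmer'}))
print(streamer.fetch('http://example.org/file'))
~~~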

 ----- 

   
 As the streamer, I can download using any protocol supported by the importer. 

 **jupiter**: 

 ~~~python 
 ~~~ 

 **saturn**: 

 ~~~python 
 ~~~ 

 ----- 

   
 As the streamer, I want to validate downloaded files. 

 **jupiter**: 

 ~~~python 
 ~~~ 

 **saturn**: 

 ~~~python 
 ~~~ 

 ----- 

   
As the streamer, concurrent downloads must share resources such as sessions, connection pools, and auth tokens across individual downloads without having knowledge of such things.

 **jupiter**: 

 ~~~python 
 ~~~ 

 **saturn**: 

 ~~~python 
 ~~~ 

 ----- 

   
 As the streamer, I need to support complex downloading such as mirror lists. This complexity must be delegated to the importer. 

 **jupiter**: 

 ~~~python 
 ~~~ 

 **saturn**: 

 ~~~python 
 ~~~ 

 ----- 

   
 As the streamer, I need to bridge the downloaded bit stream to the Twisted response. The file is not written to disk. 

 **jupiter**: 

 ~~~python 
 ~~~ 

 **saturn**: 

 ~~~python 
 ~~~ 

 ----- 

   
As the streamer, I need to forward HTTP headers from the download response to the Twisted response.

 **jupiter**: 

 ~~~python 
 ~~~ 

 **saturn**: 

 ~~~python 
 ~~~ 

 ----- 

   
As the streamer, I can download using (the same) custom logic as the importer, such as supporting mirror lists.

 ----- 

 ~~~ 
 Template: (removed after document is finished) 
 ~~~ 

 **jupiter**: 

 ~~~python 
 ~~~ 

 **saturn**: 

 ~~~python 
 ~~~ 

 -----