# Downloading
In pulp3, there are two competing technologies and designs being considered. For the purposes of this discussion we'll name them **Jupiter** and **Saturn**. The *Jupiter* solution is based on *concurrent.futures* and the *Saturn* solution is based on *asyncio*. Beyond the underlying technology, the solutions meet the requirements in different ways. The *Jupiter* solution includes more classes, provides more abstraction, and supports extension through object composition. The *Saturn* solution meets the requirements with the fewest classes possible and minimal abstraction.

The three actors for our use cases are the *Importer*, the *Streamer*, and the Plugin Writer. The *ChangeSet* shares a subset of the *Streamer* requirements but is not included in this discussion.

## Use Cases

### Importer

As an importer, I need to download single files.

**jupiter**:
~~~python
download = HttpDownload(
    url=url,
    writer=FileWriter(path),
    timeout=Timeout(connect=10, read=15),
    user=User(name='elmer', password='...'),
    ssl=SSL(ca_certificate='path-to-certificate',
            client_certificate='path-to-certificate',
            client_key='path-to-key',
            validation=True),
    proxy_url='http://user:password@gateway.org')

try:
    download()
except DownloadError:
    pass  # An error occurred.
else:
    pass  # Go read the downloaded file \o/
~~~

**saturn**:

~~~python
# Client SSL: trust the CA, present the client certificate and key.
ssl_context = ssl.create_default_context(cafile='path-to-CA_certificate')
ssl_context.load_cert_chain('path-to-CLIENT_certificate', 'path-to-CLIENT_key')

connector = aiohttp.TCPConnector(verify_ssl=True, ssl_context=ssl_context)

session = aiohttp.ClientSession(
    connector=connector,
    read_timeout=15,
    auth=aiohttp.BasicAuth('elmer', password='...', encoding='utf-8'))

downloader_obj = HttpDownloader(
    session,
    url,
    proxy='http://gateway.org',
    proxy_auth=aiohttp.BasicAuth('elmer', password='...', encoding='utf-8'))

downloader_coroutine = downloader_obj.run()
loop = asyncio.get_event_loop()
done, not_done = loop.run_until_complete(asyncio.wait([downloader_coroutine]))
for task in done:
    try:
        result = task.result()  # This is a DownloadResult
    except aiohttp.ClientError:
        pass  # An error occurred.
~~~
question: How can the connect timeout be set in aiohttp?
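One possibility (an assumption, not settled in this proposal): the aiohttp 2.x series used here accepts a `conn_timeout` argument on the session alongside `read_timeout`, e.g.:

~~~python
# Assumes aiohttp 2.x ClientSession conn_timeout/read_timeout arguments.
session = aiohttp.ClientSession(
    conn_timeout=10,
    read_timeout=15)
~~~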
-----

As an importer, I can leverage all settings supported by the underlying protocol-specific client library.

**jupiter**:

Commonly used settings are supported by the abstraction. Additional settings can be supported by subclassing:

~~~python
class SpecialDownload(HttpDownload):

    def _settings(self):
        settings = super()._settings()
        settings['special'] = <special value>
        return settings
~~~

**saturn**:

The underlying client library's arguments are directly exposed.

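A minimal sketch of what that can look like, reusing the *HttpDownloader* constructor from the example above (the particular options shown are ordinary *aiohttp.ClientSession* settings, chosen only for illustration):

~~~python
import aiohttp

# Any aiohttp setting is passed through as-is; there is no wrapping abstraction.
session = aiohttp.ClientSession(
    headers={'User-Agent': 'pulp'},   # default request headers
    cookies={'session': '...'},       # any other ClientSession option
    read_timeout=15)

downloader_obj = HttpDownloader(session, url)
~~~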
-----

As an importer, I can create an Artifact with a downloaded file using the size and digests calculated during the download.

**jupiter**:

Use the optional *DownloadMonitor* to collect statistics such as size and to calculate digests.

~~~python
download = HttpDownload(...)
monitor = DownloadMonitor(download)
...  # perform the download.
artifact = Artifact(**monitor.facts())
artifact.save()
~~~

**saturn**:

The *size* and all *digests* are always calculated.

~~~python
downloader_obj = HttpDownloader(...)
...  # perform the download.
result = task.result()
artifact = Artifact(**result.artifact_attributes)
artifact.save()
~~~

-----

As an importer, I need to download files concurrently.

**jupiter**:

Use the *Batch* to run the downloads concurrently, with only 3 downloads in memory at once.

~~~python
downloads = (HttpDownload(...) for _ in range(10))

with Batch(downloads, backlog=3) as batch:
    for plan in batch():
        try:
            plan.result()
        except DownloadError:
            pass  # An error occurred.
        else:
            pass  # Use the downloaded file \o/
~~~

**saturn**:

Use the asyncio event loop. This example does not restrict the number of downloads in memory at once.

~~~python
downloaders = (HttpDownloader(...) for _ in range(10))

loop = asyncio.get_event_loop()
done, not_done = loop.run_until_complete(asyncio.wait([d.run() for d in downloaders]))
for task in done:
    try:
        result = task.result()  # This is a DownloadResult
    except aiohttp.ClientError:
        pass  # An error occurred.
~~~

-----

As an importer, I want to validate downloaded files.

**jupiter**:

Supported by adding provided or custom validations to the download. A validation error raises *ValidationError*, which is a *DownloadError*.

~~~python
download = HttpDownload(...)
download.append(DigestValidation('sha256', '0x1234'))

try:
    download()
except DownloadError:
    pass  # An error occurred.
~~~

**saturn**:

Supported by passing the *expected_digests* dictionary and catching *DigestValidationError*.

~~~python
downloader_obj = HttpDownloader(..., expected_digests={'sha256': '0x1234'})

downloader_coroutine = downloader_obj.run()
loop = asyncio.get_event_loop()
done, not_done = loop.run_until_complete(asyncio.wait([downloader_coroutine]))
for task in done:
    try:
        result = task.result()  # This is a DownloadResult
    except (aiohttp.ClientError, DigestValidationError):
        pass  # An error occurred.
~~~

-----

As an importer, I am not required to keep all content (units) and artifacts in memory to support concurrent downloading.

**jupiter**:

~~~python
~~~
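A minimal sketch of how the *Batch* example above could extend to this case, assuming a hypothetical `pending_content()` iterator that yields content units lazily (the `unit.url` / `unit.path` attributes are likewise assumed for illustration):

~~~python
# Downloads are created lazily from an iterator; Batch holds at most
# 'backlog' of them in memory at any time.
downloads = (HttpDownload(url=unit.url, writer=FileWriter(unit.path))
             for unit in pending_content())

with Batch(downloads, backlog=3) as batch:
    for plan in batch():
        plan.result()
~~~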

**saturn**:

~~~python
~~~

-----

As an importer, I need a way to link a downloaded file to an artifact without keeping all content units and artifacts in memory.
213
214
**jupiter**:
215
216
~~~python
217
~~~
218
219
**saturn**:
220
221
~~~python
222
~~~
223
224
-----
225
226
As an importer, I can perform concurrent downloading using a synchronous pattern.
227
228
**jupiter**:
229
230
~~~python
231
~~~
232
233
**saturn**:
234
235
~~~python
236
~~~
237
238
-----
239
240
As an importer, concurrent downloads must share resources such as sessions, connection pools, and auth tokens across individual downloads.

**jupiter**:

~~~python
~~~

**saturn**:

~~~python
~~~
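A minimal sketch of how this might look with *saturn*, assuming one shared *aiohttp.ClientSession* (and therefore one connection pool and one set of credentials) handed to every downloader; `urls` is a placeholder iterable:

~~~python
import aiohttp

session = aiohttp.ClientSession(
    auth=aiohttp.BasicAuth('elmer', password='...'))

# Every downloader reuses the same session, connection pool and credentials.
downloaders = [HttpDownloader(session, url) for url in urls]
~~~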

-----

As an importer, I can customize how downloading is performed, for example, to support mirror lists.

**jupiter**:

~~~python
~~~

**saturn**:

~~~python
~~~

-----

As an importer, concurrent downloading must limit the number of simultaneous connections. Downloading 5k artifacts cannot open 5k connections.
269
270
**jupiter**:
271
272
~~~python
273
~~~
274
275
**saturn**:
276
277
~~~python
278
~~~
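With *saturn*, this could presumably be handled by the connector's pool limit (a sketch only; `urls` is a placeholder iterable):

~~~python
import aiohttp

# TCPConnector(limit=N) caps the simultaneous connections regardless of
# how many downloads are scheduled on the loop.
connector = aiohttp.TCPConnector(limit=10)
session = aiohttp.ClientSession(connector=connector)

downloaders = [HttpDownloader(session, url) for url in urls]  # e.g. 5k downloads, at most 10 connections
~~~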

-----

As an importer, I can terminate concurrent downloading at any point and not leak resources.

**jupiter**:

~~~python
~~~

**saturn**:

~~~python
~~~
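One way this might be done with *saturn*, using only stock asyncio/aiohttp calls (a sketch, assuming the shared `session`, `loop` and a `coroutines` list as in the examples above):

~~~python
done, not_done = loop.run_until_complete(asyncio.wait(coroutines, timeout=10))

# Cancel whatever has not finished, then release the connection pool.
for task in not_done:
    task.cancel()
session.close()  # a coroutine in newer aiohttp releases; run it on the loop there
~~~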

-----

As an importer, I can download using any protocol, starting with HTTP/HTTPS and FTP.

**jupiter**:

~~~python
~~~

**saturn**:

~~~python
~~~

-----

### Streamer

As the streamer, I need to download files related to published artifacts and metadata but delegate *the implementation* (protocol, settings, credentials) to the importer. The implementation must be a black-box.

**jupiter**:

~~~python
~~~

**saturn**:

~~~python
~~~

-----

As the streamer, I can download using any protocol supported by the importer.

**jupiter**:

~~~python
~~~

**saturn**:

~~~python
~~~

-----

As the streamer, I want to validate downloaded files.

**jupiter**:

~~~python
~~~

**saturn**:

~~~python
~~~

-----

As the streamer, concurrent downloads must share resources such as sessions, connection pools, and auth tokens across individual downloads without having knowledge of such things.

**jupiter**:

~~~python
~~~

**saturn**:

~~~python
~~~

-----

As the streamer, I need to support complex downloading such as mirror lists. This complexity must be delegated to the importer.

**jupiter**:

~~~python
~~~

**saturn**:

~~~python
~~~

-----

As the streamer, I need to bridge the downloaded bit stream to the Twisted response. The file is not written to disk.

**jupiter**:

~~~python
~~~

**saturn**:

~~~python
~~~

-----

As the streamer, I need to forward HTTP headers from the download response to the Twisted response.

**jupiter**:

~~~python
~~~

**saturn**:

~~~python
~~~

-----

As the streamer, I can download using (the same) custom logic as the importer, such as supporting mirror lists.

**jupiter**:

~~~python
~~~

**saturn**:

~~~python
~~~

-----