Downloading » History » Sprint/Milestone 11

jortel@redhat.com, 08/29/2017 10:38 PM

# Downloading

In pulp3, there are two competing technologies and designs being considered. For the purposes of the discussion we'll name them **Jupiter** and **Saturn**. The *Jupiter* solution is based on *concurrent.futures* and the Saturn solution is based on *asyncio*. In addition to the underlying technology difference, the solutions meet the requirements in different ways. The *Jupiter* solution includes more classes, provides more abstraction and supports extension through object composition. The *Saturn* solution meets the requirements with the fewest classes possible and minimum abstraction.

The three actors for our use cases are the *Importer*, the *Streamer*, and the Plugin Writer. The *ChangeSet* shares a subset of the Streamer requirements but is not included in this discussion.

## Use Cases

### Importer

As an importer, I need to download single files.

**jupiter**:

~~~
download = HttpDownload(
    url=url,
    writer=FileWriter(path),
    timeout=Timeout(connect=10, read=15),
    user=User(name='elmer', password='...'),
    ssl=SSL(ca_certificate='path-to-certificate',
            client_certificate='path-to-certificate',
            client_key='path-to-key',
            validation=True),
    proxy_url='http://user:password@gateway.org')

try:
    download()
except DownloadError:
    # An error occurred.
else:
    # Go read the downloaded file \o/
~~~

**saturn**:

~~~
import asyncio
import ssl

import aiohttp

ssl_context = ssl.create_default_context(cafile='path-to-CA_certificate')
ssl_context.load_cert_chain('path-to-CLIENT_certificate', 'path-to-CLIENT_key')

connector = aiohttp.TCPConnector(verify_ssl=True, ssl_context=ssl_context)

session = aiohttp.ClientSession(
    connector=connector,
    read_timeout=15,
    auth=aiohttp.BasicAuth('elmer', password='...', encoding='utf-8'))

downloader_obj = HttpDownloader(
    session,
    url,
    proxy='http://gateway.org',
    proxy_auth=aiohttp.BasicAuth('elmer', password='...', encoding='utf-8'))

downloader_coroutine = downloader_obj.run()
loop = asyncio.get_event_loop()
done, not_done = loop.run_until_complete(asyncio.wait([downloader_coroutine]))
for task in done:
    try:
        result = task.result()  # This is a DownloadResult
    except aiohttp.ClientError:
        # An error occurred.
~~~

question: How can the connect timeout be set in aiohttp?

-----

As an importer, I can leverage all settings supported by the underlying protocol-specific client lib.

**jupiter**:

Commonly used settings are supported by the abstraction. Additional settings could be supported by subclassing.

~~~
class SpecialDownload(HttpDownload):

    def _settings(self):
        settings = super()._settings()
        settings['special'] = <special value>
        return settings
~~~
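
As a rough sketch of how this override pattern composes (the base class here is a stub standing in for the proposed *HttpDownload* abstraction, and the `special` key is invented for illustration):

```python
# Stub standing in for the proposed HttpDownload abstraction; only the
# _settings() hook matters for this sketch.
class HttpDownload:
    def _settings(self):
        # Settings the abstraction supports directly.
        return {'timeout': 15, 'ssl_validation': True}


class SpecialDownload(HttpDownload):
    def _settings(self):
        # Extend the common settings with an option the abstraction
        # does not model; 'special' is a made-up example key.
        settings = super()._settings()
        settings['special'] = 'special-value'
        return settings


print(SpecialDownload()._settings())
```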

**saturn**:

The underlying client lib arguments are directly exposed.

-----

As an importer, I can create an Artifact with a downloaded file using the size and digests calculated during the download.

**jupiter**:

Using the optional *DownloadMonitor* to collect statistics such as size and calculate digests.

~~~
download = HttpDownload(...)
monitor = DownloadMonitor(download)
...  # perform download.
artifact = Artifact(**monitor.dict())
artifact.save()
~~~

**saturn**:

The *size* and all *digests* are always calculated.

~~~
downloader_obj = HttpDownloader(...)
...  # perform download.
result = task.result()
artifact = Artifact(**result.artifact_attributes)
artifact.save()
~~~

-----

As an importer, I need to download files concurrently.

**jupiter**:

Using the *Batch* to run the downloads concurrently. Only 3 downloads are in memory at once.

~~~
downloads = (HttpDownload(...) for _ in range(10))

with Batch(downloads, backlog=3) as batch:
    for plan in batch():
        try:
            plan.result()
        except DownloadError:
            # An error occurred.
        else:
            # Use the downloaded file \o/
~~~

**saturn**:

This just uses the asyncio event loop. This example does not restrict the number of downloads in memory at once.
141
142
~~~
143
downloads = (HttpDownloader...) for _ in range(10))
144
145
loop = asyncio._get_running_loop()
146
done, not_done = loop.run_until_complete(asyncio.wait([d.run() for d in downloads]))
147
for task in done:
148
    try:
149
        result = task.result()  # This is a DownloadResult
150
    except aiohttp.ClientError:
151
        # An error occurred.
152
~~~

-----

As an importer, I want to validate downloaded files.  
As an importer, I am not required to keep all content (units) and artifacts in memory to support concurrent downloading.  
As an importer, I need a way to link a downloaded file to an artifact without keeping all content units and artifacts in memory.  
As an importer, I can perform concurrent downloading using a synchronous pattern.  
As an importer, concurrent downloads must share resources such as sessions, connection pools and auth tokens across individual downloads.  
As an importer, I can customize how downloading is performed. For example, to support mirror lists.  
As an importer, concurrent downloading must limit the number of simultaneous connections. Downloading 5k artifacts cannot open 5k connections.  
As an importer, I can terminate concurrent downloading at any point and not leak resources.  
As an importer, I can download using any protocol. Starting with HTTP/HTTPS and FTP.
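
The connection-limit requirement can be sketched with nothing but stdlib asyncio; the semaphore bound and the simulated transfer below are illustrative assumptions, not part of either proposal:

```python
import asyncio

MAX_CONNECTIONS = 3  # illustrative cap: 10 downloads, never more than 3 connections


async def download_one(url, semaphore, active, peaks):
    # A connection slot must be acquired before the transfer starts.
    async with semaphore:
        active[0] += 1
        peaks.append(active[0])
        await asyncio.sleep(0)  # stand-in for the actual byte transfer
        active[0] -= 1
    return url


async def download_all(urls):
    semaphore = asyncio.Semaphore(MAX_CONNECTIONS)
    active, peaks = [0], []
    results = await asyncio.gather(
        *(download_one(url, semaphore, active, peaks) for url in urls))
    # max(peaks) can never exceed MAX_CONNECTIONS.
    return results, max(peaks)


results, peak = asyncio.run(
    download_all(['http://example.com/%d' % n for n in range(10)]))
print(peak)
```

The same semaphore would wrap the real connection acquisition, so 5k queued downloads still open at most `MAX_CONNECTIONS` connections.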

### Streamer

As the streamer, I need to download files related to published artifacts and metadata but delegate *the implementation* (protocol, settings, credentials) to the importer. The implementation must be a black-box.  
As the streamer, I can download using any protocol supported by the importer.  
As the streamer, I want to validate downloaded files.  
As the streamer, concurrent downloads must share resources such as sessions, connection pools and auth tokens across individual downloads without having knowledge of such things.  
As the streamer, I need to support complex downloading such as mirror lists. This complexity must be delegated to the importer.  
As the streamer, I need to bridge the downloaded bit stream to the Twisted response. The file is not written to disk.  
As the streamer, I need to forward HTTP headers from the download response to the Twisted response.  
As the streamer, I can download using (the same) custom logic as the importer, such as supporting mirror lists.
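
The black-box delegation above can be sketched in plain Python; every name here (*get_downloader*, *FileDownload*) is hypothetical, standing in for whichever design is chosen:

```python
class FileDownload:
    """Stand-in download object; a real one would stream bytes."""

    def __init__(self, url, headers):
        self.url = url
        self.headers = headers  # forwarded to the Twisted response

    def __call__(self):
        # Placeholder for the transfer; no file is written to disk.
        return 'content-of:' + self.url


class Importer:
    """Owns protocol, settings, credentials and mirror-list logic."""

    def get_downloader(self, url):
        # All implementation detail is decided here, hidden from the caller.
        return FileDownload(url, headers={'Content-Type': 'application/octet-stream'})


class Streamer:
    """Treats the importer's downloader as a black-box."""

    def __init__(self, importer):
        self.importer = importer

    def stream(self, url):
        download = self.importer.get_downloader(url)
        # Bridge the headers and the bit stream straight to the client.
        return download.headers, download()


headers, body = Streamer(Importer()).stream('http://cdn.example/rpm')
print(body)
```

Because the streamer only holds a callable and its headers, the importer can swap protocols, credentials, or mirror-list logic without any streamer change.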