Task #1195
Updated by jcline@redhat.com about 9 years ago
For lazy-sync to work, Squid needs another software component to act as a client that presents the correct certificate to the feed URL. That streaming component is called the "pulp-streamer" for now. It is also responsible for dispatching the Celery task defined in task #1181, which causes Pulp to download and save a copy of the unit that the streamer just fetched. This dispatch will need to use apply_async_with_reservation and lock on the "unit id and path", which together guarantee uniqueness in the catalog. See task #1181 for more details on how/why.

h2. Configuration

The configuration will come from the server.conf file. This is appropriate because the streamer will need many things from server.conf (e.g. the database settings).

* Have the streamer use an entry named 'streamer_port' in the [lazy] section of server.conf to determine the port it listens on. This needs to have a default too. I'll suggest 8751, which is unused according to an IANA page I looked at.
* Have the streamer use an entry named 'streamer_interface' in the [lazy] section of server.conf. This field should default to 'lo', which will cause it to listen on localhost only by default. The field accepts a comma-separated list, which can be used to limit which interfaces it listens on.
* The streamer will use a header to tell Squid how long to cache the content it is delivering. This should be configurable using the streamer_cache_timeout setting in the [lazy] section. It is expressed in seconds and will default to 86400 (the number of seconds in 1 day).

h2. Requirements

* Use the "unit catalog" to determine which hostname and URL the incoming request should be translated to.
* Make a request to the URL determined by the "unit catalog", presenting the correct SSL client certificate for that request.
* Have the streamer verify the identity of the server side of the SSL connection, consistent with Pulp's existing functionality.
* Pass through the headers from the server as-is.
This will require #1179 to be fixed first.
* Overwrite the "Cache-Control" header, setting it to the value of the streamer_cache_timeout setting from server.conf, and append "public". The "public" part is not configurable.
* Headers must be delivered to the client before any data.
* Stream the data to the client as the streamer receives it. This is not store-and-forward software; it should stream in chunks.
* Handle multiple concurrent downloads efficiently.
* Dispatch a Celery task (from task #1181) that causes Pulp to download and save a copy of the unit.

h2. Alternate Content Sources

The streamer needs to be integrated with alternate content sources. In this usage, when Pulp has an alternate content source configured, a lazy repo can receive content from the alternate content source, with the streamer reading the bits from disk instead of from the upstream --feed location.

A stream-based proof of concept was developed (see attachment); use that as a starting point. This story does not include any rpm packaging, init script work, or systemd unit work; that is all part of another story.
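
As a rough illustration of the configuration and Cache-Control requirements in this task, here is a minimal Python sketch. It is not taken from the attached proof of concept. The function names (load_lazy_settings, override_cache_control) are hypothetical, and expressing the timeout through the standard HTTP "max-age" directive is an assumption, since the task only says the header should carry the timeout plus "public". The defaults come from the Configuration section of this task.

```python
import configparser

# Defaults as proposed in this task (stored as strings, as configparser expects).
LAZY_DEFAULTS = {
    "streamer_port": "8751",
    "streamer_interface": "lo",
    "streamer_cache_timeout": "86400",
}


def load_lazy_settings(conf_text):
    """Parse server.conf text and return the streamer settings from [lazy].

    Hypothetical helper: real code would likely reuse Pulp's own config loader.
    """
    parser = configparser.ConfigParser()
    parser.read_dict({"lazy": LAZY_DEFAULTS})  # seed the defaults first
    parser.read_string(conf_text)              # values from the file override them
    lazy = parser["lazy"]
    interfaces = [
        name.strip()
        for name in lazy["streamer_interface"].split(",")
        if name.strip()
    ]
    return {
        "port": lazy.getint("streamer_port"),
        "interfaces": interfaces,
        "cache_timeout": lazy.getint("streamer_cache_timeout"),
    }


def override_cache_control(upstream_headers, cache_timeout):
    """Pass upstream headers through as-is, except for Cache-Control.

    Any existing Cache-Control header (matched case-insensitively) is dropped
    and replaced. Using "max-age" to carry the timeout is an assumption.
    """
    headers = {
        name: value
        for name, value in upstream_headers.items()
        if name.lower() != "cache-control"
    }
    headers["Cache-Control"] = "max-age=%d, public" % cache_timeout
    return headers


if __name__ == "__main__":
    settings = load_lazy_settings("[lazy]\nstreamer_interface = lo, eth0\n")
    print(settings)
    print(override_cache_control({"Content-Type": "application/x-rpm"},
                                 settings["cache_timeout"]))
```

Seeding the defaults into the parser before reading the file keeps the "this needs to have a default too" requirement in one place, so a server.conf with no [lazy] section still yields usable settings.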