Task #1195
Story #1150 (closed): As a user, I can lazily fetch repositories
Develop the pulp-streamer
100%
Description
For lazy-sync to work, Squid needs another software component to act as a client that presents the correct certificate to the feed URL. That streaming component is called the "pulp-streamer" for now. The streamer is also responsible for dispatching the Celery task defined in task #1181, which causes Pulp to download and save a copy of the unit that the streamer just fetched. This dispatch will need to use apply_async_with_reservation and lock on the "unit id and path", which together guarantee uniqueness in the catalog. See task #1181 for more details on how and why.
Configuration
The configuration will come from the server.conf file. This is appropriate because the streamer will need many other things from server.conf (e.g. the database settings).
- Have the streamer read its port from an entry named 'streamer_port' in the [lazy] section of server.conf. This needs a default too. I suggest 8751, which is unassigned according to an IANA page I looked at.
- Have the streamer read an entry named 'streamer_interface' in the [lazy] section of server.conf. This field should default to 'lo', which causes the streamer to listen on localhost only. The field accepts a comma-separated list that limits which interfaces the streamer listens on.
- The streamer will use a header to tell Squid how long to cache the content it delivers. This should be configurable using the streamer_cache_timeout setting in the [lazy] section. The value is expressed in seconds and defaults to 86400 (the number of seconds in 1 day).
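As a rough illustration of the three settings above, the sketch below reads the [lazy] section with Python's standard-library configparser and falls back to the suggested defaults. The function name load_streamer_settings and the inline sample text are hypothetical; Pulp's own config loader may work differently.

```python
import configparser

# Defaults taken from the suggestions in this task description.
DEFAULTS = {
    "streamer_port": "8751",
    "streamer_interface": "lo",
    "streamer_cache_timeout": "86400",
}


def load_streamer_settings(conf_text):
    """Parse server.conf text and return the [lazy] streamer settings."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    if not parser.has_section("lazy"):
        parser.add_section("lazy")
    lazy = parser["lazy"]
    interfaces = lazy.get("streamer_interface",
                          fallback=DEFAULTS["streamer_interface"])
    return {
        "port": int(lazy.get("streamer_port",
                             fallback=DEFAULTS["streamer_port"])),
        # Comma-separated list of interfaces to listen on.
        "interfaces": [i.strip() for i in interfaces.split(",") if i.strip()],
        "cache_timeout": int(lazy.get("streamer_cache_timeout",
                                      fallback=DEFAULTS["streamer_cache_timeout"])),
    }


# Hypothetical server.conf fragment that only overrides the interfaces.
SAMPLE = """
[lazy]
streamer_interface = lo, eth0
"""
settings = load_streamer_settings(SAMPLE)
```

With this sample, the port and cache timeout come from the defaults and only the interface list is overridden.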
Requirements
- Use the "unit catalog" to determine which hostname and URL the incoming request should be translated to
- Make a request to the URL determined by the "unit catalog", presenting the correct SSL client certificate for that request
- Have the streamer verify the identity of the server side of the SSL connection consistent with Pulp's existing functionality
- Pass through the headers from the server as-is. This will require #1179 to be fixed first.
- Overwrite the "Cache-Control" header, setting it to the streamer_cache_timeout value specified in server.conf. It should also append "public". The "public" part is not configurable.
- Headers must be delivered to the client before any data.
- Stream the data to the client as the streamer receives it. This is not store-and-forward software; it should stream in chunks.
- Handle multiple concurrent downloads efficiently.
- Dispatch a Celery task (from task #1181) that causes Pulp to download and save a copy of the unit.
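The header and streaming requirements above can be sketched in a few lines. This is only an illustration, not the streamer's implementation: the helper names are hypothetical, and expressing the timeout through the max-age directive is an assumption, since the task only says the header carries the streamer_cache_timeout value plus "public".

```python
import io


def build_cache_control(cache_timeout):
    """Return a Cache-Control value for Squid.

    Assumption: the timeout is sent as max-age; "public" is always appended.
    """
    return "max-age=%d, public" % cache_timeout


def rewrite_headers(upstream_headers, cache_timeout):
    """Pass upstream headers through, overwriting only Cache-Control."""
    headers = dict(upstream_headers)
    # Header names are case-insensitive, so drop any spelling of the name.
    for name in list(headers):
        if name.lower() == "cache-control":
            del headers[name]
    headers["Cache-Control"] = build_cache_control(cache_timeout)
    return headers


def stream_chunks(source, chunk_size=65536):
    """Yield data as it is read instead of buffering the whole body."""
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        yield chunk


# Headers are prepared first, then the body is relayed chunk by chunk.
headers = rewrite_headers({"cache-control": "no-cache"}, 86400)
body = list(stream_chunks(io.BytesIO(b"abcdefgh"), chunk_size=3))
```

A generator keeps memory use bounded by the chunk size, which is the point of not doing store-and-forward.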
Alternate Content Sources
The streamer needs to be integrated with alternate content sources. When Pulp has an alternate content source configured, a lazy repo can receive content from it: the streamer reads the bits from disk instead of fetching them from the upstream --feed location.
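One way to picture that choice is a small selection step before the fetch. The sketch below is hypothetical throughout: the function name, its arguments, and the idea of checking for a local file are assumptions made for illustration, not how Pulp resolves alternate content sources.

```python
import os
import tempfile


def choose_source(alternate_path, feed_url):
    """Prefer a file from an alternate content source over the upstream feed.

    Returns a (kind, location) tuple, where kind is "disk" or "feed".
    """
    if alternate_path and os.path.isfile(alternate_path):
        return ("disk", alternate_path)
    return ("feed", feed_url)


# Usage: a temporary file stands in for content already available on disk.
with tempfile.NamedTemporaryFile() as local_copy:
    from_disk = choose_source(local_copy.name, "https://example.com/unit.rpm")
from_feed = choose_source(None, "https://example.com/unit.rpm")
```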
A stream-based proof of concept was developed (see attachment); use that as a starting point. This story does not include any RPM packaging, init script work, or systemd unit work; that is all part of another story.
Files
Related issues
ref #1195 - Implement the pulp-streamer.
This commit adds the 'lazy' module inside of pulp.server. It also adds streamer settings to Pulp's server.conf.