Task #1195

closed

Story #1150: As a user, I can lazily fetch repositories

Develop the pulp-streamer

Added by bmbouter over 9 years ago. Updated over 5 years ago.

Status: CLOSED - CURRENTRELEASE
Priority: Normal
Category: -
Sprint/Milestone: -
Start date:
Due date:
% Done: 100%
Estimated time:
Platform Release: 2.8.0
Groomed: No
Sprint Candidate: No
Tags: Pulp 2
Sprint:
Quarter:

Description

For lazy sync to work, Squid needs another software component to act as a client that presents the correct certificate to the feed URL. That streaming component is called the "pulp-streamer" for now. The streamer is also responsible for dispatching the Celery task defined in task #1181, which causes Pulp to download and save a copy of the unit that the streamer just fetched. The dispatch will need to use apply_async_with_reservation and lock on the unit id and path, which together guarantee uniqueness in the catalog. See task #1181 for more details on how and why.

Configuration

The configuration will come from the server.conf file. This is appropriate because the streamer will need many settings from server.conf (e.g., the database settings).

  • Have the streamer read its port from an entry named 'streamer_port' in the [lazy] section of server.conf. This needs a default too. I suggest 8751, which is unused according to an IANA page I looked at.
  • Have the streamer read an entry named 'streamer_interface' in the [lazy] section of server.conf. This field should default to 'lo', so that the streamer listens on localhost only by default. The field accepts a comma-separated list that limits which interfaces the streamer listens on.
  • The streamer will use a header to tell squid how long to cache content it is delivering to squid. This should be configurable using the streamer_cache_timeout setting in the [lazy] section. This is expressed in seconds and will default to 86400 (the number of seconds in 1 day).
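
The three settings above could be loaded roughly as follows. This is a minimal sketch, not the streamer's actual code: the section name, key names, and defaults come from this description, while the function name load_lazy_settings and the returned dictionary keys are hypothetical, and the standard-library configparser stands in for whatever config loader Pulp uses.

```python
import configparser

# Defaults proposed in this task; all keys live in the [lazy] section.
LAZY_DEFAULTS = {
    "streamer_port": "8751",
    "streamer_interface": "lo",
    "streamer_cache_timeout": "86400",  # seconds in one day
}


def load_lazy_settings(text):
    """Parse server.conf-style text and return the streamer settings.

    Hypothetical helper: the name and return shape are illustrative only.
    """
    parser = configparser.ConfigParser()
    parser.read_dict({"lazy": LAZY_DEFAULTS})  # seed the defaults first
    parser.read_string(text)                   # values from the file override them
    lazy = parser["lazy"]
    interfaces = [name.strip() for name in lazy["streamer_interface"].split(",")]
    return {
        "port": lazy.getint("streamer_port"),
        "interfaces": [name for name in interfaces if name],
        "cache_timeout": lazy.getint("streamer_cache_timeout"),
    }
```

Seeding the defaults before reading the file means a server.conf with no [lazy] section at all still yields a usable configuration.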

Requirements

  • Use the "unit catalog" to determine which hostname and URL the incoming request should be translated to.
  • Make a request to the URL determined by the "unit catalog", presenting the SSL client certificate that corresponds to that request.
  • Have the streamer verify the identity of the server side of the SSL connection, consistent with Pulp's existing functionality.
  • Pass through the headers from the server as-is. This requires #1179 to be fixed first.
  • Overwrite the "Cache-Control" header with the streamer_cache_timeout value specified in server.conf, and append "public". The "public" part is not configurable.
  • Headers must be delivered to the client before any data.
  • Stream the data to the client as the streamer receives it. This is not store-and-forward software; it should stream in chunks.
  • Handle multiple concurrent downloads efficiently.
  • Dispatch a Celery task (from task #1181) that causes Pulp to download and save a copy of the unit.
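
The header requirements above (pass upstream headers through, then overwrite the cache header) can be sketched as a small pure function. The function name build_response_headers is hypothetical, and expressing the timeout through the standard max-age directive is an assumption; this task only says the header carries the streamer_cache_timeout value with "public" appended.

```python
def build_response_headers(upstream_headers, cache_timeout):
    """Return the headers to send to Squid for one response.

    upstream_headers: list of (name, value) pairs received from the feed.
    cache_timeout: value of streamer_cache_timeout, in seconds.
    """
    # Pass everything through as-is, except any upstream Cache-Control
    # header (matched case-insensitively), which is overwritten below.
    headers = [
        (name, value)
        for name, value in upstream_headers
        if name.lower() != "cache-control"
    ]
    # Assumption: the timeout is conveyed with max-age; "public" is fixed.
    headers.append(("Cache-Control", "max-age=%d, public" % cache_timeout))
    return headers
```

Because the function returns the complete header list before any body bytes are touched, it also fits the requirement that headers reach the client before any data.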

Alternate Content sources

The streamer needs to be integrated with alternate content sources. When Pulp has an alternate content source configured, a lazy repo can receive content from it: the streamer reads the bits from disk instead of from the upstream --feed location.
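
Reading from disk can follow the same chunked approach required for upstream downloads. The generator below is only an illustration under stated assumptions: the name iter_file_chunks and the 64 KiB default chunk size are hypothetical, and how the streamer locates the file for an alternate content source is not covered here.

```python
def iter_file_chunks(path, chunk_size=64 * 1024):
    """Yield the contents of a local file piece by piece.

    The whole file is never held in memory, so the caller can write each
    chunk to the client as soon as it is read.
    """
    with open(path, "rb") as source:
        while True:
            chunk = source.read(chunk_size)
            if not chunk:  # an empty read means end of file
                break
            yield chunk
```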

A stream-based proof of concept was developed (see attachment); use that as a starting point. This story does not include any RPM packaging, init script work, or systemd unit work; that is all part of another story.


Files

server.py (2.74 KB) bmbouter, 09/17/2015 05:16 PM

Related issues

Blocked by Nectar - Story #1179: As a developer I can receive headers while using download_one() (CLOSED - CURRENTRELEASE, ipanova@redhat.com)
