Story #7832

closed

[EPIC] As a user, I have Alternate Content Sources

Added by bmbouter over 3 years ago. Updated about 2 years ago.

Status: CLOSED - CURRENTRELEASE
Priority: Normal
Assignee: -
Category: -
Sprint/Milestone: -
Start date:
Due date:
% Done: 100%
Estimated time: (Total: 0:00 h)
Platform Release:
Groomed: No
Sprint Candidate: No
Tags:
Sprint: Sprint 109
Quarter:

Description

Background

Pulp2 had a feature called "Alternate Content Sources", supported only for pulp_rpm. Here's how it worked:

  1. The user configures an alternate content source, which lists many "paths", each representing a repo available from the remote source. For example, the Alternate Content Source (ACS) could be a local copy of the CDN within an AWS region, or a locally mounted copy of a portion of the CDN.

  2. The user then refreshes the alternate content source. This indexes the binary data available in the remote source, so that sync operations do not each have to parse every path on the alternate content source at runtime.

  3. During sync, Pulp considers the data available from the alternate content source; if an exact content unit is available via the ACS, it is downloaded from the ACS first.
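The three steps above can be sketched as a toy model. This is not Pulp2's actual code: it assumes content units are matched by SHA-256 checksum, and the names `AcsIndex`, `refresh` and `plan_downloads` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class AcsIndex:
    """Hypothetical in-memory index built by an ACS refresh: sha256 -> ACS URL."""
    base_url: str
    paths: list[str]
    by_checksum: dict[str, str] = field(default_factory=dict)

    def refresh(self, listings: dict[str, dict[str, str]]) -> None:
        """Step 2: index the binary data found under each configured path.

        `listings` maps a path to {relative_file_name: sha256}; a real system
        would obtain this by parsing each path's repository metadata.
        """
        self.by_checksum.clear()
        for path in self.paths:
            for name, sha256 in listings.get(path, {}).items():
                self.by_checksum[sha256] = self.base_url + path + name


def plan_downloads(upstream_units: list[tuple[str, str]], acs: AcsIndex) -> list[str]:
    """Step 3: prefer the ACS URL for any unit whose checksum it has indexed.

    `upstream_units` holds (upstream_url, sha256) pairs from the authoritative
    metadata; units the ACS does not have keep their upstream URL.
    """
    return [acs.by_checksum.get(sha256, url) for url, sha256 in upstream_units]
```

Note that the metadata (the list of units) always comes from the authoritative source; only the download location changes.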

Here's an example of a Pulp2 local ACS config:

[pulp-content-source]
enabled: 1
priority: 0
expires: 3d
name: Pulp Content Source
type: yum
base_url: http://192.168.1.11/pub/content/
paths:   beta/rhel/server/7/x86_64/satellite/6/os/
  eus/rhel/server/7/7.3/x86_64/os/
  dist/rhel/client/6/6.2/x86_64/kickstart/
  dist/rhel/client/6/6.2/x86_64/kickstart/Client/
  dist/rhel/client/6/6.2/x86_64/os/
  dist/rhel/client/6/6.8/i386/kickstart/
  dist/rhel/client/6/6.8/i386/kickstart/Client/
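This config uses INI syntax with an indented multi-line `paths` value, so Python's standard `configparser` can read it. The sketch below uses an abbreviated copy of the example and a hypothetical helper, `acs_urls` (not part of Pulp2), that expands `base_url` plus each path into the full repository URLs the ACS covers.

```python
import configparser

# Abbreviated copy of the example config above (three of the seven paths).
SAMPLE = """\
[pulp-content-source]
enabled: 1
priority: 0
expires: 3d
name: Pulp Content Source
type: yum
base_url: http://192.168.1.11/pub/content/
paths:   beta/rhel/server/7/x86_64/satellite/6/os/
  eus/rhel/server/7/7.3/x86_64/os/
  dist/rhel/client/6/6.2/x86_64/os/
"""


def acs_urls(config_text: str, section: str) -> list[str]:
    """Return base_url joined with every entry of the multi-line `paths` value."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    src = parser[section]
    if not src.getboolean("enabled", fallback=True):
        return []  # a disabled source contributes nothing
    base = src["base_url"]
    # configparser joins continuation lines with newlines; split them back out.
    paths = src.get("paths", "").split()
    return [base + p for p in paths]
```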

A few Pulp2 details

ACS in Pulp2 had...

  • a priority, though in practice it was not meaningfully used
  • certificates used when fetching content (optional). Fetching content in an AWS region required a cert.
  • headers, specifying headers to attach to the requests going to the ACS
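A sketch of how the optional certificate and headers settings could be applied to one outgoing request, using only the standard library. `AcsOptions` and `build_acs_request` are hypothetical names, the certificate paths are placeholders, and priority is left out because it was not meaningfully used.

```python
import ssl
import urllib.request
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AcsOptions:
    """Hypothetical per-ACS request settings mirroring the Pulp2 fields above."""
    headers: dict[str, str] = field(default_factory=dict)
    client_cert: Optional[str] = None  # path to a PEM client certificate (optional)
    client_key: Optional[str] = None   # path to its private key (optional)


def build_acs_request(url: str, opts: AcsOptions):
    """Return (Request, SSLContext) carrying the ACS headers and optional cert."""
    request = urllib.request.Request(url, headers=opts.headers)
    context = ssl.create_default_context()
    if opts.client_cert:
        # Only load a client certificate when one is configured.
        context.load_cert_chain(opts.client_cert, opts.client_key)
    return request, context
```

The pair would then be used as `urllib.request.urlopen(request, context=context)`.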

Use Cases

CDN connection is low-bandwidth and/or high-latency

As a user, I have a low-bandwidth and/or high-latency connection to the authoritative source of content, e.g. CDN. I also have a local copy (on local disk or the local network), but it is not authoritative and could be out of date. There should be a way to fetch the metadata from the authoritative source and the content from the "near" source whenever it is identical.

Quickly setting up a Pulp server

As a user setting up a new Pulp server, I already have content on local disk or the local network that should go into that Pulp server, and I should be able to use it. The CDN or remote source is still authoritative, but if the binary data is the same, I shouldn't need to bring it in over the WAN.

Putting a Pulp server in the cloud

As a user deploying a Pulp server to the cloud, e.g. Amazon AWS, I want the CDN to remain authoritative, but there is usually a nearly up-to-date copy of that content available in the AWS region. Using that copy is faster, and also cheaper, since in-region network access is not billed the way WAN access is. I want to use ACS to keep the CDN authoritative while using the regional copy for binary data whenever possible.

Connection to the CDN is fast, but the connection to the authoritative source is slow

As a user, I could have fast access to the CDN even though it is not the authoritative source for my Pulp server. This happens particularly when I have multiple Pulp servers and this one (edge) syncs from another Pulp server (central). In that case, the link between the authoritative central Pulp server and this edge Pulp server is slow, but the connection between this edge Pulp server and the CDN is fast. I want to use ACS so the central Pulp server stays authoritative for content, while the binary data comes from the CDN whenever possible.

Pulp3 Alternate Content Source Plan

Plugin writers will have a new pulpcore.plugin.models.AlternateContentSource MasterModel which will define the following fields:

  • name - a string, the name. Required.
  • enabled - an optional boolean, defaults to True.
  • paths - an optional list of string paths. Each entry must be a valid path and must not begin with a slash. If unspecified, only the base_path will be used when the AlternateContentSource is refreshed.
  • remote - a ForeignKey to a remote. Required, because the remote defines how the ACS syncs.

Plugin writers will subclass this. 
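A plain-Python sketch of the planned fields and the path rule, written as dataclasses rather than a Django MasterModel so it runs standalone. `validate_paths` and `RpmAlternateContentSource` are hypothetical names, and `remote` is a plain string standing in for the ForeignKey; the real pulpcore model may differ (the sub-issues list shows path validation was later moved into the plugins).

```python
from dataclasses import dataclass, field


def validate_paths(paths: list[str]) -> list[str]:
    """Reject empty entries and entries that begin with a slash."""
    for path in paths:
        if not path or path.startswith("/"):
            raise ValueError(f"invalid ACS path: {path!r}")
    return paths


@dataclass
class AlternateContentSource:
    name: str                  # required
    remote: str                # required; stand-in for the ForeignKey to a remote
    enabled: bool = True       # optional, defaults to True
    paths: list[str] = field(default_factory=list)  # optional; empty = base path only

    def __post_init__(self) -> None:
        validate_paths(self.paths)


@dataclass
class RpmAlternateContentSource(AlternateContentSource):
    """Hypothetical plugin subclass; type-specific behavior would go here."""
```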

Pulp3 Alternate Content Source Usage

Create an Alternate Content Source:

  1. First, create a remote representing the remote source, e.g. an RpmRemote or a pulp_ansible CollectionRemote.
  2. Then use that remote in an alternate content source: POST /pulp/api/v3/acs/rpm/rpm/ remote=/pulp/api/v3/remotes/.../.../, which would yield /pulp/api/v3/acs/rpm/rpm/:uuid/.
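Step 2 can be expressed as an HTTP request. This sketch only builds the `POST` with the standard library and does not send it. The endpoint and the `remote` field come from the text and `name` is the required model field; the server address `pulp.example.com`, the JSON encoding, and the helper name `create_acs_request` are assumptions, and authentication is omitted. `remote_href` must be the href returned when the remote was created in step 1.

```python
import json
import urllib.request

PULP = "http://pulp.example.com"  # assumed server address


def create_acs_request(name: str, remote_href: str) -> urllib.request.Request:
    """Build (but do not send) the POST that creates an RPM ACS from a remote."""
    body = json.dumps({"name": name, "remote": remote_href}).encode()
    return urllib.request.Request(
        PULP + "/pulp/api/v3/acs/rpm/rpm/",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending it with `urllib.request.urlopen(...)` plus credentials should, per the plan above, return the new ACS at an href of the form /pulp/api/v3/acs/rpm/rpm/:uuid/.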

Refresh an Alternate Content Source

Then perform a "refresh" of the alternate content source by calling POST /pulp/api/v3/acs/rpm/rpm/:uuid/refresh/. A refresh action endpoint is used instead of sync because this does not actually download content. It resembles an on-demand sync: when called, it indexes the remote metadata and creates remote artifacts.
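A sketch of what refresh does conceptually: for each configured path it reads the metadata under the remote's URL and records one remote artifact per file, without downloading any binary data. When no paths are set, only the base URL is indexed, matching the `paths` field description. `RemoteArtifactRecord`, `refresh`, and the injected `read_metadata` callable are hypothetical stand-ins for plugin-specific parsing, and joining with `urljoin` assumes the remote URL ends with a slash.

```python
from dataclasses import dataclass
from typing import Callable, Iterable
from urllib.parse import urljoin


@dataclass(frozen=True)
class RemoteArtifactRecord:
    """Hypothetical stand-in for a RemoteArtifact: where a file lives and its digest."""
    url: str
    sha256: str


def refresh(remote_url: str, paths: list[str],
            read_metadata: Callable[[str], Iterable[tuple[str, str]]]) -> list[RemoteArtifactRecord]:
    """Index each ACS path; no binary data is downloaded.

    read_metadata(repo_url) yields (relative_path, sha256) pairs parsed from
    that repository's metadata.
    """
    # No paths configured: index only the remote's own URL (the base path).
    repo_urls = [urljoin(remote_url, p) for p in paths] or [remote_url]
    records = []
    for repo_url in repo_urls:
        for relative_path, sha256 in read_metadata(repo_url):
            records.append(RemoteArtifactRecord(urljoin(repo_url, relative_path), sha256))
    return records
```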

Use an AlternateContentSource

At feature launch, each ACS is assumed to be global: for example, every RPM sync will check the content known to RPM-typed ACSs and, for binary data, prefer it over the authoritative source.

Implementation

Ideally, the downloader itself would transparently prefer and use the alternate content source data. Since an AlternateContentSource can be backed by either HTTP or file sources, this logic likely belongs in BaseDownloader.

For each download it attempts, BaseDownloader would check the database to determine whether an AlternateContentSource of that type exists and, if so, find its RemoteArtifact and use it to download the Artifact data.
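A minimal sketch of the "prefer the ACS, fall back to the original URL" behavior. This is not pulpcore's downloader API: `AcsAwareDownloader`, the `acs_lookup` mapping (content type and SHA-256 digest to an ACS URL, standing in for the database query), and the injected `fetch` callable are assumptions, and it is synchronous for brevity, whereas pulpcore's downloaders are asyncio-based. Lookups are keyed only by content type, following the global-ACS rule. The digest check is a design choice for this sketch: an ACS may be stale, so its data is accepted only if it matches the expected checksum.

```python
import hashlib
from typing import Callable


class AcsAwareDownloader:
    """Try a matching ACS URL first, then fall back to the authoritative URL."""

    def __init__(self, content_type: str,
                 acs_lookup: dict[tuple[str, str], str],
                 fetch: Callable[[str], bytes]) -> None:
        self.content_type = content_type  # e.g. "rpm"; ACSs are global per type
        self.acs_lookup = acs_lookup      # (content_type, sha256) -> ACS URL
        self.fetch = fetch                # performs the actual HTTP or file read

    def download(self, url: str, expected_sha256: str) -> tuple[bytes, str]:
        """Return (data, source_url) for the artifact with the given digest."""
        acs_url = self.acs_lookup.get((self.content_type, expected_sha256))
        if acs_url is not None:
            try:
                data = self.fetch(acs_url)
            except OSError:
                data = None  # ACS unreachable: fall back below
            if data is not None and hashlib.sha256(data).hexdigest() == expected_sha256:
                return data, acs_url
        # No usable ACS copy: download from the authoritative source.
        data = self.fetch(url)
        if hashlib.sha256(data).hexdigest() != expected_sha256:
            raise ValueError(f"digest mismatch for {url}")
        return data, url
```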


Sub-issues: 20 (0 open, 20 closed)

Task #8606: Create models for ACS (CLOSED - CURRENTRELEASE)
Task #8607: Basic views for ACS (CLOSED - CURRENTRELEASE)
Task #8748: Implement ACS stage in download (CLOSED - CURRENTRELEASE, ppicka)
Task #8749: Add ACS on-demand downloading to content app (CLOSED - CURRENTRELEASE, daviddavis)
File Support - Story #8959: As user I can use Alternate Content Source feature (CLOSED - CURRENTRELEASE)
RPM Support - Task #9091: Test that as a user I can use RHUI as a remote and/or ACS (CLOSED - COMPLETE, ppicka)
Task #9251: Update ACS workflow docs to use the cli (CLOSED - CURRENTRELEASE, daviddavis)
Task #9340: Move path validation to the plugins (CLOSED - CURRENTRELEASE, daviddavis)
File Support - Task #9341: Validate ACS paths (CLOSED - CURRENTRELEASE)
Task #9356: Create a fixture repo that has only metadata (CLOSED - COMPLETE, daviddavis)
File Support - Task #9357: Write a functional test that syncs, publishes, and serves content from an ACS (CLOSED - CURRENTRELEASE, daviddavis)
RPM Support - Story #9358: As a user I can use Alternate Content Sources (MODIFIED, ppicka)
Task #9374: Update and delete calls for the ACS should be async (CLOSED - CURRENTRELEASE, ipanova@redhat.com)
File Support - Task #9373: Update docs to use the CLI (CLOSED - CURRENTRELEASE, daviddavis)
File Support - Story #9377: As a user I can use Alternate Content Sources (CLOSED - CURRENTRELEASE, ppicka)
Issue #9417: ACS content isn't cleaned up with orphan removal (CLOSED - CURRENTRELEASE, ipanova@redhat.com)
File Support - Issue #9420: catch 404 when non-existent ACS id being refreshed (CLOSED - CURRENTRELEASE, ipanova@redhat.com)
RPM Support - Task #9422: Create a fixture repo that has only metadata (CLOSED - COMPLETE, ppicka)
RPM Support - Task #9435: Update ACS docs to use the CLI (MODIFIED, daviddavis)
Task #9447: Write KCS for ACS (CLOSED - DUPLICATE, ppicka)