Story #7832: [EPIC] As a user, I have Alternate Content Sources


## Background 

 In Pulp2 there was a feature called ["Alternate Content Sources"](https://docs.pulpproject.org/en/2.21/user-guide/content-sources.html). In Pulp2 this was supported for pulp_rpm only. Here's how it worked: 

 1. The user configured an alternate content source, which listed many "paths", each representing a repo available from the remote source. For example, the Alternate Content Source (ACS) could be a local copy of the CDN within an AWS region, or a locally mounted copy of a portion of the CDN. 

 2. The user then refreshed the alternate content source. This indexed the binary data available in the remote source, so that sync operations did not each have to parse all paths on the alternate content source at runtime. 

 3. When a sync occurred, it considered the data available from the alternate content source, and if the exact content unit was available via the ACS, the download was attempted from the ACS first. 

 Here's an example of a Pulp2 local ACS config: 

 ``` 
 [pulp-content-source] 
 enabled: 1 
 priority: 0 
 expires: 3d 
 name: Pulp Content Source 
 type: yum 
 base_url: http://192.168.1.11/pub/content/ 
 paths:     beta/rhel/server/7/x86_64/satellite/6/os/ 
   eus/rhel/server/7/7.3/x86_64/os/ 
   dist/rhel/client/6/6.2/x86_64/kickstart/ 
   dist/rhel/client/6/6.2/x86_64/kickstart/Client/ 
   dist/rhel/client/6/6.2/x86_64/os/ 
   dist/rhel/client/6/6.8/i386/kickstart/ 
   dist/rhel/client/6/6.8/i386/kickstart/Client/ 
 ``` 
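To make the relationship between `base_url` and `paths` concrete, here is a minimal sketch that reads a descriptor shaped like the one above and expands it into the list of repo URLs a refresh would have to index. It assumes the descriptor is INI-like (as the sample suggests) and uses Python's standard `configparser`; the function name `expand_source_urls` is hypothetical and this is not Pulp2's actual parser.

```python
import configparser
from urllib.parse import urljoin

# Shortened copy of the sample descriptor; continuation lines of `paths`
# are indented so that configparser treats them as one multi-line value.
DESCRIPTOR = """\
[pulp-content-source]
enabled: 1
priority: 0
expires: 3d
name: Pulp Content Source
type: yum
base_url: http://192.168.1.11/pub/content/
paths: beta/rhel/server/7/x86_64/satellite/6/os/
  eus/rhel/server/7/7.3/x86_64/os/
  dist/rhel/client/6/6.2/x86_64/os/
"""


def expand_source_urls(text):
    """Map each enabled source section to the full URLs of its repos."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    result = {}
    for section in parser.sections():
        source = parser[section]
        if not source.getboolean("enabled", fallback=True):
            continue  # disabled sources contribute nothing
        base_url = source["base_url"]
        # Paths are whitespace-separated and relative to base_url.
        paths = source.get("paths", fallback="").split()
        result[section] = [urljoin(base_url, path) for path in paths] or [base_url]
    return result


if __name__ == "__main__":
    for name, urls in expand_source_urls(DESCRIPTOR).items():
        print(name)
        for url in urls:
            print("  " + url)
```

Joining with `urljoin` only works as intended because `base_url` ends with a slash and the paths do not start with one.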

 ### A few Pulp2 details 

 ACS in Pulp2 had... 

 * a `priority`, though in practice it was not meaningfully used 
 * certificates used when fetching content (optional). Fetching content in an AWS region required a cert. 
 * a `headers` option to specify headers attached to the requests sent to the ACS 

 ## Use Cases 

 ### CDN connection is low-bandwidth and/or high-latency 

 As a user, I have a low-bandwidth and/or high-latency connection to the authoritative source of content, e.g. the CDN. I also have a local copy (on either local disk or the local network), but it's not authoritative and could be old. There should be a way to fetch the metadata from the authoritative source, and the content from the "near" source whenever it's identical. 

 ### Quickly setting up a Pulp server 

 As a user setting up a new Pulp server, I already have a copy of the content that should go into that Pulp server on local disk or on the local network, and I should be able to use it. The CDN or remote source is still the authoritative one, but if the binary data is the same, I shouldn't need to bring it in over the WAN. 

 ### Putting a Pulp server in the cloud 

 As a user deploying a Pulp server to the cloud, e.g. Amazon AWS, the CDN should still be authoritative, but there is usually a "nearly up to date" copy of that content available within the AWS region. That copy is usually faster, and also cheaper, since in-region network access is not billed the way WAN access is. I want to use ACS to allow the CDN to be authoritative, but use the regional copy for binary data whenever possible. 


 ### Connection to the CDN is fast, but the authoritative source is a different, slower one 

 As a user, I could have fast access to the CDN, but it may not be the authoritative source for my Pulp server. Particularly in cases where I have multiple Pulp servers and this one (edge) is syncing from another Pulp server (central). In that case, the link between the authoritative, central Pulp server and this edge Pulp server is slow, but the connection between this edge Pulp server and the CDN is fast. I want to use ACS to allow the authoritative Pulp server to be authoritative for content, but receive the binary data from the CDN whenever possible. 

 ## Pulp3 Alternate Content Source Plan 

 Plugin writers will have a new `pulpcore.plugin.models.AlternateContentSource` MasterModel which will define the following fields: 

 * name - a string, the name. A required field. 
 * enabled - an optional boolean, defaults to True 
 * base_path - a URL, the base path. It must include the trailing slash. A required field. 
 * paths - a list of string paths. Each must validate as a string path and must not begin with a slash. An optional field. If unspecified, only the base_path will be used when the AlternateContentSource is refreshed. 
 * a ForeignKey to a remote. This is required, as the remote defines how the ACS can sync. 

 Plugin writers will subclass this. 
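The field rules above can be sketched in plain Python. This is only an illustration of the validation described, not the real model: the actual `AlternateContentSource` would be a Django MasterModel, and the class name `AlternateContentSourceSketch`, the `remote` attribute holding an href string in place of a ForeignKey, and the `refresh_urls` helper are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AlternateContentSourceSketch:
    """Plain-Python stand-in for the proposed model (not a Django model)."""

    name: str                 # required
    base_path: str            # required, must end with "/"
    remote: str               # stand-in for the ForeignKey: the remote's href
    enabled: bool = True      # optional, defaults to True
    paths: List[str] = field(default_factory=list)  # optional

    def __post_init__(self):
        if not self.name:
            raise ValueError("name is required")
        if not self.base_path.endswith("/"):
            raise ValueError("base_path must include the trailing slash")
        for path in self.paths:
            if path.startswith("/"):
                raise ValueError("path must not begin with a slash: %r" % path)

    def refresh_urls(self):
        """URLs a refresh would index: base_path alone if no paths are given."""
        if not self.paths:
            return [self.base_path]
        return [self.base_path + path for path in self.paths]
```

Requiring the trailing slash on `base_path` and forbidding a leading slash on each path means the two can be concatenated directly, with no ambiguity about doubled or missing separators.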
  
 ## Pulp3 Alternate Content Source Usage 

 ### Create an Alternate Content Source: 

 1. First, create a remote representing the remote source, e.g. an `RpmRemote` or a pulp_ansible `CollectionRemote`. 
 2. Then use that remote in an alternate content source: `POST /pulp/api/v3/acs/rpm/rpm/ remote=/pulp/api/v3/remotes/.../.../`, which would yield a `/pulp/api/v3/acs/rpm/rpm/:uuid/`. 

 ### Refresh an Alternate Content Source 

 Then perform a "refresh" of the alternate content source by calling `POST /pulp/api/v3/acs/rpm/rpm/:uuid/refresh/`. The action endpoint `refresh` is used here because it's not actually syncing down content. It's like an on-demand sync in the sense that when called it indexes the remote metadata and creates remote artifacts. 
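As a rough illustration of the create and refresh calls described above, the sketch below only builds the two HTTP requests and does not send them. The host `pulp.example.com`, the JSON body, the `name`, `base_path` and `paths` payload keys, and both helper names are assumptions for this example; only the endpoint paths and the `remote` parameter come from this proposal.

```python
import json
from urllib.request import Request

API_ROOT = "https://pulp.example.com"  # hypothetical host


def build_create_request(name, base_path, remote_href, paths=()):
    """Build (but do not send) the POST that creates an RPM ACS."""
    payload = {
        "name": name,
        "base_path": base_path,
        "remote": remote_href,
        "paths": list(paths),
    }
    return Request(
        API_ROOT + "/pulp/api/v3/acs/rpm/rpm/",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_refresh_request(acs_href):
    """Build (but do not send) the POST to the ACS's refresh action."""
    return Request(API_ROOT + acs_href + "refresh/", data=b"", method="POST")
```

The caller passes in the remote's href and, later, the ACS href returned by the create call; passing either request to `urllib.request.urlopen` would actually send it.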

 ### Use an AlternateContentSource 

 At feature launch, each ACS is assumed to be global. For example, every RPM sync will check the content known to the RPM-typed ACSes and, for binary data, prefer it over the content from the authoritative source. 

 ## Implementation 

 An ideal implementation would "prefer and use" the alternate content source data transparently in the downloader itself. Since an AlternateContentSource can come from either HTTP or file sources, this should likely be implemented in BaseDownloader itself. 

 So for each download it attempts, BaseDownloader should check the database to determine whether an AlternateContentSource for that type exists and, if so, find its RemoteArtifact and use that when downloading Artifact data.
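
The "prefer and use" decision can be sketched as a lookup keyed by checksum. This is a simplified model under stated assumptions, not pulpcore code: an in-memory dict stands in for the database of RemoteArtifacts created by a refresh, sha256 is assumed as the identity of the binary data, and `choose_download_url` is a hypothetical helper rather than a BaseDownloader method.

```python
def choose_download_url(sha256, authoritative_url, acs_index):
    """Return the ACS URL when the exact same content is indexed there.

    acs_index maps a sha256 digest to the URL recorded for that content
    when the ACS was refreshed (stand-in for a RemoteArtifact lookup).
    """
    return acs_index.get(sha256, authoritative_url)


if __name__ == "__main__":
    # Hypothetical index with one entry, as a refresh might have produced.
    index = {"abc123": "http://192.168.1.11/pub/content/pkg-1.0.rpm"}
    print(choose_download_url("abc123", "https://cdn.example.com/pkg-1.0.rpm", index))
    print(choose_download_url("def456", "https://cdn.example.com/pkg-2.0.rpm", index))
```

Content that the ACS has not indexed, or whose checksum differs, still comes from the authoritative source, so the ACS only ever substitutes identical binary data.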