Story #7832
Updated by ttereshc almost 4 years ago
## Background

In Pulp2 there was a feature called ["Alternate Content Sources"](https://docs.pulpproject.org/en/2.21/user-guide/content-sources.html). In Pulp2 this was supported for pulp_rpm only. Here's how it worked:

1. A user configures an alternate content source, which lists many "paths", each representing a repo available from the remote source. For example, the Alternate Content Source (ACS) could be a local copy of the CDN within an AWS region. Another example would be a locally mounted copy of a portion of the CDN.
2. The user then refreshes the alternate content source. This indexes the binary data that is available in the remote source, so that sync operations do not each have to parse all paths on the alternate content source at runtime.
3. When a sync occurs, it considers the data available from the alternate content source, and if an exact content unit is available via the ACS, the download occurs from the ACS first.

Here's an example of a Pulp2 local ACS config:

```
[pulp-content-source]
enabled: 1
priority: 0
expires: 3d
name: Pulp Content Source
type: yum
base_url: http://192.168.1.11/pub/content/
paths: beta/rhel/server/7/x86_64/satellite/6/os/
       eus/rhel/server/7/7.3/x86_64/os/
       dist/rhel/client/6/6.2/x86_64/kickstart/
       dist/rhel/client/6/6.2/x86_64/kickstart/Client/
       dist/rhel/client/6/6.2/x86_64/os/
       dist/rhel/client/6/6.8/i386/kickstart/
       dist/rhel/client/6/6.8/i386/kickstart/Client/
```

### A few Pulp2 details

ACS in Pulp2 had...

* a `priority`, but in practice this was not meaningfully used
* certificates used when fetching content (optional). Fetching content in an AWS region required a cert.
* a `headers` option to specify headers attached to the requests going to the ACS

## Use Cases

### CDN connection is low-bandwidth and/or high-latency

As a user, I have a low-bandwidth and/or high-latency connection to the authoritative source of content, e.g. the CDN. I also have a local source (either local disk or local network), but it's not authoritative and it could be old.
There should be a way to fetch the metadata from the authoritative source, and the content from the "near" source whenever it's identical.

### Quickly setting up a Pulp server

As a user setting up a new Pulp server who already has content on a local disk or local network that should go into that Pulp server, I should be able to use it. The CDN or remote source is still the authoritative one, but if the binary data is the same, I shouldn't need to bring it in over the WAN.

### Putting a Pulp server in the cloud

As a user deploying a Pulp server to the cloud, e.g. Amazon AWS, the CDN should still be authoritative, but there is usually a "nearly up to date" copy of that content also available in the AWS region. This copy is usually faster to reach and also cheaper, since in-region network access does not cost what WAN access does. I want to use ACS to allow the CDN to be authoritative, but use the regional copy for binary data whenever possible.

### Connection to CDN is fast, but it's not the authoritative source, that is slow

As a user, I could have fast access to the CDN, but it may not be the authoritative source for my Pulp server. This applies particularly in cases where I have multiple Pulp servers and this one (edge) is syncing from another Pulp server (central). In that case, the link between the authoritative, central Pulp server and this edge Pulp server is slow, but the connection between this edge Pulp server and the CDN is fast. I want to use ACS to allow the central Pulp server to be authoritative for content, but receive the binary data from the CDN whenever possible.

## Pulp3 Alternate Content Source Feature Plan

### Create an Alternate Content Source

1. First, create a remote representing the remote source, e.g. a `RpmRemote`, or a pulp_ansible `CollectionRemote`.
2. Then use that remote in an alternate content source by doing: `POST /pulp/api/v3/acs/rpm/rpm/ remote=/pulp/api/v3/remotes/.../.../`,
   which would yield a `/pulp/api/v3/acs/rpm/rpm/:uuid/`.

### Refresh an Alternate Content Source

Then perform a "refresh" of the alternate content source by calling `POST /pulp/api/v3/acs/rpm/rpm/:uuid/refresh/`. The action endpoint `refresh` is used here because it's not actually syncing down content. It's like an on-demand sync in the sense that when called it indexes the remote metadata and creates remote artifacts.

### Use an AlternateContentSource

At feature launch, each ACS is assumed to be global, so, e.g., every RPM sync will check the content known to the RPM-typed ACSes during sync and prefer it over the content from the authoritative source for binary data.

## Implementation

An ideal implementation would apply the "prefer and use" logic for alternate content source data transparently in the downloader itself. Since an AlternateContentSource can be from either Http or File sources, it likely should be implemented in BaseDownloader itself. So for each download it attempts, BaseDownloader should check with the database to determine if an AlternateContentSource for that type exists, and if so, find its RemoteArtifact and use that when downloading Artifact data.
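
The "prefer and use" lookup described above can be illustrated with a minimal, self-contained sketch. This is not Pulp code: `RemoteArtifactRecord`, `AcsIndex`, and `choose_download_url` are hypothetical names, the in-memory dictionary stands in for the database query, and matching on a sha256 digest is an assumption about what "exact content unit" would mean in practice.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class RemoteArtifactRecord:
    """Hypothetical stand-in for a RemoteArtifact row: a digest and where to fetch it."""
    sha256: str
    url: str


class AcsIndex:
    """Hypothetical in-memory index of what an ACS refresh discovered, keyed by sha256."""

    def __init__(self) -> None:
        self._by_digest: Dict[str, RemoteArtifactRecord] = {}

    def add(self, record: RemoteArtifactRecord) -> None:
        # A refresh would populate this from the ACS remote's metadata.
        self._by_digest[record.sha256] = record

    def lookup(self, sha256: str) -> Optional[RemoteArtifactRecord]:
        return self._by_digest.get(sha256)


def choose_download_url(authoritative: RemoteArtifactRecord, acs_index: AcsIndex) -> str:
    """Return the ACS URL when the ACS holds identical content, else the authoritative URL."""
    match = acs_index.lookup(authoritative.sha256)
    return match.url if match is not None else authoritative.url


if __name__ == "__main__":
    index = AcsIndex()
    index.add(RemoteArtifactRecord(sha256="abc", url="http://192.168.1.11/pub/content/pkg.rpm"))

    wanted = RemoteArtifactRecord(sha256="abc", url="https://cdn.example.com/pkg.rpm")
    print(choose_download_url(wanted, index))
```

Keeping the decision in one function mirrors the idea of doing the check inside the downloader: callers ask for an artifact from the authoritative remote and are redirected to the ACS only when the digests match, so metadata still comes from the authoritative source.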