Story #7832: [EPIC] As a user, I have Alternate Content Sources - Pulp

Story #7832

closed

[EPIC] As a user, I have Alternate Content Sources

Added by bmbouter about 4 years ago. Updated about 3 years ago.

Status:

CLOSED - CURRENTRELEASE

Priority:

Normal

Assignee:

Category:

Sprint/Milestone:

Start date:

Due date:

% Done:

100%

Estimated time:

(Total: 0:00 h)

Platform Release:

Groomed:

Sprint Candidate:

Tags:

Sprint:

Sprint 109

Quarter:

Description

Background

In Pulp2 there was a feature called "Alternate Content Sources". In Pulp2 this was supported for pulp_rpm only. Here's how it worked:

A user configures an alternate content source, which lists many "paths", each representing a repo available from the remote source. For example, an Alternate Content Source (ACS) could be a local copy of the CDN within an AWS region. Another could be a locally mounted copy of a portion of the CDN.
The user then refreshes the alternate content source. This indexes the binary data that is available in the remote source, so that sync operations do not each have to parse all paths on the alternate content source at runtime.
When a sync occurs, it considers the data available from the alternate content source, and if an exact content unit is available via the ACS, the download occurs from the ACS first.

Here's an example of a Pulp2 local ACS config:

[pulp-content-source]
enabled: 1
priority: 0
expires: 3d
name: Pulp Content Source
type: yum
base_url: http://192.168.1.11/pub/content/
paths:   beta/rhel/server/7/x86_64/satellite/6/os/
  eus/rhel/server/7/7.3/x86_64/os/
  dist/rhel/client/6/6.2/x86_64/kickstart/
  dist/rhel/client/6/6.2/x86_64/kickstart/Client/
  dist/rhel/client/6/6.2/x86_64/os/
  dist/rhel/client/6/6.8/i386/kickstart/
  dist/rhel/client/6/6.8/i386/kickstart/Client/
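
As an illustration of how a file in this shape can be read, here is a minimal sketch using Python's standard `configparser`. It is not Pulp2's actual loader; the function name `load_sources` and the shortened, embedded config are assumptions made for the example. It relies on `configparser` accepting `:` as a key/value delimiter and treating indented lines as continuations of the previous value, which is how the multi-line `paths` entry is recovered.

```python
import configparser

# Shortened copy of the config above, embedded as a string for illustration.
ACS_CONFIG = """\
[pulp-content-source]
enabled: 1
priority: 0
expires: 3d
name: Pulp Content Source
type: yum
base_url: http://192.168.1.11/pub/content/
paths:   beta/rhel/server/7/x86_64/satellite/6/os/
  eus/rhel/server/7/7.3/x86_64/os/
  dist/rhel/client/6/6.2/x86_64/os/
"""


def load_sources(text):
    """Parse ACS sections into plain dicts (hypothetical helper, not Pulp code)."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    sources = {}
    for section in parser.sections():
        entry = dict(parser[section])
        # "1" is one of the values configparser maps to True.
        entry["enabled"] = parser[section].getboolean("enabled")
        # Indented continuation lines are joined with newlines; split on whitespace.
        entry["paths"] = parser[section].get("paths", "").split()
        sources[section] = entry
    return sources


sources = load_sources(ACS_CONFIG)
for source_id, entry in sources.items():
    print(source_id, entry["base_url"], len(entry["paths"]))
```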

A few Pulp2 details

ACS in Pulp2 had...

a priority, but in practice this was not meaningfully used
certificates used when fetching content (optional). Fetching content in an AWS region required a cert.
a headers option to specify headers attached to the requests going to the ACS

Use Cases

CDN connection is low-bandwidth and/or high-latency

As a user, I have a low-bandwidth and/or high-latency connection to the authoritative source of content, e.g. the CDN. I also have a local copy (on local disk or on the local network), but it's not authoritative and could be old. There should be a way to fetch the metadata from the authoritative source, and the content from the "near" source whenever it's identical.

Quickly setting up a Pulp server

As a user setting up a new Pulp server, I already have content on a local disk or on the local network that should go into that Pulp server, and I should be able to use it. The CDN or remote source is still the authoritative one, but if the binary data is the same, I shouldn't need to bring it in over the WAN.

Putting a Pulp server in the cloud

As a user deploying a Pulp server to the cloud, e.g. Amazon AWS, the CDN should still be authoritative, but there is usually a "nearly up to date" copy of that content also available in the AWS region, which is very fast. The regional copy is usually not only faster but also cheaper, since in-region network access does not cost what WAN access does. I want to use ACS to allow the CDN to be authoritative, but use the regional copy for binary data whenever possible.

Connection to CDN is fast, but the CDN is not the authoritative source, which is slow

As a user, I could have fast access to the CDN, but it may not be the authoritative source for my Pulp server. This happens particularly when I have multiple Pulp servers and this one (edge) is syncing from another Pulp server (central). In that case, the link between the authoritative, central Pulp server and this edge Pulp server is slow, but the connection between this edge Pulp server and the CDN is fast. I want to use ACS to allow the central Pulp server to be authoritative for content, but receive the binary data from the CDN whenever possible.

Pulp3 Alternate Content Source Plan

Plugin writers will have a new pulpcore.plugin.models.AlternateContentSource MasterModel which will define the following fields:

name - a string, the name. A required field.
enabled - an optional boolean, defaults to True.
paths - an optional list of string paths. Each must validate as a string path and must not begin with a slash. If unspecified, only the base_path will be used when the AlternateContentSource is refreshed.
a ForeignKey to a remote. This is required, as the remote defines how the ACS can sync.

Plugin writers will subclass this.
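
To make the field rules above concrete, here is a minimal sketch using plain Python dataclasses instead of a real Django MasterModel, so it runs without a Pulp installation. The class names `AlternateContentSourceSketch` and `RpmAlternateContentSourceSketch`, and the use of a string href in place of the ForeignKey, are assumptions for illustration only and are not the pulpcore API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AlternateContentSourceSketch:
    """Stand-in for the proposed model; not the real pulpcore class."""

    name: str                      # required
    remote: str                    # stand-in for the required ForeignKey to a remote
    enabled: bool = True           # optional, defaults to True
    paths: List[str] = field(default_factory=list)  # optional

    def __post_init__(self):
        if not self.name:
            raise ValueError("name is required")
        if not self.remote:
            raise ValueError("remote is required")
        for path in self.paths:
            # Each path must be a string and must not begin with a slash.
            if not isinstance(path, str) or path.startswith("/"):
                raise ValueError(f"invalid path: {path!r}")


@dataclass
class RpmAlternateContentSourceSketch(AlternateContentSourceSketch):
    """What a plugin-specific subclass might look like (hypothetical)."""


acs = RpmAlternateContentSourceSketch(
    name="regional-mirror",
    remote="/pulp/api/v3/remotes/.../.../",
    paths=["eus/rhel/server/7/7.3/x86_64/os/"],
)
print(acs.enabled, acs.paths)
```

In the real feature the validation would presumably live in a serializer or model field rather than in `__post_init__`; the sketch only shows which inputs the plan accepts and rejects.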

Pulp3 Alternate Content Source Usage

Create an Alternate Content Source

First create a remote representing the remote source, e.g. an RpmRemote or a pulp_ansible CollectionRemote.
Then use that remote in an alternate content source by doing: POST /pulp/api/v3/acs/rpm/rpm/ remote=/pulp/api/v3/remotes/.../.../, which would yield a /pulp/api/v3/acs/rpm/rpm/:uuid/.
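
The create call could be prepared from Python roughly as follows. This is a sketch under assumptions: the helper name `build_create_acs_request`, the JSON body shape, and the `http://localhost` API root are hypothetical, and the remote href is left elided exactly as in the description above. The request is only built here, not sent.

```python
import json
import urllib.request

# Left elided on purpose; use the href returned when the remote was created.
REMOTE_HREF = "/pulp/api/v3/remotes/.../.../"


def build_create_acs_request(api_root, remote_href, name):
    """Build (but do not send) the POST that creates an RPM ACS."""
    body = json.dumps({"name": name, "remote": remote_href}).encode("utf-8")
    return urllib.request.Request(
        url=api_root.rstrip("/") + "/pulp/api/v3/acs/rpm/rpm/",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_create_acs_request("http://localhost", REMOTE_HREF, "my-acs")
print(request.get_method(), request.full_url)
# Sending it would be: urllib.request.urlopen(request), plus whatever
# authentication the Pulp deployment requires.
```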

Refresh an Alternate Content Source

Then perform a "refresh" of the alternate content source by calling POST /pulp/api/v3/acs/rpm/rpm/:uuid/refresh/. The action endpoint refresh is used here because it does not actually sync content down. It is like an on-demand sync in the sense that, when called, it indexes the remote metadata and creates remote artifacts.
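
One way to picture what a refresh has to visit is the set of URLs formed from the remote's base URL and each configured path. The sketch below assumes the remote's URL plays the role of the base path and that an ACS with no paths is indexed at the base only; the function name `refresh_targets` is hypothetical and this is not the task Pulp runs.

```python
from urllib.parse import urljoin


def refresh_targets(base_url, paths=None):
    """Return the URLs a refresh would index (illustrative assumption)."""
    # urljoin drops the last segment of a base that lacks a trailing slash.
    base = base_url if base_url.endswith("/") else base_url + "/"
    if not paths:
        return [base]
    targets = []
    for path in paths:
        if path.startswith("/"):
            raise ValueError(f"path must not begin with a slash: {path!r}")
        targets.append(urljoin(base, path))
    return targets


for url in refresh_targets(
    "http://192.168.1.11/pub/content",
    ["eus/rhel/server/7/7.3/x86_64/os/", "dist/rhel/client/6/6.2/x86_64/os/"],
):
    print(url)
```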

Use an AlternateContentSource

At feature launch, each ACS is assumed to be global, so, e.g., every RPM sync will check the content known to the RPM-typed ACSes and, for binary data, prefer it over the content from the authoritative source.

Implementation

An ideal implementation would "prefer and use" the alternate content source data transparently in the downloader itself. Since an AlternateContentSource can be from either Http or File sources, it likely should be implemented in BaseDownloader itself.

So for each download it attempts, BaseDownloader should check with the database to determine whether an AlternateContentSource for that type exists and, if so, find its RemoteArtifact and use that when downloading Artifact data.
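
The lookup described above can be sketched without a database. In the example below an in-memory dictionary keyed by SHA-256 digest stands in for the RemoteArtifact query. The names `RemoteArtifactSketch`, `AcsIndex`, and `choose_download_url` are hypothetical, and matching on a checksum is an assumption about how "an exact content unit" would be identified.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class RemoteArtifactSketch:
    """Stand-in for a RemoteArtifact created by an ACS refresh."""

    sha256: str
    url: str


class AcsIndex:
    """In-memory stand-in for the database lookup the downloader would make."""

    def __init__(self) -> None:
        self._by_digest: Dict[str, RemoteArtifactSketch] = {}

    def add(self, artifact: RemoteArtifactSketch) -> None:
        self._by_digest[artifact.sha256] = artifact

    def find(self, sha256: str) -> Optional[RemoteArtifactSketch]:
        return self._by_digest.get(sha256)


def choose_download_url(index: AcsIndex, sha256: str, authoritative_url: str) -> str:
    """Prefer the ACS copy when the exact same content is known there."""
    match = index.find(sha256)
    return match.url if match is not None else authoritative_url


index = AcsIndex()
index.add(RemoteArtifactSketch(sha256="abc123", url="http://192.168.1.11/pub/content/a.rpm"))

print(choose_download_url(index, "abc123", "https://cdn.example.com/a.rpm"))
print(choose_download_url(index, "ffff00", "https://cdn.example.com/b.rpm"))
```

Keeping the choice in one function mirrors the idea of doing it inside the downloader: callers ask for content by digest and never need to know whether an ACS served it.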

Updated by bmbouter about 4 years ago

Updated by ttereshc about 4 years ago

Updated by ttereshc about 4 years ago

Updated by bmbouter about 4 years ago

Updated by ipanova@redhat.com about 4 years ago

Updated by bmbouter almost 4 years ago

Updated by bmbouter almost 4 years ago

Updated by bmbouter almost 4 years ago

Updated by bmbouter almost 4 years ago

Updated by bmbouter almost 4 years ago

Updated by jsherril@redhat.com over 3 years ago

Updated by bmbouter over 3 years ago

Updated by dalley over 3 years ago

Updated by ipanova@redhat.com over 3 years ago

Updated by rchan over 3 years ago

Updated by rchan over 3 years ago

Updated by rchan over 3 years ago

Updated by rchan over 3 years ago

Updated by rchan over 3 years ago

Updated by rchan over 3 years ago

Updated by rchan about 3 years ago

Updated by ppicka about 3 years ago