Story #3693: Lazy for Pulp3
As a user, I can use Pulp as a pass-through cache
Lazy sync requires the plugin to create *Content, ContentArtifact, and RemoteArtifact objects for every piece of content discovered at the remote. For repositories with millions of artifacts, this kind of indexing is prohibitively resource-intensive.
Data model change
Make content indexing optional by enabling users to configure a Distribution with a Remote. This would be a new relation field on Distribution, called 'remote'.
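The shape of this change can be sketched with plain dataclasses. This is only an illustration, not pulpcore's actual models: the class layout, the field names other than 'remote', and the helper `supports_pass_through` are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Remote:
    # Hypothetical minimal Remote: a name and the upstream URL it points to.
    name: str
    url: str


@dataclass
class Publication:
    # Hypothetical placeholder for an existing Publication.
    name: str


@dataclass
class Distribution:
    base_path: str
    publication: Optional[Publication] = None
    # The new, optional relation proposed in this story.
    remote: Optional[Remote] = None


def supports_pass_through(distribution: Distribution) -> bool:
    """A Distribution can act as a pass-through cache only if a remote is set."""
    return distribution.remote is not None
```

Because the relation is optional, existing Distributions without a remote keep behaving as before.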
Integration with the Content serving app and Streamer
The content app would then do the following to find an artifact to serve to the user:
1. Match path to a distribution.
2. Try to find an artifact by looking at the Publication associated with the Distribution.
3. If an Artifact is found, send the content of it as a response.
4. If an Artifact is not found and a RemoteArtifact is found, fetch the artifact using the 'remote' and stream the response back to the user.
5. If remote's policy is 'on_demand' or 'immediate', create an Artifact.
6. If neither an Artifact nor a RemoteArtifact is found, check if there is a 'remote' associated with the Distribution.
7. If a 'remote' is not set for the Distribution, return a 404.
8. If a 'remote' is associated, fetch the artifact using the 'remote' and stream the response back to the user.
9. Create a RemoteArtifact.
10. If remote's policy is 'on_demand' or 'immediate', create an Artifact.
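The steps above can be sketched as a single lookup function. This is a minimal in-memory illustration, not the content app's real handler: the dictionaries standing in for Publication lookups, the `fetch` callable standing in for the streamer, and the longest-prefix rule in `match_distribution` are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# Policies under which a fetched file is also saved as an Artifact (steps 5 and 10).
SAVING_POLICIES = {"on_demand", "immediate"}


@dataclass
class Remote:
    url: str
    policy: str


@dataclass
class RemoteArtifact:
    url: str
    remote: Remote


@dataclass
class Distribution:
    base_path: str
    remote: Optional[Remote] = None
    # Hypothetical stand-ins for what a lookup through the Publication would find.
    artifacts: Dict[str, bytes] = field(default_factory=dict)
    remote_artifacts: Dict[str, RemoteArtifact] = field(default_factory=dict)


def match_distribution(distributions: List[Distribution], path: str) -> Optional[Distribution]:
    """Step 1: pick the distribution whose base_path prefixes the request path."""
    hits = [d for d in distributions
            if path == d.base_path or path.startswith(d.base_path + "/")]
    # Assumption: the longest matching base_path wins.
    return max(hits, key=lambda d: len(d.base_path), default=None)


def serve(distributions: List[Distribution], path: str,
          fetch: Callable[[str], bytes]) -> Tuple[int, bytes]:
    dist = match_distribution(distributions, path)
    if dist is None:
        return 404, b""
    rel = path[len(dist.base_path):].lstrip("/")

    # Steps 2-3: a saved Artifact is served directly.
    if rel in dist.artifacts:
        return 200, dist.artifacts[rel]

    # Steps 4-5: a known RemoteArtifact is fetched and optionally saved.
    known = dist.remote_artifacts.get(rel)
    if known is not None:
        body = fetch(known.url)
        if known.remote.policy in SAVING_POLICIES:
            dist.artifacts[rel] = body
        return 200, body

    # Steps 6-7: nothing is indexed and no remote is set.
    if dist.remote is None:
        return 404, b""

    # Steps 8-10: fetch through the Distribution's remote and record the result.
    url = dist.remote.url.rstrip("/") + "/" + rel
    body = fetch(url)
    dist.remote_artifacts[rel] = RemoteArtifact(url=url, remote=dist.remote)
    if dist.remote.policy in SAVING_POLICIES:
        dist.artifacts[rel] = body
    return 200, body
```

In this sketch a second request for the same path is answered from the saved Artifact without calling `fetch` again, which is the caching behaviour the story is after.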
Example use case with Maven
Maven Central hosts millions of Maven artifacts at https://repo1.maven.org/maven2/. The user will take the following steps to create a pass-through cache of Maven Central in Pulp.
1. Create a Maven Remote that points to "https://repo1.maven.org/maven2/".
2. Create a Distribution that has 'remote' set to the remote created in step 1 and relative path set to 'foo'.
When mvn is building a project and requests an artifact from Pulp at "http://hostname/pulp/content/foo/com/google/code/findbugs/jsr305/1.3.9/jsr305-1.3.9.jar", the Content app asks the streamer to fetch the artifact from "https://repo1.maven.org/maven2/com/google/code/findbugs/jsr305/1.3.9/jsr305-1.3.9.jar" and then creates an Artifact and a RemoteArtifact for it. The next time the same path is requested, the Content app should be able to locate the Artifact via the RemoteArtifact's URL.
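The URL mapping in this example can be sketched as follows. The function name `upstream_url` and the `CONTENT_PREFIX` constant are assumptions for illustration; only the two URLs and the relative path 'foo' come from the example above.

```python
from urllib.parse import urlsplit

# Assumed content prefix, taken from the example request URL above.
CONTENT_PREFIX = "/pulp/content/"


def upstream_url(request_url: str, base_path: str, remote_url: str) -> str:
    """Map a request under a Distribution's path to the matching URL at the remote."""
    path = urlsplit(request_url).path
    prefix = CONTENT_PREFIX + base_path.strip("/") + "/"
    if not path.startswith(prefix):
        raise ValueError(f"{request_url} is not served under {prefix}")
    relative = path[len(prefix):]
    # Join with exactly one slash, whether or not the remote URL ends with one.
    return remote_url.rstrip("/") + "/" + relative


print(upstream_url(
    "http://hostname/pulp/content/foo/com/google/code/findbugs/jsr305/1.3.9/jsr305-1.3.9.jar",
    "foo",
    "https://repo1.maven.org/maven2/",
))
```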
Updated by firstname.lastname@example.org over 4 years ago
An optional association between the Distribution and a Remote seems reasonable. I think the fallback prefix is unnecessary and a little confusing. Let's just name it: Distribution.remote and document as:
"The distribution may be optionally associated with a Remote to support Passthru Caching."
The Passthru Caching concept/feature can be explained in detail elsewhere and linked.
Looking at the description, I think the design can be described more concisely by describing only the changes instead of all of the logic steps. (I don't think the steps are exactly correct anyway).
@dkliban, what do you think of:
Add Distribution.remote as (blank=True, null=True, db_index=True, on_delete=SET_NULL)
The content application will redirect to the streamer when: the publication cannot be matched; neither a published-artifact nor published-metadata can be matched; redirect is enabled; Distribution.remote (is set).
The streamer will get the downloader from the remote directly when: the publication cannot be matched; neither a published-artifact nor published-metadata can be matched; Distribution.remote (is set). Else, 404.
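One possible reading of these two rules, written out as predicates. The proposal does not say how the listed conditions combine, so this sketch assumes that "nothing matched locally" means either the publication or the published unit failed to match, and that the remaining conditions must all hold. The function names and return strings are hypothetical.

```python
def needs_distribution_remote(publication_matched: bool,
                              published_unit_matched: bool,
                              remote_set: bool) -> bool:
    """True when nothing was matched locally and Distribution.remote is set."""
    # Assumption: a miss on either the publication or the published
    # artifact/metadata counts as a local miss.
    local_miss = (not publication_matched) or (not published_unit_matched)
    return local_miss and remote_set


def content_app_redirects(publication_matched: bool,
                          published_unit_matched: bool,
                          remote_set: bool,
                          redirect_enabled: bool) -> bool:
    """Content app rule: same conditions, plus redirect must be enabled."""
    return redirect_enabled and needs_distribution_remote(
        publication_matched, published_unit_matched, remote_set)


def streamer_response(publication_matched: bool,
                      published_unit_matched: bool,
                      remote_set: bool) -> str:
    """Streamer rule: use the remote's downloader directly, else 404."""
    if needs_distribution_remote(publication_matched, published_unit_matched, remote_set):
        return "download-from-remote"
    return "404"
```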
Added by email@example.com almost 4 years ago
Problem: pulpcore does not have a pull-through cache feature
Solution: extend the content app to support pull-through cache
This patch enables the content app handler to look for a remote on a distribution. When a remote is associated with a distribution, that remote is used to download content missing from Pulp.